# Two-dimensional data structure: DataFrame

In [1]:
import numpy as np
import pandas as pd



`DataFrame` is the two-dimensional data structure in `pandas`. It can be thought of as a spreadsheet (one Excel sheet), a SQL table, or a dictionary of `Series`.  
`DataFrame(data, index, columns)`: `data` accepts many kinds of input, including:  
- a dictionary of 1-dimensional ndarrays, lists, or `Series`  
- a 2D list or ndarray  
- a structured array  
- a `Series`  
- another `DataFrame`  

`index`: the row labels  
`columns`: the column labels
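
As a minimal sketch of the constructor signature described above, the following builds a `DataFrame` from a 2D list and labels both axes. The names `rows` and `frame` and the labels are illustrative assumptions, not part of the original notebook.

```python
import pandas as pd

# Illustrative data: each inner list is one row of the table.
rows = [[1, 2], [3, 4], [5, 6]]

# `index` labels the rows, `columns` labels the columns.
frame = pd.DataFrame(rows, index=['x', 'y', 'z'], columns=['left', 'right'])

print(frame.shape)          # (3, 2)
print(list(frame.index))    # ['x', 'y', 'z']
print(list(frame.columns))  # ['left', 'right']
```

If `index` or `columns` is omitted for this kind of input, pandas falls back to integer labels starting at 0.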

Create from a dictionary of `Series`. The resulting row index is the union of the indexes of the individual `Series`.

In [2]:
d = {
    'one': pd.Series([1., 2., 3.], index=['a', 'b', 'c']),
    'two': pd.Series([1., 2., 3., 4.], index=['a', 'b', 'c', 'd']),
}

In [3]:
df = pd.DataFrame(d)
df

|  | one | two |
|---|---|---|
| a | 1.0 | 1.0 |
| b | 2.0 | 2.0 |
| c | 3.0 | 3.0 |
| d | NaN | 4.0 |
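
To make the index alignment in this result concrete, here is a small self-contained check. The variable names `s_one`, `s_two`, and `union_df` are hypothetical and chosen only for this sketch.

```python
import pandas as pd

# Two Series with overlapping but unequal indexes (same data as the cell above).
s_one = pd.Series([1., 2., 3.], index=['a', 'b', 'c'])
s_two = pd.Series([1., 2., 3., 4.], index=['a', 'b', 'c', 'd'])
union_df = pd.DataFrame({'one': s_one, 'two': s_two})

# The row index is the union of both Series indexes.
print(list(union_df.index))             # ['a', 'b', 'c', 'd']

# Label 'd' is absent from s_one, so that cell is filled with NaN.
print(union_df['one'].isna().tolist())  # [False, False, False, True]
```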


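If `index` and `columns` are specified, only the matching labels are kept, in the given order. Labels that do not exist in the data (here the column `three`) are filled with `NaN`: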

In [4]:
pd.DataFrame(d, index=['d', 'b', 'a'], columns=['two', 'three'])

|  | two | three |
|---|---|---|
| d | 4.0 | NaN |
| b | 2.0 | NaN |
| a | 1.0 | NaN |


In [5]:
df.index

Index(['a', 'b', 'c', 'd'], dtype='object')

In [6]:
df.columns

Index(['one', 'two'], dtype='object')

Create from a dictionary of ndarrays or lists.  
If the dictionary values are ndarrays or lists, they must all have the same length.

In [7]:
d = {
    'one': [1, 2, 3, 4],
    'two': [4, 3, 2, 1],
}
pd.DataFrame(d)

|  | one | two |
|---|---|---|
| 0 | 1 | 4 |
| 1 | 2 | 3 |
| 2 | 3 | 2 |
| 3 | 4 | 1 |


If `index` is given, its length must equal the length of the lists.

In [8]:
pd.DataFrame(d, index=['a', 'b', 'c', 'd'])

|  | one | two |
|---|---|---|
| a | 1 | 4 |
| b | 2 | 3 |
| c | 3 | 2 |
| d | 4 | 1 |
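
The two length rules can be checked with a short sketch. It assumes a recent pandas version, where both mismatches raise `ValueError`; the names `lists` and `errors` are illustrative only.

```python
import pandas as pd

lists = {'one': [1, 2, 3, 4], 'two': [4, 3, 2, 1]}
errors = []  # collects the exception type raised in each case

# Case 1: the lists in the dictionary have different lengths.
try:
    pd.DataFrame({'one': [1, 2, 3], 'two': [4, 3]})
except ValueError as exc:
    errors.append(type(exc).__name__)

# Case 2: the index has 3 labels but the lists have 4 items.
try:
    pd.DataFrame(lists, index=['a', 'b', 'c'])
except ValueError as exc:
    errors.append(type(exc).__name__)

print(errors)  # ['ValueError', 'ValueError']
```

Note that this differs from a dictionary of `Series`, where unequal lengths are allowed because the values are aligned by label.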


Construct a `DataFrame` from a structured array.

In [9]:
data = np.zeros((2,), dtype=[('A', 'i4'), ('B', 'f4'), ('C', 'S10')])  # 'S10': fixed-width byte string of length 10
data

array([(0, 0., b''), (0, 0., b'')],
      dtype=[('A', '<i4'), ('B', '<f4'), ('C', 'S10')])

In [10]:
data[:] = [(1, 2., 'Hello'), (2, 3., 'World')]
data

array([(1, 2., b'Hello'), (2, 3., b'World')],
      dtype=[('A', '<i4'), ('B', '<f4'), ('C', 'S10')])

In [11]:
pd.DataFrame(data)

|  | A | B | C |
|---|---|---|---|
| 0 | 1 | 2.0 | b'Hello' |
| 1 | 2 | 3.0 | b'World' |


In [12]:
pd.DataFrame(data, index=['first', 'second'])

|  | A | B | C |
|---|---|---|---|
| first | 1 | 2.0 | b'Hello' |
| second | 2 | 3.0 | b'World' |


In [13]:
pd.DataFrame(data, columns=['C', 'A', 'B'])

|  | C | A | B |
|---|---|---|---|
| 0 | b'Hello' | 1 | 2.0 |
| 1 | b'World' | 2 | 3.0 |
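
Column `C` holds `bytes` objects because the structured array stores fixed-width byte strings. One possible way to get ordinary text, sketched here under the assumption that the data is UTF-8 encoded, is `Series.str.decode`. The names `records` and `decoded` are hypothetical.

```python
import numpy as np
import pandas as pd

# Same layout as the structured array above, built directly for a self-contained example.
records = np.array(
    [(1, 2., b'Hello'), (2, 3., b'World')],
    dtype=[('A', 'i4'), ('B', 'f4'), ('C', 'S10')],
)
decoded = pd.DataFrame(records)

# Decode each bytes value in column C into a regular Python string.
decoded['C'] = decoded['C'].str.decode('utf-8')

print(decoded['C'].tolist())  # ['Hello', 'World']
```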


Create from a list of dictionaries.

In [14]:
data2 = [
    {'a': 1, 'b': 2},
    {'a': 5, 'b': 10, 'c': 20},
]
pd.DataFrame(data2)

|  | a | b | c |
|---|---|---|---|
| 0 | 1 | 2 | NaN |
| 1 | 5 | 10 | 20.0 |


In [15]:
pd.DataFrame(data2, index=['first', 'second'])

|  | a | b | c |
|---|---|---|---|
| first | 1 | 2 | NaN |
| second | 5 | 10 | 20.0 |


In [16]:
pd.DataFrame(data2, columns=['a', 'b'])

|  | a | b |
|---|---|---|
| 0 | 1 | 2 |
| 1 | 5 | 10 |
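
In the outputs above, a key that is missing from one dictionary becomes `NaN`, which is why column `c` shows `20.0` rather than `20`. A brief sketch of that behavior, using the hypothetical names `records` and `ld`:

```python
import pandas as pd

# The first record has no 'c' key.
records = [{'a': 1, 'b': 2}, {'a': 5, 'b': 10, 'c': 20}]
ld = pd.DataFrame(records)

# The missing key becomes NaN, which forces column 'c' to a float dtype.
print(ld['c'].isna().tolist())  # [True, False]
print(ld['c'].dtype)            # float64
```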


Other constructors: `DataFrame.from_records` and `DataFrame.from_dict`.

In [18]:
pd.DataFrame(data)

|  | A | B | C |
|---|---|---|---|
| 0 | 1 | 2.0 | b'Hello' |
| 1 | 2 | 3.0 | b'World' |


In [19]:
pd.DataFrame.from_records(data, index='C')

| C | A | B |
|---|---|---|
| b'Hello' | 1 | 2.0 |
| b'World' | 2 | 3.0 |
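
`from_records` also accepts a plain list of tuples, in which case the column names have to be supplied. The following is a sketch with illustrative names (`tuples`, `rec`) rather than the notebook's own data object.

```python
import pandas as pd

# Each tuple is one record; column names are passed explicitly.
tuples = [(1, 2., 'Hello'), (2, 3., 'World')]
rec = pd.DataFrame.from_records(tuples, columns=['A', 'B', 'C'], index='C')

print(rec.index.name)         # C
print(list(rec.columns))      # ['A', 'B']
print(rec.loc['World', 'A'])  # 2
```

Passing `index='C'` moves that field out of the columns and uses it as the row labels.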


In [22]:
data = {
    'A': [1, 2, 3],
    'B': [4, 5, 6],
    'C': [7, 8, 9],
}
df = pd.DataFrame.from_dict(data)
df

|  | A | B | C |
|---|---|---|---|
| 0 | 1 | 4 | 7 |
| 1 | 2 | 5 | 8 |
| 2 | 3 | 6 | 9 |
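
By default `from_dict` treats the dictionary keys as column names. With `orient='index'` the keys become row labels instead. A minimal sketch, where `by_row` and `wide` are hypothetical names:

```python
import pandas as pd

# Keys are row labels here; each list is one row.
by_row = {'r1': [1, 4, 7], 'r2': [2, 5, 8]}
wide = pd.DataFrame.from_dict(by_row, orient='index', columns=['A', 'B', 'C'])

print(wide.shape)            # (2, 3)
print(list(wide.index))      # ['r1', 'r2']
print(wide.loc['r2', 'B'])   # 5
```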


## Some operations

In [25]:
df["A"]

0    1
1    2
2    3
Name: A, dtype: int64

In [26]:
df['C'] = df['A'] * df['B']
df['flag'] = df['A'] > 2
df

|  | A | B | C | flag |
|---|---|---|---|---|
| 0 | 1 | 4 | 4 | False |
| 1 | 2 | 5 | 10 | False |
| 2 | 3 | 6 | 18 | True |


In [27]:
del df['B']
C = df.pop('C')
df

|  | A | flag |
|---|---|---|
| 0 | 1 | False |
| 1 | 2 | False |
| 2 | 3 | True |


In [28]:
C

0     4
1    10
2    18
Name: C, dtype: int64
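
Both `del` and `pop` modify the `DataFrame` in place. When the original should stay untouched, `DataFrame.drop` is a possible alternative, since it returns a new object. The names `base`, `trimmed`, and `popped` below are illustrative.

```python
import pandas as pd

base = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})

# drop returns a new DataFrame; `base` still has all three columns afterwards.
trimmed = base.drop(columns=['B'])
print(list(trimmed.columns))  # ['A', 'C']
print(list(base.columns))     # ['A', 'B', 'C']

# pop removes the column from `base` in place and returns it as a Series.
popped = base.pop('C')
print(list(base.columns))     # ['A', 'B']
print(popped.tolist())        # [7, 8, 9]
```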

In [29]:
df['foo'] = 'bar'
df

|  | A | flag | foo |
|---|---|---|---|
| 0 | 1 | False | bar |
| 1 | 2 | False | bar |
| 2 | 3 | True | bar |


In [30]:
df['A_trunc'] = df['A'][:2]
df

|  | A | flag | foo | A_trunc |
|---|---|---|---|---|
| 0 | 1 | False | bar | 1.0 |
| 1 | 2 | False | bar | 2.0 |
| 2 | 3 | True | bar | NaN |
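
The `NaN` in the last row comes from index alignment: a `Series` assigned as a column is matched by label, not by position. A small sketch of the same effect with a deliberately shuffled index (`aligned` is a hypothetical name):

```python
import pandas as pd

aligned = pd.DataFrame({'A': [1, 2, 3]})  # default index 0, 1, 2

# The Series only has labels 2 and 0; label 1 has no match and becomes NaN.
aligned['x'] = pd.Series([10, 20], index=[2, 0])

print(aligned['x'].tolist())  # [20.0, nan, 10.0]
```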


In [31]:
df.insert(1, 'bar', df['A'])
df

|  | A | bar | flag | foo | A_trunc |
|---|---|---|---|---|---|
| 0 | 1 | 1 | False | bar | 1.0 |
| 1 | 2 | 2 | False | bar | 2.0 |
| 2 | 3 | 3 | True | bar | NaN |
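
`insert` places a column at a given position and works in place. As a sketch of one detail worth knowing, inserting a column name that already exists raises `ValueError` unless `allow_duplicates=True` is passed. The names `ins` and `duplicate_rejected` are illustrative.

```python
import pandas as pd

ins = pd.DataFrame({'A': [1, 2, 3], 'flag': [False, False, True]})

# Insert a copy of column A at position 1 (between 'A' and 'flag').
ins.insert(1, 'bar', ins['A'])
print(list(ins.columns))  # ['A', 'bar', 'flag']

# A second insert with the same name is rejected by default.
try:
    ins.insert(0, 'bar', ins['A'])
    duplicate_rejected = False
except ValueError:
    duplicate_rejected = True

print(duplicate_rejected)  # True
```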


In [33]:
df.assign(test=df['A'] + df['bar'])

|  | A | bar | flag | foo | A_trunc | test |
|---|---|---|---|---|---|---|
| 0 | 1 | 1 | False | bar | 1.0 | 2 |
| 1 | 2 | 2 | False | bar | 2.0 | 4 |
| 2 | 3 | 3 | True | bar | NaN | 6 |
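
Unlike `df[col] = ...` and `insert`, `assign` returns a new `DataFrame` and leaves the original unchanged. It also accepts a callable, which is evaluated on the `DataFrame` being assigned to. A minimal sketch with the hypothetical names `src` and `out`:

```python
import pandas as pd

src = pd.DataFrame({'A': [1, 2, 3], 'bar': [1, 2, 3]})

# The lambda receives the DataFrame itself, so no outer variable is needed.
out = src.assign(test=lambda d: d['A'] + d['bar'])

print(out['test'].tolist())   # [2, 4, 6]
print('test' in src.columns)  # False
```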


## Index and selection

Select a column: `df[col]`  
Select a row by label: `df.loc[label]`  
Select a row by integer position: `df.iloc[loc]`  
Slice rows: `df[5:10]`  
Select rows by boolean vector: `df[bool_vec]`
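
The selection forms listed above can be tried on a small example. This is a sketch only; `sel` and the result names are assumptions made for illustration.

```python
import pandas as pd

sel = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]}, index=['a', 'b', 'c'])

col = sel['A']               # one column, returned as a Series
row_by_label = sel.loc['b']  # one row selected by its label, as a Series
row_by_pos = sel.iloc[1]     # the same row selected by integer position
sliced = sel[0:2]            # rows at positions 0 and 1
masked = sel[sel['A'] > 1]   # rows where the boolean vector is True

print(col.tolist())          # [1, 2, 3]
print(row_by_label.tolist()) # [2, 5]
print(list(sliced.index))    # ['a', 'b']
print(list(masked.index))    # ['b', 'c']
```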