# Pandas Documentation on DataFrame

In this notebook, you will work through the Pandas documentation on DataFrames.

## Imports

In [1]:
import numpy as np
import pandas as pd

## DataFrame

You will learn how to use `pandas.DataFrame` by typing the code from the Pandas documentation into this notebook.

* Go to the Pandas [DataFrame Documentation](http://pandas.pydata.org/pandas-docs/stable/dsintro.html#dataframe).
* Type all of the code from that section of the documentation into this notebook and get it working.
* **To learn this API well, you must type the code rather than copying and pasting it.**
* Create a new cell in this section for each `In[]` prompt in the documentation.
* Ignore the cells in the **Grading** section below.
* No Markdown comments are needed.
* Skip the following sub-sections:
  - From structured or record array
  - Alternate Constructors
  - Assigning New Columns in Method Chains
  - Console display
  - DataFrame column attribute access and IPython completion

## From dict of Series or dicts

In [2]:
d = {'one': pd.Series([1., 2., 3.], index=['a', 'b', 'c']),
     'two': pd.Series([1., 2., 3., 4.], index=['a', 'b', 'c', 'd'])}

In [3]:
df = pd.DataFrame(d)

In [4]:
df

|   | one | two |
|---|-----|-----|
| a | 1.0 | 1 |
| b | 2.0 | 2 |
| c | 3.0 | 3 |
| d | NaN | 4 |


In [5]:
pd.DataFrame(d, index=['d', 'b', 'a'])

|   | one | two |
|---|-----|-----|
| d | NaN | 4 |
| b | 2.0 | 2 |
| a | 1.0 | 1 |


In [6]:
pd.DataFrame(d, index=['d', 'b', 'a'], columns=['two', 'three'])

|   | two | three |
|---|-----|-------|
| d | 4 | NaN |
| b | 2 | NaN |
| a | 1 | NaN |


In [7]:
df.index

Index(['a', 'b', 'c', 'd'], dtype='object')

In [8]:
df.columns

Index(['one', 'two'], dtype='object')

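The alignment rules shown above can also be confirmed without reading the printed tables. The cell below is a small self-check written for this notebook; it is not taken from the pandas documentation. It rebuilds `df` and checks three things: the row index is the union of the two Series indexes, the `'d'` cell of column `'one'` is filled with NaN, and requesting a column that is not a key of `d` gives an all-NaN column.

```python
import numpy as np
import pandas as pd

d = {'one': pd.Series([1., 2., 3.], index=['a', 'b', 'c']),
     'two': pd.Series([1., 2., 3., 4.], index=['a', 'b', 'c', 'd'])}

df = pd.DataFrame(d)

# The row index is the union of the two Series indexes.
union_labels = sorted(df.index)

# 'one' has no entry for 'd', so pandas fills that cell with NaN.
missing_one_d = np.isnan(df.loc['d', 'one'])

# Asking for a column that is not a key of d produces an all-NaN column.
subset = pd.DataFrame(d, index=['d', 'b', 'a'], columns=['two', 'three'])
three_all_nan = subset['three'].isna().all()

print(union_labels, missing_one_d, three_all_nan)
```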
## From dict of ndarrays / lists

In [9]:
d = {'one': [1., 2., 3., 4.],
     'two': [4., 3., 2., 1.]}

In [10]:
pd.DataFrame(d)

|   | one | two |
|---|-----|-----|
| 0 | 1 | 4 |
| 1 | 2 | 3 |
| 2 | 3 | 2 |
| 3 | 4 | 1 |


In [11]:
pd.DataFrame(d, index=['a', 'b', 'c', 'd'])

|   | one | two |
|---|-----|-----|
| a | 1 | 4 |
| b | 2 | 3 |
| c | 3 | 2 |
| d | 4 | 1 |

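Plain lists have no labels to align on, unlike the dict-of-Series case. Every list must therefore have the same length, and an explicit `index` must match that length. The sketch below checks both rules. The exact error messages differ between pandas versions, so only the exception type is relied on.

```python
import pandas as pd

d = {'one': [1., 2., 3., 4.],
     'two': [4., 3., 2., 1.]}

# Without an index, pandas supplies a default RangeIndex 0..n-1.
default_index = list(pd.DataFrame(d).index)

errors = []

# Lists of different lengths cannot be aligned, so construction fails.
try:
    pd.DataFrame({'one': [1., 2., 3.], 'two': [1., 2.]})
except ValueError:
    errors.append('length mismatch between columns')

# An explicit index must have the same length as the lists.
try:
    pd.DataFrame(d, index=['a', 'b', 'c'])
except ValueError:
    errors.append('index length mismatch')

print(default_index, errors)
```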

## From list of dicts

In [12]:
data2 = [{'a': 1, 'b': 2}, {'a': 5, 'b': 10, 'c': 20}]

In [13]:
pd.DataFrame(data2)

|   | a | b | c |
|---|---|---|---|
| 0 | 1 | 2 | NaN |
| 1 | 5 | 10 | 20.0 |


In [14]:
pd.DataFrame(data2, index=['first', 'second'])

|        | a | b | c |
|--------|---|---|---|
| first  | 1 | 2 | NaN |
| second | 5 | 10 | 20.0 |


In [15]:
pd.DataFrame(data2, columns=['a', 'b'])

|   | a | b |
|---|---|---|
| 0 | 1 | 2 |
| 1 | 5 | 10 |

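Each dict in the list becomes one row, and the union of all the keys becomes the column set. The first record has no `'c'` key, so that cell is NaN. Because NaN is a float, column `'c'` is upcast to a floating-point dtype even though the only value supplied is an integer, which is why the output above shows `20.0`. A short check of this behaviour:

```python
import pandas as pd

data2 = [{'a': 1, 'b': 2}, {'a': 5, 'b': 10, 'c': 20}]
df = pd.DataFrame(data2)

# Columns present in every record keep their integer dtype.
print(df.dtypes)

# The missing 'c' in the first record becomes NaN, so 'c' is upcast to float64.
first_c_missing = df['c'].isna().iloc[0]

# Passing columns= keeps only the listed keys; 'c' is dropped entirely.
ab_only = pd.DataFrame(data2, columns=['a', 'b'])
```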

## From dict of tuples

In [16]:
pd.DataFrame({('a', 'b'): {('A', 'B'): 1, ('A', 'C'): 2},
              ('a', 'a'): {('A', 'C'): 3, ('A', 'B'): 4},
              ('a', 'c'): {('A', 'B'): 5, ('A', 'C'): 6},
              ('b', 'a'): {('A', 'C'): 7, ('A', 'B'): 8},
              ('b', 'b'): {('A', 'D'): 9, ('A', 'B'): 10}})

|   |   | (a, a) | (a, b) | (a, c) | (b, a) | (b, b) |
|---|---|--------|--------|--------|--------|--------|
| A | B | 4.0 | 1.0 | 5.0 | 8.0 | 10.0 |
| A | C | 3.0 | 2.0 | 6.0 | 7.0 | NaN |
| A | D | NaN | NaN | NaN | NaN | 9.0 |

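Tuple keys become a two-level `MultiIndex` on both axes. The outer tuples label the columns and the inner tuples label the rows. Depending on the pandas version, the columns may follow the dict's insertion order rather than the sorted order shown above. The sketch therefore sorts both axes (an extra step, not part of the documentation) before looking up cells with full `(row, column)` keys.

```python
import pandas as pd

df = pd.DataFrame({('a', 'b'): {('A', 'B'): 1, ('A', 'C'): 2},
                   ('a', 'a'): {('A', 'C'): 3, ('A', 'B'): 4},
                   ('a', 'c'): {('A', 'B'): 5, ('A', 'C'): 6},
                   ('b', 'a'): {('A', 'C'): 7, ('A', 'B'): 8},
                   ('b', 'b'): {('A', 'D'): 9, ('A', 'B'): 10}})

# Sort both axes so the layout matches the output above on any pandas version.
df = df.sort_index(axis=0).sort_index(axis=1)

print(type(df.columns).__name__, df.columns.nlevels, df.index.nlevels)

# A full tuple on each axis selects a single cell.
value = df.loc[('A', 'D'), ('b', 'b')]

# Selecting only the outer column label returns the sub-frame under 'a'.
sub = df['a']
```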

## Column selection, addition, deletion

In [17]:
df['one']

a     1
b     2
c     3
d   NaN
Name: one, dtype: float64

In [18]:
df['three'] = df['one'] * df['two']

In [19]:
df['flag'] = df['one'] > 2

In [20]:
df

|   | one | two | three | flag |
|---|-----|-----|-------|------|
| a | 1.0 | 1 | 1.0 | False |
| b | 2.0 | 2 | 4.0 | False |
| c | 3.0 | 3 | 9.0 | True |
| d | NaN | 4 | NaN | False |


In [21]:
del df['two']

In [22]:
three = df.pop('three')

In [23]:
df

|   | one | flag |
|---|-----|------|
| a | 1.0 | False |
| b | 2.0 | False |
| c | 3.0 | True |
| d | NaN | False |


In [24]:
df['foo'] = 'bar'

In [25]:
df

|   | one | flag | foo |
|---|-----|------|-----|
| a | 1.0 | False | bar |
| b | 2.0 | False | bar |
| c | 3.0 | True | bar |
| d | NaN | False | bar |


In [26]:
df['one_trunc'] = df['one'][:2]

In [27]:
df

|   | one | flag | foo | one_trunc |
|---|-----|------|-----|-----------|
| a | 1.0 | False | bar | 1.0 |
| b | 2.0 | False | bar | 2.0 |
| c | 3.0 | True | bar | NaN |
| d | NaN | False | bar | NaN |


In [28]:
df.insert(1, 'bar', df['one'])

In [29]:
df

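|   | one | bar | flag | foo | one_trunc |
|---|-----|-----|------|-----|-----------|
| a | 1.0 | 1.0 | False | bar | 1.0 |
| b | 2.0 | 2.0 | False | bar | 2.0 |
| c | 3.0 | 3.0 | True | bar | NaN |
| d | NaN | NaN | False | bar | NaN |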

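The column operations above rely on a few behaviours that can be asserted directly:

* `pop` both removes a column and returns it.
* A scalar assignment is broadcast to every row.
* A shorter Series is aligned on the index, with NaN filling the gaps.

One further behaviour is worth knowing: by default, `insert` refuses to reuse an existing column name unless `allow_duplicates=True` is passed. The sketch below rebuilds a similar frame from scratch. Its variable names mirror the notebook's and are otherwise illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'one': pd.Series([1., 2., 3.], index=['a', 'b', 'c']),
                   'two': pd.Series([1., 2., 3., 4.], index=['a', 'b', 'c', 'd'])})

df['three'] = df['one'] * df['two']  # element-wise; NaN wherever 'one' is NaN
three = df.pop('three')              # removes the column and returns it as a Series
df['foo'] = 'bar'                    # a scalar is broadcast to every row
df['one_trunc'] = df['one'][:2]      # the shorter Series is aligned on the index; gaps become NaN

# insert() refuses to create a second column with an existing name
# unless allow_duplicates=True is passed.
duplicate_rejected = False
try:
    df.insert(1, 'foo', df['one'])
except ValueError:
    duplicate_rejected = True

print(df)
```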

## Indexing / Selection

In [30]:
df.loc['b']

one              2
bar              2
flag         False
foo            bar
one_trunc        2
Name: b, dtype: object

In [31]:
df.iloc[2]

one             3
bar             3
flag         True
foo           bar
one_trunc     NaN
Name: c, dtype: object

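The pandas documentation summarises the selection syntax in a small table: `df[col]`, `df.loc[label]`, `df.iloc[loc]`, `df[5:10]` and `df[bool_vec]`. The cells above only type the two row selectors `loc` and `iloc`. The sketch below exercises all the row selectors on a small hand-built frame shaped like the `df` above. A row that mixes floats, booleans and strings comes back as an `object` Series, which is why the outputs above end in `dtype: object`.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'one': [1., 2., 3., np.nan],
                   'flag': [False, False, True, False],
                   'foo': 'bar'},
                  index=['a', 'b', 'c', 'd'])

row_b = df.loc['b']        # select a row by label -> Series indexed by column name
row_c = df.iloc[2]         # select a row by integer position -> also a Series
first_two = df[:2]         # slicing inside [] selects rows, not columns
flagged = df[df['flag']]   # a boolean Series selects the rows where it is True

print(row_b, row_c.name, list(first_two.index), list(flagged.index), sep='\n')
```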
## Data alignment and arithmetic

In [32]:
df = pd.DataFrame(np.random.randn(10, 4), columns=['A', 'B', 'C', 'D'])

In [33]:
df2 = pd.DataFrame(np.random.randn(7, 3), columns=['A', 'B', 'C'])

In [34]:
df + df2

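|   | A | B | C | D |
|---|---|---|---|---|
| 0 | 1.839123 | 1.246286 | -0.393196 | NaN |
| 1 | 0.202456 | -0.992338 | 0.896705 | NaN |
| 2 | 0.04462 | -0.523588 | 0.507538 | NaN |
| 3 | 0.123123 | 3.964527 | -0.418759 | NaN |
| 4 | 2.520953 | 0.270771 | -0.411378 | NaN |
| 5 | -0.73672 | -1.952422 | 0.677513 | NaN |
| 6 | -1.607109 | -1.583712 | -0.19153 | NaN |
| 7 | NaN | NaN | NaN | NaN |
| 8 | NaN | NaN | NaN | NaN |
| 9 | NaN | NaN | NaN | NaN |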

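The `+` operator aligns both frames on rows and columns. The result carries the union of the labels, and any cell missing from either operand becomes NaN. This accounts for column `D` and rows 7 to 9 above. The flexible method `DataFrame.add` accepts a `fill_value` that treats a cell missing from only one side as that value instead. The sketch uses a seeded `np.random.default_rng` in place of `np.random.randn`, which is an assumption made so the counts are reproducible.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seeded so the result is reproducible
df = pd.DataFrame(rng.standard_normal((10, 4)), columns=['A', 'B', 'C', 'D'])
df2 = pd.DataFrame(rng.standard_normal((7, 3)), columns=['A', 'B', 'C'])

# The + operator aligns on both axes: the result has the union of the labels,
# and any cell missing from either operand is NaN.
summed = df + df2

# DataFrame.add with fill_value treats a cell missing from ONE side as 0.
# Cells missing from both sides would still be NaN, but there are none here.
summed_filled = df.add(df2, fill_value=0)

print(summed.shape, summed.isna().sum().sum(), summed_filled.isna().sum().sum())
```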

In [35]:
df - df.iloc[0]

|   | A | B | C | D |
|---|---|---|---|---|
| 0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 1 | -1.24856 | -1.832644 | 0.893507 | -2.569188 |
| 2 | -2.402031 | -2.686742 | -0.145093 | -2.805839 |
| 3 | -2.518585 | 0.297974 | -0.506548 | -1.588353 |
| 4 | -0.675558 | -1.589622 | -0.038875 | -2.034099 |
| 5 | -1.92286 | -2.749988 | 0.055508 | -2.113599 |
| 6 | -2.895713 | -1.489143 | -1.194578 | -1.69251 |
| 7 | -1.846082 | -1.8664 | -0.972399 | -2.695091 |
| 8 | -2.696603 | -2.69914 | 0.340195 | -0.145562 |
| 9 | -2.7225 | -3.27071 | -0.845804 | -3.197238 |

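In a DataFrame-minus-Series operation, the Series index is matched against the DataFrame's columns and the result is broadcast down every row. This is why row 0 above is all zeros. The operator form is shorthand for the flexible method `DataFrame.sub` with `axis='columns'`, as the following sketch (again with seeded random data) shows:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
df = pd.DataFrame(rng.standard_normal((10, 4)), columns=['A', 'B', 'C', 'D'])

first_row = df.iloc[0]                          # Series indexed by A, B, C, D
via_operator = df - first_row                   # row-wise broadcasting
via_method = df.sub(first_row, axis='columns')  # the same operation, spelled out

print(via_operator.head(3))
```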

In [36]:
index = pd.date_range('1/1/2000', periods=8)

In [37]:
df = pd.DataFrame(np.random.randn(8, 3), index=index, columns=list('ABC'))

In [38]:
df

|   | A | B | C |
|---|---|---|---|
| 2000-01-01 | 0.658993 | -0.138895 | -0.033133 |
| 2000-01-02 | -1.561755 | -0.716665 | 0.311015 |
| 2000-01-03 | -0.599065 | -0.430632 | 0.300453 |
| 2000-01-04 | -0.883402 | 0.80471 | 0.830939 |
| 2000-01-05 | 1.238324 | -0.374542 | -0.010781 |
| 2000-01-06 | -0.266382 | 0.481335 | -1.392291 |
| 2000-01-07 | -0.591382 | 0.765984 | -1.251365 |
| 2000-01-08 | -0.075534 | -0.369203 | -0.645124 |


In [39]:
type(df['A'])

pandas.core.series.Series

In [40]:
df - df['A']

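|   | 2000-01-01 00:00:00 | 2000-01-02 00:00:00 | 2000-01-03 00:00:00 | 2000-01-04 00:00:00 | 2000-01-05 00:00:00 | 2000-01-06 00:00:00 | 2000-01-07 00:00:00 | 2000-01-08 00:00:00 | A | B | C |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 2000-01-01 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2000-01-02 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2000-01-03 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2000-01-04 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2000-01-05 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2000-01-06 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2000-01-07 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2000-01-08 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |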

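Here `df['A']` is indexed by dates, not by column names. Aligning it on the columns `A`, `B`, `C` therefore matches nothing. The result has the union of the 8 dates and the 3 column labels as its columns, and every cell is NaN. To subtract column `A` from every column row by row, use the flexible method `sub` with `axis=0`, which aligns the Series on the row index instead. A sketch with seeded random data:

```python
import numpy as np
import pandas as pd

index = pd.date_range('1/1/2000', periods=8)
rng = np.random.default_rng(2)
df = pd.DataFrame(rng.standard_normal((8, 3)), index=index, columns=list('ABC'))

# Operator form: df['A'] (indexed by dates) is aligned on the columns A, B, C,
# so no labels match and every cell is NaN.
misaligned = df - df['A']

# Method form: axis=0 aligns df['A'] on the row index instead,
# subtracting column A from each column, row by row.
relative_to_a = df.sub(df['A'], axis=0)

print(relative_to_a)
```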

In [41]:
df * 5 + 2

|   | A | B | C |
|---|---|---|---|
| 2000-01-01 | 5.294965 | 1.305524 | 1.834337 |
| 2000-01-02 | -5.808776 | -1.583327 | 3.555077 |
| 2000-01-03 | -0.995324 | -0.153161 | 3.502267 |
| 2000-01-04 | -2.417008 | 6.023548 | 6.154693 |
| 2000-01-05 | 8.19162 | 0.127288 | 1.946095 |
| 2000-01-06 | 0.66809 | 4.406675 | -4.961456 |
| 2000-01-07 | -0.95691 | 5.829922 | -4.256823 |
| 2000-01-08 | 1.622332 | 0.153984 | -1.225619 |


In [42]:
1/df

|   | A | B | C |
|---|---|---|---|
| 2000-01-01 | 1.517467 | -7.199675 | -30.181669 |
| 2000-01-02 | -0.640305 | -1.395351 | 3.215275 |
| 2000-01-03 | -1.669269 | -2.322168 | 3.328304 |
| 2000-01-04 | -1.131988 | 1.242684 | 1.203458 |
| 2000-01-05 | 0.807543 | -2.669924 | -92.756468 |
| 2000-01-06 | -3.754007 | 2.077555 | -0.718241 |
| 2000-01-07 | -1.690954 | 1.30551 | -0.799128 |
| 2000-01-08 | -13.23915 | -2.708535 | -1.55009 |


In [43]:
df ** 4

|   | A | B | C |
|---|---|---|---|
| 2000-01-01 | 0.188592 | 0.000372 | 1.205111e-06 |
| 2000-01-02 | 5.949108 | 0.263795 | 0.009356801 |
| 2000-01-03 | 0.128794 | 0.034389 | 0.008149074 |
| 2000-01-04 | 0.609022 | 0.419331 | 0.4767337 |
| 2000-01-05 | 2.351458 | 0.019679 | 1.3509e-08 |
| 2000-01-06 | 0.005035 | 0.053677 | 3.757685 |
| 2000-01-07 | 0.122313 | 0.344255 | 2.452084 |
| 2000-01-08 | 3.3e-05 | 0.018581 | 0.1732098 |

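Scalar arithmetic is applied element by element, exactly as on the underlying NumPy array. Each operator also has a named flexible method (`mul`, `add`, `rdiv`, `pow`). These are useful when you need extra arguments such as `axis` or `fill_value`. A quick equivalence check on seeded random data:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)
df = pd.DataFrame(rng.standard_normal((8, 3)), columns=list('ABC'))

# Operators and their method equivalents give the same results.
affine = df * 5 + 2
affine_method = df.mul(5).add(2)

reciprocal = 1 / df
reciprocal_method = df.rdiv(1)   # "reverse" division: 1 / df

fourth_power = df ** 4
fourth_power_method = df.pow(4)

# The same arithmetic on the raw NumPy array produces the same numbers.
same_as_numpy = np.allclose(affine.to_numpy(), df.to_numpy() * 5 + 2)
```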

In [44]:
df1 = pd.DataFrame({'a': [1, 0, 1], 'b': [0, 1, 1]}, dtype=bool)

In [45]:
df2 = pd.DataFrame({'a': [0, 1, 1], 'b': [1, 1, 0]}, dtype=bool)

In [46]:
df1 & df2

|   | a | b |
|---|---|---|
| 0 | False | False |
| 1 | False | True |
| 2 | True | False |


In [47]:
df1 | df2

|   | a | b |
|---|---|---|
| 0 | True | True |
| 1 | True | True |
| 2 | True | True |


In [48]:
df1 ^ df2

|   | a | b |
|---|---|---|
| 0 | True | True |
| 1 | True | False |
| 2 | False | True |


In [49]:
-df1

|   | a | b |
|---|---|---|
| 0 | False | True |
| 1 | True | False |
| 2 | False | False |

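On boolean frames, `&`, `|` and `^` are element-wise AND, OR and XOR. `~` is the explicit element-wise NOT. pandas accepts unary `-` on a boolean frame as above, but NumPy raises a `TypeError` for `-` on boolean arrays, so `~` is the safer habit. The XOR table can also be cross-checked against its definition, "OR but not AND":

```python
import pandas as pd

df1 = pd.DataFrame({'a': [1, 0, 1], 'b': [0, 1, 1]}, dtype=bool)
df2 = pd.DataFrame({'a': [0, 1, 1], 'b': [1, 1, 0]}, dtype=bool)

both = df1 & df2
either = df1 | df2
exactly_one = df1 ^ df2
inverted = ~df1               # element-wise NOT

# XOR is true where at least one side is true but not both.
xor_matches_definition = exactly_one.equals(either & ~both)

print(inverted)
```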

## Transposing

In [50]:
df[:5].T

|   | 2000-01-01 00:00:00 | 2000-01-02 00:00:00 | 2000-01-03 00:00:00 | 2000-01-04 00:00:00 | 2000-01-05 00:00:00 |
|---|---|---|---|---|---|
| A | 0.658993 | -1.561755 | -0.599065 | -0.883402 | 1.238324 |
| B | -0.138895 | -0.716665 | -0.430632 | 0.80471 | -0.374542 |
| C | -0.033133 | 0.311015 | 0.300453 | 0.830939 | -0.010781 |

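Transposing swaps the row and column labels, so the five selected dates become the columns and `A`, `B`, `C` become the rows. Transposing twice gives back the original frame. A check on seeded random data with the same shape as the notebook's `df`:

```python
import numpy as np
import pandas as pd

index = pd.date_range('1/1/2000', periods=8)
rng = np.random.default_rng(4)
df = pd.DataFrame(rng.standard_normal((8, 3)), index=index, columns=list('ABC'))

top = df[:5]
flipped = top.T   # rows A, B, C; columns are the first five dates

print(top.shape, flipped.shape)
```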

## DataFrame interoperability with Numpy functions

In [51]:
np.exp(df)

|   | A | B | C |
|---|---|---|---|
| 2000-01-01 | 1.932845 | 0.870319 | 0.96741 |
| 2000-01-02 | 0.209768 | 0.488378 | 1.36481 |
| 2000-01-03 | 0.549325 | 0.650098 | 1.350471 |
| 2000-01-04 | 0.413374 | 2.236047 | 2.295472 |
| 2000-01-05 | 3.449827 | 0.687604 | 0.989277 |
| 2000-01-06 | 0.766146 | 1.618233 | 0.248505 |
| 2000-01-07 | 0.553562 | 2.151111 | 0.286114 |
| 2000-01-08 | 0.927249 | 0.691285 | 0.524598 |


In [52]:
np.asarray(df)

array([[ 0.65899298, -0.13889515, -0.03313269],
       [-1.56175519, -0.71666542,  0.31101537],
       [-0.5990647 , -0.43063214,  0.30045336],
       [-0.88340162,  0.80470958,  0.83093865],
       [ 1.23832404, -0.37454247, -0.01078092],
       [-0.26638206,  0.48133502, -1.39229125],
       [-0.59138198,  0.76598443, -1.25136457],
       [-0.07553355, -0.36920329, -0.64512389]])

In [53]:
df.T.dot(df)

|   | A | B | C |
|---|---|---|---|
| A | 5.972472 | -0.442306 | -0.275314 |
| B | -0.442306 | 2.460911 | -1.065477 |
| C | -0.275314 | -1.065477 | 4.799249 |

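NumPy ufuncs such as `np.exp` return a DataFrame with the labels intact. `np.asarray` (or the `to_numpy()` method) drops the labels and returns a plain array. `dot` performs matrix multiplication, and the `@` operator does the same. For an 8-by-3 frame, `df.T.dot(df)` is the 3-by-3 matrix of inner products between columns, labelled `A`, `B`, `C` on both axes. A sketch on seeded random data:

```python
import numpy as np
import pandas as pd

index = pd.date_range('1/1/2000', periods=8)
rng = np.random.default_rng(5)
df = pd.DataFrame(rng.standard_normal((8, 3)), index=index, columns=list('ABC'))

exp_df = np.exp(df)       # ufunc result keeps the index and column labels
values = df.to_numpy()    # plain ndarray, same data as np.asarray(df)

gram = df.T.dot(df)       # 3x3, labelled A, B, C on both axes
gram_at = df.T @ df       # the @ operator performs the same product

print(gram)
```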

In [54]:
s1 = pd.Series(np.arange(5, 10))

In [55]:
s1.dot(s1)

255

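The dot product of a Series with itself is the sum of its squared values. For 5 through 9 that is 25 + 36 + 49 + 64 + 81 = 255. The same number can be obtained with the `@` operator or directly from that definition:

```python
import numpy as np
import pandas as pd

s1 = pd.Series(np.arange(5, 10))   # values 5, 6, 7, 8, 9

dot_method = s1.dot(s1)
dot_operator = s1 @ s1             # @ computes the same inner product
sum_of_squares = (s1 ** 2).sum()   # the definition of the inner product

print(dot_method, dot_operator, sum_of_squares)  # 255 255 255
```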

## Grading

YOUR ANSWER HERE