# DataFrame
A DataFrame consists of a named, ordered collection of columns, so it is well suited to rectangular data. Each DataFrame has a row index and a column index. The following diagram describes the DataFrame created by the Python code below.

  <img alt="" src="./images/dataframe.png">

In [4]:
import pandas as pd
import numpy as np

columns = {
    "Age" : [45,55,30,20,10],
    "Sex" : ["M", "M", "F", "M", "F"]
}

index = ["Bob", "Dave", "Anna", "John", "Sally"]

df = pd.DataFrame(columns, index)
df.index.name = "Name"
df.columns.name = "Attributes"

## Cheat Sheets
### DataFrame Constructor Options
| Task     | Example| Notes |
| -------- | ------- | ------- |
| 2D Numpy Array |```pd.DataFrame(np.arange(16).reshape(4,4))```| Optionally pass row and/or column labels|
| Dictionary of Sequences (List, Tuple or Array) |```pd.DataFrame({"List" : [1,2], "Tuple" : (3,4), "Numpy" : np.array([5,6])})```| Each key becomes a column label. Optionally pass a row index|
| Dictionary Of Series |```pd.DataFrame({ "One" : pd.Series([1,2], ['a', 'b']), "Two" : pd.Series([1,2], ['a', 'c'])})```|If we don't pass an index, the union of the Series indices is used |
| Dictionary Of Dictionaries |```pd.DataFrame({ "One":{'a':1,'b':2},"Two":{'b':9,'c':12}})```|Keys in outer dict form column labels, keys in inner dictionaries are unioned to form row index|
| List of Dictionaries |```pd.DataFrame([{'One':1.0,"Two":5.0},{'One':2.0,"Two":10.0}])```| Each Dictionary forms a row|
| List of Sequences |```pd.DataFrame([[1,2,3],(4,5,6),np.array([7,8,9])])```| Each sequence forms a row. Optionally pass column and/or row labels|

### Indexing Columns
| Task     | Example| Notes |
| -------- | ------- |------- 
| Select Single Column By Label (1) |```df1['ColA']```| Selects the column with label 'ColA' as a Series object|
| Select Single Column By Label (2) |```df1.loc[:,'ColA']```| Selects the column with label 'ColA' as a Series object|
| Select Multi Columns By Label (1) |```df1[['ColA','ColC']]```| Selects a subset of columns as new DataFrame|
| Select Multi Columns By Label (2) |```df1.loc[:,['ColA','ColB']]```| Selects a subset of columns as new DataFrame|
| Select Label Slice of Columns|```df1.loc[:,'ColB':'ColC']```| Selects a subset of columns as new DataFrame|
| Select Single Column By Index |```df1.iloc[:,0]```|Selects the column with index 0 as a Series object|
| Select Multi Columns By Index |```df1.iloc[:,[0,1]]```|Selects a subset of columns as new DataFrame|
| Select Index Slice of Columns|```df1.iloc[:,1:3]```| Selects a subset of columns as new DataFrame|


### Indexing Rows
| Task     | Example| Notes |
| -------- | ------- |------- 
| Select Single Row By Label |```df1.loc['RowTwo']```| Selects the row with label 'RowTwo' as a Series object. Index is the DataFrame's column index|
| Select Multi Rows By Label |```df1.loc[['RowTwo','RowFour']]```| Selects the row with given labels as a DataFrame object.|
| Select Label Slice Of Rows |```df1.loc[:'RowThree']```| Selects a sub-slice of rows.|
| Select rows by integer position |```df1.iloc[[2,3]]```| Selects a sub-set of rows.|

### Indexing Row and Columns
| Task     | Example| Notes |
| -------- | ------- |------- 
| Select Subslice of rows and columns |```df1.iloc[1:3,1:3]```| Selects subset of rows and columns|

### Accessing Single Cell As Scalar
| Task     | Example| Notes |
| -------- | ------- |------- 
| By label |```df1.at['RowTwo','ColB']```| Selects cell as scalar|
| By index |```df1.iat[1,1]```| Selects cell as scalar|

### Deleting Data
| Task     | Example| Notes |
| -------- | ------- |------- 
|Delete Column |```del df1["b"]```| |
|Drop Rows |```df1.drop(["Two", "Three"])```| |
|Drop Columns |```df1.drop(columns=["a", "d"])```| |


### Reindexing
| Task     | Example| Notes |
| -------- | ------- |------- 
|Rows |```df1.reindex(['Five','Four', 'Three', 'Two', "One"])```| |
|Columns |```df1.reindex(columns=['d','c','b'])```| |



## DataFrame Constructor Options
### Numpy 2D array
Notice we optionally pass in row and column labels

In [10]:
rowIndex = ["RowOne", "RowTwo", "RowThree", "RowFour"]
colIndex = ['ColA','ColB','ColC','ColD']
pd.DataFrame(np.arange(16).reshape(4,4), index=rowIndex, columns=colIndex)

Unnamed: 0,ColA,ColB,ColC,ColD
RowOne,0,1,2,3
RowTwo,4,5,6,7
RowThree,8,9,10,11
RowFour,12,13,14,15


### Dictionary of Sequences (Lists, arrays or tuples) 
Note the optional row index argument. We also pass in a column index to create a different column order from the order of the keys in the dictionary

In [14]:
data = {"List" : [1,2,3], "Tuple" : (4,5,6), "Numpy" : np.array([7,8,9])}

pd.DataFrame(data, index = ["RowOne", "RowTwo", "RowThree"], columns=["Numpy","Tuple"])

Unnamed: 0,Numpy,Tuple
RowOne,7,4
RowTwo,8,5
RowThree,9,6


### Dictionary of Series
Note that in the absence of an index, the union of the indices from each Series is used as the index. We also provide an optional columns argument to arrange the columns in a different order from the keys in the dictionary.

In [17]:
seriesDict = { "One" : pd.Series([1,2], ['a', 'b']), "Two" : pd.Series([1,2], ['a', 'c'])}

pd.DataFrame(seriesDict, columns=["Two", "One"])

Unnamed: 0,Two,One
a,1.0,1.0
b,,2.0
c,2.0,


### Dictionary of Dictionaries
In this example we use an explicit row index to filter out some of the rows included in the data. We also pass the optional columns argument to re-order the columns.

In [19]:
data = {
    "One" : {'a': 1, 'b' : 2},
    "Two" : {'b' : 9, 'c':12}
}

pd.DataFrame(data, index=['c','b'], columns=["Two", "One"])

Unnamed: 0,Two,One
c,12,
b,9,2.0


### List of Dictionaries
Each dictionary forms a row and the dictionary keys are the column labels. We optionally pass in an array of labels to form the row index. 

In [22]:
rows = [
    {'One' : 1.0, "Two" : 5.0},
    {'One' : 2.0, "Two" : 10.0},
]

pd.DataFrame(rows, index = ['a','b'])

Unnamed: 0,One,Two
a,1.0,5.0
b,2.0,10.0


### List of Sequences
Each sequence forms a row and can be a tuple, list or numpy array

In [26]:
pd.DataFrame([[1,2,3],(4,5,6),np.array([7,8,9])])

Unnamed: 0,0,1,2
0,1,2,3
1,4,5,6
2,7,8,9


In [25]:
data = [
    [1,2,3],
    (4,5,6),
    np.array([7,8,9])
]

pd.DataFrame(data, index=['One','Two',"Three"], columns=['a','b', 'c'])

Unnamed: 0,a,b,c
One,1,2,3
Two,4,5,6
Three,7,8,9


## Indexing Columns
Indexing enables us to select a subset of rows and/or columns from a DataFrame. We consider the options in this subsection.
### Select Single Column By Label (1)
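We can retrieve a single column from a DataFrame using dictionary-like square bracket notation. The column is returned as a pandas Series whose index is the same as the DataFrame's index, and the name of the Series is the column label. In older pandas versions, setting values on this Series could also change the original DataFrame; with copy-on-write enabled (the default from pandas 3.0) it does not, so assign through the DataFrame itself (for example with `loc`) when the intent is to modify it.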

In [27]:
df1 = pd.DataFrame(np.arange(16).reshape(4,4), index = ['RowOne','RowTwo','RowThree','RowFour'],columns=['ColA','ColB','ColC','ColD'] )
df1['ColA']

RowOne       0
RowTwo       4
RowThree     8
RowFour     12
Name: ColA, dtype: int32
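
The properties of the returned Series can be checked directly. The following is a minimal sketch that rebuilds the same frame; the variable name `col` is only illustrative.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(
    np.arange(16).reshape(4, 4),
    index=["RowOne", "RowTwo", "RowThree", "RowFour"],
    columns=["ColA", "ColB", "ColC", "ColD"],
)

col = df1["ColA"]

# The result is a Series, not a DataFrame
print(type(col).__name__)
# The Series name is the column label
print(col.name)
# The Series index is the DataFrame's row index
print(col.index.equals(df1.index))
```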

### Select Single Column By Label (2)
Here we use the loc indexer with two arguments. The first argument uses the slice notation ```:``` to return all rows and the second argument takes a single string specifying the label of the column to return. The result is a Series object whose index is the same as the DataFrame index.

In [199]:
df1 = pd.DataFrame(np.arange(16).reshape(4,4), index = ['RowOne','RowTwo','RowThree','RowFour'],columns=['ColA','ColB','ColC','ColD'] )
df1.loc[:,'ColA']

RowOne       0
RowTwo       4
RowThree     8
RowFour     12
Name: ColA, dtype: int32

### Select Multiple Columns (1)
We can pass a list of column labels inside the square brackets. The result is a DataFrame with the specified subset of columns

In [31]:
df1 = pd.DataFrame(np.arange(16).reshape(4,4), index = ['RowOne','RowTwo','RowThree','RowFour'],columns=['ColA','ColB','ColC','ColD'] )
df1[['ColA','ColC']]

Unnamed: 0,ColA,ColC
RowOne,0,2
RowTwo,4,6
RowThree,8,10
RowFour,12,14


### Select Multiple Columns (2)
Here we use the loc indexer with two arguments. The first argument uses the slice notation ```:``` to return all rows and the second argument takes a list of column labels to return. The result is a DataFrame with a subset of the columns in the original.

In [39]:
df1 = pd.DataFrame(np.arange(16).reshape(4,4), index = ['RowOne','RowTwo','RowThree','RowFour'],columns=['ColA','ColB','ColC','ColD'] )
df1.loc[:,['ColA','ColB']]

Unnamed: 0,ColA,ColB
RowOne,0,1
RowTwo,4,5
RowThree,8,9
RowFour,12,13


### Select Column Label Slice
Here we use the loc indexer with two arguments. The first argument uses the slice notation ```:``` to return all rows and the second argument takes a label slice to return a range of columns. **Note:** Unlike position-based slicing, a label-based slice includes its stop label.

In [43]:
df1 = pd.DataFrame(np.arange(16).reshape(4,4), index = ['RowOne','RowTwo','RowThree','RowFour'],columns=['ColA','ColB','ColC','ColD'] )
df1.loc[:,'ColB':'ColC']

Unnamed: 0,ColB,ColC
RowOne,1,2
RowTwo,5,6
RowThree,9,10
RowFour,13,14
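
To make the inclusive and exclusive stop rules concrete, here is a small sketch that slices the same frame once by label and once by position. The names `by_label` and `by_position` are illustrative.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(
    np.arange(16).reshape(4, 4),
    index=["RowOne", "RowTwo", "RowThree", "RowFour"],
    columns=["ColA", "ColB", "ColC", "ColD"],
)

# Label slice: the stop label 'ColC' is included
by_label = df1.loc[:, "ColB":"ColC"]
# Position slice: the stop position 2 (ColC) is excluded
by_position = df1.iloc[:, 1:2]

print(list(by_label.columns))     # ['ColB', 'ColC']
print(list(by_position.columns))  # ['ColB']
```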


### Select Single Column By Index

In [47]:
df1 = pd.DataFrame(np.arange(16).reshape(4,4), index = ['RowOne','RowTwo','RowThree','RowFour'],columns=['ColA','ColB','ColC','ColD'] )
df1.iloc[:,0]

RowOne       0
RowTwo       4
RowThree     8
RowFour     12
Name: ColA, dtype: int32

### Select Multi Column By Index

In [51]:
df1 = pd.DataFrame(np.arange(16).reshape(4,4), index = ['RowOne','RowTwo','RowThree','RowFour'],columns=['ColA','ColB','ColC','ColD'] )
df1.iloc[:,[0,1]]

Unnamed: 0,ColA,ColB
RowOne,0,1
RowTwo,4,5
RowThree,8,9
RowFour,12,13


### Select Column Slice By Index

In [52]:
df1 = pd.DataFrame(np.arange(16).reshape(4,4), index = ['RowOne','RowTwo','RowThree','RowFour'],columns=['ColA','ColB','ColC','ColD'] )
df1.iloc[:,1:3]

Unnamed: 0,ColB,ColC
RowOne,1,2
RowTwo,5,6
RowThree,9,10
RowFour,13,14


## Indexing Rows

### Select Single Row By Label 

In [32]:
df1 = pd.DataFrame(np.arange(16).reshape(4,4), index = ['RowOne','RowTwo','RowThree','RowFour'],columns=['ColA','ColB','ColC','ColD'] )
df1.loc['RowTwo']

ColA    4
ColB    5
ColC    6
ColD    7
Name: RowTwo, dtype: int32
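
A single label returns a Series, while a one-element list of labels returns a one-row DataFrame. The sketch below illustrates the difference on the same frame; `row_series` and `row_frame` are illustrative names.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(
    np.arange(16).reshape(4, 4),
    index=["RowOne", "RowTwo", "RowThree", "RowFour"],
    columns=["ColA", "ColB", "ColC", "ColD"],
)

# Scalar label: a Series indexed by the column labels
row_series = df1.loc["RowTwo"]
# List with one label: a DataFrame with a single row
row_frame = df1.loc[["RowTwo"]]

print(type(row_series).__name__, list(row_series.index))
print(type(row_frame).__name__, row_frame.shape)
```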

### Select Multiple Rows By Label (loc)

In [2]:
df1 = pd.DataFrame(np.arange(16).reshape(4,4), index = ['RowOne','RowTwo','RowThree','RowFour'],columns=['ColA','ColB','ColC','ColD'] )
df1.loc[['RowTwo','RowFour']]

Unnamed: 0,ColA,ColB,ColC,ColD
RowTwo,4,5,6,7
RowFour,12,13,14,15


### Select subset of rows by integer position (iloc)

In [206]:
df1 = pd.DataFrame(np.arange(16).reshape(4,4), index = ['RowOne','RowTwo','RowThree','RowFour'],columns=['ColA','ColB','ColC','ColD'] )
df1.iloc[[2,3]]

Unnamed: 0,ColA,ColB,ColC,ColD
RowThree,8,9,10,11
RowFour,12,13,14,15


### Select Row Label Slice

In [4]:
df1 = pd.DataFrame(np.arange(16).reshape(4,4), index = ['RowOne','RowTwo','RowThree','RowFour'],columns=['ColA','ColB','ColC','ColD'] )
df1.loc[:'RowThree']

Unnamed: 0,ColA,ColB,ColC,ColD
RowOne,0,1,2,3
RowTwo,4,5,6,7
RowThree,8,9,10,11


## Indexing Rows And Columns
We can combine most of the columns and row indexing options from the previous two sections

### Select subslice of rows and columns by integer position (iloc) slice
**Note:** the difference between iloc and loc when using slice notation. With iloc the stop index is exclusive and with loc it is inclusive.

In [6]:
df1 = pd.DataFrame(np.arange(16).reshape(4,4), index = ['RowOne','RowTwo','RowThree','RowFour'],columns=['ColA','ColB','ColC','ColD'] )
df1.iloc[1:3,1:3]

Unnamed: 0,ColB,ColC
RowTwo,5,6
RowThree,9,10
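
The same block can also be selected by label. The following sketch, using the same frame, combines a row label slice with a list of column labels in `loc` and compares the result with the `iloc` selection; `by_position` and `by_label` are illustrative names.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(
    np.arange(16).reshape(4, 4),
    index=["RowOne", "RowTwo", "RowThree", "RowFour"],
    columns=["ColA", "ColB", "ColC", "ColD"],
)

# Positions 1 and 2 on both axes (stop position 3 is excluded)
by_position = df1.iloc[1:3, 1:3]
# Row label slice (stop label included) plus an explicit list of column labels
by_label = df1.loc["RowTwo":"RowThree", ["ColB", "ColC"]]

print(by_label)
print(by_position.equals(by_label))
```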


## Accessing Single Cell As Scalar
### Select single scalar by column and row label (at)

In [215]:
df1 = pd.DataFrame(np.arange(16).reshape(4,4), index = ['RowOne','RowTwo','RowThree','RowFour'],columns=['ColA','ColB','ColC','ColD'] )
df1.at['RowTwo','ColB']

5

### Select single scalar by column and row integer index (iat)

In [217]:
df1 = pd.DataFrame(np.arange(16).reshape(4,4), index = ['RowOne','RowTwo','RowThree','RowFour'],columns=['ColA','ColB','ColC','ColD'] )
df1.iat[1,1]

5
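
`at` and `iat` can be used to write a single cell as well as read it. A minimal sketch on the same frame follows; the new value 50 is arbitrary.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(
    np.arange(16).reshape(4, 4),
    index=["RowOne", "RowTwo", "RowThree", "RowFour"],
    columns=["ColA", "ColB", "ColC", "ColD"],
)

# Read the same cell by label and by integer position
before = (df1.at["RowTwo", "ColB"], df1.iat[1, 1])
print(before)

# Write the cell by label, then read it back by position
df1.at["RowTwo", "ColB"] = 50
after = df1.iat[1, 1]
print(after)
```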

## Reindexing
### Rows
Creates a new DataFrame with the values rearranged to match the new index. If the new index contains labels missing from the old index, empty (NaN) rows are added for them. The following are all equivalent

 * ```df1.reindex(['Five','Four', 'Three', 'Two', "One"])```
 * ```df1.reindex(index=['Five','Four', 'Three', 'Two', "One"])```
 * ```df1.reindex(['Five','Four', 'Three', 'Two', "One"], axis="index")```

In [105]:
df1 = pd.DataFrame(np.arange(16).reshape(4,4), index=["One", "Two", "Three", "Four"], columns=['a','b','c','d'])
df1

Unnamed: 0,a,b,c,d
One,0,1,2,3
Two,4,5,6,7
Three,8,9,10,11
Four,12,13,14,15


In [109]:
df1.reindex(['Five','Four', 'Three', 'Two', "One"])

Unnamed: 0,a,b,c,d
Five,,,,
Four,12.0,13.0,14.0,15.0
Three,8.0,9.0,10.0,11.0
Two,4.0,5.0,6.0,7.0
One,0.0,1.0,2.0,3.0


In [110]:
df1.reindex(index=['Five','Four', 'Three', 'Two', "One"])

Unnamed: 0,a,b,c,d
Five,,,,
Four,12.0,13.0,14.0,15.0
Three,8.0,9.0,10.0,11.0
Two,4.0,5.0,6.0,7.0
One,0.0,1.0,2.0,3.0


In [113]:
df1.reindex(['Five','Four', 'Three', 'Two', "One"], axis="index")

Unnamed: 0,a,b,c,d
Five,,,,
Four,12.0,13.0,14.0,15.0
Three,8.0,9.0,10.0,11.0
Two,4.0,5.0,6.0,7.0
One,0.0,1.0,2.0,3.0
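
In the outputs above the integers become floats because NaN needs a floating point column. As a sketch of one way to avoid this, `reindex` accepts a `fill_value` argument that is used for the new rows in place of NaN; the choice of 0 here is only an example.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(
    np.arange(16).reshape(4, 4),
    index=["One", "Two", "Three", "Four"],
    columns=["a", "b", "c", "d"],
)
new_index = ["Five", "Four", "Three", "Two", "One"]

# Default: the new row 'Five' is filled with NaN and the columns become float
with_nan = df1.reindex(new_index)
# With fill_value the new row gets 0 and the integer dtype is kept
with_zero = df1.reindex(new_index, fill_value=0)

print(with_nan.loc["Five"].tolist())
print(with_zero.loc["Five"].tolist())
print(with_zero.dtypes)
```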


### Columns
If the new column index has a label missing from the original DataFrame's columns, a new empty (NaN) column is added. If the new column index omits any columns of the original, those columns are omitted from the result.

In [114]:
df1.reindex(columns=['d','c','b'])

Unnamed: 0,d,c,b
One,3,2,1
Two,7,6,5
Three,11,10,9
Four,15,14,13


In [116]:
df1.reindex(['d','c','b','f'], axis="columns")

Unnamed: 0,d,c,b,f
One,3,2,1,
Two,7,6,5,
Three,11,10,9,
Four,15,14,13,


## Deletion
### Delete Column

In [11]:
df1 = pd.DataFrame(np.arange(16).reshape(4,4), index=["One", "Two", "Three", "Four"], columns=['a','b','c','d'])
del df1["b"]
df1

Unnamed: 0,a,c,d
One,0,2,3
Two,4,6,7
Three,8,10,11
Four,12,14,15


### Dropping Rows
Creates a new DataFrame with the specified rows dropped. All four methods below are equivalent
 
  * ```df1.drop(["Two", "Three"])```
  * ```df1.drop(["Two", "Three"], axis="index")```
  * ```df1.drop(index=["Two", "Three"])```
  * ```df1.drop(["Two", "Three"], axis=0)```
 

In [121]:
df1 = pd.DataFrame(np.arange(16).reshape(4,4), index=["One", "Two", "Three", "Four"], columns=['a','b','c','d'])
df1.drop(["Two", "Three"])

Unnamed: 0,a,b,c,d
One,0,1,2,3
Four,12,13,14,15


In [7]:
df1 = pd.DataFrame(np.arange(16).reshape(4,4), index=["One", "Two", "Three", "Four"], columns=['a','b','c','d'])
df1.drop(["Two", "Three"], axis="index")

Unnamed: 0,a,b,c,d
One,0,1,2,3
Four,12,13,14,15


In [8]:
df1 = pd.DataFrame(np.arange(16).reshape(4,4), index=["One", "Two", "Three", "Four"], columns=['a','b','c','d'])
df1.drop(index=["Two", "Three"])

Unnamed: 0,a,b,c,d
One,0,1,2,3
Four,12,13,14,15


In [129]:
df1 = pd.DataFrame(np.arange(16).reshape(4,4), index=["One", "Two", "Three", "Four"], columns=['a','b','c','d'])
df1.drop(["Two", "Three"], axis=0)

Unnamed: 0,a,b,c,d
One,0,1,2,3
Four,12,13,14,15
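
The sketch below checks both claims on the same frame: the four spellings give the same result, and `drop` leaves the original DataFrame unchanged (unlike `del`, which removes the column in place). The names `variants` and `all_equal` are illustrative.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(
    np.arange(16).reshape(4, 4),
    index=["One", "Two", "Three", "Four"],
    columns=["a", "b", "c", "d"],
)

variants = [
    df1.drop(["Two", "Three"]),
    df1.drop(["Two", "Three"], axis="index"),
    df1.drop(index=["Two", "Three"]),
    df1.drop(["Two", "Three"], axis=0),
]

# Every spelling produces the same new DataFrame
all_equal = all(v.equals(variants[0]) for v in variants)
print(all_equal)
# The original still has all four rows
print(list(df1.index))
```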


### Dropping Columns
All three methods below are equivalent

 * ```df1.drop(columns=["a", "d"])```
 * ```df1.drop(["a", "d"], axis="columns")```
 * ```df1.drop(["a", "d"], axis=1)```

In [126]:
df1.drop(columns=["a", "d"])

Unnamed: 0,b,c
One,1,2
Two,5,6
Three,9,10
Four,13,14


In [127]:
df1.drop(["a", "d"], axis="columns")

Unnamed: 0,b,c
One,1,2
Two,5,6
Three,9,10
Four,13,14


In [128]:
df1.drop(["a", "d"], axis=1)

Unnamed: 0,b,c
One,1,2
Two,5,6
Three,9,10
Four,13,14


## Arithmetic 
If we add two DataFrames with different indices, the row and column indices of the result are the unions of the source frames' indices. Any cell whose row or column label does not exist in both source frames will be NaN.

In [218]:
df1 = pd.DataFrame(np.arange(16).reshape(4,4), index = ['RowOne','RowTwo','RowThree','RowFour'],columns=['ColA','ColB','ColC','ColD'] )
df2 = pd.DataFrame(np.arange(16).reshape(4,4), index = ['RowOne','RowTwo','RowThree','RowFour'],columns=['ColC','ColD','ColE','ColF'] )

df1 + df2

Unnamed: 0,ColA,ColB,ColC,ColD,ColE,ColF
RowOne,,,2,4,,
RowTwo,,,10,12,,
RowThree,,,18,20,,
RowFour,,,26,28,,


### Fill Value


In [220]:
df1 = pd.DataFrame(np.arange(16).reshape(4,4), index = ['RowOne','RowTwo','RowThree','RowFour'],columns=['ColA','ColB','ColC','ColD'] )
df2 = pd.DataFrame(np.arange(16).reshape(4,4), index = ['RowOne','RowTwo','RowThree','RowFour'],columns=['ColC','ColD','ColE','ColF'] )

df1.add(df2, fill_value=0)

Unnamed: 0,ColA,ColB,ColC,ColD,ColE,ColF
RowOne,0.0,1.0,2,4,2.0,3.0
RowTwo,4.0,5.0,10,12,6.0,7.0
RowThree,8.0,9.0,18,20,10.0,11.0
RowFour,12.0,13.0,26,28,14.0,15.0
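
With `fill_value`, a cell that is missing in only one of the two frames is replaced by the fill value before the operation is applied; a cell that is missing in both frames still produces NaN. The sketch below shows this with two small hypothetical frames, `left` and `right`, that differ in both rows and columns.

```python
import pandas as pd

left = pd.DataFrame({"x": [1, 2], "y": [10, 20]}, index=["r1", "r2"])
right = pd.DataFrame({"y": [100, 200], "z": [5, 6]}, index=["r2", "r3"])

# Plain addition: only the cell present in both frames (r2, y) has a value
plain = left + right
# fill_value=0: cells present in one frame are kept, cells in neither stay NaN
filled = left.add(right, fill_value=0)

print(plain)
print(filled)
```

In `filled`, the cells (`r3`, `x`) and (`r1`, `z`) remain NaN because neither frame has a value there.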


### Transpose - Swapping Rows and Columns
Note that in the example the columns of the original DataFrame do not all have the same type, so the column types are lost in the transpose: every column of the result has the generic Python object type. In this situation, transposing and then transposing back does not restore the original type information.

In [40]:
df

Attributes,Age,Sex,Age2,John
Name,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
Bob,12,M,21,22
Dave,21,M,21,22
Anna,12,F,21,22
John,23,M,21,22
Sally,21,F,21,22


In [39]:
df.T

Name,Bob,Dave,Anna,John,Sally
Attributes,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
Age,12,21,12,23,21
Sex,M,M,F,M,F
Age2,21,21,21,21,21
John,22,22,22,22,22
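
The loss of type information can be observed through `dtypes`. The sketch below uses a small hypothetical frame with one numeric and one string column, and `infer_objects()` as one possible way to recover the numeric type after a round trip.

```python
import pandas as pd

df = pd.DataFrame(
    {"Age": [45, 55, 30], "Sex": ["M", "M", "F"]},
    index=["Bob", "Dave", "Anna"],
)
print(df.dtypes)

# Transposing mixed columns and transposing back leaves object columns
round_trip = df.T.T
print(round_trip.dtypes)

# infer_objects() tries to find a better dtype for each object column
restored = round_trip.infer_objects()
print(restored.dtypes)
```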


## Arithmetic With DataFrame and Series
By default the index values in the series are used to match column labels and the values are broadcast down the rows

In [20]:
df1 = pd.DataFrame(np.arange(16).reshape(4,4), index=list('1234'), columns=list('abcd'))
print(df1)

rowSeries = df1.loc['1']
print(rowSeries)

df1 - rowSeries

    a   b   c   d
1   0   1   2   3
2   4   5   6   7
3   8   9  10  11
4  12  13  14  15
a    0
b    1
c    2
d    3
Name: 1, dtype: int32


Unnamed: 0,a,b,c,d
1,0,0,0,0
2,4,4,4,4
3,8,8,8,8
4,12,12,12,12


If we want to match on row labels and broadcast across the columns, we need to use one of the arithmetic methods and pass `axis="index"`:

 * add, radd
 * sub, rsub
 * div, rdiv
 * floordiv, rfloordiv
 * mul, rmul
 * pow, rpow

In [24]:
df1 = pd.DataFrame(np.arange(16).reshape(4,4), index=list('1234'), columns=list('abcd'))
print(df1)

colSeries = df1['a']
print(colSeries)

df1.add(colSeries, axis="index")


    a   b   c   d
1   0   1   2   3
2   4   5   6   7
3   8   9  10  11
4  12  13  14  15
1     0
2     4
3     8
4    12
Name: a, dtype: int32


Unnamed: 0,a,b,c,d
1,0,1,2,3
2,8,9,10,11
3,16,17,18,19
4,24,25,26,27
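
To see why the method form is needed, the following sketch subtracts the same column Series in both ways. The names `by_rows` and `by_cols` are illustrative.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(
    np.arange(16).reshape(4, 4), index=list("1234"), columns=list("abcd")
)
col = df1["a"]

# Match the Series index against the row labels
by_rows = df1.sub(col, axis="index")
print(by_rows)

# The operator matches the Series index ('1'..'4') against the column
# labels ('a'..'d'); nothing lines up, so the result is all NaN over
# the union of both label sets
by_cols = df1 - col
print(by_cols.shape)
```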


### Element Wise Operations
#### Numpy Unary Funcs
NumPy unary universal functions (ufuncs) can be applied element-wise to a DataFrame

In [80]:
df1 = pd.DataFrame(np.arange(9).reshape(3,3), index=list('123'), columns=list('abc'))
print(df1)
np.square(df1)

   a  b  c
1  0  1  2
2  3  4  5
3  6  7  8


Unnamed: 0,a,b,c
1,0,1,4
2,9,16,25
3,36,49,64
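
The result of a ufunc is again a DataFrame with the same row and column labels. A short sketch with `np.sqrt` on the same frame:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(
    np.arange(9).reshape(3, 3), index=list("123"), columns=list("abc")
)

roots = np.sqrt(df1)

# Still a DataFrame, with the original labels
print(type(roots).__name__)
print(list(roots.index), list(roots.columns))
# sqrt(4) for the cell at row '2', column 'b'
print(roots.loc["2", "b"])
```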


#### ApplyMap
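We can write our own element-wise functions and pass them to the **applymap** method. In pandas 2.1 and later the same operation is provided by `DataFrame.map`, and `applymap` is deprecated. We could write our own square function as follows

In [82]:
square = lambda a : a ** 2
df1.applymap(square)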

Unnamed: 0,a,b,c
1,0,1,4
2,9,16,25
3,36,49,64
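
An element-wise method matters most when the function only works on scalars. The sketch below uses a hypothetical `parity` function that branches on a single value, and picks `DataFrame.map` when it exists, falling back to `applymap` on older pandas versions.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(
    np.arange(9).reshape(3, 3), index=list("123"), columns=list("abc")
)

def parity(value):
    # Works on one scalar at a time, not on a whole Series
    return "even" if value % 2 == 0 else "odd"

# DataFrame.map is the newer name; applymap is the older one
elementwise = df1.map if hasattr(df1, "map") else df1.applymap
labels = elementwise(parity)
print(labels)
```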


### One Dimensional Array Functions - apply()
We can write functions that operate on one dimensional arrays and **apply** them to a DataFrame. The axis argument can be used to specify whether the rows or columns form the one dimensional arrays. Our function sums the elements in the one dimensional array and then squares it. 

In [83]:
df1 = pd.DataFrame(np.arange(1,10).reshape(3,3), index=list('123'), columns=list('abc'))
print(df1)

sum_squared = lambda a : a.sum() ** 2

print(df1.apply(sum_squared ))

   a  b  c
1  1  2  3
2  4  5  6
3  7  8  9
a    144
b    225
c    324
dtype: int64


Now we apply it to each row by passing `axis="columns"`

In [49]:
print(df1.apply(sum_squared, axis="columns" ))

1     36
2    225
3    576
dtype: int64


The applied function can also return a Series instead of a scalar value

In [53]:
series_producer = lambda a : pd.Series([a.min(), a.max(), a.mean()], index=["min", "max", "mean"])
print(df1.apply(series_producer ))
print(df1.apply(series_producer, axis="columns" ))


        a    b    c
min   1.0  2.0  3.0
max   7.0  8.0  9.0
mean  4.0  5.0  6.0
   min  max  mean
1  1.0  3.0   2.0
2  4.0  6.0   5.0
3  7.0  9.0   8.0


In [56]:
square = lambda a : a ** 2
print(df1.apply(square ))

    a   b   c
1   1   4   9
2  16  25  36
3  49  64  81


### Reductions
A reduction is used to obtain a single value from a Series or a Series of values from the columns or rows in a DataFrame

#### sum

In [87]:
df1

Unnamed: 0,a,b,c
1,1,2,3
2,4,5,6
3,7,8,9


In [88]:
df1.sum()

a    12
b    15
c    18
dtype: int64

In [91]:
df1.sum(axis="columns")

1     6
2    15
3    24
dtype: int64

#### Mean
Notice that NaN entries are not treated as zero. They are excluded from both the sum and the count of entries.

In [77]:
df1.mean()

a    3.0
b    8.5
dtype: float64
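
The treatment of NaN can be reproduced with a small frame that contains missing values. The frame `scores` below is hypothetical and is not the data behind the output above; `skipna=False` is shown for contrast.

```python
import numpy as np
import pandas as pd

scores = pd.DataFrame({"a": [1.0, np.nan, 5.0], "b": [8.0, 9.0, np.nan]})

# NaN is skipped: a = (1 + 5) / 2, b = (8 + 9) / 2
skipped = scores.mean()
print(skipped)

# With skipna=False any NaN in a column makes the result NaN
strict = scores.mean(skipna=False)
print(strict)
```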

### Accumulations

#### cumsum

In [95]:
df1.cumsum()

Unnamed: 0,a,b,c
1,1,2,3
2,5,7,9
3,12,15,18


In [96]:
df1.cumsum(axis="columns")

Unnamed: 0,a,b,c
1,1,3,6
2,4,9,15
3,7,15,24


#### cumprod

In [98]:
df1.cumprod()

Unnamed: 0,a,b,c
1,1,2,3
2,4,10,18
3,28,80,162
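
An accumulation keeps every intermediate result, so its last row is the corresponding reduction. A minimal sketch on the same frame that checks this for `cumsum` and `cumprod`:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(
    np.arange(1, 10).reshape(3, 3), index=list("123"), columns=list("abc")
)

# Last row of the running totals down each column
last_cumsum = df1.cumsum().iloc[-1].tolist()
# Last row of the running products down each column
last_cumprod = df1.cumprod().iloc[-1].tolist()

print(last_cumsum, df1.sum().tolist())
print(last_cumprod, df1.prod().tolist())
```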
