# Understanding the `axis` argument in pandas

## Sum and Mean of a DataFrame
* To add up all the columns of a DataFrame (one total per row), use `df1.sum(axis=1)`.
* The `axis` argument indicates the direction of the operation.
    * `0` represents the rows, `1` represents the columns.
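The section title mentions the mean as well as the sum. Here is a minimal sketch on a small made-up DataFrame (`df` and its values are illustrative only). It computes both along each axis and checks that the string aliases `"index"` and `"columns"` behave like `0` and `1`.

```python
import pandas as pd

# Small illustrative frame (hypothetical data)
df = pd.DataFrame({"a": [1.0, 3.0], "b": [4.0, 8.0]})

# axis=1: one value per row (jump from column to column)
row_sums = df.sum(axis=1)
row_means = df.mean(axis=1)

# axis=0: one value per column (jump from row to row)
col_sums = df.sum(axis=0)
col_means = df.mean(axis=0)

# The string aliases give the same results as the integers
assert row_sums.equals(df.sum(axis="columns"))
assert col_sums.equals(df.sum(axis="index"))

print(row_sums.tolist(), row_means.tolist())  # [5.0, 11.0] [2.5, 5.5]
print(col_sums.tolist(), col_means.tolist())  # [4.0, 12.0] [2.0, 6.0]
```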

### Analogy 
* Summing with `axis=1` means you sum from left to right, i.e. pandas jumps from column to column.
* So the `axis` argument tells pandas in which direction to *jump*.
* If you want to add (or average) the rows, i.e. get one value per column, you need pandas to *jump* from row to row, so you set `axis=0`.
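One way to check the analogy is to look at which labels survive in the result. This sketch uses made-up data, and the row labels `r0`, `r1`, `r2` are arbitrary. Jumping across the columns leaves one value per row label. Jumping down the rows leaves one value per column label.

```python
import pandas as pd

# Illustrative frame with named rows (hypothetical data)
df = pd.DataFrame({"a": [1.0, 3.0, 5.0], "b": [4.0, 8.0, 3.0]},
                  index=["r0", "r1", "r2"])

across_columns = df.sum(axis=1)  # jumps column to column -> one value per row
down_rows = df.sum(axis=0)       # jumps row to row -> one value per column

print(list(across_columns.index))  # ['r0', 'r1', 'r2']
print(list(down_rows.index))       # ['a', 'b']
```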

```python
import pandas as pd
import numpy as np
from IPython.display import display
```

```python
# Create data
df1 = pd.DataFrame({"a": [1., 3., 5., 2.],
                    "b": [4., 8., 3., 7.],
                    "c": [5., 45., 67., 34.]})
display(df1)
```

|   | a | b | c |
|---|---|---|---|
| 0 | 1.0 | 4.0 | 5.0 |
| 1 | 3.0 | 8.0 | 45.0 |
| 2 | 5.0 | 3.0 | 67.0 |
| 3 | 2.0 | 7.0 | 34.0 |


```python
# Add the columns, so you jump from col to col while adding
df1['a+b+c'] = df1.sum(axis=1)  # sum across columns
display(df1)
```

|   | a | b | c | a+b+c |
|---|---|---|---|---|
| 0 | 1.0 | 4.0 | 5.0 | 10.0 |
| 1 | 3.0 | 8.0 | 45.0 | 56.0 |
| 2 | 5.0 | 3.0 | 67.0 | 75.0 |
| 3 | 2.0 | 7.0 | 34.0 | 43.0 |


```python
# Add the rows, so you jump from row to row while adding
df1.loc['0+1+2+3', :] = df1.sum(axis=0)  # sum across rows
display(df1)
```

|   | a | b | c | a+b+c |
|---|---|---|---|---|
| 0 | 1.0 | 4.0 | 5.0 | 10.0 |
| 1 | 3.0 | 8.0 | 45.0 | 56.0 |
| 2 | 5.0 | 3.0 | 67.0 | 75.0 |
| 3 | 2.0 | 7.0 | 34.0 | 43.0 |
| 0+1+2+3 | 11.0 | 22.0 | 151.0 | 184.0 |


## Division of a DataFrame
For the pandas methods `.div()` and `.mul()`, the `axis` argument means something different:

* Division and multiplication of DataFrames are done element-wise.
* So, if you want to divide a DataFrame by a Series, you need to tell pandas how to broadcast the Series (i.e. expand it into a DataFrame by copying it multiple times so that its index lines up with one axis of the DataFrame).
* `df.div(series, axis=0)` tells pandas to match the index of the Series to the *rows* (index) of `df`.
* `df.div(series, axis=1)` tells pandas to match the index of the Series to the *columns* of `df`.
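The broadcasting described above can be compared with plain NumPy broadcasting. In this sketch, which uses made-up data, `axis=0` behaves like dividing by a column vector and `axis=1` behaves like dividing by a row vector. The manual NumPy version only matches here because the Series labels are already in the same order as the DataFrame's. pandas itself aligns by label, not by position.

```python
import numpy as np
import pandas as pd

# Illustrative data (hypothetical)
df = pd.DataFrame({"a": [1.0, 3.0], "b": [4.0, 8.0]}, index=["x", "y"])
row_totals = df.sum(axis=1)   # index ['x', 'y'] -> matches the rows
col_totals = df.sum(axis=0)   # index ['a', 'b'] -> matches the columns

# axis=0: the Series is copied across the columns, like a column vector
by_rows = df.div(row_totals, axis=0)
manual_rows = df.to_numpy() / row_totals.to_numpy()[:, np.newaxis]

# axis=1: the Series is copied down the rows, like a row vector
by_cols = df.div(col_totals, axis=1)
manual_cols = df.to_numpy() / col_totals.to_numpy()[np.newaxis, :]

print(np.allclose(by_rows.to_numpy(), manual_rows))  # True
print(np.allclose(by_cols.to_numpy(), manual_cols))  # True
```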

### Analogy 
* E.g. if you want to divide `df1` by `series = df1['a+b+c']`, the index of `series` must be matched with the index of `df1` (i.e. its rows). Therefore you need `axis=0`.
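It is worth seeing what happens if `axis=0` is left out. The default for `.div()` is `axis='columns'`, so pandas tries to match the row labels of the Series against the column labels of the DataFrame. This sketch uses made-up data. Since no labels overlap, pandas takes the union of the labels and every value comes out as NaN.

```python
import pandas as pd

# Illustrative data (hypothetical)
df = pd.DataFrame({"a": [1.0, 3.0], "b": [4.0, 8.0]})
row_totals = df.sum(axis=1)           # index [0, 1] (row labels)

correct = df.div(row_totals, axis=0)  # row labels matched to the rows
wrong = df.div(row_totals)            # default axis='columns': row labels matched to column labels

print(correct)
# No labels overlap, so the columns become the union of both label sets, all NaN
print(wrong.isna().all().all())         # True
print(sorted(map(str, wrong.columns)))  # ['0', '1', 'a', 'b']
```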

```python
# Element-wise division
df1.div(df1)
```

|   | a | b | c | a+b+c |
|---|---|---|---|---|
| 0 | 1.0 | 1.0 | 1.0 | 1.0 |
| 1 | 1.0 | 1.0 | 1.0 | 1.0 |
| 2 | 1.0 | 1.0 | 1.0 | 1.0 |
| 3 | 1.0 | 1.0 | 1.0 | 1.0 |
| 0+1+2+3 | 1.0 | 1.0 | 1.0 | 1.0 |
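The section names `.mul()` alongside `.div()`, and the same `axis` rule applies to it. This sketch uses made-up weights (`col_weights` and `row_weights` are hypothetical). It scales each column by one weight using `axis=1`, and scales each row by one weight using `axis=0`.

```python
import pandas as pd

# Illustrative data (hypothetical)
df = pd.DataFrame({"a": [1.0, 3.0], "b": [4.0, 8.0]}, index=["x", "y"])

# One weight per column -> match the Series index to the columns (axis=1)
col_weights = pd.Series({"a": 10.0, "b": 100.0})
scaled_cols = df.mul(col_weights, axis=1)

# One weight per row -> match the Series index to the rows (axis=0)
row_weights = pd.Series({"x": 2.0, "y": 0.5})
scaled_rows = df.mul(row_weights, axis=0)

print(scaled_cols)  # column a becomes [10.0, 30.0], column b [400.0, 800.0]
print(scaled_rows)  # column a becomes [2.0, 1.5], column b [8.0, 4.0]
```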


```python
# Divide each column element-wise by the row totals (one total per row)
series = df1['a+b+c']

print(series.index)
print(df1.index)
print("\nYou can match the index of series to the index of the rows of df1")
print("\nSo you set axis=0")
df1.div(series, axis=0)
```

```
Index([0, 1, 2, 3, '0+1+2+3'], dtype='object')
Index([0, 1, 2, 3, '0+1+2+3'], dtype='object')

You can match the index of series to the index of the rows of df1

So you set axis=0
```


|   | a | b | c | a+b+c |
|---|---|---|---|---|
| 0 | 0.1 | 0.4 | 0.5 | 1.0 |
| 1 | 0.053571 | 0.142857 | 0.803571 | 1.0 |
| 2 | 0.066667 | 0.04 | 0.893333 | 1.0 |
| 3 | 0.046512 | 0.162791 | 0.790698 | 1.0 |
| 0+1+2+3 | 0.059783 | 0.119565 | 0.820652 | 1.0 |
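In the output above, the `a+b+c` column is itself divided by the row total, which is why it becomes 1.0 and each full row sums to 2. If the goal is row shares of `a`, `b` and `c` only, a minimal sketch is to divide just those columns. It rebuilds the original values of `df1` in a new frame (`data` is a hypothetical name) so the block runs on its own.

```python
import pandas as pd

# Same values as the original columns a, b and c of df1
data = pd.DataFrame({"a": [1.0, 3.0, 5.0, 2.0],
                     "b": [4.0, 8.0, 3.0, 7.0],
                     "c": [5.0, 45.0, 67.0, 34.0]})

# Divide each row by its own total (axis=0 matches the totals to the rows)
row_shares = data.div(data.sum(axis=1), axis=0)

print(row_shares.round(6))
print(row_shares.sum(axis=1).round(10).tolist())  # [1.0, 1.0, 1.0, 1.0]
```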


```python
# Divide each row element-wise by the column totals (one total per column)
series = df1.loc['0+1+2+3', :]
print(series.index)

print(df1.columns)
print("\nYou can only match the index of series to the index of the columns of df1")
print("\nSo you set axis=1")

df1.div(series, axis=1)
```

```
Index(['a', 'b', 'c', 'a+b+c'], dtype='object')
Index(['a', 'b', 'c', 'a+b+c'], dtype='object')

You can only match the index of series to the index of the columns of df1

So you set axis=1
```


|   | a | b | c | a+b+c |
|---|---|---|---|---|
| 0 | 0.090909 | 0.181818 | 0.033113 | 0.054348 |
| 1 | 0.272727 | 0.363636 | 0.298013 | 0.304348 |
| 2 | 0.454545 | 0.136364 | 0.443709 | 0.407609 |
| 3 | 0.181818 | 0.318182 | 0.225166 | 0.233696 |
| 0+1+2+3 | 1.0 | 1.0 | 1.0 | 1.0 |
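Because `axis='columns'` is the default for `.div()`, and the `/` operator also aligns a Series with the columns, dividing by column totals does not strictly need `axis=1`. This sketch uses the original values of `df1` in a new frame (`data` is a hypothetical name) and checks that the three spellings agree.

```python
import pandas as pd

# Same values as the original columns a, b and c of df1
data = pd.DataFrame({"a": [1.0, 3.0, 5.0, 2.0],
                     "b": [4.0, 8.0, 3.0, 7.0],
                     "c": [5.0, 45.0, 67.0, 34.0]})

col_totals = data.sum(axis=0)              # index ['a', 'b', 'c']
col_shares = data.div(col_totals, axis=1)  # each column now sums to 1

# Default axis and the / operator both match the Series to the columns
assert col_shares.equals(data.div(col_totals))
assert col_shares.equals(data / col_totals)

print(col_shares.sum(axis=0).round(10).tolist())  # [1.0, 1.0, 1.0]
```

Writing `axis=1` explicitly still makes the intent clearer to a reader.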


## The same logic applies to `bfill`, `ffill` and `fillna`

```python
# Create data
df2 = pd.DataFrame({"a": [np.nan, np.nan, 5., 2.],
                    "b": [4., 8., np.nan, 7.],
                    "c": [5., 45., 67., np.nan]})
```
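Before filling, the same jump-direction idea can be used to see where the NaNs are. This sketch rebuilds `df2` so it runs on its own. It counts missing values per column with `axis=0` and per row with `axis=1`.

```python
import numpy as np
import pandas as pd

df2 = pd.DataFrame({"a": [np.nan, np.nan, 5., 2.],
                    "b": [4., 8., np.nan, 7.],
                    "c": [5., 45., 67., np.nan]})

nan_per_column = df2.isna().sum(axis=0)  # jump row to row -> one count per column
nan_per_row = df2.isna().sum(axis=1)     # jump column to column -> one count per row

print(nan_per_column.tolist())  # [2, 1, 1]
print(nan_per_row.tolist())     # [1, 1, 1, 1]
```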

```python
# Fill along each row (i.e. jump from column to column, replacing each NaN with the next non-NaN value to its right)
display(df2)
df2.bfill(axis=1)
```

|   | a | b | c |
|---|---|---|---|
| 0 | NaN | 4.0 | 5.0 |
| 1 | NaN | 8.0 | 45.0 |
| 2 | 5.0 | NaN | 67.0 |
| 3 | 2.0 | 7.0 | NaN |


|   | a | b | c |
|---|---|---|---|
| 0 | 4.0 | 4.0 | 5.0 |
| 1 | 8.0 | 8.0 | 45.0 |
| 2 | 5.0 | 67.0 | 67.0 |
| 3 | 2.0 | 7.0 | NaN |
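`ffill` is the mirror image. With `axis=1`, each NaN takes the nearest non-NaN value to its *left*, so NaNs in the first column stay NaN. This sketch rebuilds `df2` so it runs on its own.

```python
import numpy as np
import pandas as pd

df2 = pd.DataFrame({"a": [np.nan, np.nan, 5., 2.],
                    "b": [4., 8., np.nan, 7.],
                    "c": [5., 45., 67., np.nan]})

# Jump from column to column, carrying the last non-NaN value to the right
filled = df2.ffill(axis=1)

# Rows become: [NaN, 4, 5], [NaN, 8, 45], [5, 5, 67], [2, 7, 7]
print(filled)
```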


```python
# Fill down each column (i.e. jump from row to row, replacing each NaN with the next non-NaN value below it)
display(df2)
df2.bfill(axis=0)
```

|   | a | b | c |
|---|---|---|---|
| 0 | NaN | 4.0 | 5.0 |
| 1 | NaN | 8.0 | 45.0 |
| 2 | 5.0 | NaN | 67.0 |
| 3 | 2.0 | 7.0 | NaN |


|   | a | b | c |
|---|---|---|---|
| 0 | 5.0 | 4.0 | 5.0 |
| 1 | 5.0 | 8.0 | 45.0 |
| 2 | 5.0 | 7.0 | 67.0 |
| 3 | 2.0 | 7.0 | NaN |
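`bfill(axis=0)` leaves a trailing NaN in column `c` because nothing lies below it. This sketch rebuilds `df2` and shows two options. The first is `ffill(axis=0)` on its own, which carries values downward. The second chains `bfill` and then `ffill` along the same axis, which leaves no NaN in this particular frame.

```python
import numpy as np
import pandas as pd

df2 = pd.DataFrame({"a": [np.nan, np.nan, 5., 2.],
                    "b": [4., 8., np.nan, 7.],
                    "c": [5., 45., 67., np.nan]})

# Each NaN takes the nearest non-NaN value ABOVE it (leading NaNs in a stay)
down = df2.ffill(axis=0)

# bfill first, then ffill catches the NaNs that have nothing below them
both = df2.bfill(axis=0).ffill(axis=0)

print(down)
print(both.isna().sum().sum())  # 0
```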
