# About


In [None]:
import pandas as pd

In [2]:
from tqdm.notebook import tqdm
# Register progress_apply / progress_map on pandas objects
tqdm.pandas()

## Modin
* Modin is a Python library that enhances pandas by making better use of your hardware, for example by spreading work across all CPU cores.
* Modin DataFrames need no extra code. In many cases they speed up DataFrame operations considerably (figures of 3x or more are quoted), though the gain depends on the workload and the machine.
* Modin acts more like a plugin than a standalone library: it depends on pandas, uses it as a fallback, and cannot be used on its own.
* The only change most people need is replacing the usual `import pandas as pd` with `import modin.pandas as pd`.

## cross section of rows and columns
* grab all rows where the index level `Num` equals 22
```
df.xs(22, level='Num')
```
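
As a minimal sketch of `xs`, the frame below is hypothetical: the level names `Group` and `Num` and the `value` column are made up for illustration, since the `df` used above is not shown in this notebook.

```python
import pandas as pd

# Hypothetical two-level index; 'Group' and 'Num' are illustrative names.
index = pd.MultiIndex.from_tuples(
    [('G1', 21), ('G1', 22), ('G2', 21), ('G2', 22)],
    names=['Group', 'Num'],
)
df = pd.DataFrame({'value': [10, 20, 30, 40]}, index=index)

# Keep every row whose 'Num' level equals 22.
# By default the selected level is dropped from the result's index.
num_22 = df.xs(22, level='Num')
print(num_22)
```

The result is indexed by `Group` only, because `xs` removes the level it selected on unless `drop_level=False` is passed.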


## concat

* `pd.concat()` defaults to `axis=0`, so when no axis is given it joins the DataFrames row-wise, stacking one below the other.
* To concatenate along the columns instead, set `axis=1`.
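
A small sketch of the two axes, using two made-up single-column frames `a` and `b` (they are assumptions for illustration, not part of the examples that follow):

```python
import pandas as pd

# Two tiny illustrative frames with different column names.
a = pd.DataFrame({'x': [1, 2]})
b = pd.DataFrame({'y': [3, 4]})

# Default axis=0: rows are stacked; columns missing from a frame become NaN.
rows = pd.concat([a, b])
print(rows.shape)

# axis=1: frames are placed side by side, aligned on the index.
cols = pd.concat([a, b], axis=1)
print(cols.shape)
```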


## joining

* `join()` is a convenient method for combining the columns of two DataFrames, which may have different indexes, into a single DataFrame.
* Joining is similar to merging, but `join()` matches rows on the index by default, whereas `merge()` matches on columns by default.

In [7]:
left = pd.DataFrame(
    {'A': ['A0', 'A1', 'A2'],
     'B': ['B0', 'B1', 'B2']},
    index=['K0', 'K1', 'K2'],
)

right = pd.DataFrame(
    {'C': ['C0', 'C1', 'C2'],
     'D': ['D0', 'D1', 'D2']},
    index=['K0', 'K2', 'K3'],
)

In [8]:
# left join (the default): keeps every key in left and drops keys found only in right
left.join(right)

Unnamed: 0,A,B,C,D
K0,A0,B0,C0,D0
K1,A1,B1,,
K2,A2,B2,C1,D1


In [9]:
# outer join: keeps the keys of both left and right
left.join(right,how='outer')

Unnamed: 0,A,B,C,D
K0,A0,B0,C0,D0
K1,A1,B1,,
K2,A2,B2,C1,D1
K3,,,C2,D2
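
To make the relationship with merging concrete, the sketch below rebuilds the same `left` and `right` frames and checks that an index-based `pd.merge` gives the same result as `join`. The `how='inner'` call is an extra illustration, not something run in the cells above.

```python
import pandas as pd

left = pd.DataFrame({'A': ['A0', 'A1', 'A2'], 'B': ['B0', 'B1', 'B2']},
                    index=['K0', 'K1', 'K2'])
right = pd.DataFrame({'C': ['C0', 'C1', 'C2'], 'D': ['D0', 'D1', 'D2']},
                     index=['K0', 'K2', 'K3'])

# join() aligns on the index and defaults to a left join.
joined = left.join(right)

# The same operation written as a merge on both indexes.
merged = pd.merge(left, right, left_index=True, right_index=True, how='left')
pd.testing.assert_frame_equal(joined, merged)

# An inner join keeps only the keys present in both frames.
inner = left.join(right, how='inner')
print(inner.index.tolist())
```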


# 1. Dealing with index and axis
## Simple concatenation 

In [10]:
df1 = pd.DataFrame({
    'name': ['A', 'B', 'C', 'D'],
    'math': [60,89,82,70],
    'physics': [66,95,83,66],
    'chemistry': [61,91,77,70]
})
df2 = pd.DataFrame({
    'name': ['E', 'F', 'G', 'H'],
    'math': [66,95,83,66],
    'physics': [60,89,82,70],
    'chemistry': [90,81,78,90]
})


* The simplest way to use `concat()` is to pass a list of DataFrames, for example `[df1, df2]`. By default it concatenates vertically along axis 0 and preserves all existing indices.

In [11]:
pd.concat([df1,df2])

Unnamed: 0,name,math,physics,chemistry
0,A,60,66,61
1,B,89,95,91
2,C,82,83,77
3,D,70,66,70
0,E,66,60,90
1,F,95,89,81
2,G,83,82,78
3,H,66,70,90


* However, the index values are now repeated (0 to 3 appear twice), so they are no longer unique.
* To get a fresh, continuous index instead, pass `ignore_index=True`.

In [12]:
pd.concat([df1,df2],ignore_index=True)

Unnamed: 0,name,math,physics,chemistry
0,A,60,66,61
1,B,89,95,91
2,C,82,83,77
3,D,70,66,70
4,E,66,60,90
5,F,95,89,81
6,G,83,82,78
7,H,66,70,90


## Concatenate horizontally

In [13]:
pd.concat([df1, df2], axis=1)


Unnamed: 0,name,math,physics,chemistry,name.1,math.1,physics.1,chemistry.1
0,A,60,66,61,E,66,60,90
1,B,89,95,91,F,95,89,81
2,C,82,83,77,G,83,82,78
3,D,70,66,70,H,66,70,90
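
Concatenating along `axis=1` keeps the column labels of both frames, so names such as `name` and `math` appear twice. One possible way to tell them apart, sketched here on shortened copies of `df1` and `df2`, is to pass `keys` so that each frame sits under its own top-level column label. The labels `Year 1` and `Year 2` are only illustrative.

```python
import pandas as pd

# Shortened copies of df1 and df2, for illustration only.
df1 = pd.DataFrame({'name': ['A', 'B'], 'math': [60, 89]})
df2 = pd.DataFrame({'name': ['E', 'F'], 'math': [66, 95]})

# Plain horizontal concat: column labels are repeated.
wide = pd.concat([df1, df2], axis=1)
print(wide.columns.tolist())

# With keys, the columns get an outer level, one label per input frame.
labelled = pd.concat([df1, df2], axis=1, keys=['Year 1', 'Year 2'])
print(labelled['Year 2'])
```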


# 2. Avoiding duplicate indices

* To verify that the indices in the result of `pd.concat()` do not overlap, set the argument `verify_integrity=True`.
* With this set to `True`, `pd.concat()` raises a `ValueError` if there are duplicate indices.


In [14]:
try:
    pd.concat([df1,df2], verify_integrity=True)
except ValueError as e:
    print('ValueError', e)


ValueError Indexes have overlapping values: Int64Index([0, 1, 2, 3], dtype='int64')
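
The check can also be done after the fact. The sketch below, on shortened stand-ins for `df1` and `df2`, inspects `index.is_unique` and shows that renumbering with `ignore_index=True` removes the overlap.

```python
import pandas as pd

# Shortened stand-ins for df1 and df2.
df1 = pd.DataFrame({'math': [60, 89]})
df2 = pd.DataFrame({'math': [66, 95]})

# A plain concat keeps both 0..1 indexes, so the labels repeat.
stacked = pd.concat([df1, df2])
print(stacked.index.is_unique)

# verify_integrity=True turns that overlap into an error.
try:
    pd.concat([df1, df2], verify_integrity=True)
    raised = False
except ValueError:
    raised = True

# Renumbering the rows gives a unique index.
renumbered = pd.concat([df1, df2], ignore_index=True)
print(renumbered.index.is_unique)
```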


# 3. Adding a hierarchical index with `keys` and `names` options
* Here we add the outer index labels `Year 1` and `Year 2` for `df1` and `df2` respectively. To do that, we simply pass them in the `keys` argument.

In [15]:
res = pd.concat([df1, df2], keys=['Year 1','Year 2'])
res


Unnamed: 0,Unnamed: 1,name,math,physics,chemistry
Year 1,0,A,60,66,61
Year 1,1,B,89,95,91
Year 1,2,C,82,83,77
Year 1,3,D,70,66,70
Year 2,0,E,66,60,90
Year 2,1,F,95,89,81
Year 2,2,G,83,82,78
Year 2,3,H,66,70,90


## Specify a group of values

In [16]:
res.loc['Year 1']

Unnamed: 0,name,math,physics,chemistry
0,A,60,66,61
1,B,89,95,91
2,C,82,83,77
3,D,70,66,70
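
Beyond selecting a whole group, the hierarchical index also allows narrower lookups. This is a sketch on shortened copies of `df1` and `df2`; it shows a tuple lookup with `.loc` and a cross section with `xs`.

```python
import pandas as pd

# Shortened copies of df1 and df2, for illustration only.
df1 = pd.DataFrame({'name': ['A', 'B'], 'math': [60, 89]})
df2 = pd.DataFrame({'name': ['E', 'F'], 'math': [66, 95]})
res = pd.concat([df1, df2], keys=['Year 1', 'Year 2'])

# One row: pass a tuple of (outer label, inner label).
row = res.loc[('Year 2', 1)]
print(row['name'])

# One cell: add the column label.
cell = res.loc[('Year 1', 0), 'math']
print(cell)

# The row at inner position 0 from every year.
first_rows = res.xs(0, level=1)
print(first_rows)
```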


## Add names argument
* In addition, the `names` argument assigns names to the levels of the resulting hierarchical index.
* For example, give the name `Class` to the outermost level we just created.


In [17]:
pd.concat(
    [df1, df2], 
    keys=['Year 1', 'Year 2'],
    names=['Class', None],
)

Class,Unnamed: 1,name,math,physics,chemistry
Year 1,0,A,60,66,61
Year 1,1,B,89,95,91
Year 1,2,C,82,83,77
Year 1,3,D,70,66,70
Year 2,0,E,66,60,90
Year 2,1,F,95,89,81
Year 2,2,G,83,82,78
Year 2,3,H,66,70,90
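
Once the levels are named, they can be referenced by name instead of by position. In this sketch the inner level name `Row` is a hypothetical choice (the cell above leaves it as `None`), and the frames are shortened copies of `df1` and `df2`.

```python
import pandas as pd

# Shortened copies of df1 and df2, for illustration only.
df1 = pd.DataFrame({'name': ['A', 'B'], 'math': [60, 89]})
df2 = pd.DataFrame({'name': ['E', 'F'], 'math': [66, 95]})

# 'Row' is an illustrative name for the inner level.
res = pd.concat([df1, df2], keys=['Year 1', 'Year 2'], names=['Class', 'Row'])
print(list(res.index.names))

# Select by level name rather than level number.
year2 = res.xs('Year 2', level='Class')

# Move the named level back into an ordinary column.
flat = res.reset_index(level='Class')
print(flat.columns.tolist())
```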


# 4. Columns matching and sorting
* The `concat()` function can concatenate DataFrames whose columns are in a different order, because columns are matched by name.
* By default, the resulting DataFrame keeps the column order of the first DataFrame. Pass `sort=True` to sort the columns alphabetically instead.

In [18]:
pd.concat([df1, df2], sort=True)


Unnamed: 0,chemistry,math,name,physics
0,61,60,A,66
1,91,89,B,95
2,77,82,C,83
3,70,70,D,66
0,90,66,E,60
1,81,95,F,89
2,78,83,G,82
3,90,66,H,70
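
Because `df1` and `df2` share the same column order, the matching by name is easier to see with frames whose columns differ in order. The frames `a` and `b` below are made up for that purpose.

```python
import pandas as pd

# Same columns, different order (illustrative frames).
a = pd.DataFrame({'name': ['A'], 'math': [60], 'chemistry': [61]})
b = pd.DataFrame({'chemistry': [90], 'name': ['E'], 'math': [66]})

# Columns are matched by name; the order follows the first frame.
default_order = pd.concat([a, b])
print(default_order.columns.tolist())

# sort=True orders the columns alphabetically.
sorted_order = pd.concat([a, b], sort=True)
print(sorted_order.columns.tolist())
```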


In [19]:
# For a custom column order, select the columns in the order you want
custom_sort = ['math', 'chemistry', 'physics', 'name']
res = pd.concat([df1, df2])
res[custom_sort]


Unnamed: 0,math,chemistry,physics,name
0,60,61,66,A
1,89,91,95,B
2,82,77,83,C
3,70,70,66,D
0,66,90,60,E
1,95,81,89,F
2,83,78,82,G
3,66,90,70,H
