# Pandas miscellaneous

## Setup libraries

In [1]:
import pandas as pd
import numpy as np
import functools

Create data

https://pandas.pydata.org/pandas-docs/stable/user_guide/cookbook.html#creating-example-data

Column operations

1. Add a column
2. Drop a column
3. Transform a column
    a. Into one column
    b. Into multiple columns
4. Combine multiple columns into one
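
The operations listed above can be sketched on a small frame. This is a minimal illustration, not a canonical recipe: the data and the column names (`name`, `points`, `bonus`, and the derived columns) are made up for the example.

```python
import pandas as pd

# Hypothetical example data; all column names are assumptions for illustration.
df = pd.DataFrame({
    "name": ["Alice Smith", "Bob Jones"],
    "points": [3, 2],
    "bonus": [1, 0],
})

# 1. Add a column
df["double_points"] = df["points"] * 2

# 2. Drop a column
df = df.drop(columns=["double_points"])

# 3a. Transform a column into one column
df["name_upper"] = df["name"].str.upper()

# 3b. Transform a column into multiple columns
df[["first", "last"]] = df["name"].str.split(" ", expand=True)

# 4. Combine multiple columns into one
df["total"] = df["points"] + df["bonus"]
```

`drop` returns a new frame rather than modifying `df` in place, which is why the result is assigned back.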

Transform operation with split

https://pbpython.com/pandas_transform.html
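
As a rough sketch of what a group-wise transform does (the data and the `person_total` and `share` column names are assumptions for this example): `transform` computes a value per group and broadcasts it back to every row of that group, so the result lines up with the original frame.

```python
import pandas as pd

# Hypothetical example data for illustration only.
df = pd.DataFrame({
    "person": ["Alice", "Bob", "Alice", "Bob"],
    "points": [3, 2, 1, 3],
})

# Split by person, sum each group, and broadcast the sum back to each row.
df["person_total"] = df.groupby("person")["points"].transform("sum")

# Because the result is aligned with the original rows, it can be used directly.
df["share"] = df["points"] / df["person_total"]
```

Since the transformed column has the same length as `df`, no merge is needed to attach the group totals.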

Python data science notebooks

https://github.com/jakevdp/PythonDataScienceHandbook

Pandas aggregation, transformation, filtering

https://www.drawingfromdata.com/filter-transform-group-with-pandas
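
The three group operations differ mainly in the shape of what they return. The following sketch, on made-up data with hypothetical variable names, puts them side by side.

```python
import pandas as pd

# Hypothetical example data for illustration only.
df = pd.DataFrame({
    "person": ["Alice", "Alice", "Bob", "Bob", "Carol"],
    "points": [3, 1, 2, 3, 1],
})
points_by_person = df.groupby("person")["points"]

# Aggregation: one value per group.
totals = points_by_person.sum()

# Transformation: one value per original row (here, deviation from the group mean).
centered = df["points"] - points_by_person.transform("mean")

# Filtering: keep or drop whole groups based on a group-level condition.
kept = df.groupby("person").filter(lambda g: g["points"].sum() > 3)
```

Here `totals` has one entry per person, `centered` has one entry per row of `df`, and `kept` contains only the rows of persons whose points sum to more than 3.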

Pandas comparison with other software

* [SQL](https://pandas.pydata.org/pandas-docs/stable/getting_started/comparison/comparison_with_sql.html?highlight=aggregation)
* [R](https://pandas.pydata.org/pandas-docs/stable/getting_started/comparison/comparison_with_r.html?highlight=aggregation)

Modern pandas

https://tomaugspurger.github.io/modern-8-scaling.html

## Processing groups separately

Create dataframe

In [2]:
data = [['Jan', 'Alice', 3], ['Jan', 'Bob', 2], ['Jan', 'Carol', 3],
        ['Feb', 'Alice', 1], ['Feb', 'Bob', 3], ['Mar', 'Carol', 1],
        ['Feb', 'Alice', 1], ['Mar', 'Bob', 1], ['Mar', 'Carol', 1],
        ['Apr', 'Alice', 3], ['May', 'Bob', 2], ['Jun', 'Carol', 1],
        ['Sep', 'Alice', 4], ['Oct', 'Bob', 3], ['Oct', 'Carol', 3]]

df = pd.DataFrame(data, columns=['month', 'person', 'points'])
df.head()

| | month | person | points |
|---|---|---|---|
| 0 | Jan | Alice | 3 |
| 1 | Jan | Bob | 2 |
| 2 | Jan | Carol | 3 |
| 3 | Feb | Alice | 1 |
| 4 | Feb | Bob | 3 |


Create groups

In [3]:
grouped = df.groupby('person')
persons = []
for name, group in grouped:
    # keep only the months in which this person scored more than 2 points in total
    enough_points = lambda month_df: month_df.points.sum() > 2
    df3 = group.groupby(['month', 'person']).filter(enough_points).copy()
    # add this person's total over the remaining months (the scalar is broadcast to every row)
    df3['total'] = df3.points.sum()
    # pivot to one row per month so the per-person tables can be merged
    df4 = df3.pivot(index='month', columns='person', values=['points', 'total']).reset_index()
    persons.append(df4)

Merge data

In [4]:
# outer-join the per-person tables on their only shared column, month
result = functools.reduce(lambda df1, df2: pd.merge(df1, df2, how='outer'), persons)
result

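| | month | points (Alice) | total (Alice) | points (Bob) | total (Bob) | points (Carol) | total (Carol) |
|---|---|---|---|---|---|---|---|
| 0 | Apr | 3.0 | 10.0 | NaN | NaN | NaN | NaN |
| 1 | Jan | 3.0 | 10.0 | NaN | NaN | 3.0 | 6.0 |
| 2 | Sep | 4.0 | 10.0 | NaN | NaN | NaN | NaN |
| 3 | Feb | NaN | NaN | 3.0 | 6.0 | NaN | NaN |
| 4 | Oct | NaN | NaN | 3.0 | 6.0 | 3.0 | 6.0 |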
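
For comparison, a similar wide table can be built without an explicit loop or merge. This is only a sketch under one assumption that differs from the loop version: it sums the points per person and month before filtering, so a month with several rows collapses into one. For this data the retained values are the same, but the rows come out sorted by month and `month` is the index. The names `monthly` and `wide` are introduced here for illustration.

```python
import pandas as pd

data = [['Jan', 'Alice', 3], ['Jan', 'Bob', 2], ['Jan', 'Carol', 3],
        ['Feb', 'Alice', 1], ['Feb', 'Bob', 3], ['Mar', 'Carol', 1],
        ['Feb', 'Alice', 1], ['Mar', 'Bob', 1], ['Mar', 'Carol', 1],
        ['Apr', 'Alice', 3], ['May', 'Bob', 2], ['Jun', 'Carol', 1],
        ['Sep', 'Alice', 4], ['Oct', 'Bob', 3], ['Oct', 'Carol', 3]]
df = pd.DataFrame(data, columns=['month', 'person', 'points'])

# Sum points per person and month (assumption: duplicates are collapsed here).
monthly = df.groupby(['person', 'month'], as_index=False)['points'].sum()

# Keep only the months with more than 2 points.
monthly = monthly[monthly['points'] > 2].copy()

# Add each person's total over the remaining months.
monthly['total'] = monthly.groupby('person')['points'].transform('sum')

# Pivot all persons at once; missing person/month pairs become NaN.
wide = monthly.pivot(index='month', columns='person', values=['points', 'total'])
```

Pivoting once over all persons takes the place of the per-person pivots and the chained outer merges.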
