# "Method chaining in Python"
> "An example of how method chaining in Python can make code more readable and less error-prone"
- toc: true
- branch: master
- badges: true
- comments: true
- categories: [eda, ml, jupyter]
- image: images/python/method_chainin.png
- hide: false
- search_exclude: true


# Use case

This notebook demonstrates how method chaining can make Python code more readable and less error-prone.
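As a minimal sketch of the idea, the snippet below runs the same three steps twice on a tiny made-up frame: once with intermediate assignments and once as a single chain. The `raw` data and its column names are assumptions for illustration only and are not part of the COVID-19 dataset used later.

```python
import pandas as pd

# Tiny made-up frame, for illustration only
raw = pd.DataFrame({
    "Country": ["A", "A", "B", "B"],
    "Cases": [1, 3, 2, 5],
})

# Step-by-step style: every line rebinds or mutates `df`
df = raw.copy()
df.columns = [c.lower() for c in df.columns]
df = df[df["cases"] > 1]
df = df.groupby("country")["cases"].sum()
stepwise = df

# Chained style: one expression, read top to bottom; `raw` is left untouched
chained = (
    raw
    .rename(columns=str.lower)
    .query("cases > 1")
    .groupby("country")["cases"]
    .sum()
)

print(stepwise.equals(chained))
```

Both versions compute the same Series. The chained one never rebinds a name, so re-running it cannot pick up a half-transformed `df`.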

Links to other resources:
* [Tom Augspurger's blog post on method chaining](https://tomaugspurger.github.io/method-chaining.html)
* [Towards Data Science post on method chaining](https://towardsdatascience.com/the-unreasonable-effectiveness-of-method-chaining-in-pandas-15c2109e3c69)
* [R's dplyr code translated to Python](https://gist.github.com/conormm/fd8b1980c28dd21cfaf6975c86c74d07)
* [Quora thread on pandas best practices](https://www.quora.com/I-love-the-flexibility-of-pandas-dataframes-but-I-feel-like-they-can-make-code-harder-to-read-and-maintain-What-are-some-pandas-best-practices-that-address-this-issue)
* [tidyverse-style-pandas](https://stmorse.github.io/journal/tidyverse-style-pandas.html)


# Imports

In [4]:
# Put these at the top of every notebook, to get automatic reloading and inline plotting
%reload_ext autoreload
%autoreload 2
%matplotlib inline
import warnings
warnings.filterwarnings('ignore')

In [5]:
# conventional way to import pandas
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np

## The data

The example data is the Novel Coronavirus (COVID-19) Cases dataset, provided by JHU CSSE and available on their [GitHub page](https://github.com/CSSEGISandData/COVID-19).

This dataset has been used extensively during the coronavirus outbreak, for example to plot the latest numbers of infected people.


In [8]:
corona_data_url='https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv'

## The classical notebook way

In [10]:
df = pd.read_csv(corona_data_url,index_col=['Country/Region', 'Province/State', 'Lat', 'Long'])
df.head(2)

| Country/Region | Province/State | Lat | Long | 1/22/20 | 1/23/20 | 1/24/20 | 1/25/20 | 1/26/20 | 1/27/20 | 1/28/20 | 1/29/20 | 1/30/20 | 1/31/20 | ... | 3/20/20 | 3/21/20 | 3/22/20 | 3/23/20 | 3/24/20 | 3/25/20 | 3/26/20 | 3/27/20 | 3/28/20 | 3/29/20 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Afghanistan | NaN | 33.0 | 65.0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 24 | 24 | 40 | 40 | 74 | 84 | 94 | 110 | 110 | 120 |
| Albania | NaN | 41.1533 | 20.1683 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 70 | 76 | 89 | 104 | 123 | 146 | 174 | 186 | 197 | 212 |
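To see what `index_col` does without downloading the full file, here is a small sketch that parses an inline sample. The two rows and three date columns are copied from the output above. The column order inside the CSV text is an assumption, and the real file has many more rows and columns.

```python
import io

import pandas as pd

# Inline stand-in for the remote CSV (column order in the file is assumed)
sample_csv = """Province/State,Country/Region,Lat,Long,3/27/20,3/28/20,3/29/20
,Afghanistan,33.0,65.0,110,110,120
,Albania,41.1533,20.1683,186,197,212
"""

sample = pd.read_csv(
    io.StringIO(sample_csv),
    index_col=["Country/Region", "Province/State", "Lat", "Long"],
)

# The four identifier columns now form a MultiIndex;
# only the date columns remain as regular columns.
print(sample.index.names)
print(list(sample.columns))
```

Moving the identifiers into the index leaves a frame whose columns are all dates, which is what makes the later `stack()` call reshape exactly the case counts.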


In [11]:
# name the column axis 'date' so the dates carry that label once the frame is stacked
df.columns.name = 'date'
df.head(2)

Column axis name: `date`

| Country/Region | Province/State | Lat | Long | 1/22/20 | 1/23/20 | 1/24/20 | 1/25/20 | 1/26/20 | 1/27/20 | 1/28/20 | 1/29/20 | 1/30/20 | 1/31/20 | ... | 3/20/20 | 3/21/20 | 3/22/20 | 3/23/20 | 3/24/20 | 3/25/20 | 3/26/20 | 3/27/20 | 3/28/20 | 3/29/20 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Afghanistan | NaN | 33.0 | 65.0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 24 | 24 | 40 | 40 | 74 | 84 | 94 | 110 | 110 | 120 |
| Albania | NaN | 41.1533 | 20.1683 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 70 | 76 | 89 | 104 | 123 | 146 | 174 | 186 | 197 | 212 |


In [16]:
df['type'] = 'confirmed'
df.columns.name = 'date'
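Setting `df.columns.name` matters because `stack()` moves the column labels into a new index level, and that level inherits the name of the column axis. A small sketch on a made-up frame (the `wide` data is an assumption for illustration) shows the difference:

```python
import pandas as pd

# Made-up wide frame: one row per country, one column per date
wide = pd.DataFrame(
    {"1/22/20": [0, 1], "1/23/20": [2, 3]},
    index=pd.Index(["A", "B"], name="country"),
)

# Column axis has no name yet: the stacked level gets a generic label
unnamed = wide.stack().reset_index()
print(list(unnamed.columns))

# After naming the column axis, the stacked level is called "date"
wide.columns.name = "date"
named = wide.stack().reset_index()
print(list(named.columns))
```

Only with the name in place can a later `.set_index('date')` find a column called `date`.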


In [17]:
df = (df.set_index('type', append=True)
            .reset_index(['Lat', 'Long'], drop=True)
            .stack()
            .reset_index()
            .set_index('date')
         )

KeyError: 'Level Lat not found'
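The notebook does not say what caused this error. One plausible reading is that the cell was executed a second time after `df` had already been overwritten with the reshaped frame, in which case `Lat` and `Long` are no longer index levels. The sketch below reproduces that situation on a made-up frame; `make_wide` and `reshape` are hypothetical helper names and the values are invented.

```python
import pandas as pd

def make_wide():
    """Build a small stand-in for the freshly loaded frame (made-up values)."""
    index = pd.MultiIndex.from_tuples(
        [("A", "x", 1.0, 2.0), ("B", "y", 3.0, 4.0)],
        names=["Country/Region", "Province/State", "Lat", "Long"],
    )
    wide = pd.DataFrame({"1/22/20": [0, 1], "1/23/20": [2, 3]}, index=index)
    wide["type"] = "confirmed"
    wide.columns.name = "date"
    return wide

def reshape(frame):
    """The chain from the cell above, applied to an argument instead of a global."""
    return (frame.set_index("type", append=True)
                 .reset_index(["Lat", "Long"], drop=True)
                 .stack()
                 .reset_index()
                 .set_index("date"))

df = make_wide()
df = reshape(df)  # first run works

try:
    df = reshape(df)  # second run sees the already reshaped frame
    second_run_error = None
except KeyError as err:
    second_run_error = err
print(repr(second_run_error))

# Starting again from the untouched input is repeatable
again = reshape(make_wide())
```

Overwriting `df` in place makes the cell depend on how often it has been run, which is the kind of hidden state that a single chain starting from the raw data avoids.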

In [None]:
name = 'confirmed'  # series to load; matches the URL and the 'type' value used above
base_url = 'https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_{name}_global.csv'
url = base_url.format(name=name)

df = pd.read_csv(url,
                 index_col=['Country/Region', 'Province/State', 'Lat', 'Long'])
df['type'] = name.lower()
df.columns.name = 'date'

# Reshape from wide to long: one row per country, state, type and date
df = (df.set_index('type', append=True)
        .reset_index(['Lat', 'Long'], drop=True)
        .stack()
        .reset_index()
        .set_index('date')
     )
df.index = pd.to_datetime(df.index)
df.columns = ['country', 'state', 'type', 'cases']

# Move HK to country level
df.loc[df.state == 'Hong Kong', 'country'] = 'Hong Kong'
df.loc[df.state == 'Hong Kong', 'state'] = np.nan

# Aggregate large countries split by states
df = pd.concat([df,
                (df.loc[~df.state.isna()]
                   # sum only the numeric column; recent pandas would otherwise concatenate the state strings
                   .groupby(['country', 'date', 'type'])[['cases']]
                   .sum()
                   .rename(index=lambda x: x + ' (total)', level=0)
                   .reset_index(level=['country', 'type']))
               ])
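The aggregation step can be checked offline on a toy frame. The sketch below uses invented values in the same long layout (`country`, `state`, `type`, `cases`, indexed by `date`). Selecting `[['cases']]` before `sum()` is an assumption added here so that the string column `state` is not summed on recent pandas versions.

```python
import numpy as np
import pandas as pd

# Made-up long-format frame with the same columns as `df` above
long_df = pd.DataFrame(
    {
        "country": ["X", "X", "Y"],
        "state": ["s1", "s2", np.nan],
        "type": ["confirmed"] * 3,
        "cases": [1, 2, 5],
    },
    index=pd.Index(pd.to_datetime(["2020-03-29"] * 3), name="date"),
)

# Sum the per-state rows and label the result "<country> (total)"
totals = (long_df.loc[~long_df.state.isna()]
                 .groupby(["country", "date", "type"])[["cases"]]
                 .sum()
                 .rename(index=lambda x: x + " (total)", level=0)
                 .reset_index(level=["country", "type"]))

combined = pd.concat([long_df, totals])
print(combined)
```

Country `Y` has no states, so it gets no total row. Country `X` gains one extra row whose `state` is missing after the concatenation.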

# Chaining