In [4]:
import pandas as pd
from random import randint, choices

# Chaining Methods in Pandas
Most `pandas.Series` and `pandas.DataFrame` methods return a new object rather than modifying the original *in place*.  Many of these methods accept an `inplace` kwarg that modifies the object directly, but this typically leads to code that is harder to read.  
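
As a minimal sketch of this default behavior (the two-row frame here is made up for illustration and is not the notebook's data), calling a method without keeping its result leaves the original object untouched:

```python
import pandas as pd

df = pd.DataFrame({"color": ["red", "green"], "size": [3, 7]})

# rename() returns a new DataFrame; the result is discarded here,
# so df keeps its original column names.
df.rename(columns={"color": "COLOR"})
unchanged_columns = list(df.columns)

# Keeping the returned object is what makes the change visible.
renamed = df.rename(columns={"color": "COLOR"})
renamed_columns = list(renamed.columns)
```

Here `unchanged_columns` still holds the lowercase names, while `renamed_columns` reflects the rename.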

This notebook demonstrates three different approaches to modifying (transforming) Pandas objects:

1. Re-assignment of the modified object
2. In-place modification of the object
3. Method chaining

It is usually best to chain methods--especially when more than one or two transformations are performed. 

## DataFrame 
This small dataframe will be used for the examples.

In [21]:
categories = list('ABCABC')
colors = choices(
    ['red', 'green', 'blue', 'orange', 'purple', 'yellow'],
    k=len(categories)
)
sizes = [randint(1,10) for _ in categories]

orig_df = pd.DataFrame(dict(
    category=categories,
    color=colors,
    size=sizes
))

df = orig_df.copy()
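
Because the colors and sizes are drawn at random, rerunning the cell above gives different values each time. If reproducible output is wanted, one option is to draw from a seeded generator. The sketch below assumes a hypothetical helper, `make_frame`, that is not part of the original notebook:

```python
import random

import pandas as pd


def make_frame(seed):
    """Build the example frame from a seeded generator (hypothetical helper)."""
    rng = random.Random(seed)
    categories = list("ABCABC")
    palette = ["red", "green", "blue", "orange", "purple", "yellow"]
    return pd.DataFrame(dict(
        category=categories,
        color=rng.choices(palette, k=len(categories)),
        size=[rng.randint(1, 10) for _ in categories],
    ))


# The same seed yields the same frame on every call.
first = make_frame(seed=0)
second = make_frame(seed=0)
same = first.equals(second)
```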

## Reassignment
Suppose we would like to perform the following modifications to the DataFrame `df`:

1. Upcase the columns
2. Upcase the values of the `color` column
3. Divide the `size` column by the maximum possible size (10) and store the result in a column named `PCT_OF_MAX`, which replaces `size`

The following steps would accomplish this.  Note that we must reassign the result of any modification because methods do not operate in place by default. 

In [22]:
df = df.rename(columns=dict(zip(df.columns, df.columns.str.upper())))
df

Unnamed: 0,CATEGORY,COLOR,SIZE
0,A,orange,10
1,B,red,3
2,C,green,2
3,A,purple,1
4,B,orange,7
5,C,yellow,10
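
The `dict(zip(...))` mapping used in the rename step spells out every old and new column name. As an alternative sketch, `rename` also accepts a function that is applied to each label, which should give the same result on a frame like this one (the small frame below is illustrative only):

```python
import pandas as pd

df = pd.DataFrame({"category": ["A", "B"], "color": ["red", "blue"], "size": [3, 7]})

# Explicit old-name -> new-name mapping, as in the notebook cell
by_mapping = df.rename(columns=dict(zip(df.columns, df.columns.str.upper())))

# rename() also accepts a function that is applied to each column label
by_function = df.rename(columns=str.upper)

same_result = by_mapping.equals(by_function)
```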


In [23]:
# Upcase the values of the COLOR column
# Use assign
df = df.assign(COLOR=df.COLOR.str.upper())
df

Unnamed: 0,CATEGORY,COLOR,SIZE
0,A,ORANGE,10
1,B,RED,3
2,C,GREEN,2
3,A,PURPLE,1
4,B,ORANGE,7
5,C,YELLOW,10


In [24]:
# Divide size values by 10 and set the column name
# to PCT_OF_MAX
df = df.assign(PCT_OF_MAX=df.SIZE / 10.0)
df = df.drop(columns=['SIZE'])
df

Unnamed: 0,CATEGORY,COLOR,PCT_OF_MAX
0,A,ORANGE,1.0
1,B,RED,0.3
2,C,GREEN,0.2
3,A,PURPLE,0.1
4,B,ORANGE,0.7
5,C,YELLOW,1.0


### All Reassignment Expressions Collected
Here are all the expressions we used in this section combined into a single cell. 

In [38]:
df = orig_df.copy()
df = df.rename(columns=dict(zip(df.columns, df.columns.str.upper())))
df = df.assign(COLOR=df.COLOR.str.upper())
df = df.assign(PCT_OF_MAX=df.SIZE / 10.0)
df = df.drop(columns=['SIZE'])
df

Unnamed: 0,CATEGORY,COLOR,PCT_OF_MAX
0,A,ORANGE,1.0
1,B,RED,0.3
2,C,GREEN,0.2
3,A,PURPLE,0.1
4,B,ORANGE,0.7
5,C,YELLOW,1.0


## Inplace Modifications
Recall that we need to use the original dataframe and make the following modifications:
- Upcase the columns
- Upcase the values of the color column
- Divide the size column by the maximum possible size (10) and store the result in a column named `PCT_OF_MAX`, which replaces it

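The following steps would accomplish this. Where a method supports the `inplace=True` kwarg, we do not need to reassign its result. Note that `assign()` has no `inplace` option, so the steps that use it still rely on reassignment.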
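
A minimal sketch of what `inplace=True` means in practice, using a made-up two-row frame: the existing object is changed, so every name bound to that object sees the change, while a copy taken beforehand does not.

```python
import pandas as pd

df = pd.DataFrame({"color": ["red", "green"], "size": [3, 7]})
alias = df            # a second name for the same object, not a copy
snapshot = df.copy()  # an independent copy taken before the change

# With inplace=True the existing object is modified directly.
df.rename(columns=str.upper, inplace=True)

alias_columns = list(alias.columns)        # sees the renamed columns
snapshot_columns = list(snapshot.columns)  # still has the original names
```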

In [39]:
# Start over with the original df
df = orig_df.copy()
df

Unnamed: 0,category,color,size
0,A,orange,10
1,B,red,3
2,C,green,2
3,A,purple,1
4,B,orange,7
5,C,yellow,10


In [40]:
df.rename(columns=dict(zip(df.columns, df.columns.str.upper())), inplace=True)
df

Unnamed: 0,CATEGORY,COLOR,SIZE
0,A,orange,10
1,B,red,3
2,C,green,2
3,A,purple,1
4,B,orange,7
5,C,yellow,10


In [41]:
# Upcase the values of the COLOR column
# Note: assign() has no inplace option, so its result must be reassigned
df = df.assign(COLOR=df.COLOR.str.upper())
df

Unnamed: 0,CATEGORY,COLOR,SIZE
0,A,ORANGE,10
1,B,RED,3
2,C,GREEN,2
3,A,PURPLE,1
4,B,ORANGE,7
5,C,YELLOW,10


In [37]:
# Divide size values by 10 and set the column name
# to PCT_OF_MAX

# Note, assign() does not have an inplace option
df = df.assign(PCT_OF_MAX=df.SIZE / 10.0)
df.drop(columns=['SIZE'], inplace=True)
df

Unnamed: 0,CATEGORY,COLOR,PCT_OF_MAX
0,A,ORANGE,1.0
1,B,RED,0.3
2,C,GREEN,0.2
3,A,PURPLE,0.1
4,B,ORANGE,0.7
5,C,YELLOW,1.0
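
`assign()` treats every keyword argument as a column to create or replace, so passing `inplace=True` to it does not modify the dataframe. The sketch below, on a made-up two-row frame, shows what happens instead: the returned copy gains a column literally named `inplace`, and the original is left as it was.

```python
import pandas as pd

df = pd.DataFrame({"COLOR": ["red", "green"], "SIZE": [3, 7]})

# inplace=True is interpreted as "add a column called 'inplace' filled with True"
result = df.assign(COLOR=df.COLOR.str.upper(), inplace=True)

result_columns = list(result.columns)  # includes the unintended "inplace" column
result_colors = list(result.COLOR)     # upcased, but only on the returned copy
original_colors = list(df.COLOR)       # df itself is unchanged
```

If the returned object is not kept, the upcased values are lost, which is why the `assign()` steps need reassignment.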


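### Inplace Expressions Combined
Here are all the expressions we used in this section combined into a single cell. The `assign()` steps use reassignment because `assign()` has no `inplace` option.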

In [44]:
df = orig_df.copy()
df.rename(columns=dict(zip(df.columns, df.columns.str.upper())), inplace=True)
df = df.assign(COLOR=df.COLOR.str.upper())
df = df.assign(PCT_OF_MAX=df.SIZE / 10.0)
df.drop(columns=['SIZE'], inplace=True)
df

Unnamed: 0,CATEGORY,COLOR,PCT_OF_MAX
0,A,ORANGE,1.0
1,B,RED,0.3
2,C,GREEN,0.2
3,A,PURPLE,0.1
4,B,ORANGE,0.7
5,C,YELLOW,1.0


## Method Chaining
Recall that we need to use the original dataframe and make the following modifications:
- Upcase the columns
- Upcase the values of the color column
- Divide the size column by the maximum possible size (10) and store the result in a column named `PCT_OF_MAX`, which replaces it

Because Pandas methods return a new object rather than modifying the one they are called on, we can *chain* methods to combine multiple operations.  Recall also that Python ignores line breaks and indentation inside parentheses, so we can place each transformation on its own line within them.

In the case of method chaining, there are no intermediate expressions to include in a cell on their own, so this is a single cell solution.

In [49]:
# Note that the orig_df columns are referenced within chained methods
df = (
    orig_df.copy()
    .rename(columns=dict(zip(orig_df.columns, orig_df.columns.str.upper())))
    .assign(COLOR=orig_df.color.str.upper()) 
    .assign(PCT_OF_MAX=orig_df['size'] / 10.0)
    .drop(columns=['SIZE'])
)
df

Unnamed: 0,CATEGORY,COLOR,PCT_OF_MAX
0,A,ORANGE,1.0
1,B,RED,0.3
2,C,GREEN,0.2
3,A,PURPLE,0.1
4,B,ORANGE,0.7
5,C,YELLOW,1.0


**Note**: `orig_df['size']` is used to access the `size` column instead of `orig_df.size` because `DataFrame.size` is a built-in attribute (the number of elements in the dataframe), and a column of the same name does not override it.  
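
A short sketch of the difference, using a made-up three-row frame:

```python
import pandas as pd

df = pd.DataFrame({"category": list("ABC"), "size": [3, 7, 10]})

element_count = df.size   # DataFrame.size: rows * columns, an integer
size_column = df["size"]  # bracket access returns the column as a Series
```

The same applies to any column whose name collides with an existing dataframe attribute or method; bracket access avoids the ambiguity.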

One might argue that the chained method approach is visually similar to the inplace modification approach. The big differences in the chained method approach are:

- improved readability
- a single assignment statement (no confusion about what is stored to `df`)
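
The chained cell refers back to `orig_df` inside `assign`, which works because the row index is unchanged along the chain. As a sketch of an alternative (not the notebook's original approach), `assign` also accepts callables, which receive the dataframe as it exists at that point in the chain. The frame below uses fixed values for illustration:

```python
import pandas as pd

orig_df = pd.DataFrame(dict(
    category=list("ABC"),
    color=["red", "green", "blue"],
    size=[10, 3, 5],
))

df = (
    orig_df
    .rename(columns=str.upper)
    .assign(
        # Each lambda receives the intermediate frame, so the upcased
        # column names are already available here.
        COLOR=lambda d: d["COLOR"].str.upper(),
        PCT_OF_MAX=lambda d: d["SIZE"] / 10.0,
    )
    .drop(columns=["SIZE"])
)
```

With this form the chain depends only on its starting object, so it can be reused on a different dataframe by changing the first line.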

# Summary
The method chaining approach is strongly recommended because it encourages us to keep our modifications/transformations in one place within a module or notebook.  The inplace and reassignment approaches both allow modifications to be made in multiple places within our code, which can confuse readers/reviewers and make debugging harder. 

The method chaining approach leads to transformations that *look like recipes* and are readable *at the location where the assignment is made.*