This document is a Python exploration of this R-based document: http://m-clark.github.io/data-processing-and-visualization/pipes.html. The code is optimized for learning and nothing else. The explanatory content lives in the main document rather than here, so many sections are not included; I focus only on reproducing the code chunks.

# Pipes

I'm so accustomed to piping in R that it'd be nice to carry it over into Python. Pandas has some (very recent) functionality for this, but it's not as straightforward. Then again, the general `.method` chaining approach in Python/pandas is useful in its own right. Another option is the dfply package, which basically tries to duplicate R's dplyr, but it doesn't appear to be in widespread use at this point. I will just stick to the pandas approach.

In [1]:
import pandas as pd

As a reminder, here is what a pipe approach looks like in R.

In [2]:
## ----pipes---------------------------------------------------------------
# R
# mydf %>% 
#   select(var1, var2) %>% 
#   filter(var1 == 'Yes') %>% 
#   summary()


With DataFrame methods, we can chain calls with the '.', each method operating on the result of the previous one.

In [3]:
# (data
#  .groupby('var')
#  .sum() 
#  .mean()
#  .var2
# )
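To make the commented chain above concrete, here is a minimal runnable sketch. The toy DataFrame and its column names (`var`, `var2`, `var3`) are made up for illustration and are not part of the original data.

```python
import pandas as pd

# Hypothetical toy data; the column names are assumptions for illustration
data = pd.DataFrame({
    'var': ['a', 'a', 'b', 'b'],
    'var2': [1.0, 2.0, 3.0, 5.0],
    'var3': [10, 20, 30, 40],
})

result = (
    data
    .groupby('var')  # group rows by the values of 'var'
    .sum()           # per-group sums, a DataFrame indexed by 'var'
    .mean()          # mean of each summed column across groups, a Series
    .var2            # pick the 'var2' entry of that Series
)

print(result)
```

Each step returns a new object, so the next method in the chain is whatever that object supports: a DataFrame after `.sum()`, a Series after `.mean()`.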

More recently, pandas has incorporated the `.pipe` method. The following shows how it works. Generally you need a function that takes a DataFrame as its first argument and returns a DataFrame, or whatever object you intend to work with in the next step of the pipe.

In [4]:
mydf = pd.read_csv('../data/cars.csv')

def select_col(data, col):
    # Select columns by label
    return data.loc[:, col]

def select_row(data, row):
    # Select rows by integer position
    return data.iloc[row]


(
    mydf
    .pipe(select_col, ['mpg', 'vs'])
    .pipe(select_row, [2,3,4])
)


,mpg,vs
2,22.8,1
3,21.4,1
4,18.7,0
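As a self-contained sketch that does not depend on the csv file, the following uses a small stand-in DataFrame to show two things: `.pipe(f, ...)` is equivalent to calling `f(df, ...)` directly, and `.pipe` also accepts a `(function, keyword)` tuple when the function does not take the data as its first argument. The helper functions `filter_rows` and `add_ratio` and the column `mpg_per_cyl` are hypothetical names introduced only for this example.

```python
import pandas as pd

# Small stand-in for the cars data; values are illustrative
cars = pd.DataFrame({
    'mpg': [21.0, 21.0, 22.8, 21.4, 18.7],
    'cyl': [6, 6, 4, 6, 8],
    'vs': [0, 0, 1, 1, 0],
})

def filter_rows(data, col, value):
    # Hypothetical helper: keep rows where `col` equals `value`
    return data[data[col] == value]

def add_ratio(new_name, data, num, den):
    # Hypothetical helper: note that `data` is NOT the first argument
    return data.assign(**{new_name: data[num] / data[den]})

piped = (
    cars
    .pipe(filter_rows, 'vs', 1)
    # The tuple tells pipe which keyword should receive the DataFrame
    .pipe((add_ratio, 'data'), 'mpg_per_cyl', num='mpg', den='cyl')
)

# The same computation written as nested function calls
nested = add_ratio('mpg_per_cyl', filter_rows(cars, 'vs', 1), 'mpg', 'cyl')

print(piped)
print(piped.equals(nested))
```

The piped version reads top to bottom in the order the operations happen, which is the main appeal carried over from R's `%>%`.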


For more on piping with pandas, see the [pandas user guide section on `.pipe`](https://pandas.pydata.org/pandas-docs/stable/user_guide/basics.html#basics-pipe).