# What is 'piper'?

Piper is an attempt to simplify the process of exploring and assembling data transformations on tabular, Excel-spreadsheet-like data.<br>It's built on top of pandas, NumPy and Jupyter.

# Example

Consider the following data from the Gapminder project on the number of internet users by country and year.

```python
import pandas as pd
import numpy as np

url = 'inputs/internet_users.csv'
df = pd.read_csv(url)

# How many rows, columns in the dataset?
print(df.shape)

# Show me the first 5 rows
df.head()
```

```
(194, 61)
```


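| | country | 1960 | 1961 | 1962 | 1963 | 1964 | 1965 | 1966 | 1967 | 1968 | ... | 2010 | 2011 | 2012 | 2013 | 2014 | 2015 | 2016 | 2017 | 2018 | 2019 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Afghanistan | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | 4.0 | 5.0 | 5.45 | 5.9 | 7.0 | 8.26 | NaN | 11.4 | NaN | NaN |
| 1 | Albania | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | 45.0 | 49.0 | 54.7 | 57.2 | 60.1 | 63.3 | 66.4 | 71.8 | NaN | 69.6 |
| 2 | Algeria | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | 12.5 | 14.9 | 18.2 | 22.5 | 29.5 | 38.2 | 42.9 | 47.7 | 49.0 | NaN |
| 3 | Andorra | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | 81.0 | 81.0 | 86.4 | 94.0 | 95.9 | 96.9 | 97.9 | 91.6 | NaN | NaN |
| 4 | Angola | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | 2.8 | 3.1 | 6.5 | 8.9 | 21.4 | 12.4 | 13.0 | 14.3 | NaN | NaN |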
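Each solution below starts by reshaping this wide table (one column per year) into a long one with `melt`. A minimal sketch on a hypothetical two-country frame; `wide` and its values are invented for illustration and are not taken from the Gapminder file:

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the wide layout: one row per country, one column per year
wide = pd.DataFrame({
    "country": ["A", "B"],
    "1990": [1.0, np.nan],
    "1991": [2.0, 3.0],
})

# melt turns each year column into rows of ('variable' = column label, 'value');
# dropna then removes the year/country combinations with no data
long = wide.melt(id_vars="country").dropna()

# The year labels come from column names, so they are strings until converted
long = long.rename(columns={"variable": "year"}).assign(year=lambda x: x.year.astype(int))
print(long)
```

The `astype(int)` step matters because a later filter such as `year > 1990` compares numbers, not strings.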


# Pandas solution using method chaining

What are the top 5 countries with the highest cumulative count since 1990 (the filter below keeps years after 1990, i.e. 1991 onwards)?<br>
Provide a 'Total' row as well.

```python
df2 = (df.melt(id_vars='country').dropna()            # wide -> long, drop missing values
         .rename(columns={'variable': 'year'})
         .assign(year=lambda x: x.year.astype(int))   # year labels start out as strings
         .query("year > 1990")
         .groupby('country')
         .agg(Total_count_since_1990=pd.NamedAgg('value', 'sum'))
         .sort_values('Total_count_since_1990', ascending=False)
         .head())

# Append a 'Total' row holding the sum of the top 5 countries
total_line = pd.DataFrame([df2['Total_count_since_1990'].sum()],
                          index=['Total'],
                          columns=['Total_count_since_1990'])
df3 = pd.concat([df2, total_line], axis=0)
df3
```

| | Total_count_since_1990 |
|---|---|
| Norway | 1854.88 |
| Iceland | 1806.505 |
| Denmark | 1753.739 |
| Netherlands | 1735.601 |
| Sweden | 1718.33 |
| Total | 8869.055 |
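The total row above is built outside the chain. `DataFrame.pipe` lets a user-defined function join a method chain instead; `pipe(f, **kwargs)` calls `f(frame, **kwargs)`. A sketch, where `add_total_row` is a hypothetical helper (not part of pandas or piper) and `top` is invented data standing in for `df2`:

```python
import pandas as pd

def add_total_row(frame: pd.DataFrame, label: str = "Total") -> pd.DataFrame:
    """Hypothetical helper: append one row holding the column sums of `frame`."""
    totals = frame.sum(numeric_only=True).to_frame(name=label).T
    return pd.concat([frame, totals], axis=0)

# Invented per-country totals standing in for df2
top = pd.DataFrame({"Total_count_since_1990": [7.5, 10.0, 2.5]},
                   index=["Y", "X", "Z"])

# .pipe passes the intermediate frame as the first argument of the helper
result = (top
          .sort_values("Total_count_since_1990", ascending=False)
          .pipe(add_total_row, label="Total"))
print(result)
```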


# Pandas solution using %%piper

## standard piper imports

First, import the piper magic module and the `verbs` module, which provides a more SQL-like interface.

```python
from piper import piper
from piper.verbs import *
```

```
piper version 0.0.9, last run: Tuesday, 09 March 2021 20:04:43
```
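The `%%piper` cell magic handles the `>>` chaining itself. Purely to illustrate how such a pipe operator can be emulated in plain Python (this is not piper's implementation; `Verb`, `where_` and `order_by_` are hypothetical names), Python's reflected `__rrshift__` hook is enough, because a pandas DataFrame does not define `>>` and Python then falls back to the right-hand operand:

```python
import pandas as pd

class Verb:
    """Hypothetical pipeable step: `frame >> Verb(f, *args)` evaluates f(frame, *args)."""

    def __init__(self, func, *args, **kwargs):
        self.func, self.args, self.kwargs = func, args, kwargs

    def __rrshift__(self, frame):
        # Called for `frame >> self` because DataFrame defines no __rshift__
        return self.func(frame, *self.args, **self.kwargs)

# Two hypothetical SQL-flavoured verbs built on existing pandas methods
def where_(expr):
    return Verb(pd.DataFrame.query, expr)

def order_by_(column, ascending=True):
    return Verb(pd.DataFrame.sort_values, column, ascending=ascending)

data = pd.DataFrame({"year": [1989, 1995, 2001], "value": [1.0, 5.0, 3.0]})

# `>>` is left-associative, so each step receives the previous step's result
out = data >> where_("year > 1990") >> order_by_("value", ascending=False)
print(out)
```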


## piper solution

Notes:
<ul>
    <li>pandas functions can be used interchangeably with piper verbs
    <li>adorn (add totals) appends a totals row by default and can also generate column totals
    <li>lines can be commented out using #
</ul>
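`adorn()` is piper's own verb. For comparison, a rough plain-pandas equivalent of adding totals along either axis could look like this; `add_totals` is a hypothetical helper, and its `axis` convention (0 = totals row, 1 = totals column) is assumed from the note above rather than taken from piper's source:

```python
import pandas as pd

def add_totals(frame: pd.DataFrame, axis: int = 0, name: str = "All") -> pd.DataFrame:
    """Hypothetical adorn-like helper.

    axis=0 appends a row of column sums; axis=1 appends a column of row sums.
    Only numeric columns are summed.
    """
    out = frame.copy()
    if axis == 0:
        out.loc[name] = frame.sum(numeric_only=True)
    else:
        out[name] = frame.sum(axis=1, numeric_only=True)
    return out

scores = pd.DataFrame({"q1": [1, 2], "q2": [3, 4]}, index=["a", "b"])
print(add_totals(scores))          # extra row 'All': q1=3, q2=7
print(add_totals(scores, axis=1))  # extra column 'All': a=4, b=6
```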

```python
%%piper

df.melt(id_vars='country').dropna()
>> rename(columns={'variable': 'year'})
>> assign(year=lambda x: x.year.astype(int))
>> where("year > 1990")
>> group_by('country')
>> summarise(Total_count_since_1990=pd.NamedAgg('value', 'sum'))
>> order_by('Total_count_since_1990', ascending=False)
>> head(5)
>> adorn()
```

```
194 rows, 1 columns
```


| | Total_count_since_1990 |
|---|---|
| Norway | 1854.88 |
| Iceland | 1806.505 |
| Denmark | 1753.739 |
| Netherlands | 1735.601 |
| Sweden | 1718.33 |
| All | 8869.055 |
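The `group_by(...)` / `summarise(...)` pair reads like SQL's GROUP BY with an aggregate. Assuming it maps onto pandas' named aggregation (the `pd.NamedAgg` argument suggests so), the plain-pandas shape of that step is sketched below on invented data (`melted` is hypothetical):

```python
import pandas as pd

melted = pd.DataFrame({
    "country": ["A", "A", "B"],
    "year": [1991, 1992, 1991],
    "value": [1.0, 2.5, 4.0],
})

# Named aggregation: new column name = NamedAgg(input column, aggregation function)
summary = (melted.groupby("country")
                 .agg(Total_count_since_1990=pd.NamedAgg("value", "sum"))
                 .sort_values("Total_count_since_1990", ascending=False))
print(summary)
```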


## Alternative solution (using where clause)

```python
%%piper

df.melt(id_vars='country')
>> where("~value.isna()")
>> rename(columns={'variable': 'year'})
>> assign(year=lambda x: x.year.astype(int))
>> where("year > 1990")
>> group_by('country')
>> summarise(Total_count_since_1990=pd.NamedAgg('value', 'sum'))
>> order_by('Total_count_since_1990', ascending=False)
>> head(5)
>> adorn(axis=0)
```

```
194 rows, 1 columns
```


| | Total_count_since_1990 |
|---|---|
| Norway | 1854.88 |
| Iceland | 1806.505 |
| Denmark | 1753.739 |
| Netherlands | 1735.601 |
| Sweden | 1718.33 |
| All | 8869.055 |
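Because only `value` is expected to contain missing entries after the melt, the `where("~value.isna()")` filter should keep the same rows as the `.dropna()` call in the first piper solution. In plain pandas, a query string containing a method call can be evaluated with `engine="python"`; a small check on invented data (`melted` is hypothetical), using `value.notna()`, which is equivalent to `~value.isna()`:

```python
import numpy as np
import pandas as pd

melted = pd.DataFrame({
    "country": ["A", "A", "B"],
    "variable": ["1991", "1992", "1991"],
    "value": [1.0, np.nan, 2.0],
})

# Query-string form of the where("~value.isna()") filter;
# the python engine evaluates the method call directly
kept = melted.query("value.notna()", engine="python")

# Same rows as dropping missing values from the 'value' column
assert kept.equals(melted.dropna(subset=["value"]))
print(kept)
```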
