# What is 'piper'?

Piper aims to simplify exploring and assembling data transformations on tabular, spreadsheet-like (e.g. Excel) data.<br>It is built on top of pandas, numpy and Jupyter.

# Example

Consider the following data from the Gapminder project on the number of internet users by country and year.

In [None]:
import pandas as pd
import numpy as np

url = 'inputs/internet_users.csv'
df = pd.read_csv(url)

# How many rows, columns in the dataset?
print(df.shape)

# Show me the first 5 rows
df.head()

# Pandas solution using method chaining

Which 5 countries have the highest cumulative count since 1990?<br>
Provide a 'Total' line as well.

In [None]:
# Reshape to long format (one row per country/year), drop missing values,
# then sum the values after 1990 per country and keep the top 5
df2 = (df.melt(id_vars='country').dropna()
         .rename(columns={'variable': 'year'})
         .assign(year=lambda x: x.year.astype(int))
         .query("year > 1990")
         .groupby('country')
         .agg(Total_count_since_1990=pd.NamedAgg('value', 'sum'))
         .sort_values('Total_count_since_1990', ascending=False)
         .head())

# Build a one-row 'Total' frame and append it below the top 5
total_line = pd.DataFrame([df2['Total_count_since_1990'].sum()],
                          index=['Total'],
                          columns=['Total_count_since_1990'])
df3 = pd.concat([df2, total_line], axis=0, ignore_index=False)
df3
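
To see what each step of the chain does without the CSV file, the same pipeline can be run on a tiny made-up frame. The `toy` data below is invented purely for illustration (it is not Gapminder data), and `head(2)` is used instead of `head()` only because the frame has three countries.

```python
import pandas as pd

# Made-up wide-format data: one row per country, one column per year.
toy = pd.DataFrame({
    'country': ['A', 'B', 'C'],
    '1990': [1.0, 2.0, None],
    '1991': [10.0, 20.0, 30.0],
    '1992': [100.0, None, 300.0],
})

top = (toy.melt(id_vars='country').dropna()        # long format, no missing values
          .rename(columns={'variable': 'year'})
          .assign(year=lambda x: x.year.astype(int))
          .query("year > 1990")                    # keeps 1991 and 1992 only
          .groupby('country')
          .agg(Total_count_since_1990=pd.NamedAgg('value', 'sum'))
          .sort_values('Total_count_since_1990', ascending=False)
          .head(2))                                # top 2 of the 3 toy countries

# Append a 'Total' line, as in the cell above
total_line = pd.DataFrame([top['Total_count_since_1990'].sum()],
                          index=['Total'],
                          columns=['Total_count_since_1990'])
result = pd.concat([top, total_line], axis=0)
print(result)
```

Note that `query("year > 1990")` excludes the 1990 column itself, so country A sums to 110 rather than 111.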

# Pandas solution using %%piper

## standard piper imports

First, import the piper magic module and the 'verbs' module, which provides a more SQL-like interface.

In [None]:
from piper import piper
from piper.verbs import *
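
One way to picture a SQL-like verb layer is as thin wrappers around pandas methods that are chained with `DataFrame.pipe`. The sketch below is only an illustration under that assumption: the `where` and `order_by` functions and the `sales` data are hypothetical stand-ins written in plain pandas, not piper's own code.

```python
import pandas as pd

# Hypothetical stand-ins for SQL-style verbs; NOT piper's implementation.
def where(df, expr):
    """Filter rows with a query string, similar to SQL WHERE."""
    return df.query(expr)

def order_by(df, by, ascending=True):
    """Sort rows by a column, similar to SQL ORDER BY."""
    return df.sort_values(by, ascending=ascending)

# Made-up data for illustration
sales = pd.DataFrame({'region': ['N', 'S', 'E', 'W'],
                      'units': [5, 12, 9, 3]})

# Each verb takes a DataFrame first, so it slots into .pipe()
result = (sales.pipe(where, "units > 4")
               .pipe(order_by, 'units', ascending=False))
print(result)
```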

## piper solution

Notes:
<ul>
    <li>pandas functions can be used interchangeably with piper verbs
    <li>adorn (add totals) gives row totals by default; it can also generate column totals
    <li>lines can be commented out using #
</ul>
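
As a rough picture of what adding totals means, here is a plain pandas sketch that appends a totals row and, separately, a totals column. It does not use `adorn` or mirror its parameters; the `counts` frame is made up for illustration.

```python
import pandas as pd

# Made-up counts: rows are countries, columns are years
counts = pd.DataFrame({'2019': [1, 2], '2020': [3, 4]}, index=['A', 'B'])

# Totals row: sum each column, then append the sums as a row labelled 'Total'
with_row = pd.concat([counts, counts.sum().to_frame('Total').T])

# Totals column: sum across each row into a new 'Total' column
with_col = counts.assign(Total=counts.sum(axis=1))

print(with_row)
print(with_col)
```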

In [None]:
%%piper

df.melt(id_vars='country').dropna()
>> rename(columns={'variable': 'year'})
>> assign(year= lambda x: x.year.astype(int))
>> where("year > 1990")
>> group_by('country')
>> summarise(Total_count_since_1990=pd.NamedAgg('value', 'sum'))
>> order_by('Total_count_since_1990', ascending=False)
>> head(5)
>> adorn()

## Alternative solution (using where clause)

In [None]:
%%piper

df.melt(id_vars='country')
>> where("~value.isna()")
>> rename(columns={'variable': 'year'})
>> assign(year= lambda x: x.year.astype(int))
>> where("year > 1990")
>> group_by('country')
>> summarise(Total_count_since_1990=pd.NamedAgg('value', 'sum'))
>> order_by('Total_count_since_1990', ascending=False)
>> head(5)
>> adorn(axis=0)