Skip to content
Data Analysis with Python. Pivot tables with Pandas
Branch: master
Clone or download
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Permalink
Type Name Latest commit message Commit time
Failed to load latest commit information.
img img Mar 17, 2018
.gitignore initial import Mar 17, 2018
Pivot tables.ipynb
README.md img Mar 17, 2018
requirements.txt initial import Mar 17, 2018

README.md

Data Analysis with Python. Pivot tables with Pandas

One of the first post in my blog was about Pivot tables. I'd created a library to pivot tables in my PHP scripts. The library is not very beautiful (it throws a lot of warnings), but it works. These days I'm playing with Python Data Analysis and I'm using Pandas. The purpose of this post is something that I like a lot: Learn by doing. So I want to do the same operations that I did eight years ago in the post but now with Pandas. Let's start.

I'll start with the same datasource that I used almost ten years ago. One simple recordset with cliks and number of users

I create a dataframe with this data

import numpy as np
import pandas as pd

data = pd.DataFrame([
    {'host': 1, 'country': 'fr', 'year': 2010, 'month': 1, 'clicks': 123, 'users': 4},
    {'host': 1, 'country': 'fr', 'year': 2010, 'month': 2, 'clicks': 134, 'users': 5},
    {'host': 1, 'country': 'fr', 'year': 2010, 'month': 3, 'clicks': 341, 'users': 2},
    {'host': 1, 'country': 'es', 'year': 2010, 'month': 1, 'clicks': 113, 'users': 4},
    {'host': 1, 'country': 'es', 'year': 2010, 'month': 2, 'clicks': 234, 'users': 5},
    {'host': 1, 'country': 'es', 'year': 2010, 'month': 3, 'clicks': 421, 'users': 2},
    {'host': 1, 'country': 'es', 'year': 2010, 'month': 4, 'clicks': 22, 'users': 3},
    {'host': 2, 'country': 'es', 'year': 2010, 'month': 1, 'clicks': 111, 'users': 2},
    {'host': 2, 'country': 'es', 'year': 2010, 'month': 2, 'clicks': 2, 'users': 4},
    {'host': 3, 'country': 'es', 'year': 2010, 'month': 3, 'clicks': 34, 'users': 2},
    {'host': 3, 'country': 'es', 'year': 2010, 'month': 4, 'clicks': 1, 'users': 1}
])

Dashboard

Now we want to do a simple pivot operation. We want to pivot on host

pd.pivot_table(data,
   index=['host'], 
   values=['users', 'clicks'], 
   columns=['year', 'month'],
   fill_value=''
  )

Dashboard

We can add totals

pd.pivot_table(data,
               index=['host'], 
               values=['users', 'clicks'], 
               columns=['year', 'month'],
               fill_value='',
               aggfunc=np.sum, 
               margins=True, 
               margins_name='Total'
              )

Dashboard

We can also pivot on more than one column. For example host and country

pd.pivot_table(data,
               index=['host', 'country'], 
               values=['users', 'clicks'], 
               columns=['year', 'month'],
               fill_value=''
              )

Dashboard

and also with totals

pd.pivot_table(data,
               index=['host', 'country'], 
               values=['users', 'clicks'], 
               columns=['year', 'month'],
               aggfunc=np.sum, 
               fill_value='',
               margins=True, 
               margins_name='Total'
              )

Dashboard

We can group by dataframe and calculate subtotals

data.groupby(['host', 'country'])[('clicks', 'users')].sum()

Dashboard

data.groupby(['host', 'country'])[('clicks', 'users')].mean()

Dashboard

And finally we can mix totals and subtotals.

out = data.groupby('host').apply(lambda sub: sub.pivot_table(
    index=['host', 'country'], 
    values=['users', 'clicks'], 
    columns=['year', 'month'],
    aggfunc=np.sum, 
    margins=True,
    margins_name='SubTotal',
))

out.loc[('', 'Max', '')] = out.max()
out.loc[('', 'Min', '')] = out.min()
out.loc[('', 'Total', '')] = out.sum()

out.index = out.index.droplevel(0)

out.fillna('', inplace=True)
out

Dashboard

And that's all. A lot of to learn yet about data analysis, but Pandas will be definitely a good friend of mine.

You can’t perform that action at this time.