# Comparing actuals with benchmark, using pandas with SQL

We can achieve the same results as in the ActualsVsBenchmark notebook by building on the SQL skills we have already acquired. With the pandasql package, we can run SQL queries against pandas DataFrames using the SQLite engine. SQLite is not as standards-compliant as we might wish, but it can still be put to good use. Let's get started.

In [1]:
%matplotlib inline

import matplotlib.pyplot as plt
from pandasql import sqldf  # only sqldf is used in this notebook
import numpy as np
import pandas as pd
import seaborn as sns

sns.set()

In [2]:
import html5lib

df_bm = pd.read_html('dummytxt.html')[0]
df_bm.head()

Unnamed: 0,Plaats,Maand,18-24,25-39,40-58,59-69,70-100
0,Best,jan,68,76,76,122,108
1,Eindhoven,jan,93,60,60,72,137
2,Helmond,jan,70,99,68,70,124
3,Nuenen,jan,48,68,85,37,44
4,Veldhoven,jan,91,105,68,112,144


First we have to unpivot the age segment values. In plain pandas we used melt() for this. In SQL we can do the same with a union of partial selects, one per segment column. Less fancy but equally effective.

Note that SQLite does not provide the UNPIVOT operator found in some other SQL dialects.
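For comparison, the plain-pandas melt() approach mentioned above could look roughly like the sketch below. It rebuilds only the five sample rows shown in the output earlier; the name `df_bm_sample` is introduced here for illustration and is not part of the original notebook.

```python
import pandas as pd

# Sample rows copied from the df_bm.head() output above (illustration only)
df_bm_sample = pd.DataFrame({
    'Plaats': ['Best', 'Eindhoven', 'Helmond', 'Nuenen', 'Veldhoven'],
    'Maand': ['jan'] * 5,
    '18-24': [68, 93, 70, 48, 91],
    '25-39': [76, 60, 99, 68, 105],
    '40-58': [76, 60, 68, 85, 68],
    '59-69': [122, 72, 70, 37, 112],
    '70-100': [108, 137, 124, 44, 144],
})

# Keep Plaats and Maand as identifiers; every other column becomes
# a (Segment, Value) pair, one row per original cell
df_long = df_bm_sample.melt(
    id_vars=['Plaats', 'Maand'],
    var_name='Segment',
    value_name='Value',
)

print(df_long.head())
```

The result has one row per place, month and segment, which is the same shape the SQL union below is meant to produce.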

In [14]:
qry = """

select Plaats, Maand,
    '18-24' as Segment,
    [18-24] as Value
from df_bm
union all
select Plaats, Maand,
    '25-39' as Segment,
    [25-39] as Value
from df_bm
union all
select Plaats, Maand,
    '40-58' as Segment,
    [40-58] as Value
from df_bm
union all
select Plaats, Maand,
    '59-69' as Segment,
    [59-69] as Value
from df_bm
union all
select Plaats, Maand,
    '70-100' as Segment,
    [70-100] as Value
from df_bm

"""

sqldf(qry).head()

Unnamed: 0,Plaats,Maand,Segment,Value
0,Best,jan,18-24,68
1,Eindhoven,jan,18-24,93
2,Helmond,jan,18-24,70
3,Nuenen,jan,18-24,48
4,Veldhoven,jan,18-24,91
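Since every branch of the union differs only in the column name, the query text could also be generated instead of written out by hand. The sketch below is one possible way to do that. The helper `build_unpivot_query` and the frame `df_bm_sample` are hypothetical names introduced here, and the query is run through the standard-library sqlite3 module rather than pandasql so that the example is self-contained. It uses only the five sample rows shown earlier.

```python
import sqlite3

import pandas as pd

# Sample rows copied from the df_bm.head() output above (illustration only)
df_bm_sample = pd.DataFrame({
    'Plaats': ['Best', 'Eindhoven', 'Helmond', 'Nuenen', 'Veldhoven'],
    'Maand': ['jan'] * 5,
    '18-24': [68, 93, 70, 48, 91],
    '25-39': [76, 60, 99, 68, 105],
    '40-58': [76, 60, 68, 85, 68],
    '59-69': [122, 72, 70, 37, 112],
    '70-100': [108, 137, 124, 44, 144],
})

segments = ['18-24', '25-39', '40-58', '59-69', '70-100']


def build_unpivot_query(table, id_cols, value_cols):
    """Hypothetical helper: build one select per value column, joined by union all."""
    ids = ', '.join(id_cols)
    parts = [
        f"select {ids}, '{col}' as Segment, [{col}] as Value from {table}"
        for col in value_cols
    ]
    return '\nunion all\n'.join(parts)


qry_generated = build_unpivot_query('df_bm', ['Plaats', 'Maand'], segments)

# Load the frame into an in-memory SQLite database and run the generated query
con = sqlite3.connect(':memory:')
try:
    df_bm_sample.to_sql('df_bm', con, index=False)
    df_unpivoted = pd.read_sql_query(qry_generated, con)
finally:
    con.close()

print(df_unpivoted.head())
```

The column names are interpolated straight into the SQL text, so this approach is only reasonable for trusted, known column names such as the fixed segment list here.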


In [15]:
df_ord = pd.read_excel('opdracht.xlsx', sheet_name='sales')  # sheet_name replaces the older sheetname keyword
df_ord.head(3)

Unnamed: 0,nr,datum,bedrag
0,1009,2017-01-03,106
1,1012,2017-01-03,55
2,1006,2017-01-09,37
