## Art of Statistics: 00-2-shipman-times-x

### Altair prefers long-form (a.k.a. tidy-form) over wide-form
See: https://altair-viz.github.io/user_guide/data.html#long-form-vs-wide-form-data

In [1]:
import altair as alt
import pandas as pd

In [2]:
df = pd.read_csv("00-2-shipman-times-x.csv") 
df.head()

|   | Hour | Shipman | Comparison |
|---|------|---------|------------|
| 0 | 0 | 2.6 | 1.1 |
| 1 | 1 | 1.0 | 3.0 |
| 2 | 2 | 2.6 | 3.1 |
| 3 | 3 | 3.0 | 3.8 |
| 4 | 4 | 0.3 | 4.0 |


### Pure Altair implementation using transform_fold()

In [3]:
variable_domain = ["Shipman", "Comparison"]
variable_range = ['blue', 'red']

In [4]:
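# Fold the two wide columns into key/value pairs inside the chart spec;
# df itself stays in wide form.
alt.Chart(df).transform_fold(
    ['Shipman', 'Comparison'],
    as_=['entity', 'percentage']
).mark_line().encode(
    alt.X("Hour",
          title="Hour of Day"),
    # The folded fields are not columns of df, so their types
    # (:Q quantitative, :N nominal) must be given explicitly.
    alt.Y("percentage:Q",
          title="% of Deaths"),
    color=alt.Color("entity:N",
                    scale=alt.Scale(domain=variable_domain, range=variable_range),
                    title=None)
)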

### Tidy-form implementation (the preferred approach)

In [5]:
# Rename the Comparison column so the legend label reads "Comparison GP's"
renamed_df = df.rename(columns={"Comparison": "Comparison GP's"})
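
One thing to watch with `rename`: by default pandas ignores keys that do not match any column, so a typo in "Comparison" would return the frame unchanged and might only surface later as a missing legend entry. A minimal sketch of a stricter variant, using a two-row stand-in for `df` (the stand-in frame and the misspelled key are illustrative, not from the notebook):

```python
import pandas as pd

# Hypothetical stand-in for the notebook's df (first two rows only).
df = pd.DataFrame({"Hour": [0, 1], "Shipman": [2.6, 1.0], "Comparison": [1.1, 3.0]})

# Default behaviour: a misspelled key is silently ignored.
unchanged = df.rename(columns={"Comparisson": "Comparison GP's"})

# With errors="raise", the same typo fails immediately with a KeyError.
try:
    df.rename(columns={"Comparisson": "Comparison GP's"}, errors="raise")
    typo_detected = False
except KeyError:
    typo_detected = True

# The correctly spelled rename works as before.
renamed_df = df.rename(columns={"Comparison": "Comparison GP's"}, errors="raise")
print(list(renamed_df.columns))
```

Whether the extra strictness is worth it is a judgment call; it mainly helps when column names come from a file that may change.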

In [6]:
tidy_df = renamed_df.melt('Hour', var_name='entity', value_name='percentage')
tidy_df.head()

|   | Hour | entity | percentage |
|---|------|--------|------------|
| 0 | 0 | Shipman | 2.6 |
| 1 | 1 | Shipman | 1.0 |
| 2 | 2 | Shipman | 2.6 |
| 3 | 3 | Shipman | 3.0 |
| 4 | 4 | Shipman | 0.3 |
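
As a sanity check on the reshape, the long table can be pivoted back to wide form. The sketch below assumes only the first five rows shown in the outputs above, typed in by hand so it runs without the CSV; `wide` and `round_trip` are illustrative names, not part of the notebook.

```python
import pandas as pd

# First five rows of the renamed wide-form data, copied from the outputs above.
wide = pd.DataFrame({
    "Hour": [0, 1, 2, 3, 4],
    "Shipman": [2.6, 1.0, 2.6, 3.0, 0.3],
    "Comparison GP's": [1.1, 3.0, 3.1, 3.8, 4.0],
})

# Wide -> long: one row per (Hour, entity) pair.
tidy = wide.melt("Hour", var_name="entity", value_name="percentage")

# Every wide row yields one long row per melted column.
assert len(tidy) == len(wide) * (wide.shape[1] - 1)

# Long -> wide again; pivot sorts the new columns, so restore the original order.
round_trip = (
    tidy.pivot(index="Hour", columns="entity", values="percentage")
        .reset_index()[list(wide.columns)]
)
round_trip.columns.name = None  # drop the leftover "entity" label on the column index

# The reshape loses nothing: the round trip reproduces the wide frame.
pd.testing.assert_frame_equal(round_trip, wide)
```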


In [8]:
variable_domain = ["Shipman", "Comparison GP's"]
variable_range = ['blue', 'red']
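
The domain is redefined here because the rename changed the category label. An explicit `domain` that no longer matches the values in `entity` would likely leave a series without its intended color. A small sketch of a check, using a hand-typed four-row stand-in for `tidy_df` (values taken from the outputs above; `missing_from_domain` and `unused_in_domain` are illustrative names):

```python
import pandas as pd

# Hypothetical four-row stand-in for tidy_df.
tidy_df = pd.DataFrame({
    "Hour": [0, 1, 0, 1],
    "entity": ["Shipman", "Shipman", "Comparison GP's", "Comparison GP's"],
    "percentage": [2.6, 1.0, 1.1, 3.0],
})

variable_domain = ["Shipman", "Comparison GP's"]
variable_range = ["blue", "red"]

# Each category in the domain needs exactly one color in the range.
assert len(variable_domain) == len(variable_range)

# Categories present in the data but absent from the domain, and vice versa.
missing_from_domain = set(tidy_df["entity"]) - set(variable_domain)
unused_in_domain = set(variable_domain) - set(tidy_df["entity"])

# The domain from the fold-based chart would fail this check after the rename.
stale_domain = ["Shipman", "Comparison"]
stale_missing = set(tidy_df["entity"]) - set(stale_domain)

print(missing_from_domain, unused_in_domain, stale_missing)
```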

In [9]:
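# tidy_df already has one row per (Hour, entity) pair, so no transform is needed;
# Altair infers the encoding types from the DataFrame's dtypes.
alt.Chart(tidy_df).mark_line().encode(
    alt.X("Hour",
          title="Hour of Day"),
    alt.Y("percentage",
          title="% of Deaths"),
    color=alt.Color("entity",
                    scale=alt.Scale(domain=variable_domain, range=variable_range),
                    title=None)
)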