# Creating an interactive bar graph with Altair

# <a id = 'indice'> <a/> Sections
1.  <a href = '#paquetes'> Necessary packages<a/> 
2.  <a href = '#intro'> Load the data<a/>
3.  <a href = '#obj'> Objective of this Jupyter Notebook<a/>
4.  <a href = '#ej2'> Process the data <a/>
5.  <a href = '#ej5'> Create a new variable for Sex<a/>
6.  <a href = '#ej3'> Aesthetic details <a/>
7.  <a href = '#ej8'> Interactive data visualization <a/>


<a id = 'paquetes'><a/>
## 1) Import the necessary packages

In [1]:
import pandas as pd # to deal with dataframes
import altair as alt # to draw the graphic
pd.set_option('mode.chained_assignment', None) # hides a warning which is not necessary for our case

<a id = 'intro'><a/>
## 2) Load the data and Introduction

I work with the following dataframe (you can find YourData.xls in the GitHub repository):

In [2]:
df = pd.read_excel(r'C:\Users\ramon\Desktop\Mates de Jorge\UGR, Análisis datos Covid-19\YourData.xls'
                   , sheet_name=0, usecols='A:Q', header=1, nrows=30)

# Note: replace C:\Users\ramon\Desktop\Mates de Jorge\UGR, Análisis datos Covid-19\YourData.xls with
# YourWorkingDirectory\YourData.xls
# and you will be able to run this Jupyter Notebook on your own computer.


Here is what the dataframe looks like. It needs some processing and rearranging before a graphic can be created.

In [3]:
df

Unnamed: 0.1,Unnamed: 0,Unnamed: 1,Estimation,lim inf IC,lim sup IC,Estimation.1,lim inf IC.1,lim sup IC.1
0,All ages,Excellent,0.152182,0.133934,0.17242,0.100654,0.086984,0.1162
1,,Very Good,0.243303,0.221006,0.26708,0.210088,0.190977,0.23057
2,,Good,0.478433,0.451695,0.50529,0.492307,0.468007,0.51664
3,,Regular,0.111609,0.095756,0.12971,0.168296,0.150763,0.18742
4,,Bad,0.014472,0.009095,0.02296,0.028656,0.021486,0.03812
5,,Total,1.0,,,1.0,,
6,16-29,Excellent,0.365254,0.302048,0.43347,0.209061,0.166024,0.25978
7,,Very Good,0.342935,0.280595,0.41121,0.343133,0.290265,0.4002
8,,Good,0.273023,0.216241,0.33828,0.375451,0.32104,0.4332
9,,Regular,0.018789,0.007006,0.0494,0.072355,0.046945,0.10993


This table shows the estimated proportion of people (Men and Women) in each Health category, from Bad to Excellent, together with the inferior and superior limits of each estimate at a 95% confidence level.

 <a id = 'obj'><a/>
## 3) Objective of this Jupyter Notebook

The preceding data is easy to work with, but how do we interpret it? This is usually done by creating graphics that support the data and make it easier for an external user to understand.

In this Jupyter Notebook, I show you how to make a simple data visualization to compare the Health of different groups of people according to the variables Age and Sex. This graphic is part of a bigger project of creating graphics for a website which I am currently working on.

 <a id = 'ej2'><a/>
## 4) Process the data:

Rename the columns

In [4]:
mylist = ['Age', 'Health', 'Men_Est', 'Men_lim_inf', 'Men_lim_sup', 'Women_Est', 'Women_lim_inf', 'Women_lim_sup']
df.columns = mylist

Fill the gaps in the variable 'Age'

In [5]:
# 'Age' is only written on the first row of each block of six rows;
# forward-fill copies it down without relying on chained assignment
df['Age'] = df['Age'].ffill()
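
As an illustration of this gap-filling step, here is a minimal sketch on a toy dataframe (the toy values are made up for the example). It uses pandas' `ffill`, which copies each value down into the empty rows below it, the same result as taking the value from the row above one row at a time.

```python
import pandas as pd

# Toy column mimicking the layout of the sheet: the age group is written
# only on the first row of each block, the other rows are empty.
toy = pd.DataFrame({
    'Age': ['All ages', None, None, '16-29', None, None],
    'Health': ['Excellent', 'Good', 'Bad', 'Excellent', 'Good', 'Bad'],
})

# Forward fill: each missing value takes the last non-missing value above it
toy['Age'] = toy['Age'].ffill()
print(toy)
```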

Remove rows with missing values (this drops the 'Total' rows, which have no limits)

In [6]:
df = df.dropna().reset_index(drop=True)  # drop incomplete rows and renumber the index from 0

Change '-' to 0

In [7]:
# Replace every cell that is exactly '-' with 0, without chained assignment
df = df.replace('-', 0)

<a id = 'ej5'><a/>
## 5) Create a new variable for Sex

Extract the inferior and superior limits

In [8]:
# Stack the Men limits first and the Women limits second, matching the
# row order that melt() produces in the next step
lim_inf = list(df['Men_lim_inf']) + list(df['Women_lim_inf'])
lim_sup = list(df['Men_lim_sup']) + list(df['Women_lim_sup'])

df = df.drop(['Men_lim_inf', 'Men_lim_sup', 'Women_lim_inf', 'Women_lim_sup'],
             axis=1)

To depict Sex in the graphic, I need it to be a variable. I achieve this with the `melt` function:

In [9]:
df = df.melt(id_vars=['Age', 'Health'], var_name='M', value_name='Est')

This means that every column except 'Age' and 'Health' is unpivoted:
the old column names ('Men_Est' and 'Women_Est') become the values of a new variable named 'M', and
the values they held go into the new variable 'Est'.
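
To make the reshaping concrete, here is a minimal sketch of `melt` on a toy dataframe with the same column names (the numbers are made up for the example).

```python
import pandas as pd

# Hypothetical miniature of the dataframe at this point (made-up values)
toy = pd.DataFrame({
    'Age': ['16-29', '16-29'],
    'Health': ['Excellent', 'Bad'],
    'Men_Est': [0.4, 0.1],
    'Women_Est': [0.3, 0.2],
})

# Wide to long: one row per (Age, Health, original column name)
long_df = toy.melt(id_vars=['Age', 'Health'], var_name='M', value_name='Est')
print(long_df)
```

In the result, all the 'Men_Est' rows come first and the 'Women_Est' rows follow. This is why the lists `lim_inf` and `lim_sup`, built as Men first and Women second, line up with the melted rows.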


In [10]:
df['lim_inf'] = lim_inf
df['lim_sup'] = lim_sup

In [11]:
# Turn the old column names into readable labels
df['Sex'] = df['M'].replace({'Men_Est': 'Men', 'Women_Est': 'Women'})
df = df.drop('M', axis=1)

<a id = 'ej3'><a/>
## 6) Aesthetic details


Move 'Est', 'lim_inf' and 'lim_sup' to a 0-100 scale

In [12]:
# Convert proportions to percentages, rounded to one decimal place
cols = ['Est', 'lim_inf', 'lim_sup']
df[cols] = (df[cols].astype(float) * 100).round(1)


Delete 'All ages' rows (I do not want them for the graphic)


In [13]:
df = df[df.Age!= 'All ages']
df.index = range(0,len(df)) # Always remember to reindex!


In [14]:
age = list(df['Age'])

df = df.drop('Age', axis=1)  # drop() returns a new dataframe, so the result must be assigned

df['Age'] = age  # re-adding the column places 'Age' last

Finally, create a new variable which indicates that
all these values belong to Survey 1

In [15]:
df['M'] = 'Survey 1'

<a id = 'ej8'><a/>
## 7) Interactive data visualization

To control the stacking order in the bar graph, I encode the order of the
categories in a numeric variable 'order'

In [16]:
cat_orden = ['Excellent', 'Very Good', 'Good', 'Regular', 'Bad']

# map() looks up each Health value in the dictionary (0 for 'Excellent' down to -4 for 'Bad')
df['order'] = df['Health'].map({val: -i for i, val in enumerate(cat_orden)})
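
As a quick check of what this mapping contains, the sketch below builds the same dictionary on its own and sorts the categories by it. The names `order_map` and `ascending` are only used for this illustration.

```python
cat_orden = ['Excellent', 'Very Good', 'Good', 'Regular', 'Bad']

# Each category gets minus its position in the list
order_map = {val: -i for i, val in enumerate(cat_orden)}
print(order_map)
# {'Excellent': 0, 'Very Good': -1, 'Good': -2, 'Regular': -3, 'Bad': -4}

# Sorting by 'order' in ascending order puts 'Bad' first and 'Excellent' last
ascending = sorted(cat_orden, key=order_map.get)
print(ascending)
# ['Bad', 'Regular', 'Good', 'Very Good', 'Excellent']
```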


Now we are ready to create the graphic. First I create a selector to add interactivity.

In [17]:
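# Clicking a legend entry selects that Health category.
# Note: Altair 5 deprecates selection_multi in favor of alt.selection_point
# (and add_selection in favor of add_params).
selection = alt.selection_multi(fields=['Health'], bind='legend')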

Finally, I create the alt.Chart() which generates the graphic

In [18]:
bar_chart = alt.Chart(df).mark_bar().encode(
    x=alt.X('M:N', title=None, axis=alt.Axis(labelAngle=0)),
    y=alt.Y('Est:Q', title='Percentage Estimation'),
    color=alt.Color('Health:N', scale=alt.Scale(scheme='spectral', reverse=True),
                    # optional: make color order in legend match stack order
                    sort=alt.EncodingSortField('order', order='descending'),
                    legend=alt.Legend(title="", symbolSize=400, symbolType='square', labelLimit=0)),
    order='order',  # this controls stack order
    tooltip=[  # this adds the hovering effect
        alt.Tooltip('Est', title='Estimation'),
        alt.Tooltip('lim_inf', title='Inferior Limit'),
        alt.Tooltip('lim_sup', title='Superior Limit')
    ],
    opacity=alt.condition(selection, alt.value(1), alt.value(0.2))
).add_selection(
    selection
).transform_filter(
    selection
).properties(
    width=180,
    height=180
).facet(
    title='Health in the 1st survey with respect to Sex and Age group',
    row=alt.Row('Sex:N', title=None, header=alt.Header(labelFontSize=15)),
    column=alt.Column('Age:N', title=None, header=alt.Header(labelFontSize=15))
)

# Alternative to the 'spectral' scheme, a custom color range:
# scale = alt.Scale(range=['darkgreen', 'palegreen', 'khaki', 'lightcoral', 'firebrick'])

leyenda = alt.Chart(df).mark_text().encode(
    color=alt.Color('Health:N', scale=alt.Scale(scheme='spectral', reverse=True),
                    # optional: make color order in legend match stack order
                    sort=alt.EncodingSortField('order', order='descending'),
                    legend=alt.Legend(title="",
                                      symbolSize=400, symbolType='square', labelLimit=0, labelAlign='left',
                                      offset=-20)),
    opacity=alt.condition(selection, alt.value(1), alt.value(0.2))
).add_selection(selection)

example_for_application = alt.hconcat(bar_chart, leyenda).configure_view(strokeWidth=0).configure_title(
    color='black',
    dy=-20,
    dx=70)

example_for_application



In this graphic, you can compare people's Health across Age groups and Sexes.
Hover over a bar to see the estimate and its limits. If you click on an entry in
the legend on the right-hand side, the graphic is filtered to that kind of Health, so you can
examine it in more detail. This is particularly useful for comparing age groups: see, for example, what happens when you click on 'Excellent' and on 'Bad'.

In our case we have only one survey, but the same can be done with four surveys on the
x axis, giving a total of 32 clearly distinguished bars in a single graphic, which is
quite useful when exploring the data.
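
As a rough sketch of how several surveys could be combined, the code below labels each processed dataframe in the column 'M' and stacks them with `pd.concat`. The dataframes `survey_1` and `survey_2` and their values are hypothetical placeholders, not data from this notebook.

```python
import pandas as pd

# Hypothetical per-survey dataframes, each already processed as in the
# previous sections (made-up values, one row each to keep the example short)
survey_1 = pd.DataFrame({'Age': ['16-29'], 'Sex': ['Men'],
                         'Health': ['Good'], 'Est': [27.3]})
survey_2 = pd.DataFrame({'Age': ['16-29'], 'Sex': ['Men'],
                         'Health': ['Good'], 'Est': [30.0]})

frames = []
for label, frame in [('Survey 1', survey_1), ('Survey 2', survey_2)]:
    frame = frame.copy()   # leave the original dataframe untouched
    frame['M'] = label     # 'M' is the variable shown on the x axis
    frames.append(frame)

# One long dataframe with a row per survey, group and Health category
all_surveys = pd.concat(frames, ignore_index=True)
print(all_surveys)
```

Since the x encoding of the chart is 'M:N', passing a dataframe like `all_surveys` to `alt.Chart()` instead of `df` would draw one bar per survey in each panel.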

Hope you enjoyed it!