# Storytelling with Data! in Altair

by Maisa de Oliveira Fraiz

## Introduction

This project aims to replicate the examples from Cole Nussbaumer Knaflic's book, "Storytelling with Data - Let's Practice!", using Python's `Altair` library. Our primary objective is to document the reasoning behind the modifications the author proposes, while also highlighting the challenges that arise when moving from the book's Excel-based approach to a programming environment.

`Altair` was selected for this project for its declarative syntax, interactivity, grammar-of-graphics foundation, and compatibility with `Streamlit` and other web formatting tools, all within the user-friendly Python environment. Anticipated challenges include Altair's smaller body of documentation and smaller developer community relative to more established libraries such as `Matplotlib`, `Seaborn`, or `Plotly`. Furthermore, tasks that appear straightforward in Excel may take several iterations to translate effectively into code.


## Imports

In [2]:
import pandas as pd
import numpy as np
import altair as alt

## Chapter 2 - Choose an effective visual

*"When I have some data I need to show, how do I do that in an effective way?"*

The exercises in this chapter encourage evaluating different graphs of the same data in order to understand the strengths and limitations of each, which helps in finding the best medium for the information you want to highlight.


### Exercise 2.4 - practice in your tool

The data for this exercise can be found here: https://www.storytellingwithdata.com/letspractice/downloads

In [3]:
# Load only the data cells: `usecols` and `header` skip the empty columns and rows
# left by the Excel formatting, which would otherwise load as NaN
table = pd.read_excel(r"..\..\Data\2.4 EXERCISE.xlsx", usecols = [1, 2, 3], header = 4)
table


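|   | DATE    | CAPACITY | DEMAND |
|---|---------|----------|--------|
| 0 | 2019-04 | 29263    | 46193  |
| 1 | 2019-05 | 28037    | 49131  |
| 2 | 2019-06 | 21596    | 50124  |
| 3 | 2019-07 | 25895    | 48850  |
| 4 | 2019-08 | 25813    | 47602  |
| 5 | 2019-09 | 22427    | 43697  |
| 6 | 2019-10 | 23605    | 41058  |
| 7 | 2019-11 | 24263    | 37364  |
| 8 | 2019-12 | 24243    | 34364  |
| 9 | 2020-01 | 25533    | 34149  |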
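
For readers without the workbook at hand, the table above can be rebuilt directly from the printed values. This is only a sketch: the column names and numbers are copied from the output above, and storing `DATE` as `YYYY-MM` strings is an assumption (the Excel file may hold real dates).

```python
import pandas as pd

# Values copied from the printed output; DATE kept as "YYYY-MM" strings (assumption)
table = pd.DataFrame({
    "DATE": ["2019-04", "2019-05", "2019-06", "2019-07", "2019-08",
             "2019-09", "2019-10", "2019-11", "2019-12", "2020-01"],
    "CAPACITY": [29263, 28037, 21596, 25895, 25813,
                 22427, 23605, 24263, 24243, 25533],
    "DEMAND": [46193, 49131, 50124, 48850, 47602,
               43697, 41058, 37364, 34364, 34149],
})

print(table.shape)
```

Because `pd.to_datetime` can parse `YYYY-MM` strings, a frame built this way should work with the later date-handling steps as well.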


In [4]:
# Unmet demand is the part of demand that exceeds capacity
table['UNMET DEMAND'] = table['DEMAND'] - table['CAPACITY']
table

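|   | DATE    | CAPACITY | DEMAND | UNMET DEMAND |
|---|---------|----------|--------|--------------|
| 0 | 2019-04 | 29263    | 46193  | 16930        |
| 1 | 2019-05 | 28037    | 49131  | 21094        |
| 2 | 2019-06 | 21596    | 50124  | 28528        |
| 3 | 2019-07 | 25895    | 48850  | 22955        |
| 4 | 2019-08 | 25813    | 47602  | 21789        |
| 5 | 2019-09 | 22427    | 43697  | 21270        |
| 6 | 2019-10 | 23605    | 41058  | 17453        |
| 7 | 2019-11 | 24263    | 37364  | 13101        |
| 8 | 2019-12 | 24243    | 34364  | 10121        |
| 9 | 2020-01 | 25533    | 34149  | 8616         |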


In [5]:
# Transform the data into long format: one row per (date, metric) pair
melted_table = pd.melt(table, id_vars = ['DATE'], var_name = 'Metric', value_name = 'Value')
melted_table

|   | DATE    | Metric   | Value |
|---|---------|----------|-------|
| 0 | 2019-04 | CAPACITY | 29263 |
| 1 | 2019-05 | CAPACITY | 28037 |
| 2 | 2019-06 | CAPACITY | 21596 |
| 3 | 2019-07 | CAPACITY | 25895 |
| 4 | 2019-08 | CAPACITY | 25813 |
| 5 | 2019-09 | CAPACITY | 22427 |
| 6 | 2019-10 | CAPACITY | 23605 |
| 7 | 2019-11 | CAPACITY | 24263 |
| 8 | 2019-12 | CAPACITY | 24243 |
| 9 | 2020-01 | CAPACITY | 25533 |
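
Altair works best with long-format ("tidy") data, where each row holds a single observation. As a minimal sketch of what `pd.melt` does, the two-row frame below (values taken from the first rows of the table; the names `wide` and `long` are illustrative) is reshaped so that each metric value gets its own row.

```python
import pandas as pd

# Wide format: one column per metric
wide = pd.DataFrame({
    "DATE": ["2019-04", "2019-05"],
    "CAPACITY": [29263, 28037],
    "DEMAND": [46193, 49131],
})

# Long format: the metric name moves into "Metric", its number into "Value"
long = pd.melt(wide, id_vars=["DATE"], var_name="Metric", value_name="Value")
print(long)
```

The two input rows become four: `pd.melt` emits all `CAPACITY` rows first and then all `DEMAND` rows, which is why a truncated preview of the full melted table may show only one metric.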


In [6]:
# Parse the dates, then extract the year and the abbreviated month name
melted_table['DATE'] = pd.to_datetime(melted_table['DATE'])

melted_table['year'] = melted_table['DATE'].dt.year

melted_table['month'] = melted_table['DATE'].dt.strftime('%b')

In [7]:
melted_table.drop('DATE', axis = 1, inplace = True)

In [8]:
melted_table

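|   | Metric   | Value | year | month |
|---|----------|-------|------|-------|
| 0 | CAPACITY | 29263 | 2019 | Apr   |
| 1 | CAPACITY | 28037 | 2019 | May   |
| 2 | CAPACITY | 21596 | 2019 | Jun   |
| 3 | CAPACITY | 25895 | 2019 | Jul   |
| 4 | CAPACITY | 25813 | 2019 | Aug   |
| 5 | CAPACITY | 22427 | 2019 | Sep   |
| 6 | CAPACITY | 23605 | 2019 | Oct   |
| 7 | CAPACITY | 24263 | 2019 | Nov   |
| 8 | CAPACITY | 24243 | 2019 | Dec   |
| 9 | CAPACITY | 25533 | 2020 | Jan   |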
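
The `year` and `month` columns come from pandas' `.dt` accessor. The sketch below uses illustrative names and assumes `YYYY-MM` strings as input. Note that `%b` produces locale-dependent abbreviations, so English labels such as `Apr` assume an English or default C locale.

```python
import pandas as pd

# Three sample dates spanning the year boundary present in the data
dates = pd.to_datetime(pd.Series(["2019-04", "2019-12", "2020-01"]))

years = dates.dt.year             # integer year
months = dates.dt.strftime("%b")  # abbreviated month name (locale-dependent)

print(years.tolist())
print(months.tolist())
```

Keeping the year in its own column matters here because the data includes January 2020, which has to be excluded before plotting the 2019 months.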


In [9]:
# Keep only the 2019 rows, then split them by metric
table_2019 = melted_table[melted_table['year'] == 2019]

demand_2019 = table_2019[table_2019['Metric'] == 'DEMAND']
capacity_2019 = table_2019[table_2019['Metric'] == 'CAPACITY']


In [25]:
# Note: the `filled` mark property does not accept a condition.
# This grouped bar chart is also not a very effective view of the data.

bar_table = table_2019[table_2019['Metric'].isin(['CAPACITY', 'DEMAND'])]

alt.Chart(bar_table, 
          title = alt.Title('Demand vs Capacity Over Time', anchor = 'start', offset = 20, fontSize = 16), 
          ).mark_bar().encode(
    y = alt.Y('Value', 
              axis = alt.Axis(grid = False, titleY = 75, labelColor = "#888888", titleColor = '#888888'), 
              scale = alt.Scale(domain = [0, 60000]), 
              title = "NUMBER OF PROJECT HOURS"),
    x = alt.X('month', 
              sort = None, 
              axis = alt.Axis(labelAngle = 0, titleX = 30, labelColor = '#888888', titleColor = '#888888', ticks = False), 
              title = "2019"),
    color = alt.Color('Metric', 
                      scale = alt.Scale(range = ['#b4c6e4', '#4871b7']),
                      sort = 'descending'),
    xOffset = alt.XOffset('Metric', sort = 'descending')
    ).configure_view(stroke = None)



![Alt text](\Images\2_4a.png)

In [40]:
# Line chart of both metrics; the CAPACITY line is drawn thicker for emphasis
alt.Chart(bar_table).mark_line().encode(
    y = alt.Y('Value', 
              axis = alt.Axis(grid = False, titleY = 75, labelColor = "#888888", titleColor = '#888888'), 
              scale = alt.Scale(domain = [0, 60000]), 
              title = "NUMBER OF PROJECT HOURS"),
    x = alt.X('month', 
              sort = None, 
              axis = alt.Axis(labelAngle = 0, titleX = 20, labelColor = "#888888", titleColor = '#888888', ticks = False), 
              title = "2019"),
    color = 'Metric',
    strokeWidth = alt.condition(
        "datum.Metric == 'CAPACITY'",
        alt.value(3),
        alt.value(1))
).properties(
    width=350,
    height=250
).configure_view(stroke = None)

![Alt text](\Images\2_4b.png)

In [51]:
# Demand drawn as outlined (unfilled) bars
demand = alt.Chart(demand_2019, width = alt.Step(50)).mark_bar(
    filled = False    
    ).encode(
    y = alt.Y('Value', 
              axis = alt.Axis(grid = False, titleY = 75, labelColor = "#888888", titleColor = '#888888'), 
              scale = alt.Scale(domain = [0, 60000]), 
              title = "NUMBER OF PROJECT HOURS"),
    x = alt.X('month', 
              sort = None, 
              axis = alt.Axis(labelAngle = 0, titleX = 20, labelColor = "#888888", titleColor = '#888888', ticks = False), 
              title = "2019")
    )

# Capacity drawn as narrower, semi-transparent filled bars
capacity = alt.Chart(capacity_2019).mark_bar(
    size = 40
    ).encode(
    y = alt.Y('Value', 
              axis = alt.Axis(grid = False, titleY = 75, labelColor = "#888888", titleColor = '#888888'), 
              scale = alt.Scale(domain = [0, 60000]), 
              title = "NUMBER OF PROJECT HOURS"),
    x = alt.X('month', 
              sort = None, 
              axis = alt.Axis(labelAngle = 0, titleX = 25, labelColor = "#888888", titleColor = '#888888', ticks = False), 
              title = "2019"),
    opacity = alt.value(0.5)
    )

# Layer the two charts; `+` is shorthand for alt.layer(capacity, demand)
final = capacity + demand
final.configure_scale(
    bandPaddingInner = 0.5
).configure_view(stroke = None)

![Alt text](\Images\2_4c.png)

In [61]:
# Stacked bars: capacity and unmet demand together add up to total demand
stacked_table = table_2019[table_2019['Metric'].isin(['CAPACITY', 'UNMET DEMAND'])]

alt.Chart(stacked_table).mark_bar(size = 25).encode(
    y = alt.Y('Value', 
              axis = alt.Axis(grid = False, titleY = 75, labelColor = "#888888", titleColor = '#888888'), 
              scale = alt.Scale(domain = [0, 60000]), 
              title = "NUMBER OF PROJECT HOURS"),
    x = alt.X('month', 
              sort = None, 
              axis = alt.Axis(labelAngle = 0, titleX = 25, labelColor = "#888888", titleColor = '#888888', ticks = False), 
              title = "2019"),
    color = alt.Color('Metric'),
    order=alt.Order(
      'Metric',
      sort='ascending')
    ).configure_view(stroke = None).properties(
    width = 300,
    height = 200)

![Alt text](\Images\2_4d.png)

![Alt text](\Images\2_4e.png)
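
The stacked bar chart works because unmet demand was defined as demand minus capacity, so stacking capacity and unmet demand reproduces total demand. A quick check on the first three months (values copied from the table; variable names are illustrative):

```python
import pandas as pd

capacity = pd.Series([29263, 28037, 21596])
demand = pd.Series([46193, 49131, 50124])

# Same definition as the UNMET DEMAND column
unmet = demand - capacity

# Height of each stacked bar
stacked_total = capacity + unmet

print(unmet.tolist())
print(stacked_total.equals(demand))
```

This identity only holds while demand is at least as large as capacity, as it is in every month of this dataset. A month with spare capacity would give a negative "unmet demand" segment.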

In [46]:
# Line chart of unmet demand alone
line_diff_table = table_2019[table_2019['Metric'].isin(['UNMET DEMAND'])]

alt.Chart(line_diff_table).mark_line().encode(
    y = alt.Y('Value', 
              axis = alt.Axis(grid = False, titleY = 75, labelColor = "#888888", titleColor = '#888888'), 
              title = "NUMBER OF PROJECT HOURS"),
    x = alt.X('month', 
              sort = None, 
              axis = alt.Axis(labelAngle = 0, titleX = 20, labelColor = "#888888", titleColor = '#888888', ticks = False), 
              title = "2019")
              ).properties(
    width = 400,
    height = 250
).configure_view(stroke = None)

![Alt text](\Images\2_4f.png)