# Storytelling with Data! in Altair

by Maisa de Oliveira Fraiz

## Introduction

This project replicates selected examples from Cole Nussbaumer Knaflic's book, "Storytelling with Data: Let's Practice!", using the `Python` library `Altair`. The primary objective is to document the reasoning behind the modifications the author proposes, while also highlighting the challenges that arise when moving from the book's Excel-based approach to a programmatic one.

`Altair` was selected for this project because of its declarative syntax, its built-in interactivity, its foundation in the grammar of graphics, and its compatibility with `Streamlit` and other web tools, all within the user-friendly Python environment. Anticipated challenges include `Altair`'s smaller documentation base and developer community relative to more established libraries such as `Matplotlib`, `Seaborn`, or `Plotly`, and the difficulty of translating tasks that appear straightforward in Excel.

In addition to replicating the graphs from the book, the project extends them with interactive versions that take fuller advantage of Altair's capabilities.

## Imports

In [1]:
import pandas as pd
import numpy as np
import altair as alt

## Chapter 2 - Choose an effective visual

*"When I have some data I need to show, how do I do that in an effective way?"* - Cole Nussbaumer

### Exercise 2.5 - how would you show this data?

The data for this exercise can be found here: https://www.storytellingwithdata.com/letspractice/downloads

In [2]:
# header = 5 and usecols = [1, 2] skip the empty rows and column of the Excel layout,
# which would otherwise be loaded as NaN
table = pd.read_excel(r"..\..\Data\2.5 EXERCISE.xlsx", usecols = [1, 2], header = 5)
table

| | Year | Attrition Rate |
|---|---|---|
| 0 | 2019 | 0.091 |
| 1 | 2018 | 0.082 |
| 2 | 2017 | 0.045 |
| 3 | 2016 | 0.123 |
| 4 | 2015 | 0.056 |
| 5 | 2014 | 0.151 |
| 6 | 2013 | 0.07 |
| 7 | 2012 | 0.01 |
| 8 | 2011 | 0.02 |
| 9 | 2010 | 0.097 |


In [3]:
table.drop(10, inplace = True)
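A minimal sketch for readers without the spreadsheet: assuming the ten rows printed above are the complete cleaned dataset, the same `table` can be rebuilt inline, in which case the `read_excel` and `drop` steps are not needed.

```python
import pandas as pd

# Values transcribed from the table printed above (assumed to be the full dataset)
table = pd.DataFrame({
    "Year": [2019, 2018, 2017, 2016, 2015, 2014, 2013, 2012, 2011, 2010],
    "Attrition Rate": [0.091, 0.082, 0.045, 0.123, 0.056,
                       0.151, 0.07, 0.01, 0.02, 0.097],
})
```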

In [4]:
# First attempt: no type is given, so Altair infers Year as quantitative
alt.Chart(table).mark_point(filled = True).encode(
    x = alt.X('Year'),
    y = alt.Y('Attrition Rate')
    )

In [5]:
# Second attempt: temporal type; numeric years are read as timestamps, not calendar years
alt.Chart(table).mark_point(filled = True).encode(
    x = alt.X('Year:T'),
    y = alt.Y('Attrition Rate')
    )

In [6]:
# Third attempt: ordinal type gives one discrete position per year
alt.Chart(table).mark_point(filled = True).encode(
    x = alt.X('Year:O'),
    y = alt.Y('Attrition Rate')
    )

In [7]:
base = alt.Chart(table, title = alt.Title(
                 "Attrition rate over time",
                 fontSize = 18,
                 fontWeight = 'normal',
                 anchor = 'start',
                 offset = 10))

dots = base.mark_point(filled = True, size = 50, color = '#2c549d').encode(
    x = alt.X('Year:O',
              axis = alt.Axis(labelAngle = 0, labelColor = '#888888', ticks = False),
              title = None,
              scale = alt.Scale(align = 0)
              ), 
    y = alt.Y('Attrition Rate',
              axis = alt.Axis(grid = False, titleAnchor = 'end', 
                              labelColor = "#888888", titleColor = '#888888', 
                              tickCount = 9, format = "%", titleFontWeight = 'normal'), 
              title = "ATTRITION RATE"
              ),
    opacity = alt.value(1)
    ) 

rule = base.mark_rule(color = "#2c549d", strokeDash = [3,3]).encode(
    x = alt.value(0),
    x2 = alt.value(315),
    y = 'mean(Attrition Rate)'
)

label = alt.Chart({"values": 
                    [{"text":  ['AVERAGE: 7.5%']}]
                    }
                    ).mark_text(size = 10, 
                                align = "left", 
                                dx = -170, dy = 0, 
                                color = '#2c549d',
                                fontWeight = 'bold'
                                ).encode(text = "text:N")

final = dots + rule + label
final.properties(
    width = 350,
    height = 200
).configure_view(stroke = None)

![Alt text](\Images\2_5b.png)

In [8]:
line = base.mark_line(color = '#2c549d').encode(
    x = alt.X('Year:O',
              axis = alt.Axis(labelAngle = 0, labelColor = '#888888', ticks = False),
              title = None,
              scale = alt.Scale(align = 0)
              ), 
    y = alt.Y('Attrition Rate',
              axis = alt.Axis(grid = False, titleAnchor = 'end', 
                              labelColor = "#888888", titleColor = '#888888', 
                              tickCount = 9, format = "%", titleFontWeight = 'normal'), 
              title = "ATTRITION RATE"
              )
    )

# Attempt at a label on the most recent point: x is the latest year, y is the rate at that year
label = base.mark_text(align = 'left', dx = 3).encode(
    x= alt.X('Year', aggregate = 'max'),
    y = alt.Y('Attrition Rate', aggregate = {'argmax': 'Year'}),
    text = alt.Text('Attrition Rate')
)

final = line + rule + label

final.properties(
    width = 350,
    height = 200
).configure_view(stroke = None)

In [9]:
line = base.mark_line(color = '#2c549d').encode(
    x = alt.X('Year:O',
              axis = alt.Axis(labelAngle = 0, labelColor = '#888888', ticks = False),
              title = None,
              scale = alt.Scale(align = 0)
              ), 
    y = alt.Y('Attrition Rate',
              axis = alt.Axis(grid = False, titleAnchor = 'end', 
                              labelColor = "#888888", titleColor = '#888888', 
                              tickCount = 9, format = "%", titleFontWeight = 'normal'), 
              title = "ATTRITION RATE"
              )
    )

# Same label with Year typed as ordinal, matching the line's x encoding, and colored like the line
label = base.mark_text(align = 'left', dx = 3, color = '#2c549d').encode(
    x = alt.X('Year:O', aggregate = 'max'),
    y = alt.Y('Attrition Rate', aggregate = {'argmax': 'Year'}),
    text = alt.Text('Attrition Rate')
)

final = line + rule + label

final.properties(
    width = 350,
    height = 200
).configure_view(stroke = None)

In [14]:
line = base.mark_line(color = '#2c549d').encode(
    x = alt.X('Year:O',
              axis = alt.Axis(labelAngle = 0, labelColor = '#888888', ticks = False),
              title = None,
              scale = alt.Scale(align = 0)
              ), 
    y = alt.Y('Attrition Rate',
              axis = alt.Axis(grid = False, titleAnchor = 'end', 
                              labelColor = "#888888", titleColor = '#888888', 
                              tickCount = 9, format = "%", titleFontWeight = 'normal'), 
              title = "ATTRITION RATE"
              )
    )

label = base.mark_text(align = 'left', dx = 3, color = '#2c549d', fontWeight = 'bold').encode(
    x = alt.X('Year:O'),
    y = alt.Y('Attrition Rate'),
    text = alt.Text('Attrition Rate', format = ".1%"),
    xOffset = alt.value(-10),
    yOffset = alt.value(-10)
).transform_filter(
    alt.FieldEqualPredicate(field='Year', equal=2019)
    )

label2 = alt.Chart({"values": 
                    [{"text":  ['AVG: 7.5%']}]
                    }
                    ).mark_text(size = 10, 
                                align = "left", 
                                dx = 96, dy = 15, 
                                color = '#2c549d',
                                fontWeight = 'bold'
                                ).encode(text = "text:N")

point = base.mark_point(filled = True).encode(
    x = alt.X('Year:O'),
    y = alt.Y('Attrition Rate', aggregate = {'argmax': 'Year'})
).transform_filter(
    alt.FieldEqualPredicate(field='Year', equal=2019)
    )

final = line + rule + label + label2 + point

final.properties(
    width = 350,
    height = 200
).configure_view(stroke = None)

![Alt text](\Images\2_5c.png)


In [None]:
avg = table['Attrition Rate'].mean()

rect = alt.Chart(pd.DataFrame({'y': [0], 'y2':[avg]})).mark_rect(
    opacity = 0.2
).encode(y='y', y2='y2', x = alt.value(0), x2 = alt.value(315))

label2 = alt.Chart({"values": 
                    [{"text":  ['AVG:', '7.5%']}]
                    }
                    ).mark_text(size = 10, 
                                align = "left", 
                                dx = 113, dy = 15, 
                                color = '#9fb5db',
                                fontWeight = 'bold'
                                ).encode(text = "text:N")

final = line + rect + label + label2 + point

final.properties(
    width = 350,
    height = 200
).configure_view(stroke = None)

![Alt text](\Images\2_5d.png)

In [None]:
area = base.mark_area().encode(
    x = alt.X('Year:O',
              axis = alt.Axis(labelAngle = 0, labelColor = '#888888', ticks = False),
              title = None,
              scale = alt.Scale(align = 0)
              ), 
    y = alt.Y('Attrition Rate',
              axis = alt.Axis(grid = False, titleAnchor = 'end', 
                              labelColor = "#888888", titleColor = '#888888', 
                              tickCount = 9, format = "%", titleFontWeight = 'normal'), 
              title = "ATTRITION RATE"
              )
    )

rule_light = base.mark_rule(color = "#9fb5db", strokeDash = [3,3]).encode(
    x = alt.value(0),
    x2 = alt.value(315),
    y = 'mean(Attrition Rate)'
)

final = area + rule_light + label2
final.properties(
    width = 350,
    height = 200
).configure_view(stroke = None)

![Alt text](\Images\2_5e.png)

In [None]:
bar = base.mark_bar(size = 25).encode(
    x = alt.X('Year:O',
              axis = alt.Axis(labelAngle = 0, labelColor = '#888888', ticks = False, domain = False),
              title = None,
              scale = alt.Scale(align = 0)
              ), 
    y = alt.Y('Attrition Rate',
              axis = alt.Axis(grid = False, titleAnchor = 'end', 
                              labelColor = "#888888", titleColor = '#888888', 
                              tickCount = 9, format = "%", titleFontWeight = 'normal'), 
              title = "ATTRITION RATE"
              )
    )
    
label = alt.Chart({"values": 
                    [{"text":  ['AVG: 7.5%']}]
                    }
                    ).mark_text(size = 10, 
                                align = "left", 
                                dx = -130, dy = 0, 
                                color = '#2c549d',
                                fontWeight = 'bold'
                                ).encode(text = "text:N")

final = bar + rule + label
final.properties(
    width = 320,
    height = 200
).configure_view(stroke = None)

![Alt text](\Images\2_5f.png)