# Storytelling with Data! in Altair

by Maisa de Oliveira Fraiz

## Introduction

This project aims to replicate the examples from Cole Nussbaumer Knaflic's book, "Storytelling with Data: Let's Practice!", using the Python library `Altair`. Our primary objective is to document the reasoning behind the modifications the author proposes, while also highlighting the challenges that arise when moving from the book's Excel-based approach to code.

`Altair` was chosen for this project for its declarative syntax, built-in interactivity, grammar-of-graphics design, and compatibility with `Streamlit` and other web tools, all within the user-friendly Python environment. Anticipated challenges include Altair's smaller body of documentation and development community compared with more established libraries such as `Matplotlib`, `Seaborn`, or `Plotly`. Furthermore, tasks that seem straightforward in Excel may take several iterations to translate into code.


## Imports

In [2]:
```python
import pandas as pd
import numpy as np
import altair as alt
```

## Chapter 2 - Choose an effective visual

*"When I have some data I need to show, how do I do that in an effective way?"*

The exercises in this chapter encourage evaluating several graphs of the same data to understand the strengths and limitations of each, which helps in choosing the best way to present the information you want to highlight.

### Exercise 2.5 - how would you show this data?

The data for this exercise can be found here: https://www.storytellingwithdata.com/letspractice/downloads

In [3]:
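```python
# The Excel formatting leaves empty cells (read as NaN) around the table, so read
# only columns B and C (usecols = [1, 2]) and use the sheet's sixth row (header = 5) as the header.
# Forward slashes work on Windows as well as on macOS/Linux.
table = pd.read_excel("../../Data/2.5 EXERCISE.xlsx", usecols = [1, 2], header = 5)
table
```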

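|   | Year | Attrition Rate |
|--:|-----:|---------------:|
| 0 | 2019 | 0.091 |
| 1 | 2018 | 0.082 |
| 2 | 2017 | 0.045 |
| 3 | 2016 | 0.123 |
| 4 | 2015 | 0.056 |
| 5 | 2014 | 0.151 |
| 6 | 2013 | 0.070 |
| 7 | 2012 | 0.010 |
| 8 | 2011 | 0.020 |
| 9 | 2010 | 0.097 |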
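The spreadsheet is not bundled with this notebook. For readers who want to reproduce the charts without it, the ten rows printed above can be typed in directly. This is a minimal sketch that rebuilds only those rows (not the full sheet), so the `drop` step that follows is not needed for this in-memory version.

```python
import pandas as pd

# In-memory copy of the ten yearly rows shown in the output above
table = pd.DataFrame({
    'Year': [2019, 2018, 2017, 2016, 2015, 2014, 2013, 2012, 2011, 2010],
    'Attrition Rate': [0.091, 0.082, 0.045, 0.123, 0.056,
                       0.151, 0.070, 0.010, 0.020, 0.097],
})

print(table.shape)  # (10, 2)
```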


In [4]:
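```python
# Remove row 10, which sits below the ten yearly rows (2010-2019) and is not part of the series
table.drop(10, inplace = True)
```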
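`table.drop(10, ...)` hard-codes the position of the unwanted row, so it raises a `KeyError` if that row ever disappears from the sheet. A position-independent alternative is sketched below, under the assumption that any extra row picked up from the spreadsheet has a missing or non-numeric `Year` cell; `keep_yearly_rows` is a hypothetical helper name, not part of pandas.

```python
import numpy as np
import pandas as pd

def keep_yearly_rows(df: pd.DataFrame) -> pd.DataFrame:
    """Keep only rows whose 'Year' value is numeric.

    Assumption: extra rows from the spreadsheet layout have a missing
    or non-numeric 'Year' cell.
    """
    years = pd.to_numeric(df['Year'], errors='coerce')  # non-numbers become NaN
    mask = years.notna()
    out = df.loc[mask].copy()
    out['Year'] = years[mask].astype(int)  # restore integer years
    return out

# Example with a hypothetical trailing row that has no year
raw = pd.DataFrame({
    'Year': [2011, 2010, np.nan],
    'Attrition Rate': [0.020, 0.097, np.nan],
})
clean = keep_yearly_rows(raw)
print(clean)
```

Filtering on the content of the `Year` column, rather than on a row label, keeps the cleaning step valid if rows are added to or removed from the sheet.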

In [5]:
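```python
# First attempt: Year is an integer column, so Altair encodes it as quantitative (a continuous axis)
alt.Chart(table).mark_point(filled = True).encode(
    x = alt.X('Year'),
    y = alt.Y('Attrition Rate')
    )
```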

In [6]:
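```python
# Encoding Year as ordinal (:O) gives each year its own discrete position on the x-axis
alt.Chart(table).mark_point(filled = True).encode(
    x = alt.X('Year:O'),
    y = alt.Y('Attrition Rate')
    )
```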

In [27]:
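```python
# Shared base chart so every layer uses the same data
base = alt.Chart(table)

# Dot plot with grey axis labels and titles, no gridlines and no x-axis title
dots = base.mark_point(filled = True).encode(
    x = alt.X('Year:O',
              axis = alt.Axis(labelAngle = 0, labelColor = '#888888', titleColor = '#888888'),
              title = None), 
    y = alt.Y('Attrition Rate',
              axis = alt.Axis(grid = False, titleY = 40, labelColor = "#888888", titleColor = '#888888'), 
              title = "ATTRITION RATE"
              )
    )

# Dashed horizontal line at the average attrition rate (the mean is computed by Vega-Lite)
rule = base.mark_rule(color = 'blue', strokeDash = [4,4]).encode(
    y= 'mean(Attrition Rate)'
)

# Layer dots and average line, set the size and remove the border around the plot area
final = dots + rule
final.properties(
    width = 350,
    height = 200
).configure_view(stroke = None)
```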

![Dot plot of attrition rate by year with a dashed average line](\Images\2_5b.png)

In [30]:
```python
# Line chart with the same axis styling as the dot plot
line = base.mark_line().encode(
    x = alt.X('Year:O',
              axis = alt.Axis(labelAngle = 0, labelColor = '#888888', titleColor = '#888888'),
              title = None), 
    y = alt.Y('Attrition Rate',
              axis = alt.Axis(grid = False, titleY = 40, labelColor = "#888888", titleColor = '#888888'), 
              title = "ATTRITION RATE"
              )
    )

# Reuse the dashed average line from the previous cell
final = line + rule

final.properties(
    width = 350,
    height = 200
).configure_view(stroke = None)
```

![Line chart of attrition rate by year with a dashed average line](\Images\2_5c.png)


In [32]:
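```python
# Average attrition rate computed in pandas
avg = table['Attrition Rate'].mean()

# One-row DataFrame describing a translucent band from 0 up to the average;
# with no x encoding, the rectangle spans the full width of the chart
rect = alt.Chart(pd.DataFrame({'y': [0], 'y2':[avg]})).mark_rect(
    opacity=0.2
).encode(y='y', y2='y2')

# Layer the band over the line chart from the previous cell
final = line + rect

final.properties(
    width = 350,
    height = 200
).configure_view(stroke = None)
```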

![Line chart of attrition rate by year with a shaded band from zero up to the average](\Images\2_5d.png)

In [33]:
```python
# Area chart with the same axis styling
area = base.mark_area().encode(
    x = alt.X('Year:O',
              axis = alt.Axis(labelAngle = 0, labelColor = '#888888', titleColor = '#888888'),
              title = None), 
    y = alt.Y('Attrition Rate',
              axis = alt.Axis(grid = False, titleY = 40, labelColor = "#888888", titleColor = '#888888'), 
              title = "ATTRITION RATE"
              )
    )

# Add the dashed average line
final = area + rule
final.properties(
    width = 350,
    height = 200
).configure_view(stroke = None)
```

![Area chart of attrition rate by year with a dashed average line](\Images\2_5e.png)

In [34]:
```python
# Bar chart with the same axis styling
bar = base.mark_bar().encode(
    x = alt.X('Year:O',
              axis = alt.Axis(labelAngle = 0, labelColor = '#888888', titleColor = '#888888'),
              title = None), 
    y = alt.Y('Attrition Rate',
              axis = alt.Axis(grid = False, titleY = 40, labelColor = "#888888", titleColor = '#888888'), 
              title = "ATTRITION RATE"
              )
    )

# Add the dashed average line
final = bar + rule
final.properties(
    width = 350,
    height = 200
).configure_view(stroke = None)
```

![Bar chart of attrition rate by year with a dashed average line](\Images\2_5f.png)