# Storytelling with Data! in Altair

by Maisa de Oliveira Fraiz

## Introduction

This project replicates the examples from Cole Nussbaumer Knaflic's book, "Storytelling with Data: Let's Practice!", using the Python library `Altair`. The primary objective is to document the reasoning behind the modifications the author proposes, while also highlighting the challenges of moving from the book's Excel-based approach to a programmatic one.

`Altair` was selected for its declarative syntax, built-in interactivity, grammar-of-graphics foundation, and compatibility with `Streamlit` and other web tools, all within the user-friendly Python environment. Anticipated challenges include Altair's smaller documentation base and developer community compared with more established libraries such as `Matplotlib`, `Seaborn`, or `Plotly`. Furthermore, tasks that appear straightforward in Excel may take several iterations to reproduce in code.


## Imports

In [33]:
import pandas as pd
import numpy as np
import altair as alt

## Chapter 4 - Focus Attention

*Where do you want your audience to look?*

### Exercise 3 - direct attention many ways

The data for this exercise can be found here: https://www.storytellingwithdata.com/letspractice/downloads

In [34]:
table = pd.read_excel(r"..\..\Data\4.3 EXERCISE.xlsx", usecols = [1, 2, 3, 4], header = 5, skipfooter = 5)

table

| | YEAR | Total | Organic | Referral |
|---|---|---|---|---|
| 0 | 2005 | 0.087 | 0.033 | 0.054 |
| 1 | 2006 | 0.083 | 0.035 | 0.048 |
| 2 | 2007 | 0.086 | 0.037 | 0.049 |
| 3 | 2008 | 0.089 | 0.036 | 0.053 |
| 4 | 2009 | 0.084 | 0.034 | 0.05 |
| 5 | 2010 | 0.086 | 0.031 | 0.055 |
| 6 | 2011 | 0.075 | 0.032 | 0.043 |
| 7 | 2012 | 0.072 | 0.035 | 0.037 |
| 8 | 2013 | 0.069 | 0.032 | 0.037 |
| 9 | 2014 | 0.074 | 0.038 | 0.036 |


In [35]:
melted_table = pd.melt(table, id_vars = ['YEAR'], var_name = 'Metric', value_name = 'Value')
melted_table["Metric"] = melted_table["Metric"].str.upper()

melted_table

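| | YEAR | Metric | Value |
|---|---|---|---|
| 0 | 2005 | TOTAL | 0.087 |
| 1 | 2006 | TOTAL | 0.083 |
| 2 | 2007 | TOTAL | 0.086 |
| 3 | 2008 | TOTAL | 0.089 |
| 4 | 2009 | TOTAL | 0.084 |
| 5 | 2010 | TOTAL | 0.086 |
| 6 | 2011 | TOTAL | 0.075 |
| 7 | 2012 | TOTAL | 0.072 |
| 8 | 2013 | TOTAL | 0.069 |
| 9 | 2014 | TOTAL | 0.074 |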
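
The preview above lists only the TOTAL rows, so the effect of the reshape is easy to miss. The following minimal sketch uses just the 2013 and 2014 rows of the table, with `wide` and `long_form` as illustrative names that are not part of the notebook. It shows `pd.melt` producing one row per year and metric, which is the long format that Altair's `color` encoding can group by:

```python
import pandas as pd

# Two rows copied from the table above; `wide` and `long_form` are illustrative names.
wide = pd.DataFrame({
    "YEAR": [2013, 2014],
    "Total": [0.069, 0.074],
    "Organic": [0.032, 0.038],
    "Referral": [0.037, 0.036],
})

# Each metric column becomes a block of rows keyed by YEAR.
long_form = pd.melt(wide, id_vars=["YEAR"], var_name="Metric", value_name="Value")
long_form["Metric"] = long_form["Metric"].str.upper()

print(long_form)
```

Two years and three metrics give six rows, so the full ten-year table should yield thirty.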


In [36]:
line = alt.Chart(melted_table, title = alt.Title('Conversion rate over time', 
                                          fontWeight = 'normal', 
                                          anchor = 'start', 
                                          fontSize = 17)).mark_line(strokeWidth = 3).encode(
    x = alt.X('YEAR:O',
              axis = alt.Axis(labelAngle = 0, labelColor = '#888888', 
                              titleColor = '#888888', titleAnchor = 'start', 
                              titleFontWeight = 'normal'),
              title = 'FISCAL YEAR',
              scale = alt.Scale(align = 0)), 
    y = alt.Y('Value',
              axis = alt.Axis(grid = False, titleAnchor = 'end', 
                              labelColor = "#888888", titleColor = '#888888', 
                              titleFontWeight = 'normal', format = "%"), 
              title = "CONVERSION RATE",
              scale = alt.Scale(domain = [0, 0.1])
              ),
    color = alt.Color("Metric", scale = alt.Scale(range = ["#aaaaaa"]), legend = None)
    ).properties(width = 500)

text = alt.Chart(melted_table).mark_text(align='left', dx = 20, size = 13, color = "#aaaaaa").encode(
    x = alt.X('YEAR', aggregate = 'max', axis = None),
    y = alt.Y('Value', aggregate = {'argmax': 'YEAR'}),
    text = 'Metric'
)

gray_line = line + text
gray_line.configure_view(stroke = None)

![Teste1](\Images\4_3a.png)

![Teste2](\Images\4_3b.png)

![Teste8](\Images\4_3c.png)
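
The `text` layer above positions each label with two aggregates: the `max` of `YEAR` for x, and the `Value` at the `argmax` of `YEAR` for y. Because `Metric` is the only field left unaggregated, the aggregates should be computed once per metric. As a rough pandas equivalent (a sketch on a two-year subset of the data, with `sample` and `label_rows` as illustrative names), the same lookup is:

```python
import pandas as pd

# Two-year subset of the melted table; names here are illustrative only.
sample = pd.DataFrame({
    "YEAR": [2013, 2014, 2013, 2014, 2013, 2014],
    "Metric": ["TOTAL", "TOTAL", "ORGANIC", "ORGANIC", "REFERRAL", "REFERRAL"],
    "Value": [0.069, 0.074, 0.032, 0.038, 0.037, 0.036],
})

# For each metric, pick the row with the latest year: this is where the label sits.
label_rows = sample.loc[sample.groupby("Metric")["YEAR"].idxmax()]

print(label_rows)
```

Each metric ends up with a single label anchored at its final data point, which is why the labels sit at the right-hand end of the lines.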

In [37]:
line = alt.Chart(melted_table, title = alt.Title('Conversion rate over time', 
                                          fontWeight = 'normal', 
                                          anchor = 'start', 
                                          fontSize = 17)).mark_line(strokeWidth = 3).encode(
    x = alt.X('YEAR:O',
              axis = alt.Axis(labelAngle = 0, labelColor = '#888888', 
                              titleColor = '#888888', titleAnchor = 'start', 
                              titleFontWeight = 'normal'),
              title = 'FISCAL YEAR',
              scale = alt.Scale(align = 0)), 
    y = alt.Y('Value',
              axis = alt.Axis(grid = False, titleAnchor = 'end', 
                              labelColor = "#888888", titleColor = '#888888', 
                              titleFontWeight = 'normal', format = "%"), 
              title = "CONVERSION RATE",
              scale = alt.Scale(domain = [0, 0.1])
              ),
    color = alt.Color("Metric", scale = alt.Scale(range = ["#aaaaaa"]), legend = None),
    opacity = alt.condition(alt.datum['Metric'] == "REFERRAL", alt.value(1), alt.value(0.5))
    ).properties(width = 500)

text = alt.Chart(melted_table).mark_text(align='left', baseline='middle', dx = 20, size = 13, color = "#aaaaaa").encode(
    x = alt.X('YEAR', aggregate = 'max', axis = None),
    y = alt.Y('Value', aggregate = {'argmax': 'YEAR'}),
    text = 'Metric',
    opacity = alt.condition(alt.datum['Metric'] == "REFERRAL", alt.value(1), alt.value(0.5))

)

final = line + text
final.configure_view(stroke = None)

![Alt text](\Images\4_3d.png)

![Alt text](\Images\4_3e.png)

In [38]:
line = alt.Chart(melted_table, title = alt.Title('Conversion rate over time', 
                                          fontWeight = 'normal', 
                                          anchor = 'start', 
                                          fontSize = 17)).mark_line().encode(
    x = alt.X('YEAR:O',
              axis = alt.Axis(labelAngle = 0, labelColor = '#888888', 
                              titleColor = '#888888', titleAnchor = 'start', 
                              titleFontWeight = 'normal'),
              title = 'FISCAL YEAR',
              scale = alt.Scale(align = 0)), 
    y = alt.Y('Value',
              axis = alt.Axis(grid = False, titleAnchor = 'end', 
                              labelColor = "#888888", titleColor = '#888888', 
                              titleFontWeight = 'normal', format = "%"), 
              title = "CONVERSION RATE",
              scale = alt.Scale(domain = [0, 0.1])
              ),
    color = alt.Color("Metric", scale = alt.Scale(range = ["#aaaaaa"]), legend = None),
    strokeWidth = alt.condition(alt.datum['Metric'] == "REFERRAL", alt.value(4), alt.value(2))
    ).properties(width = 500)

text = alt.Chart(melted_table).mark_text(align='left', baseline='middle', dx = 20, size = 13, color = "#aaaaaa").encode(
    x = alt.X('YEAR', aggregate = 'max', axis = None),
    y = alt.Y('Value', aggregate = {'argmax': 'YEAR'}),
    text = 'Metric',
    opacity = alt.condition(alt.datum['Metric'] == "REFERRAL", alt.value(1), alt.value(0.7))

)

final = line + text
final.configure_view(stroke = None)

![Alt text](\Images\4_3f.png)

In [39]:
line = alt.Chart(melted_table, title = alt.Title('Conversion rate over time', 
                                          fontWeight = 'normal', 
                                          anchor = 'start', 
                                          fontSize = 17)).mark_line(strokeWidth = 3).encode(
    x = alt.X('YEAR:O',
              axis = alt.Axis(labelAngle = 0, labelColor = '#888888', 
                              titleColor = '#888888', titleAnchor = 'start', 
                              titleFontWeight = 'normal'),
              title = 'FISCAL YEAR',
              scale = alt.Scale(align = 0)), 
    y = alt.Y('Value',
              axis = alt.Axis(grid = False, titleAnchor = 'end', 
                              labelColor = "#888888", titleColor = '#888888', 
                              titleFontWeight = 'normal', format = "%"), 
              title = "CONVERSION RATE",
              scale = alt.Scale(domain = [0, 0.1])
              ),
    color = alt.Color("Metric", scale = alt.Scale(range = ["#aaaaaa"]), legend = None),
    strokeDash = alt.condition(alt.datum['Metric'] == "REFERRAL", alt.value([5,3]), alt.value([1,0]))
    ).properties(width = 500)

text = alt.Chart(melted_table).mark_text(align='left', dx = 20, size = 13, color = "#aaaaaa").encode(
    x = alt.X('YEAR', aggregate = 'max', axis = None),
    y = alt.Y('Value', aggregate = {'argmax': 'YEAR'}),
    text = 'Metric'
)

final = line + text
final.configure_view(stroke = None)

![Alt text](\Images\4_3g.png)

In [40]:
line_referral = alt.Chart(melted_table).mark_line(strokeWidth = 3).encode(
    x = alt.X('YEAR:O',
              axis = alt.Axis(labelAngle = 0, labelColor = '#888888', 
                              titleColor = '#888888', titleAnchor = 'start', 
                              titleFontWeight = 'normal'),
              title = 'FISCAL YEAR',
              scale = alt.Scale(align = 0)), 
    y = alt.Y('Value',
              axis = alt.Axis(grid = False, titleAnchor = 'end', 
                              labelColor = "#888888", titleColor = '#888888', 
                              titleFontWeight = 'normal', format = "%"), 
              title = "CONVERSION RATE",
              scale = alt.Scale(domain = [0, 0.1])
              ),
    color = alt.value("black"),
    opacity =  alt.value(1)
    ).properties(width = 500).transform_filter(
    alt.FieldEqualPredicate(field='Metric', equal = 'REFERRAL')
    )

line_rest = alt.Chart(melted_table, title = alt.Title('Conversion rate over time', 
                                          fontWeight = 'normal', 
                                          anchor = 'start', 
                                          fontSize = 17)).mark_line(strokeWidth = 3).encode(
    x = alt.X('YEAR:O',
              axis = alt.Axis(labelAngle = 0, labelColor = '#888888', 
                              titleColor = '#888888', titleAnchor = 'start', 
                              titleFontWeight = 'normal'),
              title = 'FISCAL YEAR',
              scale = alt.Scale(align = 0)), 
    y = alt.Y('Value',
              axis = alt.Axis(grid = False, titleAnchor = 'end', 
                              labelColor = "#888888", titleColor = '#888888', 
                              titleFontWeight = 'normal', format = "%"), 
              title = "CONVERSION RATE",
              scale = alt.Scale(domain = [0, 0.1])
              ),
    color = alt.Color("Metric", scale = alt.Scale(range = ['#aaaaaa']), legend = None),
    opacity =  alt.value(1)
    ).properties(width = 500).transform_filter(
    alt.FieldOneOfPredicate(field='Metric', oneOf = ['ORGANIC', 'TOTAL'])
    )

text_highlight = alt.Chart(melted_table).mark_text(align='left', dx = 20, size = 13).encode(
    x = alt.X('YEAR', aggregate = 'max', axis = None),
    y = alt.Y('Value', aggregate = {'argmax': 'YEAR'}),
    text = 'Metric',
    color = alt.condition(alt.datum['Metric'] == "REFERRAL", alt.value("black"), alt.value("#aaaaaa"))
)

final = line_referral + line_rest + text_highlight
final.configure_view(stroke = None)

In [41]:
text_total = alt.Chart({"values": 
                    [{"text":  ['TOTAL']}]
                    }
                    ).mark_text(size = 13, 
                                align = "left", 
                                dx = 230, dy = -37, 
                                color = '#aaaaaa'
                                ).encode(text = "text:N")

text_organic = alt.Chart({"values": 
                    [{"text":  ['ORGANIC']}]
                    }
                    ).mark_text(size = 13, 
                                align = "left", 
                                dx = 230, dy = 37, 
                                color = '#aaaaaa'
                                ).encode(text = "text:N")

text_referral = alt.Chart({"values": 
                    [{"text":  ['REFERRAL']}]
                    }
                    ).mark_text(size = 13, 
                                align = "left", 
                                dx = 230, dy = 78, 
                                color = 'black'
                                ).encode(text = "text:N")

final = line_referral + line_rest + text_total + text_organic + text_referral
final.configure_view(stroke = None)

![Alt text](\Images\4_3h.png)

In [42]:
line = alt.Chart(melted_table, title = alt.Title('Conversion rate over time', 
                                          fontWeight = 'normal', 
                                          anchor = 'start', 
                                          fontSize = 17)).mark_line(strokeWidth = 3).encode(
    x = alt.X('YEAR:O',
              axis = alt.Axis(labelAngle = 0, labelColor = '#888888', 
                              titleColor = '#888888', titleAnchor = 'start', 
                              titleFontWeight = 'normal'),
              title = 'FISCAL YEAR',
              scale = alt.Scale(align = 0)), 
    y = alt.Y('Value',
              axis = alt.Axis(grid = False, titleAnchor = 'end', 
                              labelColor = "#888888", titleColor = '#888888', 
                              titleFontWeight = 'normal', format = "%"), 
              title = "CONVERSION RATE",
              scale = alt.Scale(domain = [0, 0.1])
              ),
    color = alt.Color("Metric", scale = alt.Scale(range = ["#aaaaaa", 'black', "#aaaaaa"]), legend = None),
    ).properties(width = 500)

final = line + text_highlight
final.configure_view(stroke = None)

![Alt text](\Images\4_3i.png)

In [43]:
line = alt.Chart(melted_table, title = alt.Title('Conversion rate over time', 
                                          fontWeight = 'normal', 
                                          anchor = 'start', 
                                          fontSize = 17)).mark_line(strokeWidth = 3).encode(
    x = alt.X('YEAR:O',
              axis = alt.Axis(labelAngle = 0, labelColor = '#888888', 
                              titleColor = '#888888', titleAnchor = 'start', 
                              titleFontWeight = 'normal'),
              title = 'FISCAL YEAR',
              scale = alt.Scale(align = 0)), 
    y = alt.Y('Value',
              axis = alt.Axis(grid = False, titleAnchor = 'end', 
                              labelColor = "#888888", titleColor = '#888888', 
                              titleFontWeight = 'normal', format = "%"), 
              title = "CONVERSION RATE",
              scale = alt.Scale(domain = [0, 0.1])
              ),
    color = alt.Color("Metric", scale = alt.Scale(range = ["#aaaaaa", '#d24b53', "#aaaaaa"]), legend = None),
    ).properties(width = 500)

text_red = alt.Chart(melted_table).mark_text(align='left', baseline='middle', dx = 20, size = 13).encode(
    x = alt.X('YEAR', aggregate = 'max', axis = None),
    y = alt.Y('Value', aggregate = {'argmax': 'YEAR'}),
    text = 'Metric',
    color = alt.Color("Metric", scale = alt.Scale(range = ["#aaaaaa", '#d24b53', "#aaaaaa"]), legend = None),

)

final = line + text_red
final.configure_view(stroke = None)

![Alt text](\Images\4_3j.png)

In [44]:
line = alt.Chart(melted_table, title = alt.Title('Conversion rate over time: Referral decreasing markedly since 2010', 
                                          fontWeight = 'normal', 
                                          anchor = 'start', 
                                          fontSize = 17)).mark_line(strokeWidth = 3).encode(
    x = alt.X('YEAR:O',
              axis = alt.Axis(labelAngle = 0, labelColor = '#888888', 
                              titleColor = '#888888', titleAnchor = 'start', 
                              titleFontWeight = 'normal'),
              title = 'FISCAL YEAR',
              scale = alt.Scale(align = 0)), 
    y = alt.Y('Value',
              axis = alt.Axis(grid = False, titleAnchor = 'end', 
                              labelColor = "#888888", titleColor = '#888888', 
                              titleFontWeight = 'normal', format = "%"), 
              title = "CONVERSION RATE",
              scale = alt.Scale(domain = [0, 0.1])
              ),
    color = alt.Color("Metric", scale = alt.Scale(range = ["#aaaaaa"]), legend = None),
    ).properties(width = 500)

text = alt.Chart(melted_table).mark_text(align='left', baseline='middle', dx = 20, size = 13).encode(
    x = alt.X('YEAR', aggregate = 'max', axis = None),
    y = alt.Y('Value', aggregate = {'argmax': 'YEAR'}),
    text = 'Metric',
    color = alt.Color("Metric", scale = alt.Scale(range = ["#aaaaaa"]), legend = None),

)

final = line + text
final.configure_view(stroke = None)

![Alt text](\Images\4_3k.png)

![Alt text](\Images\4_3l.png)

Animate to appear. Though difficult to show in a static book, motion is the
most attention-grabbing preattentive attribute and can work very well in a live
setting (where you are presenting the graph and can flip through various views).
Imagine we start with an empty graph that has only the x- and y-axes. Then we
could add a line representing the Total conversion rate and discuss it. Next, we could
layer on the Organic conversion rate and talk about that. Finally, we could add the
Referral line. The simple fact of it not being there and then appearing would garner attention.

In [45]:
line_referral_gray = alt.Chart(melted_table).mark_line(strokeWidth = 3, point = True).encode(
    x = alt.X('YEAR:O',
              axis = alt.Axis(labelAngle = 0, labelColor = '#888888', 
                              titleColor = '#888888', titleAnchor = 'start', 
                              titleFontWeight = 'normal'),
              title = 'FISCAL YEAR',
              scale = alt.Scale(align = 0)), 
    y = alt.Y('Value',
              axis = alt.Axis(grid = False, titleAnchor = 'end', 
                              labelColor = "#888888", titleColor = '#888888', 
                              titleFontWeight = 'normal', format = "%"), 
              title = "CONVERSION RATE",
              scale = alt.Scale(domain = [0, 0.1])
              ),
    color = alt.value("#aaaaaa"),
    opacity =  alt.value(1)
    ).properties(width = 500).transform_filter(
    alt.FieldEqualPredicate(field='Metric', equal = 'REFERRAL')
    )

text_referral_gray = alt.Chart({"values": 
                    [{"text":  ['REFERRAL']}]
                    }
                    ).mark_text(size = 13, 
                                align = "left", 
                                dx = 230, dy = 78, 
                                color = '#aaaaaa'
                                ).encode(text = "text:N")

final = line_rest + line_referral_gray + text_total + text_organic + text_referral_gray
final.configure_view(stroke = None)

![Alt text](\Images\4_3m.png)

In [46]:
label = alt.Chart(melted_table).mark_text(align = 'left', dx = 3, color = '#aaaaaa').encode(
    x = alt.X('YEAR:O'),
    y = alt.Y('Value'),
    text = alt.Text('Value', format = ".1%"),
    xOffset = alt.value(-10),
    yOffset = alt.value(-10)
).transform_filter(
    alt.FieldEqualPredicate(field='Metric', equal="REFERRAL")
    )

final = line_rest + line_referral_gray + text_total + text_organic + text_referral_gray + label
final.configure_view(stroke = None)


![Alt text](\Images\4_3n.png)

In [47]:
line = alt.Chart(melted_table, title = alt.Title('Conversion rate over time', 
                                          fontWeight = 'normal', 
                                          anchor = 'start', 
                                          fontSize = 17)).mark_line(strokeWidth = 3).encode(
    x = alt.X('YEAR:O',
              axis = alt.Axis(labelAngle = 0, labelColor = '#888888', 
                              titleColor = '#888888', titleAnchor = 'start', 
                              titleFontWeight = 'normal'),
              title = 'FISCAL YEAR',
              scale = alt.Scale(align = 0)), 
    y = alt.Y('Value',
              axis = alt.Axis(grid = False, titleAnchor = 'end', 
                              labelColor = "#888888", titleColor = '#888888', 
                              titleFontWeight = 'normal', format = "%"), 
              title = "CONVERSION RATE",
              scale = alt.Scale(domain = [0, 0.1])
              ),
    color = alt.Color("Metric", scale = alt.Scale(range = ["#aaaaaa"]), legend = None),
    ).properties(width = 500)

text = alt.Chart(melted_table).mark_text(align='left', baseline='middle', dx = 55, size = 13).encode(
    x = alt.X('YEAR', aggregate = 'max', axis = None),
    y = alt.Y('Value', aggregate = {'argmax': 'YEAR'}),
    text = 'Metric',
    color = alt.Color("Metric", scale = alt.Scale(range = ["#aaaaaa"]), legend = None),
)

# The data shown above ends in 2014, so filter on that year
# (a filter on 2019 matches no rows and leaves these layers empty).
text2 = alt.Chart(melted_table).mark_text(align = 'left', color = '#aaaaaa').encode(
    x = alt.X('YEAR:O', axis = None),
    y = alt.Y('Value'),
    text = alt.Text('Value', format = ".1%"),
    xOffset = alt.value(225)
).transform_filter(
    alt.FieldEqualPredicate(field='YEAR', equal=2014)
    )

# YEAR is numeric, so compare against the integer 2014 rather than a string.
point = alt.Chart(melted_table).mark_point(filled = True, color = '#aaaaaa').encode(
    x = alt.X('YEAR:O', axis = None),
    y = alt.Y('Value'),
    xOffset = alt.value(217),
    opacity = alt.value(1)
).transform_filter(
    alt.FieldEqualPredicate(field='YEAR', equal=2014)
    )

final = line + text + text2 + point
final.configure_view(stroke = None)

![Alt text](\Images\4_3o.png)

![Alt text](\Images\4_3p.png)

In [48]:
import nbconvert
import nbformat

# Read the notebook as UTF-8 so the result does not depend on the platform default
with open('Exercise 2.ipynb', encoding='utf-8') as nb_file:
    nb_contents = nb_file.read()

# Convert using the ordinary exporter
notebook = nbformat.reads(nb_contents, as_version=4)
exporter = nbconvert.HTMLExporter()
body, res = exporter.from_notebook_node(notebook)

# Create a dict mapping all image attachments to their base64 data URIs
images = {}
for cell in notebook['cells']:
    if 'attachments' in cell:
        attachments = cell['attachments']
        for filename, attachment in attachments.items():
            for mime, payload in attachment.items():
                images[f'attachment:{filename}'] = f'data:{mime};base64,{payload}'

# Fix up the HTML and write it to disk
for src, data_uri in images.items():
    body = body.replace(f'src="{src}"', f'src="{data_uri}"')
with open('teste9.html', 'w', encoding='utf-8') as output_file:
    output_file.write(body)