# Storytelling with Data

Storytelling is an important skill for communicating the findings of your analysis.  
Often you will not be able to talk to your audience, and your charts will have to speak for you.  

For inspiration and excellent tutorials, we recommend the [python-graph-gallery](https://python-graph-gallery.com/) project.  
It contains **hundreds** of charts and graphs together with their Python code, for reproducibility and learning.  

You can [add arrows](https://python-graph-gallery.com/drawarrow/) with the [drawarrow package](https://github.com/JosephBARBIERDARNAL/drawarrow), or [add text and captions](https://python-graph-gallery.com/advanced-custom-annotations-matplotlib/) with the [flexitext](https://github.com/tomicapretto/flexitext) or [highlight_text](https://github.com/znstrider/highlight_text) packages.
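
If those packages are not installed, a rough version of the same idea is possible with matplotlib's built-in `annotate`. The sketch below is not the drawarrow or highlight_text API; it uses plain matplotlib and made-up numbers, only to show how a label and an arrow direct the reader's eye.

```python
import matplotlib.pyplot as plt

# Made-up values, only to have something to annotate.
years = [2000, 2005, 2010, 2015, 2020]
change = [30, 12, -5, -20, -45]

fig, ax = plt.subplots(figsize=(6, 4))
ax.plot(years, change, color="#335c67")
ax.axhline(0, color="black", linewidth=0.6)  # zero baseline

# annotate() places the text at `xytext` and draws an arrow to `xy`.
note = ax.annotate(
    "first negative year",
    xy=(2010, -5),
    xytext=(2001, -35),
    arrowprops=dict(arrowstyle="->", connectionstyle="arc3,rad=0.2"),
)
ax.set_title("An arrow tells the reader where to look")
```

In a notebook the figure renders at the end of the cell; in a script, call `plt.show()` or `fig.savefig(...)` afterwards.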

Here's a short example of the power of Storytelling with Data:

In [0]:
%pip install -q pandas matplotlib numpy pyfonts drawarrow highlight_text
dbutils.library.restartPython()

We load the packages - including the font, arrow drawing and annotation ones:

In [0]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from pyfonts import load_font
from drawarrow import ax_arrow
from highlight_text import fig_text, ax_text
import matplotlib.font_manager as fm


Some data for this chart:

In [0]:
url = "../../Data/japan_pop.csv"
df = pd.read_csv(url)
df.head()

In [0]:
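# style parameters
# Note: the font objects below are only referenced by the commented-out
# calls further down, so the chart renders with matplotlib's default font
# unless those calls are re-enabled. `prop` is not used in this cell.
prop = fm.FontProperties(
   fname='/usr/share/fonts/truetype/groovygh.ttf'
)
font = fm.FontProperties(
   fname='../../Data/Fonts/LibreBaskervilleRegular.otf'
)
boldfont = fm.FontProperties(
   fname='../../Data/Fonts/LibreBaskervilleBold.otf'
)
digit_font = fm.FontProperties(fname='../../Data/Fonts/LatoBold.ttf')
# keyword arguments passed to drawarrow's ax_arrow at the end of the cell
arrow_props = dict(color='black', tail_width=0.05, linewidth=0.5, head_width=3, head_length=5, radius=0.2)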

fig, ax = plt.subplots(dpi=300, figsize=(10,7))
ax.set_axis_off()

# before
color = '#335c67'
year_index = df[df['flag']].date.values[0]
before_df = df[df['date']<=year_index]
ax.plot(before_df['date'], before_df['pop_var'], color=color)
ax.fill_between(before_df['date'], before_df['pop_var'], alpha=0.3, color=color)
max_year = df[df['pop_var']==df['pop_var'].max()].date.values[0]
max_value = df[df['pop_var']==df['pop_var'].max()].pop_var.values[0]
ax.scatter(x=max_year, y=max_value, color=color, s=20)
# ax.text(x=max_year-7, y=max_value, s=f'+{max_value:.0f}', font=digit_font, size=8, color=color)
ax.text(x=max_year-7, y=max_value, s=f'+{max_value:.0f}', size=8, color=color)

# after
color = '#9e2a2b'
after_df = df[df['date']>=year_index]
ax.plot(after_df['date'], after_df['pop_var'], color=color)
ax.fill_between(after_df['date'], after_df['pop_var'], alpha=0.3, color=color)
min_year = df[df['pop_var']==df['pop_var'].min()].date.values[0]
min_value = df[df['pop_var']==df['pop_var'].min()].pop_var.values[0]
ax.scatter(x=min_year, y=min_value, color=color, s=20)
# ax.text(x=min_year-5, y=min_value-70000, s=f'{min_value:.0f}', font=digit_font, size=8, color=color)
ax.text(x=min_year-5, y=min_value-70000, s=f'{min_value:.0f}', size=8, color=color)

ax.plot([1952, 2024], [0,0], color='black', linewidth=0.6)
year_range = range(1960, 2021, 10)
for year in year_range:
#    ax.text(x=year+1, y=40000, s=f'{year}', font=boldfont, size=8, ha='center')
   ax.text(x=year+1, y=40000, s=f'{year}', size=8, ha='center')

s = "Japan's population is\nin <dramatic decline>."
# fig_text(x=0.45, y=0.8, s=s, font=font, highlight_textprops=[{'font':boldfont}], fontsize=25, ha='left', va='top')
fig_text(x=0.45, y=0.8, s=s, fontsize=25, ha='left', va='top')


s = "Population variation per year between 1952 and 2024"
# fig_text(x=0.45, y=0.68, s=s, font=font, fontsize=9.8, ha='left', va='top', alpha=0.5)
fig_text(x=0.45, y=0.68, s=s, fontsize=9.8, ha='left', va='top', alpha=0.5)

s = "<Graph>: barbierjoseph.com\n<Data>: macrotrends.net"
# ax_text(x=1952, y=-30000, s=s, font=font, fontsize=6, ha='left', highlight_textprops=[{'font':boldfont}]*2)
ax_text(x=1952, y=-30000, s=s, fontsize=6, ha='left')

s = "2010 is the first year to record\na <fall> in the population"
# ax_text(x=1978, y=-210000, s=s, font=font, fontsize=8, ha='left', highlight_textprops=[{'font':boldfont}])
ax_text(x=1978, y=-210000, s=s, fontsize=8, ha='left')
ax_arrow(tail_position=(1995, -300000), head_position=(2009, -50000), ax=ax, **arrow_props)

plt.savefig('../../web-area-chart-with-different-colors-for-positive-and-negative-values.png', dpi=300, bbox_inches='tight')
plt.show()

Outstandingly beautiful, isn't it?

## Now you :)

In [0]:
############################################ Your Turn ####################################################

movie_2_week_gross = pd.read_csv("../../Data/Movie Gross First 2 Weeks.csv")
movie_2_week_gross.head()

This table holds the daily gross of 3 historically important movies: **Avengers: Endgame**, **Deadpool & Wolverine**, and naturally, **Guardians of the Galaxy Vol. 3**.  
The data is courtesy of [https://www.boxofficemojo.com](https://www.boxofficemojo.com/), but please don't tell them we took it.

Some important columns here are:
- Date - the calendar date
- Movie - the movie name
- DOW - day of the week
- Daily Gross in USD - daily earnings for the movie on that date
- Day - number of days in release
- Number of Theaters - the number of theaters showing the movie

And you can safely ignore the rest.
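
As a possible starting point for the cleaning step, here is a minimal sketch on made-up rows. It assumes the amounts are stored as text such as `$100,000,000` and the dates as ISO strings; the real file may differ, so inspect it with `.head()` and `.dtypes` first. The movie names, the values, and the `strip_number` helper are all hypothetical.

```python
import pandas as pd

# Made-up rows that mimic the columns listed above (assumed formats).
raw = pd.DataFrame({
    "Date": ["2024-01-05", "2024-01-06", "2024-02-02", "2024-02-03"],
    "Movie": ["Movie A", "Movie A", "Movie B", "Movie B"],
    "DOW": ["Friday", "Saturday", "Friday", "Saturday"],
    "Daily Gross in USD": ["$100,000,000", "$80,000,000",
                           "$60,000,000", "$70,000,000"],
    "Day": [1, 2, 1, 2],
    "Number of Theaters": ["4,000", "4,000", "3,500", "3,500"],
})


def strip_number(series):
    """Hypothetical helper: drop '$' and ',' and convert to integers."""
    return series.str.replace(r"[$,]", "", regex=True).astype("int64")


clean = raw.assign(**{
    "Date": pd.to_datetime(raw["Date"]),
    "Daily Gross in USD": strip_number(raw["Daily Gross in USD"]),
    "Number of Theaters": strip_number(raw["Number of Theaters"]),
})

# One row per day in release and one column per movie, so the movies
# share an x-axis even though they opened on different dates.
daily = clean.pivot(index="Day", columns="Movie", values="Daily Gross in USD")
cumulative = daily.cumsum()
print(cumulative)
```

Indexing by `Day` rather than `Date` is a design choice: it aligns the releases so their first two weeks can be compared directly.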

#### Your task
1. Explore the data, clean it, and make it ready for visualization.
2. Tell a visual story with this data.  
Write out a setup, a conflict, and a resolution and then sketch how you could present each part of the story.
3. Be proud and share your results with your peers!

(You can also work in groups of 2-3 people on this task)