<img src=images/gdd-logo.png align=right width=300px>

# People: Adding Characters that Propel the Story

When we talk about people in a novel, we're often talking about humans, but characters could be anything: for example, the house in *The Shining* or the rose in *Beauty and the Beast*.

---

<font size=5 color='blue' align=center>P is for People</font>

The 3 Ps of storytelling: Place, People, Purpose.

---

Who, then, are the characters in our data stories? Think of them as everything that exists around the graph: what is there to tell the story when we (the narrator) are not around to do it?

Throughout this notebook we will focus on our characters.


- [Units and Ranges](#units)
- [Titles](#titles)
    - [<mark>Exercise: Pick a title</mark>](#ex-title)
- [Colours](#colours)
    - [<mark>Exercise: Branding</mark>](#ex-branding)
- [Draw the eye of the reader](#drawing)
- [<mark>Exercise: Putting it all together</mark>](#ex-all)



In [None]:
import pandas as pd
import matplotlib.pyplot as plt

<a id='units'></a>

----
## Units and Ranges

Although it may seem obvious, the limits of an axis can drastically change the story. For example, consider the following data:

In [None]:
election_full = pd.read_csv('data/election_full.csv', index_col='candidate')
election_full

And say we want to zoom in on the race between Biden and Trump:

In [None]:
(
    election_full
    .head(2).div(1000)
    .plot(kind='bar',
          ylim=[72_000,83_000], rot=0,
          title='Joe Biden wins by landslide \nin 2020 US Election')
)

If we change the y-axis limits, does the chart's title still make sense?

In [None]:
(
    election_full
    .head(2)
    .div(1000)
    .plot(kind='bar',
          ylim=[0,90_000], rot=0, ylabel='Votes in 1000s',
          title='Joe Biden wins by landslide \nin 2020 US Election')
);
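To put a number on the effect, compare how much taller one bar *looks* than the other with how much bigger its value actually is. A bar's visible height is its value minus the bottom of the y-axis, so raising the axis minimum inflates the visual ratio. This is a small sketch with made-up round numbers rather than the real election totals, and `apparent_ratio` is a hypothetical helper, not part of pandas or matplotlib.

```python
def apparent_ratio(a: float, b: float, axis_min: float = 0.0) -> float:
    """Ratio of the visible bar heights of `a` and `b` when the y-axis starts at `axis_min`."""
    if b <= axis_min:
        raise ValueError("the smaller bar must sit above the axis minimum")
    return (a - axis_min) / (b - axis_min)


# Made-up vote totals (in 1000s), NOT the real 2020 figures
leader, runner_up = 81_000, 74_000

true_ratio = apparent_ratio(leader, runner_up)            # axis starts at 0
zoomed_ratio = apparent_ratio(leader, runner_up, 72_000)  # axis starts at 72,000

print(f"true ratio: {true_ratio:.2f}, ratio on zoomed axis: {zoomed_ratio:.2f}")
# true ratio: 1.09, ratio on zoomed axis: 4.50
```

With the axis starting at 72,000, the first bar looks four and a half times taller, even though its value is only about 9% bigger.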

**<font color='blue'>Note:</font>** Titles may not be the first thing you notice about a chart, but they can have a great deal of impact on the story it tells. We will revisit titles later, but notice that even though the chart above no longer misleads visually, its wording can still override our first impression.

**Percentages vs actual figures**

How about we use percentages instead? On an axis running from 0 to 100, the two candidates' shares of the vote sit close together, which makes the figures appear closer:

In [None]:
(
    election_full
    .div(election_full.sum()/100)
    .head(2)
    .plot(kind='bar',
          rot=0, ylim=[0,100],
          title='Couldn\'t be closer in 2020 US Election')
);

We can also change the type of chart. A **stacked bar chart** is great for showing how multiple categories make up a total. It also makes it easy to see what percentage of voters **didn't** vote for Biden or Trump:

In [None]:
ax = (
    election_full
    .div(election_full.sum()/100)
    .head(2).T.rename({'total_votes':''})
    .plot(kind='barh',
          rot=0, stacked=True, xticks=[0,50,100],
          title='Couldn\'t be closer in 2020 US Election')
)
ax.grid(axis='x')
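In the chart above, everyone who didn't vote for the top two shows up only as empty space at the end of the bar. If you'd rather give that group its own segment, one option is to add an explicit "Other" row so the stacked bar always reaches 100%. The vote counts and candidate names below are hypothetical, and the colours are just one possible choice.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs outside a notebook
import pandas as pd

# Hypothetical vote counts, NOT the real election data
votes = pd.DataFrame(
    {"total_votes": [510, 470, 15, 5]},
    index=pd.Index(["Candidate A", "Candidate B", "Candidate C", "Candidate D"], name="candidate"),
)

# Convert counts to percentages of the total, as in the cells above
shares = votes.div(votes.sum() / 100)

# Keep the top two and lump everyone else into a single 'Other' segment
stacked = shares.head(2).copy()
stacked.loc["Other"] = 100 - stacked["total_votes"].sum()

ax = (
    stacked.T.rename({"total_votes": ""})
    .plot(kind="barh", stacked=True, xticks=[0, 50, 100],
          color=["tab:blue", "tab:red", "lightgrey"],
          title="Hypothetical two-horse race")
)
ax.grid(axis="x")
```

Greying out the "Other" segment keeps it visible without competing with the two main characters.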

---

<a id='titles'></a>

## Titles

Titles may be the first thing you want your audience to see. Let's load in the bikes data to demonstrate.

In [None]:
bikes = pd.read_csv('data/plots/bikes-by-month.csv', index_col=['Month'])
bikes.head()

Titles can drastically change the message of the chart:

In [None]:
ax = (
    bikes
    .plot(figsize=(12,6), lw=2, color='orange')
)
ax.set_title('Bike sales increased significantly compared to last year!', 
             weight='bold', size=20, color='darkgreen')

In [None]:
ax = (
    bikes
    .plot(figsize=(12,6), lw=2, color='orange')
)
ax.set_title('Bike sales STILL affected by seasonal lows!', 
             weight='bold', size=20, color='#c00')

**Note:** Titles are a powerful tool, so we have to take care when writing them: a title can distort the key message of the data or use leading words that don't match what the data actually shows.
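One way to keep a title honest is to compute the figure it claims before writing it. The sketch below assumes a layout with one column per year; the column names and numbers are hypothetical and are not taken from the bikes data.

```python
import pandas as pd

# Hypothetical monthly sales, NOT the notebook's bikes data (assumes one column per year)
bikes_demo = pd.DataFrame(
    {"last_year": [100, 120, 150, 130], "this_year": [105, 126, 150, 143]},
    index=pd.Index(["Jan", "Feb", "Mar", "Apr"], name="Month"),
)

# Compare yearly totals before putting any claim into the title
totals = bikes_demo.sum()
pct_change = (totals["this_year"] / totals["last_year"] - 1) * 100

# Build a plain, factual title from the computed change
direction = "up" if pct_change >= 0 else "down"
title = f"Bike sales {direction} {abs(pct_change):.1f}% on last year"
print(title)  # Bike sales up 4.8% on last year
```

Whether a 4.8% rise counts as "significant" is then a judgement you make with the number in front of you, rather than one the title makes for the reader.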

---

<a id='ex-title'></a>

## <mark>Exercise: Pick a suitable title</mark>

Explore the `chickweight` data and understand what the following graph is telling you:

In [None]:
chickweight = pd.read_csv('data/exercises/chickweight.csv')

In [None]:
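# Plot the mean weight per day for each diet

def plot_chickweight(df=chickweight, title='', color=None):
    """Plot the mean chick weight by day, with one line per diet."""
    (
        df
        .groupby(['days', 'diet'])['weight'].mean()  # mean weight for each (day, diet) pair
        .unstack()                                    # pivot so each diet becomes a column
        .plot(title=title, color=color)
    )
plot_chickweight(title='')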

For the following graph, come up with three titles that are:

- Neutral 

In [None]:
plot_chickweight(title='')

- Positive

In [None]:
plot_chickweight(title='')

- Negative

In [None]:
plot_chickweight(title='')

Finished? Now try changing the colours to emphasize your title.

In [None]:
colors = ['r','g','r','g']

plot_chickweight(title='', color=colors)
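Rather than typing a colour for every line by hand, a small helper can build the list for you: one strong colour for the line your title is about and a muted one for the rest. `highlight_colors` is a hypothetical helper and the data below is synthetic, not the chickweight data.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs outside a notebook
import numpy as np
import pandas as pd


def highlight_colors(columns, highlight, on="tab:green", off="lightgrey"):
    """Return one colour per column: `on` for the highlighted column(s), `off` for the rest."""
    if not isinstance(highlight, (list, tuple, set)):
        highlight = [highlight]
    return [on if col in highlight else off for col in columns]


# Synthetic mean weights per day for four diets (NOT the chickweight data)
days = np.arange(0, 22, 2)
demo = pd.DataFrame({diet: 40 + days * (4 + diet) for diet in [1, 2, 3, 4]}, index=days)

colors = highlight_colors(demo.columns, highlight=4)
ax = demo.plot(color=colors, title="Diet 4 chicks grow fastest (synthetic data)")
```

Swapping `on` to a red shade turns the same chart into a vehicle for a negative title.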

---
<a id='colours'></a>

## Colours

Colouring can aid storytelling. Take the exercise above: calm green for a positive message, emergency red for a negative one.

### Text Colours

Take care when changing colours of text! Text should:
- Be easy to read
- Not take all the attention from the graphic (otherwise, why have the graphic at all?)
- Match the theme of the graph (formal, whimsical)

### Default colours vs customised

The default colours in matplotlib, Altair and dashboard design tools like Tableau are often chosen with readability in mind, including for readers with colour blindness.

This doesn't mean you can't change them; just make sure you aren't sacrificing readability for personal taste.
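If you want to see what you would be replacing, matplotlib exposes its colour cycle through `rcParams`, and it also bundles a `tableau-colorblind10` style sheet based on Tableau's colour-blind palette. This sketch only compares the two lists; whether a palette works for your audience is still worth checking with a colour-blindness simulator.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs outside a notebook
import matplotlib.pyplot as plt

# matplotlib's default colour cycle
with plt.style.context("default"):
    default_colors = plt.rcParams["axes.prop_cycle"].by_key()["color"]

# The colour cycle of the bundled colour-blind style sheet
with plt.style.context("tableau-colorblind10"):
    cb_colors = plt.rcParams["axes.prop_cycle"].by_key()["color"]

print(len(default_colors), default_colors[:3])
print(len(cb_colors), cb_colors[:3])
```

Wrapping a plotting cell in `with plt.style.context("tableau-colorblind10"):` applies that palette to just that chart.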

Shading (light/dark) can be really helpful when picking colours and showing contrast:

In [None]:
programming = (
    pd.read_csv('data/programming-trends.csv', index_col=[0], parse_dates=[0])
    .loc[:, lambda df: df.columns[::-1]]
)

programming.plot(color=['lightblue','lightblue','lightblue', '#306998'], 
                 title='Python becomes most Googled Programming Language');
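The chart above uses the named colour `lightblue` for the supporting lines. An alternative is to derive the pale shade from the highlight colour itself by mixing it with white, so both shades share a hue. `lighten` is a hypothetical helper built on `matplotlib.colors.to_rgb` and `to_hex`, and the trend data is random, not the programming-trends data.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs outside a notebook
import matplotlib.colors as mcolors
import numpy as np
import pandas as pd


def lighten(color, amount=0.5):
    """Mix `color` with white: amount=0 keeps it unchanged, amount=1 gives pure white."""
    r, g, b = mcolors.to_rgb(color)
    return mcolors.to_hex((r + (1 - r) * amount, g + (1 - g) * amount, b + (1 - b) * amount))


highlight = "#306998"
context = lighten(highlight, 0.6)  # pale tint of the same hue for supporting lines

# Random trend lines (NOT the programming-trends data); the last column should stand out
demo = pd.DataFrame(np.random.default_rng(0).random((12, 4)).cumsum(axis=0),
                    columns=["A", "B", "C", "D"])
ax = demo.plot(color=[context] * 3 + [highlight])
```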

<img src=images/python-logo.png align=right style='padding:20px'>

### Branding

Does your chart conform to your brand? Above, the colour #306998 was chosen to match the blue used in the Python logo.



There are many websites that can help with colours and branding:

- [Simple hue picker](https://hslpicker.com/)
- [Generate key colours from an image](https://imagecolorpicker.com/)
- [Find images that match a chosen colour palette](https://labs.tineye.com/)
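Once you have your brand colours, you can make them the default colour cycle so every line or bar picks them up automatically. This sketch uses matplotlib's `rc_context` with a `cycler`; the four hex codes are a hypothetical palette to be replaced with your own.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs outside a notebook
import matplotlib.pyplot as plt
from matplotlib.colors import to_hex
from cycler import cycler

# Hypothetical four-colour brand palette; replace with your own hex codes
brand_colors = ["#306998", "#FFD43B", "#646464", "#A0A0A0"]

# Temporarily make the brand palette the default colour cycle
with plt.rc_context({"axes.prop_cycle": cycler(color=brand_colors)}):
    fig, ax = plt.subplots()
    for i in range(len(brand_colors)):
        ax.plot([0, 1], [i, i + 1], lw=3)  # each new line takes the next brand colour

used = [to_hex(line.get_color()) for line in ax.get_lines()]
print(used)
```

Setting `plt.rcParams["axes.prop_cycle"]` directly instead of using `rc_context` would apply the palette to every chart for the rest of the session.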

<a id='ex-branding'></a>

## <mark>Exercise: Branding</mark>

There are two parts to this exercise:

1. Get the four main colours of your company's logo
2. Use these colours to find up to 5 suitable photos to use in a presentation

**Bonus:** Change the cell above to a markdown cell and include your colours in it.

E.g. <font color=#306998>This blue is used in the Python logo</font>
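If you have several colours to show, you can generate the markdown rather than typing each `<font>` tag by hand. `swatch_markdown` is a hypothetical helper and the palette is a placeholder; paste the printed text into a markdown cell.

```python
# Hypothetical brand palette: name -> hex code (replace with your logo's colours)
palette = {
    "Brand blue": "#306998",
    "Brand yellow": "#FFD43B",
    "Dark grey": "#646464",
    "Light grey": "#A0A0A0",
}


def swatch_markdown(palette):
    """Build one markdown list item per colour, using the same <font color=...> trick as above."""
    return "\n".join(
        f"- <font color={hex_code}>&#9632; {name} ({hex_code})</font>"
        for name, hex_code in palette.items()
    )


snippet = swatch_markdown(palette)
print(snippet)
```

In a notebook, `IPython.display.Markdown(snippet)` should also render it straight from a code cell.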

<a id='drawing'></a>

---

## Draw the eye of the reader

There is a visual hierarchy whenever we look at something, whether it's an image, the view out of a window or a graphic. We can control that hierarchy and make sure our viewers' attention goes to the right place.

With some graphs, it's obvious where the eye will be drawn:

In [None]:
schiphol = pd.read_csv('data/schiphol-passengers.csv', index_col='date', parse_dates=['date'])
(
    schiphol
    .loc['2009':]
    .plot()
);

But in some cases it might not be immediately obvious... 

In [None]:
# Count sets per year and keep years up to 2020 (the year index is numeric, so slice with an int)
lego = pd.read_csv('data/lego.csv')['year'].value_counts().sort_index().loc[:2020]
ax = lego.plot(title='Number of sets produced by Lego')

Let's add a focal point for when Jorgen became the CEO and started to shake things up:

In [None]:
from collections import namedtuple

Focus = namedtuple('Focus', ['year', 'info'])
focus = Focus(2004, 'Jorgen Vig Knudstorp \nbecomes CEO')

The simplest option is to add a **vertical line** and some **text**:

In [None]:
ax = lego.plot(title='Number of sets produced by Lego')

ax.axvline(focus.year, color='k', ls='--', lw=1)  # dashed marker at the focal year
ax.text(focus.year*1.001, 200, focus.info);       # label just to the right of the line

Or add some **color**:

In [None]:
lego.loc[:focus.year].plot()
lego.loc[focus.year:].plot(
    title='Lego ramps up number of sets after \nJorgen Vig Knudstorp joins as CEO'
);
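The cell above relies on pandas drawing both slices onto the current axes. Passing `ax=` explicitly makes that intent clear, and greying out the "before" period pushes the eye further towards the "after" period. The series below is hypothetical, not the Lego data.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs outside a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical yearly counts (NOT the Lego data)
counts = pd.Series([120, 118, 110, 105, 112, 130, 150, 175, 205, 240],
                   index=range(2000, 2010))
split_year = 2004

fig, ax = plt.subplots()
# .loc slicing is inclusive, so both segments contain split_year and join without a gap
counts.loc[:split_year].plot(ax=ax, color="lightgrey")
counts.loc[split_year:].plot(ax=ax, color="tab:blue", lw=2.5)
ax.axvline(split_year, color="k", ls="--", lw=1)
ax.set_title("Hypothetical output climbs after the change in 2004")
```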

With a simple addition of a vertical line or change in colour, the eye is now drawn to the key message of the story.

Of course, there are more elements than just **text** or **color** that we could add to a chart, but these are usually enough to get the point across. We don't want to add too many characters in case we lose the audience!
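One extra element that is often worth its place is an annotation arrow, which ties a label to a specific data point rather than to a whole year. This sketch uses matplotlib's `Axes.annotate` on hypothetical data; the label text and positions are placeholders.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs outside a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical yearly counts (NOT the Lego data)
counts = pd.Series([120, 118, 110, 105, 112, 130, 150, 175, 205, 240],
                   index=range(2000, 2010))
event_year, label = 2004, "New CEO\n(hypothetical)"

fig, ax = plt.subplots()
counts.plot(ax=ax, color="tab:blue")

# Point an arrow at the value for event_year, with the label placed up and to the left
ax.annotate(
    label,
    xy=(event_year, counts.loc[event_year]),
    xytext=(event_year - 3, counts.max() * 0.8),
    arrowprops=dict(arrowstyle="->", color="k"),
)
```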

<a id='ex-all'></a>

---

## <mark>Assignment: Putting it all together</mark>

<img src=images/friendzy.png align=right width=200px>

It's Friday afternoon, and you receive the following email from your dear stakeholder:

    Dear Data Storyteller,

    I've just received last year's revenue for all areas of the company. Looks like we have some good revenues!
    Last year our average revenue was 10% lower, so anything above that is great in my opinion! And what good news for our new companies: Josie Saws, Golden Kez Topic and Samuelifant Graveyard!
    
    Could you prepare a graph to let the big cheeses know our good news and also how we can improve for next year? 

    Thanks!

    Love from your dear stakeholder
    
    InAFrendzy Inc.
    


Here is the data:

In [None]:
sales = pd.read_csv('data/exercises/sales-data.csv')

The stakeholder has had a go, and here's the chart they created. Let's make it a little better, shall we?

### <mark>Improve the chart!</mark>

In [None]:
sales.plot(x='Company', y='Sales', rot=0, figsize=(12,7), title='Sales')

**Not sure where to start?** Work through these steps to improve the chart:

1. Isn't a bar chart a better way to show discrete information??
2. I prefer to read things horizontally
3. Didn't your stakeholder say the average for last year is 10% lower than what we see here? What is that value?
4. Wouldn't it be great if we could show which bars were greater than our benchmark?
5. Oh dear stakeholder... nice try on the title... and where are your labels?
6. Hmm this doesn't really fit in with our company logo... can we change that?
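If steps 3 and 4 have you stuck, here is one possible reading: the email says last year's average was 10% lower than this year's, so the benchmark is 90% of the current mean. The company names and figures below are hypothetical (the real data is in `data/exercises/sales-data.csv`), and this is only a sketch of those two steps, not the full solution.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs outside a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical figures, NOT the exercise data
sales = pd.DataFrame({
    "Company": ["Alpha", "Beta", "Gamma", "Delta"],
    "Sales": [120, 80, 100, 100],
})

# Assumed reading of the email: last year's average = this year's average minus 10%
benchmark = sales["Sales"].mean() * 0.9

# Colour bars above the benchmark; mute the rest
above = sales["Sales"] > benchmark
colors = above.map({True: "tab:green", False: "lightgrey"}).tolist()

fig, ax = plt.subplots()
ax.barh(sales["Company"], sales["Sales"], color=colors)
ax.axvline(benchmark, color="k", ls="--", lw=1)
```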

**Answers**: Uncomment the following and run to see a *potential solution* in matplotlib:

In [None]:
# %load answers/putting-it-all-together.py

---
<img src=images/conclusion.png align=right>
          
## Conclusion

With the People P, we are really considering how the chart can tell a story by itself. What elements can I add so that the chart tells its story without someone needing to explain what is happening?

Key characters include:

- Units & ranges
- Labelling of axes
- Title
- Colouring
- Any additional lines or points
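As a quick check before sharing a chart, you could ask the Axes which basic characters it is still missing. `missing_characters` is a hypothetical helper that only covers the title and axis labels; units, ranges and colour still need a human eye.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs outside a notebook
import matplotlib.pyplot as plt


def missing_characters(ax):
    """List which basic 'characters' (title, axis labels) an Axes is missing."""
    checks = {
        "title": ax.get_title(),
        "x-axis label": ax.get_xlabel(),
        "y-axis label": ax.get_ylabel(),
    }
    return [name for name, text in checks.items() if not text.strip()]


fig, ax = plt.subplots()
ax.plot([1, 2, 3], [2, 4, 3])
before = missing_characters(ax)

ax.set_title("Hypothetical chart")
ax.set_xlabel("Day")
after = missing_characters(ax)

print(before)  # ['title', 'x-axis label', 'y-axis label']
print(after)   # ['y-axis label']
```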

Always ask yourself: **What is the first character my audience will be introduced to in my plot?**

In other words: **What do I want the audience to see & conclude first from my plot?**

