### GESIS Fall Seminar in Computational Social Science 2022
### Introduction to Computational Social Science with Python
# Day 5-1: Basics of Visualisation

## Overview

* Understanding plot elements
* Choosing the right chart
* Principles of colour
* Approaches going forward

## Understanding plot elements
* Title, subtitle
* Annotations
* Caption
* Axes
* Gridlines
* Data
    - Lines, points, areas, colours, associated errors
* Legend

# TODO: Figure(s) here
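
Until a figure is added here, the elements listed above can be sketched in a single matplotlib plot. This is only a minimal illustration: the data, labels, and caption text are invented, and `matplotlib` is assumed to be installed.

```python
import matplotlib.pyplot as plt

# Hypothetical data, invented purely for illustration
years = [2018, 2019, 2020, 2021, 2022]
group_a = [12, 15, 14, 19, 23]
group_b = [10, 11, 13, 12, 14]

fig, ax = plt.subplots(figsize=(7, 4.5))

# Data: one line per group, each with a label for the legend
ax.plot(years, group_a, marker="o", label="Group A")
ax.plot(years, group_b, marker="s", label="Group B")

# Title (figure level) and subtitle (axes level)
fig.suptitle("Participation over time", fontsize=14)
ax.set_title("Two hypothetical groups, 2018-2022", fontsize=10)

# Axes: name and units, with one tick per year
ax.set_xlabel("Year")
ax.set_ylabel("Participants (thousands)")
ax.set_xticks(years)

# Gridlines, drawn behind the data
ax.grid(True, linestyle=":", linewidth=0.8)
ax.set_axisbelow(True)

# Annotation: a few words pointing at a key value
ax.annotate("Peak", xy=(2022, 23), xytext=(2020.4, 22),
            arrowprops={"arrowstyle": "->"})

# Legend, placed in a corner that contains no data
ax.legend(loc="upper left")

# Caption: longer description, placed below the axes
fig.subplots_adjust(top=0.85, bottom=0.2)
fig.text(0.5, 0.03,
         "Figure 1: Hypothetical participation counts for two groups.",
         ha="center", fontsize=8)
```

In a notebook the figure is displayed when the cell finishes; in a script, `plt.show()` or `fig.savefig(...)` would be needed.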

### Title, subtitle
* A simple description of the figure.
* Subtitles can be used for extra detail, or to describe different subsets/facets of the data.
* Try to avoid repeating information already given in the axis labels/annotations.

### Caption
* A longer textual description of the figure.
* The figure + caption should be fully interpretable if removed from the article it is in.

### Annotations
* Short annotations can be used to emphasise key parts of a plot.
* Anything more than a few words should be included in the figure subtitle/caption.

### Axes
* The spatial dimensions and scales used to visualise data.
* Typically X axis (horizontal) and Y axis (vertical), but more are possible (Z, polar).
* Should be clearly labelled with name, units, and numeric scale.
* Be aware of different axis scales: linear, inverted, logarithmic, ...
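
As a rough illustration of how much the choice of axis scale changes a plot, the sketch below draws the same invented values on linear, logarithmic, and inverted y axes. The data and labels are assumptions made for the example.

```python
import matplotlib.pyplot as plt

# Hypothetical values spanning several orders of magnitude
x = [1, 2, 3, 4, 5, 6]
y = [1, 10, 100, 1_000, 10_000, 100_000]

fig, axes = plt.subplots(1, 3, figsize=(10, 3))

# Linear scale (the default): small values are squashed near zero
axes[0].plot(x, y, marker="o")
axes[0].set_title("Linear y axis")

# Logarithmic scale: equal distances represent equal ratios
axes[1].plot(x, y, marker="o")
axes[1].set_yscale("log")
axes[1].set_title("Logarithmic y axis")

# Inverted scale: larger values are drawn lower down
axes[2].plot(x, y, marker="o")
axes[2].invert_yaxis()
axes[2].set_title("Inverted y axis")

for ax in axes:
    ax.set_xlabel("x (units)")
    ax.set_ylabel("y (units)")

fig.tight_layout()
```

Stating the scale in the panel title or axis label, as done here, is one way to make sure viewers notice a non-linear or inverted axis.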

### Gridlines
* Lines on a plot indicating specific axis values.
* Help to make the plot more readable: viewers can more easily compare values against gridlines than against the axis.

### Data
* The visual representation of the data.
* Many different forms of visualisation.
* Ensure data is appropriately scaled and all datapoints are clearly represented.

### Legend
* A part of the plot used to describe different groups of data, typically represented by different colour/size/shape.
* A legend should be used when different groups/series of data are used.
* A legend should not obscure (important) data on the plot.
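
One possible way to keep a legend from covering data is to place it outside the axes. The sketch below assumes matplotlib and uses invented data; the anchor coordinates and the margin value are illustrative choices, not recommended defaults.

```python
import matplotlib.pyplot as plt

# Hypothetical data: two groups that together fill most of the plot area
x = list(range(10))
rising = [2 * v for v in x]
falling = [18 - 2 * v for v in x]

fig, ax = plt.subplots(figsize=(7, 4))
ax.scatter(x, rising, marker="o", label="Group A")
ax.scatter(x, falling, marker="^", label="Group B")
ax.set_xlabel("x (units)")
ax.set_ylabel("y (units)")

# Pin the legend's upper-left corner just outside the right edge of the axes
legend = ax.legend(title="Group", loc="upper left", bbox_to_anchor=(1.02, 1.0))

# Shrink the axes so the legend fits inside the figure
fig.subplots_adjust(right=0.78)
```

Using both colour and marker shape for the groups means the legend stays readable in greyscale.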

## Choosing the right chart
* Data
* Audience
* Message

### Data
* The styles of visualisation available depend on the data at hand.
    - Relational, distributional, comparative, compositional, ... (and combinations thereof)
* Often several options are available, but not all of them are good options!
* The [Data Visualisation Catalogue](https://datavizcatalogue.com/) and [Data Viz Project](https://datavizproject.com/) are excellent libraries for different kinds of visualisations.
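
As a starting point, the four kinds of data mentioned above can each be paired with one common chart type. The pairing below is only one reasonable choice among many, and all values are randomly generated or invented for the example.

```python
import random

import matplotlib.pyplot as plt

random.seed(0)  # reproducible hypothetical data

values = [random.gauss(50, 10) for _ in range(200)]
x = [random.uniform(0, 10) for _ in range(50)]
y = [2 * xi + random.gauss(0, 2) for xi in x]
categories = ["A", "B", "C"]
counts = [23, 17, 35]
part_one = [10, 7, 20]
part_two = [13, 10, 15]

fig, axes = plt.subplots(2, 2, figsize=(8, 6))

# Distributional: how one variable is spread out
axes[0, 0].hist(values, bins=20)
axes[0, 0].set_title("Distribution: histogram")

# Relational: how two variables vary together
axes[0, 1].scatter(x, y)
axes[0, 1].set_title("Relationship: scatter plot")

# Comparative: one value per category
axes[1, 0].bar(categories, counts)
axes[1, 0].set_title("Comparison: bar chart")

# Compositional: parts stacked to make up each total
axes[1, 1].bar(categories, part_one, label="Part 1")
axes[1, 1].bar(categories, part_two, bottom=part_one, label="Part 2")
axes[1, 1].set_title("Composition: stacked bar chart")
axes[1, 1].legend()

fig.tight_layout()
```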

### Audience
* Yourself - exploratory data analysis
    - Quick and easy
    - Explore all facets of data (extremes of data, errors, transformations)
* Academic papers
    - The "best" version of your figure - optimising information density and simplicity for an expert reader
    - Figure with caption should stand on its own
    - Detailed reference, interpretation, and discussion of figures in body of paper
* Academic presentations
    - BIG labels
    - Emphasise key points (figure can be dynamic)
* Media / public 
    - Jargon removed
    - Instructive labels often required 
    - Nuance/technicality sometimes sacrificed in order to convey the key point

### Message
* What point is the visualisation trying to convey?
* Is it self-evident from the figure, or can it be supported with labels/captions?
* How can scale, colour, shape, etc. be used to emphasise particular points?

## Principles of colour
* Palettes
* Hue
* Luminance
* Colourblindness
# TODO

### Palettes

### Hue

### Luminance

### Colourblindness

## Approaches going forward
* Best practices
* Dataviz crimes(?)

### Best practices
#### Do
* Title and label all figures, axes, scales
* Include full axis scales
* Include indications of error/uncertainty where appropriate
* Ensure colours are differentiable (think about colourblindness, B&W print)
* Ensure colours are representative of the subject if applicable (where possible)
* Maximise "data-ink ratio"
* Follow norms and conventions
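
One rough way to check whether colours would stay differentiable in B&W print is to compare their luminance. The sketch below uses the relative luminance formula published in WCAG 2.x, which is an assumption brought in for this example rather than something the seminar prescribes. The palette is hypothetical.

```python
def relative_luminance(hex_colour):
    """Relative luminance of an sRGB colour (0 = black, 1 = white), WCAG 2.x definition."""
    hex_colour = hex_colour.lstrip("#")
    # Split "rrggbb" into three channels scaled to the range 0-1
    channels = [int(hex_colour[i:i + 2], 16) / 255 for i in (0, 2, 4)]
    # Undo the sRGB gamma encoding to get linear light values
    linear = [
        c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4
        for c in channels
    ]
    red, green, blue = linear
    # Weighted sum reflecting how sensitive the eye is to each primary
    return 0.2126 * red + 0.7152 * green + 0.0722 * blue


# Hypothetical palette to inspect
palette = {"red": "#ff0000", "green": "#00ff00", "blue": "#0000ff", "white": "#ffffff"}

# Print colours from darkest to lightest
for name, colour in sorted(palette.items(), key=lambda item: relative_luminance(item[1])):
    print(f"{name:>6}: {relative_luminance(colour):.3f}")
```

Colours with very similar luminance will look alike in greyscale even if their hues differ, so a palette whose values are well spread out is more likely to survive B&W printing. This check says nothing about specific forms of colourblindness.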

#### Don't
* Omit data without good reason and communication to viewer
* Use extra dimensions like colour, shape if they do not represent anything
* Rely on comparisons of area - humans are not good at judging it
* Overcrowd a single plot with data - facets can be useful
* Use static 3D figures in a 2D medium
* Waste time perfecting your figures before finishing the analysis
* Deliberately mislead - often the "worst" graphics are produced by those with the best understanding of visualisation
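
The suggestion to use facets instead of overcrowding a single plot can be sketched with matplotlib subplots. The series names and values below are invented; sharing the axes across panels is one design choice that keeps the facets directly comparable.

```python
import matplotlib.pyplot as plt

# Hypothetical monthly series for four groups
months = list(range(1, 13))
series = {
    "North": [5, 6, 7, 9, 11, 14, 16, 15, 12, 9, 7, 5],
    "South": [9, 10, 12, 14, 17, 20, 22, 22, 19, 15, 12, 10],
    "East": [4, 5, 7, 10, 13, 16, 18, 17, 14, 10, 6, 4],
    "West": [7, 7, 8, 10, 12, 14, 15, 15, 13, 11, 9, 7],
}

# One panel (facet) per series, with shared axes for comparability
fig, axes = plt.subplots(1, len(series), figsize=(10, 3), sharex=True, sharey=True)

for ax, (name, values) in zip(axes, series.items()):
    ax.plot(months, values)
    ax.set_title(name)
    ax.set_xlabel("Month")
    ax.grid(True, linestyle=":")

# With a shared y axis, only the first panel needs the label
axes[0].set_ylabel("Value (units)")

fig.tight_layout()
```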

#### Be flexible
* Sometimes rules need to be broken, e.g.:
    - Principles of differentiable / representative colours are sometimes incompatible.
    - Starting axes at 0 is not always best, especially when emphasising consequential absolute changes that are small in relative terms (e.g. global temperature changes).
* Understand your audience: ask someone to preview your figure. If they don't understand it, find out why.
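
The point about axes not always starting at 0 can be illustrated by plotting the same series twice with different y limits. The temperature values below are invented for the example and are not real measurements.

```python
import matplotlib.pyplot as plt

# Hypothetical temperature series, invented for illustration only
years = [1980, 1990, 2000, 2010, 2020]
temperature = [14.2, 14.4, 14.5, 14.7, 15.0]

fig, (ax_zero, ax_zoom) = plt.subplots(1, 2, figsize=(9, 3.5))

# Left: y axis starts at 0, so the change is almost invisible
ax_zero.plot(years, temperature, marker="o")
ax_zero.set_ylim(0, 16)
ax_zero.set_title("y axis starting at 0")

# Right: y axis limited to the range of the data, so the change is clear
ax_zoom.plot(years, temperature, marker="o")
ax_zoom.set_ylim(14.0, 15.2)
ax_zoom.set_title("y axis zoomed to the data")

for ax in (ax_zero, ax_zoom):
    ax.set_xlabel("Year")
    ax.set_ylabel("Temperature (°C)")
    ax.grid(True, linestyle=":")

fig.tight_layout()
```

Which version is appropriate depends on whether a change of this size matters in context. Labelling the y axis clearly lets viewers see that it has been truncated.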

<!-- #### Keep learning
* Active communities focussed on data visualisation
    -
    - 
    - Be wary of "gurus" constantly selling courses -->
    



### Dataviz crimes(?)

#### Non-representative visualisations
![GE1997](figs/GE1997.svg "GE1997")
UK general election results 1997. This was a landslide majority victory for Labour (red), but the area of each colour on the map does not reflect the number of constituencies won.

#### Uninterpretable plots
![pie](figs/badpie.png "pie")
Pie charts are bad. 3D pie charts are worse. Exploded 3D pie charts are unforgivable. 


#### Misleading figures
Purdue Pharma promoted their drug OxyContin in the US using the figure below. It supposedly illustrates that their long-acting opioid doesn't produce the highs and lows of short-acting opioids, and is therefore less addictive. What is wrong with it?

![Oxycontin1](figs/oxy1.jpg "Oxycontin1")

Not only are the y axes unlabelled, but the curve for their long-acting opioid is based on a figure using a *logarithmic* y-axis, which compresses the top of the curve. The true data is below.

![Oxycontin2](figs/oxy2.png "Oxycontin2")

This criminal visualisation contributed to a guilty plea in a 2007 legal case which cost the company $600 million in fines. The company and the Sackler family have since paid billions more in related settlements.
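
The compression caused by a logarithmic axis, as described above, can be quantified by asking how far up the axis a given value is drawn. The numbers below are hypothetical and are not taken from the Purdue figure; they only show the general effect.

```python
import math


def linear_position(value, low, high):
    """Fraction of the axis height at which `value` is drawn on a linear axis."""
    return (value - low) / (high - low)


def log_position(value, low, high):
    """Fraction of the axis height at which `value` is drawn on a log10 axis."""
    return (math.log10(value) - math.log10(low)) / (math.log10(high) - math.log10(low))


# Hypothetical axis range and curve values (arbitrary units)
low, high = 1, 100
peak, trough = 100, 10

for name, value in [("peak", peak), ("trough", trough)]:
    print(
        f"{name:>6}: linear {linear_position(value, low, high):.2f}, "
        f"log {log_position(value, low, high):.2f}"
    )
```

With these values the trough is drawn about 9% of the way up the linear axis but half way up the logarithmic one, so a tenfold drop from the peak looks far gentler on the log scale.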

#### Embracing 'bad' dataviz
[Billionaire wealth to scale](https://mkorostoff.github.io/1-pixel-wealth/)
* The data isn't all presented at once: it takes significant scrolling to view the entire figure.
* This is a feature, not a bug - it conveys the enormous scale of billionaire wealth.
* As such, the figure is well designed for the audience and web format.