<img src="http://imgur.com/1ZcRyrc.png" style="float: left; margin: 20px; height: 55px">

## Principles of Data Visualisation With Python

---

# Agenda

#### Data Visualisation: Communication & Exploration
#### Matplotlib coding run-through (quick!!)
#### Titanic dataset EDA (also quick!!!)
#### BREAK
#### Additional Seaborn & Plotly tutorials - peruse at own leisure
#### Coding Practice: Matplotlib, Election Data EDA. (Optional Seaborn exercises)

#### Navigation

[Part 1: Data Visualisation for Communication](#part1)

[Part 2: Data Visualisation for Exploration](#part2)

# Why do we bother visualising data?


- human brains are wired to process more information visually

- a universal way to convey information

- attractive to look at

- often the best way to describe/analyse data **as the amount of data increases**

### Purposes of Data Visualisation

#### Communication

#### Exploration

### Learning Objectives

#### Part 1

- Describe why data visualisation is important.
- Identify the characteristics of a great data visualisation.

<a id="part1"> </a>

# Data Visualisation for Communication

### Bad Data Visualisation

<table style="border-width:0px">
    <tr style="border-width:0px"><td style="border-width:0px"><img src="assets/images/bad_datavis_1.png" /></td><td style="border-width:0px"><img src="assets/images/bad_datavis_2.png" /></td></tr>
    <tr style="border-width:0px"><td style="border-width:0px"><img src="assets/images/bad_datavis_3.png" /></td><td style="border-width:0px"><img src="assets/images/bad_datavis_4.png" /></td></tr>
</table>

![](assets/images/worst_piechart.jpg)

<img src="assets/images/hotdogs.jpg" style="width:60%" />

# Good DataVis?

![](assets/images/economist-map.JPG)

For good examples of DataVis you can try:

- [Information is Beautiful](https://www.informationisbeautifulawards.com)
- [FlowingData](http://flowingdata.com)

## What are some visual attributes that we can use to visualise data?

That is, how can we visually convey that two things in our data are different?

Let's take a look at what Jeffrey Shaffer, who teaches data visualisation at the University of Cincinnati, thinks:

![](assets/images/data%20attributes.png)

Which ones do you think are easier/harder for humans to perceive?

Interestingly, some attributes have more of an effect on our brains than others. The ones we tend to focus on most are position, then colour, then size.

# Colour

Generally, in data visualisations, you’re going to use colour in one of **three** ways.

## Sequential

Sequential colours are used to show values ordered from low to high.

<img src="assets/images/sequential.png" style="width:55%" />

Which of the types of data (nominal, ordinal, interval, ratio) would this be suitable for?

## Divergent

Divergent colours are used to show ordered values that have a critical midpoint, like an average or zero.

<img src="assets/images/divergent.png" style="width: 45%" />

Which of the types of data (nominal, ordinal, interval, ratio) would this be suitable for?

## Categorical

Categorical colours are used to distinguish data that falls into distinct groups.

<img src="assets/images/categorical.png" style="width: 50%" />

Which of the types of data (nominal, ordinal, interval, ratio) would this be suitable for?
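As a rough sketch of how the three colour schemes look in Matplotlib, the snippet below draws one built-in colormap for each. The specific colormaps (`viridis`, `RdBu`, `tab10`) are picked here for illustration only and are not prescribed by the slides; Matplotlib's own documentation calls the last two groups "diverging" and "qualitative".

```python
import numpy as np
import matplotlib.pyplot as plt

# One built-in Matplotlib colormap per scheme (an illustrative choice, not the only option).
schemes = {"Sequential": "viridis", "Divergent": "RdBu", "Categorical": "tab10"}

# A single row of values from 0 to 1, stretched into a colour bar by imshow.
gradient = np.linspace(0, 1, 256).reshape(1, -1)

fig, axes = plt.subplots(nrows=len(schemes), figsize=(6, 2.5))
for ax, (label, cmap_name) in zip(axes, schemes.items()):
    ax.imshow(gradient, aspect="auto", cmap=plt.get_cmap(cmap_name))
    ax.set_ylabel(label, rotation=0, ha="right", va="center")
    ax.set_xticks([])
    ax.set_yticks([])
fig.tight_layout()
```

Note how `tab10` shows up as ten distinct blocks rather than a smooth gradient: it has no implied order, which is what makes it suitable for distinct groups.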

<a id="part2"> </a>

# Data Visualisation for Exploration

### Learning Objectives
- Describe when you would use a bar chart, pie chart, line chart, and scatter plot
- Practise creating plots of your data using `matplotlib` & `seaborn`

<a id='anscombe'></a>

Below are the summary statistics for four plots. What do you think the visualisation for each plot would look like? 

![summary statistics for four different plots](assets/images/anscombe_dataset.png)

### Anscombe's Quartet

You can probably already guess what the answer is: although the four plots have the same summary statistics, they are actually completely different. 

This can be seen when we visualize them together. 



<img src="assets/images/anscombe.png" style="width:70%" />

These descriptive statistics come from a data set constructed in 1973 by the statistician Francis Anscombe. It is a classic demonstration of the importance of data visualization.

- It highlights the failures of summary statistics.

- It shows the effect of outliers on statistical properties.

- Anscombe's intention was to attack the impression among statisticians that "numerical calculations are exact, but graphs are rough."
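To check the claim numerically, here is a minimal sketch using only the Python standard library. The values are Anscombe's published numbers, typed in here as an assumption because the notebook only shows them as an image; the `pearson` helper is written for this example and is not part of any library.

```python
from math import sqrt
from statistics import mean, variance

# Anscombe's quartet: datasets I-III share the same x values.
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    "I": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))


# (mean x, variance x, mean y, variance y, correlation) for each dataset
summary = {}
for name, (xs, ys) in quartet.items():
    summary[name] = (mean(xs), variance(xs), mean(ys), variance(ys), pearson(xs, ys))
    print(name, [round(v, 2) for v in summary[name]])
```

The four printed rows agree to about two decimal places, even though the scatter plots above look nothing alike.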

<a id='chart_choice'></a>

# Choosing the Right Chart

### Bar Charts

Bar charts make it easy to compare information, revealing highs and lows quickly.

Bar charts are most effective when you have numerical data that splits neatly into different categories.

![](./assets/images/bar%20chart.png)
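A minimal Matplotlib sketch of a bar chart might look like the following. The category names and counts are made up purely for illustration and do not come from any dataset used in this course.

```python
import matplotlib.pyplot as plt

# Made-up illustrative data: one numerical value per category.
categories = ["A", "B", "C", "D"]
counts = [23, 45, 12, 36]

fig, ax = plt.subplots()
bars = ax.bar(categories, counts)  # one bar per category
ax.set_xlabel("Category")
ax.set_ylabel("Count")
ax.set_title("Counts by category")
```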

### Pie Charts

Pie charts are only useful to show relative proportions or percentages of information, but are both **overused** and **misused**.

After 2-3 slices pies become useless. 

Best to avoid them entirely.

### The Best Use of a Pie Chart

![](http://i.imgur.com/uhTf6Ek.jpg)

### Scatter Plots

Scatter plots are a great way to give you a sense of trends, concentrations, and outliers.

![](./assets/images/scatter%20plot.png)
[Scatter plot via Wikibooks](https://en.wikibooks.org/wiki/Statistics/Displaying_Data/Scatter_Graphs)
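As a small sketch, a scatter plot can be drawn with `Axes.scatter`. The data below is synthetic (a linear trend plus random noise, with a fixed seed) and is only assumed here to have something to plot.

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic data: y follows x linearly, with Gaussian noise added.
rng = np.random.default_rng(seed=0)
x = rng.uniform(0, 10, size=100)
y = 2 * x + rng.normal(0, 2, size=100)

fig, ax = plt.subplots()
points = ax.scatter(x, y, alpha=0.7)  # slight transparency helps show concentrations
ax.set_xlabel("x")
ax.set_ylabel("y")
```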

### Line Charts

Line charts are used when your data has a temporal element, such as values recorded over time.

![](assets/images/xkcd-chart.png)

[xkcd #418](https://xkcd.com/418)
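A line chart in Matplotlib could be sketched as below. The monthly values are invented for illustration; in practice the x-axis would usually hold dates or timestamps from your data.

```python
import matplotlib.pyplot as plt

# Made-up illustrative values for twelve consecutive months.
months = list(range(1, 13))
values = [12, 15, 14, 18, 21, 25, 27, 26, 22, 19, 15, 13]

fig, ax = plt.subplots()
(line,) = ax.plot(months, values, marker="o")  # points joined in time order
ax.set_xlabel("Month")
ax.set_ylabel("Value")
```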

### Histograms 

Histograms are useful when you want to see how the values of a numerical variable are distributed across bins (ranges of values).

![](./assets/images/histogram%20chart.png)
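Here is a minimal sketch of a histogram with `Axes.hist`. The sample is synthetic (normally distributed, fixed seed), and the choice of 20 bins is an arbitrary assumption worth experimenting with.

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic sample of 500 values centred on 50.
rng = np.random.default_rng(seed=1)
values = rng.normal(loc=50, scale=10, size=500)

fig, ax = plt.subplots()
# hist returns the count in each bin, the bin edges, and the drawn bars.
counts, bin_edges, _ = ax.hist(values, bins=20)
ax.set_xlabel("Value")
ax.set_ylabel("Frequency")
```

Changing `bins` can noticeably change the apparent shape of the distribution, so it is worth trying a few settings before drawing conclusions.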


There isn't a better or worse chart type, overall. 

(Except for pie charts.... they're just the worst).

You should consider which one is most appropriate for representing a particular data set.

[Which chart is right for you? (via Tableau)](https://drive.google.com/file/d/0Bx2SHQGVqWasT1l4NWtLclJJcWM/view)

## Visualisation Programming Libraries

In this course, we will mostly use the Python library [Matplotlib](https://matplotlib.org/), but we will also see examples using [Seaborn](https://seaborn.pydata.org/), which has some additional plots and options.

Seaborn provides an API on top of Matplotlib that offers sane choices for plot style and color defaults, defines simple high-level functions for common statistical plot types, and integrates with the functionality provided by Pandas DataFrames.

Matplotlib has recently improved, but Seaborn still has better pandas integration and is very customisable.

Many other Python libraries exist for making visualisations. Some of the most popular include:

- **[Bokeh](http://bokeh.pydata.org/en/latest/):** Python visualisation library that targets the web browser (e.g., in Jupyter). Makes interactive plots, dashboards, data applications, etc.

- **[Plotly](https://plot.ly/):** Python visualisation library, similar to Bokeh. We include a walkthrough of interactive plotting using Plotly in notebook 5.

- **[Graphviz](http://graphviz.readthedocs.io/en/stable/manual.html):** Popular visualization library for graph data structures (i.e., vertices and edges). Has Python extensions.

- **[Basemap](http://matplotlib.org/basemap/):** Python Matplotlib extension for drawing static maps. There are many other Python libraries for plotting geographic data, including [folium](https://github.com/python-visualization/folium).

One of the most popular libraries for interactive visualizations in the web browser is D3. Because web browsers only natively run JavaScript, D3 requires knowledge of JavaScript:

- **[D3.js](https://d3js.org/):** JavaScript library for interactive web visualizations | [Examples](https://github.com/mbostock/d3/wiki/Gallery)

### Other Visualization Tools

Although this course emphasizes a Python approach to data science, a variety of non-programming tools are also used in industry. Often, these tools can be applied much more quickly than creating a custom Python solution. For example:

- **Excel:** For quick data cleaning and simple graphs
- **Power BI:** A suite of business analytics tools
- **Tableau:** Business intelligence and analytics software
- **Periscope Data:** Data analysis platform

