> **Jupyter slideshow:** This notebook can be displayed as slides. To view it as a slideshow in your browser, type the following in the console:


> `> jupyter nbconvert [this_notebook.ipynb] --to slides --post serve`


> To toggle off the slideshow cell formatting, click the `CellToolbar` button, then `View --> Cell Toolbar --> None`.

<img src="http://imgur.com/1ZcRyrc.png" style="float: left; margin: 20px; height: 55px">

## Principles of Data Visualization With Python

_Author:_ Dave Yerrington (San Francisco)

### Learning Objectives

*After this lesson, you will be able to:*
- Describe why data visualization is important.
- Identify the characteristics of a great data visualization.
- Describe when you would use a bar chart, pie chart, scatter plot, and histogram.

<a id='why_data_viz'></a>

### Discussion: Why Use Data Visualization?

---

On Slack, discuss some of the ways you have used or enjoyed data visualization. Why do you think data visualization is useful? Why is it important?

Humans are much better at spotting patterns in a visual display than in a table of numbers. 

<a id='anscombe'></a>

### Anscombe's Quartet

---

Below are the summary statistics for four plots. What do you think the visualization for each plot would look like? 

![summary statistics for four different plots](../assets/images/anscombs%20quartet.png)

![Anscombe's quartet](../assets/images/anscombs%20quartert%20visualization.png)

Lessons:

- Summary statistics don't tell the whole story.
- Outliers have a big effect on statistical properties.
- Visualization is not just about making pretty pictures. It's also about learning how your data is structured so that you can model it properly.
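To check the first lesson yourself, the sketch below recomputes the summary statistics from the published Anscombe (1973) values using only the standard library. The `pearson_r` helper and the variable names are our own, written for illustration.

```python
from statistics import mean, variance

# Anscombe's quartet (Anscombe, 1973). Datasets I-III share the same x values.
x_shared = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    "I": (x_shared, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x_shared, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x_shared, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def pearson_r(x, y):
    """Pearson correlation coefficient, written out to avoid extra dependencies."""
    mean_x, mean_y = mean(x), mean(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    spread_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return covariance / (spread_x * spread_y)


summary = {}
for name, (x, y) in quartet.items():
    summary[name] = {
        "mean_x": mean(x),
        "var_x": variance(x),  # sample variance
        "mean_y": mean(y),
        "var_y": variance(y),
        "r": pearson_r(x, y),
    }
    stats = ", ".join(f"{key}={value:.2f}" for key, value in summary[name].items())
    print(f"{name:>3}: {stats}")
```

The four printed rows are nearly identical even though the four datasets look completely different once plotted.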

<a id='viz_attr'></a>

### Mapping Variables to Attributes

---

![](../assets/images/data%20attributes.png)

We tend to focus on position, then color, then size.

When we visualize data, we are mapping values of variables to visual attributes. Here are some examples of visual attributes.
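As a rough illustration of this mapping, the sketch below uses Matplotlib to assign four variables to four visual attributes in one scatter plot: x position, y position, marker size, and color. The data are made up for five hypothetical cities, and the variable names are ours.

```python
import matplotlib.pyplot as plt

# Made-up values for five hypothetical cities (illustration only).
income = [32, 45, 51, 60, 74]               # mapped to x position
life_expectancy = [71, 74, 78, 80, 83]      # mapped to y position
population = [0.5, 2.0, 1.2, 4.5, 3.0]      # millions, mapped to marker size
unemployment = [9.1, 7.4, 6.0, 5.2, 3.8]    # percent, mapped to color

fig, ax = plt.subplots()
points = ax.scatter(
    income,
    life_expectancy,
    s=[p * 100 for p in population],  # scale up so the markers are visible
    c=unemployment,
    cmap="viridis",
)
fig.colorbar(points, ax=ax, label="Unemployment (%)")
ax.set_xlabel("Median income (thousands)")
ax.set_ylabel("Life expectancy (years)")
ax.set_title("Four variables, four visual attributes (made-up data)")
```

In a notebook, the figure renders at the end of the cell. In a script, call `plt.show()` afterward.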

![sequential](../assets/images/sequential.png)

Let's focus on color for a moment. Generally, in data visualizations, you’re going to use color in one of three ways: sequential, divergent, or categorical. 

Sequential colors are used to show values ordered from low to high.

![divergent](../assets/images/divergent.png)

Divergent colors are used to show ordered values that have a critical midpoint, like an average or zero.

![categorical](../assets/images/categorical.png)

[Images via MediaShift](http://mediashift.org/2016/02/checklist-does-your-data-visualization-say-what-you-think-it-says/)

Categorical colors are used to distinguish data that falls into distinct groups.
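One way to try the three color schemes in Matplotlib is sketched below. The colormap names `viridis`, `RdBu_r`, and `tab10` ship with Matplotlib. The choice of those particular maps and the sample values are our assumptions, not a recommendation from the source.

```python
import matplotlib.pyplot as plt
from matplotlib.colors import TwoSlopeNorm

positions = list(range(5))
baseline = [0] * 5

fig, axes = plt.subplots(1, 3, figsize=(12, 2.5))

# Sequential: values ordered from low to high.
ratings = [1, 2, 3, 4, 5]
axes[0].scatter(positions, baseline, c=ratings, cmap="viridis", s=300)
axes[0].set_title("Sequential")

# Divergent: ordered values with a critical midpoint (here, zero).
anomalies = [-2, -1, 0, 1, 3]
norm = TwoSlopeNorm(vcenter=0, vmin=-2, vmax=3)  # pins zero to the middle color
axes[1].scatter(positions, baseline, c=anomalies, cmap="RdBu_r", norm=norm, s=300)
axes[1].set_title("Divergent")

# Categorical: distinct, unordered groups get clearly different hues.
groups = [0, 1, 2, 0, 1]
tab10 = plt.get_cmap("tab10")
axes[2].scatter(positions, baseline, c=[tab10(g) for g in groups], s=300)
axes[2].set_title("Categorical")

for ax in axes:
    ax.set_yticks([])
```

`TwoSlopeNorm` is used so that the midpoint maps to the neutral center of the colormap even when the data are not symmetric around it.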

<a id='chart_choice'></a>

### Choosing the Right Chart

![](http://www.comicsenglish.com/wp-content/uploads/2013/06/xkcd-stove_ownership.png)


In addition to considering data visualization attributes, you should also carefully choose the type of chart or graph you'll use. Let's look at a few commonly used charts and graphs.

### Bar Charts

![](https://lenagroeger.s3.amazonaws.com/cuny-fall15/MakeThisChart/nytimes-bar-chart.jpg)

Bar charts are great because they make it easy to compare across categories.
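A minimal bar chart sketch in Matplotlib, using made-up survey counts. Sorting the categories by value is our own choice here, since it tends to make the comparison easier to read.

```python
import matplotlib.pyplot as plt

# Made-up survey counts, for illustration only.
favorite_fruit = {"Apple": 23, "Banana": 17, "Cherry": 35, "Mango": 12}

# Sort by count so the ranking is readable at a glance.
ordered = sorted(favorite_fruit.items(), key=lambda item: item[1], reverse=True)
labels = [name for name, _ in ordered]
counts = [count for _, count in ordered]

fig, ax = plt.subplots()
bars = ax.bar(labels, counts)
ax.set_ylabel("Number of respondents")
ax.set_title("Favorite fruit (made-up data)")
```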

### Pie Charts

![](http://i.imgur.com/uhTf6Ek.jpg)

Pie charts are kind of the Comic Sans of data visualization.

They are fine when you just want to show that some piece is a very large or small part of some whole, but even then you are probably better off using a bar chart. If you want to make a lot of comparisons across a lot of categories, use a bar chart instead.

Never ask people to make comparisons across multiple pie charts. People are bad at judging angles.
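To see the problem with judging angles, the sketch below draws the same made-up shares as a pie chart and as a bar chart. The values are deliberately close together and are purely illustrative.

```python
import matplotlib.pyplot as plt

# Made-up market shares (percent), deliberately close together.
shares = {"A": 22, "B": 20, "C": 21, "D": 19, "E": 18}
labels = list(shares)
values = list(shares.values())

fig, (ax_pie, ax_bar) = plt.subplots(1, 2, figsize=(10, 4))

# Pie: the reader has to compare angles.
ax_pie.pie(values, labels=labels)
ax_pie.set_title("Pie: which slice is largest?")

# Bar: the reader compares heights along a common axis.
ax_bar.bar(labels, values)
ax_bar.set_ylabel("Share (%)")
ax_bar.set_title("Bar: same data")
```

Ranking the five slices is hard in the pie chart, while the bar chart shows the order right away.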

### Scatter Plots

![](../assets/images/scatter%20plot.png)

[Scatter plot via Wikibooks](https://en.wikibooks.org/wiki/Statistics/Displaying_Data/Scatter_Graphs)

Scatter plots are a great way to give you a sense of trends, concentrations, and outliers. This will provide a clear idea of what you may want to investigate further. 
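The sketch below generates synthetic data with a linear trend, some noise, and one hand-placed outlier, then draws it with Matplotlib. The variable names and numbers are invented for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)  # fixed seed so the plot is reproducible

# Synthetic data: a linear trend plus noise.
hours_studied = rng.uniform(0, 10, size=50)
exam_score = 50 + 4 * hours_studied + rng.normal(0, 5, size=50)

# Add one hand-placed outlier: many hours studied, very low score.
hours_studied = np.append(hours_studied, 9.0)
exam_score = np.append(exam_score, 20.0)

fig, ax = plt.subplots()
ax.scatter(hours_studied, exam_score, alpha=0.7)
ax.set_xlabel("Hours studied")
ax.set_ylabel("Exam score")
ax.set_title("Trend with one outlier (synthetic data)")
```

The upward trend and the single point far below it are both visible at a glance, which is the kind of cue that tells you what to investigate next.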

### Histograms 

![](../assets/images/histogram%20chart.png)

[Charts and graphs via Tableau](https://drive.google.com/file/d/0Bx2SHQGVqWasT1l4NWtLclJJcWM/view)

Histograms are useful when you want to see how your data are distributed across ranges of values (bins).

A histogram is a plot of frequency against a binned continuous variable, whereas a bar plot is a plot of frequency against a categorical variable.
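The distinction can be sketched side by side in Matplotlib. Below, a synthetic continuous variable is binned with `ax.hist`, while a synthetic categorical variable is counted and drawn with `ax.bar`. The data and names are made up for illustration.

```python
from collections import Counter

import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(42)  # fixed seed so the plot is reproducible

# Synthetic data, for illustration only.
heights_cm = rng.normal(170, 10, size=200)              # continuous variable
size_order = ["S", "M", "L"]
shirt_sizes = rng.choice(size_order, size=200).tolist()  # categorical variable

fig, (ax_hist, ax_bar) = plt.subplots(1, 2, figsize=(10, 4))

# Histogram: Matplotlib bins the continuous values, then counts per bin.
counts, bin_edges, _ = ax_hist.hist(heights_cm, bins=10)
ax_hist.set_xlabel("Height (cm)")
ax_hist.set_ylabel("Frequency")
ax_hist.set_title("Histogram")

# Bar plot: we count each category ourselves, then draw one bar per category.
size_counts = Counter(shirt_sizes)
ax_bar.bar(size_order, [size_counts[size] for size in size_order])
ax_bar.set_xlabel("Shirt size")
ax_bar.set_ylabel("Frequency")
ax_bar.set_title("Bar plot")
```

Changing `bins` changes the shape of the histogram, whereas the bar plot's categories are fixed by the data.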

<a id='visualization_libraries'></a>

### Visualization Programming Libraries

- **[Matplotlib](https://matplotlib.org/)**
    - Produces publication-quality static plots
    - Powerful
    - A little clunky
- **[Seaborn](https://seaborn.pydata.org/)**
    - Extends matplotlib
    - Provides modern themes
    - Provides convenience functions for statistical data visualization
- **[Bokeh](http://bokeh.pydata.org/en/latest/)**
    - Provides interactive plots for the web
    - Has a nicer, more modern interface
- **[Graphviz](http://graphviz.readthedocs.io/en/stable/manual.html)**
    - Visualizes graph data structures (e.g., vertices and edges)
- **[Basemap](http://matplotlib.org/basemap/)**
    - Produces static maps
- **[D3.js](https://d3js.org/)**
    - Used by FiveThirtyEight, The New York Times, and others to create stunning, high-performance, interactive visualizations for the web (e.g., [Visualizing Algorithms](https://bost.ocks.org/mike/algorithms/))
    - Low-level JavaScript library


### Other Visualization Tools

- **Excel:** For quick data cleaning and simple graphs
- **Power BI:** Suite of business analytics tools
- **Tableau:** Business intelligence and analytics software
- **Periscope Data:** Data analysis platform
- **Plotly:** Create charts and dashboards