# Good Plots Bad Plots
### What you gonna do, what you gonna do?

Let's explore some data visualisation! 
This project asks you to use data visualisation libraries to present the included data as well as possible.
Different types of data will be included, requiring different styles of plots. 
It's better to make one plot well than to make many bad plots, so focus on presenting the information as clearly as possible.

There are many useful plotting libraries out there. 
As scientists, we're (hopefully) very familiar with matplotlib, but there are other useful libraries such as seaborn, 
bokeh, plotly and holoviews. 
Feel free to explore these libraries and more when creating plots for this exercise.

In [1]:
import numpy as np
import pandas as pd

## Simple plotting
These first examples will help us explore how to use different libraries to clearly and aesthetically present data.

### 1D plots
We often use line plots to represent time series data.
While a single line plot is straightforward, presenting several related series in a coherent and informative way is harder.
In this exercise we're going to compare measurements of air pollution over time in different districts in Seoul, South Korea.
The data is in data/seoul_air_pollution.csv.
Each district is identified with a unique Station code, and the name is given in the Address column.
How many ways can you come up with to plot the data? Which ways are most informative? 
What relationships can you find within the data?

To start, plot the PM10 concentration (particulate matter up to 10 µm in diameter) for each district as a function of time. 
How is it best to visualise this many datasets?
Compare the PM2.5 data to the PM10 data for at least two of the districts. 

In [2]:
df = pd.read_csv("data/seoul_air_pollution.csv")
df.tail()

| Unnamed: 0 | Measurement date | Station code | Address | Latitude | Longitude | SO2 | NO2 | O3 | CO | PM10 | PM2.5 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 647506 | 2019-12-31 19:00 | 125 | 59, Gucheonmyeon-ro 42-gil, Gangdong-gu, Seoul... | 37.544962 | 127.136792 | 0.003 | 0.028 | 0.013 | 0.5 | 23.0 | 17.0 |
| 647507 | 2019-12-31 20:00 | 125 | 59, Gucheonmyeon-ro 42-gil, Gangdong-gu, Seoul... | 37.544962 | 127.136792 | 0.003 | 0.025 | 0.015 | 0.4 | 25.0 | 19.0 |
| 647508 | 2019-12-31 21:00 | 125 | 59, Gucheonmyeon-ro 42-gil, Gangdong-gu, Seoul... | 37.544962 | 127.136792 | 0.003 | 0.023 | 0.015 | 0.4 | 24.0 | 17.0 |
| 647509 | 2019-12-31 22:00 | 125 | 59, Gucheonmyeon-ro 42-gil, Gangdong-gu, Seoul... | 37.544962 | 127.136792 | 0.003 | 0.04 | 0.004 | 0.5 | 25.0 | 18.0 |
| 647510 | 2019-12-31 23:00 | 125 | 59, Gucheonmyeon-ro 42-gil, Gangdong-gu, Seoul... | 37.544962 | 127.136792 | 0.003 | 0.037 | 0.005 | 0.5 | 27.0 | 18.0 |
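With hourly readings from many stations, plotting every raw series on one axis tends to produce an unreadable tangle. One possible starting point, sketched below, is to average each station to monthly means and draw one small panel per station with shared axes. The function names `monthly_mean_by_station` and `plot_small_multiples` are made up for this illustration, and treating negative readings as invalid is an assumption about the data, not something stated in the exercise.

```python
import pandas as pd
import matplotlib.pyplot as plt


def monthly_mean_by_station(df, column="PM10"):
    """Return monthly means: one row per month, one column per station code."""
    out = df[["Measurement date", "Station code", column]].copy()
    out["Measurement date"] = pd.to_datetime(out["Measurement date"])
    # Assumption (not from the exercise text): negative readings flag
    # invalid measurements, so they are masked before averaging.
    out[column] = out[column].where(out[column] >= 0)
    wide = out.pivot_table(index="Measurement date",
                           columns="Station code", values=column)
    # "MS" labels each bin with the first day of the month.
    return wide.resample("MS").mean()


def plot_small_multiples(wide, ncols=5):
    """Draw one panel per station, sharing both axes for easy comparison."""
    n_series = wide.shape[1]
    nrows = -(-n_series // ncols)  # ceiling division
    fig, axes = plt.subplots(nrows, ncols, sharex=True, sharey=True,
                             figsize=(3 * ncols, 2 * nrows), squeeze=False)
    panels = axes.ravel()
    for ax, station in zip(panels, wide.columns):
        ax.plot(wide.index, wide[station])
        ax.set_title(f"Station {station}", fontsize=8)
    # Hide panels left over when the grid is larger than the station count.
    for ax in panels[n_series:]:
        ax.set_visible(False)
    return fig
```

With the frame loaded above, `plot_small_multiples(monthly_mean_by_station(df))` would give the PM10 overview, and passing `column="PM2.5"` produces the matching table for the PM2.5 comparison. Small multiples are only one option; a heatmap of station against time is another layout worth trying.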


### Histograms
Histograms are very commonly used to present statistical data. 
How would you visualise the x and y distributions here?

In [3]:
# Each row of the file holds one comma-separated (x, y) pair; transpose to unpack the two columns
x, y = np.genfromtxt("data/2d-data.dat", delimiter=",").T
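One way to look at both distributions at once is a 2D histogram with the 1D marginal histograms along its edges. The sketch below uses only matplotlib; the function name `plot_joint_histogram` and the default bin count are choices made for this illustration, not part of the exercise.

```python
import numpy as np
import matplotlib.pyplot as plt


def plot_joint_histogram(x, y, bins=40):
    """2D histogram of (x, y) with marginal 1D histograms on the top and right."""
    fig = plt.figure(figsize=(6, 6))
    grid = fig.add_gridspec(2, 2, width_ratios=(4, 1), height_ratios=(1, 4),
                            wspace=0.05, hspace=0.05)
    ax_joint = fig.add_subplot(grid[1, 0])
    ax_x = fig.add_subplot(grid[0, 0], sharex=ax_joint)
    ax_y = fig.add_subplot(grid[1, 1], sharey=ax_joint)

    # Centre panel: counts per (x, y) cell.
    ax_joint.hist2d(x, y, bins=bins, cmap="viridis")
    # Edge panels: the distribution of each variable on its own.
    ax_x.hist(x, bins=bins, color="0.3")
    ax_y.hist(y, bins=bins, color="0.3", orientation="horizontal")

    ax_joint.set_xlabel("x")
    ax_joint.set_ylabel("y")
    # The marginal panels share axes with the centre, so hide duplicate labels.
    ax_x.tick_params(labelbottom=False)
    ax_y.tick_params(labelleft=False)
    return fig, (ax_joint, ax_x, ax_y)
```

Calling `plot_joint_histogram(x, y)` with the arrays loaded above draws the figure. The bin count is worth experimenting with, since too few bins hide structure and too many turn the plot into noise.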

Let's try with multiple datasets. There are 12 distributions of brain signals that need to be compared.
What are good ways of plotting multiple histograms? 
Are there correlations between any of the networks?

In [4]:
import seaborn as sns
# Load the example dataset of brain network correlations
brain_networks = sns.load_dataset("brain_networks", header=[0, 1, 2], index_col=0)
# Pull out a specific subset of networks
used_networks = [1, 3, 4, 5, 6, 7, 8, 11, 12, 13, 16, 17]
used_columns = (brain_networks.columns.get_level_values("network")
                          .astype(int)
                          .isin(used_networks))
brain_networks = brain_networks.loc[:, used_columns]


### Analysing a dataset
This dataset is taken from the London marathon, and includes columns for the age, gender, split time and final time for each runner.
What are some questions that we can explore from this dataset, and how can you present that information?

In [5]:
marathon_data = pd.read_csv("data/marathon-data.dat")
marathon_data.tail()

| Unnamed: 0 | age | gender | split | final |
|---|---|---|---|---|
| 37245 | 18 | M | 04:24:24 | 09:32:57 |
| 37246 | 36 | M | 04:35:43 | 09:33:28 |
| 37247 | 51 | M | 04:22:35 | 09:33:40 |
| 37248 | 55 | W | 04:58:06 | 10:00:40 |
| 37249 | 58 | W | 04:59:49 | 10:01:08 |
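The split and final columns are read in as text, so a first step for most questions is converting them to numbers. The sketch below converts both to seconds and derives a split fraction that indicates whether a runner sped up or slowed down. It assumes that the split column is the time at the halfway point, which the exercise does not state; the names `add_timing_columns`, `split_frac` and `plot_split_fraction` are made up for this illustration.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt


def add_timing_columns(df):
    """Add split/final times in seconds and a split fraction column."""
    out = df.copy()
    for col in ("split", "final"):
        out[col + "_sec"] = pd.to_timedelta(out[col]).dt.total_seconds()
    # Assumption: "split" is the halfway time. Then 0 means even pacing and
    # values below zero mean the second half was run faster than the first.
    out["split_frac"] = 1 - 2 * out["split_sec"] / out["final_sec"]
    return out


def plot_split_fraction(df, bins=50):
    """Overlay the split fraction distribution of each gender."""
    # Common bin edges so the outlines are directly comparable.
    edges = np.histogram_bin_edges(df["split_frac"], bins=bins)
    fig, ax = plt.subplots()
    for gender, group in df.groupby("gender"):
        ax.hist(group["split_frac"], bins=edges, histtype="step",
                density=True, label=gender)
    ax.axvline(0, color="0.5", linestyle="--")  # even pacing
    ax.set_xlabel("split fraction")
    ax.set_ylabel("density")
    ax.legend(title="gender")
    return fig
```

`plot_split_fraction(add_timing_columns(marathon_data))` would compare pacing between the two gender groups. The same numeric columns also support other questions, such as final time against age.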


## Interactive plots
Tutorials and websites are valuable tools for learning how to interpret data, and interactive plots provide a way for users to visualise more complicated datasets.
In this case, we'll create an interactive plot of a basic quadratic function using the ipywidgets library.
Find out how to create a plot that will let you interactively adjust the A, B and C parameters of the function!


In [14]:
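from ipywidgets import interact
import numpy as np
import matplotlib.pyplot as plt
# Interactive backend for the classic Notebook; JupyterLab needs `%matplotlib widget` (ipympl) instead
%matplotlib notebook

# Sample points and the quadratic to explore
x = np.linspace(-10, 10, 100)
def f(x, A, B, C):
    return A*x**2 + B*x + C

# Draw the initial curve and keep the line handle so it can be updated later
fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)
ax.set_ylim(-10, 10)
line, = ax.plot(x, f(x, A=1, B=1, C=1))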
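One possible way to finish the exercise is to write a callback that overwrites the y data of the existing line and let `interact` build one slider per argument. The sketch below repeats the setup so it runs on its own. The slider ranges are arbitrary choices, and `show_interactive` is a hypothetical wrapper that keeps the widget call out of the way when the code runs outside a notebook.

```python
import numpy as np
import matplotlib.pyplot as plt


def f(x, A, B, C):
    return A * x**2 + B * x + C


x = np.linspace(-10, 10, 100)
fig, ax = plt.subplots()
ax.set_ylim(-10, 10)
line, = ax.plot(x, f(x, A=1, B=1, C=1))


def update(A=1.0, B=1.0, C=1.0):
    """Redraw the existing line for new coefficients instead of re-plotting."""
    line.set_ydata(f(x, A, B, C))
    fig.canvas.draw_idle()


def show_interactive():
    """Call this in a notebook cell to get sliders for A, B and C."""
    # Imported here so the plotting code above also runs without ipywidgets.
    from ipywidgets import interact
    # Each (min, max, step) tuple becomes a float slider; the starting
    # values come from the defaults in update's signature.
    return interact(update, A=(-5.0, 5.0, 0.1), B=(-5.0, 5.0, 0.1),
                    C=(-10.0, 10.0, 0.5))
```

Updating the line in place avoids creating a new figure on every slider move, which keeps the plot responsive. This relies on an interactive matplotlib backend being active in the notebook.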



