# 6. Advanced Visualisation tools

![](images/logo.png)

Hello and welcome to session 6 of Visual Analytics with Python and Power BI, hosted by the DataKirk. Once again, this session is about the visualisation tools available in Python through the Matplotlib module.

If you're working on the server at https://jupyterhub.thedatakirk.org.uk/, all the relevant libraries (Matplotlib, pandas, etc.) should already be installed and ready to use. However, if you're running the code on your own computer (which we do advise at this point, as it will accelerate your learning!), you'll need to make sure that you have installed them. This can be done by opening a command prompt and typing:

```
pip install matplotlib pandas
```

To check whether you have these libraries installed, run the cell below.

```python
import pandas as pd
import matplotlib.pyplot as plt
```

As long as you get no errors, you are good to go! Once the libraries are imported, it's also a good idea to run the line in the cell below:

```python
%matplotlib notebook
```

This will make all Matplotlib plots interactive, so you can pan, zoom and move around within the figure. Once you have run this cell, you should see these options at the bottom of every plot:

![](images/mplnb.png)

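If you don't see them when you begin plotting, try going back and running the cell above again. Note that `%matplotlib notebook` works in the classic Jupyter Notebook interface; in JupyterLab, the equivalent is `%matplotlib widget`, which requires the `ipympl` package.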

Here are some useful links for the session:

1. List of Matplotlib colours: https://matplotlib.org/3.1.0/gallery/color/named_colors.html
2. Google colour picker: https://www.google.com/search?q=color+picker
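Both kinds of colour from the links above can be passed straight to the `color` argument of a plotting function. A minimal sketch with made-up data: one line uses a named colour from the Matplotlib list, the other a hex code of the kind a colour picker gives you (the specific colours chosen here are arbitrary).

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 50)  # made-up x values for illustration

fig = plt.figure()
# A named colour from the Matplotlib colour list
line_named, = plt.plot(x, np.sin(x), color="darkorange", label="named: darkorange")
# A hex code, e.g. copied from a colour picker
line_hex, = plt.plot(x, np.cos(x), color="#1f9e89", label="hex: #1f9e89")
plt.legend()
```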


As last time, this notebook is more or less a blank canvas for you to experiment with different visualisations. Below you will find the code to load several pandas dataframes; how you proceed from there is up to you. Some tips:

1. Always start each new visualisation with `plt.figure()`, so that it gets its own figure instead of being drawn on top of the previous one.
2. To plot a single column against the dataframe index, try `plt.plot(df.index, df['column name'])`.
3. Two columns can be scattered against each other via `plt.scatter(df['col 1'], df['col 2'])`.
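Put together, the three tips look roughly like this. It is a sketch on a small made-up dataframe: `df`, `col 1` and `col 2` are placeholders for whichever dataframe and columns you choose below.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical example data: a daily date index with two made-up columns
dates = pd.date_range("2020-01-01", periods=30, freq="D")
rng = np.random.default_rng(0)
df = pd.DataFrame({"col 1": rng.normal(size=30).cumsum(),
                   "col 2": rng.normal(size=30).cumsum()}, index=dates)

# Tips 1 and 2: a new figure, then one column plotted against the index
fig_line = plt.figure()
plt.plot(df.index, df["col 1"])
plt.xlabel("Date")
plt.ylabel("col 1")

# Tips 1 and 3: a new figure, then two columns scattered against each other
fig_scatter = plt.figure()
plt.scatter(df["col 1"], df["col 2"])
plt.xlabel("col 1")
plt.ylabel("col 2")
```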


# 1. Coronavirus data UK

The first dataset contains information about the spread of coronavirus in Scotland, England, Wales and Northern Ireland. 

Potential analysis suggestions: 

1. To what extent are caseloads correlated amongst the four nations? 
2. What aspects are the same and different about the first and second wave? 
3. How do the four nations compare in the fight against COVID-19?

```python
data_sco = pd.read_csv('../5. Examples of Visual Analytics in Python/data/covid/Corona_Scot.csv', index_col=0, parse_dates=True)
data_wal = pd.read_csv('../5. Examples of Visual Analytics in Python/data/covid/Corona_Wales.csv', index_col=0, parse_dates=True)
data_eng = pd.read_csv('../5. Examples of Visual Analytics in Python/data/covid/Corona_Eng.csv', index_col=0, parse_dates=True)
data_nir = pd.read_csv('../5. Examples of Visual Analytics in Python/data/covid/Corona_NI.csv', index_col=0, parse_dates=True)
```
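One possible starting point for the first question is sketched below, using made-up daily series in place of the four dataframes. The column name `cases` is a hypothetical placeholder, so check `data_sco.columns` for the real one. The sketch lines up one column from each nation on the shared date index with `pd.concat`, computes pairwise Pearson correlations with `DataFrame.corr`, and shows the resulting matrix with `imshow`.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Made-up stand-ins for data_sco, data_eng, data_wal and data_nir (illustration only)
dates = pd.date_range("2020-03-01", periods=120, freq="D")
rng = np.random.default_rng(1)
wave = np.exp(-((np.arange(120) - 40) / 15) ** 2)  # one shared bump in cases
nations = {"Scotland": 1.0, "England": 9.0, "Wales": 0.6, "N. Ireland": 0.4}
frames = {name: pd.DataFrame({"cases": scale * wave + rng.normal(0, 0.02, 120)}, index=dates)
          for name, scale in nations.items()}

CASE_COL = "cases"  # hypothetical column name: check data_sco.columns for the real one

# One column per nation, aligned on the date index (missing dates become NaN)
cases = pd.concat({name: df[CASE_COL] for name, df in frames.items()}, axis=1)

# Pairwise Pearson correlation between the nations
corr = cases.corr()

fig = plt.figure()
plt.imshow(corr.values, vmin=-1, vmax=1, cmap="coolwarm")
plt.colorbar(label="Pearson correlation")
plt.xticks(range(len(corr)), corr.columns, rotation=45)
plt.yticks(range(len(corr)), corr.index)
plt.title("Correlation of daily cases between nations")
plt.tight_layout()
```

Pearson correlation is unaffected by differences in scale, so England's much larger caseload does not dominate the comparison. If the four files cover different date ranges, the missing dates become `NaN` after `concat`, and `corr` excludes them pair by pair.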

# 2. Stock price data

The second dataset contains stock price information since 2012 for the 20 largest companies listed in the United Kingdom.

Potential analysis suggestions: 

1. How has coronavirus affected the stock market in the UK?
2. How correlated are stocks in the UK?
3. Which stocks are the biggest winners and the biggest losers in the last 8 years?

```python
price_data = pd.read_csv('../5. Examples of Visual Analytics in Python/data/stocks/FTSE_stock_prices.csv', index_col='Date', parse_dates=True)
company_info = pd.read_csv('../5. Examples of Visual Analytics in Python/data/stocks/companies.csv')
```
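A minimal sketch for the winners/losers and correlation questions, assuming `price_data` holds one column of prices per company with dates as the index (confirm this with `price_data.head()`). The prices and tickers below are made up. The total return compares the first and last available price of each stock, and the correlation is computed on daily returns rather than raw prices, because raw prices that merely trend upwards together would look highly correlated even if their day-to-day moves were unrelated.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Made-up stand-in for price_data: one column of prices per company, indexed by date.
# ASSUMPTION: check price_data.head() to confirm the real file has this layout.
dates = pd.bdate_range("2012-01-02", periods=500)
tickers = ["AAA", "BBB", "CCC", "DDD"]  # hypothetical tickers
rng = np.random.default_rng(2)
log_steps = rng.normal(0.0003, 0.01, size=(len(dates), len(tickers)))
prices = pd.DataFrame(100 * np.exp(log_steps.cumsum(axis=0)), index=dates, columns=tickers)

# Total return: last available price / first available price - 1
first = prices.bfill().iloc[0]   # first non-missing price of each column
last = prices.ffill().iloc[-1]   # last non-missing price of each column
total_return = (last / first - 1).sort_values()

# Daily returns, and how strongly each pair of stocks moves together
daily_returns = prices / prices.shift(1) - 1
return_corr = daily_returns.corr()

fig = plt.figure()
plt.barh(total_return.index, 100 * total_return.values)
plt.xlabel("Total return over the period (%)")
plt.title("Biggest losers (bottom) to biggest winners (top)")
plt.tight_layout()
```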

# 3. Income, Inequality and Environment

The third dataset contains annual data for GDP, inequality and carbon emissions for 192 countries around the world. 

Potential analysis suggestions: 

1. What is the relation between GDP and carbon emissions?
2. What trends in time can you identify?
3. Is there a relation between carbon emissions and inequality? 

Helpful hint: 

Sometimes it can be helpful to set the scale of an axis to logarithmic, for example when values such as GDP per capita span several orders of magnitude. This can be done by calling

```python
plt.xscale('log')
```

or 

```python
plt.yscale('log')
```

```python
population = pd.read_csv('../5. Examples of Visual Analytics in Python/data/national/population.csv', index_col=0, parse_dates=True)
co2_per_cap = pd.read_csv('../5. Examples of Visual Analytics in Python/data/national/co2_emissions_tonnes_per_person.csv', index_col=0, parse_dates=True)
gdp_per_cap = pd.read_csv('../5. Examples of Visual Analytics in Python/data/national/gdppercapita_us_inflation_adjusted.csv', index_col=0, parse_dates=True)
inequality_metric = pd.read_csv('../5. Examples of Visual Analytics in Python/data/national/gini.csv', index_col=0, parse_dates=True)
```
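A sketch for the GDP versus carbon emissions question, combining a scatter plot with the log-scale hint above. It assumes each file has one row per country and one column per year, with years as string column labels as read from a CSV header; if the real files are the other way round, transpose them with `.T` first. The countries and values below are made up.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Made-up stand-ins for gdp_per_cap and co2_per_cap.
# ASSUMPTION: one row per country and one column per year; use .T if the real files are the other way round.
countries = [f"Country {i}" for i in range(20)]  # hypothetical names
years = ["2000", "2001", "2002"]
rng = np.random.default_rng(3)
gdp_demo = pd.DataFrame(10 ** rng.uniform(2.5, 5, size=(20, 3)), index=countries, columns=years)
co2_demo = gdp_demo * 1e-4 * rng.uniform(0.5, 1.5, size=(20, 3))

year = "2001"
# Pair the two measures for one year and drop countries missing either value
pairs = pd.DataFrame({"gdp": gdp_demo[year], "co2": co2_demo[year]}).dropna()

fig = plt.figure()
plt.scatter(pairs["gdp"], pairs["co2"])
plt.xscale("log")  # both quantities span orders of magnitude
plt.yscale("log")
plt.xlabel("GDP per capita (inflation-adjusted US$)")
plt.ylabel("CO2 emissions (tonnes per person)")
plt.title(f"GDP vs CO2 emissions per person, {year}")

# Correlation of the logged values, matching what the log-log plot shows
log_corr = np.log10(pairs).corr().loc["gdp", "co2"]
```

The same pattern works for emissions against inequality: swap `gdp_demo` for the Gini data in `inequality_metric`, which is probably best left on a linear axis since the index is bounded.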

# 4. UK Geographical Data

The final dataset contains the elevation profile of the UK and the coordinates of around 50 of its most populous cities.

Potential analysis suggestions:

1. Can you use `imshow` to view the elevation profile?
2. Where are the population centres of the UK?
3. Can you scatter the cities over the elevation profile?

```python
cities = pd.read_csv('data/UK_cities.csv', index_col=0)
elevation = pd.read_csv('data/UK_elevation.csv', index_col=0)
```
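A sketch of the first and third questions: showing a 2D elevation grid with `imshow` and scattering city locations on top. It assumes the grid's first row is the northern edge with columns running west to east, and that the city table has longitude and latitude columns; the column names `Longitude` and `Latitude`, the grid and the coordinate ranges below are all hypothetical stand-ins, so inspect `elevation.shape` and `cities.columns` first. The `extent` argument is what makes the grid and the city coordinates share the same axes.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Made-up stand-in for `elevation`: a 2D grid of heights in metres.
# ASSUMPTION: row 0 is the northern edge and columns run from west to east.
lat = np.linspace(59.0, 50.0, 90)   # one latitude per row, north to south
lon = np.linspace(-8.0, 2.0, 100)   # one longitude per column, west to east
LON, LAT = np.meshgrid(lon, lat)
elevation_demo = pd.DataFrame(1000 * np.exp(-((LON + 4) ** 2 + (LAT - 57) ** 2)))

# Stand-in for `cities`; the column names "Longitude"/"Latitude" are hypothetical
cities_demo = pd.DataFrame({"Longitude": [-3.19, -0.13, -2.24],
                            "Latitude": [55.95, 51.51, 53.48]},
                           index=["Edinburgh", "London", "Manchester"])

fig = plt.figure()
# extent=(left, right, bottom, top) places the grid on longitude/latitude axes
img = plt.imshow(elevation_demo.values, cmap="terrain",
                 extent=(lon.min(), lon.max(), lat.min(), lat.max()))
plt.scatter(cities_demo["Longitude"], cities_demo["Latitude"], color="red", s=15)
for name, row in cities_demo.iterrows():
    plt.annotate(name, (row["Longitude"], row["Latitude"]), fontsize=8)
plt.xlabel("Longitude")
plt.ylabel("Latitude")
plt.colorbar(img, label="Elevation (m)")
```

If the map comes out upside down with the real data, the grid probably starts at the southern edge; passing `origin="lower"` to `imshow` flips it.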