# 5. Examples of Visual Analytics in Python

![](images/logo.png)

Hello and welcome to another session with the DataKirk. This week it's colour time! 

If you're working on the server at https://jupyterhub.thedatakirk.org.uk/ then all the relevant libraries (Matplotlib, Pandas etc.) should already be installed and ready to use. However, if you're running the code on your own computer (which we do advise at this point as it will accelerate your learning!) then you'll need to make sure that you have installed them. This can be done by opening up a command prompt and typing:

```
pip install matplotlib pandas
```

To check whether you have these libraries installed, run the cell below. 

In [2]:
import pandas as pd
import matplotlib.pyplot as plt

As long as you get no errors, you are good to go! Once the libraries are imported, it's also a good idea to run the line in the cell below:

In [3]:
%matplotlib notebook

This will make all Matplotlib plots interactive so you can pan, zoom and move around within the figure. Once you have run this cell you should see these options at the bottom of every plot:

![](images/mplnb.png)

If you don't see them when you begin plotting, try going back and running the cell above again. 

Here are some useful links for the session:

1. List of matplotlib colours: https://matplotlib.org/3.1.0/gallery/color/named_colors.html
2. Google colour picker: https://www.google.com/search?q=color+picker
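As a quick reminder of how colours from these two links can be passed to Matplotlib, here is a minimal sketch. The data is made up purely for illustration, and the particular colours are arbitrary choices: a colour can be given as a name from the list, as a hex code copied from a colour picker, or as an RGB tuple with values between 0 and 1.

```python
import matplotlib.pyplot as plt
from matplotlib.colors import to_hex

# Made-up data, just so there is something to draw
x = [0, 1, 2, 3, 4]

fig, ax = plt.subplots()

# 1. A named colour from the Matplotlib list
ax.plot(x, [v * 1 for v in x], color='tomato', label='named: tomato')

# 2. A hex code, for example copied from a colour picker
ax.plot(x, [v * 2 for v in x], color='#1f77b4', label='hex: #1f77b4')

# 3. An RGB tuple with each value between 0 and 1
ax.plot(x, [v * 3 for v in x], color=(0.2, 0.6, 0.3), label='RGB tuple')

ax.legend()

# to_hex converts any valid Matplotlib colour into its hex string
tomato_hex = to_hex('tomato')
print(tomato_hex)
```

The same `color=` argument works for most plotting functions (`scatter`, `bar`, `hist` and so on), so you can reuse whichever style you find easiest.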


This session will be a little different from the last few. Rather than give you questions and answers, it will be a bit more open-ended! I will just provide 3 different datasets and you can use the tools from the presentation to visualise the data any way you like. 

You don't have to explore all of these datasets - if you want to pick one to focus on that's fine. 


# 1. Coronavirus data UK

The first dataset contains information about the spread of coronavirus in Scotland, England, Wales and Northern Ireland. 

Potential analysis suggestions: 

1. To what extent are caseloads correlated amongst the four nations? 
2. Which aspects of the first and second waves are the same, and which are different? 
3. How are the different nations comparing in the fight against covid? 

Feel free to find additional data such as population numbers, or dates of lockdown. 

In [None]:
data_sco = pd.read_csv('data/covid/Corona_Scot.csv', index_col=0, parse_dates=True)

data_wal = pd.read_csv('data/covid/Corona_Wales.csv', index_col=0, parse_dates=True)

data_eng = pd.read_csv('data/covid/Corona_Eng.csv', index_col=0, parse_dates=True)

data_nir = pd.read_csv('data/covid/Corona_NI.csv', index_col=0, parse_dates=True)
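If you're not sure where to start with the correlation question, the sketch below shows one possible approach. It uses synthetic stand-in data rather than the real files, and the column name `new_cases` is an assumption, so check the actual column names (for example with `data_sco.columns`) before adapting it. The idea is to put one column per nation side by side, smooth out the day-to-day noise with a 7-day rolling mean, and then call `.corr()`.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in data: NOT the real coronavirus figures.
# The column name 'new_cases' is hypothetical.
dates = pd.date_range('2020-03-01', periods=120, freq='D')
rng = np.random.default_rng(0)
wave = np.exp(-((np.arange(120) - 60) / 20) ** 2)  # one shared "wave" shape


def fake_nation(scale):
    """Build a made-up daily case series with the given peak size."""
    noise = rng.normal(0, 0.05, size=120)
    return pd.DataFrame({'new_cases': scale * (wave + noise)}, index=dates)


demo_sco = fake_nation(300)
demo_wal = fake_nation(200)
demo_eng = fake_nation(4000)
demo_nir = fake_nation(100)

# Put one column per nation side by side, aligned on the date index
cases = pd.concat(
    {
        'Scotland': demo_sco['new_cases'],
        'Wales': demo_wal['new_cases'],
        'England': demo_eng['new_cases'],
        'N. Ireland': demo_nir['new_cases'],
    },
    axis=1,
)

# Smooth with a 7-day rolling mean, then compute pairwise correlations
smoothed = cases.rolling(7).mean()
corr = smoothed.corr()
print(corr.round(2))
```

Calling `smoothed.plot()` on the combined table is also a quick way to draw all four nations on the same axes.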

# 2. Stock price data

The second dataset contains stock price information since 2012 for the largest 20 companies listed in the United Kingdom.

Potential analysis suggestions: 

1. How has coronavirus affected the stock market in the UK?
2. How correlated are stocks in the UK?
3. Which stocks are the biggest winners and the biggest losers in the last 8 years?

In [None]:
price_data = pd.read_csv('data/stocks/FTSE_stock_prices.csv', index_col='Date', parse_dates=True)

company_info = pd.read_csv('data/stocks/companies.csv')
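Stock prices sit on very different scales, which makes them awkward to compare directly. One common trick, sketched below, is to rebase every price series so that it starts at 100. The prices here are randomly generated and the ticker names `AAA`, `BBB` and `CCC` are made up, so treat this as an illustration under those assumptions; with the real data you would use `price_data` in place of `demo_prices`.

```python
import numpy as np
import pandas as pd

# Randomly generated stand-in prices: NOT real market data.
# The tickers 'AAA', 'BBB' and 'CCC' are hypothetical.
dates = pd.date_range('2012-01-02', periods=500, freq='B')
rng = np.random.default_rng(1)
daily_moves = rng.normal(0.0003, 0.01, size=(500, 3))
start_prices = np.array([100.0, 500.0, 2000.0])
demo_prices = pd.DataFrame(
    start_prices * np.exp(np.cumsum(daily_moves, axis=0)),
    index=dates,
    columns=['AAA', 'BBB', 'CCC'],
)

# Rebase so every stock starts at 100 and can share one axis
rebased = 100 * demo_prices / demo_prices.iloc[0]

# Total return over the whole period, biggest winner first
total_return = (demo_prices.iloc[-1] / demo_prices.iloc[0] - 1)
total_return = total_return.sort_values(ascending=False)
print(total_return.round(3))

# Correlation of daily percentage changes (not of raw prices)
return_corr = demo_prices.pct_change().dropna().corr()
print(return_corr.round(2))
```

Correlating daily percentage changes rather than raw prices is a deliberate choice here: two stocks that both drift upwards over the years will look correlated on price alone, even if their day-to-day movements are unrelated.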

# 3. Income, Inequality and Environment

The final dataset contains annual data for GDP, inequality and carbon emissions for 192 countries around the world. 

Potential analysis suggestions: 

1. What is the relation between GDP and carbon emissions?
2. What trends in time can you identify?
3. Is there a relation between carbon emissions and inequality? 

Helpful hint: 

Sometimes it can be helpful to set the scale of an axis to logarithmic - this can be done by calling 

```python
plt.xscale('log')
```

or 

```python
plt.yscale('log')
```
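To see why this helps, here is a small sketch that draws the same scatter plot twice, once with linear axes and once with logarithmic axes. The numbers are randomly generated (they are not taken from the datasets, and the variable names `demo_gdp` and `demo_co2` are made up), but they span several orders of magnitude in the way national statistics often do. When working with an axes object rather than `plt` directly, the equivalent methods are `set_xscale` and `set_yscale`.

```python
import numpy as np
import matplotlib.pyplot as plt

# Randomly generated stand-in values: NOT real country data
rng = np.random.default_rng(2)
demo_gdp = 10 ** rng.uniform(2.5, 5, size=150)  # roughly 300 to 100,000
demo_co2 = 0.0005 * demo_gdp ** 0.9 * np.exp(rng.normal(0, 0.4, size=150))

fig, (ax_lin, ax_log) = plt.subplots(1, 2, figsize=(9, 4))

# Left: linear axes, where most points bunch up near the origin
ax_lin.scatter(demo_gdp, demo_co2, s=10, color='teal')
ax_lin.set_title('Linear axes')

# Right: the same points with both axes set to logarithmic
ax_log.scatter(demo_gdp, demo_co2, s=10, color='teal')
ax_log.set_xscale('log')
ax_log.set_yscale('log')
ax_log.set_title('Logarithmic axes')

fig.tight_layout()
```

On the linear plot a handful of large values dominate and the rest are squashed into a corner, while on the logarithmic plot the points are spread out and any overall trend is much easier to see.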

In [5]:
population = pd.read_csv('data/national/population.csv', index_col=0, parse_dates=True)

co2_per_person = pd.read_csv('data/national/co2_emissions_tonnes_per_person.csv', index_col=0, parse_dates=True)

gdp_per_cap = pd.read_csv('data/national/gdppercapita_us_inflation_adjusted.csv', index_col=0, parse_dates=True)

inequality_metric = pd.read_csv('data/national/gini.csv', index_col=0, parse_dates=True)