<a id='menu'></a>
<hr style="width:100% ; height:2px ; border-width:0 ; color:gray ; background-color:#003d59 ; opacity:1">

 ![logos](../images/la_dsc_logo.jpg)
 
 
<hr style="width:100% ; height:2px ; border-width:0 ; color:gray ; background-color:#003d59 ; opacity:1">

# Data Visualisation in Python

## Chapter 1 – Introduction
### Hannah Hodge Waller

<hr style="width:100%;height:4px;border-width:0;color:gray;background-color:#003d59; opacity:0.25"> 
Follow along with the code by running cells as you encounter them.

<hr style="width:100%;height:4px;border-width:0;color:gray;background-color:#003d59; opacity:0.25"> 
*Chapter Overview*
1. [Introduction](#intro)
<br><br>
2. [Packages and Data](#packages)
<br><br>
3.	[Processing Data](#processing)
<br><br>
4.	[Visualisation Guidelines](#guidelines)


<hr style="width:100%;height:4px;border-width:0;color:gray;background-color:#003d59; opacity:1"> 

<a id='intro'></a>
# 1. Introduction

This course aims to give an overview of data visualisation and plotting techniques in Python. Data visualisation can be described as both an art and a science. While this course is not designed to be a fully comprehensive guide, the authors hope that it will give you an insight into the possibilities of, and best practice around, the subject.

It is important to note that while this course aims to follow both ONS and GSS visualisation guidelines, it does not replace the current procedures in place for publishing data.


If you are a member of another government department you should also familiarise yourself with your department's guidelines.

This guide is written for running `matplotlib` inside Jupyter Notebooks, and can easily be adapted for use in Spyder.

[return to menu](#menu)

<hr style="width:100%;height:4px;border-width:0;color:gray;background-color:#003d59; opacity:1"> 

<a id='packages'></a>
# 2. Packages and Data

## Packages

The main packages we will be using for this course are:

* Numpy – Version 1.18.1

Numpy gives us access to arrays and matrices, along with additional mathematical functions we may need.

* Pandas – Version 0.20.1

Pandas gives us the functionality to work with DataFrames, as well as to easily manipulate or transform our data.

* Matplotlib – Version 2.0.2

Matplotlib is a plotting library for Python. Initially released in 2003, it comes with the module Pyplot, which provides a MATLAB-like interface.

We will often be importing the `matplotlib.pyplot` module.
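
As a minimal sketch of the Pyplot interface, the cell below imports `matplotlib.pyplot` under the conventional alias `plt` and draws a single line. The values are made up purely for illustration and are not taken from the course data.

```python
import matplotlib.pyplot as plt

# Made-up values, for illustration only (not course data)
years = [2000, 2005, 2010]
values = [1.0, 1.5, 2.5]

# Create a figure containing one set of axes, then draw a line on it
fig, ax = plt.subplots()
ax.plot(years, values)

# Label the axes so the chart can be read without the code
ax.set_xlabel("Year")
ax.set_ylabel("Value")
```

In a Jupyter Notebook the figure normally appears beneath the cell. When running the same code as a script, you would usually finish with `plt.show()`.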

* Seaborn – Version 0.7.1

Seaborn is built on top of Matplotlib. It provides several enhancements that we'll explore through the course.

Run the cells below to load the packages, then use the `.__version__` attribute to check your versions.

In [None]:
# Load the packages

import numpy as np

import pandas as pd

import matplotlib

import seaborn as sns

Note - Some users have had errors when importing Seaborn. The error related to the `numpy.testing.nosetester` module and was solved by upgrading the SciPy package. To do this, open the Anaconda Prompt and enter:

`pip install --upgrade scipy`

N.B. You may need to take additional steps to download or update packages on a networked government computer.

ONS staff using a networked device will need to ensure their computer is set up to download packages using Artifactory. Please see [Yammer](https://www.yammer.com/ons.gov.uk/#/Threads/show?threadId=57654244122624) for help.

Staff from other departments should follow their own internal guidance.


In [None]:
# Check your versions

print("Numpy Version: ", np.__version__)
print("Pandas Version: ", pd.__version__)
print("Matplotlib Version: ", matplotlib.__version__)
print("Seaborn Version: ", sns.__version__)

You may have different versions of these packages. If code causes errors, we advise you to check the help documentation for your specific version. Typically the cause will be a change in a parameter name.

Different versions are often forwards and backwards compatible, but small changes might be required.

While it is impossible to check every package version, this course has also been checked with the following versions:

* Numpy Version:  1.19.4
* Pandas Version:  1.1.3
* Matplotlib Version:  3.3.3
* Seaborn Version:  0.11.0
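
If you want to compare versions in code rather than by eye, one possible approach is sketched below. The helper `version_tuple` is hypothetical (it is not part of any of the packages above) and only handles simple dotted version strings such as those listed in this chapter.

```python
from itertools import takewhile


def version_tuple(version_string):
    """Convert a dotted version string such as '1.1.3' into (1, 1, 3).

    Hypothetical helper: only the leading digits of the first three
    components are used, so '1.0rc1' becomes (1, 0).
    """
    parts = []
    for part in version_string.split(".")[:3]:
        digits = "".join(takewhile(str.isdigit, part))
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


# Pandas versions quoted in this chapter
original_version = version_tuple("0.20.1")
tested_version = version_tuple("1.1.3")

# Tuples compare element by element, so this is a numeric comparison
if original_version < tested_version:
    print("0.20.1 is older than 1.1.3")
```

To check your own installation, you could pass `pd.__version__` to the helper in place of the literal strings.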

<hr style="width:75%;height:4px;border-width:0;color:gray;background-color:#003d59; opacity:0.75"> 

## Data


In this course we’ll be using a variety of data. This is stored in the “data” folder.

Gapminder contains data from a variety of years for different countries relating to several elements:

* life_exp – life expectancy at birth in years.

* pop – population.

* gdp_per_cap – gross domestic product per capita in “international dollars”, a hypothetical unit of currency with the same purchasing power as the US dollar in a given year (2005 in this case).

* infant_mortality – number of deaths per 1,000 in children under 1 year of age.

* fertility – number of children per woman.

We will use `pd.read_csv()` to read in our data.

In [3]:
gapminder = pd.read_csv("../data/gapminder.csv")

gapminder.head()

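| Unnamed: 0 | country | continent | year | life_exp | pop | gdp_per_cap | infant_mortality | fertility |
|---|---|---|---|---|---|---|---|---|
| 0 | Afghanistan | Asia | 1952 | 28.801 | 8425333.0 | 779.445314 | NaN | NaN |
| 1 | Afghanistan | Asia | 1957 | 30.332 | 9240934.0 | 820.85303 | NaN | NaN |
| 2 | Afghanistan | Asia | 1962 | 31.997 | 10267083.0 | 853.10071 | NaN | NaN |
| 3 | Afghanistan | Asia | 1967 | 34.02 | 11537966.0 | 836.197138 | NaN | NaN |
| 4 | Afghanistan | Asia | 1972 | 36.088 | 13079460.0 | 739.981106 | NaN | NaN |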
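
The `Unnamed: 0` column in the output above is the name pandas gives to a column whose header is empty. It most likely holds a row index that was saved with the file. The sketch below shows one way to handle this, assuming the first column really is a saved index: passing `index_col=0` to `pd.read_csv()`. It uses the five rows shown above as in-memory CSV text so that it runs without the data file.

```python
import io

import pandas as pd

# The five rows shown above, written as CSV text with an unnamed first column
csv_text = """,country,continent,year,life_exp,pop,gdp_per_cap,infant_mortality,fertility
0,Afghanistan,Asia,1952,28.801,8425333.0,779.445314,,
1,Afghanistan,Asia,1957,30.332,9240934.0,820.85303,,
2,Afghanistan,Asia,1962,31.997,10267083.0,853.10071,,
3,Afghanistan,Asia,1967,34.02,11537966.0,836.197138,,
4,Afghanistan,Asia,1972,36.088,13079460.0,739.981106,,
"""

# Default behaviour: the unnamed first column is kept as "Unnamed: 0"
with_extra_column = pd.read_csv(io.StringIO(csv_text))

# index_col=0 tells pandas to use the first column as the row index instead
sample = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(with_extra_column.columns[0])
print(list(sample.columns))
```

The rest of the course does not depend on this change, so treat it as optional tidying.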


Please feel free to explore the gapminder data before starting the course.
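
A few possible starting points for that exploration are sketched below. To keep the cell runnable without the CSV file, it rebuilds by hand the five Afghanistan rows displayed above. On the full data you would call the same methods on `gapminder`.

```python
import numpy as np
import pandas as pd

# The five rows displayed above, rebuilt by hand for illustration
sample = pd.DataFrame({
    "country": ["Afghanistan"] * 5,
    "continent": ["Asia"] * 5,
    "year": [1952, 1957, 1962, 1967, 1972],
    "life_exp": [28.801, 30.332, 31.997, 34.02, 36.088],
    "pop": [8425333.0, 9240934.0, 10267083.0, 11537966.0, 13079460.0],
    "gdp_per_cap": [779.445314, 820.85303, 853.10071, 836.197138, 739.981106],
    "infant_mortality": [np.nan] * 5,
    "fertility": [np.nan] * 5,
})

# Number of rows and columns
print(sample.shape)

# Data type of each column
print(sample.dtypes)

# Count of missing values in each column
missing_counts = sample.isna().sum()
print(missing_counts)

# Summary statistics for the numeric columns
print(sample.describe())

# First and last year covered
year_range = (sample["year"].min(), sample["year"].max())
print(year_range)
```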

[return to menu](#menu)

<hr style="width:100%;height:4px;border-width:0;color:gray;background-color:#003d59; opacity:1"> 

<a id='processing'></a>
# 3. Processing Data

In this course we’ll be using Pandas to process our data.

The code will be commented and follow PEP-8 guidelines.
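
As a rough illustration of that style, the sketch below shows commented, PEP 8-formatted processing code. The data frame is a made-up miniature with gapminder-style column names, not the course data, and the steps are only examples of the kind of processing you will see.

```python
import pandas as pd

# Made-up miniature data set with gapminder-style column names
gapminder_sample = pd.DataFrame({
    "country": ["A", "A", "B", "B"],
    "year": [2002, 2007, 2002, 2007],
    "life_exp": [60.0, 62.0, 70.0, 71.0],
})

# Keep only the rows for the most recent year
latest_year = gapminder_sample["year"].max()
latest_rows = gapminder_sample[gapminder_sample["year"] == latest_year]

# Mean life expectancy for each country across all years
mean_life_exp = (
    gapminder_sample
    .groupby("country")["life_exp"]
    .mean()
)
```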

Specifics of how data has been processed will not be provided, as these techniques are covered in the [“Introduction to Python”](https://learninghub.ons.gov.uk/course/view.php?id=536) course, which is a prerequisite for this course.

Any techniques that are not covered in the introduction course will be explained in full.

Please ensure you’re comfortable with manipulating data before commencing this course.


[return to menu](#menu)

<hr style="width:100%;height:4px;border-width:0;color:gray;background-color:#003d59; opacity:1"> 

<a id='guidelines'></a>
# 4. Visualisation Guidelines

In this course we will be following guidelines set out in the Data Visualisation courses run by the Data Visualisation team within ONS and the Good Practice team for GSS. 

Material used to create this course can be found here:

[GSS Introduction to Data Visualisation](https://gss.civilservice.gov.uk/training/introduction-to-data-visualisation/) - this course can be found on the Learning Hub. For access, please email gss.capability@statistics.gov.uk.

[Style.ons.gov.uk – Data Visualisation](https://style.ons.gov.uk/category/data-visualisation/)

As previously mentioned, this course is not intended to replace traditional data visualisation processes that apply within organisations.

Please check your organisation's guidelines before using visualisations produced using this guide.


[return to menu](#menu)

<hr style="width:100%;height:4px;border-width:0;color:gray;background-color:#003d59; opacity:1"> 

# End of Chapter

You have completed Chapter 1 of the Data Visualisation Course. Please move on to Chapter 2.

[return to menu](#menu)