# matplotlib.pyplot

## Overview

### Introduction: matplotlib and pyplot

Matplotlib [1] is a popular Python package for data visualisation. It has been developed since 2003 [R] as a way of making plots in Python using a MATLAB-like syntax. It is currently maintained and actively developed by a large community of open source developers.

Matplotlib features an object-oriented interface, at the base of which is the `Figure` class. A `Figure` can contain any number of subplots, each represented by an object of the `Axes` class, which holds many of the methods and fields that define the plot.
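The relationship between a `Figure` and its `Axes` can be sketched as follows. This is a minimal illustration, not part of the original notebook; the data values and titles are arbitrary placeholders.

```python
import matplotlib.pyplot as plt

# Create one Figure holding a 1x2 grid of subplots; each subplot is an Axes object
fig, axes = plt.subplots(nrows=1, ncols=2)

# Plotting and labelling methods live on the individual Axes
axes[0].plot([0, 1, 2], [0, 1, 4])
axes[0].set_title('left')
axes[1].plot([0, 1, 2], [4, 1, 0])
axes[1].set_title('right')

# The Figure keeps track of the Axes it contains
print(len(fig.axes))
```

Here `fig` is the `Figure` and `axes` is an array holding the two `Axes` objects, so each subplot can be configured independently.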

`matplotlib.pyplot` is a procedural wrapper around Matplotlib's object-oriented interface. The Matplotlib documentation recommends that, with the exception of a few functions that simplify initialisation and saving of figures (`pyplot.figure`, `pyplot.subplot`, `pyplot.subplots`, and `pyplot.savefig`), the object-oriented interface be used when programming, and the stateful pyplot interface be reserved for interactive work [R].
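To make the distinction concrete, the sketch below draws the same line twice, once through the stateful pyplot calls and once through explicit `Figure` and `Axes` objects. The data and titles are placeholders chosen for illustration only.

```python
import matplotlib.pyplot as plt

x = [0, 1, 2, 3]
y = [0, 1, 4, 9]

# Stateful pyplot interface: each call acts on the implicit "current" figure and axes
plt.figure()
plt.plot(x, y)
plt.title('pyplot interface')

# Object-oriented interface: the Figure and Axes are explicit objects,
# so it is always clear which plot a method call modifies
fig, ax = plt.subplots()
ax.plot(x, y)
ax.set_title('object-oriented interface')
```

In the second half only `pyplot.subplots` is used from the procedural layer, in line with the recommendation described above.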

### Importing matplotlib.pyplot

The canonical alias for `matplotlib.pyplot` is `plt` and it is therefore imported by convention as follows:


In [5]:
# import matplotlib.pyplot and assign alias 'plt'
import matplotlib.pyplot as plt

### Quick plots using the procedural interface

### Using the OO interface for greater flexibility  

## matplotlib.pyplot example plots




### 1. Scatterplot of geographical data

Reference: scikit learn book

In [2]:
# imports
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

In [3]:
# Read ED centroid data into pandas dataframe
eds = pd.read_csv('data/ed_centroids.csv')
eds.head()

|   | X | Y | NUTS1 | NUTS1NAME | NUTS2 | NUTS2NAME | NUTS3 | NUTS3NAME | COUNTY | COUNTYNAME | CSOED | OSIED | EDNAME | LAND_AREA | TOTAL_AREA |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 236368.456056 | 286670.682399 | IE0 | Ireland | IE01 | Border,Midland and Western | IE011 | Border | 32 | Cavan County | 32090 | 27053 | Kilcogy | 17.793398 | 17.838415 |
| 1 | 296984.205766 | 180513.89374 | IE0 | Ireland | IE02 | Southern and Eastern | IE024 | South-East (IE) | 1 | Carlow County | 1004 | 17022 | Hacketstown | 22.068904 | 22.068904 |
| 2 | 292582.032469 | 178524.343479 | IE0 | Ireland | IE02 | Southern and Eastern | IE024 | South-East (IE) | 1 | Carlow County | 1005 | 17023 | Haroldstown | 11.543112 | 11.543112 |
| 3 | 284014.641367 | 180190.200539 | IE0 | Ireland | IE02 | Southern and Eastern | IE024 | South-East (IE) | 1 | Carlow County | 1006 | 17029 | Kineagh | 18.033035 | 18.033035 |
| 4 | 286425.977882 | 183194.21536 | IE0 | Ireland | IE02 | Southern and Eastern | IE024 | South-East (IE) | 1 | Carlow County | 1007 | 17038 | Rahill | 16.431779 | 16.431779 |


In [4]:
# Copy the ED ID column (CSOED) to a new 'ID' column to make it easier to join with other data
eds['ID'] = eds['CSOED']
# Set the dataframe index to be the ED ID column
eds.set_index('ID', inplace=True)
# Cull unnecessary columns
eds = eds[['X', 'Y', 'LAND_AREA']]
# Check the data
eds.head()

| ID | X | Y | LAND_AREA |
|---|---|---|---|
| 32090 | 236368.456056 | 286670.682399 | 17.793398 |
| 1004 | 296984.205766 | 180513.89374 | 22.068904 |
| 1005 | 292582.032469 | 178524.343479 | 11.543112 |
| 1006 | 284014.641367 | 180190.200539 | 18.033035 |
| 1007 | 286425.977882 | 183194.21536 | 16.431779 |
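As a rough sketch of where this is heading, the centroids can be drawn with `Axes.scatter`. The example below rebuilds only the five rows shown above by hand so that it runs without `data/ed_centroids.csv`. The name `sample` and the equal aspect ratio are illustrative assumptions, not part of the original notebook.

```python
import pandas as pd
import matplotlib.pyplot as plt

# The five rows displayed above, rebuilt by hand so the sketch runs without the CSV file
sample = pd.DataFrame(
    {
        'X': [236368.456056, 296984.205766, 292582.032469, 284014.641367, 286425.977882],
        'Y': [286670.682399, 180513.89374, 178524.343479, 180190.200539, 183194.21536],
        'LAND_AREA': [17.793398, 22.068904, 11.543112, 18.033035, 16.431779],
    },
    index=pd.Index([32090, 1004, 1005, 1006, 1007], name='ID'),
)

# One point per ED centroid, drawn through the object-oriented interface
fig, ax = plt.subplots()
ax.scatter(sample['X'], sample['Y'])
ax.set_xlabel('X')
ax.set_ylabel('Y')
# Assumption: use the same scale on both axes so the layout is not stretched
ax.set_aspect('equal')
```

With the full file loaded, the same calls should work on `eds` in place of `sample`.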


The next step is to get data to join with the ED centroids for the scatterplot. The Small Area Population Statistics (SAPS) published on the CSO website [R] are used here.

In [None]:
# 2016 SAPS
saps2016_url = 'https://www.cso.ie/en/media/csoie/census/census2016/census2016boundaryfiles/SAPS2016_ED3409.csv'

# 2016 Key
# https://www.cso.ie/en/media/csoie/census/census2016/census2016boundaryfiles/SAPS_2016_Glossary.xlsx

# 2011 SAPS
saps2011_url = 'https://www.cso.ie/en/media/csoie/census/documents/saps2011files/AllThemesTablesED.csv'

# 2011 Key
# https://www.cso.ie/en/media/csoie/census/documents/saps2011files/Theme,breakdown.xlsx

The 2016 SAPS CSV file has an unusual encoding and produces errors when read with `pandas.read_csv()` using the default settings. This is avoided by specifying the encoding (`cp1252`) [R]. For the purposes of the present demonstration, the statistic of interest is the total population per ED. The correct column (`'T1_1AGETT'`) is identified using the key linked to above [R]. The numeric columns in the file also contain commas as thousands separators, so pandas imports them as `dtype('object')`. This is corrected using `pandas.to_numeric` and `pandas.Series.str.replace()` [R]. The correctly formatted data is stored in a new column, `Pop2016`.

In [None]:
# Download 2016 SAPS data at ED level and extract ID column and Total Population column
saps2016 = pd.read_csv(saps2016_url, encoding='cp1252', usecols=['GEOGID', 'T1_1AGETT'])
# Remove commas and cast Total Population column to numeric dtype. 
saps2016['Pop2016'] = pd.to_numeric(saps2016['T1_1AGETT'].str.replace(',', ''))
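Both fixes can be tried offline on a small in-memory file. The sketch below is only an illustration: the two rows, the column names `NAME` and `TOTAL`, and the values are hypothetical and are not taken from the SAPS data.

```python
import io
import pandas as pd

# Hypothetical two-row file: a non-ASCII name and quoted numbers with thousands separators
raw_bytes = 'NAME,TOTAL\nDún Example,"1,234"\nOther Example,"567"\n'.encode('cp1252')

# Reading with the default encoding (UTF-8) fails on the cp1252-encoded character
try:
    pd.read_csv(io.BytesIO(raw_bytes))
except UnicodeDecodeError as err:
    print('Default read failed:', type(err).__name__)

# Specifying the encoding lets the file load; TOTAL is still text at this point
demo = pd.read_csv(io.BytesIO(raw_bytes), encoding='cp1252')

# Strip the commas, then cast the strings to a numeric dtype
demo['TOTAL_NUM'] = pd.to_numeric(demo['TOTAL'].str.replace(',', ''))
print(demo['TOTAL_NUM'].tolist())
```

The last two steps mirror the `encoding='cp1252'` argument and the `to_numeric`/`str.replace` combination used on the real file.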

### 2. Scatterplot matrix

### 3. Bubble plot using geographic data

***

## References

[1] Matplotlib Development Team. (2021). Matplotlib: Visualization with Python. [online] Available from: <https://matplotlib.org/> Accessed: 28th September, 2021

[3] Matplotlib module docstring