# Visualisation for Presentation
In this worksheet we will use the Matplotlib library to prepare some more interesting visualisations for presenting our findings. Matplotlib provides a huge range of visualisation types and styles.

In [None]:
# Allow import of libraries from parent directory
import sys
sys.path.append("..")

In [None]:
from dasi_library import *

In [None]:
dataset = readCsv('../../datasets/World Development Indicators/World Indicators 2010.csv')

## Four Variable Bubble Chart

Scatter plots are very useful for conveying relationships in data. We already used simple scatter plots to review our features in preparation for model fitting. We can also use them for presentation graphics.
One of the great things about scatter plots is that they can show the relationship between four numeric features at once.
In the scatter plot below we assign one feature to the x-axis, one to the y-axis, one to the circle colour and one to the circle size. Charts like this are sometimes called bubble charts.
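
Before running the helper, it may help to see how this four-way mapping can be expressed directly in Matplotlib. The following is a minimal sketch, not the implementation of `bubbleChart` (which lives in `dasi_library` and is not shown here). The data values are made up for illustration; only the column names are borrowed from the worksheet.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs as a script; omit in a notebook
import matplotlib.pyplot as plt

# Made-up illustrative values, NOT taken from the worksheet's dataset
birth_rate = [10.0, 20.0, 30.0, 40.0]    # x-axis
fertility_rate = [1.5, 2.5, 4.0, 5.5]    # y-axis
bubble_area = [800, 400, 150, 50]        # marker area, in points squared
pct_over_65 = [18.0, 8.0, 4.0, 3.0]      # mapped to colour

fig, ax = plt.subplots()
points = ax.scatter(
    birth_rate,
    fertility_rate,
    s=bubble_area,       # third variable: circle size
    c=pct_over_65,       # fourth variable: circle colour
    cmap="viridis",
    alpha=0.6,
    edgecolors="black",
)
fig.colorbar(points, ax=ax, label="Pop65+")  # colour scale on the right
ax.set_xlabel("BirthRate")
ax.set_ylabel("FertilityRate")
```

Note that the `s` argument of `scatter` is an area, so a value four times larger gives a circle with twice the diameter.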

In [None]:
bubbleChart(dataset,
            xCol='BirthRate', 
            yCol='FertilityRate', 
            sizeCol='GDP', 
            colourCol='Pop65+', 
            labelsCol='CountryName')

We added all the labels above, but the result is a bit messy. Let's show just a few labels:

In [None]:
bubbleChart(dataset, 
            xCol='BirthRate', 
            yCol='FertilityRate', 
            sizeCol='GDP', 
            colourCol='Pop65+', 
            labelsCol='CountryName', 
            labelsToShow=['Albania','India','United Kingdom'])   
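
The idea behind `labelsToShow` can be sketched in plain Matplotlib by annotating only the points whose name is in a chosen set. This is an assumption about the general technique, not a description of how `bubbleChart` does it; the names and values below are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs as a script; omit in a notebook
import matplotlib.pyplot as plt

# Hypothetical names and values, for illustration only
names = ["A-land", "B-land", "C-land", "D-land"]
x = [10.0, 20.0, 30.0, 40.0]
y = [1.5, 2.5, 4.0, 5.5]
labels_to_show = {"A-land", "C-land"}

fig, ax = plt.subplots()
ax.scatter(x, y)

# Label only the selected points, offset slightly so text does not sit on the marker
for name, xi, yi in zip(names, x, y):
    if name in labels_to_show:
        ax.annotate(name, (xi, yi), xytext=(5, 5), textcoords="offset points")
```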

Let's plot a different metric on the x-axis, and set `minBubble` and `maxBubble` as well:

In [None]:
bubbleChart(dataset, 
            xCol='LifeExp', 
            yCol='FertilityRate', 
            sizeCol='GDP', 
            colourCol='Pop65+', 
            labelsCol='CountryName', 
            labelsToShow=['Albania','India','United Kingdom'],
            minBubble=5,
            maxBubble=1000)  
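
Judging by their names, `minBubble` and `maxBubble` bound the size of the circles. One common way to do that is to rescale the size variable linearly into a chosen range before plotting. The sketch below assumes that approach; the worksheet does not say how `bubbleChart` scales sizes, and `scale_sizes` is a hypothetical helper.

```python
def scale_sizes(values, min_size=5.0, max_size=1000.0):
    """Linearly map values onto the range [min_size, max_size].

    Hypothetical helper for illustration; not part of dasi_library.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values equal: use the midpoint to avoid dividing by zero
        return [(min_size + max_size) / 2.0] * len(values)
    span = hi - lo
    return [min_size + (v - lo) / span * (max_size - min_size) for v in values]


# Made-up values: smallest maps to min_size, largest to max_size
sizes = scale_sizes([0, 50, 100], min_size=5, max_size=1000)
```

The result could then be passed as the `s` argument of Matplotlib's `scatter`. Without some rescaling like this, a variable such as GDP, which spans several orders of magnitude, would produce circles that are either invisible or cover the whole chart.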

## Four Variable Bubble Chart, Colour is Categorical

In the charts above, the circle colour was assigned to a numeric variable, and the colour scale down the right-hand side showed which value each colour represents. It is also interesting to map colour to a categorical variable. Let's colour the circles by region.

First read in the country data, which contains the region:

In [None]:
countries = readCsv('../../datasets/World Development Indicators/Country.csv')

In [None]:
countries = selectCols(countries, ['TableName','CountryCode','Region'])

Merge the region onto our data, so each country is tagged with the region:

In [None]:
merged = mergeOn(dataset, countries, on='CountryName', to='TableName')

In [None]:
merged
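
When merging on country names, it is worth checking that every row found a match, since the two files may spell some names differently. The sketch below uses pandas directly and assumes `mergeOn` behaves roughly like `DataFrame.merge` with `left_on` and `right_on`; that is an assumption, not something the worksheet confirms. The tables are made up.

```python
import pandas as pd

# Made-up tables for illustration; "Atlantis" deliberately has no match
indicators = pd.DataFrame({
    "CountryName": ["Albania", "India", "Atlantis"],
    "LifeExp": [76.0, 66.0, 70.0],
})
country_info = pd.DataFrame({
    "TableName": ["Albania", "India"],
    "Region": ["Europe & Central Asia", "South Asia"],
})

# A left join keeps every indicator row; indicator=True adds a "_merge" column
merged_sketch = indicators.merge(
    country_info,
    how="left",
    left_on="CountryName",
    right_on="TableName",
    indicator=True,
)

# Rows marked "left_only" found no region
unmatched = merged_sketch.loc[
    merged_sketch["_merge"] == "left_only", "CountryName"
].tolist()
```

Any names listed in `unmatched` would have no region and so could not be coloured in the chart that follows.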

Now show the bubble chart with the regions:

In [None]:
bubbleChart(merged,
            xCol='LifeExp', 
            yCol='FertilityRate', 
            sizeCol='GDP', 
            colourCol='Region', 
            labelsCol='CountryName', 
            labelsToShow=['Albania','India','United Kingdom'])  
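
With a categorical colour variable, a legend replaces the continuous colour scale. In plain Matplotlib, one way to get this is to draw one `scatter` call per category and let the legend pick up the labels. This is a minimal sketch under that assumption, with made-up region names and values; it is not the `bubbleChart` implementation.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs as a script; omit in a notebook
import matplotlib.pyplot as plt

# Made-up values for illustration only
regions = ["North", "South", "North", "East"]
life_exp = [80.0, 60.0, 75.0, 70.0]
fertility_rate = [1.6, 4.5, 1.9, 2.4]
sizes = [300, 80, 200, 120]

fig, ax = plt.subplots()

# One scatter call per category: each gets the next colour in the default cycle
for region in sorted(set(regions)):
    idx = [i for i, r in enumerate(regions) if r == region]
    ax.scatter(
        [life_exp[i] for i in idx],
        [fertility_rate[i] for i in idx],
        s=[sizes[i] for i in idx],
        alpha=0.6,
        label=region,
    )

ax.set_xlabel("LifeExp")
ax.set_ylabel("FertilityRate")
legend = ax.legend(title="Region")
```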