### LSE Data Analytics Online Career Accelerator

# DA201: Data Analytics Using Python

## Demonstration: Penguins case study

A wildlife conservation company operates small boat tours to raise funds for its preservation work. The tours visit the natural habitats of a number of penguin species, which are the areas the company is striving to preserve.

The boat tours cater to tourists who wish to view penguins in their natural habitat and learn more about the different species. The penguins live on a number of islands, and the company wants to optimise its boat tours so that tourists have the best chance of seeing the penguins. The visualisations you create in this notebook will inform the company's strategy and will also be included on its website, in impact reports, and in promotional materials such as brochures distributed to tourists.

Note: You will use this notebook for 4.1.10 (steps 1-6), 4.2.3 (steps 7-11), and the optional Extended study: One, two, and three dimensions in Matplotlib (steps 12-14).

Let’s help save some penguins! 

## 1. Prepare your workstation

In [None]:
# Import Matplotlib, Seaborn, and Pandas.
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd

# Read the CSV file.
penguins = pd.read_csv('penguins.csv')

# View the DataFrame.
print(penguins.shape)
print(penguins.dtypes)
print(penguins.columns)
penguins.head()
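
Missing values cannot be drawn in a plot, so it can help to count them before moving on to the pairplots. The following is a minimal sketch that uses a small invented DataFrame named `sample` (a hypothetical stand-in; the column names mirror those used later in this notebook, but the values are made up). The same two calls could be applied to `penguins`.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the penguins DataFrame (values are invented).
sample = pd.DataFrame({
    'species': ['Adelie', 'Gentoo', 'Chinstrap', 'Adelie'],
    'bill_length_mm': [39.1, 46.5, np.nan, 38.2],
    'flipper_length_mm': [181.0, 217.0, 195.0, np.nan],
})

# Count the missing values in each column.
missing_counts = sample.isna().sum()
print(missing_counts)

# Keep only the rows that have a value in every column.
complete_rows = sample.dropna()
print(complete_rows.shape)
```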

# 4.1.10 Outlier analysis: Pairplots

## 2. Create a pairplot

In [None]:
# Create a simple pairplot.
sns.pairplot(penguins)

## 3. Pairplot: Layered kernel density estimate

In [None]:
# Colour the points by species. With hue set, the diagonal panels
# show layered KDE (kernel density estimate) curves.
sns.pairplot(penguins, hue='species')
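
To illustrate what a kernel density estimate is, the sketch below builds one by hand with NumPy: it places a Gaussian curve on each observation and averages the curves. The values and the `bandwidth` are invented for illustration. Seaborn selects a bandwidth automatically, so this is not a reproduction of its output.

```python
import numpy as np

# Invented bill lengths in mm.
values = np.array([38.0, 39.5, 40.0, 45.0, 46.5])

# Assumed smoothing width, chosen by hand for this sketch.
bandwidth = 1.5

# Points on the x-axis at which to evaluate the density.
grid = np.linspace(30, 55, 501)

# Place one Gaussian curve on each observation and average them.
z = (grid[:, None] - values[None, :]) / bandwidth
bumps = np.exp(-0.5 * z**2) / (bandwidth * np.sqrt(2 * np.pi))
density = bumps.mean(axis=1)

# The area under a density curve should be close to 1.
step = grid[1] - grid[0]
area = density.sum() * step
print(round(area, 3))
```

A larger `bandwidth` gives a smoother curve, while a smaller one follows the individual observations more closely.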

## 4. Pairplot: Facetgrid method

In [None]:
# Create a pairplot with the facetgrid method (histograms on the diagonal).
sns.pairplot(penguins, hue='species', diag_kind='hist')

## 5. Pairplot: Adjusting size

In [None]:
# Create a pairplot and set the height of each panel (in inches).
sns.pairplot(penguins, hue='species', diag_kind='hist', height=2)

## 6. Remove any redundant plots

In [None]:
# Plot only the selected variable pairs to remove redundant plots.
sns.pairplot(penguins,
             x_vars=['bill_length_mm', 'bill_depth_mm', 'flipper_length_mm'],
             y_vars=['bill_length_mm', 'bill_depth_mm'])
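
Pairplots support a visual check for outliers. As a numeric complement, one common convention (an assumption here, not part of the original demonstration) flags values that lie more than 1.5 times the interquartile range (IQR) beyond the quartiles. A minimal sketch with invented body masses:

```python
import pandas as pd

# Invented body masses in grams; 9000 is deliberately extreme.
body_mass_g = pd.Series([3750, 3800, 3250, 3450, 3650, 4675, 9000])

# Calculate the quartiles and the interquartile range.
q1 = body_mass_g.quantile(0.25)
q3 = body_mass_g.quantile(0.75)
iqr = q3 - q1

# Flag values more than 1.5 * IQR below Q1 or above Q3.
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr
outliers = body_mass_g[(body_mass_g < lower) | (body_mass_g > upper)]
print(outliers)
```

A flagged value is a candidate for closer inspection, not necessarily an error in the data.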

# 4.2.3 Customising styles, colours, titles, and labels

## 7. Adjust figure size

In [None]:
# Set the figure size.
sns.set(rc={'figure.figsize':(5, 5)})

# Set the tick style.
sns.set_style('ticks')

# Set the background style. Note that this replaces
# the 'ticks' style set above.
sns.set_style('darkgrid')

# Plot the scatterplot.
sns.scatterplot(data=penguins, x='bill_length_mm', 
                y='flipper_length_mm', hue='species')

## 8. Customise individual attributes

In [None]:
# Run this first to view the current style parameters.
sns.axes_style()

In [None]:
# Set the 'ticks' style and override individual attributes.
sns.set_style('ticks', {'axes.facecolor': '#dddddd',
                        'axes.spines.top': False,
                        'axes.spines.right': False})

# Create the scatterplot.
sns.scatterplot(data=penguins, x='bill_length_mm',
                y='flipper_length_mm', hue='species')

## 9. Change colour of a data series in Seaborn

In [None]:
# Set the style first.
sns.set_style('ticks')

# Create the scatterplot with the pastel palette.
sns.scatterplot(data=penguins, x='bill_length_mm',
                y='flipper_length_mm', hue='species',
                palette='pastel')

## 10. Make it accessible

In [None]:
# Create the scatterplot with the colorblind palette.
sns.scatterplot(data=penguins, x='bill_length_mm',
                y='flipper_length_mm', hue='species',
                palette='colorblind')
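
To see which colours the `colorblind` palette contains, the palette can be inspected directly. The following is a small sketch. With three species, the scatterplot should draw on the first three colours in the list.

```python
import seaborn as sns

# Retrieve the colorblind palette as a list of colours.
palette = sns.color_palette('colorblind')

# Convert the colours to hex codes, for example for use in a style guide.
hex_codes = palette.as_hex()
print(hex_codes[:3])
```

Colour is only one part of accessibility. Clear labels and sufficient contrast with the background also matter.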

## 11. Customise labels

In [None]:
# Plot the scatterplot.
ax = sns.scatterplot(data=penguins, x='bill_length_mm',
                     y='flipper_length_mm', hue='species',
                     palette='colorblind')

# Specify the labels.
ax.set_xlabel("Bill length (mm)")
ax.set_ylabel("Flipper length (mm)")
ax.set_title("Penguin Bill vs. Flipper Length")

# (Optional) Extended study: One, two, and three dimensions in Matplotlib

## 12. Create a two-colour barplot

In [None]:
# Set the number of penguins to sample.
penguin_sample_size = 10

# Draw a random sample of penguins.
random_penguins = penguins.sample(n=penguin_sample_size)

# Create an empty plot.
fig, ax = plt.subplots()

# Set the figure size.
fig.set_size_inches(8, 8)

# Create a barplot with alternating green and red bars.
ax.bar(list(range(penguin_sample_size)),
       random_penguins['body_mass_g'],
       color=['green', 'red'])

# Specify ticks, labels, and title.
ax.set_xticks(list(range(penguin_sample_size)))
ax.set_xlabel("Penguin Number")
ax.set_ylabel("Body Mass (g)")
ax.set_title("Penguin Masses")

## 13. Pastel2 colormap demonstration

In [None]:
import numpy as np

# Create an empty plot.
fig, ax = plt.subplots()

# Set the plot size.
fig.set_size_inches(8, 8)

# Create a barplot.
ax.bar(list(range(penguin_sample_size)),
       random_penguins['body_mass_g'],
       color=plt.cm.Pastel2(np.linspace(0, 1, penguin_sample_size)))

# Specify ticks, labels, and title.
ax.set_xticks(list(range(penguin_sample_size)))
ax.set_xlabel("Penguin Number")
ax.set_ylabel("Body Mass (g)")
ax.set_title("Penguin Masses")
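
The barplot above samples the Pastel2 colormap at ten evenly spaced positions. The sketch below shows what that call returns: an array with one RGBA row per position. Pastel2 is a qualitative colormap with eight colours, so some of the ten rows repeat.

```python
import numpy as np
import matplotlib.pyplot as plt

# Ten evenly spaced positions between 0 and 1.
positions = np.linspace(0, 1, 10)

# Calling a colormap with an array returns one RGBA row per position.
colours = plt.cm.Pastel2(positions)
print(colours.shape)

# Count the distinct colours among the ten rows.
distinct_colours = np.unique(colours, axis=0)
print(len(distinct_colours))
```

If every bar needs its own colour, a colormap with at least as many colours as bars would be a better fit.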

## 14. Cycle the colormap

In [None]:
# Specify the colormap to use. Sample positions must lie between 0 and 1.
plt.rcParams['axes.prop_cycle'] = plt.cycler('color',
                                             plt.cm.Paired(np.linspace(0, 1, 10)))

# Create an empty plot.
fig, ax = plt.subplots()

# Specify the size of the plot.
fig.set_size_inches(8, 8)

# Sort by bill length, then filter the male penguins of each species.
penguins = penguins.sort_values('bill_length_mm')
male_penguins = penguins[penguins['sex'] == 'MALE']
adelie = male_penguins[male_penguins['species'] == 'Adelie']
gentoo = male_penguins[male_penguins['species'] == 'Gentoo']
chinstrap = male_penguins[male_penguins['species'] == 'Chinstrap']

# Create the plot.
ax.plot(adelie['bill_length_mm'],
        adelie['flipper_length_mm'], label='Adelie')
ax.plot(gentoo['bill_length_mm'],
        gentoo['flipper_length_mm'], label='Gentoo')
ax.plot(chinstrap['bill_length_mm'],
        chinstrap['flipper_length_mm'], label='Chinstrap')

# Add labels, legend, title, etc.
ax.legend()
ax.set_xlabel("Bill Length (mm)")
ax.set_ylabel("Flipper Length (mm)")
ax.set_title("Penguin Bill vs Flipper Length (Male)")
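
The colour cycle assigns colours to lines in the order in which they are plotted. The sketch below illustrates this with three invented lines. It applies the cycle through `plt.rc_context`, an alternative to assigning `plt.rcParams` directly, so that the change is temporary and later plots keep the default colours.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import to_hex

# Sample three colours from the Paired colormap.
colours = plt.cm.Paired(np.linspace(0, 1, 3))

# Apply the colour cycle only within this block.
with plt.rc_context({'axes.prop_cycle': plt.cycler('color', colours)}):
    fig, ax = plt.subplots()

    # Plot three invented lines; each takes the next colour in the cycle.
    lines = [ax.plot([0, 1], [i, i + 1])[0] for i in range(3)]
    line_colours = [to_hex(line.get_color()) for line in lines]

print(line_colours)
plt.close(fig)
```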