### LSE Data Analytics Online Career Accelerator

# DA201: Data Analytics Using Python

## Demonstration: Boston Marathon case study

The Boston Marathon organisers want to promote the annual event and engage participants for future events. In 2017, marathon participants agreed to wear tracking devices during the run, and the stats gathered from these devices have been compiled. The organisers hope to identify patterns in this data that the marketing team could use to design their marketing and promotional campaign.

In this demonstration, let's use the marathon data set to create more effective visualisations.

This Notebook is used in 4.3.5 [Optional] Readability of visualisations and is a continuation of the LSE_DA201_the_visualisation_workflow.ipynb Notebook used in 4.3.1 The visualisation workflow. The new content begins at Step 7 below.

### 1. Import libraries and create a DataFrame

In [None]:
# Import Matplotlib, Seaborn, NumPy, Pandas, and datetime.
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
import pandas as pd
import datetime

# Read the CSV file.
marathon = pd.read_csv("marathon_results.csv")

# View the DataFrame.
print(marathon.shape)
print(marathon.columns)
marathon.head()

### 2. Convert time to numeric values

In [None]:
# Convert the 'Official Time' strings to timedeltas so they can be made numeric.
marathon['official_time'] = pd.to_timedelta(marathon['Official Time'])

# Calculate the number of seconds.
marathon['official_time_seconds'] = marathon['official_time'].dt.seconds

# Calculate the number of minutes.
marathon['official_time_minutes'] = marathon['official_time_seconds']/60

# View the DataFrame.
marathon.head()
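
As a quick check on the conversion above, the sketch below contrasts `.dt.seconds` with `.dt.total_seconds()`. The two durations are made-up illustrative values, not rows from `marathon_results.csv`. `.dt.seconds` returns only the seconds component of each timedelta (the part under one day), which equals the full duration for marathon finish times but not for anything of 24 hours or longer.

```python
import pandas as pd

# Hypothetical durations for illustration only (not real race results).
durations = pd.to_timedelta(pd.Series(['2:09:37', '1 days 00:00:10']))

# Seconds component of each timedelta (always less than one day).
component_seconds = durations.dt.seconds.tolist()

# Full duration expressed in seconds, returned as floats.
total_seconds = durations.dt.total_seconds().tolist()

print(component_seconds)  # [7777, 10]
print(total_seconds)      # [7777.0, 86410.0]
```

Because every finish time in a marathon is well under 24 hours, both approaches give the same number of seconds here; `.dt.total_seconds()` is simply the safer habit for longer durations.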

### 3. What is the spread of the data?

In [None]:
# Create an empty plot and set plot size.
fig, ax = plt.subplots()
fig.set_size_inches(16, 8)

# Create a histogram of finish times in minutes (the numeric column from Step 2).
ax.hist(marathon['official_time_minutes'], bins=20)
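
A histogram shows the shape of the distribution, and a numeric summary can sit alongside it. The following is a minimal sketch using a small, made-up series of finish times in minutes (hypothetical values, not the marathon results). With the real data, the same calls could be applied to `marathon['official_time_minutes']`.

```python
import pandas as pd

# Hypothetical finish times in minutes, for illustration only.
minutes = pd.Series([130, 180, 200, 220, 240, 260, 300],
                    name='official_time_minutes')

# Count, mean, standard deviation, minimum, quartiles, and maximum.
summary = minutes.describe()

# Interquartile range: the width of the middle 50% of the values.
iqr = minutes.quantile(0.75) - minutes.quantile(0.25)

print(summary)
print('IQR (minutes):', iqr)
```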

### 4. What is the spread of finish times for male marathon runners?

In [None]:
# Create an empty plot and set plot size.
fig, ax = plt.subplots()
fig.set_size_inches(16, 8)

# Create a variable for the x-values.
males = marathon[marathon['M/F'] == 'M']

# Create a histogram of male finish times in minutes.
ax.hist(males['official_time_minutes'], bins=20)

### 5. What is the spread of finish times for female marathon runners?

In [None]:
# Create an empty plot and set plot size.
fig, ax = plt.subplots()
fig.set_size_inches(16, 8)

# Create a variable for the x-values.
females = marathon[marathon['M/F'] == 'F']

# Create a histogram of female finish times in minutes.
ax.hist(females['official_time_minutes'], bins=20)

### 6. What is the relationship between gender and race times?

In [None]:
# Create a data set for males and females to use in the matplotlib boxplot.
males = marathon[marathon['M/F'] == 'M']
females = marathon[marathon['M/F'] == 'F']

# Keep only the finish times in minutes for each group (the boxplot input).
males = males['official_time_minutes']
females = females['official_time_minutes']

# Create a boxplot using Matplotlib.
plt.boxplot([males,females], labels=['M','F'], patch_artist=True)

# View the boxplot.
plt.show()

In [None]:
# Create a second boxplot using Seaborn.
sns.boxplot(x='M/F', y='official_time_minutes', data=marathon)
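
The boxplots compare the two groups visually. The same comparison can be read off as numbers with a grouped summary. This is a minimal sketch on a small, made-up DataFrame (hypothetical values, not the marathon results) that reuses the column names `M/F` and `official_time_minutes` from the cells above.

```python
import pandas as pd

# Hypothetical finish times in minutes, for illustration only.
sample = pd.DataFrame({
    'M/F': ['M', 'M', 'M', 'F', 'F', 'F'],
    'official_time_minutes': [150.0, 200.0, 250.0, 170.0, 220.0, 270.0],
})

# Summarise finish times per group: the median is the line inside each box.
group_summary = (sample.groupby('M/F')['official_time_minutes']
                 .agg(['count', 'median', 'min', 'max']))

print(group_summary)
```

With the real data, replacing `sample` with `marathon` should give the per-group medians that the boxplots display.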

### 7. Combine the two histograms (male and female) into one plot