<a href="https://colab.research.google.com/github/mnijhuis-dnb/open_source_workshop/blob/master/Trains_solution.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Download the data from GitHub

In [None]:
!wget https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2019/2019-02-26/full_trains.csv

Load the packages needed for the initial data wrangling and plotting

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

Load the CSV file into a DataFrame

In [None]:
df = pd.read_csv('full_trains.csv')

Show the first rows to get an idea of what the data contains

In [None]:
df.head(10)

Clean the data: keep only the national trains, combine the year and month columns into dates, and select only the columns needed for the plot

In [None]:
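# Keep only national services; .copy() avoids a SettingWithCopyWarning when adding a column
df = df[df['service']=='National'].copy()
# Combine year and month into a datetime (first day of each month)
df['date'] = pd.to_datetime(df['year'].astype(str) + '-' + df['month'].astype(str))
# Keep only the columns needed for the plot
df = df[['date', 'departure_station', 'avg_delay_all_departing']]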
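
The date conversion above can also be done without string concatenation. The following is a minimal sketch on a made-up toy frame (the values and the `toy` and `national` names are assumptions for illustration, not part of the dataset); it relies on `pd.to_datetime` accepting a frame with `year`, `month` and `day` columns.

```python
import pandas as pd

# Toy frame using the same column names as the cleaning step (values are made up)
toy = pd.DataFrame({
    'year': [2015, 2015, 2016],
    'month': [1, 12, 6],
    'service': ['National', 'International', 'National'],
})

# Filter to national services and take an independent copy
national = toy[toy['service'] == 'National'].copy()

# Assemble dates from the year and month columns, using day 1 of each month
national['date'] = pd.to_datetime(national[['year', 'month']].assign(day=1))
print(national[['date', 'service']])
```

This variant does not depend on how pandas infers a format from strings such as `2015-1`, which may make it slightly more robust.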

Make a pivot table with one row per date and one column per departure station, and set negative delays to zero

In [None]:
df2 = pd.pivot_table(df, index=['date'], values='avg_delay_all_departing', columns='departure_station', aggfunc='mean')
# Set negative average delays to zero
df2[df2<0] = 0
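
To see what the pivot step does, here is a small sketch on made-up data (the station names `A` and `B` and all values are assumptions). It reshapes a long table into one column per station and uses `clip(lower=0)` as an alternative way to zero out negative delays.

```python
import pandas as pd

# Long-format toy data: one row per (date, station) pair, values are made up
long_df = pd.DataFrame({
    'date': pd.to_datetime(['2015-01-01', '2015-01-01', '2015-02-01', '2015-02-01']),
    'departure_station': ['A', 'B', 'A', 'B'],
    'avg_delay_all_departing': [2.0, -1.5, 3.0, 0.5],
})

# Wide format: dates as the index, one column per station
wide = pd.pivot_table(long_df, index='date', columns='departure_station',
                      values='avg_delay_all_departing', aggfunc='mean')

# Replace negative values by zero; missing values stay missing
wide = wide.clip(lower=0)
print(wide)
```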

Show the resulting data

In [None]:
df2.head(5)

Set the font for the plot

In [None]:
import matplotlib as mpl

font = {'family' : 'sans-serif',
        'weight' : 'normal',
        'size'   : 16}

mpl.rc('font', **font)
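
`mpl.rc` changes the font for every plot made afterwards. If the settings should apply to a single plot only, `mpl.rc_context` may be an option. A minimal sketch, in which the `font_settings` name is an assumption:

```python
import matplotlib as mpl

default_size = mpl.rcParams['font.size']

# Same font settings as above, but only active inside the `with` block
font_settings = {'font.family': 'sans-serif',
                 'font.weight': 'normal',
                 'font.size': 16}
with mpl.rc_context(font_settings):
    size_inside = mpl.rcParams['font.size']   # plotting code would go here

# Outside the block the previous settings are restored
size_after = mpl.rcParams['font.size']
print(size_inside, size_after == default_size)
```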

Make the plot

In [None]:
import datetime as dt

# Plot the delays of every station in light grey with some transparency
fig, ax = plt.subplots(figsize=(10,5))
for col_name in df2.columns:
  if col_name == 'ANNECY':    # Label a single grey line (ANNECY) as 'Other stations' so the legend shows only one grey entry
    ax.plot(df2[col_name], alpha=0.4, color=[0.8,0.8,0.8], label='Other stations')
  else:                       # Leave all other stations unlabeled
    ax.plot(df2[col_name], alpha=0.4, color=[0.8,0.8,0.8])

# plot the delays for Lyon-Part-Dieu in a bright color with a thicker line
ax.plot(df2['LYON PART DIEU'], color=[1, 0, 0], linewidth=2, label='Lyon-Part-Dieu')

# Adjust the ticks and axis labels
ax.set(ylim = (0,20),
       yticks = np.linspace(0, 20, 5),
       ylabel='average delay [min]',
       xlim = (dt.datetime(2015,1,1), dt.datetime(2018,1,1)))

# Place a major tick on the x-axis every 6 months
ax.xaxis.set_major_locator(mpl.dates.MonthLocator(interval=6))

# Remove the spines
ax.spines["right"].set_visible(False) 
ax.spines["top"].set_visible(False) 

# Add a legend and make the grey legend entry less transparent
legend = ax.legend(loc='upper center', framealpha=0.0, ncol=2) 
for idx, legend_entry in enumerate(legend.get_lines()):
  if idx == 0:
    legend_entry.set_alpha(0.8)
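
The highlighting logic above depends on the station name ANNECY being present. One possible generalization is sketched below on synthetic data. The helper `plot_highlight` and the station names are hypothetical, and unlike the cell above it does not draw the highlighted station in grey as well.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

def plot_highlight(wide, highlight, ax):
    """Plot every column of `wide` in grey and the `highlight` column in red."""
    others = [col for col in wide.columns if col != highlight]
    for i, col in enumerate(others):
        # Only the first grey line gets a label, so the legend shows one grey entry
        label = 'Other stations' if i == 0 else None
        ax.plot(wide.index, wide[col], alpha=0.4, color=[0.8, 0.8, 0.8], label=label)
    ax.plot(wide.index, wide[highlight], color=[1, 0, 0], linewidth=2, label=highlight)
    return ax

# Synthetic monthly delays for four made-up stations
rng = np.random.default_rng(0)
dates = pd.date_range('2015-01-01', periods=36, freq='MS')
wide = pd.DataFrame(rng.uniform(0, 10, size=(36, 4)), index=dates,
                    columns=['STATION A', 'STATION B', 'STATION C', 'STATION D'])

fig, ax = plt.subplots(figsize=(10, 5))
plot_highlight(wide, 'STATION C', ax)
legend = ax.legend(loc='upper center', framealpha=0.0, ncol=2)
labels = [text.get_text() for text in legend.get_texts()]
print(labels)
```

Labelling by position rather than by name keeps the legend correct even if the set of stations changes.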