# Time Series Analysis Exercise 1: Seattle Bicycles

In this exercise we will practice exploratory data analysis on time series data, using the Seattle [Fremont Bridge Bicycle Counter](https://data.seattle.gov/Transportation/Fremont-Bridge-Bicycle-Counter/65db-xm6k) dataset from [Seattle's Open Data portal](https://data.seattle.gov/).

This exercise requires the `statsmodels` library, which is available by default in Google Colab and Anaconda and can otherwise be installed with `pip install statsmodels`.

**Questions:**
1. Load the attached `Fremont_Bridge_Bicycle_Counter.csv` dataset as a Pandas DataFrame `bike_df`. What are the columns of this DataFrame?

In [None]:
import pandas as pd

bike_df = pd.read_csv("Fremont_Bridge_Bicycle_Counter.csv")
print(bike_df.columns)

In [None]:
bike_df.head()

2. We will only use the date and total number of bicycle crossings as our features. Use the arguments `usecols=`, `index_col=`, and `parse_dates=True` in `pd.read_csv()` so that only the `"Date"` and `"Fremont Bridge Total"` columns are read, and the dates are used as the DataFrame index.


In [None]:
bike_df = pd.read_csv("Fremont_Bridge_Bicycle_Counter.csv", usecols=['Date', 'Fremont Bridge Total'], index_col='Date', parse_dates=True)
display(bike_df.head())
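The effect of the three `pd.read_csv()` arguments can be checked on a tiny in-memory CSV. This is only a sketch: the rows and the `Other Column` name are made up for illustration and are not taken from the real file.

```python
import io

import pandas as pd

# Tiny in-memory CSV (hypothetical rows; "Other Column" is a made-up extra column).
csv_text = """Date,Fremont Bridge Total,Other Column
2016-01-01 00:00:00,5,1
2016-01-01 01:00:00,3,2
"""

toy_df = pd.read_csv(
    io.StringIO(csv_text),
    usecols=["Date", "Fremont Bridge Total"],  # read only these two columns
    index_col="Date",                          # use the dates as the index
    parse_dates=True,                          # parse the index as datetimes
)

# The index is now a DatetimeIndex and only the count column remains.
print(type(toy_df.index).__name__)
print(list(toy_df.columns))
```

A datetime index is what makes the later `.loc['2016-01']` and `.resample(...)` calls possible.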

3. Run `bike_df.describe()` and give a short explanation of the statistics that are printed. Hint: What time period does each row of `bike_df` cover?


In [None]:
bike_df.describe()

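There are about 8766 hours in a year (365.25 days × 24), so the number of rows in this dataset corresponds to almost 10 years of data. Each row represents one hour, and its value is the total number of bicycles that crossed the bridge on the pedestrian/bicycle pathways during that hour. The remaining statistics (mean, standard deviation, minimum, quartiles and maximum) therefore describe hourly crossing counts.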
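One way to confirm that each row covers one hour is to look at the time step between consecutive index values. The sketch below uses a small synthetic stand-in (`toy_df`, hypothetical values) rather than the real `bike_df`.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for bike_df (hypothetical values, not the real counts).
index = pd.date_range("2016-01-01", periods=48, freq=pd.Timedelta(hours=1))
toy_df = pd.DataFrame({"Fremont Bridge Total": np.arange(48)}, index=index)
toy_df.index.name = "Date"

# Time elapsed between consecutive rows; a single value of 1 hour means hourly data.
steps = toy_df.index.to_series().diff().dropna()
print(steps.value_counts())

# Approximate span in years, using 8766 hours per year (365.25 days * 24).
years_covered = len(toy_df) / 8766
print(f"{years_covered:.4f} years")
```

The same two checks could be run on `bike_df` itself; on real data the step counts may also reveal gaps or duplicated timestamps.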

4. Plot bike crossings for the month of January 2016. What patterns do you see in the data? Hint: you can use `bike_df.loc['2016-01']` to access a month in the date-time index of the DataFrame. Use the Pandas DataFrame `.plot()` function and not `plt.plot(...)` from Matplotlib.


In [None]:
import matplotlib.pyplot as plt

january_2016 = bike_df.loc['2016-01']
january_2016.plot()

plt.xlabel('Date')
plt.ylabel('Number of Bike Crossings')
plt.title('Bike Crossings for January 2016')

plt.show()

We can see several patterns. First, there are 31 low points in the data, one per day: these "valleys" probably represent the night hours, when there are few crossings.
Second, two days out of every seven have clearly lower values; these are the weekend days.
Finally, there are two spikes each day, probably one in the morning when people go to work and one in the evening when they come back.
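These visual impressions can be checked numerically by averaging over the hour of day and the day of week. The sketch below builds a synthetic series (`toy`) that is deliberately shaped like commuter traffic, so the numbers are hypothetical; the point is the two `groupby` calls, which could be applied to the real data in the same way.

```python
import numpy as np
import pandas as pd

# Synthetic hourly counts for January 2016 (hypothetical, built to mimic commuter traffic).
index = pd.date_range("2016-01-01", "2016-01-31 23:00", freq=pd.Timedelta(hours=1))
hour = index.hour.to_numpy()
is_weekend = index.dayofweek.to_numpy() >= 5  # Monday=0, so 5 and 6 are Saturday and Sunday

# Two bumps centred on 08:00 and 17:00; weekends get a fraction of the weekday volume.
profile = 200 * np.exp(-((hour - 8) ** 2) / 2) + 250 * np.exp(-((hour - 17) ** 2) / 2)
counts = np.where(is_weekend, 0.3 * profile, profile)
toy = pd.Series(counts, index=index, name="Fremont Bridge Total")

# Average count per hour of day: the two largest values sit at the commute hours.
by_hour = toy.groupby(toy.index.hour).mean()
print(by_hour.nlargest(2))

# Average count per day of week: Saturday (5) and Sunday (6) are the lowest.
by_weekday = toy.groupby(toy.index.dayofweek).mean()
print(by_weekday)
```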

5. Use `bikes_per_week = bike_df.resample(...).sum()['Fremont Bridge Total']` to get the total bike crossings per week for the entire time period covered in the dataset, and plot this data. What seasonal pattern do you see in the data?


In [None]:
# Resample the data to get total bike crossings per week
bikes_per_week = bike_df.resample('W').sum()['Fremont Bridge Total']

bikes_per_week.plot()

plt.xlabel('Date')
plt.ylabel('Total Bike Crossings per Week')
plt.title('Total Bike Crossings per Week')

plt.show()


We can see a pattern that repeats every year: crossings rise until they reach a maximum around mid-year, then fall until the end of the year, where the weekly totals reach their minimum.
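To locate the yearly peak and trough more precisely than by eye, the weekly totals can be averaged by calendar month. The sketch below uses a synthetic daily series with an annual cosine (hypothetical values, chosen to peak mid-year); the names `toy_daily`, `weekly` and `by_month` are illustrative only.

```python
import numpy as np
import pandas as pd

# Synthetic daily counts (hypothetical): an annual cosine, lowest near 1 January.
index = pd.date_range("2014-01-01", "2016-12-31", freq="D")
day_of_year = index.dayofyear.to_numpy()
daily = 2000 - 1000 * np.cos(2 * np.pi * day_of_year / 365.25)
toy_daily = pd.Series(daily, index=index, name="Fremont Bridge Total")

# Weekly totals; each label is the Sunday that ends the week.
# The first and last weeks may be partial and therefore have smaller sums.
weekly = toy_daily.resample("W").sum()

# Average weekly total for each calendar month (1 = January, ..., 12 = December).
by_month = weekly.groupby(weekly.index.month).mean()
print("highest month:", by_month.idxmax())
print("lowest month:", by_month.idxmin())
```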

6. Examine the autocorrelation plots generated below and explain them using your answers to questions 4-5.


In [None]:
# code for question 6
from matplotlib import pyplot as plt

bikes_per_day = bike_df.resample('D').sum()['Fremont Bridge Total']  # 'D' = calendar-day frequency

plt.figure()
pd.plotting.autocorrelation_plot(bikes_per_day)
plt.xlim((0, 20))
plt.ylim((-1, 1))
plt.xticks(range(20))
plt.title('Autocorrelation of bikes_per_day over 20 days')

plt.figure()
pd.plotting.autocorrelation_plot(bikes_per_day)
plt.xlim((0,500))
plt.ylim((-1, 1))
plt.title('Autocorrelation of bikes_per_day over 500 days');

The first plot limits the x-axis to 20 days, so it shows the correlation between the number of crossings on a given day and the number on each of the preceding days, up to a lag of 20 days.
The autocorrelation is high at lags of 7 and 14 days, which indicates a weekly pattern: the number of crossings on a given day is strongly correlated with the number on the same day of the week in previous weeks. This confirms what we saw in question 4.

The second plot extends the x-axis to 500 days, which reveals longer-term patterns. Significant autocorrelation at a particular lag suggests seasonality or another cyclic pattern with that period. Here the autocorrelation is high around a lag of 365 days, meaning there is a yearly seasonal pattern tied to the months of the year, as we saw in question 5.
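The values at individual lags can also be read off directly with `Series.autocorr()`, which computes the Pearson correlation between the series and a shifted copy of itself (a slightly different normalisation from the one used by `autocorrelation_plot`, but the same idea). The sketch below uses a synthetic daily series (`toy`) with a built-in weekly and yearly cycle; all values are hypothetical.

```python
import numpy as np
import pandas as pd

# Synthetic daily counts (hypothetical): weekday/weekend effect times a yearly cycle, plus noise.
rng = np.random.default_rng(0)
index = pd.date_range("2013-01-01", periods=4 * 365, freq="D")
weekday_effect = np.where(index.dayofweek.to_numpy() < 5, 1.0, 0.4)
yearly = 1.0 + 0.5 * np.sin(2 * np.pi * (index.dayofyear.to_numpy() - 80) / 365.25)
toy = pd.Series(
    3000 * weekday_effect * yearly + rng.normal(0, 100, len(index)),
    index=index,
)

# Correlation between the series and itself shifted by `lag` days.
for lag in (1, 3, 7, 14, 182, 364):
    print(lag, round(float(toy.autocorr(lag=lag)), 2))
```

Lags that are multiples of 7 line weekdays up with weekdays, and a lag of 364 days (52 weeks) additionally lines up the same time of year, which is why those lags stand out.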

7. Use `statsmodels.tsa.seasonal.seasonal_decompose()` on `bikes_per_week` to decompose it into seasonal, trend, and residual components. Plot all of these components on the same graph, and explain what each component represents. What can you say about how the number of bicycle crossings has changed over time?


In [None]:
from statsmodels.tsa.seasonal import seasonal_decompose

decomposition = seasonal_decompose(bikes_per_week)

plt.figure(figsize=(10, 8))

plt.subplot(4, 1, 1)
plt.plot(bikes_per_week, label='Original')
plt.legend(loc='upper left')
plt.ylabel('Total Bike Crossings per Week')

plt.subplot(4, 1, 2)
plt.plot(decomposition.trend, label='Trend', color='orange')
plt.legend(loc='upper left')
plt.ylabel('Trend')

plt.subplot(4, 1, 3)
plt.plot(decomposition.seasonal, label='Seasonal', color='green')
plt.legend(loc='upper left')
plt.ylabel('Seasonal')

plt.subplot(4, 1, 4)
plt.plot(decomposition.resid, label='Residual', color='red')
plt.legend(loc='upper left')
plt.ylabel('Residual')

plt.xlabel('Date')
plt.suptitle('Decomposition of Bike Crossings per Week')

plt.tight_layout()
plt.show()


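The trend component is the slowly varying long-term level of the series, the seasonal component is the pattern that repeats every year, and the residual is what remains after both are removed. The trend graph shows that the number of crossings stays approximately constant from 2013 to 2017, then rises slightly until the end of 2019, and then drops sharply until mid-to-late 2020, which is explained by the COVID-19 crisis and the lockdowns.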
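To make the meaning of the three components concrete, here is a hand-rolled additive decomposition in plain pandas. It is a simplified sketch of the classical moving-average idea, not the exact implementation used by `seasonal_decompose()`, and it runs on a synthetic series with a toy cycle length of 7 steps (an assumption made to keep the example small; the weekly bike data has a yearly cycle instead).

```python
import numpy as np
import pandas as pd

# Synthetic series (hypothetical): linear trend + repeating 7-step cycle + noise.
rng = np.random.default_rng(1)
period = 7
n = 20 * period
index = pd.date_range("2020-01-06", periods=n, freq="D")
cycle = np.array([3.0, 2.0, 1.0, 0.0, -1.0, -2.0, -3.0])
observed = pd.Series(
    0.1 * np.arange(n) + np.tile(cycle, n // period) + rng.normal(0, 0.2, n),
    index=index,
)

# Trend: centred moving average over one full cycle, which averages the cycle out.
trend = observed.rolling(window=period, center=True).mean()

# Seasonal: average detrended value at each position in the cycle, centred on zero.
detrended = observed - trend
position = np.arange(n) % period
seasonal_means = detrended.groupby(position).mean()
seasonal_means -= seasonal_means.mean()
seasonal = pd.Series(seasonal_means.to_numpy()[position], index=index)

# Residual: whatever the trend and seasonal components do not explain.
resid = observed - trend - seasonal

print(seasonal_means.round(2))
print("residual std:", round(float(resid.std()), 3))
```

In an additive decomposition the three components sum back to the observed series wherever the trend is defined; the moving average leaves a few undefined values at both ends.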

**BONUS:** Describe the meaning of the graph generated by the code below marked BONUS.

In [None]:
# code for BONUS question
bike_df['Fremont Bridge Total'].groupby([
    bike_df.index.time,
    bike_df.index.dayofweek
]).mean().unstack().plot();
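Before interpreting the graph, it may help to look at the table that the `groupby(...).mean().unstack()` chain builds. The sketch below runs the same chain on a synthetic two-week hourly series (`toy`, hypothetical values) and inspects the result instead of plotting it.

```python
import numpy as np
import pandas as pd

# Synthetic hourly series covering two full weeks, starting on a Monday (hypothetical values).
index = pd.date_range("2016-01-04", periods=14 * 24, freq=pd.Timedelta(hours=1))
toy = pd.Series(np.arange(len(index), dtype=float), index=index, name="Fremont Bridge Total")

# Mean count for every (time of day, day of week) pair, then pivot the
# day of week into columns: one row per time of day, one column per weekday.
table = toy.groupby([toy.index.time, toy.index.dayofweek]).mean().unstack()

print(table.shape)
print(list(table.columns))  # 0 = Monday, ..., 6 = Sunday
```

Calling `.plot()` on such a table draws one line per column, so the graph shows the average hourly profile of crossings separately for each day of the week.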