# Data preparation

In this assignment, we'll use the dataset and data preparation from the book [Hands-on Machine Learning with Scikit-Learn, Keras and TensorFlow](https://github.com/ageron/handson-ml3/blob/main/15_processing_sequences_using_rnns_and_cnns.ipynb).

In [None]:
!wget https://github.com/ageron/data/raw/main/ridership.tgz

In [None]:
!tar -xf ridership.tgz

In [None]:
import pandas as pd
df = pd.read_csv("ridership/CTA_-_Ridership_-_Daily_Boarding_Totals.csv", parse_dates=["service_date"])
df.columns = ["date", "day_type", "bus", "rail", "total"] # shorter names
df = df.sort_values("date").set_index("date")
df = df.drop("total", axis=1) # no need for total, it's just bus + rail
df = df.drop_duplicates() # remove duplicated months (2011-10 and 2014-07)

In [None]:
df.head()

In [None]:
import matplotlib.pyplot as plt
df["2019-03":"2019-05"].plot(grid=True, marker=".", figsize=(8, 3.5))
plt.show()

In [None]:
rail_train = df["rail"]["2016-01":"2018-12"] / 1e6
rail_valid = df["rail"]["2019-01":"2019-05"] / 1e6
rail_test = df["rail"]["2019-06":] / 1e6

# Exercise

Your task is to model the rail ridership sequences. The data is split into train, validation, and test sets. Each example consists of 57 consecutive time steps (days): the first 56 steps are the input and the last time step is the output. For each model, report the MSE, RMSE, and MAE.
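
To make the example layout concrete, here is a minimal sketch of one way to cut a series into 57-step windows. The helper name `make_windows` and the toy series are illustrative assumptions, not part of the assignment; the sketch uses plain NumPy, and any other windowing utility that produces the same 56-in, 1-out layout would work as well.

```python
import numpy as np

def make_windows(series, seq_length=56):
    """Split a 1-D series into (input, target) pairs.

    Hypothetical helper: each window holds seq_length + 1 consecutive
    values; the first seq_length are the input, the last is the target.
    """
    values = np.asarray(series, dtype=np.float32)
    # Every run of seq_length + 1 consecutive values, one row per start index.
    windows = np.lib.stride_tricks.sliding_window_view(values, seq_length + 1)
    X = windows[:, :seq_length]   # first 56 steps of each window
    y = windows[:, seq_length]    # 57th step of each window
    return X, y

# Toy series standing in for rail_train, rail_valid, or rail_test.
toy = np.arange(60, dtype=np.float32)
X_toy, y_toy = make_windows(toy)
print(X_toy.shape, y_toy.shape)
```

The same call could be applied to each split separately, for example `make_windows(rail_train.to_numpy())`, so that no window straddles the boundary between train, validation, and test. Recurrent and Transformer models typically expect a trailing feature axis, which can be added with `X[..., np.newaxis]`.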

The required models:
* MLP
* RNN
* GRU
* Transformer encoder
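
Whichever model is being evaluated, the three requested metrics can be computed from its predictions in the same way. The sketch below is one possible implementation; the function name `regression_metrics` and the sample numbers are made up for illustration. Because the series were divided by 1e6 during preparation, RMSE and MAE come out in millions of boardings and MSE in squared millions.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Return MSE, RMSE, and MAE for two equally shaped arrays (hypothetical helper)."""
    y_true = np.asarray(y_true, dtype=np.float64)
    y_pred = np.asarray(y_pred, dtype=np.float64)
    errors = y_pred - y_true
    mse = float(np.mean(errors ** 2))
    return {
        "MSE": mse,
        "RMSE": float(np.sqrt(mse)),          # square root of the MSE
        "MAE": float(np.mean(np.abs(errors))),
    }

# Made-up targets and predictions, only to show the call.
metrics = regression_metrics([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
print(metrics)
```

Using one shared function for all four models keeps the numbers in the results table directly comparable.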


## Deliverables
* The code.
* A table summarizing the hyperparameters and the results.
* A brief write-up describing the experiment, results, and analysis.