# Bike highways - revisit in PyCaret

We already analyzed the bike-highway data a bit. It was a time series, so let's focus on the counting point nearest to our school and see if we can predict the number of cyclists passing it. And by "we" I mean PyCaret.

Let's look at the [official PyCaret documentation](https://pycaret.gitbook.io/docs/get-started/quickstart#time-series). We'll skip the stock example this time and start right away with our own data.

In [20]:
import pandas as pd

df = pd.read_csv('files/bike_counters_data/Measured data-nl-Geel_FMN GV 21 Geel.csv')
df.head()

|   | Meetpunt surrogate key | Meetpunt locatie | Datum | Tijd | Aantal fietsers | Aantal fietsers van | Aantal fietsers naar | Meetpunt code |
|---|---|---|---|---|---|---|---|---|
| 0 | 89 | Rauwelkoven 54 | 2020-02-14 | 00:00:00 | 0 | 0 | 0 | FMN GV 21 Geel |
| 1 | 89 | Rauwelkoven 54 | 2020-02-14 | 01:00:00 | 0 | 0 | 0 | FMN GV 21 Geel |
| 2 | 89 | Rauwelkoven 54 | 2020-02-14 | 02:00:00 | 0 | 0 | 0 | FMN GV 21 Geel |
| 3 | 89 | Rauwelkoven 54 | 2020-02-14 | 03:00:00 | 2 | 2 | 0 | FMN GV 21 Geel |
| 4 | 89 | Rauwelkoven 54 | 2020-02-14 | 04:00:00 | 0 | 0 | 0 | FMN GV 21 Geel |


Once again we need to do some cleaning. As we can read [here](https://pycaret.gitbook.io/docs/learn-pycaret/official-blog/time-series-forecasting-with-pycaret-regression), PyCaret can't deal with date columns directly, so we'll have to store each part of the date and time in its own column. We're also only interested in "Aantal fietsers" (number of cyclists), not the "van" (from) and "naar" (to) columns. "Meetpunt surrogate key", "Meetpunt locatie" and "Meetpunt code" are always the same, so you can drop these as well.

Finally, rename "Aantal fietsers" to "nr_cyclists". It'll be easier to work with.

In [21]:
# DELETE

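# Combine date and time into one timestamp so the hour comes from 'Tijd'
# ('Datum' alone has no time part, which would make every hour 0).
timestamp = pd.to_datetime(df['Datum'] + ' ' + df['Tijd'])
df['year'] = timestamp.dt.year
df['month'] = timestamp.dt.month
df['day'] = timestamp.dt.day
df['hour'] = timestamp.dt.hour
df = df.rename(columns={"Aantal fietsers": "nr_cyclists"})

# Drop the raw date/time strings and the columns we don't need.
df = df.drop(columns=['Datum', 'Tijd', 'Meetpunt surrogate key', 'Meetpunt locatie', 'Aantal fietsers van', 'Aantal fietsers naar', 'Meetpunt code'])
df.head()

|   | nr_cyclists | year | month | day | hour |
|---|---|---|---|---|---|
| 0 | 0 | 2020 | 2 | 14 | 0 |
| 1 | 0 | 2020 | 2 | 14 | 1 |
| 2 | 0 | 2020 | 2 | 14 | 2 |
| 3 | 2 | 2020 | 2 | 14 | 3 |
| 4 | 0 | 2020 | 2 | 14 | 4 |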


In [25]:
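from pycaret.time_series import *

# fh: forecast horizon (predict the next 3 hourly counts); fold: number of cross-validation folds;
# session_id: random seed, so the run is reproducible.
s = setup(df, fh=3, fold=5, session_id=123, target="nr_cyclists")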

|   | Description | Value |
|---|---|---|
| 0 | session_id | 123 |
| 1 | Target | nr_cyclists |
| 2 | Approach | Univariate |
| 3 | Exogenous Variables | Present |
| 4 | Original data shape | (29256, 5) |
| 5 | Transformed data shape | (29256, 5) |
| 6 | Transformed train set shape | (29253, 5) |
| 7 | Transformed test set shape | (3, 5) |
| 8 | Rows with missing values | 0.0% |
| 9 | Fold Generator | ExpandingWindowSplitter |
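
The setup summary above reports an `ExpandingWindowSplitter` as fold generator. As a rough illustration of what that means with `fh = 3` and `fold = 5`, here is a plain-Python sketch. The helper name `expanding_window_splits` is made up for this example, and it assumes that the validation windows are `fh` rows long, step forward by `fh` rows and end at the last training row; PyCaret's exact fold boundaries may differ.

```python
def expanding_window_splits(n_rows, fh, folds):
    """Yield (train_stop, test_stop) for each fold.

    Training uses rows [0, train_stop), validation uses rows
    [train_stop, test_stop). Assumption: windows step forward by `fh`
    rows and the last validation window ends at row `n_rows`.
    """
    first_train_stop = n_rows - folds * fh
    for k in range(folds):
        train_stop = first_train_stop + k * fh
        yield train_stop, train_stop + fh


# 29253 is the train set size reported by setup() above.
for train_stop, test_stop in expanding_window_splits(29253, fh=3, folds=5):
    print(f"train on rows [0, {train_stop}) -> validate on rows [{train_stop}, {test_stop})")
```

The training window grows with every fold while the validation window always covers the next three hours, so each model is only ever scored on data that comes after what it was trained on.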


In [26]:
best = compare_models(sort = 'MAE')

|   | Model | MASE | RMSSE | MAE | RMSE | MAPE | SMAPE | R2 | TT (Sec) |
|---|---|---|---|---|---|---|---|---|---|
| naive | Naive Forecaster | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.532 |
| grand_means | Grand Means Forecaster | 1.97 | 0.9504 | 35.6621 | 35.6621 | 1.6060759998085222e+17 | 2.0 | 0.0 | 1.178 |
| arima | ARIMA | 6.1903 | 3.1763 | 112.0242 | 119.1153 | 5.045122230756617e+17 | 2.0 | 0.0 | 6.794 |
| snaive | Seasonal Naive Forecaster | 7.9779 | 3.9975 | 144.4 | 149.9529 | 6.503197861922995e+17 | 2.0 | 0.0 | 1.342 |
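
A perfect score for the naive forecaster next to astronomically large MAPE values for the other models looks odd. One possible explanation, which we have not verified on the real data, is that the hours being predicted all have a count of zero: repeating the last observed value is then exactly right, while any percentage error divides by (almost) zero. The sketch below reproduces that effect by hand. The counts are made up for illustration, and the `mae` and `mape` helpers, including the `eps` value, are our own simplified versions rather than PyCaret's implementations.

```python
# Toy hourly counts, invented for this example (not taken from the dataset).
history = [12, 5, 1, 0]   # the last observed hour had no cyclists
actual = [0, 0, 0]        # the next fh = 3 hours are empty as well

# Naive forecast: repeat the last observed value for every step ahead.
naive_forecast = [history[-1]] * len(actual)

# Grand-means forecast: repeat the mean of the whole history (here 4.5).
mean_forecast = [sum(history) / len(history)] * len(actual)


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mape(y_true, y_pred, eps=1e-15):
    """Mean absolute percentage error; eps (an arbitrary choice) avoids dividing by zero."""
    return sum(abs(t - p) / max(abs(t), eps) for t, p in zip(y_true, y_pred)) / len(y_true)


print("naive MAE:", mae(actual, naive_forecast))
print("mean  MAE:", mae(actual, mean_forecast))
print("mean MAPE:", mape(actual, mean_forecast))
```

The naive forecast gets an MAE of 0, the grand-means forecast an MAE of 4.5, and the MAPE of the grand-means forecast explodes because every actual value is zero. For count data with many empty hours, MAE is therefore the more readable metric, which fits the `sort = 'MAE'` choice above.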


There is a lot more documentation on this topic:

* [PyCaret 101](https://pycaret.gitbook.io/docs/learn-pycaret/official-blog/time-series-101-for-beginners)
* [A blog post announcing PyCaret's time series module](https://towardsdatascience.com/announcing-pycarets-new-time-series-module-b6e724d4636c)
