# Bike highways - revisit in PyCaret

We already analyzed the bike-highway data a bit. It was a time series, so let's focus on the counting-point nearest to our school and see if we can predict it. And by "we" I mean PyCaret.

Let's look at the [official PyCaret-documentation](https://pycaret.gitbook.io/docs/get-started/quickstart#time-series). We won't bother with the stock example this time but start right away with our own data.

In [None]:
import pandas as pd

df = pd.read_csv('files/bike_counters_data/Measured data-nl-Geel_FMN GV 21 Geel.csv')
df.head()

Once again we need to do some cleaning. As we can read [here](https://pycaret.gitbook.io/docs/learn-pycaret/official-blog/time-series-forecasting-with-pycaret-regression), PyCaret can't deal with dates, so we'll have to store all parts of the date and time separately. We're also only interested in "Aantal fietsers", not the "van" and "naar" columns. The "Meetpunt surrogate key", "Meetpunt locatie" and "Meetpunt code" columns are always the same, so you can drop these as well.

And rename "Aantal fietsers" to "nr_cyclists". It'll be easier to work with.

In [None]:
# DELETE

df['year'] = pd.DatetimeIndex(df['Datum']).year
df['month'] = pd.DatetimeIndex(df['Datum']).month
df['day'] = pd.DatetimeIndex(df['Datum']).day
df['hour'] = pd.DatetimeIndex(df['Datum']).hour
df = df.rename(columns={"Aantal fietsers": "nr_cyclists"})

df = df.drop(columns=['Datum', 'Tijd', 'Meetpunt surrogate key', 'Meetpunt locatie','Aantal fietsers van','Aantal fietsers naar','Meetpunt code'])
df.head()

Finally, group this data so you're working with the daily totals, not the hourly data. Otherwise you'll be predicting way too many zeros.

In [None]:
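# Sum the hourly counts per calendar day; selecting the column explicitly
# avoids passing a list where sum() expects its numeric_only flag.
df_daily = df.drop(columns=["hour"]).groupby(["year","month","day"], as_index=False)["nr_cyclists"].sum()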
df_daily.head()
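
To check that the grouping does what you expect, it can help to run it on a tiny hand-made table first. The sketch below uses made-up numbers (the `hourly` frame is purely illustrative, not the real counter data) and the same column names as above.

```python
import pandas as pd

# Made-up hourly counts for two days (illustrative only, not the real data)
hourly = pd.DataFrame({
    "year": [2023] * 5,
    "month": [1] * 5,
    "day": [1, 1, 1, 2, 2],
    "hour": [0, 8, 17, 8, 17],
    "nr_cyclists": [0, 12, 20, 9, 15],
})

# Drop the hour, then sum the counts per (year, month, day)
daily = (
    hourly.drop(columns=["hour"])
    .groupby(["year", "month", "day"], as_index=False)["nr_cyclists"]
    .sum()
)

print(daily)
```

Each day should end up as a single row whose `nr_cyclists` is the sum of that day's hourly counts.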

Next up is PyCaret! Some of these steps will take a while. If you [have better things to do](https://www.youtube.com/watch?v=nLJ8ILIE780), save the last variable you made (the setup or the best model) in a [pickle](https://www.geeksforgeeks.org/how-to-use-pickle-to-save-and-load-variables-in-python/) file.

First, set up the experiment using the `setup` function.

In [None]:
# DELETE
from pycaret.time_series import *

s = setup(df, fh = 3, fold = 5, session_id = 123, target="nr_cyclists")

Next up, compare the different models. This one will take a while, as we have a lot of data and we're training a lot of models.

Also, pass "n_select=5" as a parameter to compare_models. You'll thank me for this. (Note: the next one took half an hour on a computer with a GPU. Not sure how long it would take on a computer without.)

In [None]:
best = compare_models(sort = 'MAE', n_select=5)

In [None]:
import pickle

with open('best_model.pkl', 'wb') as file:
    pickle.dump(best, file)

In [None]:
import pickle

with open('best_model.pkl', 'rb') as file:
    # Call the load method to deserialize
    best_2 = pickle.load(file)

print(best_2)

Strangest thing: we have a perfect model:

![Alt text](files/image.png)

Which is a bad thing. It means something went wrong, because no model can be perfect. Let's see what it predicts using a forecast plot. Predict 10 days, or 240 hours.

(You can drag a small window across the data you want to zoom in on. Also remember we stored 5 models (n_select), so you'll have to use an index.)


In [None]:
# DELETE
plot_model(best[0], plot = 'forecast', data_kwargs = {'fh' : 240})
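
One way a "perfect" score can appear, and this is only a possible explanation, not something confirmed for this dataset, is that the test window happens to contain nothing but zeros (for example a few quiet night hours). The sketch below uses made-up counts and a hand-written naive "repeat the last value" forecast, not a PyCaret model, to show how the MAE then drops to exactly zero.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute difference between two equally long sequences."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

# Made-up hourly counts that end in a quiet stretch (assumption for illustration)
hourly_counts = [14, 9, 3, 0, 0, 0, 0, 0, 0]

fh = 3  # forecast horizon: hold out the last 3 points as the test window
train, test = hourly_counts[:-fh], hourly_counts[-fh:]

# Naive forecast: repeat the last observed training value for every step
naive_forecast = [train[-1]] * fh

mae = mean_absolute_error(test, naive_forecast)
print(mae)
```

A score like this says more about the test window than about the model, which is why the forecast plot is worth a look.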

Now compare this "perfect" model to the other four you stored. Some provide a pretty prediction, others are plain bad. Still, it's a good start.

In [None]:
# DELETE
plot_model(best[2], plot = 'forecast', data_kwargs = {'fh' : 240})

There are a lot of other plots to be made. Experiment a bit!

In [None]:
#DELETE



There is a lot more documentation on this topic:

* [PyCaret 101](https://pycaret.gitbook.io/docs/learn-pycaret/official-blog/time-series-101-for-beginners)
* [A blog-post](https://towardsdatascience.com/announcing-pycarets-new-time-series-module-b6e724d4636c)
