# Bike highways - revisit in PyCaret

We already analyzed the bike-highway data a bit. It was a time series, so let's focus on the counting-point nearest to our school and see if we can predict it. And by "we" I mean PyCaret.

Let's look at the [official PyCaret-documentation](https://pycaret.gitbook.io/docs/get-started/quickstart#time-series). We won't bother with the stock example this time but start right away with our own data.

In [None]:
import pandas as pd

df = pd.read_csv('../files/bike_counters_data/Measured data-nl-Geel_FMN GV 21 Geel.csv')
df.head()

Once again we need to do some cleaning. As we can read [here](https://pycaret.gitbook.io/docs/learn-pycaret/official-blog/time-series-forecasting-with-pycaret-regression), PyCaret can't deal with dates, so we'll have to store the parts of the date and time in separate columns. We're also only interested in "Aantal fietsers", not the "van" and "naar" columns. The "Meetpunt surrogate key", "Meetpunt locatie" and "Meetpunt code" columns are always the same, so you can drop these as well.

And rename "Aantal fietsers" to "nr_cyclists". It'll be easier to work with.

In [None]:
# DELETE

df['year'] = pd.DatetimeIndex(df['Datum']).year
df['month'] = pd.DatetimeIndex(df['Datum']).month
df['day'] = pd.DatetimeIndex(df['Datum']).day
df['hour'] = pd.DatetimeIndex(df['Datum']).hour
df = df.rename(columns={"Aantal fietsers": "nr_cyclists"})

df = df.drop(columns=['Datum', 'Tijd', 'Meetpunt surrogate key', 'Meetpunt locatie','Aantal fietsers van','Aantal fietsers naar','Meetpunt code'])
df.head()
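If "Datum" only holds the date and "Tijd" the time of day, the hour has to come from both columns together. Here is a minimal sketch on made-up rows, assuming ISO-style strings in both columns. The real file's formats may differ, so check `df.dtypes` and a few rows first.

```python
import pandas as pd

# Toy rows: the column names match the file, the values and formats are assumptions.
toy = pd.DataFrame({
    "Datum": ["2023-01-15", "2023-01-15", "2023-02-01"],
    "Tijd": ["08:00:00", "17:00:00", "09:00:00"],
    "Aantal fietsers": [12, 30, 7],
})

# Combine date and time into one timestamp, then split it into numeric parts.
stamp = pd.to_datetime(toy["Datum"] + " " + toy["Tijd"])
toy["year"] = stamp.dt.year
toy["month"] = stamp.dt.month
toy["day"] = stamp.dt.day
toy["hour"] = stamp.dt.hour

print(toy[["year", "month", "day", "hour"]])
```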

Finally, group this data so you're working with the monthly totals, not the hourly data. Otherwise you'll be predicting way too many zeros.

In [None]:
# Sum the cyclist counts per (year, month); select the column explicitly instead of passing it to sum()
df_monthly = df.groupby(["year", "month"], as_index=False)["nr_cyclists"].sum()
df_monthly.head()
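To see why the grouping helps, here is a small sketch on made-up numbers (not the real counter data). It compares the share of zero counts before and after summing per month.

```python
import pandas as pd

# Toy hourly counts for two months; all values are invented for illustration.
toy = pd.DataFrame({
    "year": [2023] * 8,
    "month": [1, 1, 1, 1, 2, 2, 2, 2],
    "hour": [2, 3, 8, 17, 2, 3, 8, 17],
    "nr_cyclists": [0, 0, 5, 3, 0, 0, 7, 2],
})

# Share of rows with zero cyclists in the hourly data.
hourly_zero_share = (toy["nr_cyclists"] == 0).mean()

# Same grouping as above: one row per (year, month) with the summed count.
toy_monthly = toy.groupby(["year", "month"], as_index=False)["nr_cyclists"].sum()
monthly_zero_share = (toy_monthly["nr_cyclists"] == 0).mean()

print(f"zeros in hourly data:  {hourly_zero_share:.0%}")
print(f"zeros in monthly data: {monthly_zero_share:.0%}")
```

In this toy set half of the hourly rows are zero (the night hours), while neither monthly total is.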

Next up is PyCaret! Some of these steps will take a while. If you [have better things to do](https://www.youtube.com/watch?v=nLJ8ILIE780), save the last variable you made (the setup or the best model) in a [pickle](https://www.geeksforgeeks.org/how-to-use-pickle-to-save-and-load-variables-in-python/) file.

First, set up the experiment with the setup function.

In [None]:
# DELETE
from pycaret.time_series import *

s = setup(df_monthly, fh = 3, fold = 5, session_id = 123, target="nr_cyclists")
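What do "fh = 3" and "fold = 5" mean for a series of roughly 40 months? The sketch below is a simplified illustration of expanding-window cross-validation, which is our understanding of PyCaret's default strategy. The helper `expanding_window_folds` is hypothetical and not PyCaret code, so the exact split points PyCaret picks may differ.

```python
def expanding_window_folds(n_obs, fh, fold):
    """Return (train_range, test_range) pairs for a simple expanding-window split.

    Assumption: the last `fh` observations are held out as the final test set,
    and each fold validates on the `fh` points right after its training window.
    """
    n_train = n_obs - fh                  # rows left for cross-validation
    first_train_end = n_train - fold * fh
    if first_train_end <= 0:
        raise ValueError("Not enough observations for this fh/fold combination.")
    folds = []
    for i in range(fold):
        train_end = first_train_end + i * fh
        folds.append((range(0, train_end), range(train_end, train_end + fh)))
    return folds


folds = expanding_window_folds(n_obs=40, fh=3, fold=5)
for train, test in folds:
    print(f"train on rows 0..{train[-1]}, validate on rows {test[0]}..{test[-1]}")
```

Every fold trains on a slightly longer history and validates on the next three months, which shows how little data each model actually sees here.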

Next up, compare the different models. We're predicting based on the monthly data, giving us 40 datapoints to train and test on. This is not nearly enough, but as a proof of concept (POC) it'll do.

Also, pass "n_select=5" as a parameter to compare_models, so it returns the five best models as a list instead of only the best one.

In [None]:
best = compare_models(sort = 'MAE', n_select=5)
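The models are ranked on MAE, the mean absolute error: the average distance between forecast and actual value, in cyclists. A minimal sketch with made-up numbers shows the calculation (the helper name is ours, not PyCaret's):

```python
def mean_absolute_error(actual, forecast):
    """Average of the absolute differences between actual and forecast values."""
    if len(actual) != len(forecast):
        raise ValueError("actual and forecast must have the same length")
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


# Invented monthly totals for a three-month horizon.
actual = [100, 120, 90]
forecast = [110, 115, 100]

mae = mean_absolute_error(actual, forecast)
print(f"MAE: {mae:.2f} cyclists")
```

A lower MAE means the forecast is closer to the measured counts on average.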

In [None]:
import pickle 

with open('best_model.pkl', 'wb') as file: 
    pickle.dump(best, file) 

In [None]:
import pickle

with open('best_model.pkl', 'rb') as file:
    # Call the load method to deserialize the stored models
    best_2 = pickle.load(file)

print(best_2)

Predict 6 months into the future!

In [None]:
# DELETE
plot_model(best[0], plot = 'forecast', data_kwargs = {'fh' : 6})

Now compare this model to the other four you stored. Some provide a pretty prediction, others are plain bad. Still, it's a good start.

In [None]:
# DELETE
plot_model(best[1], plot = 'forecast', data_kwargs = {'fh' : 6})