#### BASELINE SOLUTION

A simple baseline is the naive (seasonal naive) method: the forecast for each hour of the next week is simply the usage observed at the same hour of the previous week.

First, I'll load the data and preprocess it into hourly bus usage per municipality, aggregating the two measurements recorded in each hour by taking their maximum:

In [1]:
import pandas as pd

# Load the data
df = pd.read_csv("municipality_bus_utilization.csv", parse_dates=["timestamp"])

# Aggregate the two measurements for an hour and take the max value
df_hourly = df.groupby(["municipality_id", pd.Grouper(key="timestamp", freq="H")])["usage"].max().reset_index()
df_hourly.set_index("timestamp", inplace=True)
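
To make the aggregation step concrete, here is a minimal sketch on a hypothetical toy frame (the timestamps and usage values are invented for illustration, not taken from the CSV). It uses the same `groupby` plus `pd.Grouper` pattern; I write the frequency as `"60min"` only because that alias is accepted across pandas versions.

```python
import pandas as pd

# Hypothetical toy data: two readings per hour for two municipalities
toy = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2017-06-04 08:05", "2017-06-04 08:35",
        "2017-06-04 09:05", "2017-06-04 09:35",
        "2017-06-04 08:05", "2017-06-04 08:35",
        "2017-06-04 09:05", "2017-06-04 09:35",
    ]),
    "municipality_id": [0, 0, 0, 0, 1, 1, 1, 1],
    "usage": [10, 14, 20, 18, 3, 2, 5, 7],
})

# One row per (municipality, hour), keeping the larger of the readings in that hour
hourly = (
    toy.groupby(["municipality_id", pd.Grouper(key="timestamp", freq="60min")])["usage"]
    .max()
    .reset_index()
)
print(hourly)
```

Each municipality ends up with one row per hour, and the rows come back sorted by `municipality_id` first and `timestamp` second.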

Next, I'll split the data into training and test sets, holding out the last 336 rows (two weeks of hourly rows, 2 × 7 × 24) as the test set:

In [2]:
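# Split the data into training and test sets.
# iloc slices by row position over the whole stacked frame (all municipalities),
# so the test set is the final 336 rows of df_hourly.
train = df_hourly.iloc[:-336]
test = df_hourly.iloc[-336:]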
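
Because the hourly frame stacks every municipality, a positional slice does not give each municipality its own final two weeks. A possible alternative, sketched here on a hypothetical stand-in frame (`demo`, with random usage values), is to cut on the timestamp index. The names `demo`, `cutoff`, `train_demo` and `test_demo` are my own, and the sketch assumes complete hourly data.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df_hourly: 3 municipalities, 4 weeks of hourly rows each
hours = pd.date_range("2017-06-05", periods=4 * 168, freq="60min")
rng = np.random.default_rng(0)
demo = pd.concat([
    pd.DataFrame({"municipality_id": m, "usage": rng.integers(0, 500, len(hours))}, index=hours)
    for m in range(3)
])
demo.index.name = "timestamp"

# Cut on the calendar instead of on row position, so every municipality
# contributes its own last two weeks to the test set
cutoff = demo.index.max() - pd.Timedelta(weeks=2)
train_demo = demo[demo.index <= cutoff]
test_demo = demo[demo.index > cutoff]

print(test_demo.groupby("municipality_id").size())
```

With this split, every training timestamp precedes every test timestamp, which keeps future information out of the training set.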

Now I can make the forecast with the baseline method, using the previous week's usage as the forecast for the next week:

In [3]:
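# Make the forecast using the previous week's usage.
# shift(168) pairs each test row with the row 168 positions (7 × 24) earlier;
# the first 168 rows have no earlier row in test and become NaN.
forecast = test.shift(168)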
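
Shifting by a row count only lines up with "one week earlier" when no hours are missing and the rows belong to a single municipality. As a hedged sketch of the same seasonal naive idea, the forecast can instead be paired by timestamp: each observation is relabelled with the time it is used to predict (seven days later) and joined back per municipality. The frame `demo` and the names `lagged` and `paired` are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df_hourly: 2 municipalities, 3 weeks of hourly rows each
hours = pd.date_range("2017-06-05", periods=3 * 168, freq="60min")
rng = np.random.default_rng(1)
demo = pd.concat([
    pd.DataFrame({"municipality_id": m, "usage": rng.integers(0, 500, len(hours))}, index=hours)
    for m in range(2)
])
demo.index.name = "timestamp"

# Seasonal naive: the forecast for time t is the observation at t - 7 days.
# Move every observation forward by 7 days so it carries the timestamp it predicts.
lagged = demo.reset_index()
lagged["timestamp"] = lagged["timestamp"] + pd.Timedelta(days=7)
lagged = lagged.rename(columns={"usage": "forecast"})

# Join actuals and forecasts on (municipality_id, timestamp); hours without a
# counterpart one week earlier simply drop out of the inner join
paired = demo.reset_index().merge(lagged, on=["municipality_id", "timestamp"], how="inner")
print(paired.head())
```

The resulting `paired` frame holds the actual `usage` and its `forecast` side by side, which is a convenient shape for computing an error metric.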

Finally, I can calculate the error between the forecast and the true values for the test set:

In [4]:
# Squared error per row and column; rows whose forecast is NaN stay NaN
error = (forecast - test)**2
# mean() works column by column and skips NaN, so rmse has one entry per column
rmse = error.mean()**0.5

print("RMSE:", rmse)

RMSE: municipality_id      0.000000
usage              221.268031
dtype: float64


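The `usage` entry is the root mean squared error (RMSE) between the forecast and the true values of the test set; the lower it is, the better the forecast. The `municipality_id` entry appears only because the error is computed over every column, and it can be ignored.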
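
To restrict the metric to the `usage` column and to see how the error varies across municipalities, one option is to compute the RMSE both overall and per `municipality_id`. The sketch below uses a small hypothetical frame of paired actuals and forecasts (the numbers are invented); `paired`, `rmse_overall` and `rmse_by_municipality` are my own names.

```python
import numpy as np
import pandas as pd

# Hypothetical paired actuals and forecasts for two municipalities
paired = pd.DataFrame({
    "municipality_id": [0, 0, 0, 1, 1, 1],
    "usage": [100, 120, 140, 10, 20, 30],
    "forecast": [90, 120, 150, 13, 16, 30],
})

# Squared error on the usage column only
sq_err = (paired["forecast"] - paired["usage"]) ** 2

# RMSE over all rows, and RMSE within each municipality
rmse_overall = np.sqrt(sq_err.mean())
rmse_by_municipality = np.sqrt(sq_err.groupby(paired["municipality_id"]).mean())

print("Overall RMSE:", rmse_overall)
print(rmse_by_municipality)
```

A per-municipality breakdown is useful here because usage levels can differ a lot between municipalities, and a single pooled RMSE is dominated by the busiest ones.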