The goal of this notebook is to forecast the usage of Paris Open Wifi, more specifically the hourly data usage.
The data was gathered using ODSQL queries against the Paris Open Data website's API.
Packages/models: Exponential Smoothing (fitted with statsmodels), SARIMA, Facebook Prophet
The challenge of this forecast is the presence of an abrupt change in the time series due to the COVID-19 lockdown. The result is a time series composed of two very distinct periods, the second showing lower data consumption and less variation from day to day (weaker weekly seasonality).
We find that Holt-Winters exponential smoothing performs best in terms of RMSE:
See the attached notebook for additional models.
Exponential smoothing performs slightly better than the other models (better RMSE overall). However, the differences in performance may simply reflect random variation in the time series (noise).
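One way to check whether an RMSE gap is more than noise is to score each model on several consecutive hold-out windows instead of a single one. The following is a minimal sketch, not the notebook's evaluation code: the function names are hypothetical, the series is synthetic, and a seasonal-naive forecaster (repeat the last 24 hours) stands in for the actual models.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))

def seasonal_naive(history, horizon, period=24):
    """Stand-in baseline (assumption): repeat the last full daily cycle."""
    last_cycle = history[-period:]
    return [last_cycle[i % period] for i in range(horizon)]

def rolling_rmse(series, forecaster, horizon=24, n_windows=3):
    """RMSE of `forecaster` on the last `n_windows` consecutive hold-out windows."""
    scores = []
    for k in range(n_windows, 0, -1):
        split = len(series) - k * horizon
        train = series[:split]
        test = series[split:split + horizon]
        scores.append(rmse(test, forecaster(train, horizon)))
    return scores

# Synthetic hourly series (one week) with a pure daily cycle, for illustration only.
series = [10 + 5 * math.sin(2 * math.pi * h / 24) for h in range(24 * 7)]
scores = rolling_rmse(series, seasonal_naive)
print(scores)
```

If the spread of a model's scores across windows is as large as the gap between two models, the ranking between them is probably not meaningful.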
Keep in mind that the series is quite noisy, with no clear daily pattern during opening hours. Our approach is very naive in that regard: it applies only a simple daily pattern and a trend. Wifi usage depends on external variables that are not taken into account (current weather, bank holidays, the number of tourists in Paris that day, etc.). Using external data, or having a few years of consistent data, would help.
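To make "a simple daily pattern and a trend" concrete, here is a hand-written sketch of additive Holt-Winters smoothing with a 24-hour season. It is not the notebook's statsmodels fit: the function name, the initialisation, and the fixed smoothing parameters (alpha, beta, gamma) are illustrative assumptions, whereas a library fit would normally estimate them from the data.

```python
import math

def holt_winters_additive(series, period=24, alpha=0.3, beta=0.05,
                          gamma=0.2, horizon=24):
    """Additive Holt-Winters forecast: level + trend + seasonal term."""
    if len(series) < 2 * period:
        raise ValueError("need at least two full seasonal periods")

    first, second = series[:period], series[period:2 * period]
    # Simple initialisation (assumption): mean of the first day as the level,
    # average hourly change between the first two days as the trend.
    level = sum(first) / period
    trend = (sum(second) - sum(first)) / period ** 2
    seasonal = [x - level for x in first]

    for t in range(period, len(series)):
        s = seasonal[t % period]
        prev_level = level
        # Update level, trend, and the seasonal term for this hour of day.
        level = alpha * (series[t] - s) + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
        seasonal[t % period] = gamma * (series[t] - level) + (1 - gamma) * s

    n = len(series)
    # The forecast h steps ahead reuses the latest seasonal term for that hour.
    return [level + h * trend + seasonal[(n + h - 1) % period]
            for h in range(1, horizon + 1)]

# Synthetic hourly series with a pure daily cycle, for illustration only.
hourly = [10 + 5 * math.sin(2 * math.pi * h / 24) for h in range(24 * 7)]
forecast = holt_winters_additive(hourly)
print(forecast[:3])
```

Because the only seasonal component has a 24-hour period, this model has no way to represent weekly or yearly effects.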
Our approach is therefore limited and incomplete, even though it produces decent short-term forecasts. Its naivety, the very short time window on which all the models were trained, and the noisiness of the series are all severe weaknesses. Because they use only daily seasonality, the models are unsuitable for long-term predictions by design.