# Predict tomorrow's meals

### The "Why" of this exercise

Technology is revolutionizing every sector, including the restaurant business.

Although restaurateurs develop, over the years, an excellent intuition for which days and months are busy, this problem can also be tackled with the modeling tools seen in this module. The benefits are several: reducing ingredient purchases, identifying days when opening the restaurant is not profitable, targeting promotions to optimize margins...

This prediction of future activity is the subject of this practical work.

### The "What"

Tiller Systems is a startup launched in 2012 that set out to simplify the lives of restaurateurs with a single iPhone application. This application handles, in a single system:
- order taking in front of the customer,
- sending the order to the kitchen,
- payment (credit card and issuing the receipt),
- ... and the integration of the meal into the restaurant's accounts.

We can therefore see that Tiller's data is exactly what is needed to address the problem of predicting restaurant attendance.

Here we are going to use data shared by Tiller for several restaurants, covering 2016 to 2020. We will focus in particular on predicting the **daily number of covers**.

### The "How"

The time series we are dealing with here are quite specific, because restaurant sales are affected by several seasonalities:
- depending on the time of year (some restaurants are busier during school holidays, others are not)
- depending on the day of the week (weekends behave differently from weekdays)
- depending on public holidays

The models we saw last time cannot handle this kind of overlapping seasonality. This is why we are going to use more complex tools here: the forecasting algorithms Silverkite (LinkedIn) and Prophet (Facebook).

## 0. [REQUIRED] Creating a new environment

We will use the [Greykite](https://pypi.org/project/greykite/) library during this lab.

This library requires a new environment because it depends on specific versions of several other libraries.

Create a new Python 3.7 environment and install greykite.
If you encounter difficulties, depending on your operating system, follow the instructions [at this address](https://linkedin.github.io/greykite/installation). On Mac, running the `brew install cmake` command in a terminal before installing greykite may suffice as an alternative to the instructions in the previous link.

Activate the new environment for the rest of this lab.

If an error still occurs for some reason, don't get stuck: use [Google Colaboratory](https://colab.research.google.com/?utm_source=scs-index) to complete this lab.

## 1. Exploratory Analysis

The dataset we propose to study contains the daily number of covers for 13 restaurants in Île-de-France that used the Tiller system, for dates ranging from February 12, 2017 to October 20, 2020.

<b>1.A)</b> Load the data from the [`tiller_restaurants.csv`](https://drive.google.com/file/d/1fD382tB7TLay5vXAF_ymEfC9ERDxmA-W/view?usp=sharing) file into a DataFrame named `df`.

<b>1.B)</b> The `date` column must be converted to the `datetime.date` format.

Perform the transformations needed for this conversion.

<details>
<summary><i>Click for a hint</i></summary>

    
Hint 1: to convert character strings to dates, use the `pd.to_datetime` function

Hint 2: to convert a `datetime` (date and time information) to a `date` (date information only), you can use the `.dt.date` accessor of a `pd.Series`

</details>
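The two hints can be sketched on a toy frame as follows. This is only an illustration: the column values are made up, and the real file may need extra arguments to `pd.to_datetime` (for instance an explicit `format`), which is an assumption to check against your data.

```python
import datetime

import pandas as pd

# Toy frame standing in for `df`; values are invented for the example.
toy = pd.DataFrame({
    "date": ["2017-02-12", "2017-02-13", "2017-02-14"],
    "store_425": [31, 28, 40],
})

# Step 1: parse the strings into pandas timestamps (datetime64 dtype).
parsed = pd.to_datetime(toy["date"])

# Step 2: keep only the calendar date; each cell becomes a datetime.date object.
toy["date"] = parsed.dt.date

first_date = toy["date"].iloc[0]
print(type(first_date), first_date)
```

Note that after step 2 the column has the generic `object` dtype, so the `.dt` accessor is no longer available on it.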

<b>1.C)</b> Does this dataset contain one row per day? Check that it is complete: in other words, check that there are no missing dates.

<b>1.D)</b> Display all time series on the same chart, against time. List all the patterns you see emerging that relate to:

- atypical periods

- seasonality

- remarkable points (outliers)

<b>1.E)</b> Remove the outlier from `store_1070` by replacing it with the value from the previous week (this outlier is considered a bug in the data).

<b>1.F)</b> How many full years do you have in this dataset? Is that enough to make predictions?
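The completeness check and the "value of the previous week" replacement described above can be sketched on synthetic data. Everything here is hypothetical (the `store_x` column, the dates, the spike value); it shows one possible approach with `pd.date_range`, not the expected solution.

```python
import pandas as pd

# Toy daily series with one date removed and one absurd spike.
dates = pd.date_range("2019-01-01", "2019-01-21", freq="D").delete(4)  # drops 2019-01-05
toy = pd.DataFrame({"date": dates, "store_x": 50})
toy.loc[toy["date"] == "2019-01-16", "store_x"] = 5000  # simulated data bug

# Completeness check: compare against the full calendar between first and last date.
full_range = pd.date_range(toy["date"].min(), toy["date"].max(), freq="D")
missing = full_range.difference(pd.DatetimeIndex(toy["date"]))
print("missing dates:", list(missing.date))

# Outlier fix: overwrite the bad day with the value observed 7 days earlier.
series = toy.set_index("date")["store_x"].copy()
bad_day = pd.Timestamp("2019-01-16")
series.loc[bad_day] = series.loc[bad_day - pd.Timedelta(days=7)]
print(series.loc[bad_day])
```

Using a 7-day offset rather than the previous day keeps the day-of-week effect intact, which matters for restaurant data.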

## 2. Modeling with Silverkite and Prophet: the `store_425` series, a simple case

<b>2.A)</b> We now want to forecast these time series.

To start, we propose to use **Silverkite**, LinkedIn's algorithm, to **predict the `store_425` time series over one year, starting from March 3, 2019**.

Modify the following code so that the call to `ForecastConfig` allows such a forecast:

In [None]:
from greykite.framework.templates.autogen.forecast_config import ForecastConfig, MetadataParam

forecast_config = ForecastConfig(
    model_template = ~~~~ , # To be completed
    forecast_horizon = ~~~~ , # To be completed
    coverage = 0.95,
    metadata_param = MetadataParam(
        time_col = ~~~~ , # To be completed
        value_col = ~~~~ , # To be completed
        freq = ~~~~ , # To be completed
        train_end_date = ~~~~ , # To be completed
    )
)

<b>2.B)</b> Model the time series with a call to `forecaster.run_forecast_config()`, then display its prediction on the "test" sample.

<b>2.C)</b> After reading this graph, are you satisfied with the quality of the prediction? Do you identify any obvious errors in the model?

<b>2.D)</b> What is the SMAPE of this prediction? Does that sound good to you?
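As a reminder of what the metric measures, here is a minimal NumPy sketch of SMAPE (symmetric mean absolute percentage error). Two conventions exist, differing by a factor of 2. My understanding is that Greykite reports the variant ranging from 0% to 100%, but that is worth confirming in its documentation. The `scale` parameter and the handling of days where both values are zero are choices made for this example only.

```python
import numpy as np

def smape(y_true, y_pred, scale=1.0):
    """SMAPE in percent: 100 * scale * mean(|y - yhat| / (|y| + |yhat|)).

    scale=1.0 gives a 0-100% range; scale=2.0 gives the 0-200% convention.
    Days where actual and forecast are both 0 contribute 0 (an assumption).
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    denom = np.abs(y_true) + np.abs(y_pred)
    ratio = np.divide(
        np.abs(y_true - y_pred), denom,
        out=np.zeros_like(denom), where=denom > 0,
    )
    return 100.0 * scale * ratio.mean()

# Hypothetical covers for three days: actual vs. forecast.
actual = [100, 50, 0]
forecast = [80, 50, 0]
print(smape(actual, forecast), smape(actual, forecast, scale=2.0))
```

Because the error is normalised by the size of both the actual and the forecast value, a miss of 20 covers weighs more on a quiet day than on a busy one.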

<b>2.E)</b> We now propose to specifically model public holidays in France to improve the performance of the model. Integrate this particular configuration into your model and calculate the new SMAPE.

Do you see an improvement?

<b>2.F)</b> We now want to test Prophet, Facebook's algorithm, on this problem. Adapt the preceding code to perform the prediction with Prophet.

<b>2.G)</b> Compare the performance of these two algorithms with SMAPE. What can you say?

<b>2.H)</b> To finish this first application, you are asked to re-train the algorithm of your choice using all the data until October 10, 2019.

Display the one-year prediction, therefore over the period extending from `2019-10-10` to `2020-10-10`. Did your algorithm anticipate COVID?

## 3. Modeling holidays with the `store_953` series and its breaks

<b>3.A)</b> We now address the case of the time series `store_953`.

Predict this series with `Silverkite` from 2019-10-10. Can you identify dynamics in the "train" sample that the model fails to capture?

<details>
<summary><i>Click for a hint</i></summary>
Hint 1: You can create the list of holidays in a country with the following code:

```
# According to the Greykite documentation, we can extract the list
# of all holidays in a specific country using the following method:
from greykite.common.features.timeseries_features import get_available_holidays_across_countries

# Select your countries
holiday_lookup_countries = ["France"]
# List the holidays
france_holidays = get_available_holidays_across_countries(
    countries=holiday_lookup_countries,
    year_start=2019,
    year_end=2020)
```

Hint 2: In the `ForecastConfig`, you can provide a *model_components_param* attribute describing certain [events](https://linkedin.github.io/docs/0.1.0/html/pages/model_components/0400_events.html?highlight=holidays_to_model_separetely) where you expect a deviation from the usual growth and seasonality pattern. You can define a *custom_silverkite* *model_components_param* like this:
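```
from greykite.framework.templates.autogen.forecast_config import ModelComponentsParam

# Model French public holidays, plus the day before and the day after each one
custom_silverkite = ModelComponentsParam(
    events = {
        "holiday_lookup_countries": ["France"],
        "holiday_pre_num_days": 1,
        "holiday_post_num_days": 1,
        "holidays_to_model_separately": france_holidays
    },
)
```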

</details>

<b>3.B)</b> We now learn that the empty periods in August actually correspond to the restaurant's vacation closures.

We want to help the model by giving it these remarkable dates, under the assumption that they are predictable (the restaurateur plans his vacations in advance).

Go back to your DataFrame containing all the data (`df`) and create a new column `closure_953`, which contains `1` when there are fewer than 10 orders per day, and `0` otherwise.

You can use the list of tuples below, which summarizes the start and end dates of this restaurateur's vacations.

In [None]:
closures = [ ('2017-08-06', '2017-08-20'),
               ('2019-08-02', '2019-08-25'),
               ('2020-08-02', '2020-08-23'),
             ]
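One possible way to turn such a list of `(start, end)` tuples into a 0/1 indicator column is sketched below on a toy calendar. It assumes the date column has the `datetime64` dtype (a column of `datetime.date` objects would first need `pd.to_datetime`) and treats both bounds as inclusive, which is an assumption about how the tuples should be read.

```python
import pandas as pd

closures = [("2017-08-06", "2017-08-20"),
            ("2019-08-02", "2019-08-25"),
            ("2020-08-02", "2020-08-23")]

# Toy calendar standing in for `df`; only the date column matters here.
toy = pd.DataFrame({"date": pd.date_range("2019-07-30", "2019-08-28", freq="D")})

# Start from "open every day", then flag each closure window.
toy["closure_953"] = 0
for start, end in closures:
    in_closure = toy["date"].between(pd.Timestamp(start), pd.Timestamp(end))
    toy.loc[in_closure, "closure_953"] = 1

n_closed_days = int(toy["closure_953"].sum())
print(n_closed_days)
```

The threshold-based definition given in the text would instead be a direct comparison on the `store_953` column; the two approaches can be cross-checked against each other.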

<b>3.C)</b> We now have the idea of modeling the period of the first lockdown as a vacation period. Add this period to the external regressors and observe the new prediction. What do you think?

<details>
<summary><i>Click for a hint</i></summary>

Hint:
In the `ForecastConfig`, you can declare [*regressors*](https://linkedin.github.io/greykite/docs/0.1.0/html/pages/model_components/0700_regressors.html?highlight=add_regressor_dict) to improve the forecast by taking additional features into account. Here is an example:

```
custom_silverkite = ModelComponentsParam(
    events = {
        "holiday_lookup_countries": ["France"],
        "holiday_pre_num_days": 1,
        "holiday_post_num_days": 1,
        "holidays_to_model_separately": france_holidays
    },
    regressors = {"regressor_cols":
                    [
                      "closure_953",
                    ]
                }
)
```


</details>

In [None]:
closures += [ ('2020-03-17', '2020-05-12') ]

## 4. The `store_1410` series, a pathological case

<b>4.A)</b> We are now interested in the restaurant `store_1410`. The goal is to predict its activity over a period of one year from March 3, 2019. Do you think this task is easy?

<b>4.B)</b> Use the `Silverkite` algorithm to predict from March 3, 2019. What do you think of the results?

<b>4.C)</b> An advanced feature of `Silverkite` is the ability to force ["change points"](https://linkedin.github.io/greykite/docs/0.1.0/html/pages/model_components/0500_changepoints.html?highlight=method%20custom), which are points in time where the trend of the time series changes.

Use the code below to force such a change at the very beginning of February 2019, and re-run your prediction.

What do you think of the result?

In [None]:
custom_silverkite = ModelComponentsParam(
    events = {
        "holiday_lookup_countries": ["France"],
        "holiday_pre_num_days": 1,
        "holiday_post_num_days": 1,
        "holidays_to_model_separately": france_holidays
    },
    changepoints= {"changepoints_dict":
                   {
                      "method": "custom",
                      "dates": ["2019-2-1"] ### We force an inflection point on 1 February 2019
                   },
    }
)

## 5. To go further

Intuitively, we imagine that the weather has a significant impact on consumer behavior.

We will see if weather information can help with the prediction!

<b>5.A)</b> To see if this information helps in the prediction, **we place ourselves in the case where we have reliable weather forecasts a week in advance**.

Run the code below to establish a baseline of the performance that can be expected when predicting the restaurant `store_545` over a one-week horizon.

What is the performance on the `backtest`? And on the `forecast`?

<b>5.B)</b> Now download daily weather information for the Paris region from this site: https://www.historique-meteo.net/site/export.php?ville_id=188

Load the `.csv` file in the form of a DataFrame which you will name `meteo`.

If you decided to use Google Colaboratory, you can access your Google Drive folder by running the following lines:

```
#from google.colab import drive
#import os
#drive.mount('/content/drive')
```


Please modify the following line to match the file path of the CSV in your drive:
```
#os.chdir('/content/drive/My Drive/Colab Notebooks/SC')
```


<b>5.C)</b> Add a column named `day_of_rain` to your `meteo` DataFrame, which is `True` on days when `PRECIP_TOTAL_DAY_MM > 3`.

How many rainy days are there per year in the Paris region, on average?
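The thresholding and the per-year average can be sketched as follows on invented values. The `PRECIP_TOTAL_DAY_MM` column comes from the text above, while the `DATE` column name is an assumption about the export and should be adapted to the actual file.

```python
import pandas as pd

# Toy weather table; the `DATE` column name and all values are hypothetical.
meteo_toy = pd.DataFrame({
    "DATE": pd.to_datetime(["2018-01-01", "2018-06-10", "2018-11-02",
                            "2019-03-15", "2019-03-16", "2019-09-01"]),
    "PRECIP_TOTAL_DAY_MM": [0.0, 5.2, 3.0, 12.4, 3.1, 0.4],
})

# A day counts as rainy when strictly more than 3 mm fell.
meteo_toy["day_of_rain"] = meteo_toy["PRECIP_TOTAL_DAY_MM"] > 3

# Count rainy days per calendar year, then average over the years.
rainy_per_year = meteo_toy.groupby(meteo_toy["DATE"].dt.year)["day_of_rain"].sum()
average_rainy_days = rainy_per_year.mean()
print(rainy_per_year.to_dict(), average_rainy_days)
```

On real data, a year that is only partially covered by the export would pull the average down, so it may be worth restricting the computation to full years.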

<b>5.D)</b> Feed this information into your main DataFrame, which should now have an additional `day_of_rain` column that describes whether or not the day was actually rainy.
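A left join on the date is one way to bring the weather flag into the main table. The sketch below uses toy frames. The `DATE` column name on the weather side is an assumption, and both date columns are taken to share the same dtype (`datetime64` here), otherwise the join would silently match nothing.

```python
import pandas as pd

# Toy stand-ins for the main DataFrame and the weather DataFrame.
sales = pd.DataFrame({
    "date": pd.to_datetime(["2019-03-01", "2019-03-02", "2019-03-03"]),
    "store_545": [42, 57, 61],
})
meteo_toy = pd.DataFrame({
    "DATE": pd.to_datetime(["2019-03-01", "2019-03-03"]),
    "day_of_rain": [True, False],
})

# Left join keeps every sales day; validate guards against duplicated dates.
merged = sales.merge(
    meteo_toy.rename(columns={"DATE": "date"}),
    on="date", how="left", validate="one_to_one",
)

# Days without a weather record come back as NaN: count them, then decide
# explicitly how to treat them (here: not rainy, which is a modelling choice).
n_unmatched = int(merged["day_of_rain"].isna().sum())
merged["day_of_rain"] = merged["day_of_rain"].eq(True)
print(n_unmatched, merged["day_of_rain"].tolist())
```

Checking `n_unmatched` before filling is useful because a large count usually signals a dtype or date-range mismatch rather than genuinely missing weather.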

<b>5.E)</b> Modify the `custom_prophet` variable to include the weather information, and retrain. Do you see an improvement in prediction performance? Conclude on the impact of including this weather information in the model.

<details>
<summary><i>Click for a hint</i></summary>

Hint:
In the `ForecastConfig`, you can declare [*regressors*](https://linkedin.github.io/greykite/docs/0.1.0/html/pages/model_components/0700_regressors.html?highlight=add_regressor_dict) to improve the forecast by taking additional features into account. Here is an example:

```
custom_prophet = ModelComponentsParam(
    events = {
        "holiday_lookup_countries": ["France"],
        "holiday_pre_num_days": [1],
        "holiday_post_num_days": [1],
    },
    regressors = {"add_regressor_dict":
                  {"day_of_rain": {"mode": "additive"}
                  }
    }
)
```

</details>