#### Engineering

I'll start by installing the creme library:

```bash
pip install creme
```

I'm importing the packages that I'm going to need:

```python
import copy
import datetime
import random
import tqdm
```

```python
from creme import compose
from creme import feature_extraction
from creme import neighbors
from creme import metrics
from creme import optim
from creme import preprocessing
from creme import stats
from creme import stream
```

I use this first function to parse the date and extract the day of the week.

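```python
def extract_date(x):
    """Extract features from the date."""
    # Imported locally so that the function stays self-contained
    # when it is serialized with dill and loaded by the API.
    import datetime
    # Dates arrive as datetime objects from the training CSV (parse_dates)
    # but as strings from the API, so parse them only when needed.
    if not isinstance(x['date'], datetime.datetime):
        x['date'] = datetime.datetime.strptime(x['date'], '%Y-%m-%d')
    # Day of the week: Monday is 0, Sunday is 6.
    x['wday'] = x['date'].weekday()
    return x
```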

``get_metadata`` extracts the category, department and item identifiers from the ``id`` field.

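```python
def get_metadata(x):
    """Split the id into category, department and item identifiers."""
    # e.g. 'HOBBIES_1_001_CA_1' -> ['HOBBIES', '1', '001', 'CA', '1']
    key = x['id'].split('_')
    x['cat_id'] = key[0]
    x['dept_id'] = f'{x["cat_id"]}_{key[1]}'
    x['item_id'] = f'{x["dept_id"]}_{key[2]}'
    return x
```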
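The following is a quick check of what the two functions produce on a single row shaped like the ones sent to the API later on. It restates both functions so that the snippet runs on its own. In this version, ``item_id`` is built from the department and the item number. The example row is the one used in the prediction request at the end of this section.

```python
import datetime


def extract_date(x):
    """Parse the date (string or datetime) and add the day of the week."""
    if not isinstance(x['date'], datetime.datetime):
        x['date'] = datetime.datetime.strptime(x['date'], '%Y-%m-%d')
    # Monday is 0, Sunday is 6.
    x['wday'] = x['date'].weekday()
    return x


def get_metadata(x):
    """Split an M5-style id into category, department and item identifiers."""
    key = x['id'].split('_')
    x['cat_id'] = key[0]
    x['dept_id'] = f'{key[0]}_{key[1]}'
    x['item_id'] = f'{x["dept_id"]}_{key[2]}'
    return x


row = {'date': '2020-04-30', 'id': 'HOBBIES_1_001_CA_1'}
features = extract_date(get_metadata(dict(row)))
print(features['cat_id'], features['dept_id'], features['item_id'], features['wday'])
```

This prints `HOBBIES HOBBIES_1 HOBBIES_1_001 3`, because 30 April 2020 was a Thursday.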

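Below I define the feature extraction step. I use ``feature_extraction.TargetAgg`` to compute running statistics of the target variable of the stream. For each item, these are the mean, the variance and rolling means over the last 1, 3, 7, 15 and 30 observations. For each day of the week, they are rolling means over the same windows.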

```python
extract_features = compose.TransformerUnion(
    compose.Select('wday'),

    feature_extraction.TargetAgg(by=['item_id'], how=stats.Mean()),
    feature_extraction.TargetAgg(by=['item_id'], how=stats.Var()),

    feature_extraction.TargetAgg(by=['item_id'], how=stats.RollingMean(1)),
    feature_extraction.TargetAgg(by=['item_id'], how=stats.RollingMean(30)),
    feature_extraction.TargetAgg(by=['item_id'], how=stats.RollingMean(15)),
    feature_extraction.TargetAgg(by=['item_id'], how=stats.RollingMean(7)),
    feature_extraction.TargetAgg(by=['item_id'], how=stats.RollingMean(3)),

    feature_extraction.TargetAgg(by=['wday'], how=stats.RollingMean(30)),
    feature_extraction.TargetAgg(by=['wday'], how=stats.RollingMean(15)),
    feature_extraction.TargetAgg(by=['wday'], how=stats.RollingMean(7)),
    feature_extraction.TargetAgg(by=['wday'], how=stats.RollingMean(3)),
    feature_extraction.TargetAgg(by=['wday'], how=stats.RollingMean(1)),
)
```
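The idea behind a grouped target aggregate can be sketched in plain Python. The approach keeps one running statistic per group key, reads it to produce a feature, and then updates it once the true target is known. The sketch below illustrates that idea with a rolling mean. It is not creme's implementation of ``TargetAgg``. The class name ``GroupedRollingMean`` and the feature name it emits are hypothetical.

```python
import collections


class GroupedRollingMean:
    """Rolling mean of past targets, kept separately for each value of one key.

    Hypothetical stand-in for TargetAgg(by=[by], how=stats.RollingMean(window_size)).
    """

    def __init__(self, by, window_size):
        self.by = by
        # One bounded window of recent targets per group (e.g. per weekday).
        self.windows = collections.defaultdict(
            lambda: collections.deque(maxlen=window_size)
        )

    def transform_one(self, x):
        # Read the statistic before it is updated with this row's target,
        # so the feature only depends on earlier observations.
        window = self.windows[x[self.by]]
        mean = sum(window) / len(window) if window else 0.0
        return {f'target_rolling_mean_by_{self.by}': mean}

    def fit_one(self, x, y):
        self.windows[x[self.by]].append(y)
        return self


agg = GroupedRollingMean(by='wday', window_size=3)
rows = [({'wday': 0}, 4), ({'wday': 0}, 6), ({'wday': 1}, 1), ({'wday': 0}, 8), ({'wday': 0}, 10)]

for x, y in rows:
    print(x, agg.transform_one(x))  # feature computed from past targets only
    agg.fit_one(x, y)
```

For the `wday == 0` rows, the feature goes 0.0, 4.0, 5.0, 6.0. Once the window is full, the oldest target (4) drops out, so the next Monday would see the mean of 6, 8 and 10.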

Below, I define the global pipeline I want to deploy in production. The pipeline is composed of:

- Extraction of the product identifiers (category, department and item).

- Extraction of the day of the week from the date, $\in \{0, 1, \dots, 6\}$ (Monday is 0).

- Computation of the features.

- A standard scaler that centers the features and scales them to unit variance.

- The model itself, a ``neighbors.KNeighborsRegressor``.

```python
model = (
    compose.FuncTransformer(get_metadata) |
    compose.FuncTransformer(extract_date) |
    extract_features |
    preprocessing.StandardScaler() |
    # Search for neighbours among the last 30 samples and predict from the 15 closest.
    neighbors.KNeighborsRegressor(window_size=30, n_neighbors=15)
)
```
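The last two steps of the pipeline can also be sketched without creme. The first is a standard scaler that updates its running mean and variance one sample at a time, using Welford's algorithm. The second is a k-nearest-neighbours regressor that only remembers the last ``window_size`` samples and averages the targets of the ``n_neighbors`` closest ones. The class names are hypothetical, and the sketch is simplified: it uses Euclidean distance and an unweighted average. It is not the code behind ``preprocessing.StandardScaler`` or ``neighbors.KNeighborsRegressor``.

```python
import collections
import math


class OnlineStandardScaler:
    """Per-feature running mean and variance (Welford), updated one row at a time."""

    def __init__(self):
        self.n = collections.Counter()
        self.mean = collections.defaultdict(float)
        self.m2 = collections.defaultdict(float)

    def fit_one(self, x):
        for name, value in x.items():
            self.n[name] += 1
            delta = value - self.mean[name]
            self.mean[name] += delta / self.n[name]
            self.m2[name] += delta * (value - self.mean[name])
        return self

    def transform_one(self, x):
        out = {}
        for name, value in x.items():
            var = self.m2[name] / self.n[name] if self.n[name] else 0.0
            std = math.sqrt(var)
            # Features with zero variance so far are only centred.
            out[name] = (value - self.mean[name]) / std if std > 0 else value - self.mean[name]
        return out


class WindowedKNNRegressor:
    """Keeps the last `window_size` (x, y) pairs and averages the closest targets."""

    def __init__(self, window_size=30, n_neighbors=15):
        self.window = collections.deque(maxlen=window_size)
        self.n_neighbors = n_neighbors

    def fit_one(self, x, y):
        self.window.append((x, y))  # the oldest pair is dropped once the window is full
        return self

    def predict_one(self, x):
        if not self.window:
            return 0.0

        def distance(other):
            keys = set(x) | set(other)
            return math.sqrt(sum((x.get(k, 0.0) - other.get(k, 0.0)) ** 2 for k in keys))

        nearest = sorted(self.window, key=lambda pair: distance(pair[0]))[: self.n_neighbors]
        return sum(y for _, y in nearest) / len(nearest)


samples = [({'f': 1.0}, 10.0), ({'f': 2.0}, 20.0), ({'f': 3.0}, 30.0), ({'f': 10.0}, 100.0)]

scaler = OnlineStandardScaler()
knn = WindowedKNNRegressor(window_size=3, n_neighbors=2)
for x, y in samples:
    scaler.fit_one(x)
    knn.fit_one(scaler.transform_one(x), y)

print(scaler.mean['f'], knn.predict_one(scaler.transform_one({'f': 3.0})))
```

One consequence of learning online is that the samples stored in the window were scaled with the statistics available at the time they arrived. The scaling of older samples therefore drifts slightly from that of newer ones.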

I have chosen to create one model per product and per store. The code below collects every product/store pair from the submission file, creates a copy of the pipeline for each pair, and stores the copies in a dictionary.

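```python
# target_name is only set so that iter_csv splits off one of the F1..F28 columns.
X_y = stream.iter_csv('./data/sample_submission.csv', target_name='F8')

# Ids look like 'HOBBIES_1_001_CA_1_validation': keep the product and store part.
# A dict preserves insertion order and makes membership checks O(1).
model_names = {}
for x, y in tqdm.tqdm(X_y, position=0):
    item_id = '_'.join(x['id'].split('_')[:5])
    model_names[item_id] = None

# One independent copy of the pipeline per product/store pair.
dic_models = {item_id: copy.deepcopy(model) for item_id in tqdm.tqdm(model_names, position=0)}
```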
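Each pipeline carries its own internal state, such as running statistics and the kNN window. For that reason, every product/store pair needs an independent copy rather than a reference to one shared object. A toy sketch with a hypothetical ``RunningMean`` model shows the difference. The ids are made-up examples in the M5 format.

```python
import copy


class RunningMean:
    """Toy stateful model: predicts the mean of the targets it has seen."""

    def __init__(self):
        self.n = 0
        self.total = 0.0

    def fit_one(self, x, y):
        self.n += 1
        self.total += y
        return self

    def predict_one(self, x):
        return self.total / self.n if self.n else 0.0


template = RunningMean()
ids = ['HOBBIES_1_001_CA_1_validation', 'HOBBIES_1_002_CA_1_validation', 'HOBBIES_1_001_CA_1_evaluation']

# Model key = product and store, i.e. the first five underscore-separated parts.
keys = dict.fromkeys('_'.join(i.split('_')[:5]) for i in ids)

independent = {key: copy.deepcopy(template) for key in keys}  # one object per key
shared = {key: template for key in keys}                      # every key points to one object

independent['HOBBIES_1_001_CA_1'].fit_one({}, 5)
shared['HOBBIES_1_001_CA_1'].fit_one({}, 5)

print(shared['HOBBIES_1_002_CA_1'].predict_one({}))       # 5.0: polluted by another product
print(independent['HOBBIES_1_002_CA_1'].predict_one({}))  # 0.0: untouched
```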

I warm up all the models on a subset of the training set. For this pre-training, I selected the last two months of the training set and saved them in CSV format. I use creme's ``stream.iter_csv`` function to iterate over the training dataset. The loop below uses very little memory because the rows are loaded one at a time rather than all at once.

```python
random.seed(42)

params = dict(
    target_name='y',
    converters={
        'y': int,
        'id': str,
    },
    parse_dates={'date': '%Y-%m-%d'},
)

X_y = stream.iter_csv('./data/train.csv', **params)

bar = tqdm.tqdm(X_y, position=0)

# MAE over the last 300,000 predictions.
metric = metrics.Rolling(metrics.MAE(), 300000)

for i, (x, y) in enumerate(bar):

    item_id = '_'.join(x['id'].split('_')[:5])

    # Predict before learning, so the metric is computed on unseen data:
    y_pred = dic_models[item_id].predict_one(x)

    # Update the model:
    dic_models[item_id].fit_one(x=x, y=y)

    # Update the metric:
    metric = metric.update(y, y_pred)

    # Update the tqdm progress bar every 4000 iterations.
    if i % 4000 == 0:
        bar.set_description(f'MAE: {metric.get():.4f}')
```
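The loop above follows the progressive validation pattern. Each row is first used to test the model and only then to train it, so the MAE is always measured on data the model has not seen yet. The sketch below reproduces that pattern with the standard library only:

- ``csv.DictReader`` reads one row at a time, which is why memory use stays low.
- A running mean per model key stands in for the pipelines; this is a hypothetical simplification.
- A ``deque`` keeps the rolling window of absolute errors.

The inline CSV data is invented for illustration.

```python
import collections
import csv
import io

# Invented sample in the same shape as train.csv (id, date, y).
data = io.StringIO(
    'id,date,y\n'
    'HOBBIES_1_001_CA_1_validation,2016-04-22,2\n'
    'HOBBIES_1_002_CA_1_validation,2016-04-22,0\n'
    'HOBBIES_1_001_CA_1_validation,2016-04-23,4\n'
    'HOBBIES_1_002_CA_1_validation,2016-04-23,1\n'
)


class RollingMAE:
    """Mean absolute error over the last `window_size` predictions."""

    def __init__(self, window_size):
        self.errors = collections.deque(maxlen=window_size)

    def update(self, y_true, y_pred):
        self.errors.append(abs(y_true - y_pred))
        return self

    def get(self):
        return sum(self.errors) / len(self.errors) if self.errors else 0.0


# Hypothetical stand-in for the per-item pipelines: a running mean per model key.
totals = collections.defaultdict(float)
counts = collections.defaultdict(int)
metric = RollingMAE(window_size=300_000)

for row in csv.DictReader(data):  # rows are read lazily, one at a time
    key = '_'.join(row['id'].split('_')[:5])
    y = int(row['y'])

    # 1. Predict with what the model knows so far.
    y_pred = totals[key] / counts[key] if counts[key] else 0.0
    # 2. Only then learn from the true value.
    totals[key] += y
    counts[key] += 1
    # 3. Score the prediction that was made before learning.
    metric.update(y, y_pred)

print(f'MAE: {metric.get():.4f}')
```

On this toy stream, the absolute errors are 2, 0, 2 and 1, so the rolling MAE is 1.25.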

#### Deployment of the model:

Now that all the models are pre-trained, I will be able to deploy the pipelines behind an API in a production environment. I will use the [Chantilly](https://github.com/creme-ml/chantilly) library to do so.

```bash
pip install git+https://github.com/creme-ml/chantilly
```

After installing Chantilly, I start a Chantilly instance from the command line:

```bash
chantilly run
```

First, I set the flavor of the Chantilly instance to regression. Chantilly uses the flavor to select the appropriate metrics (MAE, MSE and SMAPE).

```python
import requests

requests.post('http://127.0.0.1:5000/api/init', json={'flavor': 'regression'})
```

After initializing the Chantilly API, I upload all the models I've pre-trained. Each model is uploaded under a name composed of the product and store identifiers (for example ``HOBBIES_1_001_CA_1``). I use dill to serialize each model before uploading it to the API.

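```python
import dill

for model_name, pipeline in dic_models.items():
    # Upload each serialized pipeline under its product/store name.
    r = requests.post(f'http://localhost:5000/api/model/{model_name}', data=dill.dumps(pipeline))
    # Stop at the first failed upload.
    r.raise_for_status()
```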
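Serializing a model captures its learned state, so the copy that the server loads continues from where the warm-up stopped. The round trip below uses the standard library's ``pickle`` on a toy class as a stand-in for dill. ``dill`` exposes the same ``dumps``/``loads`` interface. It can also serialize functions defined in a notebook by value, whereas ``pickle`` stores them only by reference. Such functions include ``get_metadata`` and ``extract_date``, which are wrapped in ``compose.FuncTransformer``. This capability is presumably why dill is used here.

```python
import pickle


class RunningMean:
    """Toy stateful model standing in for a warmed-up pipeline."""

    def __init__(self):
        self.n = 0
        self.total = 0.0

    def fit_one(self, x, y):
        self.n += 1
        self.total += y
        return self

    def predict_one(self, x):
        return self.total / self.n if self.n else 0.0


model = RunningMean()
for y in [3, 5, 7]:
    model.fit_one({}, y)

payload = pickle.dumps(model)     # bytes, like the request body sent to the API
restored = pickle.loads(payload)  # same learned state as the original

print(restored.predict_one({}))   # 5.0
restored.fit_one({}, 9)           # the restored copy keeps learning on its own
print(restored.predict_one({}), model.predict_one({}))  # 6.0 5.0
```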

All the models are now deployed in production and available to make predictions. They can also be updated every day through the same API as new sales figures come in. That's it.

![](images/online_learning.png)

**As you may have noticed, online learning reduces the complexity of deploying a machine learning model in production. Moreover, updating a model only requires calls to the API: there is no need to retrain it from scratch.**

#### Make a prediction by calling the API:

```python
payload = {
    'id': 1,
    'model': 'HOBBIES_1_001_CA_1',
    'features': {'date': '2020-04-30', 'id': 'HOBBIES_1_001_CA_1'},
}

r = requests.post('http://localhost:5000/api/predict', json=payload)
```

```python
print(r.json())
```

#### Update models with new data:

```python
# No features are sent: the 'id' refers back to the prediction request above.
r = requests.post('http://localhost:5000/api/learn', json={
    'id': 1,
    'model': 'HOBBIES_1_001_CA_1',
    'ground_truth': 2,
})
```
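The ``/api/learn`` call above sends no features, so it appears to rely on the ``id`` to refer back to the earlier prediction. The daily update is therefore easiest to manage when each prediction gets a fresh id that is reused for its learn call. Below is a sketch of a hypothetical helper, ``make_payloads``, that builds a matching pair of payloads for one model and one day, in the JSON shape used above. The integer counter used as the id scheme is an assumption. Sending the payloads would be done with ``requests.post`` as in the previous cells.

```python
import itertools
import json

# Hypothetical id generator: every prediction gets a fresh integer id.
_ids = itertools.count(1)


def make_payloads(model_name, date):
    """Build a predict payload and a learn-payload factory sharing one id."""
    request_id = next(_ids)
    predict = {
        'id': request_id,
        'model': model_name,
        'features': {'date': date, 'id': model_name},
    }

    def learn(ground_truth):
        # Same id as the prediction, so the two calls refer to the same request.
        return {'id': request_id, 'model': model_name, 'ground_truth': ground_truth}

    return predict, learn


predict, learn = make_payloads('HOBBIES_1_001_CA_1', '2020-04-30')
print(json.dumps(predict))
# Later, once the day's sales are known:
print(json.dumps(learn(2)))
```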