Bonus: Production Lifecycle Iterations
======================================

## Monitoring Serving Performance

In [2]:
! forml model -R spark eval --lower '2014-10-21 03:00:00' forml-solution-avazuctr

Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
23/05/23 16:03:06 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
23/05/23 16:03:09 WARN TaskSetManager: Stage 0 contains a task of very large size (6213 KiB). The maximum recommended task size is 1000 KiB.
23/05/23 16:03:09 WARN TaskSetManager: Stage 1 contains a task of very large size (76732 KiB). The maximum recommended task size is 1000 KiB.
23/05/23 16:03:18 WARN TaskSetManager: Stage 3 contains a task of very large size (89853 KiB). The maximum recommended task size is 1000 KiB.
23/05/23 16:03:19 WARN TaskSetManager: Stage 5 contains a task of very large size (97663 KiB). The maximum recommended task size is 1000 KiB.
23/05/23 16:03:20 WARN TaskSetManager: Stage 6 contains a task of very large size (6238 KiB). The maximum recommended task size is 1000 KiB.
23/05/23 16:03:20 W

## Incremental Model Refreshing (Model Update)

In [2]:
! forml model train --lower '2014-10-21 03:00:00' --upper '2014-10-21 05:00:00' forml-solution-avazuctr




In [3]:
! forml model list forml-solution-avazuctr 0.1.dev1

1  2  


In [5]:
! curl -H 'Content-Type: application/json' -d '[{ \
    "hour": "2014-10-21 05:00:00", \
    "C1": "1002", \
    "banner_pos": "0", \
    "site_id": "887a4754", \
    "site_domain": "e3d9ca35", \
    "site_category": "50e219e0", \
    "app_id": "ecad2386", \
    "app_domain": "7801e8d9", \
    "app_category": "07d7df22", \
    "device_id": "0e79d423", \
    "device_ip": "9f423918", \
    "device_model": "fc10a0d3", \
    "device_type": "0", \
    "device_conn_type": "0", \
    "C14": "22701", \
    "C15": "320", \
    "C16": "50", \
    "C17": "2624", \
    "C18": "0", \
    "C19": "35", \
    "C20": "-1", \
    "C21": "221" \
}]' http://127.0.0.1:8000/forml-solution-avazuctr

[{"c0":0.1750026552}]
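
The same request can also be issued from Python. The following is a minimal sketch using only the standard library. The helper names `build_request` and `predict` are hypothetical, the endpoint URL and field values are copied from the `curl` call above, and `predict` assumes the serving gateway is still listening at that address.

```python
import json
import urllib.request

# Endpoint and record copied verbatim from the curl call above.
ENDPOINT = "http://127.0.0.1:8000/forml-solution-avazuctr"
RECORD = {
    "hour": "2014-10-21 05:00:00",
    "C1": "1002",
    "banner_pos": "0",
    "site_id": "887a4754",
    "site_domain": "e3d9ca35",
    "site_category": "50e219e0",
    "app_id": "ecad2386",
    "app_domain": "7801e8d9",
    "app_category": "07d7df22",
    "device_id": "0e79d423",
    "device_ip": "9f423918",
    "device_model": "fc10a0d3",
    "device_type": "0",
    "device_conn_type": "0",
    "C14": "22701",
    "C15": "320",
    "C16": "50",
    "C17": "2624",
    "C18": "0",
    "C19": "35",
    "C20": "-1",
    "C21": "221",
}


def build_request(records, url=ENDPOINT):
    """Wrap a list of records into a JSON POST request (hypothetical helper)."""
    body = json.dumps(records).encode("utf-8")
    # Supplying `data` makes urllib issue a POST, matching curl's -d flag.
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )


def predict(records, url=ENDPOINT):
    """Send the records to the gateway and return the decoded JSON response."""
    with urllib.request.urlopen(build_request(records, url)) as response:
        return json.load(response)
```

Calling `predict([RECORD])` against the running gateway should return a list with one prediction per submitted record, in the same shape as the `curl` response above.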

In [1]:
! forml model eval --lower '2014-10-21 05:00:00' forml-solution-avazuctr

0.40221980700198084


## New Model Release (Model Upgrade)

In this iteration we release a new version of the model that drops redundant, highly correlated feature columns.

In [2]:
%cd forml-solution-avazuctr

/opt/forml/workspace/3-solution/forml-solution-avazuctr


In [3]:
from forml import project
from forml.pipeline import payload, wrap
from avazuctr import pipeline

with wrap.importer():
    from category_encoders import TargetEncoder
    
PROJECT = project.open(path='.', package='avazuctr')
trainset = PROJECT.components.source.bind(TargetEncoder(cols=pipeline.CATEGORICAL_COLUMNS)).launcher.train()

In [4]:
import pandas
pandas.set_option('display.max_columns', None)
corr = trainset.features.corr()
corr[corr > 0.90]

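| | hour | C1 | banner_pos | site_id | site_domain | site_category | app_id | app_domain | app_category | device_id | device_ip | device_model | device_type | device_conn_type | C14 | C15 | C16 | C17 | C18 | C19 | C20 | C21 | dayofweek | day | month |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| hour | 1.0 | | | | | | | | | | | | | | | | | | | | | | | | |
| C1 | | 1.0 | | | | | | | | | | | 0.923641 | | | | | | | | | | | | |
| banner_pos | | | 1.0 | | | | | | | | | | | | | | | | | | | | | | |
| site_id | | | | 1.0 | 0.977386 | | | | | | | | | | | | | | | | | | | | |
| site_domain | | | | 0.977386 | 1.0 | | | | | | | | | | | | | | | | | | | | |
| site_category | | | | | | 1.0 | | | | | | | | | | | | | | | | | | | |
| app_id | | | | | | | 1.0 | | | | | | | | | | | | | | | | | | |
| app_domain | | | | | | | | 1.0 | | | | | | | | | | | | | | | | | |
| app_category | | | | | | | | | 1.0 | | | | | | | | | | | | | | | | |
| device_id | | | | | | | | | | 1.0 | | | | | | | | | | | | | | | |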


We see strong correlations (coefficients above 0.90) within the following sets of features:
* `device_type` and `C1`
* `site_domain` and `site_id`
* `C14`, `C17` and `C21`
* `C15` and `C16`
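
Reading such pairs off a wide matrix by eye is error-prone, so it can help to list them programmatically. This is a minimal sketch, assuming the masked matrix is passed in as the nested dictionary that `corr.to_dict()` would produce. `correlated_pairs` is a hypothetical helper, and the sample data holds only the two coefficients visible in the output above.

```python
import math


def correlated_pairs(matrix, threshold=0.90):
    """List each unordered feature pair whose coefficient exceeds the threshold."""
    names = list(matrix)
    pairs = []
    for i, first in enumerate(names):
        # Visit only the features after `first` so every pair is reported once
        # and the diagonal (always 1.0) is skipped.
        for second in names[i + 1:]:
            value = matrix[first].get(second, math.nan)
            # NaN never compares greater, so masked cells are ignored.
            if value > threshold:
                pairs.append((first, second, value))
    return pairs


NAN = math.nan
# Excerpt of the masked matrix above (assumed nested-dict layout).
SAMPLE = {
    "C1": {"C1": 1.0, "site_id": NAN, "site_domain": NAN, "device_type": 0.923641},
    "site_id": {"C1": NAN, "site_id": 1.0, "site_domain": 0.977386, "device_type": NAN},
    "site_domain": {"C1": NAN, "site_id": 0.977386, "site_domain": 1.0, "device_type": NAN},
    "device_type": {"C1": 0.923641, "site_id": NAN, "site_domain": NAN, "device_type": 1.0},
}

for first, second, value in correlated_pairs(SAMPLE):
    print(f"{first} ~ {second}: {value}")
```

With the notebook's `corr` frame, the call would be `correlated_pairs(corr.to_dict())`.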

Let's update our [avazuctr/source.py](forml-solution-avazuctr/avazuctr/source.py) and [avazuctr/pipeline.py](forml-solution-avazuctr/avazuctr/pipeline.py) to keep only the first feature from each of the sets:

1. Open the [avazuctr/source.py](forml-solution-avazuctr/avazuctr/source.py) component.
2. Edit the `FEATURES` statement to remove the `C1`, `site_id`, `C17`, `C21` and `C16` features.
```python
FEATURES = (
    schema.Avazu.select(
        schema.Avazu.hour,
        schema.Avazu.banner_pos,
        schema.Avazu.site_domain,
        schema.Avazu.site_category,
        schema.Avazu.app_id,
        schema.Avazu.app_domain,
        schema.Avazu.app_category,
        schema.Avazu.device_id,
        schema.Avazu.device_ip,
        schema.Avazu.device_model,
        schema.Avazu.device_type,
        schema.Avazu.device_conn_type,
        schema.Avazu.C14,
        schema.Avazu.C15,
        schema.Avazu.C18,
        schema.Avazu.C19,
        schema.Avazu.C20,
    )
    .orderby(schema.Avazu.hour)
    .limit(500000)
)
```
3. Save the file!

In [5]:
! git add avazuctr/source.py

1. Open the [avazuctr/pipeline.py](forml-solution-avazuctr/avazuctr/pipeline.py) component.
2. Edit the `CATEGORICAL_COLUMNS` list to remove the `C1`, `site_id`, `C17`, `C21` and `C16` features.
```python
CATEGORICAL_COLUMNS = [
    "banner_pos",
    "site_domain",
    "site_category",
    "app_id",
    "app_domain",
    "app_category",
    "device_id",
    "device_ip",
    "device_model",
    "device_type",
    "device_conn_type",
    "C14",
    "C15",
    "C18",
    "C19",
    "C20",
]
```
3. Save the file!

In [6]:
! git add avazuctr/pipeline.py
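
The edited list can be cross-checked against the correlation findings. The sketch below rebuilds it by dropping every feature except the first from each correlated set. `CORRELATED_SETS` restates the bullet list above and `prune` is a hypothetical helper. `ORIGINAL_COLUMNS` is an assumption: the pre-edit list is not shown here, so the names are taken from the fields of the 0.1 request payload, minus `hour`.

```python
# Correlated feature sets as listed above; the first member of each is kept.
CORRELATED_SETS = [
    ("device_type", "C1"),
    ("site_domain", "site_id"),
    ("C14", "C17", "C21"),
    ("C15", "C16"),
]

# Assumed pre-edit column list (fields of the 0.1 request payload, minus "hour").
ORIGINAL_COLUMNS = [
    "C1", "banner_pos", "site_id", "site_domain", "site_category",
    "app_id", "app_domain", "app_category", "device_id", "device_ip",
    "device_model", "device_type", "device_conn_type",
    "C14", "C15", "C16", "C17", "C18", "C19", "C20", "C21",
]


def prune(columns, correlated_sets):
    """Drop all but the first feature of each correlated set, preserving order."""
    redundant = {name for group in correlated_sets for name in group[1:]}
    return [name for name in columns if name not in redundant]


CATEGORICAL_COLUMNS = prune(ORIGINAL_COLUMNS, CORRELATED_SETS)
print(CATEGORICAL_COLUMNS)
```

Under these assumptions the result has the same 16 entries, in the same order, as the edited `CATEGORICAL_COLUMNS` list.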

1. Open the [pyproject.toml](forml-solution-avazuctr/pyproject.toml).
2. Edit the `version` setting, changing it to `0.2`.
```toml
version = "0.2"
```
3. Save the file!

In [7]:
! git add pyproject.toml

In [10]:
! time forml project eval

running eval
0.4009414480776054
forml project eval  376.53s user 22.86s system 304% cpu 2:11.17 total


In [11]:
! git commit -m 'Released 0.2'
! git tag 0.2

[main 8030858] Released 0.2
 4 files changed, 8 insertions(+), 13 deletions(-)
 create mode 100644 application.py


In [23]:
! forml project release

running bdist_4ml
Looking in indexes: https://pypi.org/simple, http://127.0.0.1:9000/
Collecting category-encoders==2.6.0
  Using cached category_encoders-2.6.0-py2.py3-none-any.whl (81 kB)
Collecting forml==0.93
  Downloading http://127.0.0.1:9000/forml/forml-0.93-py3-none-any.whl (279 kB)
Collecting imbalanced-learn==0.10.1
  Using cached imbalanced_learn-0.10.1-py3-none-any.whl (226 kB)
Collecting lightgbm==3.3.5
  Using cached lightgbm-3.3.5-py3-none-manylinux1_x86_64.whl (2.0 MB)
Collecting openschema==0.6.dev2
  Downloading http://127.0.0.1:9000/openschema/openschema-0.6.dev2-py3-none-any.whl (14 kB)
Collecting scikit-learn==1.2.2
  Using cached scikit_learn-1.2.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (9.6 MB)
Collecting numpy>=1.14.0 (from category-encoders==2.6.0)
  Using cached numpy-1.24.3-cp310-cp310-manylinux_2_17_x86_64.manylin

In [24]:
! forml model list forml-solution-avazuctr

0.1.dev1  0.2       


In [25]:
! forml model train --upper '2014-10-21 03:00:00' forml-solution-avazuctr

In [26]:
! forml model list forml-solution-avazuctr 0.2

1  


In [27]:
! curl -H 'Content-Type: application/json' -d '[{ \
    "hour": "2014-10-21 03:00:00", \
    "banner_pos": "0", \
    "site_domain": "e3d9ca35", \
    "site_category": "50e219e0", \
    "app_id": "ecad2386", \
    "app_domain": "7801e8d9", \
    "app_category": "07d7df22", \
    "device_id": "0e79d423", \
    "device_ip": "9f423918", \
    "device_model": "fc10a0d3", \
    "device_type": "0", \
    "device_conn_type": "0", \
    "C14": "22701", \
    "C15": "320", \
    "C18": "0", \
    "C19": "35", \
    "C20": "-1" \
}]' http://127.0.0.1:8000/forml-solution-avazuctr

[{"c0":0.1674632179}]

In [29]:
! forml model eval --lower '2014-10-21 03:00:00' forml-solution-avazuctr

0.41973887556694417
