Tutorial: Evaluation
====================

Objective:

> Learn how evaluation workflows are derived from the main ML pipeline.

Principles:

1. An ML metric is a measure used to assess the performance of an ML solution.
2. Development evaluation (of the logical model) is derived from the training and prediction workflows, combined according to a selected backtesting method.
3. The workflow for evaluating an already trained model is composed simply of the model predictions and the (eventually observed) true outcomes.
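
To make the first and third principles concrete, here is a minimal sketch that scores predictions against true outcomes using only the Python standard library. The `log_loss` helper and the sample values are hypothetical and serve only as an illustration; the tutorial itself relies on `sklearn.metrics.log_loss`.

```python
import math


def log_loss(true_labels, predicted_probabilities):
    """Return the mean negative log-likelihood of the true binary labels."""
    total = 0.0
    for label, probability in zip(true_labels, predicted_probabilities):
        # Probability the model assigned to the outcome that actually occurred.
        likelihood = probability if label == 1 else 1.0 - probability
        total -= math.log(likelihood)
    return total / len(true_labels)


# Hypothetical values for illustration only (not taken from the tutorial data).
true_outcomes = [0, 1, 1, 0]          # eventually observed labels
predictions = [0.1, 0.8, 0.6, 0.3]    # predicted probability of class 1

score = log_loss(true_outcomes, predictions)
print(f"log loss: {score:.4f}")
```

A lower value means the predicted probabilities were closer to the observed outcomes.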

Development Evaluation
----------------------
Development evaluation is used for experimentation and CI/CE.

### Holdout Method

In [1]:
from sklearn import metrics
from forml import evaluation, project

EVALUATION = project.Evaluation(
    evaluation.Function(metrics.log_loss),
    evaluation.HoldOut(test_size=0.2, stratify=True, random_state=42),
)

In [5]:
from forml.pipeline import payload, wrap
from dummycatalog import Foo
with wrap.importer():
    from sklearn.linear_model import LogisticRegression

SOURCE = project.Source.query(Foo.select(Foo.Value), Foo.Label)
PIPELINE = LogisticRegression(random_state=42)

SOURCE.bind(PIPELINE, evaluation=EVALUATION).launcher(runner='graphviz').eval()
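
As a rough intuition for what the holdout configuration above asks for, the following sketch performs comparable steps directly in scikit-learn, outside ForML. It is not ForML's internal implementation: the synthetic dataset from `make_classification` and all variable names are assumptions made for illustration.

```python
from sklearn import metrics, model_selection
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in data (assumption; the tutorial reads from dummycatalog.Foo).
X, y = make_classification(n_samples=200, n_features=4, random_state=42)

# Hold out 20% of the rows for testing, preserving the class proportions.
X_train, X_test, y_train, y_test = model_selection.train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Train on the remaining 80% and score the held-out part with log loss.
model = LogisticRegression(random_state=42).fit(X_train, y_train)
loss = metrics.log_loss(y_test, model.predict_proba(X_test), labels=[0, 1])
print(f"holdout log loss: {loss:.4f}")
```

The point of the sketch is the structure: one split, one training run on the larger part, one metric computed on the held-out part.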

### Cross-validation Method

In [3]:
from sklearn import model_selection

EVALUATION = project.Evaluation(
    evaluation.Function(metrics.log_loss),
    evaluation.CrossVal(crossvalidator=model_selection.StratifiedKFold(n_splits=3, shuffle=True, random_state=42)),
)
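
For comparison, the cross-validation configuration above can be pictured with the following plain scikit-learn sketch. It uses the same `StratifiedKFold` splitter, but the synthetic data, the variable names, and the choice to average the per-fold scores are assumptions; the source does not state how ForML aggregates the folds.

```python
import numpy as np
from sklearn import metrics, model_selection
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in data (assumption; the tutorial reads from dummycatalog.Foo).
X, y = make_classification(n_samples=200, n_features=4, random_state=42)

splitter = model_selection.StratifiedKFold(n_splits=3, shuffle=True, random_state=42)

fold_losses = []
for train_idx, test_idx in splitter.split(X, y):
    # Each fold trains a fresh model and scores it on the rows left out.
    model = LogisticRegression(random_state=42).fit(X[train_idx], y[train_idx])
    probabilities = model.predict_proba(X[test_idx])
    fold_losses.append(metrics.log_loss(y[test_idx], probabilities, labels=[0, 1]))

# Averaging the folds is an illustrative choice, not a documented ForML behavior.
mean_loss = float(np.mean(fold_losses))
print(f"per-fold log loss: {fold_losses}, mean: {mean_loss:.4f}")
```

Unlike the holdout method, every row is used for testing exactly once, at the cost of training the model once per fold.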

In [6]:
SOURCE.bind(PIPELINE, evaluation=EVALUATION).launcher(runner='graphviz').eval()

ERROR: 2023-05-22 18:54:17,591: __init__: Instruction LogisticRegression(random_state=42).train failed when processing arguments:       Level  Value
0     Alpha   0.26
1     Tango   0.94
2      Zulu   0.57
3    Victor   0.61
4      Echo   0.12
5   Whiskey   0.78
6    Yankee   0.68
7   Charlie   0.35
8      Mike   0.54
9     Romeo   0.58
10  Foxtrot   0.45
11     Papa   0.59
12    Siera   0.72, 0     1
1     0
2     0
3     0
4     1
5     0
6     0
7     1
8     0
9     0
10    1
11    0
12    0
Name: Label, dtype: int64
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/forml/flow/_code/target/__init__.py", line 56, in __call__
    result = self.execute(*args)
  File "/usr/local/lib/python3.10/site-packages/forml/flow/_code/target/user.py", line 196, in execute
    return self.action(self.builder(), *args)
  File "/usr/local/lib/python3.10/site-packages/forml/flow/_code/target/user.py", line 168, in __call__
    actor.train(*args)
  File "/usr/local/lib

ValueError: could not convert string to float: 'Alpha'
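
The logged arguments show that the features passed to `LogisticRegression` included the string-valued `Level` column, which the estimator cannot convert to a float. The sketch below reproduces that situation outside ForML with pandas and scikit-learn, using the rows printed in the log, and shows that fitting succeeds once only the numeric `Value` column is passed. Whether the column should be dropped or encoded in the actual pipeline is not stated in the source, so treat the second half as one possible remedy rather than the intended fix.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Rows copied from the logged training arguments above.
features = pd.DataFrame({
    "Level": ["Alpha", "Tango", "Zulu", "Victor", "Echo", "Whiskey", "Yankee",
              "Charlie", "Mike", "Romeo", "Foxtrot", "Papa", "Siera"],
    "Value": [0.26, 0.94, 0.57, 0.61, 0.12, 0.78, 0.68,
              0.35, 0.54, 0.58, 0.45, 0.59, 0.72],
})
labels = pd.Series([1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0], name="Label")

# Fitting with the string column present fails during float conversion.
try:
    LogisticRegression(random_state=42).fit(features, labels)
    raised = False
except ValueError as error:
    raised = True
    print(f"training failed: {error}")

# Passing only the numeric column lets the estimator train.
numeric_only = features[["Value"]]
model = LogisticRegression(random_state=42).fit(numeric_only, labels)
print(model.predict(numeric_only))
```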

Production Performance Tracking
-------------------------------

This will be demonstrated later as part of the final solution for [Avazu CTR Prediction](../3-solution).