# Advanced Tutorial: ML Scoring Metrics

This tutorial goes in depth on the various `MLScoringMetric` Types, and how to use them to score machine learning models.

You will learn how to:
1. **Call a scoring metric directly**, to test how it works.
1. **Configure scoring metrics on an MLPipe**, used to score after training or to score on new test data.
1. Disable automatic **scoring after training**.
1. **Configure** the behavior of a scoring metric via **parameters** on the metric.
1. Use a scoring metric from the **sklearn** library.
1. Use scoring metrics that do **not require ground truth** to score.
1. Create your own **custom scoring metrics**.

This is a self-contained notebook, but it will skip the explanation of some of the common setup steps (connecting to c3, creating datasets, creating a model) that were covered in the *Getting Started Tutorial: ML Pipeline* notebook (`TutorialIntroMLPipeline.ipynb`). 

## Setup

Before running the notebook below, please make sure you are in the `py-sklearn_3_0_0` kernel (Kernel -> Change kernel -> py-sklearn_3_0_0). If this kernel is not listed, install the kernel (Kernel -> Manage Kernels -> py-sklearn_3_0_0).

### Create datasets

In [1]:
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

In [2]:
iris = load_iris()
# Tuple of (training X, test X, training y, test y)
datasets_np = train_test_split(iris.data, iris.target, test_size=0.2, random_state=42)

In [3]:
# Train and test datasets
XTrain, XTest, yTrain, yTest = [c3.Dataset.fromPython(pythonData=ds_np) for ds_np in datasets_np]

### Create a model

In [4]:
logisticRegression = c3.SklearnPipe(
                        name="logisticRegression",
                        technique=c3.SklearnTechnique(
                            name="linear_model.LogisticRegression",
                            processingFunctionName="predict",
                            hyperParameters={"random_state": 42,
                                             "C": .1}  # regularization to get errors
                        ),
                     )

In [5]:
# NotVerify: result
trainedLr = logisticRegression.train(input=XTrain, targetOutput=yTrain)
trainedLr

c3.SklearnPipe(
 name='logisticRegression',
 implType='SklearnPipeV7d13',
 noTrainScore=False,
 typeVersion='7.14.0',
 untrainableOverride=False,
 technique=c3.SklearnTechnique(
             name='linear_model.LogisticRegression',
             hyperParameters=c3.Mapp<string, any>({'C': 0.1,
                               'random_state': 42}),
             processingFunctionName='predict',
             keepInputColumnIndices=False),
 trainedModel=c3.MLTrainedModelArtifact(
                model='eJxlVFtv3EQY9SbbC+69AZqW0AspJS3pkk3bUNqCQ1MIxGVLDRSrDwxjr3dt8Np77HGTfUBKqHabLUIIoVY0XFTaFxDi0nKRQEgrzfwTXvkRMJ5sQcBYtj+fsb/znTnfeHHATd8OPZpEpTCI5J004qoXlkgY14OUBa5+th9YXj3x0jSIIx3awSUU3sHAmL1O07SmF9GQtTBoD8incBJFuyiDakZDrOnagzJmcYi1s8auqd//GJk5bBckNIN1s8YvN/KxYm+SQC1gJIiYl7hek2H9sr1Ngn8DJHWprLCO+8yCvVHOuCFNUzLvBXWfQa8oLKFRNW6QlFHmYYN5yF4rwTQOL3kJNtpr8uqcWj3FJnu9jBt0gQQyPTabVXtDDmQhC4hKjC1KA81YjK1KpczhxKmHbaZm6/J5niaKKWHY3lVEEXkrdlIMVVT2sEwSyoIY91fszWq25lGWyTWUKgkeMIvqNcUmMTzoVoNQLnx+1UndY4QylujY

### Generate output for scoring from model

In [6]:
predictTest = trainedLr.process(input=XTest)

In [7]:
concatResult = c3.Dataset.concatenate(tensors=[predictTest, yTest], dimension=1)
c3.Dataset.toPandas(dataset=concatResult)

Unnamed: 0,prediction,0
0,1.0,1.0
1,0.0,0.0
2,2.0,2.0
3,1.0,1.0
4,1.0,1.0
5,0.0,0.0
6,1.0,1.0
7,2.0,2.0
8,1.0,1.0
9,1.0,1.0


## Using Scoring Metrics Directly

C3 defines several built-in machine learning scoring metrics, such as `MLAccuracyMetric`, `MLF1ScoreMetric`, `MLRSquaredMetric`.  All these scoring metrics implement a common interface, defined in the abstract Type `MLScoringMetric`.

To see a list of available ML scoring metrics, run `c3ShowType(MLScoringMetric)` at the JavaScript client console, and navigate to Types that mix in the `MLScoringMetric` Type using the hyperlinks under "Used by".  You may need to navigate a few levels down the inheritance hierarchy to see several of the built-in metrics (many are under the "Used by" hyperlinks under `MLScoringMetricWithTruth`).

To use a scoring metric, you have to create an instance of the Type first.

In [8]:
accuracy_metric = c3.MLAccuracyMetric()  # this metric works for multi-class classification
accuracy_metric.score(output=predictTest, targetOutput=yTest)

1.0
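
For intuition, accuracy is the fraction of predictions that match the ground truth. The pure-Python sketch below is not the C3 implementation; the function name `accuracy` is hypothetical, and the two lists reproduce only the ten rows shown in the table above.

```python
def accuracy(predicted, actual):
    """Fraction of positions where the prediction equals the ground truth."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not predicted:
        raise ValueError("cannot score an empty prediction list")
    matches = sum(1 for p, a in zip(predicted, actual) if p == a)
    return matches / len(predicted)


# First ten (prediction, ground truth) rows from the concatenated table above
predicted = [1.0, 0.0, 2.0, 1.0, 1.0, 0.0, 1.0, 2.0, 1.0, 1.0]
actual = [1.0, 0.0, 2.0, 1.0, 1.0, 0.0, 1.0, 2.0, 1.0, 1.0]
print(accuracy(predicted, actual))  # every row matches, so this prints 1.0
```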

## Configure scoring metrics on an `MLPipe`
If you wish to score the machine learning model in an `MLPipe`, you need to configure a scoring metric (or multiple scoring metrics) on the `MLPipe` first.  

In [9]:
# NotVerify: result
# The scoringMetricList argument below can take a list of multiple scoring metric objects
trainedLr.scoringMetrics = c3.MLScoringMetric.toScoringMetricMap(
                              scoringMetricList=[c3.MLAccuracyMetric()])
trainedLr

c3.SklearnPipe(
 name='logisticRegression',
 implType='SklearnPipeV7d13',
 noTrainScore=False,
 scoringMetrics=c3.Mapp<string, MLScoringMetric>({'MLAccuracyMetric': c3.MLAccuracyMetric()}),
 typeVersion='7.14.0',
 untrainableOverride=False,
 technique=c3.SklearnTechnique(
             name='linear_model.LogisticRegression',
             hyperParameters=c3.Mapp<string, any>({'C': 0.1,
                               'random_state': 42}),
             processingFunctionName='predict',
             keepInputColumnIndices=False),
 trainedModel=c3.MLTrainedModelArtifact(
                model='eJxlVFtv3EQY9SbbC+69AZqW0AspJS3pkk3bUNqCQ1MIxGVLDRSrDwxjr3dt8Np77HGTfUBKqHabLUIIoVY0XFTaFxDi0nKRQEgrzfwTXvkRMJ5sQcBYtj+fsb/znTnfeHHATd8OPZpEpTCI5J004qoXlkgY14OUBa5+th9YXj3x0jSIIx3awSUU3sHAmL1O07SmF9GQtTBoD8incBJFuyiDakZDrOnagzJmcYi1s8auqd//GJk5bBckNIN1s8YvN/KxYm+SQC1gJIiYl7hek2H9sr1Ngn8DJHWprLCO+8yCvVHOuCFNUzLvBXWfQa8oLKFRNW6QlFHmYYN5yF4rwTQOL3kJNtpr8uqcWj3FJnu9jBt0gQQyPTabVXtDDmQhC4hKjC1KA81YjK1Kpczhx

`c3.MLScoringMetric.toScoringMetricMap()` above is a convenience function, to generate the keys to the `scoringMetrics` C3 Mapp field by calling the `MLScoringMetric.toString()` member function on the scoring metric object.  Using consistent names for the scoring metric (by calling `MLScoringMetric.toString()`) will make it easier to compare scores across different models / `MLPipe`s.

In [10]:
c3.MLAccuracyMetric().toString()

'MLAccuracyMetric'
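
The idea of keying a map by each metric's generated name can be sketched in plain Python. `ToyMetric` and `to_scoring_metric_map` below are hypothetical stand-ins, not C3 Types, and the name format only imitates the keys that appear in this notebook's outputs (parameterized metrics are covered later in this tutorial).

```python
class ToyMetric:
    """Hypothetical stand-in for a scoring metric; not a C3 Type."""

    def __init__(self, type_name, **params):
        self.type_name = type_name
        self.params = params

    def to_string(self):
        # Append "name=value" for each parameter so that differently
        # configured metrics of the same Type get distinct names.
        parts = [self.type_name]
        parts += [f"{key}={value}" for key, value in self.params.items()]
        return ", ".join(parts)


def to_scoring_metric_map(metrics):
    """Key each metric by its generated name (hypothetical helper)."""
    return {metric.to_string(): metric for metric in metrics}


metric_map = to_scoring_metric_map([
    ToyMetric("MLAccuracyMetric"),
    ToyMetric("MLPrecisionMetric", threshold=0.1),
    ToyMetric("MLPrecisionMetric", threshold=0.61),
])
print(sorted(metric_map))
```

Because the parameter values are part of the key, the two precision metrics do not overwrite each other in the map.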

### Scoring on test set

In [11]:
# Now you can call score() on the MLPipe
scoresMapp = trainedLr.score(input=XTest, targetOutput=yTest)
scoresMapp

c3.Mapp<string, double>({'MLAccuracyMetric': 1.0})

Notice that the key to the score is the same as the key to the scoring metric the score was generated from.

### Scoring after training
If you configure a scoring metric on an `MLPipe` and set `MLPipe.noTrainScore = False` (the default value), when you call `MLPipe.train()` the scoring metric will be used to score the `MLPipe` on the entire training Dataset.

In [12]:
# NotVerify: result
logisticRegression.scoringMetrics = c3.MLScoringMetric.toScoringMetricMap(
                                        scoringMetricList=[c3.MLAccuracyMetric()])
# logisticRegression.noTrainScore defaults to False.  You may also set it explicitly.
trainedLr = logisticRegression.train(input=XTrain, targetOutput=yTrain)
trainedLr  # notice that trainedLr.trainingScores is now populated

c3.SklearnPipe(
 name='logisticRegression',
 implType='SklearnPipeV7d13',
 noTrainScore=False,
 scoringMetrics=c3.Mapp<string, anyof(MLScoringMetric<any,any>)>({'MLAccuracyMetric': c3.MLAccuracyMetric()}),
 trainingScores=c3.Mapp<string, double>({'MLAccuracyMetric': 0.95}),
 typeVersion='7.14.0',
 untrainableOverride=False,
 technique=c3.SklearnTechnique(
             name='linear_model.LogisticRegression',
             hyperParameters=c3.Mapp<string, any>({'C': 0.1,
                               'random_state': 42}),
             processingFunctionName='predict',
             keepInputColumnIndices=False),
 trainedModel=c3.MLTrainedModelArtifact(
                model='eJxlVFtv3EQY9SbbC+69AZqW0AspJS3pkk3bUNqCQ1MIxGVLDRSrDwxjr3dt8Np77HGTfUBKqHabLUIIoVY0XFTaFxDi0nKRQEgrzfwTXvkRMJ5sQcBYtj+fsb/znTnfeHHATd8OPZpEpTCI5J004qoXlkgY14OUBa5+th9YXj3x0jSIIx3awSUU3sHAmL1O07SmF9GQtTBoD8incBJFuyiDakZDrOnagzJmcYi1s8auqd//GJk5bBckNIN1s8YvN/KxYm+SQC1gJIiYl7hek2H9sr1Ngn8DJHWprLCO+8yCvVHOuCFNUzLvBXWfQa8o

The scores computed from the training data are stored in a field on the `MLPipe`, because they are a characteristic of the model.  We chose NOT to store any scores from the test data on the `MLPipe` because the test data is not intrinsically part of the model.

In [13]:
# NotVerify: result
logisticRegression.noTrainScore = True
trainedLr = logisticRegression.train(input=XTrain, targetOutput=yTrain)
trainedLr # notice that trainedLr.trainingScores is NOT populated

c3.SklearnPipe(
 name='logisticRegression',
 implType='SklearnPipeV7d13',
 noTrainScore=True,
 scoringMetrics=c3.Mapp<string, anyof(MLScoringMetric<any,any>)>({'MLAccuracyMetric': c3.MLAccuracyMetric()}),
 typeVersion='7.14.0',
 untrainableOverride=False,
 technique=c3.SklearnTechnique(
             name='linear_model.LogisticRegression',
             hyperParameters=c3.Mapp<string, any>({'C': 0.1,
                               'random_state': 42}),
             processingFunctionName='predict',
             keepInputColumnIndices=False),
 trainedModel=c3.MLTrainedModelArtifact(
                model='eJxlVFtv3EQY9SbbC+69AZqW0AspJS3pkk3bUNqCQ1MIxGVLDRSrDwxjr3dt8Np77HGTfUBKqHabLUIIoVY0XFTaFxDi0nKRQEgrzfwTXvkRMJ5sQcBYtj+fsb/znTnfeHHATd8OPZpEpTCI5J004qoXlkgY14OUBa5+th9YXj3x0jSIIx3awSUU3sHAmL1O07SmF9GQtTBoD8incBJFuyiDakZDrOnagzJmcYi1s8auqd//GJk5bBckNIN1s8YvN/KxYm+SQC1gJIiYl7hek2H9sr1Ngn8DJHWprLCO+8yCvVHOuCFNUzLvBXWfQa8oLKFRNW6QlFHmYYN5yF4rwTQOL3kJNtpr8uqcWj3FJnu9jBt0gQQyPTabVXtDDmQhC4hKjC

> *NOTE*: `MLPipe.train()` will fail if the `MLPipe` is configured with a scoring metric that requires ground truth and you do not pass the `targetOutput` argument to `train()` (i.e. calling `MLPipe.train(input)` for *unsupervised* learning).  The `targetOutput` argument to `train()` serves as the ground truth for scoring.
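
The train-time scoring rules described in this section can be summarized with a toy model. `ToyPipe` is hypothetical and is not how `MLPipe` is implemented; in particular, the sketch assumes that the ground-truth check only applies when train-time scoring is enabled, which this tutorial does not confirm.

```python
from collections import Counter


def accuracy(predicted, actual):
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


class ToyPipe:
    """Hypothetical pipe that mimics the train-time scoring rules above."""

    def __init__(self, metrics, no_train_score=False):
        self.metrics = metrics  # name -> (requires_truth, scoring function)
        self.no_train_score = no_train_score
        self.training_scores = None
        self.majority = None

    def train(self, inputs, target_output=None):
        if not self.no_train_score:
            for name, (requires_truth, _) in self.metrics.items():
                if requires_truth and target_output is None:
                    raise ValueError(f"{name} needs target_output to score")
        # Toy "model": always predict the most common training label.
        if target_output is not None:
            self.majority = Counter(target_output).most_common(1)[0][0]
        if not self.no_train_score:
            predicted = [self.majority] * len(inputs)
            self.training_scores = {
                name: fn(predicted, target_output)
                for name, (_, fn) in self.metrics.items()
            }
        return self


metrics = {"accuracy": (True, accuracy)}
X, y = [[1], [2], [3], [4]], [1, 1, 1, 0]
print(ToyPipe(metrics).train(X, y).training_scores)                       # {'accuracy': 0.75}
print(ToyPipe(metrics, no_train_score=True).train(X, y).training_scores)  # None
```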

## Configuring parameters on a scoring metric
Some scoring metrics have parameters that configure how they score.  These parameters are defined as fields on the Type.  For example, try `c3ShowType(MLPrecisionMetric)` in the JavaScript client console.

In [14]:
# Modify data for binary classification, so we can apply binary classification metric
yBinaries_np = datasets_np[2:]  # yTrain, yTest
for yBinary_np in yBinaries_np:
    yBinary_np[yBinary_np > 0] = 1  # Manipulate numpy data to only have 2 classes
yTrainBinary, yTestBinary = [c3.Dataset.fromPython(pythonData=ds_np) for ds_np in yBinaries_np]

In [15]:
# Train a new model for binary classification 
binaryLr = c3.SklearnPipe(
               name="logisticRegression",
               technique=c3.SklearnTechnique(
                   name="linear_model.LogisticRegression",
                   processingFunctionName="predict_proba",  # to compute the precision, we use predicted probabilities
                   hyperParameters={"random_state": 42,
                                    "C": 0.001},  # strong regularization to get errors
               ),
               scoringMetrics=c3.MLScoringMetric.toScoringMetricMap(
                   scoringMetricList=[c3.MLPrecisionMetric(threshold=0.1)])
            )
trainedLr = binaryLr.train(input=XTrain, targetOutput=yTrainBinary)

In [16]:
trainedLr.score(input=XTest, targetOutput=yTestBinary)

c3.Mapp<string, double>({'MLPrecisionMetric, threshold=0.1': 0.6666666666666666})

In [17]:
# Score Trained Model over multiple scoring metrics
trainedLr.scoringMetrics = c3.MLScoringMetric.toScoringMetricMap(
                               scoringMetricList=[c3.MLPrecisionMetric(), c3.MLPrecisionMetric(threshold=0.61), c3.MLAccuracyMetric()])
trainedLr.score(input=XTest, targetOutput=yTestBinary)

c3.Mapp<string, double>({'MLAccuracyMetric': 0.6666666666666666,
 'MLPrecisionMetric, threshold=0.0': 0.6666666666666666,
 'MLPrecisionMetric, threshold=0.61': 0.6896551724137931})

In [18]:
# Change the scoring metric to an impossible to cross threshold
trainedLr.scoringMetrics = c3.MLScoringMetric.toScoringMetricMap(
                               scoringMetricList=[c3.MLPrecisionMetric(threshold=2)])
trainedLr.score(input=XTest, targetOutput=yTestBinary)  # Should give a score of 0

c3.Mapp<string, double>({'MLPrecisionMetric, threshold=2.0': 0.0})
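
To see why the threshold changes the score, here is a minimal sketch of thresholded precision. It is not the C3 implementation: the function name is hypothetical, the comparison `>=` is an assumption, and returning 0.0 when nothing is predicted positive is chosen to match the output of the cell above. The probabilities and labels are made up for illustration.

```python
def precision_at_threshold(probabilities, actual, threshold):
    """Precision of the positive class after thresholding probabilities.

    Assumption: a row is predicted positive when probability >= threshold,
    and the score is 0.0 when nothing is predicted positive.
    """
    predicted_positive = [p >= threshold for p in probabilities]
    true_positives = sum(
        1 for is_pos, label in zip(predicted_positive, actual) if is_pos and label == 1
    )
    total_predicted = sum(predicted_positive)
    if total_predicted == 0:
        return 0.0
    return true_positives / total_predicted


# Made-up probabilities of class 1 and ground-truth labels, for illustration only
probabilities = [0.55, 0.70, 0.90, 0.58, 0.65]
actual = [0, 1, 1, 1, 0]

for threshold in (0.0, 0.61, 2.0):
    print(threshold, precision_at_threshold(probabilities, actual, threshold))
```

With a threshold of 0.0 every row is predicted positive, so precision equals the share of positives in the data; with a threshold above 1 no row can be predicted positive.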

## Category of a scoring metric

We saw in the previous sections that we can configure an `MLPipe` to compute scoring metrics during training or on a test dataset. 
We also saw that those metrics can be computed:
- on target labels (like `MLAccuracyMetric`); in that case, we set the `processingFunctionName` of the pipe to `predict`; or
- on output probabilities (like `MLPrecisionMetric`); in that case, we set the `processingFunctionName` to `predict_proba`.

You may have noticed that in the last example, we computed both kinds of scoring metrics with the same pipe and the same `processingFunctionName` (here `predict_proba`). You might therefore expect the results to be incorrect (indeed, `MLAccuracyMetric` would throw an error if it received probabilities as input). 

However, `SklearnPipe.score()` is aware of the kind of scoring metric it has to evaluate, and it overrides the `processingFunctionName` to compute the correct input for each metric.
This is encoded in the type `MLScoringMetricCategory`:

In [19]:
c3.MLAccuracyMetric().category()  # this scoring metric expects target labels (or PREDICTIONS)

'PREDICTION'

In [20]:
c3.MLPrecisionMetric(threshold=0.5).category()  # this scoring metric expects probabilities

'PROBABILITY'

The `MLPipe` Types that mix in `SklearnApiCommon` (e.g. `SklearnPipe`, `XgBoostPipe`, `LightGbmPipe`, `CatBoostPipe` and `StatsModelsTsaPipe`) leverage that information to change the `processingFunctionName` during scoring, if needed.
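
Conceptually, the dispatch amounts to grouping metrics by the processing function their category requires. The sketch below is only an illustration of that idea; `CATEGORY_TO_FUNCTION` and `plan_scoring` are hypothetical names, not part of the C3 API.

```python
# Hypothetical mapping from a metric's category to the processing function
# whose output the metric should receive.
CATEGORY_TO_FUNCTION = {
    "PREDICTION": "predict",
    "PROBABILITY": "predict_proba",
}


def plan_scoring(metric_categories):
    """Group metric names by the processing function they need."""
    plan = {}
    for name, category in metric_categories.items():
        plan.setdefault(CATEGORY_TO_FUNCTION[category], []).append(name)
    return plan


plan = plan_scoring({
    "MLAccuracyMetric": "PREDICTION",
    "MLPrecisionMetric, threshold=0.61": "PROBABILITY",
})
print(plan)
```

Each processing function then only has to run once, and its output is handed to every metric in its group.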

### Category override

It is possible to override that behavior with the parameter `MLScoringMetric.categoryOverride`:

In [21]:
precision_metric = c3.MLPrecisionMetric(threshold=0.61, categoryOverride="PREDICTION")
precision_metric.category()

'PREDICTION'

In [22]:
trainedLr.scoringMetrics = {
    'MLPrecisionMetric, threshold=0.61': c3.MLPrecisionMetric(threshold=0.61),
    'MLPrecisionMetric, threshold=0.61, categoryOverride=PREDICTION': precision_metric,
    'MLAccuracyMetric': c3.MLAccuracyMetric()
}
trainedLr.score(input=XTest, targetOutput=yTestBinary)

c3.Mapp<string, double>({'MLAccuracyMetric': 0.6666666666666666,
 'MLPrecisionMetric, threshold=0.61': 0.6896551724137931,
 'MLPrecisionMetric, threshold=0.61, categoryOverride=PREDICTION': 0.6666666666666666})

In the example above, the precision with the category override is not computed correctly since labels (0/1) were used instead of probabilities.
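
The effect can be reproduced with a small sketch: thresholding hard labels at 0.61 returns the labels unchanged, whereas thresholding probabilities can flip borderline rows. The data is made up, the helper names are hypothetical, and the 0.5 cutoff used to derive hard labels is an assumption about what `predict` does for a binary model.

```python
def precision(predicted, actual):
    """Precision of class 1; 0.0 if nothing is predicted positive."""
    predicted_positive = sum(predicted)
    if predicted_positive == 0:
        return 0.0
    true_positives = sum(1 for p, a in zip(predicted, actual) if p == 1 and a == 1)
    return true_positives / predicted_positive


def apply_threshold(values, cutoff):
    return [1 if value >= cutoff else 0 for value in values]


probabilities = [0.55, 0.70, 0.90, 0.58]          # made-up P(class 1)
hard_labels = apply_threshold(probabilities, 0.5)  # stand-in for "predict" output
actual = [0, 1, 1, 1]

from_probabilities = apply_threshold(probabilities, 0.61)  # [0, 1, 1, 0]
from_labels = apply_threshold(hard_labels, 0.61)           # [1, 1, 1, 1]

print(precision(from_probabilities, actual))  # 1.0
print(precision(from_labels, actual))         # 0.75
```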

### Data selection for scoring

In some cases, the output of an `MLPipe` does not have the same "shape" as the target output used to train the pipe.
A simple example is the case of a binary classification pipe with its processing function name set to `predict_proba`: the target output is a dataset with one column and N rows and the output is a dataset with two columns and N rows (the first column is the probability of the class 0 and the second one is the probability of the class 1).

We need a way to select the correct column to compute the score, for example if we want to compute the precision.

To do so, we can use the field `dataSelectorForScoring` on `MLSklearnCommonTechnique` (which is subtyped by `SklearnTechnique`, `XgBoostTechnique`, etc.).
It is a mapping between a processing function name and an array of column indices: when a scoring metric requires that processing function name, these columns will be extracted and fed to the function `MLScoringMetric.score()`.

In [23]:
binaryLr = c3.SklearnPipe(
               name="logisticRegression",
               technique=c3.SklearnTechnique(
                   name="linear_model.LogisticRegression",
                   processingFunctionName="predict_proba",  # to compute the precision, we use predicted probabilities
                   hyperParameters={"random_state": 42,
                                    "C": 0.001},  # strong regularization to get errors
                   dataSelectorForScoring={
                       "predict_proba": [1]  # use the column of index 1 to compute the scores (probabilities of class 1)
                   }
               ),
               scoringMetrics=c3.MLScoringMetric.toScoringMetricMap(
                   scoringMetricList=[c3.MLPrecisionMetric(threshold=0.1)])
            )
trainedLr = binaryLr.train(input=XTrain, targetOutput=yTrainBinary)
trainedLr.trainingScores

c3.Mapp<string, double>({'MLPrecisionMetric, threshold=0.1': 0.6666666666666666})

Note that for the scoring metrics that subtype `MLBinaryScoringMetricWithTruth` (e.g. `MLPrecisionMetric`), when the value of the argument `output` in `MLBinaryScoringMetricWithTruth.score(output, targetOutput)` is a dataset with two columns, the column of index 1 is used to compute the score (so the field `dataSelectorForScoring` is superfluous in that specific case).
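
As a rough illustration of what the data selector does, the sketch below extracts column 1 from a two-column probability output. The helper `select_columns` and the sample values are hypothetical; only the mapping shape (`"predict_proba": [1]`) is taken from the cell above.

```python
def select_columns(rows, column_indices):
    """Keep only the requested columns of a row-major 2-D list."""
    return [[row[i] for i in column_indices] for row in rows]


# Made-up predict_proba output: one row per sample, columns = P(class 0), P(class 1)
proba_output = [[0.8, 0.2], [0.3, 0.7], [0.1, 0.9]]
data_selector_for_scoring = {"predict_proba": [1]}

selected = select_columns(proba_output, data_selector_for_scoring["predict_proba"])
positive_probabilities = [row[0] for row in selected]
print(positive_probabilities)  # [0.2, 0.7, 0.9]
```

After selection, the output has one column and N rows, the same shape as the target output.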

## Using scoring metrics from `sklearn`
Scikit-learn also provides several [pre-defined scoring metrics](https://scikit-learn.org/stable/modules/classes.html#module-sklearn.metrics).  If you choose to use these, you can configure the pre-defined Type `SklearnScoringMetric` to call scikit-learn to score your model.

We will use the [F-Beta scoring metric](https://en.wikipedia.org/wiki/F1_score#Definition) to demonstrate how to use `SklearnScoringMetric` when there are configurable parameters.  For brevity, we will score directly using `yTest` and `predictTest` computed above, but `SklearnScoringMetric` can also be saved in `MLPipe.scoringMetrics`.

In [24]:
# Documentation for this metric at
# https://scikit-learn.org/stable/modules/generated/sklearn.metrics.fbeta_score.html#sklearn.metrics.fbeta_score
from sklearn.metrics import fbeta_score  # Used below to demonstrate we are calling sklearn, getting same values

In [25]:
# Notice that we must set `higherIsWorse` below, used by implementation of `higherIsBetter()`
fbeta = c3.SklearnScoringMetric(name='fbeta_score', higherIsWorse=False,
                                parameters={'beta': 2, 'average': 'macro'})
fbeta

c3.SklearnScoringMetric(
 parameters=c3.Mapp<string, any>({'beta': 2, 'average': 'macro'}),
 name='fbeta_score',
 higherIsWorse=False)

In [26]:
# Compare with sklearn's fbeta_score
print('sklearn directly:     ' + 
      str(fbeta_score(c3.Dataset.toPandas(dataset=yTest),
                      c3.Dataset.toPandas(dataset=predictTest), beta=2, average='macro')))
print('SklearnScoringMetric: ' +
      str(fbeta.score(targetOutput=yTest, output=predictTest)))

sklearn directly:     1.0
SklearnScoringMetric: 1.0


In [27]:
# Change beta to demonstrate that parameters are actually applied
fbeta1000 = c3.SklearnScoringMetric(name='fbeta_score', higherIsWorse=False,
                                    parameters={'beta': 1000, 'average': 'macro'})
print('sklearn directly:     ' + 
      str(fbeta_score(c3.Dataset.toPandas(dataset=yTest),
                      c3.Dataset.toPandas(dataset=predictTest), beta=1000, average='macro')))
print('SklearnScoringMetric: ' +
      str(fbeta1000.score(targetOutput=yTest, output=predictTest)))

sklearn directly:     1.0
SklearnScoringMetric: 1.0
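
Both values of beta give 1.0 here because the test predictions are perfect, so precision and recall are 1 for every class. A short sketch of the F-beta formula, written from its standard definition rather than from the scikit-learn source, shows how beta matters once precision and recall differ (the function name `fbeta` is hypothetical):

```python
def fbeta(precision, recall, beta):
    """F-beta score: weighted harmonic mean of precision and recall."""
    if precision == 0 and recall == 0:
        return 0.0
    beta_sq = beta ** 2
    return (1 + beta_sq) * precision * recall / (beta_sq * precision + recall)


print(fbeta(1.0, 1.0, beta=2))     # perfect predictions: 1.0 for any beta
print(fbeta(1.0, 1.0, beta=1000))  # perfect predictions: 1.0 for any beta
print(fbeta(0.5, 1.0, beta=2))     # recall weighted more heavily than precision
print(fbeta(0.5, 1.0, beta=1000))  # approaches the recall as beta grows
```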


> *NOTE*: `SklearnScoringMetric.toString()` will embed parameter values into the autogenerated name, used by `MLScoringMetric.toScoringMetricMap()` as keys to the returned C3 Mapp.  This allows you to configure 2 F-Beta metrics with different Beta on the same `MLPipe` for scoring, because they will have distinct names.

In [28]:
fbeta1000.toString()  # The auto-generated name contains parameter values

'SklearnScoringMetric:fbeta_score, beta=1000, average=macro'

Some `MLScoringMetric`s support the `predict_proba` processing function:

In [29]:
# Preparing pipe and datasets
gbt = c3.SklearnPipe(
    name="gbt",
    technique=c3.SklearnTechnique(
        name="linear_model.LogisticRegression",
        processingFunctionName="predict",
        dataSelectorForScoring={
            "predict_proba": [1],
        },
    ),
    scoringMetrics=c3.MLScoringMetric.toScoringMetricMap([
        c3.SklearnScoringMetric(
            name='roc_auc_score',
            categoryOverride=c3.MLScoringMetricCategory.PROBABILITY,
        ),
    ]),
)
dataset_X = c3.Dataset(**{
    "m_data": [
        5, 4,
        55, 5,
        5, 4,
        55, 5
    ],
    "indices":{
        "0":['id0','id1','id2','id3'],
        "1":['feat0','feat1']
    },
    "shape":[4,2]
})

dataset_y = c3.Dataset(**{
    "m_data":[
        0,
        1,
        1,
        1,
    ],
    "indices":{
        "0":['id0','id1','id2','id3'],
        "1":['target', 'target2']
    },
    "shape":[4,1]
})

y_true = c3.Dataset.toNumpy(dataset_y).flatten()

In [30]:
# Comparing c3 and native roc_auc_score

trainedPipe = gbt.train(dataset_X, dataset_y)
y_scores = trainedPipe.process(dataset_X).flattenedData()

print('roc_auc_score on c3 platform during training is ' + str(trainedPipe.trainingScores['SklearnScoringMetric:roc_auc_score, categoryOverride=PROBABILITY']))
print('roc_auc_score on c3 platform is ' + str(trainedPipe.score(dataset_X, dataset_y)['SklearnScoringMetric:roc_auc_score, categoryOverride=PROBABILITY']))

roc_auc_score on c3 platform during training is 0.8333333333333333
roc_auc_score on c3 platform is 0.8333333333333333


Note that `gbt` is defined with `processingFunctionName='predict'`. However, `predict_proba` is used for scoring because the `SklearnScoringMetric` is defined with `categoryOverride=c3.MLScoringMetricCategory.PROBABILITY`, which indicates that this scoring metric works with probabilities.

> When using an `SklearnScoringMetric` with an `MLPipe`, it is recommended to always specify a `categoryOverride` since the category of a scikit-learn scoring metric cannot be known statically.

In the example above, the `roc_auc_score` expects probabilities, so we use the category `PROBABILITY`; we also specify the data selector for scoring (to extract the column of index 1 corresponding to the probabilities of the positive class).

To compare against native scikit-learn, we need to modify the pipe so that it returns probabilities:

In [31]:
# Pipe that returns probabilities (processingFunctionName="predict_proba")
gbt = c3.SklearnPipe(
    name="gbt",
    technique=c3.SklearnTechnique(
        name="linear_model.LogisticRegression",
        processingFunctionName="predict_proba",
        dataSelectorForScoring={
            "predict_proba": [1],
        },
    ),
    scoringMetrics=c3.MLScoringMetric.toScoringMetricMap([
        c3.SklearnScoringMetric(
            name='roc_auc_score',
            categoryOverride=c3.MLScoringMetricCategory.PROBABILITY,
        ),
    ]),
)

trainedPipe = gbt.train(dataset_X, dataset_y)
y_scores = trainedPipe.process(dataset_X).flattenedData()

from sklearn.metrics import roc_auc_score
print('roc_auc_score on native Sklearn ' + str(roc_auc_score(y_true, y_scores[1::2]))) 
print('roc_auc_score on c3 platform during training is ' + str(trainedPipe.trainingScores['SklearnScoringMetric:roc_auc_score, categoryOverride=PROBABILITY']))
print('roc_auc_score on c3 platform is ' + str(trainedPipe.score(dataset_X, dataset_y)['SklearnScoringMetric:roc_auc_score, categoryOverride=PROBABILITY']))

roc_auc_score on native Sklearn 0.8333333333333333
roc_auc_score on c3 platform during training is 0.8333333333333333
roc_auc_score on c3 platform is 0.8333333333333333
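
The value 0.8333 can be understood from the data itself: rows `id0` and `id2` have identical features but different labels, so any model must give them the same score. The sketch below computes AUC as the fraction of correctly ranked (positive, negative) pairs, counting ties as half. The function `roc_auc` is a hypothetical pure-Python stand-in, and the scores are made up under the assumption that the model ranks the two `feat0 = 55` rows higher.

```python
def roc_auc(y_true, y_score):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair.
    """
    positives = [s for y, s in zip(y_true, y_score) if y == 1]
    negatives = [s for y, s in zip(y_true, y_score) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative")
    correct = 0.0
    for pos in positives:
        for neg in negatives:
            if pos > neg:
                correct += 1.0
            elif pos == neg:
                correct += 0.5
    return correct / (len(positives) * len(negatives))


y_true = [0, 1, 1, 1]
y_score = [0.4, 0.9, 0.4, 0.9]  # hypothetical scores: id0 and id2 tie
print(roc_auc(y_true, y_score))  # 2.5 correctly ranked pairs out of 3
```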


## Using scoring metrics without ground truth
There are a few scoring metrics that do not require ground truth to score. For such metrics, just pass `null` to the `targetOutput` argument of the `score` method, for example: `MLScoringMetric.score(output, null, input, spec)`.

As noted in the documentation of `MLScoringMetric` (e.g. run `c3ShowType(MLScoringMetric)` at JS console), only ONE of the optional methods should be implemented by a descendant Type.  `MLPipe.score()` will dispatch the implemented optional method on the scoring metric to score the `MLPipe`.  Defining multiple optional methods on one `MLScoringMetric` will make the dispatch ambiguous.

An example of a pre-defined Type scoring without ground truth is `SklearnScoringMetricNoTruth`, which can be configured to work with the [Silhouette Coefficient](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.silhouette_score.html#sklearn.metrics.silhouette_score) or the [Calinski-Harabaz Score (Variance Ratio Criterion)](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.calinski_harabaz_score.html#sklearn.metrics.calinski_harabaz_score).

In [32]:
c3.Dataset.toNumpy(dataset=predictTest).ravel()

array([1., 0., 2., 1., 1., 0., 1., 2., 1., 1., 2., 0., 0., 0., 0., 1., 2.,
       1., 1., 2., 0., 2., 0., 2., 2., 2., 2., 2., 0., 0.])

In [33]:
# NotVerify: stdout
from sklearn.metrics import silhouette_score
# Notice that we must set `higherIsWorse` below, used by implementation of `higherIsBetter()`
silhouette = c3.SklearnScoringMetricNoTruth(name='silhouette_score', higherIsWorse=False,
                                            parameters={'random_state': 1})
# Below, predictTest is converted to a 1d numpy array to suppress an annoying warning
print('sklearn directly:     ' + 
      str(silhouette_score(X=c3.Dataset.toPandas(dataset=XTest),
                           labels=c3.Dataset.toNumpy(dataset=predictTest).ravel(), random_state=1)))
print('SklearnScoringMetricNoTruth: ' +
      str(silhouette.score(output=predictTest, input=XTest)))

sklearn directly:     0.5538417178134319
SklearnScoringMetricNoTruth: 0.5538417178134319
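
The silhouette coefficient needs only the input features and the predicted labels, which is why no ground truth is involved. The following sketch implements the standard definition for one-dimensional points; it is a simplified illustration, not the scikit-learn implementation, and the function name and sample data are hypothetical.

```python
def silhouette_score_1d(points, labels):
    """Mean silhouette coefficient for 1-D points (absolute-difference distance)."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    total = 0.0
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            continue  # convention: a singleton cluster contributes 0
        # a: mean distance to the other members of the same cluster
        a = sum(abs(point - q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(abs(point - q) for q in other) / len(other)
            for other_label, other in clusters.items()
            if other_label != label
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


points = [0.0, 1.0, 10.0, 11.0]
labels = [0, 0, 1, 1]
print(silhouette_score_1d(points, labels))  # close to 1: tight, well-separated clusters
```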


We can save a "no ground truth" scoring metric on an MLPipe, and use it to score on the training data.

In [34]:
# NotVerify: result
binaryLr = c3.SklearnPipe(
               name="logisticRegression",
               technique=c3.SklearnTechnique(
                   name="linear_model.LogisticRegression",
                   processingFunctionName="predict",
                   hyperParameters={"random_state": 42}
               ),
               scoringMetrics=c3.MLScoringMetric.toScoringMetricMap(
                   scoringMetricList=[silhouette])  # use the silhouette metric we created above
            )
trainedLr = binaryLr.train(input=XTrain, targetOutput=yTrainBinary)
trainedLr  # Score on training data stored in `trainingScores` field

c3.SklearnPipe(
 name='logisticRegression',
 implType='SklearnPipeV7d13',
 noTrainScore=False,
 scoringMetrics=c3.Mapp<string, anyof(MLScoringMetric<any,any>)>({'SklearnScoringMetricNoTruth:silhouette_score, random_state=1': c3.SklearnScoringMetricNoTruth(
                                                                                   parameters=c3.Mapp<string, any>({'random_state': 1}),
                                                                                   name='silhouette_score',
                                                                                   higherIsWorse=False)}),
 trainingScores=c3.Mapp<string, double>({'SklearnScoringMetricNoTruth:silhouette_score, random_state=1': 0.6821905014050882}),
 typeVersion='7.14.0',
 untrainableOverride=False,
 technique=c3.SklearnTechnique(
             name='linear_model.LogisticRegression',
             hyperParameters=c3.Mapp<string, any>({'random_state': 42}),
             processingFunctionName='predict',
        

## Creating your own custom scoring metrics
You may create your own custom scoring metric by extending the `MLScoringMetric` interface Type and implementing the methods (overriding the default implementation of methods like `toString()` as necessary).  Alternately (and preferably), you may extend the `MLScoringMetricWithTruth` abstract Type, which provides an implementation of `score()` that will call `scoreDouble()` or `scoreDoubleNested()` (if `output` is multidimensional), which you would have to implement.  It may be easier to implement `scoreDouble()` (`scoreDoubleNested()`) than `score()`, which is overloaded for different input Types and will likely get new overloads in the future. 
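
The division of labor described above, where a base Type owns `score()` and a subtype implements only the numeric core, can be illustrated with a plain-Python analogy. The classes below are hypothetical and are not C3 Types; real custom metrics are declared as C3 Types rather than Python classes.

```python
from abc import ABC, abstractmethod


class MetricWithTruth(ABC):
    """Hypothetical Python analogue of the base-Type pattern; not C3 API."""

    def score(self, output, target_output):
        # Shared plumbing lives in the base class: validate, then delegate.
        if len(output) != len(target_output):
            raise ValueError("output and target_output must have the same length")
        return self.score_double(list(output), list(target_output))

    @abstractmethod
    def score_double(self, output, target_output):
        """Return the score as a float; subclasses implement only this."""

    def to_string(self):
        return type(self).__name__


class MeanAbsoluteError(MetricWithTruth):
    def score_double(self, output, target_output):
        return sum(abs(o - t) for o, t in zip(output, target_output)) / len(output)


metric = MeanAbsoluteError()
print(metric.to_string(), metric.score([1.0, 2.0, 4.0], [1.0, 3.0, 2.0]))
```

Keeping the input handling in the base class means a new overload of `score()` only has to be added in one place.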

A scoring spec, `MLScoreSpec`, can be passed to `MLScoringMetric.score()`. You can use your own spec to customize the behaviour of `MLScoringMetric.score()`. If you pass a spec to `MLPipe.score()`, it will be forwarded to `MLScoringMetric.score()`.

Read the documentation on `MLScoringMetric` or `MLScoringMetricWithTruth` carefully (e.g. run `c3ShowType(MLScoringMetric)` at JS console), to see what your custom scoring metric needs to implement.

Below is an example custom metric's `c3typ` declaration, followed by its `js` implementation, to serve as a reference when you implement your own scoring metric Type.

This is the end of the tutorial, but check back in the future.  If there is enough demand from users, we will implement a pre-defined Type `MLLambdaMetric` that allows you to use C3 lambdas as scoring metrics (DATA-2678), and update this notebook to show how to use it.