
---
## Custom Evaluation

Using the test data, calculate a series of metrics with [scikit-learn metrics](https://scikit-learn.org/stable/modules/model_evaluation.html). Because TFIO reads the test data from BigQuery in batches, the first step is to collect the predictions and actual values into NumPy arrays:

In [37]:
predictions = model.predict(test)

actuals = np.empty(shape = [0, predictions.shape[1]])
for features, target in test.take(-1): # -1 indicates all batches
    actuals = np.append(actuals, target.numpy(), axis = 0)

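# probability of the positive class (column 1); np.max would return the
# confidence in whichever class was predicted, which cannot be thresholded into class 1
predictions_proba = predictions[:, 1]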
predictions = np.argmax(predictions, axis = 1)
actuals = np.argmax(actuals, axis = 1)
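The reductions used in this cell pick different things out of a two-column softmax output. A minimal sketch with made-up probabilities (the `probs` array is an illustrative assumption, not data from this notebook):

```python
import numpy as np

# Hypothetical softmax output for three rows: columns are P(class 0), P(class 1)
probs = np.array([[0.9, 0.1],
                  [0.2, 0.8],
                  [0.6, 0.4]])

labels = np.argmax(probs, axis=1)   # predicted class for each row
confidence = np.max(probs, axis=1)  # probability of whichever class was predicted
positive_proba = probs[:, 1]        # probability of class 1 specifically

print(labels.tolist())
print(confidence.tolist())
print(positive_proba.tolist())
```

Thresholding `confidence` flags rows where the model is sure of either class, while thresholding `positive_proba` is what yields class 1 predictions.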

In [38]:
from sklearn import metrics

In [39]:
metrics.log_loss(actuals, predictions)

0.021812601666304183
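Because `predictions` holds hard class labels at this point, the log loss is driven almost entirely by the misclassified rows, each of which contributes a clipped penalty of roughly -ln(eps). The sketch below works this through in plain Python. It assumes probabilities are clipped to `[eps, 1 - eps]` with `eps = 1e-15` (an assumption about the scikit-learn version in use; other releases may clip differently), and it rebuilds the labels from the counts in the confusion matrix reported in this section. The function `binary_log_loss` is a hypothetical helper, not a scikit-learn API.

```python
import math

def binary_log_loss(y_true, y_pred, eps=1e-15):
    """Mean binary cross-entropy with predictions clipped to [eps, 1 - eps]."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1 - eps)  # clipping avoids log(0) for hard 0/1 predictions
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(y_true)

# Labels rebuilt from the confusion matrix counts: TN, FP, FN, TP
tn, fp, fn, tp = 28449, 6, 12, 35
y_true = [0] * tn + [0] * fp + [1] * fn + [1] * tp
y_pred = [0] * tn + [1] * fp + [0] * fn + [1] * tp  # hard labels, not probabilities

loss = binary_log_loss(y_true, y_pred)
print(loss)
```

Under these assumptions the result agrees with the 0.0218 reported above, which suggests that figure reflects the 18 misclassified rows rather than how well calibrated the predicted probabilities are. Passing the positive-class probabilities to `metrics.log_loss` would measure the latter.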

In [40]:
metrics.accuracy_score(actuals, predictions)

0.9993684653708512

In [41]:
metrics.confusion_matrix(actuals, predictions).astype(np.float32).tolist()

[[28449.0, 6.0], [12.0, 35.0]]
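The confusion matrix is enough to recompute the headline metrics by hand. A small sketch in plain Python, using the scikit-learn convention that rows are actual classes and columns are predicted classes:

```python
# Confusion matrix from the cell above: rows = actual class, columns = predicted class
cm = [[28449.0, 6.0], [12.0, 35.0]]
tn, fp = cm[0]
fn, tp = cm[1]

total = tn + fp + fn + tp
accuracy = (tn + tp) / total
precision = tp / (tp + fp)  # share of predicted positives that are correct
recall = tp / (tp + fn)     # share of actual positives that were found
f1 = 2 * precision * recall / (precision + recall)

print(accuracy, precision, recall, f1)
```

With only 47 positive rows out of 28,502, accuracy is dominated by the negative class, so precision and recall are the more informative numbers here.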

In [61]:
cm = []
for threshold in np.linspace(0, 1, 101):
    preds = (predictions_proba > threshold).astype('float')
    cm.append({
        "confidenceThreshold": threshold,
        "precision": metrics.precision_score(actuals, preds),
        "recall": metrics.recall_score(actuals, preds),
        "f1score": metrics.f1_score(actuals, preds),
        "f1scoreMicro": metrics.f1_score(actuals, preds, average = 'micro'),
        "f1scoreMacro": metrics.f1_score(actuals, preds, average = 'macro'),
        #"confusionMatrix": {
        #    "annotationSpecs": [{"displayName": '0', 'id': '0'}, {"displayName": '1', 'id': '1'}],
        #    "rows": metrics.confusion_matrix(actuals, preds).astype(np.float32).tolist()
        #}
    })

  _warn_prf(average, modifier, msg_start, len(result))
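The truncated warning above appears to be the one scikit-learn emits when a metric is undefined, for example precision at a threshold where no row is predicted positive. The sketch below shows one way to sweep thresholds over the positive-class probability while handling that case explicitly. It uses NumPy only, the helper `confidence_metrics` and the toy inputs are hypothetical, and returning 0.0 for an undefined ratio is a design assumption (scikit-learn's `zero_division` argument offers a similar choice).

```python
import numpy as np

def confidence_metrics(y_true, positive_proba, thresholds):
    """Precision, recall, and F1 for class 1 at each confidence threshold."""
    y_true = np.asarray(y_true)
    positive_proba = np.asarray(positive_proba)
    rows = []
    for threshold in thresholds:
        preds = positive_proba > threshold  # boolean: predicted positive
        tp = int(np.sum(preds & (y_true == 1)))
        fp = int(np.sum(preds & (y_true == 0)))
        fn = int(np.sum(~preds & (y_true == 1)))
        # Undefined ratios (empty denominators) are reported as 0.0 by assumption
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        rows.append({
            "confidenceThreshold": float(threshold),
            "precision": precision,
            "recall": recall,
            "f1score": f1,
        })
    return rows

# Toy data, not taken from the notebook
y_true = [0, 0, 1, 1]
positive_proba = [0.1, 0.6, 0.4, 0.9]
result = confidence_metrics(y_true, positive_proba, [0.0, 0.5, 1.0])
for row in result:
    print(row)
```

At a threshold of 1.0 nothing is predicted positive, so precision has an empty denominator. That is the situation the sketch reports as 0.0 instead of raising a warning.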


In [66]:
model_metrics = {
    "auPrc": metrics.average_precision_score(actuals, predictions),
    "auRoc": metrics.roc_auc_score(actuals, predictions),
    "logLoss": metrics.log_loss(actuals, predictions),
    #"confidenceMetrics": cm,
    "confusionMatrix": {
        "annotationSpecs": [{"displayName": '0'}, {"displayName": '1'}],
        "rows": metrics.confusion_matrix(actuals, predictions).astype(np.float32).tolist()
    }
}

In [67]:
model_metrics

{'auPrc': 0.6361241886283929,
 'auRoc': 0.87223499590619,
 'logLoss': 0.021812601666304183,
 'confusionMatrix': {'annotationSpecs': [{'displayName': '0'},
   {'displayName': '1'}],
  'rows': [[28449.0, 6.0], [12.0, 35.0]]}}
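The `auRoc` and `auPrc` values above are computed from hard class labels, so each curve has a single operating point. In that case the ROC AUC reduces to the mean of the true positive and true negative rates, and average precision reduces to a two-term sum. A sketch in plain Python that works this out from the confusion matrix counts shown above:

```python
# Counts from the confusion matrix: rows = actual class, columns = predicted class
tn, fp = 28449.0, 6.0
fn, tp = 12.0, 35.0

tpr = tp / (tp + fn)  # recall on the positive class
tnr = tn / (tn + fp)  # recall on the negative class
roc_auc_from_labels = (tpr + tnr) / 2  # single-point ROC curve = balanced accuracy

precision = tp / (tp + fp)
prevalence = (tp + fn) / (tn + fp + fn + tp)  # precision when everything is predicted positive
# Average precision as a sum of recall increments times precision, over two thresholds
average_precision_from_labels = tpr * precision + (1 - tpr) * prevalence

print(roc_auc_from_labels, average_precision_from_labels)
```

Both results agree with the reported values, so those figures summarize one decision threshold. Passing positive-class probabilities to `metrics.roc_auc_score` and `metrics.average_precision_score` would trace the full curves and would generally give different numbers.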

In [72]:
model_metrics = {'auPrc': 0.6361241886283929,
 'auRoc': 0.87223499590619,
 'logLoss': 0.021812601666304183,
 'confusionMatrix': {'annotationSpecs': [{'displayName': '0'},
   {'displayName': '1'}],
  'rows': [{'row': [28449.0, 6.0]}, {'row': [12.0, 35.0]}]}}
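The cell above retypes the dictionary by hand so that each confusion matrix row is wrapped in a `{'row': [...]}` mapping. The same reshaping could be done programmatically, as in this sketch. The helper `wrap_rows` is hypothetical, and whether the service expects this shape is not confirmed here.

```python
def wrap_rows(matrix):
    """Wrap each confusion matrix row in a {'row': [...]} mapping (hypothetical helper)."""
    return [{"row": list(row)} for row in matrix]

rows = [[28449.0, 6.0], [12.0, 35.0]]
wrapped = wrap_rows(rows)
print(wrapped)
```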

In [73]:
from google.protobuf.struct_pb2 import Struct

In [74]:
s = Struct()
s.update(model_metrics)

In [75]:
s

fields {
  key: "auPrc"
  value {
    number_value: 0.6361241886283929
  }
}
fields {
  key: "auRoc"
  value {
    number_value: 0.87223499590619
  }
}
fields {
  key: "confusionMatrix"
  value {
    struct_value {
      fields {
        key: "annotationSpecs"
        value {
          list_value {
            values {
              struct_value {
                fields {
                  key: "displayName"
                  value {
                    string_value: "0"
                  }
                }
              }
            }
            values {
              struct_value {
                fields {
                  key: "displayName"
                  value {
                    string_value: "1"
                  }
                }
              }
            }
          }
        }
      }
      fields {
        key: "rows"
        value {
          list_value {
            values {
              struct_value {
                fields {
                  key: "row"
    

### Add The Model Evaluation to The Model Registry

The initial evaluation of the model was done right after training. Some of the metrics were written to Vertex AI Experiments above as part of this run. This section writes evaluation metrics directly to the Model Registry to accompany this version of the trained model.

**Resources:**
- Doc: [Model Evaluation in Vertex AI](https://cloud.google.com/vertex-ai/docs/evaluation/introduction#tabular)
- API: [aiplatform.gapic.ModelServiceClient.import_model_evaluation](https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform_v1.services.model_service.ModelServiceClient#google_cloud_aiplatform_v1_services_model_service_ModelServiceClient_import_model_evaluation)
- Example: [Get started with importing a custom model evaluation to the Vertex AI Model Registry](https://github.com/GoogleCloudPlatform/vertex-ai-samples/blob/main/notebooks/official/model_evaluation/get_started_with_custom_model_evaluation_import.ipynb)

**Helpful Notes:**
- Evaluations are loaded to a versioned model in the Vertex AI Model Registry.
- Multiple evaluations can be loaded for the same model and version.
- When loading an evaluation you must provide a schema file for the parameter `metrics_schema_uri`.
- A complete list of the available schema files is provided by the Doc link above and can be reviewed directly in this [public GCS bucket](https://console.cloud.google.com/storage/browser/google-cloud-aiplatform/schema/modelevaluation).
    - Make sure to use the `gsutil URI` in the API call.
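As an illustration of the last note, the sketch below turns a Cloud Console browser URL for a bucket folder into the corresponding `gs://` URI and appends a schema file name. The helper `console_url_to_gs_uri` is hypothetical, and it only handles the `storage/browser/` URL form linked in the notes above.

```python
CONSOLE_PREFIX = "https://console.cloud.google.com/storage/browser/"

def console_url_to_gs_uri(url):
    """Convert a Cloud Console bucket browser URL to a gs:// URI (hypothetical helper)."""
    if not url.startswith(CONSOLE_PREFIX):
        raise ValueError(f"not a Cloud Console storage browser URL: {url}")
    # Drop any query string and surrounding slashes, leaving bucket/path
    path = url[len(CONSOLE_PREFIX):].split("?")[0].strip("/")
    return f"gs://{path}"

schema_dir = console_url_to_gs_uri(
    "https://console.cloud.google.com/storage/browser/google-cloud-aiplatform/schema/modelevaluation"
)
metrics_schema_uri = f"{schema_dir}/classification_metrics_1.0.0.yaml"
print(metrics_schema_uri)
```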

In [59]:
model_client = aiplatform.gapic.ModelServiceClient(
    client_options = {'api_endpoint': f"{REGION}-aiplatform.googleapis.com"}
)

In [77]:
model_client.import_model_evaluation(
    parent = model.resource_name,
    model_evaluation = aiplatform.gapic.ModelEvaluation(
        display_name = 'test_at_training',
        metrics_schema_uri = 'gs://google-cloud-aiplatform/schema/modelevaluation/classification_metrics_1.0.0.yaml',
        metrics = model_metrics
    )
)

InternalServerError: 500 Internal error encountered.

In [78]:
model_metrics

{'auPrc': 0.6361241886283929,
 'auRoc': 0.87223499590619,
 'logLoss': 0.021812601666304183,
 'confusionMatrix': {'annotationSpecs': [{'displayName': '0'},
   {'displayName': '1'}],
  'rows': [{'row': [28449.0, 6.0]}, {'row': [12.0, 35.0]}]}}

In [None]:
s

In [2]:
testmodel = aiplatform.Model('projects/1026793852137/locations/us-central1/models/8504698251591548928')

NameError: name 'aiplatform' is not defined

In [1]:
testeval = testmodel.get_model_evaluation()

NameError: name 'testmodel' is not defined

In [None]:
testeval

#### List The Model Evaluation(s)

The Model Evaluation can also be reviewed directly in the console:

'1.21.0'