<!--- header table --->
<table align="left">
  <td style="text-align: center">
    <a href="https://colab.research.google.com/github/statmike/vertex-ai-mlops/blob/main/Framework%20Workflows/CatBoost/CatBoost%20In%20Notebook.ipynb">
      <img src="https://cloud.google.com/ml-engine/images/colab-logo-32px.png" alt="Google Colaboratory logo">
      <br>Run in<br>Colab
    </a>
  </td>
  <td style="text-align: center">
    <a href="https://console.cloud.google.com/vertex-ai/colab/import/https%3A%2F%2Fraw.githubusercontent.com%2Fstatmike%2Fvertex-ai-mlops%2Fmain%2FFramework%2520Workflows%2FCatBoost%2FCatBoost%2520In%2520Notebook.ipynb">
      <img width="32px" src="https://lh3.googleusercontent.com/JmcxdQi-qOpctIvWKgPtrzZdJJK-J3sWE1RsfjZNwshCFgE_9fULcNpuXYTilIR2hjwN" alt="Google Cloud Colab Enterprise logo">
      <br>Run in<br>Colab Enterprise
    </a>
  </td>      
  <td style="text-align: center">
    <a href="https://github.com/statmike/vertex-ai-mlops/blob/main/Framework%20Workflows/CatBoost/CatBoost%20In%20Notebook.ipynb">
      <img src="https://cloud.google.com/ml-engine/images/github-logo-32px.png" alt="GitHub logo">
      <br>View on<br>GitHub
    </a>
  </td>
  <td style="text-align: center">
    <a href="https://console.cloud.google.com/vertex-ai/workbench/deploy-notebook?download_url=https://raw.githubusercontent.com/statmike/vertex-ai-mlops/main/Framework%20Workflows/CatBoost/CatBoost%20In%20Notebook.ipynb">
      <img src="https://lh3.googleusercontent.com/UiNooY4LUgW_oTvpsNhPpQzsstV5W8F7rYgxgGBD85cWJoLmrOzhVs_ksK_vgx40SHs7jCqkTkCk=e14-rj-sc0xffffff-h130-w32" alt="Vertex AI logo">
      <br>Open in<br>Vertex AI Workbench
    </a>
  </td>
</table>

# CatBoost - In Notebook

---
## Colab Setup

To run this notebook in Colab, run the cells in this section.  Otherwise, skip this section.

This cell will authenticate to GCP (follow prompts in the popup).

In [1]:
PROJECT_ID = 'statmike-mlops-349915' # replace with project ID

In [2]:
try:
    import google.colab
    from google.colab import auth
    auth.authenticate_user()
    !gcloud config set project {PROJECT_ID}
except Exception:
    pass

---
## Installs

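The list `packages` contains tuples of package import names and install names, with an optional minimum version.  If the import name is not found, the install name is used to install quietly for the current user.  If a minimum version is given and the installed version is older, the package is upgraded.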

In [92]:
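# tuples of (import name, install name, optional min_version)
packages = [
    ('catboost', 'catboost'),
    ('bigframes', 'bigframes'),
    ('sklearn', 'scikit-learn'),
    ('numpy', 'numpy'),
    ('google.cloud.aiplatform', 'google-cloud-aiplatform', '1.66.0'),
    ('google.cloud.storage', 'google-cloud-storage'),
]

# import the submodules explicitly: `import importlib` alone does not guarantee they are loaded
import importlib.util, importlib.metadata

def version_tuple(version):
    # compare versions numerically so that '1.9.0' sorts before '1.66.0'
    return tuple(int(part) for part in version.split('.') if part.isdigit())

install = False
for package in packages:
    if not importlib.util.find_spec(package[0]):
        print(f'installing package {package[1]}')
        install = True
        !pip install {package[1]} -U -q --user
    elif len(package) == 3:
        # installed versions are looked up by distribution (install) name, not import name
        if version_tuple(importlib.metadata.version(package[1])) < version_tuple(package[2]):
            print(f'updating package {package[1]}')
            install = True
            !pip install {package[1]} -U -q --user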

### API Enablement

In [4]:
!gcloud services enable aiplatform.googleapis.com

### Restart Kernel (If Installs Occurred)

After the kernel restarts, continue running from the cell that follows this one.

In [5]:
if install:
    import IPython
    app = IPython.Application.instance()
    app.kernel.do_shutdown(True)
    IPython.display.display(IPython.display.Markdown("""<div class=\"alert alert-block alert-warning\">
        <b>⚠️ The kernel is going to restart. Please wait until it is finished before continuing to the next step. The previous cells do not need to be run again⚠️</b>
        </div>"""))

---
## Setup

inputs:

In [6]:
project = !gcloud config get-value project
PROJECT_ID = project[0]
PROJECT_ID

'statmike-mlops-349915'

In [98]:
REGION = 'us-central1'
SERIES = 'frameworks-catboost'
EXPERIMENT = 'notebook'

GCS_BUCKET = PROJECT_ID

packages:

In [211]:
import tempfile, json, os, io

import bigframes.pandas as bpd

import catboost 
import catboost.utils
import sklearn.metrics #import accuracy_score
import numpy as np

from google.cloud import aiplatform
from google.cloud import storage

clients:

In [103]:
# BigFrames API For BigQuery
bpd.options.bigquery.project = PROJECT_ID

# vertex ai clients
aiplatform.init(project = PROJECT_ID, location = REGION, experiment = f"{SERIES}-{EXPERIMENT}")

# gcs storage client
gcs = storage.Client(project = PROJECT_ID)
bucket = gcs.bucket(GCS_BUCKET)

Parameters:

In [190]:
DIR = f"files/{EXPERIMENT}"

Environment:

In [193]:
if not os.path.exists(DIR):
    os.makedirs(DIR)

---
## Data Source

**The Data**

The BigQuery source table is `bigquery-public-data.ml_datasets.ulb_fraud_detection`.  This is a table of credit card transactions that are classified as fraudulent, `Class = 1`, or normal, `Class = 0`.
- The data can be researched further at this [Kaggle link](https://www.kaggle.com/mlg-ulb/creditcardfraud).
- Read more about BigQuery public datasets [here](https://cloud.google.com/bigquery/public-data)

**Description of the Data**

This is a table of 284,807 credit card transactions classified as fraudulent or normal in the column `Class`.  In order to protect confidentiality, the original features have been transformed using [principal component analysis (PCA)](https://en.wikipedia.org/wiki/Principal_component_analysis) into 28 features named `V1, V2, ... V28` (float).  Two descriptive features are provided without transformation by PCA:
- `Time` (integer) is the seconds elapsed between the transaction and the earliest transaction in the table
- `Amount` (float) is the value of the transaction
>**Quick Note on PCA**<p>PCA is an unsupervised learning technique: there is no target variable.  PCA is commonly used as a variable/feature reduction technique.  If you have 100 features, you could reduce them to p (say 10) projected features.  The choice of p balances how much of the variance of the full feature space is explained against how many features are kept.  Each projected feature is orthogonal to every other projected feature, meaning there is no correlation between these new features.</p>
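
As a rough illustration of the note above, here is a minimal sketch of PCA computed with a singular value decomposition in NumPy. The toy data, the random seed, and the choice `p = 2` are assumptions made only for this example. They are not related to how the `V1 ... V28` features in the source table were produced.

```python
import numpy as np

# Toy data (hypothetical): 500 rows of 5 correlated features built from 2 latent factors.
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 2))
mixing = rng.normal(size=(2, 5))
X = latent @ mixing + 0.1 * rng.normal(size=(500, 5))

# PCA via SVD of the centered data.
X_centered = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)

p = 2  # number of projected features to keep
projected = X_centered @ Vt[:p].T

# Share of the total variance carried by each component.
explained = S**2 / np.sum(S**2)

# Covariance of the projected features: the off-diagonal entries are numerically zero.
cov = np.cov(projected, rowvar=False)
print('variance explained by first p components:', explained[:p].sum())
print(np.round(cov, 6))
```

The near-zero off-diagonal covariance is the "no correlation between projected features" property, and `explained` is the quantity usually inspected when choosing p.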

**Preparation of the Data**

Add a column to the source data:
- `splits` (string) divides the transactions into sets for `TRAIN` (80%), `VALIDATE` (10%), and `TEST` (10%)

In [10]:
fraud_ds = bpd.read_gbq('bigquery-public-data.ml_datasets.ulb_fraud_detection', use_cache=False)

In [11]:
fraud_ds.head()

Unnamed: 0,Time,V1,V2,V3,V4,V5,V6,V7,V8,V9,...,V21,V22,V23,V24,V25,V26,V27,V28,Amount,Class
0,72890.0,-1.22258,-0.017622,2.317581,-1.547722,-0.958068,-0.370571,-0.583838,0.384328,-0.72238,...,0.430025,1.217131,-0.463494,0.456253,0.385304,-0.104713,-0.303068,-0.300302,5.9,0
1,131206.0,1.967597,-1.009301,-1.970656,-0.406056,1.614598,3.92548,-1.209586,0.952736,-0.429297,...,-0.288566,-0.420307,0.258054,0.632264,-0.148758,-0.656398,0.077885,-0.027551,59.0,0
2,122831.0,2.290614,-1.288035,-1.091499,-1.591945,-0.983697,-0.58711,-0.952236,-0.272064,-1.392405,...,-0.161871,0.04143,0.225622,0.672485,-0.105101,-0.194599,0.007488,-0.040725,30.0,0
3,68397.0,1.258859,0.440981,0.331167,0.681581,-0.267935,-1.046229,0.163925,-0.269223,-0.142249,...,-0.27486,-0.734847,0.116306,0.376938,0.25547,0.090629,-0.015355,0.033149,0.89,0
4,152137.0,2.023988,-0.351874,-0.494781,0.36047,-0.400929,-0.202362,-0.544039,-0.078031,1.364484,...,0.160192,0.774027,0.021697,-0.601828,0.029147,-0.175735,0.04743,-0.041086,9.99,0


In [12]:
fraud_ds = fraud_ds.to_pandas()

In [13]:
shuffle = fraud_ds.sample(frac = 1, random_state = 42)
train_pct, val_pct = .8, .1
train_end = int(train_pct * len(shuffle))
val_end = int((train_pct + val_pct) * len(shuffle))

fraud_ds['splits'] = 'None'
fraud_ds.loc[shuffle[:train_end].index, 'splits'] = 'TRAIN'
fraud_ds.loc[shuffle[train_end:val_end].index, 'splits'] = 'VALIDATE'
fraud_ds.loc[shuffle[val_end:].index, 'splits'] = 'TEST'
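
To make the split sizes concrete, the following sketch repeats the same boundary arithmetic on plain row positions, using the 284,807 row count from the data description. It uses Python's `random` module as a stand-in for `DataFrame.sample`, so the rows assigned to each split will differ from the notebook's. Only the sizes are comparable.

```python
import random
from collections import Counter

n_rows = 284_807  # row count stated in the data description
train_pct, val_pct = .8, .1

# Same boundary arithmetic as the cell above.
train_end = int(train_pct * n_rows)
val_end = int((train_pct + val_pct) * n_rows)

# Shuffle row positions reproducibly, then slice into three groups.
positions = list(range(n_rows))
random.Random(42).shuffle(positions)

split_of = {}
for name, rows in [
    ('TRAIN', positions[:train_end]),
    ('VALIDATE', positions[train_end:val_end]),
    ('TEST', positions[val_end:]),
]:
    for row in rows:
        split_of[row] = name

# Every row gets exactly one split label.
sizes = Counter(split_of.values())
print(sizes)
```

Because `int()` truncates, the boundaries fall at 227,845 and 256,326, which leaves 28,481 rows each for `VALIDATE` and `TEST`.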

In [14]:
fraud_ds.head()

Unnamed: 0,Time,V1,V2,V3,V4,V5,V6,V7,V8,V9,...,V22,V23,V24,V25,V26,V27,V28,Amount,Class,splits
0,72890.0,-1.22258,-0.017622,2.317581,-1.547722,-0.958068,-0.370571,-0.583838,0.384328,-0.72238,...,1.217131,-0.463494,0.456253,0.385304,-0.104713,-0.303068,-0.300302,5.9,0,TRAIN
1,131206.0,1.967597,-1.009301,-1.970656,-0.406056,1.614598,3.92548,-1.209586,0.952736,-0.429297,...,-0.420307,0.258054,0.632264,-0.148758,-0.656398,0.077885,-0.027551,59.0,0,TRAIN
2,122831.0,2.290614,-1.288035,-1.091499,-1.591945,-0.983697,-0.58711,-0.952236,-0.272064,-1.392405,...,0.04143,0.225622,0.672485,-0.105101,-0.194599,0.007488,-0.040725,30.0,0,TRAIN
3,68397.0,1.258859,0.440981,0.331167,0.681581,-0.267935,-1.046229,0.163925,-0.269223,-0.142249,...,-0.734847,0.116306,0.376938,0.25547,0.090629,-0.015355,0.033149,0.89,0,TRAIN
4,152137.0,2.023988,-0.351874,-0.494781,0.36047,-0.400929,-0.202362,-0.544039,-0.078031,1.364484,...,0.774027,0.021697,-0.601828,0.029147,-0.175735,0.04743,-0.041086,9.99,0,TRAIN


---
## CatBoost

### Data Preparation

In [15]:
X = fraud_ds.drop(['Class', 'splits'], axis = 1)
y = fraud_ds.Class
splits = fraud_ds.splits

In [16]:
train = catboost.Pool(
    data = X.loc[splits[splits == 'TRAIN'].index],
    label = y.loc[splits[splits == 'TRAIN'].index]
)
validate = catboost.Pool(
    data = X.loc[splits[splits == 'VALIDATE'].index],
    label = y.loc[splits[splits == 'VALIDATE'].index]
)
test = catboost.Pool(
    data = X.loc[splits[splits == 'TEST'].index],
    label = y.loc[splits[splits == 'TEST'].index]
)

In [29]:
test.get_features()

array([[ 2.7930000e+03, -1.0485601e+00,  3.4101680e-01, ...,
        -9.7160764e-02, -3.4747571e-01,  1.0000000e+00],
       [ 1.0873900e+05, -1.2946961e+00,  3.0926020e+00, ...,
        -8.4436423e-01, -2.9500297e-01,  7.5599999e+00],
       [ 6.0400000e+04, -2.8835166e-01,  7.0912164e-01, ...,
         1.0991978e-02,  2.1180758e-02,  2.9800000e+00],
       ...,
       [ 1.3858600e+05, -1.0272281e+00,  1.7412000e+00, ...,
        -4.9604988e-01, -3.1936836e-01,  1.0590000e+01],
       [ 6.3628000e+04, -7.9432732e-01,  1.0733790e+00, ...,
         3.4771362e-01,  1.4111678e-01,  1.9990000e+01],
       [ 2.5450000e+03, -7.5646615e-01,  9.4055700e-01, ...,
        -2.6545532e-02,  4.3833274e-02,  5.8000002e+00]], dtype=float32)

In [48]:
test.get_label()

array([0, 0, 0, ..., 0, 0, 0])

### Training

In [17]:
model = catboost.CatBoostClassifier(
    custom_loss = [catboost.metrics.Accuracy()],
    random_seed = 42,
    iterations = 200,
    verbose = False
)

In [18]:
model.fit(
    train,
    eval_set = validate
)

<catboost.core.CatBoostClassifier at 0x7f9658a30520>

In [19]:
model.get_best_score()

{'learn': {'Accuracy': 0.9999034431301982, 'Logloss': 0.00043716791558089595},
 'validation': {'Accuracy': 0.9995786664794073,
  'Logloss': 0.002149552674352494}}

In [20]:
model.get_best_iteration()

69

In [21]:
model.get_params()

{'iterations': 200,
 'random_seed': 42,
 'verbose': False,
 'custom_loss': ['Accuracy']}

In [22]:
model.get_all_params()

{'nan_mode': 'Min',
 'eval_metric': 'Logloss',
 'iterations': 200,
 'sampling_frequency': 'PerTree',
 'leaf_estimation_method': 'Newton',
 'random_score_type': 'NormalWithModelSizeDecrease',
 'grow_policy': 'SymmetricTree',
 'penalties_coefficient': 1,
 'boosting_type': 'Plain',
 'model_shrink_mode': 'Constant',
 'feature_border_type': 'GreedyLogSum',
 'bayesian_matrix_reg': 0.10000000149011612,
 'eval_fraction': 0,
 'force_unit_auto_pair_weights': False,
 'l2_leaf_reg': 3,
 'random_strength': 1,
 'rsm': 1,
 'boost_from_average': False,
 'model_size_reg': 0.5,
 'pool_metainfo_options': {'tags': {}},
 'subsample': 0.800000011920929,
 'use_best_model': True,
 'class_names': [0, 1],
 'random_seed': 42,
 'depth': 6,
 'posterior_sampling': False,
 'border_count': 254,
 'classes_count': 0,
 'auto_class_weights': 'None',
 'sparse_features_conflict_fraction': 0,
 'custom_metric': ['Accuracy'],
 'leaf_estimation_backtracking': 'AnyImprovement',
 'best_model_min_trees': 1,
 'model_shrink_rate': 

### Inference

In [23]:
predictions = model.predict(test.get_features())
predictions_probs = model.predict_proba(test.get_features())

In [24]:
predictions[0:10]

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])

In [25]:
predictions_probs[0:10]

array([[9.99921674e-01, 7.83259909e-05],
       [9.89528053e-01, 1.04719474e-02],
       [9.99981821e-01, 1.81792276e-05],
       [9.99993107e-01, 6.89255259e-06],
       [9.99600287e-01, 3.99712633e-04],
       [9.99870655e-01, 1.29344732e-04],
       [9.99973999e-01, 2.60012740e-05],
       [9.99858036e-01, 1.41963930e-04],
       [9.99982863e-01, 1.71372976e-05],
       [9.99960022e-01, 3.99781716e-05]])

In [26]:
sklearn.metrics.accuracy_score(
    test.get_label(),
    model.predict(test.get_features())
)

0.9994733330992591

### Evaluation

In [127]:
# class labels
model.classes_

array([0, 1])

In [126]:
# confusion matrix
sklearn.metrics.confusion_matrix(test.get_label(), model.predict(test)).tolist()

[[28425, 0], [15, 41]]

In [122]:
# precision
sklearn.metrics.precision_score(test.get_label(), model.predict(test), average='macro')

0.9997362869198312

In [123]:
# recall
sklearn.metrics.recall_score(test.get_label(), model.predict(test), average='macro')

0.8660714285714286

In [124]:
# f1
sklearn.metrics.f1_score(test.get_label(), model.predict(test), average='macro')

0.922548521049583

In [125]:
# average precision (computed here from hard 0/1 predictions rather than predicted probabilities)
sklearn.metrics.average_precision_score(test.get_label(), model.predict(test), average='macro')

0.732669524043598
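
The accuracy and macro-averaged scores reported above can be recovered by hand from the confusion matrix `[[28425, 0], [15, 41]]`. The sketch below does that arithmetic in plain Python. It assumes the scikit-learn layout, where rows are actual classes and columns are predicted classes.

```python
# Confusion matrix reported above: rows are actual classes, columns are predicted classes.
matrix = [[28425, 0], [15, 41]]
(tn, fp), (fn, tp) = matrix

total = tn + fp + fn + tp
accuracy = (tn + tp) / total

# Per-class precision and recall, class 0 first and class 1 second.
precision = [tn / (tn + fn), tp / (tp + fp)]
recall = [tn / (tn + fp), tp / (tp + fn)]
f1 = [2 * p * r / (p + r) for p, r in zip(precision, recall)]

# average='macro' is the unweighted mean over the two classes.
macro_precision = sum(precision) / 2
macro_recall = sum(recall) / 2
macro_f1 = sum(f1) / 2

print(accuracy, macro_precision, macro_recall, macro_f1)
print('recall for the fraud class alone:', recall[1])
```

The macro averages are pulled upward by the near-perfect scores on the majority class. Recall for the fraud class on its own is 41/56, or about 0.73, which is the number to watch when missed fraud is the main concern.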

In [145]:
# fpr, tpr, threshold for ROC curve
fpr, tpr, threshold = sklearn.metrics.roc_curve(test.get_label(), model.predict_proba(test)[:, 1])

In [143]:
fpr, tpr, threshold

(array([0.        , 0.        , 0.        , ..., 0.99971856, 0.99978892,
        1.        ]),
 array([0.        , 0.01785714, 0.73214286, ..., 1.        , 1.        ,
        1.        ]),
 array([           inf, 9.99321985e-01, 5.23995158e-01, ...,
        1.03439049e-06, 1.01246217e-06, 5.94146431e-07]))

In [134]:
# confusion matrix
catboost.utils.get_confusion_matrix(model, test)

array([[2.8425e+04, 0.0000e+00],
       [1.5000e+01, 4.1000e+01]])

In [133]:
# fpr, tpr, thresholds for ROC curve
catboost.utils.get_roc_curve(model, test)

(array([0.        , 0.        , 0.        , ..., 0.99992964, 0.99996482,
        1.        ]),
 array([0.        , 0.01785714, 0.03571429, ..., 1.        , 1.        ,
        1.        ]),
 array([1.00000000e+00, 9.99317951e-01, 9.98980431e-01, ...,
        7.13992184e-07, 6.31039079e-07, 0.00000000e+00]))
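
Each point on the ROC curves above pairs a false positive rate with a true positive rate at one probability threshold. The sketch below computes such points in plain Python for a handful of made-up labels and scores. The helper name `roc_point` and the toy values are hypothetical, and the `>=` comparison follows the convention scikit-learn documents for `roc_curve`.

```python
def roc_point(labels, scores, threshold):
    """Return (fpr, tpr) when scores >= threshold are predicted positive."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    positives = sum(1 for y in labels if y == 1)
    negatives = len(labels) - positives
    return fp / negatives, tp / positives

# Hypothetical toy data: two normal and two fraudulent transactions.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]

# Sweep the thresholds from highest to lowest score.
curve = [roc_point(labels, scores, t) for t in sorted(set(scores), reverse=True)]
print(curve)
```

Lowering the threshold can only add predicted positives, so both rates are non-decreasing along the sweep and the last point is always (1.0, 1.0).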

### Inference Data Formats

In [79]:
example_df = fraud_ds[(fraud_ds['splits']=='TEST') & (fraud_ds['Class']==1)].iloc[0:1].drop(['Class', 'splits'], axis = 1)
example_df

Unnamed: 0,Time,V1,V2,V3,V4,V5,V6,V7,V8,V9,...,V20,V21,V22,V23,V24,V25,V26,V27,V28,Amount
4131,13323.0,-5.454362,8.287421,-12.752811,8.594342,-3.106002,-3.179949,-9.252794,4.245062,-6.329801,...,1.305862,1.846165,-0.267172,-0.310804,-1.201685,1.352176,0.608425,1.574715,0.808725,1.0


In [80]:
example_df_reordered = example_df.reindex(columns = np.random.permutation(example_df.columns))
example_df_reordered

Unnamed: 0,V10,V17,V11,V12,Time,V26,V20,V21,V18,V5,...,V22,V28,V6,V24,V2,V9,V25,V19,V1,V16
4131,-13.136698,-14.744902,11.22847,-17.131301,13323.0,0.608425,1.305862,1.846165,-5.247301,-3.106002,...,-0.267172,0.808725,-3.179949,-1.201685,8.287421,-6.329801,1.352176,-0.574675,-5.454362,-9.723565


In [81]:
example_series = example_df.iloc[0]
example_series_reordered = example_df_reordered.iloc[0]
example_series

Time        13323.0
V1        -5.454362
V2         8.287421
V3       -12.752811
V4         8.594342
V5        -3.106002
V6        -3.179949
V7        -9.252794
V8         4.245062
V9        -6.329801
V10      -13.136698
V11        11.22847
V12      -17.131301
V13       -0.169401
V14      -18.049998
V15       -1.366236
V16       -9.723565
V17      -14.744902
V18       -5.247301
V19       -0.574675
V20        1.305862
V21        1.846165
V22       -0.267172
V23       -0.310804
V24       -1.201685
V25        1.352176
V26        0.608425
V27        1.574715
V28        0.808725
Amount          1.0
Name: 4131, dtype: Float64

In [82]:
example_np = example_df.to_numpy()
example_np_reordered = example_df_reordered.to_numpy()
example_np

array([[13323.0, -5.454361779396731, 8.287420555349831,
        -12.7528112729386, 8.59434189301081, -3.10600228114338,
        -3.1799487568641402, -9.25279393795831, 4.24506220985367,
        -6.32980084623466, -13.136698369103902, 11.228470279576001,
        -17.1313009454468, -0.16940105681412398, -18.049997689859396,
        -1.3662356609906499, -9.7235653091894, -14.7449024646768,
        -5.24730110631125, -0.5746751437958171, 1.3058619148343702,
        1.84616479291417, -0.267171794223081, -0.31080396975162106,
        -1.20168545799806, 1.35217609502433, 0.6084245963604029,
        1.5747147838420401, 0.8087252050902329, 1.0]], dtype=object)

In [83]:
example_list = example_df.values.tolist()
example_list_reordered = example_df_reordered.values.tolist()
example_list

[[13323.0,
  -5.454361779396731,
  8.287420555349831,
  -12.7528112729386,
  8.59434189301081,
  -3.10600228114338,
  -3.1799487568641402,
  -9.25279393795831,
  4.24506220985367,
  -6.32980084623466,
  -13.136698369103902,
  11.228470279576001,
  -17.1313009454468,
  -0.16940105681412398,
  -18.049997689859396,
  -1.3662356609906499,
  -9.7235653091894,
  -14.7449024646768,
  -5.24730110631125,
  -0.5746751437958171,
  1.3058619148343702,
  1.84616479291417,
  -0.267171794223081,
  -0.31080396975162106,
  -1.20168545799806,
  1.35217609502433,
  0.6084245963604029,
  1.5747147838420401,
  0.8087252050902329,
  1.0]]

In [84]:
model.predict(example_df), model.predict(example_df_reordered)

(array([1]), array([1]))

In [85]:
model.predict(example_series), model.predict(example_series_reordered)

(1, 0)

In [86]:
model.predict(example_np), model.predict(example_np_reordered)

(array([1]), array([0]))

In [108]:
model.predict(example_list), model.predict(example_list_reordered)

(array([1]), array([0]))
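
The predictions above change once the feature order changes for the Series, NumPy array, and list inputs, while the reordered DataFrame still returns 1. That pattern is consistent with DataFrame columns being matched by name and the other formats being read positionally, although this is inferred from the outputs rather than stated here. A minimal sketch of restoring a known feature order before building a positional input follows. The helper `order_features` and the shortened feature list are hypothetical.

```python
def order_features(instance, feature_names):
    """Return the values of `instance` (a name -> value mapping) in `feature_names` order."""
    missing = [name for name in feature_names if name not in instance]
    if missing:
        raise KeyError(f'missing features: {missing}')
    return [instance[name] for name in feature_names]

# Hypothetical, shortened feature list in training order.
feature_names = ['Time', 'V1', 'V2', 'Amount']

# The same instance with its features in a shuffled order.
shuffled = {'V2': 8.287421, 'Amount': 1.0, 'Time': 13323.0, 'V1': -5.454362}

row = order_features(shuffled, feature_names)
print(row)
```

With the trained model, the training order would presumably come from `model.feature_names_`, so a reordered Series could be realigned with something like `example_series_reordered[model.feature_names_]` before calling `predict`.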

In [207]:
examples_np = fraud_ds[(fraud_ds['splits']=='TEST') & (fraud_ds['Class']==1)].iloc[0:10].drop(['Class', 'splits'], axis = 1).to_numpy()

In [208]:
model.predict(examples_np[0])

1

In [209]:
model.predict(examples_np)

array([1, 1, 1, 1, 0, 0, 1, 1, 1, 1])

---
## Model Files

Save and reload model files using GCS.

### Save To GCS

In [225]:
model_blob = bucket.blob(f'{SERIES}/{EXPERIMENT}/model.cbm')

with tempfile.NamedTemporaryFile() as temp_file:
    model.save_model(temp_file.name, format = 'cbm')
    model_blob.upload_from_filename(temp_file.name)

In [237]:
examples_blob = bucket.blob(f'{SERIES}/{EXPERIMENT}/examples.json')
examples_blob.upload_from_string(
    json.dumps(examples_np.tolist()),
    content_type = 'application/json'
)

In [238]:
list(bucket.list_blobs(prefix = f'{SERIES}/{EXPERIMENT}'))

[<Blob: statmike-mlops-349915, frameworks-catboost/notebook/examples.json, 1727814002308236>,
 <Blob: statmike-mlops-349915, frameworks-catboost/notebook/model.cbm, 1727813570750716>]

### Load From GCS

In [239]:
model_bytes = model_blob.download_as_bytes()
reload_model = catboost.CatBoostClassifier()
reload_model.load_model(blob = model_bytes)

<catboost.core.CatBoostClassifier at 0x7f95b6c6ed10>

In [244]:
reload_examples_np = np.array(
    json.loads(examples_blob.download_as_bytes())
)

In [245]:
reload_model.predict(examples_np)

array([1, 1, 1, 1, 0, 0, 1, 1, 1, 1])

In [246]:
reload_model.predict(reload_examples_np)

array([1, 1, 1, 1, 0, 0, 1, 1, 1, 1])
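
The reloaded examples give the same predictions as the in-memory array, which is what a lossless JSON round trip would produce. The sketch below checks that property with the standard library alone, using the first few values of the instance shown earlier. Truncating the instance to four features is an assumption made for brevity.

```python
import json

# First four feature values of the example instance shown earlier (truncated for brevity).
examples = [[13323.0, -5.454361779396731, 8.287420555349831, 1.0]]

# Serialize to a JSON string, as done before uploading, then parse it back.
payload = json.dumps(examples)
restored = json.loads(payload)

print(payload)
print(restored == examples)
```

Python writes floats using their shortest round-trippable representation, so the parsed values compare equal to the originals.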

---
## Vertex AI Experiments

[Vertex AI Experiments](https://cloud.google.com/vertex-ai/docs/experiments/intro-vertex-ai-experiments) is covered in detail in this repository under the [MLOps](../../MLOps/readme.md) section for [Experiment Tracking](../../MLOps/Experiment%20Tracking/readme.md).

In short:
- Experiments is a service for tracking and analyzing metrics, parameters and other data related to ML processes
- An [experiment is created or connected](https://cloud.google.com/vertex-ai/docs/experiments/create-experiment) with the SDK via `aiplatform.init(experiment = 'name for experiment here')`
- Information is logged to a [run created under an experiment](https://cloud.google.com/vertex-ai/docs/experiments/create-manage-exp-run) with `aiplatform.start_run(run = 'name for run here')`
- Log information with the SDK:
    - [Autolog](https://cloud.google.com/vertex-ai/docs/experiments/autolog-data) with `aiplatform.autolog()`
        - for supported frameworks
    - [Summary metrics](https://cloud.google.com/vertex-ai/docs/experiments/log-data#summary-metrics) with `aiplatform.log_metrics()`
    - [Time series metrics](https://cloud.google.com/vertex-ai/docs/experiments/log-data#time-series-metrics) with `aiplatform.log_time_series_metrics()`
        - This requires a backing Vertex AI TensorBoard resource
    - [Parameters](https://cloud.google.com/vertex-ai/docs/experiments/log-data#parameters) with `aiplatform.log_params()`
    - [Classification metrics](https://cloud.google.com/vertex-ai/docs/experiments/log-data#classification-metrics) with `aiplatform.log_classification_metrics()`
    - [Model Artifacts](https://cloud.google.com/vertex-ai/docs/experiments/log-models-exp-run) with `aiplatform.save_model()`
        - for supported frameworks
    - [Metadata Artifacts with Executions](https://cloud.google.com/vertex-ai/docs/experiments/track-executions-artifacts) using `aiplatform.start_execution`
- Manage runs with:
    - [End a run](https://cloud.google.com/vertex-ai/docs/experiments/create-manage-exp-run#end-run) with `aiplatform.end_run()`
    - [Resume a run](https://cloud.google.com/vertex-ai/docs/experiments/create-manage-exp-run#resume-run) to update/add to an existing run by adding the `resume = True` parameter to `aiplatform.start_run(run = '', resume = True)`
    - [Delete a run](https://cloud.google.com/vertex-ai/docs/experiments/create-manage-exp-run#delete-run) with the SDK's `ExperimentRun` method
    - [Change the status](https://cloud.google.com/vertex-ai/docs/experiments/create-manage-exp-run#manage-status) of a run with the SDK's `ExperimentRun` method

### Start An Experiment Run

In [146]:
aiplatform.start_run(run = 'example')

Associating projects/1026793852137/locations/us-central1/metadataStores/default/contexts/frameworks-catboost-notebook-example to Experiment: frameworks-catboost-notebook


<google.cloud.aiplatform.metadata.experiment_run_resource.ExperimentRun at 0x7f95bf0f2440>

### Log Model Parameters

In [166]:
params = model.get_all_params()
# log_params accepts scalar values, so serialize nested dict/list values to JSON strings
for key, value in params.items():
    if isinstance(value, (dict, list)):
        params[key] = json.dumps(value)

aiplatform.log_params(params)
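
The conversion above exists because some CatBoost parameters are nested values such as `class_names` (a list) and `pool_metainfo_options` (a dict). A small standalone sketch of the same idea, written as a function that returns a new dictionary instead of modifying the original, is shown here. The helper name `to_loggable` is hypothetical, and the sample values are copied from the `get_all_params()` output earlier in this notebook.

```python
import json

def to_loggable(params):
    """Return a copy of `params` where dict and list values are JSON strings."""
    return {
        key: json.dumps(value) if isinstance(value, (dict, list)) else value
        for key, value in params.items()
    }

# A few entries copied from the get_all_params() output shown earlier.
params = {
    'iterations': 200,
    'class_names': [0, 1],
    'pool_metainfo_options': {'tags': {}},
}

loggable = to_loggable(params)
print(loggable)
```

Returning a copy keeps the original parameter dictionary usable for anything that still expects the nested values.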

### Log Summary Metrics

In [170]:
for split, dataset in [('train', train), ('validate', validate), ('test', test)]:
    # compute labels and predictions once per split and reuse them for every metric
    actual = dataset.get_label()
    predicted = model.predict(dataset)
    aiplatform.log_metrics(
        {
            f'{split}_accuracy' : sklearn.metrics.accuracy_score(actual, predicted),
            f'{split}_precision' : sklearn.metrics.precision_score(actual, predicted, average='macro'),
            f'{split}_recall' : sklearn.metrics.recall_score(actual, predicted, average='macro'),
            f'{split}_f1' : sklearn.metrics.f1_score(actual, predicted, average='macro'),
            f'{split}_average_precision' : sklearn.metrics.average_precision_score(actual, predicted, average='macro')
        }
    )

### Log Classification Metrics

In [178]:
for split, dataset in [('train', train), ('validate', validate), ('test', test)]:
    actual = dataset.get_label()
    # ROC curve values; not logged below because the fpr/tpr/threshold arguments are commented out
    fpr, tpr, threshold = sklearn.metrics.roc_curve(actual, model.predict_proba(dataset)[:, 1])
    aiplatform.log_classification_metrics(
        labels = [str(x) for x in model.classes_],
        matrix = sklearn.metrics.confusion_matrix(actual, model.predict(dataset)).tolist(),
        #fpr = fpr.tolist()[1:],
        #tpr = tpr.tolist()[1:],
        #threshold = threshold.tolist()[1:],
        display_name = split
    )

### Log Model Artifact

In [183]:
model_artifact = aiplatform.Artifact.create(
    schema_title = 'system.Model',
    schema_version = '0.0.1',
    display_name = 'Model For Example',
    uri = f"gs://{bucket.name}/{model_blob.name}",
)

In [186]:
with aiplatform.start_execution(
    schema_title = 'system.ContainerExecution',
    display_name = 'Train Model'
) as exc:
    exc.assign_output_artifacts([model_artifact])

In [187]:
model_artifact.uri

'gs://statmike-mlops-349915/frameworks-catboost/notebook/model.cbm'

---
## Save Files For Companion Workflows

This section saves the model and some test instances to a local folder for use in companion workflows that use the model.

In [194]:
model.save_model(f'{DIR}/model.cbm', format = 'cbm')

In [248]:
with open(f'{DIR}/examples.json', 'w') as f:
    json.dump(examples_np.tolist(), f)