# New Relic ML Performance Monitoring - Bring Your Own Data

[ml-performance-monitoring](https://github.com/newrelic-experimental/ml-performance-monitoring) provides a Python library for sending machine learning models' inference data and performance metrics into New Relic.
<br>
It is based on the [newrelic-telemetry-sdk-python](https://github.com/newrelic/newrelic-telemetry-sdk-python) library.
<br>
By using this package, you can easily and quickly monitor your model, directly from a Jupyter notebook or any other environment.
 <br>
This notebook provides an example of sending the inference data and metrics of a RandomForestClassifier model.

<U>Note</U>: this notebook uses the following libraries:
* numpy
* pandas
* sklearn

### 1. Import libraries


In [1]:
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

from ml_performance_monitoring.monitor import MLPerformanceMonitoring, wrap_model
from ml_performance_monitoring.psi import calculate_psi

### 2. Load the Iris dataset and split it into train and test sets

In [2]:
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris_dataset = load_iris()
X, y = (
    iris_dataset["data"],
    iris_dataset["target"],
)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=1)

X[:5], y[:5]

(array([[5.1, 3.5, 1.4, 0.2],
        [4.9, 3. , 1.4, 0.2],
        [4.7, 3.2, 1.3, 0.2],
        [4.6, 3.1, 1.5, 0.2],
        [5. , 3.6, 1.4, 0.2]]),
 array([0, 0, 0, 0, 0]))

### 3. Fitting Random Forest Classification model



In [3]:
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.pipeline import Pipeline

# Set up a pipeline with a feature selection preprocessor that
# selects the top 2 features to use.
# The pipeline then uses a RandomForestClassifier to train the model.

pipeline = Pipeline(
    [
        ("feature_selection", SelectKBest(chi2, k=2)),
        ("classification", RandomForestClassifier()),
    ]
)
pipeline.fit(X_train, y_train)

Pipeline(steps=[('feature_selection',
                 SelectKBest(k=2, score_func=<function chi2 at 0x13e0fa5e0>)),
                ('classification', RandomForestClassifier())])

### 4. Predicting the test set results

In [4]:
y_pred = pipeline.predict(X_test)
y_pred

array([0, 1, 1, 0, 2, 1, 2, 0, 0, 2, 1, 0, 2, 1, 1, 0, 1, 1, 0, 0, 1, 1,
       2, 0, 2, 1, 0, 0, 1, 2, 1, 2, 1, 2, 2, 0, 1, 0, 1, 2, 2, 0, 1, 2,
       1, 2, 0, 0, 0, 1, 0, 0, 2, 2, 2, 2, 1, 1, 2, 1, 0, 2, 2, 0, 0, 2,
       0, 2, 2, 1, 1, 2, 1, 0, 1])

### 5. Record inference data to New Relic
<b> You have two options for sending your model's inference data (features and predictions) to New Relic: </b>
<br> 1. "Online" instrumentation - sending the data while the model is being invoked in production
<br> 2. "Offline" instrumentation - sending the data as a dataset (as an np.array or pandas dataframe) 

<b>   The MLPerformanceMonitoring object requires the following parameters: </b>
<br> 1. <b>model_name</b> - must be unique per model
<br> 2. <b>New Relic insert key</b> - can be a license key or an Insights insert key (see [how to get your insert key](https://docs.newrelic.com/docs/apis/intro-apis/new-relic-api-keys/#insights-insert-key)). Pass it as the insert_key argument or set it in the environment variable NEW_RELIC_INSERT_KEY. <br>
Optional parameters:
<br> 
3. <b>metadata</b> (dictionary) - will be added to each event (row) of the data 
<br>
4. <b>send_data_metrics </b>(boolean) - send data metrics (statistics) to New Relic (False as default)
<br> 
5. <b>features_columns</b> (list) - the features' names ordered as X columns. On New Relic data, the names will be prefixed with the string 'feature_'
<br>
6. <b> labels_columns</b> (list) - the labels' names ordered as y columns. On New Relic data, the names will be prefixed with the string 'label_'
<br> (Note: the parameters features_columns and labels_columns are only relevant when sending the data as an np.array. When the data is sent as a dataframe, the column names of the dataframes (X, y) are used as the feature and label names, respectively. If you send your data as an np.array without features_columns and labels_columns, the names in New Relic will appear as "feature_{n}" and "label_{n}", numbered by the order of the features/labels.)
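
The insert key lookup described above can be sketched as follows. This is a minimal illustration, not the library's own code: the helper name `resolve_insert_key` is hypothetical, and only the environment variable name NEW_RELIC_INSERT_KEY comes from this notebook.

```python
import os


def resolve_insert_key(insert_key=None):
    """Return an explicit key if given, otherwise read NEW_RELIC_INSERT_KEY.

    Hypothetical helper for illustration; not part of ml_performance_monitoring.
    """
    key = insert_key or os.environ.get("NEW_RELIC_INSERT_KEY")
    if not key:
        raise ValueError(
            "No insert key found: pass insert_key or set NEW_RELIC_INSERT_KEY"
        )
    return key
```

Reading the key from the environment and failing early when it is missing keeps the secret out of the notebook source.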


In [5]:
metadata = {"environment": "aws", "dataset": "Iris", "version": "1.0"}
features_columns = [
    "sepal_length",
    "sepal_width",
    "petal_length",
    "petal_width",
]


labels_columns = ["species"]
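
To make the naming convention concrete, the sketch below builds the attribute names that the note above describes for New Relic data. The helper `prefixed_names` is hypothetical and only mirrors the documented "feature_" and "label_" prefixes; it is not the library's code, and the starting index used for unnamed columns is an assumption.

```python
def prefixed_names(prefix, n_columns, names=None, start=0):
    """Build prefixed column names, e.g. 'feature_sepal_length' or 'feature_0'.

    Hypothetical helper; `start` (the first index for unnamed columns) is an
    assumption, since the notebook does not say whether numbering begins at 0 or 1.
    """
    if names is None:
        names = [str(i) for i in range(start, start + n_columns)]
    if len(names) != n_columns:
        raise ValueError("expected one name per column")
    return [f"{prefix}_{name}" for name in names]


feature_names = prefixed_names(
    "feature", 4, ["sepal_length", "sepal_width", "petal_length", "petal_width"]
)
label_names = prefixed_names("label", 1, ["species"])
default_names = prefixed_names("feature", 4)  # no names given: numbered instead
print(feature_names)
print(label_names)
print(default_names)
```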

<b> 5.1.  "Online" instrumentation </b>
<br>
The wrap_model function extends the model/pipeline methods so that they also send the inference data to New Relic NRDB as a [custom event](https://docs.newrelic.com/docs/data-apis/ingest-apis/introduction-event-api/) named "InferenceData".
Wrap your model or pipeline by passing it as the model parameter, then use the returned object as usual (fit, predict, etc.). Your inference data and data metrics will be sent automatically.

In [7]:
ml_performance_monitor_pipeline = wrap_model(
    insert_key=None,  # set the environment variable NEW_RELIC_INSERT_KEY or send your insert key here,
    model=pipeline,
    model_name="RandomForestClassifier on Iris Dataset",
    metadata=metadata,
    send_data_metrics=True,
    features_columns=features_columns,
    labels_columns=labels_columns,
    label_type="categorical",
)

y_pred = ml_performance_monitor_pipeline.predict(
    X=X_test,
)



inference data sent successfully
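
The idea behind wrapping a model can be illustrated with a small delegation pattern. The sketch below is not how wrap_model is implemented; the class names and the `record` callback are hypothetical and stand in for whatever actually sends the data to New Relic.

```python
class RecordingModel:
    """Delegate to a wrapped model and record each predict() call.

    Illustrative only: `record` is a hypothetical callback standing in for
    whatever sends data to New Relic.
    """

    def __init__(self, model, record):
        self._model = model
        self._record = record

    def predict(self, X):
        y = self._model.predict(X)
        self._record(X, y)  # side effect: report features and predictions
        return y

    def __getattr__(self, name):
        # Any other attribute (fit, score, ...) is forwarded unchanged.
        return getattr(self._model, name)


class ConstantModel:
    """Tiny stand-in model so the sketch runs without scikit-learn."""

    def fit(self, X, y):
        return self

    def predict(self, X):
        return [0 for _ in X]


sent = []
wrapped = RecordingModel(ConstantModel(), lambda X, y: sent.append((X, y)))
wrapped.fit([[1.0], [2.0]], [0, 0])
predictions = wrapped.predict([[5.1, 3.5], [4.9, 3.0]])
```

Because the wrapper forwards every other attribute to the underlying model, calling code can keep using it like the original object.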


<b> 5.2.  "Offline" instrumentation </b>
<br>
Define an MLPerformanceMonitoring object and send your inference data and data metrics as an np.array or as a pandas dataframe.

In [8]:
ml_monitor = MLPerformanceMonitoring(
    insert_key=None,  # set the environment variable NEW_RELIC_INSERT_KEY or send your insert key here,
    model_name="RandomForestClassifier on Iris Dataset",
    metadata=metadata,
    send_data_metrics=True,
    features_columns=features_columns,
    labels_columns=labels_columns,
    label_type="categorical",
)



<b> 5.2.1  Send your features and predictions as an np.array. </b> 

In [9]:
ml_monitor.record_inference_data(X=X_test, y=y_pred)

inference data sent successfully


<b> 5.2.2  Send your features and predictions as a pd.DataFrame. </b>

In [10]:
X_df = pd.DataFrame(
    list(map(np.ravel, X_test)),
    columns=features_columns,
)

y_pred_df = pd.DataFrame(
    list(map(np.ravel, y_pred)),
    columns=labels_columns,
)
X_df.head()

| | sepal_length | sepal_width | petal_length | petal_width |
|---|---|---|---|---|
| 0 | 5.8 | 4.0 | 1.2 | 0.2 |
| 1 | 5.1 | 2.5 | 3.0 | 1.1 |
| 2 | 6.6 | 3.0 | 4.4 | 1.4 |
| 3 | 5.4 | 3.9 | 1.3 | 0.4 |
| 4 | 7.9 | 3.8 | 6.4 | 2.0 |


In [11]:
y_pred_df.head()

| | species |
|---|---|
| 0 | 0 |
| 1 | 1 |
| 2 | 1 |
| 3 | 0 |
| 4 | 2 |


In [12]:
ml_monitor.record_inference_data(X=X_df, y=y_pred_df)

inference data sent successfully




### 6. Record metrics to New Relic
You can stream custom metrics to New Relic to monitor your model's performance, its data, and drift. These metrics are sent to NRDB as [metric data](https://docs.newrelic.com/docs/data-apis/ingest-apis/metric-api/introduction-metric-api/).

<b> 6.1.  Model performance metrics</b>

In [13]:
from sklearn.metrics import (
    accuracy_score,
    f1_score,
    precision_score,
    recall_score,
)

# Model Evaluation
ac_sc = accuracy_score(y_test, y_pred)
rc_sc = recall_score(y_test, y_pred, average="weighted")
pr_sc = precision_score(y_test, y_pred, average="weighted")
f1_sc = f1_score(y_test, y_pred, average="micro")

print(f"Accuracy    : {ac_sc}")
print(f"Recall      : {rc_sc}")
print(f"Precision   : {pr_sc}")
print(f"F1 Score    : {f1_sc}")

Accuracy    : 0.9466666666666667
Recall      : 0.9466666666666667
Precision   : 0.9486769230769231
F1 Score    : 0.9466666666666667
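
Accuracy and F1 Score are identical in the output above. That is expected for single-label multiclass predictions: with average="micro", true positives, false positives, and false negatives are pooled over all classes, and every misclassified sample counts as exactly one false positive and one false negative, so micro-averaged F1 reduces to accuracy. The pure-Python sketch below checks this on a small made-up label set (the data is illustrative, not taken from the Iris run).

```python
def micro_f1(y_true, y_pred):
    """Micro-averaged F1: pool TP/FP/FN over all classes, then compute F1."""
    classes = set(y_true) | set(y_pred)
    tp = fp = fn = 0
    for c in classes:
        for t, p in zip(y_true, y_pred):
            if p == c and t == c:
                tp += 1
            elif p == c and t != c:
                fp += 1
            elif p != c and t == c:
                fn += 1
    return 2 * tp / (2 * tp + fp + fn)


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


y_true = [0, 1, 2, 2, 1, 0, 2, 1]
y_pred = [0, 2, 2, 2, 1, 0, 1, 1]
print(micro_f1(y_true, y_pred), accuracy(y_true, y_pred))
```

If a class-sensitive F1 is wanted alongside the weighted recall and precision, average="weighted" or average="macro" would give a value that can differ from accuracy.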


In [16]:
metrics = {
    "Accuracy": ac_sc,
    "Recall": rc_sc,
    "Precision": pr_sc,
    "F1_Score": f1_sc,
}
ml_monitor.record_metrics(metrics=metrics)

model_metric sent successfully


<b> 6.2.  Drift metrics</b>
<br>
Send your model drift metric or data drift metrics (for data drift, pass the feature name in the feature_name parameter).

In [None]:
## Calculate PSI (population stability index) for each feature
# Transpose so that each row holds all values of one feature
df_validation = np.transpose(X_test)
df_training = np.transpose(X_train)
for index, feature_name in enumerate(features_columns):
    # Compare the feature's distribution in the test set against the training set
    psi_t = calculate_psi(
        df_validation[index],
        df_training[index],
        buckettype="quantiles",
        buckets=10,
        axis=1,
    )
    # One data_drift metric is recorded per feature
    ml_monitor.record_metrics(metrics={"data_drift": psi_t}, feature_name=feature_name)

In [21]:
model_drift = calculate_psi(y_pred, y_train, buckettype="quantiles", buckets=10, axis=1)
ml_monitor.record_metrics(metrics={"model_drift": model_drift})

model_metric sent successfully
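
For readers unfamiliar with PSI (population stability index): it compares two samples by bucketing them and summing (a_i - e_i) * ln(a_i / e_i) over the buckets, where e_i and a_i are the fractions of the reference sample and the new sample that fall in bucket i. The sketch below is a simplified pure-Python version with equal-width buckets and a small floor for empty buckets. It is not the implementation behind calculate_psi (which is called with buckettype="quantiles" above); the function name `simple_psi` and the `eps` floor are assumptions made for illustration.

```python
import math


def simple_psi(expected, actual, buckets=10, eps=1e-4):
    """Population stability index over equal-width buckets (illustrative sketch)."""
    lo = min(min(expected), min(actual))
    hi = max(max(expected), max(actual))
    width = (hi - lo) / buckets or 1.0  # avoid a zero width for constant data

    def fractions(values):
        counts = [0] * buckets
        for v in values:
            idx = min(int((v - lo) / width), buckets - 1)
            counts[idx] += 1
        # Floor each fraction at eps so the logarithm is always defined
        return [max(c / len(values), eps) for c in counts]

    e = fractions(expected)
    a = fractions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ai, ei in zip(a, e))


reference = [0.1 * i for i in range(100)]
shifted = [0.1 * i + 3.0 for i in range(100)]
print(simple_psi(reference, reference))  # identical samples: no drift
print(simple_psi(reference, shifted))    # shifted sample: positive drift score
```

Every term of the sum is non-negative, so the score is zero only when the bucket fractions match and grows as the two distributions diverge.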


### 7. Optional - Simulate 24 hours of model inference data  
As noted above, the main purpose of this library is to record the inference data of a model that runs on a schedule in production. The cell below simulates this by sending inference data of the RandomForestClassifier model on the Iris dataset once per hour over the last 24 hours. After running the cell, you can view the data in two places:<br>
* **Machine learning model** entity - an entity of the type **machine learning model** is created automatically. You can explore your model entities by selecting **Explorer** on [New Relic One](https://one.newrelic.com/launcher/) and going to the **Machine Learning** section in the left navigation menu.

* Example dashboard - follow the [instructions](https://docs.newrelic.com/docs/alerts-applied-intelligence/mlops/bring-your-own/mlops-byo/#use-case) to view the data in the example dashboard.

 

In [None]:
from datetime import datetime, timedelta

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=1)


classifier = RandomForestClassifier(
    n_estimators=10, criterion="entropy", random_state=0
)
classifier.fit(X_train, y_train)

ml_monitor = MLPerformanceMonitoring(
    insert_key=None,  # set the environment variable NEW_RELIC_INSERT_KEY or send your insert key here,
    model_name="RandomForest Classifier on Iris Dataset - Inference Simulation",
    metadata=metadata,
    features_columns=features_columns,
    labels_columns=labels_columns,
    label_type="categorical",
)

last_24h_date = datetime.now() - timedelta(hours=24)
last_24h_timestamp = int(datetime.timestamp(last_24h_date) * 1000)

for i in range(24):

    idx = np.random.choice(np.arange(len(X)), 10, replace=False)
    X_sample = X[idx]
    y_sample = y[idx]

    y_pred = classifier.predict(X_sample)

    X_df = pd.DataFrame(
        list(map(np.ravel, X_sample)),
        columns=features_columns,
    )

    y_pred_df = pd.DataFrame(
        list(map(np.ravel, y_pred)),
        columns=labels_columns,
    )

    # Map the numeric class ids to species names in one step
    y_pred_df["species"] = y_pred_df["species"].map(
        {0: "Setosa", 1: "Versicolour", 2: "Virginica"}
    )

    ml_monitor.record_inference_data(
        X=X_df, y=y_pred_df, timestamp=last_24h_timestamp + i * 3600000
    )

    # Model Evaluation
    ac_sc = accuracy_score(y_sample, y_pred)
    rc_sc = recall_score(y_sample, y_pred, average="weighted")
    pr_sc = precision_score(y_sample, y_pred, average="weighted")
    f1_sc = f1_score(y_sample, y_pred, average="micro")

    ## Calculate psi for top features
    df_validation = np.transpose(X_test)
    df_training = np.transpose(X_train)
    for index, feature_name in enumerate(features_columns):
        # Assuming you have a validation and training set
        psi_t = calculate_psi(
            df_validation[index],
            df_training[index],
            buckettype="quantiles",
            buckets=10,
            axis=1,
        )
        # ml_monitor.record_metrics(metrics={"data_drift":psi_t},feature_name=feature_name)

    model_drift = calculate_psi(
        y_pred, y_train, buckettype="quantiles", buckets=10, axis=1
    )

    metrics = {
        "Accuracy": ac_sc,
        "Recall": rc_sc,
        "Precision": pr_sc,
        "F1_Score": f1_sc,
        "model_drift": model_drift,
    }
    ml_monitor.record_metrics(metrics=metrics)

    X_df.head()
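
The simulation spaces its 24 batches one hour apart by adding i * 3600000 milliseconds to a start timestamp taken 24 hours ago. The standalone sketch below reproduces just that arithmetic so the resulting epoch-millisecond values can be inspected; the helper name `hourly_timestamps_ms` is hypothetical.

```python
from datetime import datetime, timedelta

MS_PER_HOUR = 60 * 60 * 1000  # 3600000, the step used in the simulation


def hourly_timestamps_ms(now, hours=24):
    """Return `hours` epoch-millisecond timestamps, one per hour, starting `hours` ago."""
    start_ms = int((now - timedelta(hours=hours)).timestamp() * 1000)
    return [start_ms + i * MS_PER_HOUR for i in range(hours)]


timestamps = hourly_timestamps_ms(datetime.now())
first = datetime.fromtimestamp(timestamps[0] / 1000)
last = datetime.fromtimestamp(timestamps[-1] / 1000)
print(first, "->", last)
```

The last batch is stamped about one hour before the current time, so all 24 points fall inside the last-24-hours window.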