# New Relic ML Performance Monitoring - Bring Your Own Data

[ml-performance-monitoring](https://github.com/newrelic-experimental/ml-performance-monitoring) provides a Python library for sending machine learning models' inference data and performance metrics into New Relic. 
<br>
By using this package, you can easily and quickly monitor your model, directly from a Jupyter notebook or a cloud service. 
<br>
The package is ML framework agnostic and can be quickly integrated.
<br>
It is based on the [newrelic-telemetry-sdk-python](https://github.com/newrelic/newrelic-telemetry-sdk-python) library.


This notebook provides an example of sending inference data and metrics of a Random Forest model trained on the Iris dataset.

<U>Note</U>: this notebook uses the following libraries:
* numpy
* pandas
* scikit-learn

### 0. Install libraries

In [1]:
!pip3 install git+https://github.com/newrelic-experimental/ml-performance-monitoring.git

In [2]:
!pip3 install pandas scikit-learn

### 1. Import libraries


In [3]:
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

from ml_performance_monitoring.monitor import MLPerformanceMonitoring
from ml_performance_monitoring.psi import calculate_psi

### 2. Load the Iris dataset and split it into train and test sets

In [4]:
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris_dataset = load_iris()
X, y = (
    iris_dataset["data"],
    iris_dataset["target"],
)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=1)

X[:5], y[:5]

(array([[5.1, 3.5, 1.4, 0.2],
        [4.9, 3. , 1.4, 0.2],
        [4.7, 3.2, 1.3, 0.2],
        [4.6, 3.1, 1.5, 0.2],
        [5. , 3.6, 1.4, 0.2]]),
 array([0, 0, 0, 0, 0]))

### 3. Fit a Random Forest classification model



In [5]:
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.pipeline import Pipeline

# Set up a pipeline with a feature selection preprocessor that
# selects the top 2 features to use.
# The pipeline then uses a RandomForestClassifier to train the model.

pipeline = Pipeline(
    [
        ("feature_selection", SelectKBest(chi2, k=2)),
        ("classification", RandomForestClassifier()),
    ]
)
pipeline.fit(X_train, y_train)

Pipeline(steps=[('feature_selection',
                 SelectKBest(k=2, score_func=<function chi2 at 0x16c941ca0>)),
                ('classification', RandomForestClassifier())])

### 4. Predict the test set results

In [6]:
y_pred = pipeline.predict(X_test)
y_pred

array([0, 1, 1, 0, 2, 1, 2, 0, 0, 2, 1, 0, 2, 1, 1, 0, 1, 1, 0, 0, 1, 1,
       2, 0, 2, 1, 0, 0, 1, 2, 1, 2, 1, 2, 2, 0, 1, 0, 1, 2, 2, 0, 1, 2,
       1, 2, 0, 0, 0, 1, 0, 0, 2, 2, 2, 2, 1, 1, 2, 1, 0, 2, 2, 0, 0, 2,
       0, 2, 2, 1, 1, 2, 1, 0, 1])

In [7]:
y_pred_names = iris_dataset["target_names"][y_pred]
y_pred_names

array(['setosa', 'versicolor', 'versicolor', 'setosa', 'virginica',
       'versicolor', 'virginica', 'setosa', 'setosa', 'virginica',
       'versicolor', 'setosa', 'virginica', 'versicolor', 'versicolor',
       'setosa', 'versicolor', 'versicolor', 'setosa', 'setosa',
       'versicolor', 'versicolor', 'virginica', 'setosa', 'virginica',
       'versicolor', 'setosa', 'setosa', 'versicolor', 'virginica',
       'versicolor', 'virginica', 'versicolor', 'virginica', 'virginica',
       'setosa', 'versicolor', 'setosa', 'versicolor', 'virginica',
       'virginica', 'setosa', 'versicolor', 'virginica', 'versicolor',
       'virginica', 'setosa', 'setosa', 'setosa', 'versicolor', 'setosa',
       'setosa', 'virginica', 'virginica', 'virginica', 'virginica',
       'versicolor', 'versicolor', 'virginica', 'versicolor', 'setosa',
       'virginica', 'virginica', 'setosa', 'setosa', 'virginica',
       'setosa', 'virginica', 'virginica', 'versicolor', 'versicolor',
       'virginica', 'ver

### 5. Record inference data to New Relic

The `MLPerformanceMonitoring` parameters:
* Required parameters:
   * `model_name` - must be unique per model.
   * `insert_key` - [Get your key](https://one.newrelic.com/launcher/api-keys-ui.api-keys-launcher) (also referred to as `ingest - license`) and set it as the environment variable `NEW_RELIC_INSERT_KEY`. [Click here](https://docs.newrelic.com/docs/apis/intro-apis/new-relic-api-keys/#license-key) for more details and instructions.

* Optional parameters:
   * `metadata` (dictionary) - added to each event (row) of the data.
   * `send_data_metrics` (boolean) - send data metrics (statistics) to New Relic (defaults to `False`).
   * `features_columns` (list) - the feature names, in the same order as the columns of X.
   * `labels_columns` (list) - the label names, in the same order as the columns of y.

(Note: the parameters `features_columns` and `labels_columns` are only relevant when sending the data as an np.array. When the data is sent as dataframes, the column names of the X and y dataframes are used as the feature and label names respectively. If you send your data as an np.array without passing `features_columns` and `labels_columns`, the names appear in New Relic as "feature_{n}" and "label_{n}", numbered by the order of the features/labels.)
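As described above, the insert key can be passed directly or supplied through the `NEW_RELIC_INSERT_KEY` environment variable. A minimal sketch of a pre-flight check you might run before creating the monitor follows. The helper `resolve_insert_key` is hypothetical and is not part of ml-performance-monitoring; it only illustrates the lookup order of an explicit key first, then the environment.

```python
import os


def resolve_insert_key(explicit_key=None, env_var="NEW_RELIC_INSERT_KEY"):
    """Return the explicit key if given, otherwise the environment variable's value.

    Hypothetical helper for illustration; not part of ml-performance-monitoring.
    """
    key = explicit_key or os.environ.get(env_var)
    if not key:
        # Fail early with a clear message instead of at send time.
        raise ValueError(f"No insert key given and {env_var} is not set")
    return key
```

Calling `resolve_insert_key()` before constructing the monitor surfaces a missing key immediately, which can be easier to debug in a scheduled job than a failed upload.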


5.1. Define monitoring parameters

In [8]:
metadata = {"environment": "notebook", "dataset": "Iris"}
model_version = "1.0"
features_columns = [
    "sepal_length",
    "sepal_width",
    "petal_length",
    "petal_width",
]

labels_columns = ["species"]

5.2 Create model monitor

In [9]:
insert_key = None

ml_monitor = MLPerformanceMonitoring(
    insert_key=insert_key,  # set the environment variable NEW_RELIC_INSERT_KEY or send your insert key here
    model_name="RandomForestClassifier on Iris Dataset",
    metadata=metadata,
    send_data_metrics=True,
    features_columns=features_columns,
    labels_columns=labels_columns,
    model_version=model_version,
)

5.3  Send your inference data as an np.array.

In [10]:
ml_monitor.record_inference_data(X=X_test, y=y_pred_names)

5.4  Send your inference data as a pd.DataFrame. 

In [11]:
X_df = pd.DataFrame(
    list(map(np.ravel, X_test)),
    columns=features_columns,
)

y_pred_df = pd.DataFrame(
    list(map(np.ravel, y_pred_names)),
    columns=labels_columns,
)
X_df.head()

| | sepal_length | sepal_width | petal_length | petal_width |
|---|---|---|---|---|
| 0 | 5.8 | 4.0 | 1.2 | 0.2 |
| 1 | 5.1 | 2.5 | 3.0 | 1.1 |
| 2 | 6.6 | 3.0 | 4.4 | 1.4 |
| 3 | 5.4 | 3.9 | 1.3 | 0.4 |
| 4 | 7.9 | 3.8 | 6.4 | 2.0 |


In [12]:
y_pred_df.head()

| | species |
|---|---|
| 0 | setosa |
| 1 | versicolor |
| 2 | versicolor |
| 3 | setosa |
| 4 | virginica |


In [13]:
ml_monitor.record_inference_data(X=X_df, y=y_pred_df)

### 6. Record metrics to New Relic
You can stream custom metrics to New Relic, monitoring your model performance, model data and drift metrics. These metrics will be sent to NRDB as [metric data](https://docs.newrelic.com/docs/data-apis/ingest-apis/metric-api/introduction-metric-api/).

6.1. Model performance metrics

In [14]:
from sklearn.metrics import (
    accuracy_score,
    f1_score,
    precision_score,
    recall_score,
)

# Model Evaluation
ac_sc = accuracy_score(y_test, y_pred)
rc_sc = recall_score(y_test, y_pred, average="weighted")
pr_sc = precision_score(y_test, y_pred, average="weighted")
f1_sc = f1_score(y_test, y_pred, average="micro")

print(f"Accuracy    : {ac_sc}")
print(f"Recall      : {rc_sc}")
print(f"Precision   : {pr_sc}")
print(f"F1 Score    : {f1_sc}")

Accuracy    : 0.9466666666666667
Recall      : 0.9466666666666667
Precision   : 0.9486769230769231
F1 Score    : 0.9466666666666667


In [15]:
metrics = {
    "Accuracy": ac_sc,
    "Recall": rc_sc,
    "Precision": pr_sc,
    "F1_Score": f1_sc,
}
ml_monitor.record_metrics(metrics=metrics)
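In the output above, the F1 score is identical to the accuracy. That is expected: for single-label multiclass problems, micro-averaging pools true positives, false positives and false negatives across all classes, so micro-precision, micro-recall and micro-F1 all reduce to accuracy. The following standard-library sketch (the function names and toy labels are illustrative, not taken from the notebook) shows the equivalence.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def micro_f1(y_true, y_pred):
    """Micro-averaged F1: pool TP/FP/FN over all classes, then compute F1 once."""
    tp = fp = fn = 0
    for label in set(y_true) | set(y_pred):
        for t, p in zip(y_true, y_pred):
            if p == label and t == label:
                tp += 1
            elif p == label:
                fp += 1  # predicted this class, but the truth is another class
            elif t == label:
                fn += 1  # truth is this class, but another class was predicted
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy labels (illustrative only): 6 of 8 predictions are correct.
y_true_toy = [0, 1, 2, 2, 1, 0, 2, 1]
y_pred_toy = [0, 1, 2, 1, 1, 0, 2, 2]
print(accuracy(y_true_toy, y_pred_toy), micro_f1(y_true_toy, y_pred_toy))  # 0.75 0.75
```

If you want an F1 value that carries information beyond accuracy, `average="weighted"` or `average="macro"` are the usual alternatives.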

<b> 6.2. Drift metrics</b>
<br>
Send your model drift metrics or data drift metrics (for data drift, pass the feature name via the `feature_name` argument).

In [16]:
## Calculate PSI for each feature
df_validation = np.transpose(X_test)
df_training = np.transpose(X_train)
for index, feature_name in enumerate(features_columns):
    # Compare the feature's distribution in the validation set against the training set
    psi_t = calculate_psi(
        df_validation[index],
        df_training[index],
        buckettype="quantiles",
        buckets=10,
        axis=1,
    )
    ml_monitor.record_metrics(metrics={"data_drift": psi_t}, feature_name=feature_name)

In [17]:
model_drift = calculate_psi(y_pred, y_train, buckettype="quantiles", buckets=10, axis=1)
ml_monitor.record_metrics(metrics={"model_drift": model_drift})
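The drift metrics above are Population Stability Index (PSI) values. For intuition, here is a minimal sketch of the textbook PSI formula, PSI = Σ (aᵢ − eᵢ) · ln(aᵢ / eᵢ), where eᵢ and aᵢ are the shares of the expected and actual samples falling in bucket i. This is an illustration only: it uses equal-width buckets and a small floor for empty buckets, both of which are assumptions, and the library's `calculate_psi` may bucket and smooth differently.

```python
import math


def psi(expected, actual, buckets=10, eps=1e-4):
    """Population Stability Index over equal-width buckets (textbook sketch).

    Not the library's implementation; bucketing and smoothing are assumptions.
    """
    lo = min(min(expected), min(actual))
    hi = max(max(expected), max(actual))
    width = (hi - lo) / buckets or 1.0  # fall back to 1.0 if all values are equal

    def shares(values):
        counts = [0] * buckets
        for v in values:
            # Clamp the maximum value into the last bucket.
            idx = min(int((v - lo) / width), buckets - 1)
            counts[idx] += 1
        # Floor empty buckets at eps so the logarithm stays defined.
        return [max(c / len(values), eps) for c in counts]

    e, a = shares(expected), shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


baseline = [i / 10 for i in range(100)]
shifted = [v + 3 for v in baseline]
print(psi(baseline, baseline))  # 0.0: identical distributions
print(psi(baseline, shifted))   # positive: the distribution has moved
```

Every term of the sum is non-negative, so PSI is zero only when the bucket shares match and grows as the two distributions diverge.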

### 7. Monitor and alert
Done! Check your application in the [New Relic UI](https://one.newrelic.com/nr1-core?filters=%28domain%20%3D%20%27MLOPS%27%20AND%20type%20%3D%20%27MACHINE_LEARNING_MODEL%27%29) to see the real-time data.

### 8. Optional - Simulate 24 hours of model inference data  
The main purpose of this library is to record inference data of a scheduled model in production. The cell below simulates inference data of the Random Forest classifier on the Iris dataset, recorded at two-hour intervals across the last 24 hours. After running the cell, you can view the data in two places:<br>
* **Machine learning model** entity - an entity of the type **machine learning model** is created automatically. You can explore your model entities by selecting **Explorer** on [New Relic One](https://one.newrelic.com/launcher/) and going to the **Machine Learning** section in the left navigation menu.

* Example dashboard - follow the [instructions](https://docs.newrelic.com/docs/alerts-applied-intelligence/mlops/bring-your-own/mlops-byo/#use-case) to view the data in the example dashboard.

 

In [18]:
from datetime import datetime, timedelta
from random import randint

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=1)


classifier = RandomForestClassifier(
    n_estimators=10, criterion="entropy", random_state=0
)
classifier.fit(X_train, y_train)

insert_key = None

ml_monitor = MLPerformanceMonitoring(
    insert_key=insert_key,  # set the environment variable NEW_RELIC_INSERT_KEY or send your insert key here,
    model_name="RandomForest Classifier on Iris Dataset - Inference Simulation",
    metadata=metadata,
    features_columns=features_columns,
    labels_columns=labels_columns,
    model_version=model_version,
)

last_24h_date = datetime.now() - timedelta(hours=24)
last_24h_timestamp = int(datetime.timestamp(last_24h_date))

for i in range(0, 24, 2):
    idx = np.random.choice(np.arange(len(X)), randint(10, 50), replace=False)
    X_sample = X[idx]
    y_sample = y[idx]

    y_pred = classifier.predict(X_sample)

    X_df = pd.DataFrame(
        list(map(np.ravel, X_sample)),
        columns=features_columns,
    )

    y_pred_df = pd.DataFrame(
        list(map(np.ravel, y_pred)),
        columns=labels_columns,
    )

    y_pred_df.loc[y_pred_df.species == 0, "species"] = "setosa"
    y_pred_df.loc[y_pred_df.species == 1, "species"] = "versicolor"
    y_pred_df.loc[y_pred_df.species == 2, "species"] = "virginica"

    ml_monitor.record_inference_data(
        X=X_df, y=y_pred_df, timestamp=last_24h_timestamp + i * 3600
    )

    # Model Evaluation
    ac_sc = accuracy_score(y_sample, y_pred)
    rc_sc = recall_score(y_sample, y_pred, average="weighted")
    pr_sc = precision_score(y_sample, y_pred, average="weighted")
    f1_sc = f1_score(y_sample, y_pred, average="micro")

    ## Calculate PSI for each feature
    df_validation = np.transpose(X_test)
    df_training = np.transpose(X_train)
    for index, feature_name in enumerate(features_columns):
        # Compare the feature's distribution in the validation set against the training set
        psi_t = calculate_psi(
            df_validation[index],
            df_training[index],
            buckettype="quantiles",
            buckets=10,
            axis=1,
        )
        ml_monitor.record_metrics(
            metrics={"data_drift": psi_t},
            timestamp=last_24h_timestamp + i * 3600,
            feature_name=feature_name,
        )

    model_drift = calculate_psi(
        y_pred, y_train, buckettype="quantiles", buckets=10, axis=1
    )

    metrics = {
        "Accuracy": ac_sc,
        "Recall": rc_sc,
        "Precision": pr_sc,
        "F1_Score": f1_sc,
        "model_drift": model_drift,
    }
    ml_monitor.record_metrics(metrics=metrics, timestamp=last_24h_timestamp + i * 3600)
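The simulation above backdates each batch by passing an explicit `timestamp`. The timestamp arithmetic can be sketched on its own as follows. The helper name `backdated_timestamps` is hypothetical, and the sketch assumes timestamps are Unix seconds, as produced by `int(datetime.timestamp(...))` in the cell above.

```python
from datetime import datetime, timedelta


def backdated_timestamps(hours=24, step_hours=2, now=None):
    """Return Unix timestamps (seconds) starting `hours` ago, `step_hours` apart.

    Hypothetical helper for illustration; mirrors the loop in the simulation cell.
    """
    now = now or datetime.now()
    start = int((now - timedelta(hours=hours)).timestamp())
    return [start + i * 3600 for i in range(0, hours, step_hours)]


stamps = backdated_timestamps()
print(len(stamps))            # 12 batches for a 24-hour window with 2-hour steps
print(stamps[1] - stamps[0])  # 7200 seconds between consecutive batches
```

Passing `step_hours=1` would produce one batch per hour instead.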