# New Relic ML Performance Monitoring: Bring Your Own Data
##### “Add your own data” is a library based on the “newrelic_telemetry_sdk” <br> library that helps users easily send model data to New Relic,<br> so that they can quickly monitor a simple model directly from a Jupyter notebook or a cloud service.
##### The following notebook shows various ways to use it.

Note:
This notebook uses the libraries sklearn, numpy, pandas, uuid, and xgb.

### 1. Import libraries


In [113]:
from new_relic_ml_performance_monitoring.monitor import (
    MLPerformanceMonitoring,
    wrap_model,
)

### 2. Load the iris dataset and split it into train and test sets



In [114]:
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=123
)

X[:5], y[:5]

(array([[5.1, 3.5, 1.4, 0.2],
        [4.9, 3. , 1.4, 0.2],
        [4.7, 3.2, 1.3, 0.2],
        [4.6, 3.1, 1.5, 0.2],
        [5. , 3.6, 1.4, 0.2]]),
 array([0, 0, 0, 0, 0]))

### 3. Fit a Random Forest classifier to the training set




In [115]:
from sklearn.ensemble import RandomForestClassifier

classifier = RandomForestClassifier(
    n_estimators=10, criterion="entropy", random_state=0
)
classifier.fit(X_train, y_train)

RandomForestClassifier(criterion='entropy', n_estimators=10, random_state=0)

### 4. Predict the test set results

In [116]:
y_pred = classifier.predict(X_test)
y_pred

array([1, 2, 2, 1, 0, 1, 1, 0, 0, 1, 2, 0, 1, 2, 2, 2, 0, 0, 1, 0, 0, 1,
       0, 2, 0, 0, 0, 1, 2, 0])

### 5. Record inference data to New Relic
##### The MLPerformanceMonitoring object requires a few parameters:<br> 1. model_name <br> 2. New Relic insert key: https://docs.newrelic.com/docs/apis/intro-apis/new-relic-api-keys/#insights-insert-key <br>
##### Optional parameters:<br> 3. metadata: a dictionary that will be added to each event (row) of the data<br>4. send_data_metrics: send a DataFrame summary to New Relic. False by default. <br>5. features_columns: list of the feature names, in the same order as X<br>6. labels_columns: list of the label names, in the same order as y
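
The cells in this notebook reference an `insert_key` variable that is never defined here. One way to supply it without hard-coding the secret is to read it from an environment variable. The sketch below assumes a variable named `NEW_RELIC_INSERT_KEY` and a helper called `load_insert_key`; both names are hypothetical and not part of the library.

```python
import os


def load_insert_key(env_var: str = "NEW_RELIC_INSERT_KEY") -> str:
    """Return the insert key stored in the given environment variable.

    The default variable name is a hypothetical choice for this sketch.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return key
```

With the variable exported in the notebook's environment, `insert_key = load_insert_key()` would make the key available to the cells that follow.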


In [117]:
metadata = {"environment": "aws", "dataset": "iris", "version": "1.0"}
monitor = MLPerformanceMonitoring(
    model_name="Iris RandomForestClassifier",
    insert_key=insert_key,
    metadata=metadata,
    send_data_metrics=True,
    features_columns=[
        "sepal_length",
        "sepal_width",
        "petal_length",
        "petal_width",
    ],
    labels_columns=["species"],
    staging=True,
)



##### You can use the MLPerformanceMonitoring object in various ways:
##### 5.1. Send your features and predictions as np.array. <br> In this case, the feature columns and the label columns in New Relic will start with the prefixes "feature_" and "label_", respectively, followed by numbers.

In [118]:
monitor.record_inference_data(X=X_test, y=y_pred, data_summary_min_rows=len(y))

inference data sent successfully
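
To make the naming scheme for array input concrete, here is a minimal sketch of how rows might be turned into flat events with numbered `feature_` and `label_` keys, with the metadata copied onto every event. The function `rows_to_events` is hypothetical, and the 0-based numbering and the exact event layout are assumptions; the library's own output may differ.

```python
from typing import Any, Dict, List, Sequence


def rows_to_events(
    X: Sequence[Sequence[float]],
    y: Sequence[Any],
    metadata: Dict[str, Any],
) -> List[Dict[str, Any]]:
    """Build one flat event per row with numbered feature_/label_ keys."""
    events = []
    for features, label in zip(X, y):
        event = dict(metadata)  # metadata is copied onto every event (row)
        for i, value in enumerate(features):
            event[f"feature_{i}"] = value  # 0-based numbering is an assumption
        event["label_0"] = label
        events.append(event)
    return events


events = rows_to_events(
    X=[[6.3, 2.5, 4.9, 1.5], [6.8, 3.0, 5.5, 2.1]],
    y=[1, 2],
    metadata={"dataset": "iris"},
)
print(events[0])
```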




##### 5.2. Send your features and predictions as pd.DataFrame. <br> In this case, the feature and label columns in New Relic will take the DataFrame column names, prefixed with "feature_" and "label_", respectively. <br> The parameter "inference_identifier" can be used to set a unique identifier for each event (row). Set it to the name of the column in the X DataFrame that should serve as the identifier; this column will be named "inference_identifier" in New Relic.

In [119]:
import numpy as np
import pandas as pd

X_df = pd.DataFrame(
    list(map(np.ravel, X_test)),
    columns=[
        "sepal_length",
        "sepal_width",
        "petal_length",
        "petal_width",
    ],
)

y_pred_df = pd.DataFrame(
    list(map(np.ravel, y_pred)),
    columns=["species"],
)
X_df.head()

|   | sepal_length | sepal_width | petal_length | petal_width |
|---|---|---|---|---|
| 0 | 6.3 | 2.5 | 4.9 | 1.5 |
| 1 | 6.8 | 3.0 | 5.5 | 2.1 |
| 2 | 6.4 | 2.8 | 5.6 | 2.2 |
| 3 | 5.6 | 3.0 | 4.1 | 1.3 |
| 4 | 4.9 | 3.6 | 1.4 | 0.1 |


In [120]:
y_pred_df.head()

|   | species |
|---|---|
| 0 | 1 |
| 1 | 2 |
| 2 | 2 |
| 3 | 1 |
| 4 | 0 |


In [121]:
import uuid

X_df["uuid"] = X_df.apply(lambda _: str(uuid.uuid4()), axis=1)

monitor.record_inference_data(
    X=X_df, y=y_pred_df, inference_identifier="uuid", calling_method="predict"
)

X_df.head()

inference data sent successfully




|   | sepal_length | sepal_width | petal_length | petal_width | uuid |
|---|---|---|---|---|---|
| 0 | 6.3 | 2.5 | 4.9 | 1.5 | a54b6946-bd78-4168-82ca-920062791f71 |
| 1 | 6.8 | 3.0 | 5.5 | 2.1 | ccffa32d-adc5-41fe-ad42-bc43a4e2324b |
| 2 | 6.4 | 2.8 | 5.6 | 2.2 | 8956305c-8702-4e48-8fd3-8504a1ea8794 |
| 3 | 5.6 | 3.0 | 4.1 | 1.3 | d2130f15-33e1-4d3f-b43a-7658088e9b8b |
| 4 | 4.9 | 3.6 | 1.4 | 0.1 | dbf0bb9e-575e-46a7-b7cc-29b2e164ae9f |
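
Because the identifier is meant to be unique for each event (row), it can be worth checking a candidate column before sending it. The following is a small standard-library sketch; the helpers `make_inference_ids` and `check_unique` are hypothetical names introduced here for illustration.

```python
import uuid
from typing import List


def make_inference_ids(n_rows: int) -> List[str]:
    """Return n_rows random UUID4 strings, one per event (row)."""
    return [str(uuid.uuid4()) for _ in range(n_rows)]


def check_unique(ids: List[str]) -> None:
    """Raise ValueError if any identifier appears more than once."""
    if len(set(ids)) != len(ids):
        raise ValueError("inference identifiers must be unique per row")


ids = make_inference_ids(30)  # 30 rows, the size of the test split above
check_unique(ids)
```

The resulting list could be assigned directly as a column, for example `X_df["uuid"] = make_inference_ids(len(X_df))`.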


##### 5.3. Pass your model or pipeline to the wrap_model() function and use the returned object as usual (fit, predict, etc.). The wrapper sends your inference data and data_metrics automatically.

In [125]:
monitor_model = wrap_model(
    model_name="Iris RandomForestClassifier",
    insert_key=insert_key,
    metadata=metadata,
    staging=True,
    model=classifier,
)
y_pred = monitor_model.predict(
    X=X_df,
    inference_identifier="uuid",
)

inference data sent successfully




In [126]:
# from new_relic_ml_performance_monitoring.monitor import wrap_model

from sklearn.feature_selection import SelectKBest, chi2
from sklearn.pipeline import Pipeline

# Set up a pipeline with a feature selection preprocessor that
# selects the top 2 features to use.
# The pipeline then uses a RandomForestClassifier to train the model.

pipeline = Pipeline(
    [
        ("feature_selection", SelectKBest(chi2, k=2)),
        ("classification", RandomForestClassifier()),
    ]
)
pipeline.fit(X_train, y_train)

metadata = {"environment": "aws", "dataset": "iris", "version": "1.0"}
pipeline = wrap_model(
    insert_key=insert_key,
    model=pipeline,
    staging=True,
    model_name="Iris RandomForestClassifier",
    metadata=metadata,
)
y_pred = pipeline.predict(X_test)

inference data sent successfully
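
The general idea behind wrapping a model can be illustrated with a small proxy class that forwards every call to the underlying model and records the inputs and outputs of `predict()`. This is a sketch of the pattern only, not the library's implementation; `RecordingModel`, `ThresholdModel`, and the `record` callback are hypothetical names, and the stand-in model exists only so the example runs without scikit-learn.

```python
from typing import Any, Callable, List, Tuple


class RecordingModel:
    """Hypothetical proxy: forwards to the wrapped model, records predict()."""

    def __init__(self, model: Any, record: Callable[[Any, Any], None]) -> None:
        self._model = model
        self._record = record

    def predict(self, X: Any) -> Any:
        y = self._model.predict(X)
        self._record(X, y)  # hand inputs and outputs to the recorder
        return y

    def __getattr__(self, name: str) -> Any:
        # Called only when normal lookup fails, so fit(), score(), etc.
        # fall through to the wrapped model unchanged.
        if name.startswith("_"):
            raise AttributeError(name)
        return getattr(self._model, name)


class ThresholdModel:
    """Stand-in model: predicts 1 when the first feature exceeds 5.0."""

    def fit(self, X: Any, y: Any) -> "ThresholdModel":
        return self

    def predict(self, X: Any) -> List[int]:
        return [int(row[0] > 5.0) for row in X]


sent: List[Tuple[Any, Any]] = []
wrapped = RecordingModel(ThresholdModel(), lambda X, y: sent.append((X, y)))
predictions = wrapped.predict([[4.9, 3.0], [6.3, 2.5]])
```

In this sketch the recorder just appends to a list; a real wrapper would send the data to a monitoring backend instead.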




### 6. Record metrics to New Relic
##### Send your model metrics to New Relic as a dictionary. You can pass new metadata; otherwise the function uses the metadata you set when creating the object. The boolean parameter "data_metric" identifies whether the metrics are data metrics (such as the mean and std of each feature) or model metrics (such as accuracy and F1 score).

In [124]:
from sklearn.metrics import (
    accuracy_score,
    f1_score,
    precision_score,
    recall_score,
)

# Model Evaluation
ac_sc = accuracy_score(y_test, y_pred)
rc_sc = recall_score(y_test, y_pred, average="weighted")
pr_sc = precision_score(y_test, y_pred, average="weighted")
f1_sc = f1_score(y_test, y_pred, average="micro")

print(f"Accuracy    : {ac_sc}")
print(f"Recall      : {rc_sc}")
print(f"Precision   : {pr_sc}")
print(f"F1 Score    : {f1_sc}")


metrics = {
    "Accuracy": ac_sc,
    "Recall": rc_sc,
    "Precision": pr_sc,
    "F1 Score": f1_sc,
}
# Send the dictionary as model metrics (data_metric=False), not data metrics.
pipeline.record_metrics(metrics=metrics, data_metric=False)

Accuracy    :  1.0
Recall      :  1.0
Precision   :  1.0
F1 Score    :  1.0
model_metric sent successfully
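
For the other kind of metrics mentioned above, data metrics such as the mean and std of each feature, a dictionary could be assembled along these lines. This is a minimal standard-library sketch; the helper `summarize_features` and the `<name>_mean` / `<name>_std` key format are assumptions, not something the library prescribes.

```python
from statistics import mean, stdev
from typing import Dict, List, Sequence


def summarize_features(
    rows: Sequence[Sequence[float]], names: List[str]
) -> Dict[str, float]:
    """Return a flat {"<name>_mean": ..., "<name>_std": ...} dictionary."""
    summary: Dict[str, float] = {}
    for name, column in zip(names, zip(*rows)):  # zip(*rows) transposes rows
        summary[f"{name}_mean"] = mean(column)
        summary[f"{name}_std"] = stdev(column)  # sample standard deviation
    return summary


data_metrics = summarize_features(
    rows=[[6.3, 2.5], [6.8, 3.0], [6.4, 2.8]],
    names=["sepal_length", "sepal_width"],
)
print(data_metrics)
```

A dictionary like this could then be passed as `metrics=data_metrics` together with `data_metric=True`.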
