# Getting started with Superwise.ai on AWS Sagemaker

In this notebook, we demonstrate how to integrate a SageMaker-based development workflow with Superwise.ai.

Part I of this notebook walks you through building a classical linear regression model for predicting diamond prices, using scikit-learn on SageMaker.
It is based on [AWS Sagemaker's example for building a Scikit-Learn model](https://github.com/aws/amazon-sagemaker-examples/blob/master/sagemaker-python-sdk/scikit_learn_randomforest/Sklearn_on_SageMaker_end2end.ipynb).

Part II walks you through setting up Superwise.ai to start tracking your model, by registering it and providing a baseline for the model's behavior.

Part III will demonstrate how to send new predictions from your model to Superwise.ai, simulating a post-deployment scenario.

At this point, you should be able to start seeing insights from Superwise.ai in the web portal.


## Part I - building a Sagemaker Model to predict diamond prices

This is a classical LinearRegression model that uses a publicly available dataset.

This guide is based on the best practices from [AWS Sagemaker's example for building a Scikit-Learn model](https://github.com/aws/amazon-sagemaker-examples/blob/master/sagemaker-python-sdk/scikit_learn_randomforest/Sklearn_on_SageMaker_end2end.ipynb).


### Prerequisites

1. A Superwise.ai account that enables you to login and view insights
2. A set of API keys for sending data to Superwise.ai 
3. Permissions to create models, training jobs and inference endpoints inside Sagemaker
4. Permissions for Superwise.ai to access your SageMaker bucket (this requirement is expected to be removed soon)

Note: this notebook works best when run from within a [SageMaker notebook instance](https://docs.aws.amazon.com/sagemaker/latest/dg/nbi.html).

### Setup

In [None]:
# set the Superwise API Keys
%env SUPERWISE_CLIENT_NAME=
%env SUPERWISE_SECRET=
%env SUPERWISE_CLIENT_ID=
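Before moving on, it can help to confirm that the three variables above are actually populated. The following is a minimal sketch; the helper name `missing_env_vars` is hypothetical and not part of the Superwise package.

```python
import os

# Variable names taken from the setup cell above.
REQUIRED_VARS = ("SUPERWISE_CLIENT_NAME", "SUPERWISE_SECRET", "SUPERWISE_CLIENT_ID")


def missing_env_vars(names=REQUIRED_VARS):
    """Return the names of variables that are unset or empty (hypothetical helper)."""
    return [name for name in names if not os.environ.get(name)]


missing = missing_env_vars()
if missing:
    print("Missing Superwise settings: " + ", ".join(missing))
else:
    print("All Superwise settings are present")
```

Running this check early avoids discovering an empty key only when the Superwise client is created in Part II.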

In [11]:
import datetime
import time
import tarfile


import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split


import boto3
from sagemaker import get_execution_role
import sagemaker


sm_boto3 = boto3.client("sagemaker")

sess = sagemaker.Session()

region = sess.boto_session.region_name

bucket = sess.default_bucket()  # this could also be a hard-coded bucket name

print("Using bucket " + bucket)

Using bucket sagemaker-us-east-2-352030274094


### Prepare data
We load the diamond prices dataset from a public CSV file, split it, and upload it to S3.

In [12]:
# we use the Diamond prices dataset 
df = pd.read_csv('https://bit.ly/36gldOJ').drop(columns=['Unnamed: 0'])

In [13]:
print(df.shape)
df.head(3)

(53940, 10)


Unnamed: 0,carat,cut,color,clarity,depth,table,price,x,y,z
0,0.23,Ideal,E,SI2,61.5,55.0,326,3.95,3.98,2.43
1,0.21,Premium,E,SI1,59.8,61.0,326,3.89,3.84,2.31
2,0.23,Good,E,VS1,56.9,65.0,327,4.05,4.07,2.31


#### Dataset Description
This dataset contains the prices and other attributes of almost 54,000 diamonds.
- price: price in US dollars (\$326--\$18,823)
- carat: weight of the diamond (0.2--5.01)
- cut: quality of the cut (Fair, Good, Very Good, Premium, Ideal)
- color: diamond colour, from J (worst) to D (best)
- clarity: a measurement of how clear the diamond is (I1 (worst), SI2, SI1, VS2, VS1, VVS2, VVS1, IF (best))
- x: length in mm (0--10.74)
- y: width in mm (0--58.9)
- z: depth in mm (0--31.8)
- depth: total depth percentage = z / mean(x, y) = 2 * z / (x + y) (43--79)
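The `depth` definition above can be checked against the data. The sketch below recomputes it for the first three rows shown in the `df.head(3)` output; the factor of 100 is added here to express the ratio as a percentage, and small differences are expected because the published values are rounded.

```python
import pandas as pd

# First three rows of the dataset, copied from the df.head(3) output above.
sample = pd.DataFrame(
    {
        "x": [3.95, 3.89, 4.05],
        "y": [3.98, 3.84, 4.07],
        "z": [2.43, 2.31, 2.31],
        "depth": [61.5, 59.8, 56.9],
    }
)

# depth (%) = 2 * z / (x + y) * 100
sample["depth_recomputed"] = 2 * sample["z"] / (sample["x"] + sample["y"]) * 100
sample["abs_diff"] = (sample["depth"] - sample["depth_recomputed"]).abs()
print(sample.round(2))
```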


#### Split the data into train and test sets

In [14]:
X = df.drop(columns="price")
y = df["price"]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

#### Upload the data into S3 so that Sagemaker can access it

In [15]:
train = X_train.copy()
train["price"] = y_train

test = X_test.copy()
test["price"] = y_test

train.to_csv("diamonds_train.csv")
test.to_csv("diamonds_test.csv")

# send data to S3. SageMaker will take training data from s3
trainpath = sess.upload_data(
    path="diamonds_train.csv", bucket=bucket, key_prefix="sagemaker/sklearncontainer"
)

testpath = sess.upload_data(
    path="diamonds_test.csv", bucket=bucket, key_prefix="sagemaker/sklearncontainer"
)


### Creating the Script for Training and Inference
The following script contains both training and inference functionality. 

It is designed so that it can run either on SageMaker training hardware or locally (desktop, SageMaker notebook, on-prem, etc.).

Detailed guidance is available [here](https://sagemaker.readthedocs.io/en/stable/using_sklearn.html#preparing-the-scikit-learn-training-script).

**Note**

In order to support the integration with Superwise.ai, our Inference API expects **a unique ID** to be sent along with the features for prediction.

Later on, this ID will be sent to Superwise.ai, enabling analyses such as correlating predictions with late-arriving labels.


In [16]:
%%writefile script.py

import argparse
import joblib
import os
from io import BytesIO

import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder


feature_column_names = [
    'carat',
    'cut',
    'color',
    'clarity', 
    'depth',
    'table',
    'x',
    'y',
    'z'
    ]

label_column = 'price'

id_column = 'id_col'


# inference functions
def model_fn(model_dir):
    clf = joblib.load(os.path.join(model_dir, "model.joblib"))
    return clf


def input_fn(input_data, content_type):
    """Parse input data payload

    We expect a serialized numpy array as input.
    The numpy array should contain, as the first column, a unique ID, followed by the feature values.
    """
    if content_type == "application/x-npy":
        # Read the raw input
        load_bytes = BytesIO(input_data)
        input_np = np.load(load_bytes, allow_pickle=True)
        num_columns = input_np.shape[1]
        
        # Expect the first column to be a unique ID 
        if num_columns != len(feature_column_names) + 1: 
            raise ValueError(f"payload needs to contain {len(feature_column_names) + 1} columns, with first column being the ID of the record")
        df_columns = [id_column] + feature_column_names
        df = pd.DataFrame(data=input_np, columns=df_columns) 
        return df
    else:
        raise ValueError(f"content type {content_type} is not supported by this inference endpoint. Please send a legal application/x-npy payload")

        
# TODO - check if need to roll it back to the client or will SM track it as part of the input-output correlation
                         
def predict_fn(input_data, model):
    # predict using the features
    prediction = model.predict(input_data[feature_column_names])
    # Concatenate the ID column to the predictions to keep track 
    return np.stack((input_data[id_column], prediction), axis=1 )


# Model training function
if __name__ == "__main__":

    print("extracting arguments")
    parser = argparse.ArgumentParser()
    
    
    # hyperparameters sent by the client are passed as command-line arguments to the script.
    parser.add_argument("--n-jobs", type=int, default=1)
    
    # Data, model, and output directories
    parser.add_argument("--model-dir", type=str, default=os.environ.get("SM_MODEL_DIR"))
    parser.add_argument("--train", type=str, default=os.environ.get("SM_CHANNEL_TRAIN"))
    parser.add_argument("--test", type=str, default=os.environ.get("SM_CHANNEL_TEST"))
    parser.add_argument("--train-file", type=str, default="diamonds_train.csv")
    parser.add_argument("--test-file", type=str, default="diamonds_test.csv")
    

    args, _ = parser.parse_known_args()

    print("reading data")
    train_df = pd.read_csv(os.path.join(args.train, args.train_file))
    test_df = pd.read_csv(os.path.join(args.test, args.test_file))

    # preprocessing

    
    print(train_df.columns)
    X_train = train_df[feature_column_names]
    y_train = train_df[label_column]
    
    print(X_train.head(3))
    X_test = test_df[feature_column_names]
    y_test = test_df[label_column]
    
    
    categorical_cols = ['cut','clarity','color']
    preprocessor = ColumnTransformer( 
        transformers=[
                    ('categorical',  OneHotEncoder(), categorical_cols)
                    ], remainder='passthrough')
    
    # train
    
    diamond_price_model=LinearRegression(n_jobs=args.n_jobs)

    my_pipeline = Pipeline(steps=[
        ('preprocessor', preprocessor),
        ('model', diamond_price_model)
        ])

    print("training model")
    
    my_pipeline.fit(X_train, y_train)
    
    
    train_predictions =  my_pipeline.predict(X_train)
    print(train_predictions)
    
    # print abs error
    print("validating model")
    abs_err = np.abs(my_pipeline.predict(X_test) - y_test)
    print(f"Test prediction error:{abs_err} ")

    # persist model
    path = os.path.join(args.model_dir, "model.joblib")
    joblib.dump(my_pipeline, path)
    print("model persisted at " + path)
    
   
    

Overwriting script.py


### Local training
Test the training locally using the script file


In [17]:
! python script.py --n-jobs 1 \
                   --model-dir ./ \
                   --train ./ \
                   --test ./ \

extracting arguments
reading data
Index(['Unnamed: 0', 'carat', 'cut', 'color', 'clarity', 'depth', 'table', 'x',
       'y', 'z', 'price'],
      dtype='object')
   carat    cut color clarity  depth  table     x     y     z
0   1.21  Ideal     H    VVS2   61.3   57.0  6.92  6.87  4.23
1   0.31  Ideal     E     VS2   62.0   56.0  4.38  4.36  2.71
2   1.21  Ideal     E     VS1   62.4   57.0  6.75  6.83  4.24
training model
[8059.33067384  606.62346058 8567.48039848 ... 1739.45892698 2501.07703433
 6787.8108665 ]
validating model
Test prediction error:0         152.565705
1         996.404196
2         705.029482
3         776.075619
4        2970.562344
            ...     
16177      65.877016
16178     198.983953
16179    2547.207129
16180    1145.708761
16181     670.888210
Name: price, Length: 16182, dtype: float64 
model persisted at ./model.joblib
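The script prints one absolute error per test row, which is hard to read at a glance. As a sketch of how those errors could be condensed into single numbers, the helper below computes the mean absolute error and root mean squared error; `summarize_errors` is a hypothetical name and the input values are illustrative, not taken from the diamonds data.

```python
import numpy as np


def summarize_errors(y_true, y_pred):
    """Return (mean absolute error, root mean squared error)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_pred - y_true
    mae = float(np.mean(np.abs(residuals)))
    rmse = float(np.sqrt(np.mean(residuals ** 2)))
    return mae, rmse


# Illustrative values only, not taken from the diamonds data.
mae, rmse = summarize_errors([100, 200, 300], [110, 190, 330])
print(f"MAE: {mae:.2f}, RMSE: {rmse:.2f}")
```

Inside `script.py`, the same function could be applied to `y_test` and `my_pipeline.predict(X_test)` if a summary is preferred over the per-row output.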


### Local inference

Test the inference flow:
1. load model
2. process raw input (serialized numpy array)
3. predict and return results 

**Note** 

We attach a unique ID per instance for the prediction.
For demonstration purposes, we use the DataFrame index as the ID of each instance.

In a production setting, you should use an ID that has semantic meaning in the context of your application, such as transaction_id etc.


In [18]:
import script

In [19]:
local_model = script.model_fn(".")

In [20]:
from io import BytesIO
rows = X_train.head(10)
#convert the Dataframe index to a string ID
indexes = rows.index.map(str)
inference_input = rows.to_numpy()

#attach the ID to the payload
inference_input = np.insert(inference_input, 0, indexes, axis=1)
np_bytes = BytesIO()
# Serialize the payload
np.save(np_bytes, inference_input, allow_pickle=True)


In [21]:
input_data = script.input_fn(np_bytes.getvalue(), "application/x-npy")
input_data

Unnamed: 0,id_col,carat,cut,color,clarity,depth,table,x,y,z
0,19497,1.21,Ideal,H,VVS2,61.3,57,6.92,6.87,4.23
1,31229,0.31,Ideal,E,VS2,62.0,56,4.38,4.36,2.71
2,22311,1.21,Ideal,E,VS1,62.4,57,6.75,6.83,4.24
3,278,0.81,Ideal,F,SI2,62.6,55,5.92,5.96,3.72
4,6646,0.79,Ideal,I,VVS2,61.7,56,5.94,5.95,3.67
5,30711,0.33,Ideal,E,VS2,61.1,55,4.47,4.51,2.74
6,17532,1.0,Premium,F,VS1,59.2,60,6.53,6.48,3.85
7,6352,0.9,Very Good,E,SI1,60.1,58,6.25,6.29,3.77
8,18684,1.06,Ideal,F,VS2,61.4,56,6.56,6.61,4.04
9,53429,0.7,Ideal,F,SI1,60.7,57,5.78,5.82,3.52


In [22]:
predictions = script.predict_fn(input_data, local_model)
predictions

array([['19497', 8059.3306738397],
       ['31229', 606.623460581668],
       ['22311', 8567.480398475895],
       ['278', 3030.1949027971514],
       ['6646', 3861.8104740338913],
       ['30711', 824.5124921334345],
       ['17532', 6442.444361816638],
       ['6352', 4712.366279256058],
       ['18684', 6793.556467932655],
       ['53429', 2978.5136692657784]], dtype=object)
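Note that the result is an array with `dtype=object`, because it mixes string IDs with float predictions. A minimal sketch of turning such an array into a typed DataFrame follows; the column names `record_id` and `prediction` match the ones used later in this notebook, and the two rows are copied from the output above.

```python
import numpy as np
import pandas as pd

# Two rows copied from the predict_fn output above.
raw = np.array(
    [["19497", 8059.3306738397], ["31229", 606.623460581668]],
    dtype=object,
)

result = pd.DataFrame(raw, columns=["record_id", "prediction"])
# Both columns start out with object dtype, so cast them explicitly.
result["record_id"] = result["record_id"].astype(str)
result["prediction"] = result["prediction"].astype(float)

print(result.dtypes)
```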

### SageMaker Training

#### Launching a training job with the Python SDK

In [23]:
# We use the Estimator from the SageMaker Python SDK
from sagemaker.sklearn.estimator import SKLearn

FRAMEWORK_VERSION = "0.23-1"

sklearn_estimator = SKLearn(
    entry_point="script.py",
    role=get_execution_role(),
    instance_count=1,
    instance_type="ml.m5.large",
    framework_version=FRAMEWORK_VERSION,
    base_job_name="Superwise-Sagemaker-job",
    hyperparameters={
        "n-jobs": 1,
        },
)

In [24]:
# launch training job, with asynchronous call
sklearn_estimator.fit({"train": trainpath, "test": testpath}, wait=False)

In [25]:
sklearn_estimator.latest_training_job.wait(logs="None")
artifact = sm_boto3.describe_training_job(
    TrainingJobName=sklearn_estimator.latest_training_job.name
)["ModelArtifacts"]["S3ModelArtifacts"]

print("Model artifact persisted at " + artifact)


2021-10-11 08:18:34 Starting - Starting the training job
2021-10-11 08:18:36 Starting - Launching requested ML instances.........
2021-10-11 08:19:27 Starting - Preparing the instances for training................
2021-10-11 08:20:53 Downloading - Downloading input data...
2021-10-11 08:21:14 Training - Downloading the training image.....
2021-10-11 08:21:43 Training - Training image download completed. Training in progress..
2021-10-11 08:21:54 Uploading - Uploading generated training model.
2021-10-11 08:22:01 Completed - Training job completed
Model artifact persisted at s3://sagemaker-us-east-2-352030274094/Superwise-Sagemaker-job-2021-10-11-08-18-34-286/output/model.tar.gz


### Deploy to a real-time endpoint

Create a SageMaker *Model* from the S3 artifacts, and deploy it to an Endpoint.

In [26]:
from sagemaker.sklearn.model import SKLearnModel

model = SKLearnModel(
    name="Superwise-Sagemaker-demo-model-2",
    model_data=artifact,
    role=get_execution_role(),
    entry_point="script.py",
    framework_version=FRAMEWORK_VERSION,
)

In [27]:
predictor = model.deploy(instance_type="ml.t2.medium", initial_instance_count=1)

--------------!

### Test the Endpoint by running an Inference request

In [28]:
# Prepare the input as a numpy array with record_id, feature 1... feature n

# Treat the Dataframe index as the record_id
indexes = X_train.index.map(str)
X_train_inference = X_train.to_numpy()
X_train_inference = np.insert(X_train_inference, 0, indexes, axis=1)

In [29]:
predictions = predictor.predict(X_train_inference)

# Returns a column with ID and the prediction value
predictions

array([['19497', 8059.3306738397],
       ['31229', 606.623460581668],
       ['22311', 8567.480398475895],
       ...,
       ['38158', 1739.458926984018],
       ['860', 2501.0770343272297],
       ['15795', 6787.810866496322]], dtype=object)

## Part II - Tracking the model with Superwise.ai


### Setup

1. Install the Superwise Python package from pip
2. Set environment variables with the API keys
3. Create a Superwise client

In [30]:
%pip install superwise

Note: you may need to restart the kernel to use updated packages.


In [31]:
%env SUPERWISE_CLIENT_NAME=<your-client-name>
%env SUPERWISE_SECRET=<your-secret>
%env SUPERWISE_CLIENT_ID=<your-client-id>


env: SUPERWISE_CLIENT_NAME=<your-client-name>
env: SUPERWISE_SECRET=<your-secret>
env: SUPERWISE_CLIENT_ID=<your-client-id>


In [32]:

from superwise import Superwise
from superwise.models.task import Task
from superwise.models.version import Version
from superwise.models.data_entity import DataEntity
from superwise.resources.superwise_enums import TaskTypes,FeatureType,DataTypesRoles

sw = Superwise()



### Create a Superwise *Task*

A *Task* represents a domain problem.
In our case, the task is to predict diamond prices.

Over time, we may develop and deploy different ML models that attempt to address this task.

In Superwise.ai terminology, each specific ML model we wish to track is called a *Version*.
There may be multiple *Versions* belonging to a *Task* being tracked at any point in time (e.g. new models in shadow mode, or A/B tests).



In [33]:

# Create the Task entity
diamond_task = Task(
    task_type=TaskTypes.REGRESSION,
    title="Superwise-Sagemaker-demo-model-4",
    task_description="4.0",
    monitor_delay=1)
my_task = sw.task.create(diamond_task)

### Create a Baseline for our deployed model

We've just deployed a model to Sagemaker, and wish to start tracking it.
In order to perform the analysis of the model's performance over time, we need to set up a Baseline for the model's behavior.

It's a common practice to use the training or test data (both features and predictions) as the baseline, as they represent the state which we consider stable and validated.

Later, when the model performs predictions in production, we can compare the data and prediction behavior to the baseline, and detect drift.

The baseline data includes:

1. Features
2. Labels
3. Model predictions
4. Timestamp of inference
5. ID for each record (later used to correlate predictions with labels)
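To make the idea of comparing production data against a baseline concrete, here is a sketch of one common drift statistic, the Population Stability Index (PSI). This is an illustration only: it is not a description of how Superwise.ai computes drift, the function name is hypothetical, and the data is synthetic.

```python
import numpy as np


def population_stability_index(baseline, current, bins=10):
    """PSI between two 1-D samples, using quantile bins derived from the baseline."""
    baseline = np.asarray(baseline, dtype=float)
    current = np.asarray(current, dtype=float)
    # Interior bin edges come from baseline quantiles.
    edges = np.quantile(baseline, np.linspace(0, 1, bins + 1))[1:-1]
    base_counts = np.bincount(np.searchsorted(edges, baseline), minlength=bins)
    curr_counts = np.bincount(np.searchsorted(edges, current), minlength=bins)
    # A small floor avoids division by zero and log(0) for empty bins.
    base_frac = np.clip(base_counts / baseline.size, 1e-6, None)
    curr_frac = np.clip(curr_counts / current.size, 1e-6, None)
    return float(np.sum((curr_frac - base_frac) * np.log(curr_frac / base_frac)))


rng = np.random.default_rng(0)
baseline_prices = rng.normal(4000, 1000, size=5000)  # synthetic baseline predictions
stable_prices = rng.normal(4000, 1000, size=5000)    # same distribution
shifted_prices = rng.normal(5000, 1000, size=5000)   # mean shifted upward

print("stable :", round(population_stability_index(baseline_prices, stable_prices), 4))
print("shifted:", round(population_stability_index(baseline_prices, shifted_prices), 4))
```

A sample drawn from the same distribution as the baseline yields a PSI close to zero, while the shifted sample yields a clearly larger value.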

In [34]:
# add the prediction value, a timestamp and the label to the training features
baseline_data = X_train.assign(prediction=predictions[:,1].astype(float),ts=pd.Timestamp.now(),price=y_train)

# treat the Dataframe index as a record ID - for demonstration purpose only. 
baseline_df = baseline_data.reset_index().rename(columns={"index": "record_id"})

In [35]:
baseline_df

Unnamed: 0,record_id,carat,cut,color,clarity,depth,table,x,y,z,prediction,ts,price
0,19497,1.21,Ideal,H,VVS2,61.3,57.0,6.92,6.87,4.23,8059.330674,2021-10-11 08:47:30.706706,8131
1,31229,0.31,Ideal,E,VS2,62.0,56.0,4.38,4.36,2.71,606.623461,2021-10-11 08:47:30.706706,756
2,22311,1.21,Ideal,E,VS1,62.4,57.0,6.75,6.83,4.24,8567.480398,2021-10-11 08:47:30.706706,10351
3,278,0.81,Ideal,F,SI2,62.6,55.0,5.92,5.96,3.72,3030.194903,2021-10-11 08:47:30.706706,2795
4,6646,0.79,Ideal,I,VVS2,61.7,56.0,5.94,5.95,3.67,3861.810474,2021-10-11 08:47:30.706706,4092
...,...,...,...,...,...,...,...,...,...,...,...,...,...
37753,11284,1.05,Very Good,I,VS2,62.4,59.0,6.48,6.51,4.05,5309.736888,2021-10-11 08:47:30.706706,4975
37754,44732,0.47,Ideal,D,VS1,61.0,55.0,5.03,5.01,3.06,2373.297217,2021-10-11 08:47:30.706706,1617
37755,38158,0.33,Very Good,F,IF,60.3,58.0,4.49,4.46,2.70,1739.458927,2021-10-11 08:47:30.706706,1014
37756,860,0.90,Premium,J,SI1,62.8,59.0,6.13,6.03,3.82,2501.077034,2021-10-11 08:47:30.706706,2871


### Create a *Schema* object that describes the format and semantics of our Baseline data

The Schema object helps Superwise.ai interpret our data, for example to understand which column represents predictions and which represents the labels.

In [36]:
schema = [
    DataEntity(name="x", type=FeatureType.NUMERIC, role=DataTypesRoles.FEATURE), #features...
    DataEntity(name="y", type=FeatureType.NUMERIC,  role=DataTypesRoles.FEATURE),
    DataEntity(name="z", type=FeatureType.NUMERIC, role=DataTypesRoles.FEATURE),
    DataEntity(name="table", type=FeatureType.NUMERIC, role=DataTypesRoles.FEATURE),
    DataEntity(name="depth", type=FeatureType.NUMERIC,  role=DataTypesRoles.FEATURE),
    DataEntity(name="clarity", type=FeatureType.CATEGORICAL,  role=DataTypesRoles.FEATURE),
    DataEntity(name="color", type=FeatureType.CATEGORICAL, role=DataTypesRoles.FEATURE),
    DataEntity(name="cut", type=FeatureType.CATEGORICAL,  role=DataTypesRoles.FEATURE),
    DataEntity(name="carat", type=FeatureType.NUMERIC,role=DataTypesRoles.FEATURE),
    DataEntity(name="price", type=FeatureType.NUMERIC,  role=DataTypesRoles.LABEL), # labels. When creating baseline, we use the training labels
    DataEntity(name="prediction", type=FeatureType.NUMERIC, role=DataTypesRoles.PREDICTION_VALUE), #model predictions
    DataEntity(name="ts", type=FeatureType.TIMESTAMP,role=DataTypesRoles.TIMESTAMP), #timestamp of the inference. for baseline we use now()
    DataEntity(name="record_id", type=FeatureType.CATEGORICAL, role=DataTypesRoles.ID) #the ID field
]
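Since the schema refers to columns by name, a mismatch between the schema and the DataFrame is an easy mistake to make. The sketch below compares the two using plain pandas, without calling the Superwise package; `compare_columns` is a hypothetical helper and the example frame is made up for illustration.

```python
import pandas as pd

# Column names declared in the schema above.
schema_columns = [
    "x", "y", "z", "table", "depth", "clarity", "color", "cut", "carat",
    "price", "prediction", "ts", "record_id",
]


def compare_columns(df, expected):
    """Return (missing, unexpected) column names relative to the expected list."""
    actual = set(df.columns)
    expected_set = set(expected)
    missing = [name for name in expected if name not in actual]
    unexpected = [name for name in df.columns if name not in expected_set]
    return missing, unexpected


# Hypothetical one-row frame that lacks "ts" and carries an extra "notes" column.
example = pd.DataFrame([{name: 0 for name in schema_columns if name != "ts"}])
example["notes"] = "n/a"

print(compare_columns(example, schema_columns))
```

In this notebook, the same check could be run as `compare_columns(baseline_df, schema_columns)` before creating the *Version*.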

### Create a *Version* object

As explained above, a *Version* represents a concrete ML model we are tracking.

A *Version* solves a *Task*

A *Version* has a *Baseline*




In [66]:
diamond_version = Version(
    task_id=my_task.id,
    version_name="5.0",
    baseline_df=baseline_df,
    data_entities=schema,
)
my_version = sw.version.create(diamond_version, wait_until_complete=True)


In [67]:
sw.version.activate(my_version.id)


<Response [204]>

### Verifying the setup

TODO: screenshots of the UI - currently not working

## Part III - monitoring ongoing predictions

Now that we have a *Version* of the model set up with a *Baseline*, we can start sending ongoing model predictions to Superwise.ai to monitor the model's performance in a production setting.

For this demo, we will treat the Test split of the data as our "ongoing predictions".


In [50]:
# insert the record ID as part of the input payload to the model
indexes = X_test.index.map(str)
X_test_inference = X_test.to_numpy()
X_test_inference = np.insert(X_test_inference, 0, indexes, axis=1)
X_test_inference



array([['1388', 0.24, 'Ideal', ..., 3.97, 4.0, 2.47],
       ['50052', 0.58, 'Very Good', ..., 5.44, 5.42, 3.26],
       ['41645', 0.4, 'Ideal', ..., 4.76, 4.74, 2.95],
       ...,
       ['24786', 1.51, 'Premium', ..., 7.42, 7.38, 4.5],
       ['1332', 0.71, 'Good', ..., 5.8, 5.9, 3.44],
       ['42527', 0.4, 'Ideal', ..., 4.79, 4.77, 2.9]], dtype=object)

In [51]:
predictions = predictor.predict(X_test_inference)
predictions

array([['1388', 711.5657050055879],
       ['50052', 3197.4041957920304],
       ['41645', 1943.0294818178354],
       ...,
       ['24786', 10609.792871186859],
       ['1332', 4105.708761255265],
       ['42527', 1993.888210273787]], dtype=object)

In [57]:
# Note: we provide the column names we declared in the Schema object, so that Superwise.ai will be able to interpret the data
ongoing_predictions = pd.DataFrame(data=predictions, columns=["record_id", "prediction"])

# Append the Task and Version IDs to uniquely identify the data origin
ongoing_predictions["task_id"] = my_task.id
ongoing_predictions["version_id"] = my_version.id
ongoing_predictions

Unnamed: 0,record_id,prediction,task_id,version_id
0,1388,711.566,35,9
1,50052,3197.4,35,9
2,41645,1943.03,35,9
3,42377,2080.08,35,9
4,17244,9871.56,35,9
...,...,...,...,...
16177,29577,639.123,35,9
16178,12564,5476.98,35,9
16179,24786,10609.8,35,9
16180,1332,4105.71,35,9


In [58]:
# Store the model predictions in a CSV and upload it to S3
ongoing_predictions.to_csv("test_predictions_log.csv")
logs = sess.upload_data(
    path="test_predictions_log.csv", bucket=bucket, key_prefix="sagemaker/sw_predictions"
)

In [59]:
# Upload the stored predictions to Superwise.ai
sw.data.log_file(logs)

### Optional - report ongoing labels to Superwise.ai

In some cases, our system is able to gather "ground truth" labels for its predictions.
Often, this happens later on, after the prediction has already been made.

By sending these labels to Superwise.ai, we add another important layer of data to our monitoring solution.

For the purpose of this demo, we can use the test set's labels as the ground truth, simulating a label we collected in production.
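The record ID is what makes it possible to match a late-arriving label to the prediction it belongs to. As a local illustration of that matching (not a description of what Superwise.ai does internally), the sketch below joins three predictions and labels from this notebook on `record_id`; the labels are deliberately listed in a different order.

```python
import pandas as pd

# Three predictions and their labels, copied from the tables in this notebook.
predictions_log = pd.DataFrame(
    {"record_id": ["1388", "50052", "41645"], "prediction": [711.566, 3197.4, 1943.03]}
)
labels_log = pd.DataFrame(
    {"record_id": ["41645", "1388", "50052"], "price": [1238, 559, 2201]}
)

# The inner join keeps only records that have both a prediction and a label;
# validate="one_to_one" raises if an ID appears more than once on either side.
joined = predictions_log.merge(
    labels_log, on="record_id", how="inner", validate="one_to_one"
)
joined["abs_error"] = (joined["prediction"] - joined["price"]).abs()
print(joined)
```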


In [62]:
# Note: we provide the column names we declared in the Schema object, so that Superwise.ai will be able to interpret the data
indexes = y_test.index.map(str)
ground_truth = pd.DataFrame(data=y_test, columns=['price'])
ground_truth['record_id'] = indexes
ground_truth["task_id"] = my_task.id
ground_truth["version_id"] = my_version.id

In [63]:
ground_truth

Unnamed: 0,price,record_id,task_id,version_id
1388,559,1388,35,9
50052,2201,50052,35,9
41645,1238,41645,35,9
42377,1304,42377,35,9
17244,6901,17244,35,9
...,...,...,...,...
29577,705,29577,35,9
12564,5278,12564,35,9
24786,13157,24786,35,9
1332,2960,1332,35,9


In [64]:
# Save the labels to a .csv and store it on S3
ground_truth.to_csv("test_labels_log.csv")
logs = sess.upload_data(
    path="test_labels_log.csv", bucket=bucket, key_prefix="sagemaker/sw_predictions"
)

In [65]:
# Report the labels to Superwise.ai
sw.data.log_file(logs)

### Verifying the setup
TODO: screenshots of the UI - currently not working

## Don't forget to delete the endpoint!

In [68]:
sm_boto3.delete_endpoint(EndpointName=predictor.endpoint_name)


{'ResponseMetadata': {'RequestId': 'f45eb4e1-68a9-464d-b327-e2f8ea3dd8f4',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'x-amzn-requestid': 'f45eb4e1-68a9-464d-b327-e2f8ea3dd8f4',
   'content-type': 'application/x-amz-json-1.1',
   'content-length': '0',
   'date': 'Mon, 11 Oct 2021 09:25:08 GMT'},
  'RetryAttempts': 0}}