# Amazon SageMaker BATCH Transform examples
With [Amazon SageMaker Batch Transform](https://docs.aws.amazon.com/sagemaker/latest/dg/batch-transform.html), Use batch transform when you need to do the following:

    Preprocess datasets to remove noise or bias that interferes with training or inference from your dataset.

    Get inferences from large datasets.

    Run inference when you don't need a persistent endpoint.

    Associate input records with inferences to assist the interpretation of results.

To filter input data before performing inferences or to associate input records with inferences about those records, see Associate Prediction Results with Input Records. For example, you can filter input data to provide context for creating and interpreting reports about the output data.For applications which require consistently low inference latency, a traditional endpoint is still the best choice.

To split input files into mini-batches when you create a batch transform job, set the SplitType parameter value to Line. If SplitType is set to None or if an input file can't be split into mini-batches, SageMaker uses the entire input file in a single request. Note that Batch Transform doesn't support CSV-formatted input that contains embedded newline characters. You can control the size of the mini-batches by using the BatchStrategy and MaxPayloadInMB parameters. MaxPayloadInMB must not be greater than 100 MB. If you specify the optional MaxConcurrentTransforms parameter, then the value of (MaxConcurrentTransforms * MaxPayloadInMB) must also not exceed 100 MB.

If the batch transform job successfully processes all of the records in an input file, it creates an output file with the same name and the .out file extension. For multiple input files, such as input1.csv and input2.csv, the output files are named input1.csv.out and input2.csv.out. The batch transform job stores the output files in the specified location in Amazon S3, such as s3://awsexamplebucket/output/.



![](./cw_charts/BatchTransform.png)





### Contents

1. [Generate synthetic data for housing models](#Generate-synthetic-data-for-housing-models)
1. [TRANSFORM the raw housing data using Scikit Learn model](#Preprocess-synthetic-housing-data-using-scikit-learn)
1. [Clean up](#CleanUp)


## Section 1 - Generate synthetic data for housing models <a id='Generate-synthetic-data-for-housing-models'></a>

In this section, you will generate synthetic data that will be used to train the linear learner models.  The data generated consists of 6 numerical features - the year the house was built in, house size in square feet, number of bedrooms, number of bathroom, the lot size and number of garages and two categorial features - deck and front_porch.  

In [54]:
import numpy as np
import pandas as pd
import json
import datetime
import time
import boto3
import sagemaker
import os

from time import gmtime, strftime
from random import choice

from sagemaker import get_execution_role

from sagemaker.multidatamodel import MULTI_MODEL_CONTAINER_MODE
from sagemaker.multidatamodel import MultiDataModel

from sklearn.model_selection import train_test_split

In [55]:
NUM_HOUSES_PER_LOCATION = 1000
LOCATIONS  = ['NewYork_NY',    'LosAngeles_CA',   'Chicago_IL',    'Houston_TX',   'Dallas_TX',
              'Phoenix_AZ',    'Philadelphia_PA', 'SanAntonio_TX', 'SanDiego_CA',  'SanFrancisco_CA']
MAX_YEAR = 2019

In [56]:
def gen_price(house):
    """Generate price based on features of the house"""
    
    if house['FRONT_PORCH'] == 'y':
        garage = 1
    else:
        garage = 0
        
    if house['FRONT_PORCH'] == 'y':
        front_porch = 1
    else:
        front_porch = 0
        
    price = int(150 * house['SQUARE_FEET'] + \
                10000 * house['NUM_BEDROOMS'] + \
                15000 * house['NUM_BATHROOMS'] + \
                15000 * house['LOT_ACRES'] + \
                10000 * garage + \
                10000 * front_porch + \
                15000 * house['GARAGE_SPACES'] - \
                5000 * (MAX_YEAR - house['YEAR_BUILT']))
    return price

In [57]:
def gen_yes_no():
    """Generate values (y/n) for categorical features"""
    answer = choice(['y', 'n'])
    return answer

In [58]:
def gen_random_house():
    """Generate a row of data (single house information)"""
    house = {'SQUARE_FEET':    np.random.normal(3000, 750),
             'NUM_BEDROOMS':  np.random.randint(2, 7),
             'NUM_BATHROOMS': np.random.randint(2, 7) / 2,
             'LOT_ACRES':     round(np.random.normal(1.0, 0.25), 2),
             'GARAGE_SPACES': np.random.randint(0, 4),
             'YEAR_BUILT':    min(MAX_YEAR, int(np.random.normal(1995, 10))),
             'FRONT_PORCH':   gen_yes_no(),
             'DECK':          gen_yes_no()
            }
    
    price = gen_price(house)
    
    return [house['YEAR_BUILT'],   
            house['SQUARE_FEET'], 
            house['NUM_BEDROOMS'], 
            house['NUM_BATHROOMS'], 
            house['LOT_ACRES'],    
            house['GARAGE_SPACES'],
            house['FRONT_PORCH'],    
            house['DECK'], 
            price]

In [59]:
def gen_houses(num_houses):
    """Generate housing dataset"""
    house_list = []
    
    for _ in range(num_houses):
        house_list.append(gen_random_house())
        
    df = pd.DataFrame(
        house_list, 
        columns=[
            'YEAR_BUILT',    
            'SQUARE_FEET',  
            'NUM_BEDROOMS',            
            'NUM_BATHROOMS',
            'LOT_ACRES',
            'GARAGE_SPACES',
            'FRONT_PORCH',
            'DECK', 
            'PRICE']
    )
    return df

In [60]:
def save_data_locally(location, train, test): 
    """Save the housing data locally"""
    os.makedirs('data/{0}/train'.format(location), exist_ok=True)
    train.to_csv('data/{0}/train/train.csv'.format(location), sep=',', header=False, index=False)
       
    os.makedirs('data/{0}/test'.format(location), exist_ok=True)
    test.to_csv('data/{0}/test/test.csv'.format(location), sep=',', header=False, index=False) 

In [61]:
#Generate housing data for multiple locations.
#Change "PARALLEL_TRAINING_JOBS " to a lower number to limit the number of training jobs and models. Or to a higher value to experiment with more models.

PARALLEL_TRAINING_JOBS = 1

for loc in LOCATIONS[:PARALLEL_TRAINING_JOBS]:
    houses = gen_houses(NUM_HOUSES_PER_LOCATION)
    
    #Spliting data into train and test in 90:10 ratio
    #Not splitting the train data into train and val because its not preprocessed yet
    train, test = train_test_split(houses, test_size=0.1)
    save_data_locally(loc, train, test)


In [62]:
#Shows the first few lines of data.
houses.head()

Unnamed: 0,YEAR_BUILT,SQUARE_FEET,NUM_BEDROOMS,NUM_BATHROOMS,LOT_ACRES,GARAGE_SPACES,FRONT_PORCH,DECK,PRICE
0,2012,2118.932793,5,1.0,0.47,2,y,n,404889
1,1996,3483.347786,3,1.0,0.73,3,n,y,508452
2,1995,3552.208136,6,1.5,0.83,0,y,y,527781
3,1983,3040.415703,6,1.0,1.12,3,y,y,432862
4,1979,2717.399624,2,2.5,1.2,3,y,n,348109


## Section 2 - Preprocess the raw housing data using Scikit Learn <a id='Preprocess-synthetic-housing-data-using-scikit-learn'></a>

In this section, the categorical features of the data (deck and porch) are pre-processed using sklearn to convert them to one hot encoding representation.  

#### We launch 4 PARALLEL jobs and hence we seemingly create 4 Transformers, but in reality they are just the SAME estimator being run again

In [99]:
import joblib

In [124]:
%%writefile scripts/sklearn_preprocessor_batch.py
from __future__ import print_function

import argparse
import csv
import json
import os
import shutil
import sys
import time
from io import StringIO

import numpy as np
import pandas as pd
from sagemaker_containers.beta.framework import (
    content_types,
    encoders,
    env,
    modules,
    transformer,
    worker,
)
from sklearn.compose import ColumnTransformer

from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import Binarizer, OneHotEncoder, StandardScaler

from sklearn.externals import joblib

# Since we get a headerless CSV file we specify the column names here.
feature_columns_names = [
    "YEAR_BUILT",
    "SQUARE_FEET",
    "NUM_BEDROOMS",
    "NUM_BATHROOMS",
    "LOT_ACRES",
    "GARAGE_SPACES",
    "FRONT_PORCH",
    "DECK",
]

label_column = "PRICE"

feature_columns_dtype = {
    "YEAR_BUILT": str,
    "SQUARE_FEET": np.float64,
    "NUM_BEDROOMS": np.float64,
    "NUM_BATHROOMS": np.float64,
    "LOT_ACRES": np.float64,
    "GARAGE_SPACES": np.float64,
    "FRONT_PORCH": str,
    "DECK": str,
}

label_column_dtype = {"PRICE": np.float64}


if __name__ == "__main__":

    parser = argparse.ArgumentParser()

    # Sagemaker specific arguments. Defaults are set in the environment variables.
    parser.add_argument("--output-data-dir", type=str, default=os.environ["SM_OUTPUT_DATA_DIR"])
    parser.add_argument("--model-dir", type=str, default=os.environ["SM_MODEL_DIR"])
    parser.add_argument("--train", type=str, default=os.environ["SM_CHANNEL_TRAIN"])

    args = parser.parse_args()

    # Take the set of files and read them all into a single pandas dataframe
    input_files = [os.path.join(args.train, file) for file in os.listdir(args.train)]
    if len(input_files) == 0:
        raise ValueError(
            (
                "There are no files in {}.\n"
                + "This usually indicates that the train channel was incorrectly specified,\n"
                + "the data specification in S3 was incorrectly specified or the role specified\n"
                + "does not have permission to access the data.".format(args.train)
            )
        )

    for file in input_files:
        print("file :", file)

    raw_data = [pd.read_csv(file, header=None, names=feature_columns_names + [label_column])]

    concat_data = pd.concat(raw_data)

    print(concat_data)

    # This section is adapted from the scikit-learn example of using preprocessing pipelines:
    #
    # https://scikit-learn.org/stable/auto_examples/compose/plot_column_transformer_mixed_types.html
    #

    numeric_features = list(feature_columns_names)
    numeric_features.remove("FRONT_PORCH")
    numeric_features.remove("DECK")
    numeric_transformer = Pipeline(steps=[("scaler", StandardScaler())])

    categorical_features = ["FRONT_PORCH", "DECK"]
    categorical_transformer = Pipeline(steps=[("onehot", OneHotEncoder(handle_unknown="ignore"))])

    preprocessor = ColumnTransformer(
        transformers=[
            ("num", numeric_transformer, numeric_features),
            ("cat", categorical_transformer, categorical_features),
        ],
        remainder="drop",
    )

    preprocessor.fit(concat_data)
    
    joblib.dump(preprocessor, os.path.join(args.model_dir, "model.joblib"))

    print("saved model!")


def input_fn(input_data, content_type):
    """Parse input data payload

    We currently only take csv input. Since we need to process both labelled
    and unlabelled data we first determine whether the label column is present
    by looking at how many columns were provided.
    """
    if content_type == "text/csv":
        # Read the raw input data as CSV.
        df = pd.read_csv(StringIO(input_data), header=None)

        if len(df.columns) == len(feature_columns_names) + 1:
            # This is a labelled example, includes the ring label
            df.columns = feature_columns_names + [label_column]
        elif len(df.columns) == len(feature_columns_names):
            # This is an unlabelled example.
            df.columns = feature_columns_names

        return df
    else:
        raise ValueError("{} not supported by script!".format(content_type))


def output_fn(prediction, accept):
    """Format prediction output

    The default accept/content-type between containers for serial inference is JSON.
    We also want to set the ContentType or mimetype as the same value as accept so the next
    container can read the response payload correctly.
    """
    if accept == "application/json":
        instances = []
        for row in prediction.tolist():
            instances.append({"features": row})

        json_output = {"instances": instances}

        return worker.Response(json.dumps(json_output), mimetype=accept)
    elif accept == "text/csv":
        return worker.Response(encoders.encode(prediction, accept), mimetype=accept)
    else:
        raise RuntimeException("{} accept type is not supported by this script.".format(accept))


def predict_fn(input_data, model):
    """Preprocess input data

    We implement this because the default uses .predict(), but our model is a preprocessor
    so we want to use .transform().

    The output is returned in the following order:

        rest of features either one hot encoded or standardized
    """

    print("Input data type ", type(input_data))

    print(input_data)

    features = model.transform(input_data)

    print("features type ", type(features))

    print(features)

    features_array = features

    print("features_array ", type(features_array))

    print(features_array)

    if label_column in input_data:
        # Return the label (as the first column) and the set of features.
        return np.insert(features_array, 0, input_data[label_column], axis=1)
    else:
        # Return only the set of features
        return features


def model_fn(model_dir):
    """Deserialize fitted model"""
    preprocessor = joblib.load(os.path.join(model_dir, "model.joblib"))
    return preprocessor



Overwriting scripts/sklearn_preprocessor_batch.py


In [125]:
sm_client = boto3.client(service_name='sagemaker')
runtime_sm_client = boto3.client(service_name='sagemaker-runtime')
sagemaker_session = sagemaker.Session()

s3 = boto3.resource('s3')
s3_client = boto3.client('s3')

BUCKET  = sagemaker_session.default_bucket()
print("BUCKET : ", BUCKET)

role = get_execution_role()
print("ROLE : ", role)

ACCOUNT_ID = boto3.client('sts').get_caller_identity()['Account']
REGION = boto3.Session().region_name

DATA_PREFIX = 'DEMO_MME_LINEAR_LEARNER'
HOUSING_MODEL_NAME = 'housing'
MULTI_MODEL_ARTIFACTS = 'multi_model_artifacts'

BUCKET :  sagemaker-us-east-1-622343165275
ROLE :  arn:aws:iam::622343165275:role/service-role/AmazonSageMaker-ExecutionRole-20220208T115633


In [126]:
#Create the SKLearn estimator with the sklearn_preprocessor.py as the script
from sagemaker.sklearn.estimator import SKLearn

script_path = 'scripts/sklearn_preprocessor_batch.py'

sklearn_estimator = SKLearn(
    entry_point=script_path,
    role=role,
    instance_type="ml.c4.xlarge",
    framework_version="0.20.0",
    sagemaker_session=sagemaker_session)

In [127]:
#Upload the raw training data to S3 bucket, to be accessed by SKLearn
train_inputs = []

for loc in LOCATIONS[:PARALLEL_TRAINING_JOBS]:

    train_input = sagemaker_session.upload_data(
        path='data/{}/train/train.csv'.format(loc),
        bucket=BUCKET,
        key_prefix='housing-data/{}/train'.format(loc)
    )
    
    train_inputs.append(train_input)
    print("Raw training data uploaded to : ", train_input)

Raw training data uploaded to :  s3://sagemaker-us-east-1-622343165275/housing-data/NewYork_NY/train/train.csv


In [128]:
##Launch multiple scikit learn training to process the raw synthetic data generated for multiple locations.
##Before executing this, take the training instance limits in your account and cost into consideration.

sklearn_estimators = []
sklearn_estimator_jobs = []

for index, loc in enumerate(LOCATIONS[:PARALLEL_TRAINING_JOBS]):
    print("sklearn_estimator fit input data at ", index , " for loc ", loc)
     
    job_name='scikit-learnestimator-{}'.format(strftime('%Y-%m-%d-%H-%M-%S', gmtime()))
    
    sklearn_estimator.fit({'train': train_inputs[index]}, job_name=job_name, wait=False)

    sklearn_estimators.append(sklearn_estimator)
    sklearn_estimator_jobs.append(job_name)
    
    time.sleep(1)

sklearn_estimator fit input data at  0  for loc  NewYork_NY


In [130]:
#Wait for the preprocessor jobs to finish
for job_name in sklearn_estimator_jobs:
    print('Waiting for job {} to complete...'.format(job_name))
    
    waiter = sm_client.get_waiter('training_job_completed_or_stopped')
    waiter.wait(TrainingJobName=job_name)

Waiting for job scikit-learnestimator-2022-08-04-16-41-55 to complete...


## Section 2 Bring your own Model
Here we will work on the tar file created from the training job and create all 
needed jobs definetions from scratch and run the transform job. We will run 4 different kinds

    Run with the Input filter not There so only generate predictions
    Run with Input filter values so we can generate predictions and also combine to the outputs
    Run with Mini batch and instance count > 1

#### 2a ) We show how to create 'n' BATCH Job Transformer from the Estimator Object
All of these will run in Parallel there by saving time but leverage the same model which has been trained
Here we already have the Estimator with the Inference file with the model definetions
we will leverage that to create a transformer and run

In [148]:
PARALLEL_BATCH_JOBS=1

In [149]:
##Once the preprocessor is fit, use tranformer to preprocess the raw training data and store the transformed data right back into s3.
##Before executing this, take the training instance limits in your account and cost into consideration.

sklearn_estimator_transformers = []

for index, loc in enumerate(LOCATIONS[:PARALLEL_BATCH_JOBS]):
    print("Transform the raw data at ", index , " for loc ", loc)
       
    sklearn_estimator = sklearn_estimators[index]
    
    transformer = sklearn_estimator.transformer(
        instance_count=1,
        instance_type='ml.m4.xlarge',
        assemble_with='Line',
        accept='text/csv'
    )
    
    sklearn_estimator_transformers.append(transformer)

Transform the raw data at  0  for loc  NewYork_NY


In [150]:
# Preprocess training input
preprocessed_train_data_path = []

for index, transformer in enumerate(sklearn_estimator_transformers):
    transformer.transform(train_inputs[index], content_type='text/csv', wait=False)
    print('STARTING: batch transform job: {}'.format(transformer.latest_transform_job.job_name))
    preprocessed_train_data_path.append(transformer.output_path)

STARTING: batch transform job: sagemaker-scikit-learn-2022-08-04-17-28-50-218


In [69]:
#Wait for all the batch transform jobs to finish
for transformer in sklearn_estimator_transformers: 
    job_name=transformer.latest_transform_job.job_name
    print('Waiting for TRANSFORM job {} to complete...'.format(job_name))
    
    waiter = sm_client.get_waiter('transform_job_completed_or_stopped')
    waiter.wait(TransformJobName=job_name)

Waiting for TRANSFORM job sagemaker-scikit-learn-2022-08-04-00-14-06-343 to complete...


## Section 2b ) Bring your own Model as tar ball in S3
Here we will use the Tar ball as is and then create all the required artifacts from scratch

In [144]:
print(f"Using location of the TAR ball from {sklearn_estimators[0].model_data}")

Using location of the TAR ball from s3://sagemaker-us-east-1-622343165275/scikit-learnestimator-2022-08-04-16-41-55/output/model.tar.gz


In [131]:
sklearn_estimators[0].model_data

's3://sagemaker-us-east-1-622343165275/scikit-learnestimator-2022-08-04-16-41-55/output/model.tar.gz'

In [140]:
# Retrieve the Container image
container = sagemaker.image_uris.retrieve(region=boto3.Session().region_name, framework="sklearn", version="0.20.0") # 0.23-1"
model_data_new_loc = sklearn_estimators[0].model_data
#"s3://sagemaker-us-east-1-622343165275/scikit-learnestimator-2022-08-04-13-49-30/output/model.tar.gz"


In [141]:
from sagemaker.sklearn.model import SKLearnModel
sklearn_model_name = "DEMO-BATCH-SKLEARN-BYO-" + time.strftime("%Y-%m-%d-%H-%M-%S", time.gmtime())

sklearn_model = SKLearnModel(
    name=sklearn_model_name,
    model_data=model_data_new_loc, #kmeans.model_data,
    role=role,
    sagemaker_session=sagemaker_session,
    entry_point="scripts/sklearn_preprocessor_batch.py",
    framework_version="0.20.0", #"0.23-1", #"0.20.0",
    image_uri=container,
    #source_dir="scripts",
)
print(sklearn_model.source_dir)
print(sklearn_model.entry_point)

None
scripts/sklearn_preprocessor_batch.py


In [142]:
batch_transformer = sklearn_model.transformer(
        instance_count=1,
        instance_type='ml.m4.xlarge',
        assemble_with='Line',
        accept='text/csv'
)
    


In [143]:
batch_transformer.transform(
    train_inputs[index], 
    content_type='text/csv', 
    wait=True, 
    logs=True
)

...................................[34mProcessing /opt/ml/code
  DEPRECATION: A future pip version will change local packages to be built in-place without first copying to a temporary directory. We recommend you use --use-feature=in-tree-build to test your packages with this new behavior before it becomes the default.
   pip 21.3 will remove support for this functionality. You can find discussion regarding this at https://github.com/pypa/pip/issues/7555.[0m
[34mBuilding wheels for collected packages: sklearn-preprocessor-batch
  Building wheel for sklearn-preprocessor-batch (setup.py): started
  Building wheel for sklearn-preprocessor-batch (setup.py): finished with status 'done'
  Created wheel for sklearn-preprocessor-batch: filename=sklearn_preprocessor_batch-1.0.0-py2.py3-none-any.whl size=7596 sha256=4f7b3769240ec2e92c307f82093621ac1ae697b02a75aa80aba15362ce525450
  Stored in directory: /tmp/pip-ephem-wheel-cache-fusm0deg/wheels/3e/0f/51/2f1df833dd0412c1bc2f5ee56baac195b5be5633

#### Now we list the data and view it

In [157]:
out_file_name = '{}/train.csv.out'.format(batch_transformer.output_path)
out_file_name 

's3://sagemaker-us-east-1-622343165275/DEMO-BATCH-SKLEARN-BYO-2022-08-04-17-18-2022-08-04-17-18-35-189/train.csv.out'

In [167]:
# - download the file
sagemaker.s3.S3Downloader().download(s3_uri=out_file_name, local_path='./data/output', sagemaker_session=sagemaker_session)  
output_df = pd.read_csv(filepath_or_buffer='./data/output/train.csv.out', header=None)
output_df.head(5)


Unnamed: 0,0,1,2,3,4,5,6,7,8,9,10
0,265736.0,-0.744936,-0.831786,-0.009418,-1.420279,0.106776,-1.312254,1.0,0.0,1.0,0.0
1,391170.0,-1.136893,0.188842,-1.422126,-1.420279,-0.045943,0.451792,0.0,1.0,1.0,0.0
2,438672.0,-0.450968,0.039764,1.403289,1.367619,-1.458591,-0.430231,1.0,0.0,0.0,1.0
3,646078.0,0.822892,1.401877,1.403289,-0.723305,1.099448,-0.430231,1.0,0.0,1.0,0.0
4,349224.0,-1.528849,-0.417606,1.403289,-0.02633,-0.809537,1.333814,1.0,0.0,0.0,1.0


#### Run the Transformer with a JOIN to the INPUT DATA set with column 'Year_built'

In [181]:
batch_transformer = sklearn_model.transformer(
    instance_count=1,
    instance_type='ml.m4.xlarge',
    assemble_with='Line',
    accept='text/csv',
    max_concurrent_transforms=8,
    strategy="MultiRecord",
    max_payload=6,
)

batch_transformer.transform(
    train_inputs[index], 
    content_type='text/csv', 
    input_filter=None,
    join_source="Input",
    output_filter='$[0,-11]',
    split_type='Line',
    wait=True, 
    logs=True
)

Using already existing model: DEMO-BATCH-SKLEARN-BYO-2022-08-04-17-18-24


..................................[34mProcessing /opt/ml/code
  DEPRECATION: A future pip version will change local packages to be built in-place without first copying to a temporary directory. We recommend you use --use-feature=in-tree-build to test your packages with this new behavior before it becomes the default.
   pip 21.3 will remove support for this functionality. You can find discussion regarding this at https://github.com/pypa/pip/issues/7555.[0m
[34mBuilding wheels for collected packages: sklearn-preprocessor-batch
  Building wheel for sklearn-preprocessor-batch (setup.py): started
  Building wheel for sklearn-preprocessor-batch (setup.py): finished with status 'done'
  Created wheel for sklearn-preprocessor-batch: filename=sklearn_preprocessor_batch-1.0.0-py2.py3-none-any.whl size=7596 sha256=619f461e3ff117788d69f59010845e1ac426bae9455f6245e986ed66a13034fb
  Stored in directory: /tmp/pip-ephem-wheel-cache-_hn0ck04/wheels/3e/0f/51/2f1df833dd0412c1bc2f5ee56baac195b5be56335

In [183]:
out_file_name = '{}/train.csv.out'.format(batch_transformer.output_path)
print(out_file_name)
# - download the file
sagemaker.s3.S3Downloader().download(s3_uri=out_file_name, local_path='./data/output', sagemaker_session=sagemaker_session)  
output_df2 = pd.read_csv(filepath_or_buffer='./data/output/train.csv.out', header=None)
output_df2.head(5)

s3://sagemaker-us-east-1-622343165275/DEMO-BATCH-SKLEARN-BYO-2022-08-04-17-18-2022-08-04-18-25-40-089/train.csv.out


Unnamed: 0,0,1
0,1987,265736.0
1,1983,391170.0
2,1990,438672.0
3,2003,646078.0
4,1979,349224.0


## Clean up<a id='CleanUp'></a>
Clean up the endpoint to avoid unneccessary costs.



In [None]:
#Delete the endpoint and underlying model
predictor.delete_model() 
predictor.delete_endpoint()
for t in preprocessor_transformers:
    t.delete_model()