# Amazon SageMaker Batch Transform: Associate prediction results with their corresponding input records
https://github.com/aws/amazon-sagemaker-examples/blob/master/sagemaker_batch_transform/batch_transform_associate_predictions_with_input/Batch%20Transform%20-%20breast%20cancer%20prediction%20with%20high%20level%20SDK.ipynb

## Setup

Let's start by specifying:

* The SageMaker role ARN used to give training and batch transform access to your data. The snippet below uses the same role as your SageMaker notebook instance. Otherwise, specify the full ARN of a role with the AmazonSageMakerFullAccess policy attached.
* The S3 bucket that you want to use for training and storing model objects.

In [1]:
import os
import boto3
import sagemaker

role = sagemaker.get_execution_role()
sess = sagemaker.Session()

bucket = sess.default_bucket()
prefix = "DEMO-breast-cancer-prediction-tf-batch-transform"

---
## Data sources

> https://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+%28Diagnostic%29  
> https://www.kaggle.com/uciml/breast-cancer-wisconsin-data

## Attributes
1) ID number  
2) Diagnosis (M = malignant, B = benign)  
3-32)  

Ten real-valued features are computed for each cell nucleus, and each is reported as three statistics (_mean: mean / _se: standard error / _worst: mean of the three largest values):

a) radius (mean of distances from center to points on the perimeter)  
b) texture (standard deviation of gray-scale values)  
c) perimeter  
d) area  
e) smoothness (local variation in radius lengths)  
f) compactness (perimeter^2 / area - 1.0)  
g) concavity (severity of concave portions of the contour)  
h) concave points (number of concave portions of the contour)  
i) symmetry  
j) fractal dimension ("coastline approximation" - 1)  
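
As a minimal sketch of how the 32 column names follow from the list above, the snippet below builds them from the ten measurements and the three suffixes. The variable names (`features`, `suffixes`, `columns`) are illustrative and not part of the original notebook.

```python
# The ten per-nucleus measurements, in the order used by the dataset.
features = [
    "radius", "texture", "perimeter", "area", "smoothness",
    "compactness", "concavity", "concave points", "symmetry",
    "fractal_dimension",
]
# Three statistics per measurement.
suffixes = ["mean", "se", "worst"]

# Columns are grouped by statistic: all means, then all SEs, then all worsts.
columns = ["id", "diagnosis"] + [f"{f}_{s}" for s in suffixes for f in features]

print(len(columns))  # 32
print(columns[:4])
```

The result matches the hand-written `data.columns` list used in the data preparation step.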

## Data preparation


Let's download the raw file (wdbc.csv), add column names, save the result locally as data.csv, and take a look at it.

In [2]:
import pandas as pd
import numpy as np

s3 = boto3.client("s3")

filename = "wdbc.csv"
s3.download_file("sagemaker-sample-files", "datasets/tabular/breast_cancer/wdbc.csv", filename)
data = pd.read_csv(filename, header=None)

# specify columns extracted from wdbc.names
data.columns = [
    "id",
    "diagnosis",
    "radius_mean",
    "texture_mean",
    "perimeter_mean",
    "area_mean",
    "smoothness_mean",
    "compactness_mean",
    "concavity_mean",
    "concave points_mean",
    "symmetry_mean",
    "fractal_dimension_mean",
    "radius_se",
    "texture_se",
    "perimeter_se",
    "area_se",
    "smoothness_se",
    "compactness_se",
    "concavity_se",
    "concave points_se",
    "symmetry_se",
    "fractal_dimension_se",
    "radius_worst",
    "texture_worst",
    "perimeter_worst",
    "area_worst",
    "smoothness_worst",
    "compactness_worst",
    "concavity_worst",
    "concave points_worst",
    "symmetry_worst",
    "fractal_dimension_worst",
]

# save the data
data.to_csv("data.csv", sep=",", index=False)

data.head(8)

Unnamed: 0,id,diagnosis,radius_mean,texture_mean,perimeter_mean,area_mean,smoothness_mean,compactness_mean,concavity_mean,concave points_mean,...,radius_worst,texture_worst,perimeter_worst,area_worst,smoothness_worst,compactness_worst,concavity_worst,concave points_worst,symmetry_worst,fractal_dimension_worst
0,842302,M,17.99,10.38,122.8,1001.0,0.1184,0.2776,0.3001,0.1471,...,25.38,17.33,184.6,2019.0,0.1622,0.6656,0.7119,0.2654,0.4601,0.1189
1,842517,M,20.57,17.77,132.9,1326.0,0.08474,0.07864,0.0869,0.07017,...,24.99,23.41,158.8,1956.0,0.1238,0.1866,0.2416,0.186,0.275,0.08902
2,84300903,M,19.69,21.25,130.0,1203.0,0.1096,0.1599,0.1974,0.1279,...,23.57,25.53,152.5,1709.0,0.1444,0.4245,0.4504,0.243,0.3613,0.08758
3,84348301,M,11.42,20.38,77.58,386.1,0.1425,0.2839,0.2414,0.1052,...,14.91,26.5,98.87,567.7,0.2098,0.8663,0.6869,0.2575,0.6638,0.173
4,84358402,M,20.29,14.34,135.1,1297.0,0.1003,0.1328,0.198,0.1043,...,22.54,16.67,152.2,1575.0,0.1374,0.205,0.4,0.1625,0.2364,0.07678
5,843786,M,12.45,15.7,82.57,477.1,0.1278,0.17,0.1578,0.08089,...,15.47,23.75,103.4,741.6,0.1791,0.5249,0.5355,0.1741,0.3985,0.1244
6,844359,M,18.25,19.98,119.6,1040.0,0.09463,0.109,0.1127,0.074,...,22.88,27.66,153.2,1606.0,0.1442,0.2576,0.3784,0.1932,0.3063,0.08368
7,84458202,M,13.71,20.83,90.2,577.9,0.1189,0.1645,0.09366,0.05985,...,17.06,28.14,110.6,897.0,0.1654,0.3682,0.2678,0.1556,0.3196,0.1151


#### Key observations:
* The data has 569 observations and 32 columns.
* **The first field is the 'id' attribute, which we will want to drop before batch inference and add to the final inference output next to the model's predicted probability.**
* The second field, 'diagnosis', is an indicator of the actual diagnosis ('M' = Malignant; 'B' = Benign).
* There are 30 other numeric features that we will use for training and inference.

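Let's replace the M/B diagnosis with a numeric 0/1 label. Note that the cell below maps 'M' to 0 and 'B' to 1, so the model's sigmoid output is the probability of a benign diagnosis.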

In [3]:
data["diagnosis"] = data["diagnosis"].apply(lambda x: 0 if x == "M" else 1)
data.head(8)

Unnamed: 0,id,diagnosis,radius_mean,texture_mean,perimeter_mean,area_mean,smoothness_mean,compactness_mean,concavity_mean,concave points_mean,...,radius_worst,texture_worst,perimeter_worst,area_worst,smoothness_worst,compactness_worst,concavity_worst,concave points_worst,symmetry_worst,fractal_dimension_worst
0,842302,0,17.99,10.38,122.8,1001.0,0.1184,0.2776,0.3001,0.1471,...,25.38,17.33,184.6,2019.0,0.1622,0.6656,0.7119,0.2654,0.4601,0.1189
1,842517,0,20.57,17.77,132.9,1326.0,0.08474,0.07864,0.0869,0.07017,...,24.99,23.41,158.8,1956.0,0.1238,0.1866,0.2416,0.186,0.275,0.08902
2,84300903,0,19.69,21.25,130.0,1203.0,0.1096,0.1599,0.1974,0.1279,...,23.57,25.53,152.5,1709.0,0.1444,0.4245,0.4504,0.243,0.3613,0.08758
3,84348301,0,11.42,20.38,77.58,386.1,0.1425,0.2839,0.2414,0.1052,...,14.91,26.5,98.87,567.7,0.2098,0.8663,0.6869,0.2575,0.6638,0.173
4,84358402,0,20.29,14.34,135.1,1297.0,0.1003,0.1328,0.198,0.1043,...,22.54,16.67,152.2,1575.0,0.1374,0.205,0.4,0.1625,0.2364,0.07678
5,843786,0,12.45,15.7,82.57,477.1,0.1278,0.17,0.1578,0.08089,...,15.47,23.75,103.4,741.6,0.1791,0.5249,0.5355,0.1741,0.3985,0.1244
6,844359,0,18.25,19.98,119.6,1040.0,0.09463,0.109,0.1127,0.074,...,22.88,27.66,153.2,1606.0,0.1442,0.2576,0.3784,0.1932,0.3063,0.08368
7,84458202,0,13.71,20.83,90.2,577.9,0.1189,0.1645,0.09366,0.05985,...,17.06,28.14,110.6,897.0,0.1654,0.3682,0.2678,0.1556,0.3196,0.1151
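
The `lambda` above sends every value other than "M" to 1, including any unexpected label. As a hedged alternative, the sketch below uses an explicit mapping with the same encoding (M -> 0, B -> 1) and fails loudly on unknown labels. The toy DataFrame and the name `label_map` are assumptions for illustration only.

```python
import pandas as pd

# Toy stand-in for the real `diagnosis` column.
toy = pd.DataFrame({"diagnosis": ["M", "B", "B", "M"]})

# Explicit mapping: same encoding as the cell above (M -> 0, B -> 1).
label_map = {"M": 0, "B": 1}
encoded = toy["diagnosis"].map(label_map)

# Any label outside the mapping becomes NaN, so this check catches dirty data.
assert encoded.notna().all(), "unexpected diagnosis label"

toy["diagnosis"] = encoded.astype(int)
print(toy["diagnosis"].tolist())  # [0, 1, 1, 0]
```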


Let's split the data as follows: 80% for training, 10% for validation, and 10% set aside for our batch inference job. Because the split draws a random number per row, these proportions are approximate. We drop the 'id' field from the training and validation sets because it is not a training feature. For the batch set, however, we keep 'id': it has to be filtered out before inference so that the input features match those of the training set, and it is ultimately joined with the inference results. We do drop the 'diagnosis' attribute from the batch set, since that is what we are trying to predict.

In [4]:
# data split in three sets, training, validation and batch inference
rand_split = np.random.rand(len(data))
train_list = rand_split < 0.8
val_list = (rand_split >= 0.8) & (rand_split < 0.9)
batch_list = rand_split >= 0.9

data_train = data[train_list].drop(["id"], axis=1)
data_train_x = data_train.drop(["diagnosis"], axis=1)
data_train_y = data_train["diagnosis"]

data_val = data[val_list].drop(["id"], axis=1)
data_val_x = data_val.drop(["diagnosis"], axis=1)
data_val_y = data_val["diagnosis"]

data_batch = data[batch_list].drop(["diagnosis"], axis=1)
data_batch_noID = data_batch.drop(["id"], axis=1)
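
The split above uses `np.random.rand` without a seed, so the three subsets change on every run. A minimal sketch of a repeatable variant follows, assuming a seeded NumPy generator is acceptable; the seed value and the toy DataFrame are arbitrary choices for illustration, not part of the original notebook.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for `data`; only the row count matters here.
toy = pd.DataFrame({"id": range(100), "x": np.arange(100) * 0.5})

# A seeded generator makes the split repeatable across notebook runs.
rng = np.random.default_rng(seed=42)  # the seed value is an arbitrary choice
rand_split = rng.random(len(toy))

train_mask = rand_split < 0.8
val_mask = (rand_split >= 0.8) & (rand_split < 0.9)
batch_mask = rand_split >= 0.9

# Every row lands in exactly one subset.
membership = train_mask.astype(int) + val_mask.astype(int) + batch_mask.astype(int)
assert (membership == 1).all()

print(train_mask.sum(), val_mask.sum(), batch_mask.sum())
```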

In [5]:
train_dir = os.path.join(os.getcwd(), 'data/train')
os.makedirs(train_dir, exist_ok=True)

val_dir = os.path.join(os.getcwd(), 'data/val')
os.makedirs(val_dir, exist_ok=True)

batch_dir = os.path.join(os.getcwd(), 'data/batch')
os.makedirs(batch_dir, exist_ok=True)

In [62]:
np.save(os.path.join(train_dir, 'data_train_x.npy'), data_train_x.to_numpy())
np.save(os.path.join(train_dir, 'data_train_y.npy'), data_train_y.to_numpy())

np.save(os.path.join(val_dir, 'data_val_x.npy'), data_val_x.to_numpy())
np.save(os.path.join(val_dir, 'data_val_y.npy'), data_val_y.to_numpy())

# np.save(os.path.join(batch_dir, 'data_batch.npy'), data_train_x.to_numpy())
# np.save(os.path.join(batch_dir, 'data_batch_noID.npy'), data_train_y.to_numpy())

data_batch.to_csv(os.path.join(batch_dir, 'data_batch.csv'), sep=',', index=False, header=False)
data_batch_noID.to_csv(os.path.join(batch_dir, 'data_batch_noID.csv'), sep=',', index=False, header=False)

Let's upload those data sets to S3.

In [63]:
!aws s3 sync ./data s3://{bucket}/{prefix}/data

upload: data/train/data_train_x.npy to s3://sagemaker-ap-northeast-2-889750940888/DEMO-breast-cancer-prediction-tf-batch-transform/data/train/data_train_x.npy
upload: data/val/data_val_x.npy to s3://sagemaker-ap-northeast-2-889750940888/DEMO-breast-cancer-prediction-tf-batch-transform/data/val/data_val_x.npy
upload: data/batch/data_batch_noID.csv to s3://sagemaker-ap-northeast-2-889750940888/DEMO-breast-cancer-prediction-tf-batch-transform/data/batch/data_batch_noID.csv
upload: data/train/data_train_y.npy to s3://sagemaker-ap-northeast-2-889750940888/DEMO-breast-cancer-prediction-tf-batch-transform/data/train/data_train_y.npy
upload: data/batch/data_batch.csv to s3://sagemaker-ap-northeast-2-889750940888/DEMO-breast-cancer-prediction-tf-batch-transform/data/batch/data_batch.csv
upload: data/val/data_val_y.npy to s3://sagemaker-ap-northeast-2-889750940888/DEMO-breast-cancer-prediction-tf-batch-transform/data/val/data_val_y.npy


Verify that the S3 upload completed as expected.

In [64]:
!aws s3 ls s3://{bucket}/{prefix}/data --recursive

2022-01-04 12:23:44      12508 DEMO-breast-cancer-prediction-tf-batch-transform/data/batch/.ipynb_checkpoints/data_batch-checkpoint.csv
2022-01-04 13:15:31      11431 DEMO-breast-cancer-prediction-tf-batch-transform/data/batch/.ipynb_checkpoints/data_batch_noID-checkpoint.csv
2022-01-04 16:20:59      13290 DEMO-breast-cancer-prediction-tf-batch-transform/data/batch/data_batch.csv
2022-01-04 16:20:59      12841 DEMO-breast-cancer-prediction-tf-batch-transform/data/batch/data_batch_noID.csv
2022-01-04 16:20:59     106448 DEMO-breast-cancer-prediction-tf-batch-transform/data/train/data_train_x.npy
2022-01-04 16:20:59       3672 DEMO-breast-cancer-prediction-tf-batch-transform/data/train/data_train_y.npy
2022-01-04 16:20:59      15728 DEMO-breast-cancer-prediction-tf-batch-transform/data/val/data_val_x.npy
2022-01-04 16:20:59        648 DEMO-breast-cancer-prediction-tf-batch-transform/data/val/data_val_y.npy
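
The listing includes `.ipynb_checkpoints` files that Jupyter created under `data/batch` during an earlier run. They are harmless when the transform job points at a single file, but they would be picked up if a whole S3 prefix were used as input. The sketch below removes such directories locally before syncing; the helper name `remove_checkpoint_dirs` is hypothetical, and the demonstration runs on a temporary directory rather than the real `data` folder.

```python
import os
import shutil
import tempfile


def remove_checkpoint_dirs(root):
    """Delete every `.ipynb_checkpoints` directory under `root`; return their paths."""
    removed = []
    for dirpath, dirnames, _ in os.walk(root):
        if ".ipynb_checkpoints" in dirnames:
            target = os.path.join(dirpath, ".ipynb_checkpoints")
            shutil.rmtree(target)
            dirnames.remove(".ipynb_checkpoints")  # do not descend into it
            removed.append(target)
    return removed


# Demonstration on a throwaway directory tree.
demo_root = tempfile.mkdtemp()
os.makedirs(os.path.join(demo_root, "batch", ".ipynb_checkpoints"))
open(os.path.join(demo_root, "batch", "data_batch.csv"), "w").close()

removed = remove_checkpoint_dirs(demo_root)
print(len(removed))  # 1
```

Copies that were already uploaded stay in S3 until they are deleted there. Alternatively, `aws s3 sync` accepts an `--exclude` pattern that skips matching files at upload time.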


---

## Training job and model creation

In [126]:
%%writefile source_dir/tf_batch.py
import argparse
import numpy as np
import os
import tensorflow as tf

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, BatchNormalization

os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3' 

def parse_args():
    
    parser = argparse.ArgumentParser()

    # hyperparameters supplied by the user arrive as command-line arguments
    parser.add_argument('--epochs', type=int, default=1)
    parser.add_argument('--batch_size', type=int, default=64)
    parser.add_argument('--learning_rate', type=float, default=0.1)
    
    # data directories
    parser.add_argument('--train', type=str, default=os.environ.get('SM_CHANNEL_TRAIN'))
    parser.add_argument('--validation', type=str, default=os.environ.get('SM_CHANNEL_VALIDATION'))
    
    # model directory: we will use the default set by SageMaker, /opt/ml/model
    parser.add_argument('--model_dir', type=str, default=os.environ.get('SM_MODEL_DIR'))
    
    return parser.parse_known_args()

    
def get_train_data(train_dir):
    
    x_train = np.load(os.path.join(train_dir, 'data_train_x.npy'))
    y_train = np.load(os.path.join(train_dir, 'data_train_y.npy'))
    print('x_train', x_train.shape,'y_train', y_train.shape)

    return x_train, y_train


def get_validation_data(validation_dir):
    
    x_validation = np.load(os.path.join(validation_dir, 'data_val_x.npy'))
    y_validation = np.load(os.path.join(validation_dir, 'data_val_y.npy'))
    print('x_validation', x_validation.shape,'y_validation', y_validation.shape)

    return x_validation, y_validation

if __name__ == "__main__":
    args, _ = parse_args()
    
    x_train, y_train = get_train_data(args.train)
    x_validation, y_validation = get_validation_data(args.validation)
    
    device = '/cpu:0' 
    print(device)
    batch_size = args.batch_size
    epochs = args.epochs
    learning_rate = args.learning_rate
    print('batch_size = {}, epochs = {}, learning rate = {}'.format(batch_size, epochs, learning_rate))

    with tf.device(device):
        model = tf.keras.Sequential([
                # input layer
                tf.keras.layers.Dense(30, input_shape=(30,), activation='relu'),
                tf.keras.layers.Dense(15, activation='relu'),
                tf.keras.layers.Dense(10,activation = 'relu'),
                # we use sigmoid for binary output
                # output layer
                tf.keras.layers.Dense(1, activation='sigmoid')
            ]
        )

        model.summary()
        
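        # pass the learning_rate hyperparameter to Adam explicitly;
        # optimizer='adam' would ignore it and use the Keras default
        model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=learning_rate),
                      loss='binary_crossentropy',
                      metrics=['accuracy', 'mse'])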
        model.fit(x_train, y_train, batch_size=batch_size, epochs=epochs,
                  validation_data=(x_validation, y_validation))

        # evaluate on the validation set; scores = [loss, accuracy, mse]
        scores = model.evaluate(x_validation, y_validation, batch_size, verbose=2)
        print("\nValidation [loss, accuracy, mse]:", scores)
        
        model.save(args.model_dir + '/1')

Overwriting source_dir/tf_batch.py
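
The training script receives its hyperparameters as `--name value` command-line arguments and reads them with `parse_known_args`, which tolerates flags the script does not declare. The sketch below reproduces that behavior outside SageMaker; the `argv` list and the `--some_other_flag` argument are made up for illustration.

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--epochs", type=int, default=1)
parser.add_argument("--batch_size", type=int, default=64)
parser.add_argument("--learning_rate", type=float, default=0.1)

# Simulated command line; `--some_other_flag` is a made-up extra argument.
argv = ["--epochs", "5", "--learning_rate", "0.001", "--some_other_flag", "x"]

# parse_known_args returns (namespace, leftovers) instead of failing on unknown flags.
args, unknown = parser.parse_known_args(argv)

print(args.epochs, args.batch_size, args.learning_rate)  # 5 64 0.001
print(unknown)  # ['--some_other_flag', 'x']
```

Arguments that are not supplied, such as `--batch_size` here, keep the defaults declared in the parser.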


### Local mode training

In [132]:
from sagemaker.tensorflow import TensorFlow

model_dir = '/opt/ml/model'
instance_type = 'local'
hyperparameters = {'epochs': 5, 'batch_size': 10, 'learning_rate': 0.001}

local_estimator = TensorFlow(source_dir='source_dir',
                             entry_point='tf_batch.py',
                             model_dir=model_dir,
                             instance_type=instance_type,
                             instance_count=1,
                             hyperparameters=hyperparameters,
                             role=sagemaker.get_execution_role(),
                             base_job_name='tf-batch-transform',
                             framework_version='2.1',
                             py_version='py3')

In [133]:
inputs = {'train': f'file://{train_dir}',
          'validation': f'file://{val_dir}'}

local_estimator.fit(inputs)

Creating gidya6i13m-algo-1-e3pb9 ... 
Creating gidya6i13m-algo-1-e3pb9 ... done
Attaching to gidya6i13m-algo-1-e3pb9
gidya6i13m-algo-1-e3pb9 | 2022-01-04 16:49:56,607 sagemaker-training-toolkit INFO     Imported framework sagemaker_tensorflow_container.training
gidya6i13m-algo-1-e3pb9 | 2022-01-04 16:49:56,614 sagemaker-training-toolkit INFO     No GPUs detected (normal if no gpus installed)
gidya6i13m-algo-1-e3pb9 | 2022-01-04 16:49:56,767 sagemaker-training-toolkit INFO     No GPUs detected (normal if no gpus installed)
gidya6i13m-algo-1-e3pb9 | 2022-01-04 16:49:56,783 sagemaker-training-toolkit INFO     No GPUs detected (normal if no gpus installed)
gidya6i13m-algo-1-e3pb9 | 2022-01-04 16:49:56,797 sagemaker-training-toolkit INFO     No GPUs detected (normal if no gpus installed)
gidya6i13m-algo-1-e3pb9 | 2022-01-04 16:49:56,808 sagemaker-training-toolkit INFO     Invoking user script
gidya6i13m-algo-1-e3pb9 |
gidy

In [134]:
local_predictor = local_estimator.deploy(instance_type='local', initial_instance_count=1)

update_endpoint is a no-op in sagemaker>=2.
See: https://sagemaker.readthedocs.io/en/stable/v2.html for details.


Attaching to psel7oqy88-algo-1-r6i9n
psel7oqy88-algo-1-r6i9n | INFO:__main__:starting services
psel7oqy88-algo-1-r6i9n | INFO:tfs_utils:using default model name: model
psel7oqy88-algo-1-r6i9n | INFO:tfs_utils:tensorflow serving model config: 
psel7oqy88-algo-1-r6i9n | model_config_list: {
psel7oqy88-algo-1-r6i9n |   config: {
psel7oqy88-algo-1-r6i9n |     name: "model",
psel7oqy88-algo-1-r6i9n |     base_path: "/opt/ml/model",
psel7oqy88-algo-1-r6i9n |     model_platform: "tensorflow"
psel7oqy88-algo-1-r6i9n |   }
psel7oqy88-algo-1-r6i9n | }
psel7oqy88-algo-1-r6i9n |
psel7oqy88-algo-1-r6i9n |
psel7oqy88-algo-1-r6i9n | INFO:__main__:using default model name: model
psel7oqy88-algo-1-r6i9n | INFO:__main__:tensorflow serving model config: 
psel7oqy88-algo-1-r6i9n | model_config_list: {
psel7oqy88-algo-1-r6i9n |   config: {
psel7oqy88-algo-1-

In [135]:
# named `payload` to avoid shadowing the built-in input()
payload = {
  'instances': np.random.rand(30).reshape(-1, 30)
}
result = local_predictor.predict(payload)
result

psel7oqy88-algo-1-r6i9n | 172.18.0.1 - - [04/Jan/2022:16:50:06 +0000] "POST /invocations HTTP/1.1" 200 43 "-" "python-urllib3/1.26.7"


{'predictions': [[0.588115573]]}
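
The endpoint returns one sigmoid output per instance. Under the encoding used in In [3] (M -> 0, B -> 1), that value is the estimated probability of a benign diagnosis. The sketch below turns it into a label; the helper `to_label` and the 0.5 threshold are assumptions for illustration, and the value shown came from random input, so it carries no diagnostic meaning.

```python
def to_label(probability, threshold=0.5):
    """Map a sigmoid output to a diagnosis under the M -> 0, B -> 1 encoding."""
    return "B" if probability >= threshold else "M"


# Same shape as the response shown above: one row per instance, one value per row.
response = {"predictions": [[0.588115573]]}

probability = response["predictions"][0][0]
print(to_label(probability))  # B
```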

In [136]:
local_predictor.delete_endpoint()

Gracefully stopping... (press Ctrl+C again to force)


### Managed training

In [137]:
from sagemaker.tensorflow import TensorFlow

model_dir = '/opt/ml/model'
instance_type = 'ml.c5.xlarge'
hyperparameters = {'epochs': 200, 'batch_size': 10, 'learning_rate': 0.001}

estimator = TensorFlow(source_dir='source_dir',
                       entry_point='tf_batch.py',
                       model_dir=model_dir,
                       instance_type=instance_type,
                       instance_count=1,
                       hyperparameters=hyperparameters,
                       role=sagemaker.get_execution_role(),
                       base_job_name='tf-batch-transform',
                       framework_version='2.1',
                       py_version='py3')

In [138]:
inputs = {'train': f's3://{bucket}/{prefix}/data/train',
          'validation': f's3://{bucket}/{prefix}/data/val'}

estimator.fit(inputs, wait=True)

2022-01-04 16:50:33 Starting - Starting the training job...
2022-01-04 16:50:35 Starting - Launching requested ML instances
ProfilerReport-1641315032: InProgress
......
2022-01-04 16:51:46 Starting - Preparing the instances for training......
2022-01-04 16:53:01 Downloading - Downloading input data
2022-01-04 16:53:01 Training - Downloading the training image...
2022-01-04 16:53:26 Training - Training image download completed. Training in progress.
2022-01-04 16:53:19,656 sagemaker-training-toolkit INFO     Imported framework sagemaker_tensorflow_container.training
2022-01-04 16:53:19,662 sagemaker-training-toolkit INFO     No GPUs detected (normal if no gpus installed)
2022-01-04 16:53:19,991 sagemaker-training-toolkit INFO     No GPUs detected (normal if no gpus installed)
2022-01-04 16:53:20,006 sagemaker-training-toolkit INFO     No GPUs detected (normal if no gpus installed)
2022-01-04 16:53:20,020 sagemaker-training-toolkit INFO     No GPUs 

---
## Batch Transform

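SageMaker Batch Transform provides three attributes for associating predictions with input records: **input_filter, join_source and output_filter**. In the cells below, we use the SageMaker Python SDK to launch a Batch Transform job. The three attributes appear in the `transform` call but are commented out, because this notebook drops the ID column inside `inference.py` instead. Refer to the SageMaker documentation on associating prediction results with input records to learn more about how to use them.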
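
To make the roles of the three attributes concrete, here is a local emulation for a single CSV record. It is a sketch of the documented behavior, not the service's implementation. It assumes typical values for this use case: `input_filter="$[1:]"` (drop the first column), `join_source="Input"` (append the prediction to the input record) and `output_filter="$[0,-1]"` (keep the first and last fields). The names `emulate_transform` and `fake_predict` are hypothetical.

```python
def emulate_transform(line, predict):
    """Local emulation of input_filter -> model -> join_source -> output_filter."""
    columns = line.strip().split(",")

    # input_filter="$[1:]": the model sees every column except the first (the ID).
    model_input = [float(v) for v in columns[1:]]
    prediction = predict(model_input)

    # join_source="Input": the prediction is appended to the original record.
    joined = columns + [str(prediction)]

    # output_filter="$[0,-1]": keep only the first (ID) and last (prediction) fields.
    return ",".join([joined[0], joined[-1]])


def fake_predict(features):
    """Stand-in model (hypothetical): returns a constant instead of calling an endpoint."""
    return 0.75


print(emulate_transform("842302,17.99,10.38,122.8", fake_predict))  # 842302,0.75
```

Each output line then carries the record ID next to its prediction, which is the association this notebook is about.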

#### Prepare `inference.py`
https://sagemaker.readthedocs.io/en/stable/frameworks/tensorflow/deploying_tensorflow_serving.html?highlight=inference.py

In [139]:
%%writefile source_dir/inference.py
import json

def input_handler(data, context):
    """ Pre-process request input before it is sent to TensorFlow Serving REST API
    Args:
        data (obj): the request data, in format of dict or string
        context (Context): an object containing request and configuration details
    Returns:
        (dict): a JSON-serializable dict that contains request body and headers
    """
    if context.request_content_type == 'text/csv':
        request = data.read().decode('utf-8').rstrip('\n')
        request = [float(x) for x in request.split(',')]
        request.pop(0) # Remove "ID" column
        
        return json.dumps({
            'instances': [request]
        })

    raise ValueError('{{"error": "unsupported content type {}"}}'.format(
        context.request_content_type or "unknown"))


def output_handler(data, context):
    """Post-process TensorFlow Serving output before it is returned to the client.
    Args:
        data (obj): the TensorFlow serving response
        context (Context): an object containing request and configuration details
    Returns:
        (bytes, string): data to return to client, response content type
    """
    if data.status_code != 200:
        raise ValueError(data.content.decode('utf-8'))

    response_content_type = context.accept_header
    
    prediction = json.loads(data.content.decode("utf-8"))['predictions'][0][0]
    output = json.dumps({'predictions': prediction})

    return output, response_content_type

Overwriting source_dir/inference.py
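
Before launching a job, the CSV branch of `input_handler` can be exercised locally. The sketch below repeats that branch and calls it with a fake request body and a fake context. `SimpleNamespace` stands in for the container's real context object, which is an assumption made only for this test. With the `SingleRecord` strategy each request carries one line, which is why a single row is parsed.

```python
import io
import json
from types import SimpleNamespace


def input_handler(data, context):
    """CSV branch of the handler above: drop the ID, wrap the rest for TF Serving."""
    if context.request_content_type == "text/csv":
        request = data.read().decode("utf-8").rstrip("\n")
        values = [float(x) for x in request.split(",")]
        values.pop(0)  # remove the "ID" column
        return json.dumps({"instances": [values]})
    raise ValueError("unsupported content type {}".format(
        context.request_content_type or "unknown"))


# Fake request: one CSV line with the ID first, as in data_batch.csv.
fake_context = SimpleNamespace(request_content_type="text/csv")
fake_body = io.BytesIO(b"842302,17.99,10.38\n")

body = input_handler(fake_body, fake_context)
print(body)  # {"instances": [[17.99, 10.38]]}
```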


#### Create TensorFlowModel from saved model artifacts

In [140]:
model_artefect_s3_location = estimator.model_data  #'s3://BUCKET/PREFIX/model.tar.gz'
model_artefect_s3_location

's3://sagemaker-ap-northeast-2-889750940888/tf-batch-transform-2022-01-04-16-50-32-853/output/model.tar.gz'

In [141]:
from sagemaker.tensorflow import TensorFlowModel

tf_model = TensorFlowModel(
    model_data=model_artefect_s3_location,
    role=role,
    framework_version="2.1.0",
    source_dir="source_dir",
    entry_point="inference.py",
)

sm_transformer = tf_model.transformer(
    instance_count=1,
    instance_type='ml.c5.xlarge',
#     instance_type='local',
    accept='text/csv',
    strategy='SingleRecord',   # MultiRecord|SingleRecord
    assemble_with='Line',
    output_path='s3://{}/{}/batch_transform'.format(bucket, prefix)
)

#### Batch inference

In [142]:
input_location = 's3://{}/{}/data/batch/data_batch.csv'.format(bucket, prefix)  # input data includes the ID column; inference.py drops it before prediction
input_location

's3://sagemaker-ap-northeast-2-889750940888/DEMO-breast-cancer-prediction-tf-batch-transform/data/batch/data_batch.csv'

In [143]:
!aws s3 ls {input_location}

2022-01-04 16:20:59      13290 data_batch.csv


In [144]:
sm_transformer.transform(
    data=input_location,
    data_type='S3Prefix',
    content_type='text/csv',
    split_type='Line',         # None(default)|Line|RecordIO|TFRecord
#     input_filter='$[2:]',
#     join_source='None',
#     output_filter='$',
    wait=True
)

........................
INFO:__main__:starting services
INFO:__main__:using default model name: model
INFO:__main__:tensorflow serving model config: 
model_config_list: {
  config: {
    name: "model",
    base_path: "/opt/ml/model",
    model_platform: "tensorflow"
  }
}
INFO:__main__:nginx config: 
load_module modules/ngx_http_js_module.so;
worker_processes auto;
daemon off;
pid /tmp/nginx.pid;
error_log  /dev/stderr error;
worker_rlimit_nofile 4096;
events {
  worker_connections 2048;
}
http {
  include /etc/nginx/mime.types;
  default_type application/json;
  access_log /dev/stdout combined;
  js_include tensorflow-serving.js;
  upstream tfs_upstream {
    server localhost:10001;
  }
  upstream gunicorn_upstream {
    server unix:/tmp/gunicorn.sock fail_timeout=1;
  }
  server {
    listen 8080 deferred;
    client_max_body_size 0;
    client_body_buffe

#### Check Batch Transform results

In [145]:
batch_results = sm_transformer.output_path
batch_results

's3://sagemaker-ap-northeast-2-889750940888/DEMO-breast-cancer-prediction-tf-batch-transform/batch_transform'

In [146]:
batch_result_dir = os.path.join(os.getcwd(), 'batch')
os.makedirs(batch_result_dir, exist_ok=True)

In [147]:
!aws s3 cp {batch_results} {batch_result_dir} --recursive

download: s3://sagemaker-ap-northeast-2-889750940888/DEMO-breast-cancer-prediction-tf-batch-transform/batch_transform/data_batch.csv.out to batch/data_batch.csv.out
download: s3://sagemaker-ap-northeast-2-889750940888/DEMO-breast-cancer-prediction-tf-batch-transform/batch_transform/tensorflow-inference-2022-01-04-15-20-24-855/data_batch.csv.out to batch/tensorflow-inference-2022-01-04-15-20-24-855/data_batch.csv.out
download: s3://sagemaker-ap-northeast-2-889750940888/DEMO-breast-cancer-prediction-tf-batch-transform/batch_transform/tensorflow-inference-2022-01-04-15-21-08-704/data_batch.csv.out to batch/tensorflow-inference-2022-01-04-15-21-08-704/data_batch.csv.out
download: s3://sagemaker-ap-northeast-2-889750940888/DEMO-breast-cancer-prediction-tf-batch-transform/batch_transform/tensorflow-inference-2022-01-04-15-21-46-442/data_batch.csv.out to batch/tensorflow-inference-2022-01-04-15-21-46-442/data_batch.csv.out
download: s3://sagemaker-ap-northeast-2-889750940888/DEMO-breast-cance
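
Once `data_batch.csv.out` is downloaded, each prediction can be paired with the ID of its input record. The sketch below assumes that every line of the output file holds one JSON object of the form produced by `output_handler`, and that output lines are in the same order as the input lines. Both are assumptions to verify against the real files. The inline strings are toy stand-ins for the two files.

```python
import io
import json

import pandas as pd

# Toy stand-ins for data_batch.csv (ID first, no header) and data_batch.csv.out.
batch_csv = "842302,17.99,10.38\n842517,20.57,17.77\n"
batch_out = '{"predictions": 0.02}\n{"predictions": 0.97}\n'

ids = pd.read_csv(io.StringIO(batch_csv), header=None)[0]
predictions = [
    json.loads(line)["predictions"]
    for line in batch_out.splitlines()
    if line.strip()
]

# Records are matched by position, so both files must have the same length.
assert len(ids) == len(predictions)

joined = pd.DataFrame({"id": ids, "prediction": predictions})
print(joined)
```

With the real files, the two strings would be replaced by paths under the local `data/batch` and `batch` directories.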