## Batch Transform

Now we use "today's" features to generate predictions, which the business unit will use as an input for promotions.

To do this, we deploy the model from the best training job of the hyperparameter tuning job and use the resulting endpoint for inference.

In [None]:
import sagemaker
import boto3
from sagemaker.estimator import Estimator
from sagemaker.tuner import HyperparameterTuner
import numpy as np                                # For matrix operations and numerical processing
import pandas as pd                               # For munging tabular data
import os 
import time
# csv_serializer and RealTimePredictor are the SageMaker Python SDK v1 names
from sagemaker.predictor import csv_serializer, RealTimePredictor

# Name of the best training job from notebook PROD2.
# The cells below attach to `best_job` (restored with %store) instead.
best_training_job = 'hpo-invoice-pred-191009-1624-002-2086aff7'
role = sagemaker.get_execution_role()
prefix = 'predictions'

In [None]:
%store -r bucket

In [None]:
df = pd.read_csv('to_predict.csv',header=None)

In [None]:
df.shape

In [None]:
id_reseller = pd.read_csv('id_reseller_to_predict.csv',header=None)[0]

In [None]:
id_reseller.shape

Make sure you stored the `best_job` variable in <a href='./PROD2.ModelTrain.ipynb'>notebook 2</a>.

In [None]:
%store -r best_job

In [None]:
model = Estimator.attach(best_job)

In [None]:
model_predictor = model.deploy(initial_instance_count=1,
                            instance_type='ml.t2.medium')

In [None]:
# In case you interrupt the notebook, you can create the predictor using the endpoint name.
#model_predictor = RealTimePredictor('########')

In [None]:
# Send each request body as CSV
model_predictor.content_type = 'text/csv'
model_predictor.serializer = csv_serializer
# No deserializer: predict() returns the raw response bytes
model_predictor.deserializer = None

In [None]:
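def predict(data, rows=500):
    """Score `data` in chunks of at most `rows` rows and return a 1-D array of predictions."""
    # Use enough chunks that none exceeds `rows` rows, keeping each request payload small.
    split_array = np.array_split(data, int(data.shape[0] / float(rows) + 1))
    predictions = ''
    for array in split_array:
        # Each response is raw bytes holding comma-separated values.
        predictions = ','.join([predictions, model_predictor.predict(array).decode('utf-8')])

    # Drop the leading comma left by the first join, then parse the values as floats.
    return np.fromstring(predictions[1:], sep=',')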

predictions = predict(df.values)
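
The chunking logic in `predict` can be checked without an endpoint. The following is a minimal pure-Python sketch, not the notebook's own code. `chunk_sizes` mirrors the documented behaviour of `np.array_split`, `fake_predict` is a hypothetical stand-in for `model_predictor.predict` that returns one comma-separated value per row as bytes, and `predict_in_chunks` is an illustrative name.

```python
def chunk_sizes(n_rows, rows=500):
    """Sizes of the chunks produced by np.array_split(data, int(n_rows / rows + 1))."""
    n_chunks = int(n_rows / float(rows) + 1)
    base, extra = divmod(n_rows, n_chunks)
    # np.array_split gives the first `extra` chunks one additional row.
    return [base + 1] * extra + [base] * (n_chunks - extra)


def fake_predict(chunk):
    """Hypothetical stand-in for the endpoint: returns row sums as CSV-encoded bytes."""
    return ','.join(str(float(sum(row))) for row in chunk).encode('utf-8')


def predict_in_chunks(data, rows=500):
    """Split the rows, score each chunk, and stitch the CSV responses together."""
    responses = []
    start = 0
    for size in chunk_sizes(len(data), rows):
        chunk = data[start:start + size]
        start += size
        if chunk:  # skip the empty chunk that appears when there is no data
            responses.append(fake_predict(chunk).decode('utf-8'))
    return [float(value) for value in ','.join(responses).split(',') if value]


data = [[i, 1] for i in range(1200)]
sizes = chunk_sizes(len(data))
scores = predict_in_chunks(data)
print(sizes, len(scores))
```

Because the number of chunks is always strictly greater than `n_rows / rows`, no chunk can hold more than `rows` rows, which is what keeps each request payload bounded.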

In [None]:
predictions.shape

In [None]:
df_predictions = pd.DataFrame({'id_reseller': id_reseller, 'prediction': predictions})

In [None]:
df_predictions.head()
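
The identifiers and the predictions are matched purely by position, so it can help to confirm that both have the same length before combining them. A minimal sketch of such a check, assuming plain Python sequences; `pair_predictions` is a hypothetical helper and not part of the notebook:

```python
def pair_predictions(ids, predictions):
    """Pair each reseller id with its prediction, refusing mismatched lengths."""
    if len(ids) != len(predictions):
        raise ValueError(
            f'{len(ids)} ids but {len(predictions)} predictions; '
            'the inputs are not aligned'
        )
    return [
        {'id_reseller': reseller, 'prediction': score}
        for reseller, score in zip(ids, predictions)
    ]


# Illustrative values only, not real reseller data.
rows = pair_predictions([101, 102, 103], [0.2, 0.9, 0.4])
print(rows[1])
```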

Finally, we upload the predictions to S3.

In [None]:
df_predictions.to_csv('predictions.csv',index=False)

In [None]:
# S3 keys always use '/' as the separator, so build the key explicitly
boto3.Session().resource('s3').Bucket(bucket).Object(f'{prefix}/predictions.csv').upload_file('predictions.csv')
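
S3 object keys use `/` as the separator on every platform, whereas `os.path.join` follows the local operating system's convention. A small sketch of building the key with the standard-library `posixpath` module; `s3_key` is a hypothetical helper name:

```python
import ntpath
import posixpath


def s3_key(prefix, filename):
    """Join a prefix and a file name into an S3 key, always using forward slashes."""
    return posixpath.join(prefix, filename)


key = s3_key('predictions', 'predictions.csv')
print(key)

# For comparison, Windows-style joining would introduce a backslash.
windows_style = ntpath.join('predictions', 'predictions.csv')
```

On a Linux notebook instance both approaches give the same key; the difference only matters if the code is run on Windows.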