# Building a Predictive Maintenance Solution Using AWS AutoML and No-code Tools 

# Part 2. Making the inference

This is the second of the two notebooks accompanying our blog article of the same title.

Here we show how to use the model created and deployed in the previous notebook.
As before, the workflow implemented here requires no in-depth knowledge of Machine Learning or Data Science. The main stage of this workflow is the <b> AWS SageMaker Autopilot </b> service.

We demonstrate two ways of using the model in production.

1. <b> Batch mode predictions.</b> In this mode the feature values are organized in a table, e.g. in a CSV file. Each row contains a full set of features, e.g. readings of various sensors. The model's predictions are written to another CSV file. This mode does not require a deployed model (an endpoint), but the predictions arrive with a noticeable delay.

2. <b> Real-time predictions using an endpoint.</b> This mode is well suited to real-time online predictions, especially fully automated inference.

For batch prediction we have to specify the S3 location of the file we want predictions for. In our case we use CSV files. We also have to specify the best model, an artifact of the Autopilot experiment, which the batch transform job will use. The prediction results are stored in the S3 location given as the output path. A problem may arise if we try to overwrite a file that already exists in S3.
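To make the naming rule concrete, here is a minimal sketch of how the result location can be derived from the input file and the output path, assuming a single input file whose result is written as `<input file name>.out` directly under the output path. The helper names and the example bucket are hypothetical and are not part of the SageMaker SDK.

```python
from urllib.parse import urlparse
import posixpath


def split_s3_uri(uri):
    """Split 's3://bucket/some/key' into ('bucket', 'some/key')."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


def expected_transform_output(input_uri, output_path):
    """Return (bucket, key) where the result for input_uri is expected to appear."""
    _, input_key = split_s3_uri(input_uri)
    out_bucket, out_prefix = split_s3_uri(output_path)
    # Assumption: the result keeps the input file name and gets '.out' appended.
    filename = posixpath.basename(input_key) + ".out"
    return out_bucket, posixpath.join(out_prefix, filename)


bucket, key = expected_transform_output(
    "s3://example-bucket/datasets/test.csv",  # hypothetical input file
    "s3://example-bucket/predictions",        # hypothetical output path
)
print(bucket, key)
```

With the bucket and key known, a stale result can be removed before starting a new job, for example with `boto3.client('s3').delete_object(Bucket=bucket, Key=key)`; S3 treats deleting a key that does not exist as a success, so no prior existence check is needed.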

With a real-time endpoint we only need to send a request to the endpoint that was created after the Autopilot experiment.
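As an illustration of what such a request looks like underneath, the following sketch sends header-less CSV rows through the low-level `sagemaker-runtime` client of boto3. It assumes the endpoint accepts `text/csv` and returns one predicted value per line; the helper names `rows_to_csv`, `parse_csv_predictions` and `invoke` are hypothetical. The notebook itself uses the higher-level `Predictor` class instead.

```python
import csv
import io


def rows_to_csv(rows):
    """Serialize rows (lists of feature values) as header-less CSV text."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


def parse_csv_predictions(body_text):
    """Turn a CSV response body into a flat list of floats."""
    return [float(value) for line in csv.reader(io.StringIO(body_text)) for value in line]


def invoke(endpoint_name, rows):
    """Send the rows to a real-time endpoint and return the predictions."""
    import boto3  # imported lazily so the helpers above work without AWS access

    runtime = boto3.client("sagemaker-runtime")
    response = runtime.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="text/csv",
        Accept="text/csv",
        Body=rows_to_csv(rows),
    )
    return parse_csv_predictions(response["Body"].read().decode("utf-8"))


# The serialization helpers can be checked locally without an endpoint:
print(rows_to_csv([[1, 642.38], [2, 642.57]]))
```

Calling `invoke` requires AWS credentials and an endpoint that is in service, so only the two pure helpers are exercised here.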

In [None]:
# Import all the necessary modules
import sagemaker
from sagemaker import AutoML
import pandas as pd
import numpy as np
import boto3
from tqdm import tqdm
import io
import json
import itertools
from collections import OrderedDict

# Batch prediction

In [None]:
# Batch prediction can be done with the SageMaker API

AUTO_ML_JOB_NAME = 'test-notebook-experiment-sm05'  # name of the Autopilot experiment
automl = AutoML.attach(auto_ml_job_name=AUTO_ML_JOB_NAME)  # attach to the existing AutoML job with the given name

In [None]:
best_candidate = automl.describe_auto_ml_job()['BestCandidate']  # select the best candidate
best_candidate_name = best_candidate['CandidateName']
OUTPUT_PATH = "s3://anomaly-detection-bucket-test/datasets/turbofan_nasa_data"  # S3 location where the prediction output will be stored

In [None]:
BATCH_INPUT = "s3://anomaly-detection-bucket-test/datasets/turbofan_nasa_data/turbofan_sensors-orig_test_with_time.csv"  # input file to run predictions on

In [None]:
# This is where the batch predictions are actually created.
model = automl.create_model(name=best_candidate_name, candidate=best_candidate)
transformer = model.transformer(instance_count=1, instance_type='ml.m5.xlarge', assemble_with='Line', output_path=OUTPUT_PATH)
transformer.transform(data=BATCH_INPUT, split_type='Line', content_type='text/csv', wait=True)
# The output is stored in OUTPUT_PATH under the same name as the BATCH_INPUT file, with the extension .out appended.
# It contains a single column with the predictions.
# Be careful: check whether a file with that name already exists in S3. If it does, an error may occur,
# so it is better to remove the old .out file before running the predictions.

..................................2022-09-28 10:56:41,004 INFO - sagemaker-containers - No GPUs detected (normal if no gpus installed)
2022-09-28 10:56:41,007 INFO - sagemaker-containers - No GPUs detected (normal if no gpus installed)
2022-09-28 10:56:41,008 INFO - sagemaker-containers - nginx config: 
worker_processes auto;
daemon off;
pid /tmp/nginx.pid;
error_log  /dev/stderr;
worker_rlimit_nofile 4096;
events {
  worker_connections 2048;

In [None]:
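# Check our predictions
# The result is written to OUTPUT_PATH as <input file name>.out. Here OUTPUT_PATH is the folder that
# contains BATCH_INPUT, so appending ".out" to BATCH_INPUT points at the result file.
# Reading s3:// paths with pandas requires the s3fs package.
data = pd.read_csv(f"{BATCH_INPUT}.out")
data.head()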

Unnamed: 0,89.37210845947266
0,147.091446
1,145.016769
2,146.628479
3,144.053741
4,144.532181


In [None]:
# Make the results more readable by combining the predictions with the features
prediction_result = pd.read_csv(BATCH_INPUT)
prediction_result['predicted_rul'] = data.values

In [None]:
prediction_result

Unnamed: 0,time,sensor_2,sensor_3,sensor_4,sensor_7,sensor_8,sensor_9,sensor_11,sensor_12,sensor_13,sensor_14,sensor_15,sensor_17,sensor_20,sensor_21,predicted_rul
0,1,642.38,1589.49,1395.48,554.76,2387.97,9071.27,47.24,522.19,2387.99,8141.69,8.4081,390,38.97,23.4073,147.091446
1,2,642.57,1583.11,1395.97,553.97,2388.00,9078.21,47.00,522.30,2388.02,8148.24,8.4216,391,38.85,23.5043,145.016769
2,3,642.25,1589.44,1397.74,554.95,2387.95,9063.37,47.18,522.36,2387.98,8148.83,8.4258,392,39.07,23.4113,146.628479
3,4,642.31,1585.59,1397.85,553.30,2388.06,9068.16,47.07,522.09,2388.06,8150.30,8.4175,392,39.12,23.4616,144.053741
4,5,642.25,1587.23,1402.54,554.64,2388.04,9066.99,47.31,522.00,2388.01,8149.40,8.4099,391,38.94,23.4781,144.532181
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1961,213,643.49,1604.45,1426.52,550.85,2388.21,9085.94,48.16,519.63,2388.24,8153.07,8.5340,397,38.25,23.1600,0.180914
1962,214,643.24,1596.72,1426.05,551.15,2388.21,9081.74,47.99,519.70,2388.27,8156.26,8.5041,396,38.54,23.1833,0.815821
1963,215,643.39,1603.19,1430.73,551.09,2388.20,9085.55,48.18,519.49,2388.16,8151.41,8.5356,396,38.30,23.0166,0.256143
1964,216,643.60,1604.76,1427.66,551.46,2388.26,9074.32,48.02,519.68,2388.21,8155.37,8.5259,395,38.36,23.0164,0.381913


In [None]:
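# We can upload the file back to S3
s3_client = boto3.client('s3')
prediction_result.to_csv('data/prediction_result.csv')  # the local 'data' directory must already exist
# upload_file takes a bucket name and an object key (not a full s3:// URI) and returns None
bucket, prefix = OUTPUT_PATH.replace('s3://', '', 1).split('/', 1)
s3_client.upload_file('data/prediction_result.csv', bucket, f'{prefix}/prediction.csv')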

# Prediction using endpoint

In [None]:
# Check the data for prediction
data_for_prediction = pd.read_csv(BATCH_INPUT)
data_for_prediction.head()

Unnamed: 0,time,sensor_2,sensor_3,sensor_4,sensor_7,sensor_8,sensor_9,sensor_11,sensor_12,sensor_13,sensor_14,sensor_15,sensor_17,sensor_20,sensor_21
0,1,642.38,1589.49,1395.48,554.76,2387.97,9071.27,47.24,522.19,2387.99,8141.69,8.4081,390,38.97,23.4073
1,2,642.57,1583.11,1395.97,553.97,2388.0,9078.21,47.0,522.3,2388.02,8148.24,8.4216,391,38.85,23.5043
2,3,642.25,1589.44,1397.74,554.95,2387.95,9063.37,47.18,522.36,2387.98,8148.83,8.4258,392,39.07,23.4113
3,4,642.31,1585.59,1397.85,553.3,2388.06,9068.16,47.07,522.09,2388.06,8150.3,8.4175,392,39.12,23.4616
4,5,642.25,1587.23,1402.54,554.64,2388.04,9066.99,47.31,522.0,2388.01,8149.4,8.4099,391,38.94,23.4781


In [None]:
ENDPOINT_NAME = 'test-notebook-autopilot-experiment-endpoint' # name of real time endpoint for serving the model
predictor = sagemaker.predictor.Predictor(endpoint_name=ENDPOINT_NAME,
                                          sagemaker_session=sagemaker.Session(),
                                          serializer=sagemaker.serializers.CSVSerializer(),
                                          deserializer=sagemaker.deserializers.CSVDeserializer())

In [None]:
# Now we use the endpoint to get predictions and flatten the output into a list of floats.
predictions = predictor.predict(data=data_for_prediction.values)
predictions = list(itertools.chain.from_iterable(predictions))  # flatten the rows returned by CSVDeserializer
predictions = [float(x) for x in predictions]  # the CSV values arrive as strings
print(predictions[:25])  # just check the output

[147.09144592285156, 145.0167694091797, 146.62847900390625, 144.05374145507812, 144.5321807861328, 142.5323944091797, 144.07423400878906, 144.27813720703125, 142.218017578125, 143.81121826171875, 142.1516571044922, 141.72940063476562, 146.11703491210938, 144.21458435058594, 141.76951599121094, 141.5436553955078, 142.25294494628906, 141.0680694580078, 141.89476013183594, 141.85653686523438, 138.91421508789062, 141.4103240966797, 142.88807678222656, 140.606689453125, 138.7768096923828]


In [None]:
# Again, we can add the predictions to the features DataFrame to make the data more readable.
prediction_result = pd.read_csv(BATCH_INPUT)
prediction_result['predicted_rul'] = predictions
prediction_result.head()

Unnamed: 0,time,sensor_2,sensor_3,sensor_4,sensor_7,sensor_8,sensor_9,sensor_11,sensor_12,sensor_13,sensor_14,sensor_15,sensor_17,sensor_20,sensor_21,predicted_rul
0,1,642.38,1589.49,1395.48,554.76,2387.97,9071.27,47.24,522.19,2387.99,8141.69,8.4081,390,38.97,23.4073,147.091446
1,2,642.57,1583.11,1395.97,553.97,2388.0,9078.21,47.0,522.3,2388.02,8148.24,8.4216,391,38.85,23.5043,145.016769
2,3,642.25,1589.44,1397.74,554.95,2387.95,9063.37,47.18,522.36,2387.98,8148.83,8.4258,392,39.07,23.4113,146.628479
3,4,642.31,1585.59,1397.85,553.3,2388.06,9068.16,47.07,522.09,2388.06,8150.3,8.4175,392,39.12,23.4616,144.053741
4,5,642.25,1587.23,1402.54,554.64,2388.04,9066.99,47.31,522.0,2388.01,8149.4,8.4099,391,38.94,23.4781,144.532181


## One-time request prediction using endpoint

This is just another interface for getting the prediction. It might be useful, for example, for experimentation.

We manually enter the feature values for a single data point, i.e. for a single engine at a single time value. The order of the features is crucial for correct predictions.

In [None]:
# This is just another way of getting a prediction. It might be useful for experimentation.
# We manually enter the feature values for a one-time prediction. The order of the features is crucial for correct predictions.
request_json = OrderedDict()
request_json['time'] = 1 
request_json['sensor_2'] = 642.38
request_json['sensor_3'] = 1589.49
request_json['sensor_4'] = 1395.48
request_json['sensor_7'] = 554.76 
request_json['sensor_8'] = 2387.97
request_json['sensor_9'] = 9071.27 
request_json['sensor_11'] = 47.24
request_json['sensor_12'] = 522.19 
request_json['sensor_13'] = 2387.99
request_json['sensor_14'] = 8141.69
request_json['sensor_15'] = 8.4081 
request_json['sensor_17'] = 390 
request_json['sensor_20'] = 38.97 
request_json['sensor_21'] = 23.4073

In [None]:
predictions = predictor.predict(data=list(request_json.values()))
predictions = list(itertools.chain.from_iterable(predictions))
predictions = [float(x) for x in predictions]
print(predictions)  # this was the first row of our test dataset, so the value can be checked against the earlier results

[147.09144592285156]
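
Because the endpoint receives a bare CSV row without column names, a reading whose features are in the wrong order would still be accepted and would quietly yield a wrong prediction. The sketch below shows one way to guard against that. It assumes the model expects the column order of the test CSV shown earlier; `FEATURE_ORDER` and `to_feature_row` are hypothetical names introduced only for this illustration.

```python
# Column order taken from the test dataset displayed earlier in this notebook.
FEATURE_ORDER = [
    "time", "sensor_2", "sensor_3", "sensor_4", "sensor_7", "sensor_8",
    "sensor_9", "sensor_11", "sensor_12", "sensor_13", "sensor_14",
    "sensor_15", "sensor_17", "sensor_20", "sensor_21",
]


def to_feature_row(features, feature_order=FEATURE_ORDER):
    """Return the feature values as a list in the expected order.

    Raises ValueError if a feature is missing or an unknown one is present.
    """
    missing = [name for name in feature_order if name not in features]
    extra = [name for name in features if name not in feature_order]
    if missing or extra:
        raise ValueError(f"missing features: {missing}; unexpected features: {extra}")
    return [features[name] for name in feature_order]


# A reading whose keys are deliberately out of order.
reading = {
    "sensor_21": 23.4073, "sensor_20": 38.97, "sensor_17": 390,
    "sensor_15": 8.4081, "sensor_14": 8141.69, "sensor_13": 2387.99,
    "sensor_12": 522.19, "sensor_11": 47.24, "sensor_9": 9071.27,
    "sensor_8": 2387.97, "sensor_7": 554.76, "sensor_4": 1395.48,
    "sensor_3": 1589.49, "sensor_2": 642.38, "time": 1,
}
row = to_feature_row(reading)
print(row)
```

The resulting `row` can be passed to `predictor.predict(data=row)` in place of `list(request_json.values())`.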
