Import some basic libraries.
* pandas - provides data frames
* boto3 - access to Amazon S3
* io - in-memory byte streams
* matplotlib.pyplot - plotting support

Use the %matplotlib inline magic to display graphics inline instead of in a popup window.


In [53]:
import pandas as pd # pandas is a dataframe library
import boto3
import io
import matplotlib.pyplot as plt      # matplotlib.pyplot plots data

%matplotlib inline

# Using your trained Model

## Load trained model from file

In [54]:
import joblib 
import pickle
import tempfile

# Read from local folder
# lr_cv_model = joblib.load("./model/diabetes-trained-model.pkl")

s3_model_bucket = 'demo-predict-diabetes'
model_file = 'model/diabetes-model.pkl'

# Download the pickled model from S3 and deserialize it
s3client = boto3.client('s3')
response = s3client.get_object(Bucket=s3_model_bucket, Key=model_file)
lr_cv_model = pickle.loads(response['Body'].read())

    
#body = response['Body']
#lr_cv_model = pickle.loads(body.read())
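
The load step above reads every byte from the response body and hands it to `pickle.loads`. The following is a minimal sketch of that pattern without any AWS call: an in-memory buffer stands in for the S3 body, and a plain dictionary stands in for the trained model. Both stand-ins (`fake_body`, `stand_in_model`) are hypothetical names used only for illustration.

```python
import io
import pickle

# Stand-in for a trained model; any picklable object behaves the same way.
stand_in_model = {"coef": [0.5, -1.2], "intercept": 0.1}

# Simulate the streaming body of an S3 get_object response with an
# in-memory buffer (hypothetical substitute; no AWS call is made here).
fake_body = io.BytesIO(pickle.dumps(stand_in_model))

# Same pattern as in the cell above: read all bytes, then unpickle.
restored_model = pickle.loads(fake_body.read())
print(restored_model == stand_in_model)
```

Because unpickling can execute arbitrary code, this pattern is only appropriate for objects that come from a bucket you trust.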


## Test Prediction on data

Once the model is loaded we can use it to predict on some data.  In this case the data file contains a few rows from the original Pima CSV file.


In [55]:
# get data from local truncated data file
# df_predict = pd.read_csv("./data/feature-data.csv")

feature_data_bucket = 'demo-predict-diabetes-feature-data'
feature_file_key = 'data/feature-data.csv'

s3uri = 's3://{}/{}'.format(feature_data_bucket, feature_file_key)
df_predict = pd.read_csv(s3uri)

print(df_predict.shape)

(5, 10)


In [56]:
df_predict

Unnamed: 0,num_preg,glucose_conc,diastolic_bp,thickness,insulin,bmi,diab_pred,age,skin,diabetes
0,1,89,66,23,94,28.1,0.167,21,0.9062,False
1,2,197,70,45,543,30.5,0.158,53,1.773,True
2,7,100,0,0,0,30.0,0.484,32,0.0,True
3,1,103,30,38,83,43.3,0.183,33,1.4972,False
4,1,93,70,31,0,30.4,0.315,23,1.2214,False


The truncated file contains 5 rows from the original CSV.

The data is in the same format as the original CSV file's data.  Therefore, just like the original data, we need to transform it before we can make predictions on it.

Note: If the data had been previously "cleaned up" this would not be necessary.

We do this by executing the same transformations as we did on the original data.

Start by dropping the "skin" column, which is the same as thickness, with different units.

In [57]:
del df_predict['skin']
df_predict

Unnamed: 0,num_preg,glucose_conc,diastolic_bp,thickness,insulin,bmi,diab_pred,age,diabetes
0,1,89,66,23,94,28.1,0.167,21,False
1,2,197,70,45,543,30.5,0.158,53,True
2,7,100,0,0,0,30.0,0.484,32,True
3,1,103,30,38,83,43.3,0.183,33,False
4,1,93,70,31,0,30.4,0.315,23,False
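
As a quick check that "skin" carries no information beyond "thickness", the sketch below compares the two columns using the five rows displayed earlier. The scaling factor 0.0394 is an assumption inferred from those rows only, and `sample` and `ASSUMED_FACTOR` are hypothetical names.

```python
import numpy as np
import pandas as pd

# The thickness and skin values from the five rows shown earlier.
sample = pd.DataFrame({
    "thickness": [23, 45, 0, 38, 31],
    "skin": [0.9062, 1.773, 0.0, 1.4972, 1.2214],
})

# Assumed scaling factor, inferred from these five rows only.
ASSUMED_FACTOR = 0.0394

# True when skin is just thickness multiplied by a constant.
is_redundant = np.allclose(sample["skin"], sample["thickness"] * ASSUMED_FACTOR)
print(is_redundant)
```

If the check holds, keeping both columns would feed the model the same measurement twice.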


We need to drop the diabetes column since that is what we are predicting.  
Store the data without that column under a name with the prefix X, as we did with X_train and X_test, to indicate that it contains only the feature columns used for prediction.

In [58]:
# drop() returns a new frame, so df_predict keeps its diabetes column
X_predict = df_predict.drop(columns=['diabetes'])
X_predict

Unnamed: 0,num_preg,glucose_conc,diastolic_bp,thickness,insulin,bmi,diab_pred,age
0,1,89,66,23,94,28.1,0.167,21
1,2,197,70,45,543,30.5,0.158,53
2,7,100,0,0,0,30.0,0.484,32
3,1,103,30,38,83,43.3,0.183,33
4,1,93,70,31,0,30.4,0.315,23
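
When separating features from the label, it may help to keep in mind that plain assignment of a DataFrame does not copy it. The sketch below, using a tiny made-up frame (`df` and its values are illustrative only), contrasts `drop`, which returns a new frame, with `del` on an alias, which modifies the original.

```python
import pandas as pd

# Tiny illustrative frame; the values are made up for this sketch.
df = pd.DataFrame({
    "glucose_conc": [89, 197],
    "diabetes": [False, True],
})

# drop() returns a new frame; df keeps its label column.
X = df.drop(columns=["diabetes"])
cols_after_drop = list(df.columns)
print(list(X.columns), cols_after_drop)

# Plain assignment only creates a second name for the same object,
# so deleting through the alias also changes df.
alias = df
del alias["diabetes"]
print(list(df.columns))
```

Using `drop` is one way to keep the true labels available for a later comparison with the predictions.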


The data has 0 in places it should not, for example diastolic_bp, thickness, and insulin in row 2.  

Just like the training and test datasets, we will use imputation to fix this.

In [59]:
# Impute all 0 readings with the column mean
from sklearn.impute import SimpleImputer
fill_0 = SimpleImputer(missing_values=0, strategy="mean")
X_predict = fill_0.fit_transform(X_predict)
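
To make the effect of mean imputation concrete, here is a small sketch on made-up numbers (`train` and `new_rows` are hypothetical arrays, not data from this notebook). It also shows a variation that this notebook does not use: fitting the imputer once and then calling `transform` on new rows, so the new rows are filled with the means learned earlier rather than their own.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Made-up data; 0.0 marks a missing reading.
train = np.array([[1.0, 0.0],
                  [3.0, 4.0],
                  [0.0, 8.0]])

imputer = SimpleImputer(missing_values=0, strategy="mean")

# fit_transform learns each column's mean over its non-zero entries
# and replaces the zeros with that mean.
train_filled = imputer.fit_transform(train)
print(imputer.statistics_)
print(train_filled)

# transform reuses the learned means on rows the imputer has not seen.
new_rows = np.array([[0.0, 5.0]])
new_filled = imputer.transform(new_rows)
print(new_filled)
```

With only a handful of prediction rows, means computed from those rows alone can differ noticeably from the means used during training, which is the motivation for the fit-once variation.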

At this point our data is ready to be used for prediction.  

## Predict diabetes with the prediction data.  Returns 1 if True, 0 if false

In [60]:
lr_cv_model.predict(X_predict)

array([0, 1, 0, 0, 0])
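
Since the truncated file also contained the true diabetes values, the predictions can be lined up against them. The sketch below hard-codes the predicted array shown above and the diabetes column from the table displayed earlier; the variable names are illustrative.

```python
import numpy as np

# Output of lr_cv_model.predict(X_predict) shown above.
predicted = np.array([0, 1, 0, 0, 0])

# The diabetes column from the five rows displayed earlier.
actual = np.array([False, True, True, False, False])

# Convert 1/0 to True/False and compare element by element.
matches = predicted.astype(bool) == actual
n_correct = int(matches.sum())
print(f"{n_correct} of {len(actual)} predictions match the recorded values")
```

Five rows are far too few to say anything about model quality; this only illustrates how the 1/0 output maps back to the True/False labels.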