## Compute Instance

In this sample notebook, we use an Azure ML Compute Instance to train a basic model and use it for prediction.

Once the model is created, several questions remain: what were its metrics, how well did it predict, and which dataset was used to create it? Working only with local files, we have no **traceability**, and that is where Azure ML comes into play.
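
As a rough illustration of the bookkeeping that would otherwise have to be done by hand, the sketch below bundles a dataset fingerprint, the metrics, and the model path into one record. The helper `build_run_record` and its field names are hypothetical and are not part of Azure ML, and the metric values are placeholders. It only shows the kind of questions a tracking service answers for us.

```python
import hashlib
import json
from datetime import datetime, timezone


def build_run_record(data_bytes, metrics, model_path):
    """Hypothetical helper: bundle what we would need to trace a model later."""
    return {
        # Fingerprint of the exact training data that was used
        "data_sha256": hashlib.sha256(data_bytes).hexdigest(),
        # Metrics computed on the test set (e.g. accuracy, AUC)
        "metrics": dict(metrics),
        # Where the serialized model was written
        "model_path": model_path,
        # When the record was created (UTC, ISO 8601)
        "created_utc": datetime.now(timezone.utc).isoformat(),
    }


# Example with placeholder bytes and placeholder metric values
record = build_run_record(
    b"placeholder data",
    {"accuracy": 0.5, "auc": 0.5},
    "outputs/diabetes_model.pkl",
)
print(json.dumps(record, indent=2))
```

Keeping such records consistent across many runs by hand is error-prone, which is the gap experiment tracking is meant to close.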

In [None]:
# Import libraries
import pandas as pd
import numpy as np
import joblib
import os
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.metrics import roc_curve


# Load the diabetes dataset and inspect its columns
diabetes = pd.read_parquet('./data/diabetes.parquet')
diabetes.info()

In [None]:
diabetes.describe()

In [None]:
import matplotlib.pyplot as plt
%matplotlib inline 

# Plot the count of diabetic vs non-diabetic patients
diabetes = pd.read_parquet('./data/diabetes.parquet')

diabetic_counts = diabetes['Diabetic'].value_counts()
fig = plt.figure(figsize=(6, 6))
ax = fig.gca()
diabetic_counts.plot.bar(ax=ax)
ax.set_title('Patients with Diabetes')
ax.set_xlabel('Diagnosis')
ax.set_ylabel('Patients')
plt.show()

In [None]:

# Load the diabetes dataset
print("Loading Data...")
diabetes = pd.read_parquet('./data/diabetes.parquet')

# Separate features and labels
feature_columns = ['Pregnancies', 'PlasmaGlucose', 'DiastolicBloodPressure', 'TricepsThickness',
                   'SerumInsulin', 'BMI', 'DiabetesPedigree', 'Age']
X = diabetes[feature_columns].values
y = diabetes['Diabetic'].values

# Split data into training set and test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=0)

# Set regularization rate (scikit-learn's C is the inverse of regularization strength)
reg = 0.01

# Train a logistic regression model
print('Training a logistic regression model with regularization rate of', reg)
model = LogisticRegression(C=1/reg, solver="liblinear").fit(X_train, y_train)

# Calculate accuracy
y_hat = model.predict(X_test)
acc = np.average(y_hat == y_test)
print('Accuracy:', acc)

# Calculate AUC from the predicted probability of the positive class
y_scores = model.predict_proba(X_test)
auc = roc_auc_score(y_test, y_scores[:, 1])
print('AUC: ' + str(auc))


# Save the trained model in the outputs folder
os.makedirs('outputs', exist_ok=True)
joblib.dump(value=model, filename='outputs/diabetes_model.pkl')
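
For intuition about the AUC value printed above: it can be read as the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. The sketch below computes that quantity directly on a tiny hand-made example. `pairwise_auc` is a hypothetical helper for illustration only; `roc_auc_score` remains the function used in this notebook.

```python
def pairwise_auc(y_true, scores):
    """Hypothetical helper: AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    # A correctly ordered pair counts 1, a tie counts 0.5, a wrong order counts 0
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
print(pairwise_auc(toy_labels, toy_scores))  # 3 of 4 pairs ordered correctly -> 0.75
```

The pairwise loop is quadratic in the number of examples, so it is useful for understanding the metric rather than for scoring large test sets.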



In [None]:
predictions = model.predict(X_test)

In [None]:
# Calculate model performance metrics
from sklearn.metrics import classification_report

report = classification_report(y_test, predictions)
print(report)
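
The precision, recall, and F1 columns in the report above are all derived from four counts: true positives, false positives, false negatives, and true negatives. The following sketch recomputes them for the positive class on a small hand-made example. `positive_class_metrics` is a hypothetical helper, not a scikit-learn function.

```python
def positive_class_metrics(y_true, y_pred):
    """Hypothetical helper: precision, recall and F1 for the class labelled 1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    # Precision: of everything predicted positive, how much really was positive
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: of everything really positive, how much was found
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1: harmonic mean of precision and recall
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return precision, recall, f1


toy_true = [1, 1, 0, 0, 1]
toy_pred = [1, 0, 0, 1, 1]
precision, recall, f1 = positive_class_metrics(toy_true, toy_pred)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Here the toy data contain two true positives, one false positive, and one false negative, so precision and recall are both 2/3.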

In [None]:
# Confirm the model can be reloaded from disk and generates identical predictions

filename = 'outputs/diabetes_model.pkl'
loaded_model = joblib.load(filename)

y_hat = loaded_model.predict(X_test)
# Compare against the in-memory model's predictions
print('Predictions identical:', np.array_equal(y_hat, model.predict(X_test)))
acc = np.average(y_hat == y_test)
print('Accuracy:', acc)

y_scores = loaded_model.predict_proba(X_test)
auc = roc_auc_score(y_test, y_scores[:, 1])
print('AUC: ' + str(auc))
