# Model creation workflow

This notebook walks through a sample workflow: train a machine learning model on data downloaded from Azure blob storage, then upload the trained model to Azure blob storage, where the pipedesign API can consume it.

In [1]:
import os
import json
import pandas as pd
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

In [2]:
from src.ml import preprocessor
from src.ml.features import pipe_features
from src.infrastructure import blobhandler

### Downloading training data from Azure blob storage

In [3]:
handler = blobhandler.BlobHandler()
proc = preprocessor.Preprocessor()
blobs = handler.download_blobs(os.environ["CONTAINER_NAME_DATA"], number_of_blobs=1000)
training_data = proc.create_training_data(blobs)
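The internals of `BlobHandler.download_blobs` are not shown in this notebook. As a rough sketch of what such a download step could look like, the helper below reads a capped number of JSON blobs from a container. It assumes a client that behaves like `ContainerClient` from the v12 `azure-storage-blob` SDK; the function name `download_json_blobs` and the connection-string variable in the trailing comment are hypothetical, not part of `src.infrastructure`.

```python
import json
from itertools import islice


def download_json_blobs(container_client, number_of_blobs):
    """Download and parse up to `number_of_blobs` JSON blobs from a container.

    `container_client` is assumed to behave like
    azure.storage.blob.ContainerClient (v12 SDK): `list_blobs()` yields items
    with a `.name`, and `download_blob(name).readall()` returns bytes.
    """
    documents = []
    # islice stops listing once the cap is reached, so a large container
    # is not fully enumerated.
    for blob in islice(container_client.list_blobs(), number_of_blobs):
        raw = container_client.download_blob(blob.name).readall()
        documents.append(json.loads(raw))
    return documents


# Hypothetical wiring, not executed here. The environment variable name
# AZURE_STORAGE_CONNECTION_STRING is an assumption.
#
#   from azure.storage.blob import BlobServiceClient
#   service = BlobServiceClient.from_connection_string(
#       os.environ["AZURE_STORAGE_CONNECTION_STRING"])
#   container = service.get_container_client(os.environ["CONTAINER_NAME_DATA"])
#   blobs = download_json_blobs(container, number_of_blobs=1000)
```

Passing the client in as an argument keeps the helper easy to exercise with a stand-in object, without touching a real storage account.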


### Training a model and making a sample prediction

In [4]:
# pd.factorize numbers the classes in order of first appearance in the data.
y = pd.factorize(training_data["viability.viable"])[0]
X_train, X_test, y_train, y_test = train_test_split(
    training_data[pipe_features], y, test_size=0.2, random_state=42
)

clf = GaussianNB()
clf.fit(X_train, y_train)

GaussianNB(priors=None, var_smoothing=1e-09)
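`accuracy_score` is imported at the top of the notebook, but none of the cells shown here call it. A minimal sketch of scoring a fitted classifier on the held-out split follows. It uses synthetic data as a stand-in for `training_data`, and the names `X`, `y`, and `clf_demo` are illustrative only, so the printed score says nothing about the real model.

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(42)

# Synthetic stand-in for the real features: two well-separated Gaussian
# clusters with three features each.
X = np.vstack([
    rng.normal(0.0, 1.0, size=(100, 3)),
    rng.normal(5.0, 1.0, size=(100, 3)),
])
y = np.array([0] * 100 + [1] * 100)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

clf_demo = GaussianNB().fit(X_train, y_train)

# Compare predictions on the held-out 20% with the true labels.
accuracy = accuracy_score(y_test, clf_demo.predict(X_test))
print("Held-out accuracy: {:.3f}".format(accuracy))
```

With the notebook's own variables, the equivalent call would be `accuracy_score(y_test, clf.predict(X_test))`.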

In [5]:
with open("data/json/0a234fea9682454facab730c0a7f83f0.json") as json_file:
    pipedesign_json = json.load(json_file)
    
pipedesign_sample = proc.flatten_pipesegments(pipedesign_json)[pipe_features]

In [6]:
label = clf.predict(pipedesign_sample)
prob = clf.predict_proba(pipedesign_sample)

In [7]:
print("Label: {}, confidence: {}".format(label, prob))

Label: [1], confidence: [[7.89763809e-04 9.99210236e-01]]
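The printed label is an integer code produced by `pd.factorize`, not the original value of `viability.viable`. The sketch below shows how such a code can be mapped back, using hypothetical boolean values for the column (the real column contents are not shown in this notebook). It also shows that the numbering depends on which value appears first.

```python
import pandas as pd

# Hypothetical label column; the real values may differ.
viable = pd.Series([False, True, True, False], name="viability.viable")

# factorize returns the integer codes and the unique values they refer to.
codes, uniques = pd.factorize(viable)

# A predicted code indexes into `uniques` to recover the original value.
predicted_code = 1
predicted_label = uniques[predicted_code]

# The same values in a different order get the opposite numbering,
# because codes are assigned in order of first appearance.
flipped_codes, flipped_uniques = pd.factorize(pd.Series([True, False]))

print(list(codes), list(uniques), predicted_label)
print(list(flipped_codes), list(flipped_uniques))
```

Keeping the second return value of `pd.factorize` alongside the model is one way to make sure a code such as `1` is interpreted consistently later on.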


### Uploading a model to Azure blob storage (to be consumed by the API)

The handler pickles the model so that it can be stored as a blob. If `blob_name` already exists in the blob container, the existing model is overwritten.

In [8]:
upload_success = handler.model_to_azure_blob(
    model=clf,
    container_name=os.environ["CONTAINER_NAME_MODELS"],
    blob_name="test_model_1_do_not_delete"
)

In [9]:
print(upload_success)

(True,)
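The pickling and overwrite behaviour described in this section can be sketched as follows. This is not the actual `model_to_azure_blob` implementation, which is not shown here. The helper names are hypothetical, and `container_client` is assumed to behave like `ContainerClient` from the v12 `azure-storage-blob` SDK, whose `upload_blob` accepts `overwrite=True`.

```python
import pickle


def model_to_bytes(model):
    """Serialize a fitted model into bytes suitable for blob storage."""
    return pickle.dumps(model, protocol=pickle.HIGHEST_PROTOCOL)


def model_from_bytes(payload):
    """Restore a model from bytes. Only unpickle data from a trusted source."""
    return pickle.loads(payload)


def upload_model(container_client, blob_name, model):
    """Upload a pickled model, replacing any existing blob with that name.

    `container_client` is assumed to expose
    upload_blob(name=..., data=..., overwrite=...) as the v12 SDK's
    ContainerClient does.
    """
    container_client.upload_blob(
        name=blob_name, data=model_to_bytes(model), overwrite=True
    )
    return True
```

On the consuming side, a service would download the blob's bytes and pass them to `model_from_bytes`. Since unpickling can execute arbitrary code, the model container should be writable only by trusted parties.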
