## An end-to-end MLOps pipeline using the following tools
1. Prefect (workflow)
2. MLFlow (experiment tracking)
3. Seldon (Model serving)

### We'll simply be using scikit-learn and some common data processing libraries.

In [9]:
import pandas as pd
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import f1_score, precision_score, accuracy_score
from sklearn.model_selection import train_test_split
import sklearn.metrics as metrics

import warnings
warnings.filterwarnings('ignore')

## Fetch Data

In [10]:
def fetch_data():
    csv_url = "https://raw.githubusercontent.com/jbrownlee/Datasets/master/iris.csv"
    # create list of column names
    col_name = ['sepal_length', 'sepal-width', 'petal-length', 'petal-width', 'class']
    data = pd.read_csv(csv_url, names=col_name)
    return data

data = fetch_data()
data.head()

Unnamed: 0,sepal_length,sepal-width,petal-length,petal-width,class
0,5.1,3.5,1.4,0.2,Iris-setosa
1,4.9,3.0,1.4,0.2,Iris-setosa
2,4.7,3.2,1.3,0.2,Iris-setosa
3,4.6,3.1,1.5,0.2,Iris-setosa
4,5.0,3.6,1.4,0.2,Iris-setosa


## Training
Set up the training function and the evaluation metric for the model, in this case a k-nearest neighbors (KNN) classifier.

In [3]:
def eval_metrics(actual, pred):
    # Only accuracy is reported. Precision and F1 are left out because, for a
    # multiclass target like iris, scikit-learn needs an explicit `average`.
    Accuracy = metrics.accuracy_score(actual, pred)
    return Accuracy

def train_model(data, leaf_size=5, n_neighbors=6):
    # An integer test_size is an absolute count: 30 of the 150 rows are held out
    train, test = train_test_split(data, test_size=30)

    # "class" is the target column, so drop it from the feature matrices
    train_x = train.drop(["class"], axis=1)
    test_x = test.drop(["class"], axis=1)
    # Select the target as a 1-D Series, the shape fit() expects
    train_y = train["class"]
    test_y = test["class"]

    KNN_model = KNeighborsClassifier(leaf_size=leaf_size, n_neighbors=n_neighbors)
    KNN_model.fit(train_x, train_y)

    predicted_classes = KNN_model.predict(test_x)
    Accuracy = eval_metrics(test_y, predicted_classes)

    # Print the hyperparameters and the accuracy on the held-out rows
    print("KNearestClassifier model (leaf_size=%f, n_neighbors=%f):" % (leaf_size, n_neighbors))
    print("  Accuracy: %s" % Accuracy)
    
data = fetch_data()
train_model(data)

KNearestClassifier model (leaf_size=5.000000, n_neighbors=6.000000):
  Accuracy: 1.0
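
eval_metrics reports accuracy only. Precision and F1 can be added, but for a three-class target such as iris scikit-learn's default `average="binary"` raises an error, so an averaging mode has to be chosen. The following is a minimal sketch under the assumption that macro averaging is acceptable; the function name eval_all_metrics, the `average="macro"` default and the toy labels are illustrative and not part of the original notebook.

```python
from sklearn.metrics import accuracy_score, f1_score, precision_score


def eval_all_metrics(actual, pred, average="macro"):
    """Return (precision, f1, accuracy) for a multiclass target."""
    # zero_division=0 avoids a warning when a class is never predicted
    precision = precision_score(actual, pred, average=average, zero_division=0)
    f1 = f1_score(actual, pred, average=average, zero_division=0)
    accuracy = accuracy_score(actual, pred)
    return precision, f1, accuracy


# Toy labels: one versicolor sample is misclassified as virginica
actual = ["setosa", "setosa", "versicolor", "versicolor", "virginica", "virginica"]
pred = ["setosa", "setosa", "versicolor", "virginica", "virginica", "virginica"]

precision, f1, accuracy = eval_all_metrics(actual, pred)
print("Precision: %.3f  F1: %.3f  Accuracy: %.3f" % (precision, f1, accuracy))
```

Macro averaging weights every class equally, which suits iris because its three classes are the same size; `average="weighted"` may be a better fit for imbalanced data.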


### Experiment Tracking with MLFlow
The goal is to modify the training function above so that it tracks each run with MLFlow.

### Starting the MLFlow server
In a terminal, run the following:  
mlflow server  
This should start the tracking server at http://127.0.0.1:5000

In [4]:
import mlflow
import mlflow.sklearn  # makes the scikit-learn flavor used by log_model explicit


def train_model(data, mlflow_experiment_id, leaf_size=5, n_neighbors=6):
    mlflow.set_tracking_uri("http://127.0.0.1:5000") #Localhost

    train, test = train_test_split(data, test_size=30)

    # "class" is the target column, so drop it from the feature matrices
    train_x = train.drop(["class"], axis=1)
    test_x = test.drop(["class"], axis=1)
    # Select the target as a 1-D Series, the shape fit() expects
    train_y = train["class"]
    test_y = test["class"]
    
    with mlflow.start_run(experiment_id=mlflow_experiment_id):
        KNN_model = KNeighborsClassifier(leaf_size=leaf_size, n_neighbors=n_neighbors)
        KNN_model.fit(train_x, train_y)
        predicted_classes = KNN_model.predict(test_x)
        Accuracy = eval_metrics(test_y, predicted_classes)

        print("KNearestClassifier model (leaf_size=%f, n_neighbors=%f):" % (leaf_size, n_neighbors))
        print("  Accuracy: %s" % Accuracy)

        mlflow.log_param("leaf_size", leaf_size)
        mlflow.log_param("n_neighbors", n_neighbors)
        mlflow.log_metric("Accuracy", Accuracy)

        mlflow.sklearn.log_model(KNN_model, "model")

Note some important changes in the new train_model function above:

1. The new parameter, mlflow_experiment_id, lets us associate runs with a specific experiment.
2. The set_tracking_uri call points the client at our MLFlow instance.
3. The new with block starts a run on MLFlow.
4. The mlflow.log_param and mlflow.log_metric calls post our hyperparameters and results back to MLFlow.
5. The log_model call at the end saves the entire trained model to MLFlow. This is useful to ensure we never lose a model that we previously trained.

#### Now we train the model on new parameters

In [5]:
train_model(data, mlflow_experiment_id=0, leaf_size=10, n_neighbors=10)
train_model(data, mlflow_experiment_id=0, leaf_size=5, n_neighbors=6)

KNearestClassifier model (leaf_size=10.000000, n_neighbors=10.000000):
  Accuracy: 0.9
KNearestClassifier model (leaf_size=5.000000, n_neighbors=6.000000):
  Accuracy: 0.9333333333333333
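
The same hyperparameters (leaf_size=5, n_neighbors=6) scored 1.0 in the earlier cell and 0.933 here. The most likely cause is that train_test_split draws a fresh random 30-row test set on every call, so runs are not directly comparable. The following is a minimal sketch of pinning the split, assuming a fixed seed and class-stratified sampling are wanted; the helper name split_iris, the seed value 42 and the synthetic frame that stands in for the iris data are all illustrative.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split


def split_iris(data, test_size=30, seed=42):
    """Split with a fixed seed and keep class proportions equal in both parts."""
    return train_test_split(
        data, test_size=test_size, random_state=seed, stratify=data["class"]
    )


# Synthetic stand-in with the same shape as the iris data: 150 rows, 3 classes
rng = np.random.default_rng(0)
data = pd.DataFrame(rng.normal(size=(150, 4)), columns=["f1", "f2", "f3", "f4"])
data["class"] = np.repeat(["a", "b", "c"], 50)

train, test = split_iris(data)
train_again, test_again = split_iris(data)

print(len(train), len(test))
print(test["class"].value_counts().to_dict())
print(test.index.equals(test_again.index))
```

With the seed fixed, differences in accuracy between runs can be attributed to the hyperparameters rather than to the split. Logging the seed as an MLFlow parameter would keep that information with each run.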



### We set up Prefect to fetch the data and retrain the model every two minutes for demonstration purposes

First, let's modify the fetch_data and train_model functions by adding the @task decorator to each. This will indicate to Prefect that these are units of work which need to be run, and which may depend on each other in some way.

In [6]:
from prefect import task, Flow, Parameter, Client
from prefect.run_configs import LocalRun
from prefect.schedules import IntervalSchedule
import requests
from datetime import timedelta

@task
def fetch_data():
    csv_url = "https://raw.githubusercontent.com/jbrownlee/Datasets/master/iris.csv"
    
    # create list of column names
    col_name = ['sepal_length', 'sepal-width', 'petal-length', 'petal-width', 'class']
    data = pd.read_csv(csv_url, names=col_name)
    return data

@task
def train_model(data, mlflow_experiment_id, leaf_size=5, n_neighbors=6):
    mlflow.set_tracking_uri("http://127.0.0.1:5000") #Localhost

    train, test = train_test_split(data, test_size=30)

    # "class" is the target column, so drop it from the feature matrices
    train_x = train.drop(["class"], axis=1)
    test_x = test.drop(["class"], axis=1)
    # Select the target as a 1-D Series, the shape fit() expects
    train_y = train["class"]
    test_y = test["class"]
    
    with mlflow.start_run(experiment_id=mlflow_experiment_id):
        KNN_model = KNeighborsClassifier(leaf_size=leaf_size, n_neighbors=n_neighbors)
        KNN_model.fit(train_x, train_y)
        predicted_classes = KNN_model.predict(test_x)
        Accuracy = eval_metrics(test_y, predicted_classes)
        print("KNearestClassifier model (leaf_size=%f, n_neighbors=%f):" % (leaf_size, n_neighbors))

        mlflow.log_param("leaf_size", leaf_size)
        mlflow.log_param("n_neighbors", n_neighbors)
        mlflow.log_metric("Accuracy", Accuracy)

        mlflow.sklearn.log_model(KNN_model, "model")

### Prefect flow

Note that you'll need to set up your own Prefect Cloud account to view the Prefect dashboard.  
You'll also need to log in through the Prefect CLI.  
Follow this tutorial to complete both steps:  
https://docs.prefect.io/orchestration/getting-started/set-up.html#server-or-cloud

In [7]:
def create_prefect_flow():    
    schedule = IntervalSchedule(interval=timedelta(minutes=2))

    with Flow("iris-data-model", schedule) as flow:
        data = fetch_data()
        train_model(data=data, mlflow_experiment_id=0, leaf_size=10, n_neighbors=10)

    flow.register(project_name="Iris_Class_Prediction")
    # Placeholder: supply your own Prefect token and avoid committing it to source control
    flow.run_agent(token="<YOUR_PREFECT_TOKEN>")
    
create_prefect_flow()

Flow URL: https://cloud.prefect.io/judeleonard86-gmail-com-s-account/flow/208f3ed5-9f8e-41eb-b368-6adda02d9f0e
 └── ID: 606dbfb9-502f-413a-8565-98a83963c09f
 └── Project: Iris_Class_Prediction
 └── Labels: ['Jude']
[2021-09-02 11:42:31,289] INFO - agent | Registering agent...
[2021-09-02 11:42:31,831] INFO - agent | Registration successful!

 ____            __           _        _                    _
|  _ \ _ __ ___ / _| ___  ___| |_     / \   __ _  ___ _ __ | |_
| |_) | '__/ _ \ |_ / _ \/ __| __|   / _ \ / _` |/ _ \ '_ \| __|
|  __/| | |  __/  _|  __/ (__| |_   / ___ \ (_| |  __/ | | | |_
|_|   |_|  \___|_|  \___|\___|\__| /_/   \_\__, |\___|_| |_|\__|
                                           |___/

[2021-09-02 11:42:47,342] INFO - agent | Starting LocalAgent with labels ['Jude']
[2021-09-02 11:42:47,342] INFO - agent | Agent documentation can be found at https://docs.prefect.io/orchestration/
[2021-09-02 11:42:47,357] INFO - agent | Waiting for flow runs...
[2021-09-02 11:44:49,0

#### The error message is a result of shutting down the runtime

### Next, let's deploy the model to a Seldon API

We specify a list of dependencies and versions in a conda.yaml file. These will be saved with the model in MLFlow and used by Seldon when it runs the model.
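
The same dependency list can also be handed to MLFlow as a Python dictionary through the conda_env argument of mlflow.sklearn.log_model. The following is a minimal sketch of building such a dictionary. The helper name build_conda_env, the environment name, the Python version and the package list are placeholder assumptions, not the versions used in this notebook, and should be replaced with what is actually installed.

```python
import json


def build_conda_env(python_version, pip_packages, name="iris-knn-env"):
    """Build a conda environment spec as a plain dictionary.

    Every value is supplied by the caller; nothing is read from the
    running environment.
    """
    return {
        "name": name,
        "channels": ["conda-forge"],
        "dependencies": [
            "python=%s" % python_version,
            "pip",
            {"pip": list(pip_packages)},
        ],
    }


# Placeholder values for illustration only
conda_env = build_conda_env(
    python_version="3.8",
    pip_packages=["mlflow", "scikit-learn", "pandas"],
)
print(json.dumps(conda_env, indent=2))

# With a trained model in hand, the spec could then be attached like this:
# mlflow.sklearn.log_model(KNN_model, "model", conda_env=conda_env)
```

Pinning exact versions (for example `scikit-learn==<installed version>`) is usually worthwhile here, since the serving container needs to unpickle the model with a compatible library version.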