# Iris Flower Classification with Scikit-Learn and Hopsworks

In this notebook we will:

1. Load the Iris Flower dataset into Pandas from a CSV file
2. Save the features to a feature group
3. Create a feature view from the feature group
4. Read the train/test features and labels using the feature view
5. Train a KNN model using Scikit-Learn
6. Save the trained model to Hopsworks
7. Launch a serving instance to serve the trained model (KServe)
8. Send a prediction request to the served model
9. Start a Gradio UI to interactively make predictions using the input features for the Iris Model

In [None]:
!pip install -U hopsworks --quiet

## Import libraries

In [None]:
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score
import joblib
import numpy as np
import time
import json
import random
import hopsworks
import pandas as pd
from sklearn import preprocessing

### Not using app.hopsworks.ai?

If you are running your own Hopsworks cluster (not app.hopsworks.ai):

 * uncomment the cell below
 * fill in details for your cluster
 * run the cell

In [None]:
#import os
#key=""
#with open("api-key.txt", "r") as f:
#    key = f.read().rstrip()
#os.environ['HOPSWORKS_PROJECT']="dowlingj"
#os.environ['HOPSWORKS_HOST']="35.187.178.84"
#os.environ['HOPSWORKS_API_KEY']=key    

### Connect to your Hopsworks cluster

If you set only HOPSWORKS_API_KEY, `hopsworks.login()` assumes you are connecting to app.hopsworks.ai.
To connect to a different Hopsworks cluster, also set the HOPSWORKS_HOST and HOPSWORKS_PROJECT environment variables.

In [None]:
project = hopsworks.login()
fs = project.get_feature_store()
mr = project.get_model_registry()
ms = project.get_model_serving()

## Prepare Training Dataset

### Load Iris Dataset (csv)

In [None]:
iris_df = pd.read_csv("https://repo.hops.works/master/hopsworks-tutorials/data/iris.csv")
iris_df.head()

In [None]:
iris_df.info()

### Save Features to the Feature Store

We save the iris features and their categorical label (`variety`) to a single feature group called `iris`, which is stored as a Hive table.

**Note**: To be able to run the feature store code, you first have to enable the Feature Store Service in your project. To do this, go to the "Settings" tab in your project, select the feature store service and click "Save". 

In [None]:
iris_fg = fs.create_feature_group(name="iris",
                                  version=1,
                                  primary_key=["sepal_length","sepal_width","petal_length","petal_width"],
                                  description="Iris flower dataset"
                                 )
iris_fg.insert(iris_df)
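
The feature group above uses all four measurements as its primary key, so any two rows with identical measurements share the same key. How Hopsworks resolves such collisions on insert is not covered in this notebook, so it can be worth spotting them beforehand. A minimal stdlib sketch; the sample rows are hypothetical and not taken from the real `iris.csv`:

```python
from collections import Counter

# Hypothetical rows for illustration only
sample_rows = [
    {"sepal_length": 5.1, "sepal_width": 3.5, "petal_length": 1.4, "petal_width": 0.2, "variety": "Setosa"},
    {"sepal_length": 5.8, "sepal_width": 2.7, "petal_length": 5.1, "petal_width": 1.9, "variety": "Virginica"},
    {"sepal_length": 5.8, "sepal_width": 2.7, "petal_length": 5.1, "petal_width": 1.9, "variety": "Virginica"},
]

PRIMARY_KEY = ["sepal_length", "sepal_width", "petal_length", "petal_width"]

def duplicate_keys(rows, key_cols):
    """Return the primary-key tuples that occur more than once."""
    counts = Counter(tuple(row[c] for c in key_cols) for row in rows)
    return [key for key, n in counts.items() if n > 1]

print(duplicate_keys(sample_rows, PRIMARY_KEY))  # [(5.8, 2.7, 5.1, 1.9)]
```

With the real DataFrame, `iris_df.duplicated(subset=PRIMARY_KEY)` flags the same rows.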

### Create a Feature View

Feature views are used to read features for training and inference. Here, the feature view selects all columns of the `iris` feature group and marks `variety` as the label.

In [None]:
query = iris_fg.select_all()
feature_view = fs.create_feature_view(name="iris",
                                      version=1,
                                      description="Read from Iris flower dataset",
                                      labels=["variety"],
                                      query=query)

In [None]:
feature_view = fs.get_feature_view(name="iris", version=1)
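
Conceptually, `labels=["variety"]` tells the feature view which column to hand back separately from the features. The stdlib sketch below only illustrates that idea on hypothetical rows; it is not how Hopsworks implements it:

```python
# Hypothetical rows mimicking the columns selected by iris_fg.select_all()
view_rows = [
    {"sepal_length": 5.1, "sepal_width": 3.5, "petal_length": 1.4, "petal_width": 0.2, "variety": "Setosa"},
    {"sepal_length": 6.4, "sepal_width": 3.2, "petal_length": 4.5, "petal_width": 1.5, "variety": "Versicolor"},
]

def split_features_labels(rows, label_cols):
    """Split each row into a feature dict and a label dict, keeping column order."""
    features = [{k: v for k, v in r.items() if k not in label_cols} for r in rows]
    labels = [{k: r[k] for k in label_cols} for r in rows]
    return features, labels

view_X, view_y = split_features_labels(view_rows, ["variety"])
print(view_y)  # [{'variety': 'Setosa'}, {'variety': 'Versicolor'}]
```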

### Create training data

Return the training data as Pandas DataFrames, split into train and test sets with 20% of the rows in the test set:

* X_train is the train features
* y_train is the train labels
* X_test is the test features
* y_test is the test labels

In [None]:
X_train, y_train, X_test, y_test = feature_view.train_test_split(0.2)
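
To make the 80/20 split concrete, here is a stdlib sketch of a seeded random split that keeps each feature row aligned with its label and returns the parts in the same order as the unpacking above. The function name and seed are hypothetical; Hopsworks may shuffle and split differently:

```python
import random

def train_test_split_rows(features, labels, test_size=0.2, seed=42):
    """Shuffle row indices with a fixed seed and return (X_train, y_train, X_test, y_test)."""
    if len(features) != len(labels):
        raise ValueError("features and labels must have the same length")
    idx = list(range(len(features)))
    random.Random(seed).shuffle(idx)
    n_test = round(len(idx) * test_size)
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    return (
        [features[i] for i in train_idx],
        [labels[i] for i in train_idx],
        [features[i] for i in test_idx],
        [labels[i] for i in test_idx],
    )

demo_features = [[float(i)] for i in range(10)]
demo_labels = [i % 2 for i in range(10)]
tr_X, tr_y, te_X, te_y = train_test_split_rows(demo_features, demo_labels, test_size=0.2)
print(len(tr_X), len(te_X))  # 8 2
```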

### Train a model
Train a KNN (k-nearest neighbors) model with Scikit-learn. Use a label encoder to map the categorical labels to numbers.
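
A label encoder should be fitted once, on the training labels, and then reused with `transform` so that every split uses the same name-to-code mapping. The sketch below is a hypothetical stdlib stand-in for scikit-learn's `LabelEncoder` (which also sorts its `classes_`); it shows how refitting on a test split that happens to lack a class shifts the codes:

```python
class SimpleLabelEncoder:
    """Minimal stand-in for LabelEncoder: codes follow the sorted unique labels."""

    def fit(self, labels):
        self.classes_ = sorted(set(labels))
        self._index = {c: i for i, c in enumerate(self.classes_)}
        return self

    def transform(self, labels):
        # Raises KeyError for a label that was not seen during fit
        return [self._index[c] for c in labels]

    def inverse_transform(self, codes):
        return [self.classes_[i] for i in codes]

train_labels = ["Setosa", "Virginica", "Versicolor", "Setosa"]
test_labels = ["Virginica", "Versicolor"]

enc = SimpleLabelEncoder().fit(train_labels)
print(enc.transform(test_labels))    # [2, 1]  consistent with training codes

refit = SimpleLabelEncoder().fit(test_labels)
print(refit.transform(test_labels))  # [1, 0]  different codes for the same flowers
```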

In [None]:
le = preprocessing.LabelEncoder()
# Fit the encoder on the training labels only, then reuse the same mapping for the test labels
y_train_encoded = le.fit_transform(y_train['variety'])
y_test_encoded = le.transform(y_test['variety'])

# KNN classifier that votes among the 4 nearest training rows
model = KNeighborsClassifier(n_neighbors=4)
model.fit(X_train, y_train_encoded)
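
For intuition about what `KNeighborsClassifier` does at prediction time, here is a from-scratch stdlib sketch of k-nearest-neighbors classification: find the k training rows closest to the query (Euclidean distance) and take a majority vote. With an even k such as 4, votes can tie; scikit-learn has its own tie handling, and the smallest-label tie-break below is only this sketch's choice. The toy data is made up:

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=4):
    """Predict the label of `query` by majority vote among its k nearest training rows."""
    nearest = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], query))[:k]
    votes = Counter(train_y[i] for i in nearest)
    top = max(votes.values())
    # Tie-break (this sketch's choice): smallest label among the most-voted classes
    return min(label for label, n in votes.items() if n == top)

toy_X = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [5.0, 5.0], [5.2, 4.9], [4.9, 5.1]]
toy_y = [0, 0, 0, 1, 1, 1]
print(knn_predict(toy_X, toy_y, [1.1, 1.0], k=4))  # 0
```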

### Evaluate model performance

Compute the accuracy of the model on the test set: the fraction of test rows whose predicted class matches the true class.

In [None]:
from sklearn.metrics import accuracy_score

# Predict encoded class labels for the test features
y_pred = model.predict(X_test)

accuracy = accuracy_score(y_test_encoded, y_pred)

metrics = {
    "accuracy": accuracy
}
print(metrics)
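
For a classifier, accuracy is usually more meaningful than squared error on the encoded labels. The codes produced by the label encoder are arbitrary (assigned in sorted order), so squared error punishes a mistake according to how far apart two codes happen to be, while accuracy counts every misclassification the same. A small stdlib illustration with made-up labels:

```python
def accuracy_of(y_true, y_pred):
    """Fraction of positions where the predicted class equals the true class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def mse_of(y_true, y_pred):
    """Mean squared error, treating the class codes as numbers."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

y_true = [0, 0, 1, 2]
pred_a = [1, 0, 1, 2]  # class 0 misread as class 1
pred_b = [2, 0, 1, 2]  # class 0 misread as class 2

print(accuracy_of(y_true, pred_a), accuracy_of(y_true, pred_b))  # 0.75 0.75
print(mse_of(y_true, pred_a), mse_of(y_true, pred_b))            # 0.25 1.0
```

Both predictions make exactly one mistake, yet the squared error differs by a factor of four.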

### Create the Model for the Model Registry

Save the following pickled objects as .pkl files locally to a directory that will be uploaded later to the model registry:

 * the model object, **model** saved as knn_iris_model.pkl
 * the label encoder object, **le** saved as knn_iris_encoder.pkl, so that we can reconstruct categorical names 
    from the encoded predictions (numbers) 
    
The model input schema is the same set of features as in the *X_train* DataFrame.

The model output schema is the same label as in the *y_train* DataFrame.

Finally, create the model object that will be registered. This step is lazy: nothing is uploaded until the model is saved. The model includes all files (artifacts) in the directory (the pickled label encoder object and the pickled model object), the model's input/output schema, and a sample input row (**input_example**). The model registry is the **mr** object, and for our Scikit-Learn model we create a model of type Python with **mr.python.create_model()**. For TensorFlow, there is *mr.tensorflow.create_model()*.

In [None]:
from hsml.schema import Schema
from hsml.model_schema import ModelSchema
import os

model_dir = "iris_model"
# Create the local artifact directory if it does not exist yet
os.makedirs(model_dir, exist_ok=True)
# Put the pickled model, the pickled label encoder, and (later) the predictor script in the 'iris_model' directory
# Then save the whole 'iris_model' directory to the model registry
model_file = 'knn_iris_model.pkl'
encoder_file = 'knn_iris_encoder.pkl'

joblib.dump(model, os.path.join(model_dir, model_file))
joblib.dump(le, os.path.join(model_dir, encoder_file))


input_example = X_train.sample()
input_schema = Schema(X_train)
output_schema = Schema(y_train)
model_schema = ModelSchema(input_schema, output_schema)

iris_model = mr.python.create_model(
    version=1,
    name="iris_model", 
    metrics=metrics,
    model_schema=model_schema,
    input_example=input_example, 
    description="Iris Flower Predictor")
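
A cheap way to catch serialization problems before uploading is to reload what you just wrote and compare. The sketch below uses the stdlib `pickle` module as a stand-in for `joblib.dump`/`joblib.load`, and the artifact contents are hypothetical placeholders rather than the real model and encoder:

```python
import os
import pickle
import tempfile

# Hypothetical placeholders for the pickled model and label encoder
artifacts = {
    "knn_iris_model.pkl": {"n_neighbors": 4},
    "knn_iris_encoder.pkl": ["Setosa", "Versicolor", "Virginica"],
}

with tempfile.TemporaryDirectory() as tmp_dir:
    # Write each artifact to its own file in the directory
    for name, obj in artifacts.items():
        with open(os.path.join(tmp_dir, name), "wb") as f:
            pickle.dump(obj, f)
    # Read every file back to confirm the round trip
    reloaded = {}
    for name in sorted(os.listdir(tmp_dir)):
        with open(os.path.join(tmp_dir, name), "rb") as f:
            reloaded[name] = pickle.load(f)

print(reloaded == artifacts)  # True
```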

### Predictor script for Python models

Scikit-learn models are deployed as Python models, in which case you need to provide a **Predict** class that implements the **predict** method. The **predict()** method invokes the model on the inputs and returns the prediction as a list.

The constructor (**__init__()**) runs when the predictor is loaded into memory. It loads the model from the local directory the artifacts are materialized to, given by the *ARTIFACT_FILES_PATH* environment variable.

The `%%writefile` magic writes the contents of the cell to the given file. We will use the **iris_predictor.py** file to create a deployment for our Scikit-Learn KNN model.

In [None]:
%%writefile iris_model/iris_predictor.py

import joblib
import os

class Predict(object):
    
    def __init__(self):
        # NOTE: env var ARTIFACT_FILES_PATH has the local path to the model artifact files        
        self.model = joblib.load(os.environ["ARTIFACT_FILES_PATH"] + "/knn_iris_model.pkl")


    def predict(self, inputs):
        """ Serves a prediction request from a trained model"""
        return self.model.predict(inputs).tolist()
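
Before deploying, you can smoke-test the predictor contract locally by pointing `ARTIFACT_FILES_PATH` at a directory that holds a pickled model. In a real deployment the serving runtime sets that variable; here we set it by hand. So that the sketch runs anywhere, it uses a hypothetical `StubModel` and stdlib `pickle` in place of the real KNN model and `joblib`:

```python
import array
import os
import pickle
import tempfile

class StubModel:
    """Hypothetical stand-in for the KNN model: predicts class 0 for every row."""
    def predict(self, inputs):
        # array.array has .tolist(), like the NumPy array scikit-learn returns
        return array.array("l", [0] * len(inputs))

class Predict(object):
    """Same contract as iris_predictor.py, but loading with pickle instead of joblib."""
    def __init__(self):
        path = os.path.join(os.environ["ARTIFACT_FILES_PATH"], "knn_iris_model.pkl")
        with open(path, "rb") as f:
            self.model = pickle.load(f)

    def predict(self, inputs):
        return self.model.predict(inputs).tolist()

with tempfile.TemporaryDirectory() as tmp_dir:
    with open(os.path.join(tmp_dir, "knn_iris_model.pkl"), "wb") as f:
        pickle.dump(StubModel(), f)
    os.environ["ARTIFACT_FILES_PATH"] = tmp_dir  # simulate the serving runtime
    predictor = Predict()
    result = predictor.predict([[5.1, 3.5, 1.4, 0.2], [6.7, 3.0, 5.2, 2.3]])

print(result)  # [0, 0]
```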

### Register the model with the Model Registry

Register the model and the artifacts in the 'iris_model' directory (the pickled model, the pickled label encoder object, and the predictor script) with the model registry.
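
Because the whole directory is uploaded, it can help to confirm that every file the deployment needs is present before saving. A small stdlib sketch; the helper name and the list of required files are assumptions based on the files created in this notebook:

```python
import os
import tempfile

REQUIRED_ARTIFACTS = ["knn_iris_model.pkl", "knn_iris_encoder.pkl", "iris_predictor.py"]

def missing_artifacts(directory, required=REQUIRED_ARTIFACTS):
    """Return the required file names that are not present in `directory`."""
    present = set(os.listdir(directory))
    return [name for name in required if name not in present]

# Demo on a temporary directory that only contains the pickled model
with tempfile.TemporaryDirectory() as tmp_dir:
    open(os.path.join(tmp_dir, "knn_iris_model.pkl"), "wb").close()
    demo_missing = missing_artifacts(tmp_dir)

print(demo_missing)  # ['knn_iris_encoder.pkl', 'iris_predictor.py']
```

In the notebook, `missing_artifacts(model_dir)` should return an empty list before you call `iris_model.save(model_dir)`.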

In [None]:
iris_model.save(model_dir)

### Create the model deployment

Because this is a Python model (Scikit-Learn), we must provide the predictor script when creating the deployment.

In [None]:
predictor_script_path = iris_model.version_path + "/iris_predictor.py"
irisclassifier = iris_model.deploy(name = "irisdeployed",
                                   script_file=predictor_script_path,  
                                   model_server="PYTHON", 
                                   serving_tool="KSERVE")
irisclassifier.describe()

### Start the deployment

Start the deployment and check its logs.

In [None]:
irisclassifier.start()
irisclassifier.get_logs()

### Send Prediction Requests to the Deployed Model

For making inference requests you can use the utility method `.predict()` from the deployment object.

In [None]:
input_list = list(iris_model.input_example)

data = {"instances" : [input_list]}
res = irisclassifier.predict(data)
print(input_list)
print(le.inverse_transform([res["predictions"][0]]))
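
The request body wraps one list of feature values per row under `"instances"`, and the response carries numeric codes under `"predictions"`. A stdlib sketch of building the body and decoding a response; the class list assumes the encoder sorted the `variety` values alphabetically as `Setosa`, `Versicolor`, `Virginica`, and the response is a hypothetical example rather than real model output:

```python
import json

# Assumed class names, indexed by the codes 0, 1, 2 produced by the label encoder
CLASSES = ["Setosa", "Versicolor", "Virginica"]

def build_request(rows):
    """Wrap feature rows in the {"instances": [...]} body used above."""
    return {"instances": [[float(v) for v in row] for row in rows]}

def decode_response(response, classes=CLASSES):
    """Map each numeric prediction back to its class name."""
    return [classes[int(code)] for code in response["predictions"]]

request = build_request([[5.1, 3.5, 1.4, 0.2]])
print(json.dumps(request))  # {"instances": [[5.1, 3.5, 1.4, 0.2]]}

fake_response = {"predictions": [0]}  # hypothetical response shape
print(decode_response(fake_response))  # ['Setosa']
```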

## Try out your Model Interactively with a Gradio UI

We will build a user interface with Gradio to allow you to enter the 4 feature values (sepal length/width and petal length/width), producing a prediction of the type of iris flower.

First, we have to install the gradio library.

In [None]:
!pip install gradio --quiet
!pip install typing-extensions==4.3.0

### Check in the Hopsworks UI that your model deployment is running 

If your model is already deployed, you can get a reference to it.

Your iris model deployment needs to be running for the Gradio UI to work.

In [None]:
irisclassifier = ms.get_deployment("irisdeployed")
irisclassifier.describe()

### Download your model artifacts from the Model Registry

You need to download the label encoder from the model registry to transform the numeric predictions back into labels. Alternatively, this could be done in a KServe transformer script.

In [None]:
iris_model = mr.get_model("iris_model", version=1)
model_dir = iris_model.download()

le = joblib.load(model_dir + "/knn_iris_encoder.pkl")

### Run Gradio

Start the Gradio UI. Users enter the 4 feature values and a prediction is returned. We use the label encoder object to transform the number returned to the categorical value (stringified name of the Iris Flower).

In [None]:
import gradio as gr


def iris(sl, sw, pl, pw):
    # Build one request row in the same feature order used for training
    data = {
        "instances": [[sl, sw, pl, pw]]
    }
    res = irisclassifier.predict(data)
    # Convert the numerical representation of the label back to its original iris flower name.
    return le.inverse_transform([res["predictions"][0]])[0]

demo = gr.Interface(
    fn=iris,
    title="Iris Flower Predictive Analytics",
    description="Experiment with sepal/petal lengths/widths to predict which flower it is.",
    allow_flagging="never",
    inputs=[
        # gr.inputs.* was removed in newer Gradio releases; use the top-level components
        gr.Number(value=1.0, label="sepal length (cm)"),
        gr.Number(value=1.0, label="sepal width (cm)"),
        gr.Number(value=1.0, label="petal length (cm)"),
        gr.Number(value=1.0, label="petal width (cm)"),
        ],
    outputs="text")

demo.launch(share=True)
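
To test the UI callback without a running deployment, one option is to build it from any predict function plus a list of class names. The helper `make_iris_handler` and the `fake_predict` rule ("petal length under 2.5 cm means code 0, otherwise code 2") are hypothetical and exist only for an offline check:

```python
def make_iris_handler(predict_fn, classes):
    """Build a Gradio-style callback from a predict function (hypothetical helper).

    predict_fn: takes {"instances": [[...]]} and returns {"predictions": [...]}
    classes: class names indexed by the numeric label codes
    """
    def handler(sl, sw, pl, pw):
        res = predict_fn({"instances": [[sl, sw, pl, pw]]})
        return classes[int(res["predictions"][0])]
    return handler

def fake_predict(data):
    # Made-up rule for offline testing only; not the trained model
    petal_length = data["instances"][0][2]
    return {"predictions": [0 if petal_length < 2.5 else 2]}

handler = make_iris_handler(fake_predict, ["Setosa", "Versicolor", "Virginica"])
print(handler(5.1, 3.5, 1.4, 0.2))  # Setosa
print(handler(6.7, 3.0, 5.2, 2.3))  # Virginica
```

With the live deployment, `make_iris_handler(irisclassifier.predict, le.classes_)` would produce a callback equivalent to `iris` above that can be passed as `fn` to `gr.Interface`.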