# Iris Flower Classification with Scikit-Learn and Hopsworks

In this notebook we will:

1. Load the Iris Flower dataset into Pandas from a CSV file
2. Save the features to a feature group
3. Create a feature view from the feature group
4. Read the train/test features and labels using the feature view
5. Train a KNN model using scikit-learn
6. Save the trained model to Hopsworks
7. Launch a serving instance to serve the trained model (KServe)
8. Send a prediction request to the served model
9. Start a Gradio UI to interactively make predictions using the input features for the Iris Model

## Import libraries

In [12]:
# Standard library ('os' is needed later for environment variables and the model directory)
import os
import json
import random
import time

# Third-party libraries
import joblib
import numpy as np
import pandas as pd
import hopsworks
from confluent_kafka import Producer, Consumer, KafkaError
from sklearn import preprocessing
from sklearn.metrics import accuracy_score
from sklearn.neighbors import KNeighborsClassifier



### Not app.hopsworks.ai ?

If you are running your own Hopsworks cluster (not app.hopsworks.ai):

 * uncomment the cell below
 * fill in details for your cluster
 * run the cell

In [7]:
#key=""
#with open("api-key.txt", "r") as f:
#    key = f.read().rstrip()
#os.environ['HOPSWORKS_PROJECT']="cjsurf"
#os.environ['HOPSWORKS_HOST']="35.187.178.84"
#os.environ['HOPSWORKS_API_KEY']=key    

### Connect to your Hopsworks cluster

If you set only `HOPSWORKS_API_KEY`, the login assumes you are connecting to app.hopsworks.ai.
Set the `HOPSWORKS_HOST` and `HOPSWORKS_PROJECT` environment variables to connect to a different Hopsworks cluster.
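
As a rough illustration of this rule, the sketch below works out which cluster a given set of environment variables points to. `resolve_target` is a hypothetical helper written for this notebook, not part of the `hopsworks` library, and the default host simply mirrors the text above; the library's actual resolution logic may differ.

```python
DEFAULT_HOST = "app.hopsworks.ai"  # assumption: default taken from the text above


def resolve_target(env):
    """Return (host, project) implied by the HOPSWORKS_* variables in `env`.

    Hypothetical helper: it mirrors the rule described in the text and
    does not call the hopsworks library.
    """
    host = env.get("HOPSWORKS_HOST", DEFAULT_HOST)
    project = env.get("HOPSWORKS_PROJECT")  # None means "not specified"
    return host, project


# Only the API key is set: fall back to the default host.
print(resolve_target({"HOPSWORKS_API_KEY": "dummy-key"}))

# Host and project set explicitly (placeholder values).
print(resolve_target({
    "HOPSWORKS_API_KEY": "dummy-key",
    "HOPSWORKS_HOST": "hopsworks.example.com",
    "HOPSWORKS_PROJECT": "my_project",
}))
```

Passing `os.environ` instead of a literal dictionary inspects the variables of the current process.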

In [13]:
project = hopsworks.login()
fs = project.get_feature_store()
mr = project.get_model_registry()

Connection closed.
Connected. Call `.close()` to terminate connection gracefully.

Logged in to project, explore it here https://c.app.hopsworks.ai:443/p/135
Connected. Call `.close()` to terminate connection gracefully.
Connected. Call `.close()` to terminate connection gracefully.


## Prepare Training Dataset

### Load Iris Dataset (csv)

In [132]:
iris_df = pd.read_csv("https://repo.hops.works/master/hopsworks-tutorials/data/iris.csv")
iris_df.head()

   sepal_length  sepal_width  petal_length  petal_width variety
0           5.1          3.5           1.4          0.2  Setosa
1           4.9          3.0           1.4          0.2  Setosa
2           4.7          3.2           1.3          0.2  Setosa
3           4.6          3.1           1.5          0.2  Setosa
4           5.0          3.6           1.4          0.2  Setosa


In [133]:
iris_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 150 entries, 0 to 149
Data columns (total 5 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   sepal_length  150 non-null    float64
 1   sepal_width   150 non-null    float64
 2   petal_length  150 non-null    float64
 3   petal_width   150 non-null    float64
 4   variety       150 non-null    object 
dtypes: float64(4), object(1)
memory usage: 6.0+ KB


### Save Features to the Feature Store

We save a single feature group called `iris` (version 2). It contains the four iris measurements together with the categorical label column `variety`, and it uses the four measurements as its primary key. The label is converted to a numeric value later, just before training.

**Note**: To be able to run the feature store code, you first have to enable the Feature Store Service in your project. To do this, go to the "Settings" tab in your project, select the feature store service and click "Save". 
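
Because the feature group uses all four measurements as its primary key, rows with identical measurements share a key. It may be worth checking for such rows before inserting. The sketch below is a minimal, pure-Python check; `duplicate_keys` is a hypothetical helper and the sample rows are toy values, not read from the CSV.

```python
from collections import Counter

PRIMARY_KEY = ["sepal_length", "sepal_width", "petal_length", "petal_width"]


def duplicate_keys(rows, key_columns):
    """Return the primary-key tuples that occur in more than one row."""
    counts = Counter(tuple(row[col] for col in key_columns) for row in rows)
    return [key for key, n in counts.items() if n > 1]


# Toy rows for illustration only: the first and third share all four measurements.
sample_rows = [
    {"sepal_length": 5.1, "sepal_width": 3.5, "petal_length": 1.4, "petal_width": 0.2, "variety": "Setosa"},
    {"sepal_length": 4.9, "sepal_width": 3.0, "petal_length": 1.4, "petal_width": 0.2, "variety": "Setosa"},
    {"sepal_length": 5.1, "sepal_width": 3.5, "petal_length": 1.4, "petal_width": 0.2, "variety": "Setosa"},
]

print(duplicate_keys(sample_rows, PRIMARY_KEY))  # [(5.1, 3.5, 1.4, 0.2)]
```

With the DataFrame loaded earlier, the same check could be run as `duplicate_keys(iris_df.to_dict("records"), PRIMARY_KEY)`.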

In [134]:
iris_fg = fs.create_feature_group(name="iris",
                                  version=2,
                                  primary_key=["sepal_length","sepal_width","petal_length","petal_width"],
                                  description="Iris flower dataset"
                                 )
iris_fg.insert(iris_df)

Feature Group created successfully, explore it at 
https://c.app.hopsworks.ai:443/p/135/fs/81/fg/282
Launching offline feature group backfill job...
Backfill Job started successfully, you can follow the progress at 
https://c.app.hopsworks.ai/p/135/jobs/named/iris_2_offline_fg_backfill/executions


(<hsfs.core.job.Job at 0x7f2ae4724700>, None)

2022-07-13 22:53:37,225 ERROR: Socket exception: Network is unreachable (101)


In [135]:
query = iris_fg.select_all()
feature_view = fs.create_feature_view(name="iris",
                                      version=2,
                                      description="Read from Iris flower dataset",
                                      labels=["variety"],
                                      query=query)

Feature view created successfully, explore it at 
https://c.app.hopsworks.ai:443/p/135/fs/81/fv/iris/version/2


In [136]:
X_train, y_train, X_test, y_test = feature_view.train_test_split(0.2)

2022-07-13 23:38:16,777 INFO: USE `cjsurf_featurestore`
2022-07-13 23:38:17,737 INFO: SELECT `fg0`.`sepal_length` `sepal_length`, `fg0`.`sepal_width` `sepal_width`, `fg0`.`petal_length` `petal_length`, `fg0`.`petal_width` `petal_width`, `fg0`.`variety` `variety`
FROM `cjsurf_featurestore`.`iris_2` `fg0`
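
Conceptually, `train_test_split(0.2)` holds out 20% of the rows for testing. The sketch below shows that idea with a plain random shuffle of row indices. It is an illustration under that assumption, not how the feature view implements the split, and `split_indices` is a hypothetical name.

```python
import random


def split_indices(n_rows, test_size, seed=42):
    """Shuffle row indices and split them into (train, test) index lists."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)  # seeded so the split is reproducible
    n_test = int(round(n_rows * test_size))
    return indices[n_test:], indices[:n_test]


# The iris dataset has 150 rows, so a 20% test split holds out 30 of them.
train_idx, test_idx = split_indices(n_rows=150, test_size=0.2)
print(len(train_idx), len(test_idx))  # 120 30
```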




In [137]:
from sklearn import preprocessing

# Fit the encoder on the training labels only, then reuse the same mapping for the test labels
le = preprocessing.LabelEncoder()
y_train_encoded = le.fit_transform(y_train['variety'])
y_test_encoded = le.transform(y_test['variety'])

model = KNeighborsClassifier(n_neighbors=4)
model.fit(X_train, y_train_encoded) 
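
To make the classifier less of a black box, here is a minimal pure-Python sketch of the k-nearest-neighbours idea: find the k training points closest to the query (Euclidean distance) and take a majority vote over their labels. It is not scikit-learn's implementation, which differs in details such as tie handling and indexing; `knn_predict` and the toy data are illustrative only.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=4):
    """Predict the label of `query` by majority vote among its k nearest
    training points, using Euclidean distance."""
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_X, train_y)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Toy training set: three points per class, labels already encoded as integers.
train_X = [
    (5.1, 3.5, 1.4, 0.2), (4.9, 3.0, 1.4, 0.2), (4.7, 3.2, 1.3, 0.2),
    (7.0, 3.2, 4.7, 1.4), (6.4, 3.2, 4.5, 1.5), (6.9, 3.1, 4.9, 1.5),
]
train_y = [0, 0, 0, 1, 1, 1]

print(knn_predict(train_X, train_y, (5.0, 3.4, 1.5, 0.2), k=3))  # 0
print(knn_predict(train_X, train_y, (6.5, 3.0, 4.6, 1.4), k=3))  # 1
```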

In [138]:
from sklearn.metrics import mean_squared_error

y_pred = model.predict(X_test)

# mean_squared_error returns the MSE; take the square root to report the RMSE
rmse = float(mean_squared_error(y_test_encoded, y_pred)) ** 0.5

metrics = {
    "rmse" : rmse
}
print(metrics)

{'rmse': 0.0}
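
An error measure computed on integer class codes depends on the arbitrary order in which the encoder numbers the classes, whereas accuracy treats every misclassification the same. The pure-Python sketch below illustrates the difference on toy labels; for the real model, `accuracy_score` (already imported at the top of the notebook) computes accuracy directly.

```python
import math


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error over integer-encoded labels."""
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mse)


y_true = [0, 1, 2, 2]
# Both prediction lists get exactly one label wrong.
pred_a = [0, 1, 2, 1]  # class 2 mistaken for class 1
pred_b = [0, 1, 2, 0]  # class 2 mistaken for class 0

print(accuracy(y_true, pred_a), rmse(y_true, pred_a))  # 0.75 0.5
print(accuracy(y_true, pred_b), rmse(y_true, pred_b))  # 0.75 1.0
```

Both prediction lists contain a single mistake, yet their RMSE values differ because of how the classes happen to be numbered.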


In [140]:
import os

from hsml.schema import Schema
from hsml.model_schema import ModelSchema

# Create the model directory if it does not exist yet
model_dir = "iris_model"
os.makedirs(model_dir, exist_ok=True)

# Put the pickled model and the predictor script in the 'iris_model' directory
# Then save the whole 'iris_model' directory to the model registry
model_file = 'knn_iris_model.pkl'  # named 'model_file' to avoid shadowing the stdlib 'pickle' module

joblib.dump(model, os.path.join(model_dir, model_file))

input_example = X_train.sample()
input_schema = Schema(X_train)
output_schema = Schema(y_train)
model_schema = ModelSchema(input_schema, output_schema)

iris_model = mr.python.create_model(
    version=6,
    name="iris_model", 
    metrics=metrics,
    model_schema=model_schema,
    input_example=input_example, 
    description="Iris Flower Predictor")

In [141]:
%%writefile iris_model/iris_predictor.py

import joblib
import os

class Predict(object):
    
    def __init__(self):
        # NOTE: env var ARTIFACT_FILES_PATH has the local path to the model artifact files        
        self.model = joblib.load(os.environ["ARTIFACT_FILES_PATH"] + "/knn_iris_model.pkl")


    def predict(self, inputs):
        """ Serves a prediction request from a trained model"""
        return self.model.predict(inputs).tolist()

Overwriting iris_model/iris_predictor.py


In [142]:
# Save the whole 'iris_model' directory to the model registry, including the model and the predictor script
iris_model.save(model_dir)

  0%|          | 0/6 [00:00<?, ?it/s]

Model created, explore it at https://c.app.hopsworks.ai:443/p/135/models/iris_model/6


Model(name: 'iris_model', version: 6)

In [143]:
predictor_script_path = iris_model.version_path + "/iris_predictor.py"
irisclassifier = iris_model.deploy(name = "irisdeployed",
                                   script_file=predictor_script_path,  
                                   model_server="PYTHON", 
                                   serving_tool="KSERVE")
irisclassifier.describe()

Deployment created, explore it at https://c.app.hopsworks.ai:443/p/135/deployments/126
Before making predictions, start the deployment by using `.start()`
{
    "artifact_version": 1,
    "batching_configuration": {
        "batching_enabled": false
    },
    "created": "2022-07-13T21:40:35.566Z",
    "creator": "Jim Dowling",
    "id": 126,
    "inference_logging": "NONE",
    "model_name": "iris_model",
    "model_path": "/Projects/cjsurf/Models/iris_model",
    "model_server": "PYTHON",
    "model_version": 6,
    "name": "irisdeployed",
    "predictor": "iris_predictor.py",
    "predictor_resources": {
        "limits": {
            "cores": 1,
            "gpus": 0,
            "memory": 1024
        },
        "requests": {
            "cores": 1,
            "gpus": 0,
            "memory": 1024
        }
    },
    "requested_instances": 1,
    "serving_tool": "KSERVE"
}


In [144]:
irisclassifier.start()

  0%|          | 0/1 [00:00<?, ?it/s]

Start making predictions by using `.predict()`


In [152]:
irisclassifier.get_logs()

Explore all the logs and filters in the Kibana logs at https://c.app.hopsworks.ai:443/p/135/deployments/126

Instance name: irisdeployed-predictor-default-00001-deployment-66cfbf958f7dwp2
      File "/srv/hops/anaconda/envs/theenv/lib/python3.8/site-packages/sklearn/neighbors/_classification.py", line 219, in predict
        neigh_ind = self.kneighbors(X, return_distance=False)
      File "/srv/hops/anaconda/envs/theenv/lib/python3.8/site-packages/sklearn/neighbors/_base.py", line 745, in kneighbors
        X = self._validate_data(X, accept_sparse="csr", reset=False, order="C")
      File "/srv/hops/anaconda/envs/theenv/lib/python3.8/site-packages/sklearn/base.py", line 600, in _validate_data
        self._check_n_features(X, reset=reset)
      File "/srv/hops/anaconda/envs/theenv/lib/python3.8/site-packages/sklearn/base.py", line 400, in _check_n_features
        raise ValueError(
    ValueError: X has 5 features, but KNeighborsClassifier is expecting 4 features as input.
[E 220713 21
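
The traceback above shows a request with 5 values per instance reaching a model that was trained on 4 features. One way to catch this earlier is to validate the payload before calling the model. The sketch below is a hypothetical helper, not part of the deployed predictor script; the five-value example is made up to reproduce the mismatch.

```python
N_FEATURES = 4  # sepal length, sepal width, petal length, petal width


def validate_instances(instances, n_features=N_FEATURES):
    """Raise ValueError if any instance does not have exactly n_features values."""
    for i, instance in enumerate(instances):
        if len(instance) != n_features:
            raise ValueError(
                f"instance {i} has {len(instance)} values, expected {n_features}"
            )
    return instances


validate_instances([[5.2, 3.5, 1.5, 0.2]])  # passes silently

try:
    # Five values, for example when an extra column slips into the input.
    validate_instances([[0, 5.2, 3.5, 1.5, 0.2]])
except ValueError as err:
    print(err)  # instance 0 has 5 values, expected 4
```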

### Send Prediction Requests to the Deployed Model

To make inference requests, use the `.predict()` method of the deployment object.

In [149]:
input_list = list(iris_model.input_example)

data = {"instances" : [input_list]}
res = irisclassifier.predict(data)
print(input_list)
print(le.inverse_transform([res["predictions"][0]]))

[5.2, 3.5, 1.5, 0.2]
['Setosa']
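
The request and response shapes used above can be sketched without a running deployment. In the example below, `build_request` and `decode_predictions` are hypothetical helpers, `fake_response` stands in for the value returned by `irisclassifier.predict(...)`, and the class list is an assumption: only `Setosa` appears in the output above, and the order follows the alphabetical sorting that `LabelEncoder` applies.

```python
# Assumed class names, in the sorted order a LabelEncoder would assign (0, 1, 2).
CLASS_NAMES = ["Setosa", "Versicolor", "Virginica"]


def build_request(*rows):
    """Wrap one or more feature rows in the {"instances": [...]} payload."""
    return {"instances": [list(row) for row in rows]}


def decode_predictions(response, class_names=CLASS_NAMES):
    """Map the integer class indices in a response back to class names."""
    return [class_names[int(i)] for i in response["predictions"]]


request = build_request([5.2, 3.5, 1.5, 0.2])
fake_response = {"predictions": [0]}  # stand-in for a real deployment response

print(request)                            # {'instances': [[5.2, 3.5, 1.5, 0.2]]}
print(decode_predictions(fake_response))  # ['Setosa']
```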


## Try out your Model Interactively

We will build a Gradio user interface where you enter the four feature values (sepal length/width and petal length/width) and get back the predicted type of iris flower.

First, we have to install the gradio library.

In [126]:
!pip install gradio --quiet
!pip install typing-extensions==4.3.0



In [153]:
import gradio as gr

def iris(sl, sw, pl, pw):
    # Send the four measurements to the deployment as a single instance
    data = {
        "instances": [[sl, sw, pl, pw]]
    }
    res = irisclassifier.predict(data)
    # Convert the numerical representation of the label back to its original iris flower name.
    return le.inverse_transform([res["predictions"][0]])[0]

demo = gr.Interface(
    fn=iris,
    title="Iris Flower Predictive Analytics",
    description="Experiment with sepal/petal lengths/widths to predict which flower it is.",
    allow_flagging="never",
    inputs=[
        gr.inputs.Number(default=1.0, label="sepal length (cm)"),
        gr.inputs.Number(default=1.0, label="sepal width (cm)"),
        gr.inputs.Number(default=1.0, label="petal length (cm)"),
        gr.inputs.Number(default=1.0, label="petal width (cm)"),
        ],
    outputs="text")

demo.launch(share=True)



Running on local URL:  http://127.0.0.1:7867/
2022-07-13 23:46:22,156 INFO: Connected (version 2.0, client OpenSSH_7.6p1)
2022-07-13 23:46:23,207 INFO: Authentication (publickey) successful!
Running on public URL: https://53628.gradio.app

This share link expires in 72 hours. For free permanent hosting, check out Spaces (https://huggingface.co/spaces)


(<gradio.routes.App at 0x7f2ae45fbc70>,
 'http://127.0.0.1:7867/',
 'https://53628.gradio.app')