In [1]:
!python -c "import sys; print(sys.executable)"

/home/haseebullah/anaconda3/Assignment-Mlops/mlops-assignment/bin/python


# MLflow lab

In [2]:
import pandas as pd

In [3]:
pd.__version__

'2.0.1'

### Setting up the MLflow tracking server

The server can be accessed at `localhost:5004`.


In [4]:
%%bash --bg

mlflow server --host 0.0.0.0 \
    --port 5004 \
    --backend-store-uri sqlite:///mlflow.db \
    --default-artifact-root ./mlruns

### MLProject file

Here we define the entry points and configure the MLflow steps.

The `main` entry point takes two parameters: `file_name` and `max_depth`.

In [5]:
%cat MLproject

name: basic_mlflow

# this file is used to configure Python package dependencies.
# it uses Anaconda, but it can be also alternatively configured to use pip.
conda_env: conda.yaml

# entry points can be ran using `mlflow run <project_name> -e <entry_point_name>
entry_points:
  main:
    # parameters is a key-value collection.
    parameters:
      file_name:
        type: str
        default: "wine.csv"
      max_depth:
        type: int
        default: 5
    command: "python train.py {file_name} {max_depth}"
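
`train.py` itself is not shown in this notebook, so the following is only a sketch of how such a script might read the two positional arguments that the `command` template passes in. The helper name `parse_cli_args` is hypothetical, and it assumes the arguments arrive in the order `file_name`, `max_depth`.

```python
def parse_cli_args(argv):
    """Parse `python train.py <file_name> <max_depth>` style arguments.

    Hypothetical helper: the real train.py may parse its arguments differently.
    Falls back to the defaults declared in MLproject when an argument is missing.
    """
    file_name = argv[0] if len(argv) > 0 else "wine.csv"
    max_depth = int(argv[1]) if len(argv) > 1 else 5
    return file_name, max_depth


# Inside train.py this could be called as:
#   file_name, max_depth = parse_cli_args(sys.argv[1:])
```

Converting `max_depth` with `int` mirrors the `type: int` declaration in the MLproject file, since command-line arguments always arrive as strings.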


## Training

The model defined in `train.py` can now be trained.

We use the random forest model from our previous supervised learning assignment.

The resulting run and registered model can be verified at `localhost:5004`.


In [6]:
import sklearn

In [7]:
sklearn.__version__

'1.2.2'

In [8]:
%%bash
source mlflow_env_vars.sh
mlflow run . 


2023/05/05 10:09:36 INFO mlflow.utils.conda: Conda environment mlflow-dd0fbdd40ba98798131458f29496394bd1a3fb33 already exists.
2023/05/05 10:09:36 INFO mlflow.projects.utils: === Created directory /tmp/tmpitvazew4 for downloading remote URIs passed to arguments of type 'path' ===
2023/05/05 10:09:36 INFO mlflow.projects.backend.local: === Running command 'source /home/haseebullah/anaconda3/bin/../etc/profile.d/conda.sh && conda activate mlflow-dd0fbdd40ba98798131458f29496394bd1a3fb33 1>&2 && python train.py wine.csv 5' in run with ID '2b071c6a40514bb9be7053e8d7318baa' === 
Registered model 'sklearn_rfc' already exists. Creating a new version of this model...
2023/05/05 10:09:39 INFO mlflow.tracking._model_registry.client: Waiting up to 300 seconds for model version to finish creation. Model name: sklearn_rfc, version 11
Created version '11' of model 'sklearn_rfc'.
2023/05/05 10:09:39 INFO mlflow.projects: === Run (ID '2b071c6a40514bb9be7053e8d7318baa') succeeded ===
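
The run above uses the defaults from the MLproject file (`wine.csv`, depth 5). As a minimal sketch, the parameters could be overridden with the `-P key=value` option of `mlflow run`. The helper `build_run_command` below is hypothetical and only assembles the command line; it does not start a run.

```python
import shlex


def build_run_command(project_uri=".", entry_point="main", **params):
    """Return the argv list for `mlflow run` with -P key=value overrides."""
    cmd = ["mlflow", "run", project_uri, "-e", entry_point]
    for key, value in params.items():
        cmd += ["-P", f"{key}={value}"]
    return cmd


cmd = build_run_command(max_depth=8)
# Printable form that could be pasted into a %%bash cell.
print(shlex.join(cmd))
```

Passing the resulting list to `subprocess.run` would launch the project, provided the environment variables from `mlflow_env_vars.sh` are already set in that process.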


## Inspecting stored models

The trained models are stored in `mlruns/0`.

Each run directory contains the artifacts and configuration needed to serve the model.

The artifact path (`knn`) is the same as the one defined in the class example, even though the model here is a random forest.

In [9]:
%%bash
last_model_path=$(ls -tr mlruns/0/ | tail -1)
cat mlruns/0/$last_model_path/artifacts/knn/MLmodel

artifact_path: knn
flavors:
  python_function:
    env:
      conda: conda.yaml
      virtualenv: python_env.yaml
    loader_module: mlflow.sklearn
    model_path: model.pkl
    predict_fn: predict
    python_version: 3.10.6
  sklearn:
    code: null
    pickled_model: model.pkl
    serialization_format: cloudpickle
    sklearn_version: 1.2.2
mlflow_version: 2.3.1
model_uuid: ea83736a0f804e099ea9bc17ea1f735f
run_id: 0659857acaa94fe0ad0a80bfbbe483cf
utc_time_created: '2023-05-03 11:09:26.836996'
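
The `ls -tr | tail -1` approach picks whichever entry in `mlruns/0` was modified last, whether or not it holds a model. A slightly more defensive sketch in Python is shown below. The function name `latest_mlmodel` is hypothetical, and the `artifacts/knn/MLmodel` layout is assumed from the cell above.

```python
from pathlib import Path


def latest_mlmodel(experiment_dir="mlruns/0", artifact_path="knn"):
    """Return the MLmodel path of the most recently modified run, or None.

    Entries that are not run directories holding the expected artifact
    are skipped.
    """
    root = Path(experiment_dir)
    if not root.is_dir():
        return None
    candidates = [
        run_dir / "artifacts" / artifact_path / "MLmodel"
        for run_dir in root.iterdir()
        if run_dir.is_dir()
    ]
    existing = [p for p in candidates if p.is_file()]
    if not existing:
        return None
    # parents[2] is the run directory; pick the one modified most recently.
    return max(existing, key=lambda p: p.parents[2].stat().st_mtime)


path = latest_mlmodel()
if path is not None:
    print(path.read_text())
```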


In [10]:
import mlflow

In [11]:
mlflow.__version__

'2.3.1'

## Serving model

The model has been trained, which can be seen at `localhost:5004`.

The latest version, 10, has been moved to the Production stage.

The following cell serves the model on localhost at port 5005.

In [12]:
%%bash --bg
source mlflow_env_vars.sh
mlflow --version
mlflow models serve -m models:/sklearn_rfc/Production -p 5005 --env-manager=conda 


# Prediction

We load the data that we can feed into the prediction server.

In [13]:
df = pd.read_csv("wine.csv")
df.head()

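| Unnamed: 0 | fixed acidity | volatile acidity | citric acid | residual sugar | chlorides | free sulfur dioxide | total sulfur dioxide | density | pH | sulphates | alcohol | quality |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 7.4 | 0.7 | 0.0 | 1.9 | 0.076 | 11.0 | 34.0 | 0.9978 | 3.51 | 0.56 | 9.4 | 5 |
| 1 | 7.8 | 0.88 | 0.0 | 2.6 | 0.098 | 25.0 | 67.0 | 0.9968 | 3.2 | 0.68 | 9.8 | 5 |
| 2 | 7.8 | 0.76 | 0.04 | 2.3 | 0.092 | 15.0 | 54.0 | 0.997 | 3.26 | 0.65 | 9.8 | 5 |
| 3 | 11.2 | 0.28 | 0.56 | 1.9 | 0.075 | 17.0 | 60.0 | 0.998 | 3.16 | 0.58 | 9.8 | 6 |
| 4 | 7.4 | 0.7 | 0.0 | 1.9 | 0.076 | 11.0 | 34.0 | 0.9978 | 3.51 | 0.56 | 9.4 | 5 |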


Now we predict the quality of the wine.

**Warning: this might fail at first because the prediction server has not finished starting up; in that case, wait a minute and try again.**
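
Instead of waiting a fixed minute, one option is to poll the port until it accepts connections. The sketch below uses only the standard library; `wait_for_port` is a hypothetical helper, and an open port does not guarantee that the model has finished loading, so a first request may still need a retry.

```python
import socket
import time


def wait_for_port(host="127.0.0.1", port=5005, timeout=60.0, interval=1.0):
    """Poll until a TCP connection to host:port succeeds or timeout elapses.

    Returns True once a connection is accepted, False if the deadline passes.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Connection refused or timed out: the server is not up yet.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

Calling `wait_for_port()` in a cell before the `curl` requests would block until the server on port 5005 is reachable, or give up after 60 seconds.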

In [16]:
%%bash
data='[[7.4,0.7,0,1.9,0.076,11], [7.4,0.7,0,1.9,0.077,10]]'
echo $data

curl -d "{\"inputs\": $data}" -H 'Content-Type: application/json' 127.0.0.1:5005/invocations

[[7.4,0.7,0,1.9,0.076,11], [7.4,0.7,0,1.9,0.077,10]]


  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100    87  100    23  100    64   7504  20880 --:--:-- --:--:-- --:--:-- 29000


{"predictions": [5, 5]}

In [17]:
%%bash
data='[[7.4,0.7,0,1.9,0.076,11], [0.0,0.1,1,1.9,0.077,10]]'
echo $data

curl -d "{\"instances\": $data}" -H 'Content-Type: application/json' 127.0.0.1:5005/invocations

[[7.4,0.7,0,1.9,0.076,11], [0.0,0.1,1,1.9,0.077,10]]


  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100    90  100    23  100    67  10962  31935 --:--:-- --:--:-- --:--:-- 45000


{"predictions": [5, 6]}

In [18]:
%%bash
data='[[0.4,0.7,1,0.5,0.076,11], [0.0,0.1,0,1.9,0.000,13]]'
columns='["fixed acidity", "volatile acidity", "citric acid", "residual sugar","chlorides", "free sulfur dioxide"]'
echo $data

curl -d "{\"dataframe_split\":{\"columns\":$columns,\"data\": $data}}" -H 'Content-Type: application/json' 127.0.0.1:5005/invocations

[[0.4,0.7,1,0.5,0.076,11], [0.0,0.1,0,1.9,0.000,13]]


  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   221  100    23  100   198   6104  52547 --:--:-- --:--:-- --:--:-- 73666


{"predictions": [4, 7]}
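
The three `curl` cells differ only in how the JSON body is shaped: `inputs`, `instances`, or `dataframe_split` with explicit column names. As a rough Python equivalent using only the standard library, the sketch below builds those bodies and posts them. The names `build_payload` and `predict` are hypothetical, and the URL assumes the server started earlier on port 5005.

```python
import json
import urllib.request


def build_payload(rows, columns=None, key="inputs"):
    """Build a JSON body for the /invocations endpoint.

    With `columns`, the rows are wrapped as `dataframe_split`; otherwise they
    are sent under `key` ("inputs" or "instances"), as in the curl cells.
    """
    if columns is not None:
        body = {"dataframe_split": {"columns": list(columns), "data": rows}}
    else:
        body = {key: rows}
    return json.dumps(body)


def predict(payload, url="http://127.0.0.1:5005/invocations"):
    """POST the payload and return the decoded JSON response."""
    request = urllib.request.Request(
        url,
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, `predict(build_payload([[7.4, 0.7, 0, 1.9, 0.076, 11]]))` sends the same kind of request as the first `curl` cell.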

## Hurrah! Now we can see the predictions of our model :-)