# Ryan tries to deploy this thing
No staging button in model registry for my single model. Cannot seem to deploy...

So uh, this is a problem. The GUI says I have a registered model. However, both here in the notebook and when I go to deploy, it says there is no registered model.

Let's try again.

In [24]:
import pandas as pd
import lakefs_client
from lakefs_client.client import LakeFSClient
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
from mlflow.models import infer_signature
import mlflow
from mlflow import MlflowClient
from pprint import pprint

In [3]:
# lakeFS credentials and endpoint
configuration = lakefs_client.Configuration()
configuration.username = ''
configuration.password = ''
configuration.host = '' # Yeah, so you do need to open a hole in the security group,
                         # even if the two EC2 instances are in the same SG.
                         # These are private IPs, if you are wondering. Hopefully this
                         # traffic does not go over the public internet...
client = LakeFSClient(configuration)

In [4]:
# This grabs the data.csv file out of the main branch of the countries repository in lakeFS 
file = client.objects.get_object('countries','main','data.csv')
df = pd.read_csv(file)

In [5]:
features = df[["Longitude", "Latitude"]].to_numpy()
target = df[["CID"]].to_numpy()

In [6]:
clf = make_pipeline(StandardScaler(), LinearSVC(dual="auto", random_state=0, tol=1e-5))

In [7]:
clf.fit(features, target.ravel())

In [10]:
X_train, X_test, y_train, y_test = train_test_split(features, target.ravel(), test_size=0.2, random_state=42)

In [17]:
params = {"pipeline":True, "scaler": "standard", "dual": "auto", "random_state": 0, "tol":1e-5}

In [12]:
y_pred = clf.predict(X_test)

In [13]:
accuracy = accuracy_score(y_test, y_pred)

accuracy

0.07402298850574712

Well that's still garbage, but better than SGD. Let's see if we can't deploy this puppy.
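To put 0.074 in context, one sanity check is to compare it against always guessing the most common country. Here is a minimal sketch of that baseline. The labels are made up for illustration (they are not the real `CID` column), and `majority_baseline_accuracy` is a hypothetical helper, not something from scikit-learn.

```python
from collections import Counter


def majority_baseline_accuracy(y_true):
    """Accuracy obtained by always predicting the most frequent label."""
    if len(y_true) == 0:
        raise ValueError("y_true must not be empty")
    # Count of the single most common label in y_true
    most_common_count = Counter(y_true).most_common(1)[0][1]
    return most_common_count / len(y_true)


# Made-up labels, only for illustration
labels = ["US", "US", "CA", "MX", "US", "FR"]
print(majority_baseline_accuracy(labels))  # prints 0.5
```

On the real data, `majority_baseline_accuracy(y_test.tolist())` would give the number to beat. Also worth keeping in mind: the pipeline above was fit on the full dataset before the split, so the test rows were part of the training data and 0.074 is, if anything, optimistic.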

In [16]:
# Infer the model signature
signature = infer_signature(X_train, clf.predict(X_train))

In [21]:
mlflow.set_tracking_uri(uri="http://127.0.0.1:8080")

In [33]:
# Start an MLflow run
with mlflow.start_run() as run:
    # Log the hyperparameters
    mlflow.log_params(params)

    # Log the accuracy metric
    mlflow.log_metric("accuracy", accuracy)

    # Set a tag that we can use to remind ourselves what this run was for
    mlflow.set_tag("Training Info", "Country prediction take 2")

    # Log the model
    model_info = mlflow.sklearn.log_model(
        sk_model=clf,
        artifact_path="countries_model",
        signature=signature,
        input_example=X_train[:10],
        registered_model_name="tracking-countries",
    )


Registered model 'tracking-countries' already exists. Creating a new version of this model...
2023/12/03 03:21:00 INFO mlflow.store.model_registry.abstract_store: Waiting up to 300 seconds for model version to finish creation. Model name: tracking-countries, version 2
Created version '2' of model 'tracking-countries'.


Yep, that's not right. I reset the instance to get `mlflow models serve` working, and I don't think anything persisted after that. This new run isn't showing in the GUI either. Perhaps I should hook the tracking server up to RDS?

So `mlflow server` needs to be run with sudo...

In [25]:
# Use a separate name so the lakeFS `client` from earlier is not overwritten
mlflow_client = MlflowClient()
for rm in mlflow_client.search_registered_models():
    pprint(dict(rm), indent=4)


{   'aliases': {},
    'creation_timestamp': 1701570256668,
    'description': '',
    'last_updated_timestamp': 1701570311957,
    'latest_versions': [   <ModelVersion: aliases=[], creation_timestamp=1701570256679, current_stage='Production', description='', last_updated_timestamp=1701570311957, name='tracking-countries', run_id='a2cdf1d1e13744929fec496d15068a73', run_link='', source='mlflow-artifacts:/0/a2cdf1d1e13744929fec496d15068a73/artifacts/countries_model', status='READY', status_message='', tags={}, user_id='', version='1'>],
    'name': 'tracking-countries',
    'tags': {}}
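The `latest_versions` field only shows the newest version per stage, so it is easy to misread what the registry actually holds. A small sketch for listing every version of one model follows. `summarize_versions` is a hypothetical helper; it only relies on `search_model_versions` and the `version`, `current_stage`, `status` and `run_id` attributes that appear in the output above.

```python
def summarize_versions(client, model_name):
    """Return sorted (version, stage, status, run_id) tuples for a registered model.

    `client` is expected to be an mlflow.MlflowClient, or anything that
    exposes the same search_model_versions(filter_string) method.
    """
    versions = client.search_model_versions(f"name='{model_name}'")
    rows = [(int(v.version), v.current_stage, v.status, v.run_id) for v in versions]
    return sorted(rows)


# Usage against the tracking server from this notebook (assumed to be running):
#   from mlflow import MlflowClient
#   registry = MlflowClient(tracking_uri="http://127.0.0.1:8080")
#   for row in summarize_versions(registry, "tracking-countries"):
#       print(row)
```

Passing `tracking_uri` explicitly avoids any doubt about which server the client is talking to, which matters when the GUI and the notebook seem to disagree.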


In [34]:
mlflow.end_run()

In [35]:
model_uri = "runs:/{}/countries_model".format(run.info.run_id)  # must match artifact_path from log_model

In [36]:
model_uri

'runs:/f150aeec44c74b3697cc9d18033802a0/countries_model'
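A quick note on the two URI forms MLflow understands for this model. The sketch below only builds the strings. `run_model_uri` and `registry_model_uri` are hypothetical helper names, and the run ID is the one printed above. The last segment of a `runs:/` URI has to match the `artifact_path` given to `log_model` (`countries_model` in this notebook).

```python
def run_model_uri(run_id, artifact_path):
    """URI of a model logged inside a run: runs:/<run_id>/<artifact_path>."""
    return f"runs:/{run_id}/{artifact_path.strip('/')}"


def registry_model_uri(name, version_or_stage):
    """URI of a registered model: models:/<name>/<version or stage>."""
    return f"models:/{name}/{version_or_stage}"


print(run_model_uri("f150aeec44c74b3697cc9d18033802a0", "countries_model"))
print(registry_model_uri("tracking-countries", "Production"))
print(registry_model_uri("tracking-countries", 2))
```

The object returned by `log_model` also carries a `model_uri` attribute, so `model_info.model_uri` is probably the least error-prone way to get the run-relative form.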

I tooled around in the console for hours trying to get the local prediction server up. After installing pyenv, I finally got it working by running:

`mlflow models serve -m 'models:/tracking-countries/Production' -h 0.0.0.0 --port 5002 --no-conda`

I was able to query it using 

```
curl -d '{"dataframe_split": {
"columns": ["lon","lat"],
"data": [[-98,36]]}}' \
-H 'Content-Type: application/json' -X POST localhost:5002/invocations
```

If you are seeing something about http or https, it cannot find your model.
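For querying from Python instead of curl, here is a rough equivalent using only the standard library. It assumes the server from above is listening on `localhost:5002`. `build_payload` and `predict` are hypothetical helper names, and the column names are copied from the curl call.

```python
import json
import urllib.request


def build_payload(columns, rows):
    """Serialize rows into the dataframe_split JSON body used in the curl call."""
    return json.dumps(
        {"dataframe_split": {"columns": list(columns), "data": [list(r) for r in rows]}}
    )


def predict(url, columns, rows, timeout=10):
    """POST the payload to the scoring endpoint and return the parsed JSON reply."""
    request = urllib.request.Request(
        url,
        data=build_payload(columns, rows).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Usage (needs the model server to be running):
#   print(predict("http://localhost:5002/invocations", ["lon", "lat"], [[-98, 36]]))
```

Keeping the payload builder separate makes it easy to check the JSON body without a server running.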