# MLOps with `vetiver`

## Build a model
Data scientists can still use the tools they are most comfortable with for the bulk of their workflow.

In [1]:
import pandas as pd
import numpy as np
from sklearn import model_selection, preprocessing, pipeline
from sklearn.ensemble import RandomForestRegressor
import rsconnect
import vetiver
from vetiver import vetiver_pin_write, vetiver_endpoint

import os
from dotenv import load_dotenv, find_dotenv
load_dotenv(find_dotenv())

api_key = os.getenv("API_KEY")
rsc_url = os.getenv("RSC_URL")
np.random.seed(500)
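
Because `os.getenv` returns `None` when a variable is missing, a typo in the `.env` file would only surface later as a confusing connection error. A minimal sketch of failing early is shown below; `require_env` and the `DEMO_API_KEY` variable are hypothetical names introduced only for this illustration.

```python
import os


def require_env(name: str) -> str:
    """Return an environment variable's value, or raise if it is unset or empty."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(
            f"Environment variable {name!r} is not set; check your .env file."
        )
    return value


# Demonstration with a throwaway variable set only for this example.
os.environ["DEMO_API_KEY"] = "not-a-real-key"
demo_key = require_env("DEMO_API_KEY")
```

In the cell above, the same helper could wrap the lookups of `API_KEY` and `RSC_URL`.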

We can read in our data, and fit a pipeline that has both the preprocessing steps and the model.

In [2]:
# Super Bowl ad data from TidyTuesday (2021-03-02); read_csv already returns a DataFrame
df = pd.read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2021/2021-03-02/youtube.csv')

In [3]:
# Keep the outcome and six ad-content features; drop rows with missing values
df = df[["like_count", "funny", "show_product_quickly", "patriotic",
    "celebrity", "danger", "animals"]].dropna()
X, y = df.iloc[:, 1:], df['like_count']
X_train, X_test, y_train, y_test = model_selection.train_test_split(X, y, test_size=0.2)

# Encode the features as ordinal integers, then fit the forest on the encoded training data
le = preprocessing.OrdinalEncoder().fit(X)
rf = RandomForestRegressor().fit(le.transform(X_train), y_train)

In [4]:
rf_pipe = pipeline.Pipeline([('label_encoder',le), ('random_forest', rf)])
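
The cells above fit the encoder and the forest separately and then assemble them into a pipeline. A possible alternative is to build the pipeline first and fit it in a single call, so the preprocessing is learned from the training rows only. The sketch below uses synthetic data so it runs without a network connection; every `toy_*` name and the reduced column set are assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OrdinalEncoder

# Synthetic stand-in for the ad data: two boolean features and a numeric outcome.
rng = np.random.default_rng(500)
n_rows = 200
toy_X = pd.DataFrame({
    "funny": rng.choice([True, False], size=n_rows),
    "animals": rng.choice([True, False], size=n_rows),
})
toy_y = pd.Series(rng.integers(0, 10_000, size=n_rows), name="like_count")

toy_X_train, toy_X_test, toy_y_train, toy_y_test = train_test_split(
    toy_X, toy_y, test_size=0.2, random_state=500
)

# Unfitted steps go into the pipeline; .fit() then fits them in order.
toy_pipe = Pipeline([
    ("ordinal_encoder", OrdinalEncoder()),
    ("random_forest", RandomForestRegressor(n_estimators=20, random_state=500)),
])
toy_pipe.fit(toy_X_train, toy_y_train)

# The fitted pipeline accepts raw (unencoded) rows directly.
toy_preds = toy_pipe.predict(toy_X_test)
```

Either way, the resulting object is a single estimator that takes raw feature rows, which is what the deployment steps rely on.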

## Version a model

Users first create a deployable model object, `VetiverModel()`. This holds all the pieces necessary to deploy the model later.

*In R, you saw the equivalent, `vetiver_model()`.*

In [5]:
v = vetiver.VetiverModel(
    rf_pipe, 
    ptype_data=X_train, 
    model_name = "isabel.zimmerman/superbowl_rf"
)

In [6]:
import pins 
board = pins.board_rsconnect(api_key=api_key, server_url=rsc_url, allow_pickle_read=True)

vetiver_pin_write(board, v)

Writing pin:
Name: 'isabel.zimmerman/superbowl_rf'
Version: 20220809T143202Z-fd402
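
The version printed above looks like a UTC timestamp followed by a short hash prefix. Assuming that layout holds (it is inferred from this one output, not guaranteed for every board type), a small sketch for splitting it into its parts could look like this; `split_pin_version` is a hypothetical helper, not part of `pins` or `vetiver`.

```python
from datetime import datetime, timezone


def split_pin_version(version: str):
    """Split a 'YYYYMMDDTHHMMSSZ-hash' version string into (datetime, hash prefix)."""
    stamp, _, hash_prefix = version.partition("-")
    created = datetime.strptime(stamp, "%Y%m%dT%H%M%SZ").replace(tzinfo=timezone.utc)
    return created, hash_prefix


created, hash_prefix = split_pin_version("20220809T143202Z-fd402")
print(created.isoformat(), hash_prefix)
```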


## Deploy a model
Next, initialize the API endpoint with `VetiverAPI()`. To run the API locally, use `.run()`.

*In R, you saw the equivalents, `vetiver_api()` and `pr_run()`.*

In [13]:
app = vetiver.VetiverAPI(v, check_ptype=True)
app.run()
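
With `check_ptype=True`, incoming data is checked against the input prototype stored in the model object. As a rough illustration of the idea only (this is a hand-rolled stand-in, not vetiver's implementation), the sketch below compares one incoming record with an expected set of column names and types. `EXPECTED_COLUMNS`, `validate_record`, and the assumption that every feature is a boolean are hypothetical.

```python
# Assumed prototype: the six feature columns, each expected to be a boolean.
EXPECTED_COLUMNS = {
    "funny": bool,
    "show_product_quickly": bool,
    "patriotic": bool,
    "celebrity": bool,
    "danger": bool,
    "animals": bool,
}


def validate_record(record, expected=EXPECTED_COLUMNS):
    """Return a list of human-readable problems; an empty list means the record fits."""
    problems = []
    for name, expected_type in expected.items():
        if name not in record:
            problems.append(f"missing column: {name}")
        elif not isinstance(record[name], expected_type):
            problems.append(
                f"wrong type for {name}: expected {expected_type.__name__}, "
                f"got {type(record[name]).__name__}"
            )
    for name in record:
        if name not in expected:
            problems.append(f"unexpected column: {name}")
    return problems


good_record = {name: False for name in EXPECTED_COLUMNS}
bad_record = {"funny": "yes", "celebrity": True, "extra": 1}

good_problems = validate_record(good_record)
bad_problems = validate_record(bad_record)
```

Rejecting malformed requests at the boundary like this keeps bad input from reaching the model and producing misleading predictions.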

Running locally is a good way to debug the API, but the end goal is *not* to serve the model from a personal machine. Instead, we can deploy to a remote server such as RStudio Connect. This involves setting up a connection to the server and deploying our pinned model to it.

In [8]:
connect_server = rsconnect.api.RSConnectServer(url = rsc_url, api_key = api_key)

We can deploy our model, which is strongly linked to the version we just pinned above. Note: this model is already deployed, so there is no need to run this chunk again unless we want to update the model.

In [None]:
#| eval: false

vetiver.deploy_rsconnect(
    connect_server = connect_server, 
    board = board, 
    pin_name = "isabel.zimmerman/superbowl_rf", 
    version = "59869")

With the model deployed, we can interact with the API endpoint as if it were a model in memory.

In [12]:
connect_endpoint = vetiver_endpoint("https://colorado.rstudio.com/rsc/ads/predict")

response = vetiver.predict(data = X_test.head(5), endpoint = connect_endpoint)
response

|   | prediction |
|---|---|
| 0 | 452.581548 |
| 1 | 15054.536775 |
| 2 | 8830.437135 |
| 3 | 9872.934486 |
| 4 | 181.150403 |
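
A prediction request is ultimately an HTTP call that carries the rows to score. As a rough sketch, assuming the endpoint accepts a JSON array with one object per row (an assumption about the wire format, not something stated above), a few rows of a DataFrame could be serialized as follows. The `sample` frame and its two columns are made up for the example.

```python
import json

import pandas as pd

# Hypothetical two-row, two-column slice standing in for X_test.head().
sample = pd.DataFrame({"funny": [True, False], "animals": [False, True]})

# orient="records" yields one JSON object per row, keyed by column name.
payload = sample.to_json(orient="records")

# Round-trip through the standard library to inspect the structure.
records = json.loads(payload)
print(records)
```

Seeing the payload in this form can help when debugging a request with a generic HTTP client rather than the helper used above.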


Vetiver also helps make deployment easier for other cloud providers by offering functions to automatically write `app.py` files and Dockerfiles.

In [None]:
# write an app.py file that can be deployed within Docker or on another cloud provider
vetiver.write_app(board, "isabel.zimmerman/superbowl_rf", version = "59869")

# write Dockerfile
vetiver.write_docker()