# MLOps with `vetiver`

## Build a model
Data scientists can still use the tools they are most comfortable with for the bulk of their workflow.

![](images/ml_ops_cycle.png)

In [1]:
import pandas as pd
import numpy as np
from sklearn import model_selection, preprocessing, metrics
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor

import vetiver

np.random.seed(500)

In [2]:
raw = pd.read_csv('data.csv').drop(columns = 'Unnamed: 0')
raw


Unnamed: 0,ridership,Austin,Quincy_Wells,Belmont,Archer_35th,Oak_Park,Western,Clark_Lake,Clinton,Merchandise_Mart,...,Blackhawks_Away,Blackhawks_Home,Bulls_Away,Bulls_Home,Bears_Away,Bears_Home,WhiteSox_Away,WhiteSox_Home,Cubs_Away,Cubs_Home
0,15.732,1.463,8.371,4.599,2.009,1.421,3.319,15.561,2.403,6.481,...,0,0,0,0,0,0,0,0,0,0
1,15.762,1.505,8.351,4.725,2.088,1.429,3.344,15.720,2.402,6.477,...,0,0,0,1,0,0,0,0,0,0
2,15.872,1.519,8.359,4.684,2.108,1.488,3.363,15.558,2.367,6.405,...,0,0,1,0,0,0,0,0,0,0
3,15.874,1.490,7.852,4.769,2.166,1.445,3.359,15.745,2.415,6.489,...,0,0,0,0,0,0,0,0,0,0
4,15.423,1.496,7.621,4.720,2.058,1.415,3.271,15.602,2.416,5.798,...,0,0,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4995,19.546,2.305,8.105,5.953,3.302,2.033,5.076,20.632,4.193,7.107,...,0,0,0,0,0,0,1,1,1,1
4996,6.888,1.164,2.073,3.437,1.715,0.865,2.417,6.775,1.685,2.027,...,0,0,0,0,0,0,1,1,1,1
4997,5.701,0.808,1.729,2.605,1.136,0.573,1.840,5.744,1.588,1.242,...,0,0,0,0,0,1,1,1,1,1
4998,20.330,2.318,8.573,5.789,3.347,1.997,5.206,20.039,3.999,7.692,...,0,0,0,0,0,0,0,0,0,0


After reading in the data, we hold out 20% of the rows as a test set and fit a random forest regressor on the rest.

In [3]:
X_train, X_test, y_train, y_test = model_selection.train_test_split(
    raw.drop(columns = ['ridership']),
    raw['ridership'],
    test_size=0.2
)

In [4]:
rf = RandomForestRegressor().fit(X_train, y_train)

## Version a model

Users first create a deployable model object, `VetiverModel()`. This holds all the pieces necessary to deploy the model later.

*In R, you saw the equivalent, `vetiver_model()`.*

In [5]:
v = vetiver.VetiverModel(
    rf, 
    ptype_data=X_train, 
    model_name = "chicago_ridership"
)
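
To make the role of `ptype_data` more concrete, here is a rough, standalone sketch of what an input prototype is for: it records the expected column names and example values so that incoming data can be checked before it reaches the model. This is not vetiver's implementation; `make_prototype` and `check_against_prototype` are hypothetical helpers written only for illustration, using a few values from the table above.

```python
# Conceptual sketch only: vetiver's real prototype handling is more involved.

def make_prototype(records):
    """Use the first record as the prototype: column names plus example values."""
    return dict(records[0])


def check_against_prototype(record, prototype):
    """Return a list of problems; an empty list means the record conforms."""
    problems = []
    missing = [name for name in prototype if name not in record]
    extra = [name for name in record if name not in prototype]
    if missing:
        problems.append(f"missing columns: {missing}")
    if extra:
        problems.append(f"unexpected columns: {extra}")
    for name, example in prototype.items():
        # Only numeric columns are checked in this sketch.
        if name in record and isinstance(example, (int, float)):
            value = record[name]
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                problems.append(
                    f"{name}: expected a number, got {type(value).__name__}"
                )
    return problems


# Three columns from the first two rows of the ridership table.
training_rows = [
    {"Austin": 1.463, "Clark_Lake": 15.561, "Bulls_Home": 0},
    {"Austin": 1.505, "Clark_Lake": 15.720, "Bulls_Home": 1},
]
prototype = make_prototype(training_rows)

good = {"Austin": 1.5, "Clark_Lake": 15.7, "Bulls_Home": 1}
bad = {"Austin": "high", "Clark_Lake": 15.7}

print(check_against_prototype(good, prototype))
print(check_against_prototype(bad, prototype))
```

The first call reports no problems, while the second flags the missing `Bulls_Home` column and the non-numeric `Austin` value. Checking inputs this way at the API boundary turns malformed requests into clear errors instead of confusing model failures.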

In [7]:
import pins 
board = pins.board_folder(path = ".", allow_pickle_read=True)

vetiver.vetiver_pin_write(board, v)

Model Cards provide a framework for transparent, responsible reporting. 
 Use the vetiver `.qmd` Quarto template as a place to start, 
 with vetiver.model_card()
Writing pin:
Name: 'chicago_ridership'
Version: 20221003T123443Z-a28ef
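
The version string in the output above looks like a UTC timestamp followed by a short hash. Assuming that layout holds (it is inferred from the output here, not taken from a specification), a small helper such as the hypothetical `parse_pin_version` below can recover when a version was written:

```python
from datetime import datetime, timezone


def parse_pin_version(version):
    """Split a version string into (creation time in UTC, hash prefix).

    Assumes the layout '<YYYYMMDD>T<HHMMSS>Z-<hash>' seen in the output above.
    """
    stamp, _, digest = version.partition("-")
    created = datetime.strptime(stamp, "%Y%m%dT%H%M%SZ").replace(tzinfo=timezone.utc)
    return created, digest


created, digest = parse_pin_version("20221003T123443Z-a28ef")
print(created.isoformat(), digest)
```

Because the timestamp comes first, sorting version strings alphabetically also sorts them chronologically, which makes it easy to pick out the latest version by eye.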


In [None]:
board.pin_meta("chicago_ridership").user
#vetiver.model_card()

{'ptype': '{"Austin": 0.604, "Quincy_Wells": 1.662, "Belmont": 1.936, "Archer_35th": 0.949, "Oak_Park": 0.458, "Western": 1.04, "Clark_Lake": 3.665, "Clinton": 0.961, "Merchandise_Mart": 0.529, "Irving_Park": 1.567, "Washington_Wells": 0.827, "Harlem": 0.933, "Monroe": 1.25, "Polk": 0.669, "Ashland": 0.534, "Kedzie": 1.112, "Addison": 1.162, "Jefferson_Park": 2.441, "Montrose": 0.603, "California": 0.421, "temp_min": 55.0, "temp": 63.05, "temp_max": 79.0, "temp_change": 24.0, "dew": 40.45, "humidity": 46.5, "pressure": 30.15, "pressure_change": 0.109999999999999, "wind": 9.8, "wind_max": 17.3, "gust": 0.0, "gust_max": 27.6, "percip": 0.0, "percip_max": 0.0, "weather_rain": 0.0, "weather_snow": 0.0, "weather_cloud": 0.692307692307692, "weather_storm": 0.115384615384615, "Blackhawks_Away": 0.0, "Blackhawks_Home": 0.0, "Bulls_Away": 0.0, "Bulls_Home": 0.0, "Bears_Away": 0.0, "Bears_Home": 0.0, "WhiteSox_Away": 1.0, "WhiteSox_Home": 1.0, "Cubs_Away": 1.0, "Cubs_Home": 1.0}',
 'required_pkg

## Deploy a model
Next, initialize the API endpoint with `VetiverAPI()`. To run the API locally, use `.run()`.

*In R, you saw the equivalents, `vetiver_api()` and `pr_run()`.*

In [None]:
app = vetiver.VetiverAPI(v, check_ptype=True)
app.run()

INFO:     Started server process [18627]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://127.0.0.1:8000 (Press CTRL+C to quit)


INFO:     127.0.0.1:63789 - "GET / HTTP/1.1" 307 Temporary Redirect
INFO:     127.0.0.1:63789 - "GET /__docs__ HTTP/1.1" 200 OK
INFO:     127.0.0.1:63789 - "GET /openapi.json HTTP/1.1" 200 OK
INFO:     127.0.0.1:63810 - "POST /predict HTTP/1.1" 200 OK
INFO:     127.0.0.1:49282 - "POST /predict HTTP/1.1" 200 OK


INFO:     Shutting down
INFO:     Waiting for application shutdown.
INFO:     Application shutdown complete.
INFO:     Finished server process [18627]


Running the API locally is a great way to debug it, but the end goal is *not* to run the model on a personal machine. We can instead deploy to a remote server, such as RStudio Connect. This involves setting up a connection to the server and then deploying our pinned model to it.

We can deploy our model, which is strongly linked to the version we just pinned above. Note: this model is already deployed, so no need to run this chunk again, unless we want to update our model.

In [None]:
# import rsconnect
#
# rsc_url and api_key (the Connect server URL and API key) must be defined first
# connect_server = rsconnect.api.RSConnectServer(url = rsc_url, api_key = api_key)

# vetiver.deploy_rsconnect(
#     connect_server = connect_server, 
#     board = board, 
#     pin_name = "chicago_ridership", 
#     version = "20221003T123443Z-a28ef")

Vetiver also helps make deployment easier for other cloud providers by offering functions to automatically write `app.py` files and Dockerfiles.

In [None]:
# write app to be deployed within docker, or to other cloud provider
vetiver.write_app(board, "chicago_ridership", version = "20221002T211623Z-7bad9")
vetiver.write_docker()

With the model deployed, we can interact with the API endpoint as if it were a model in memory.

In [None]:
docker_endpoint = vetiver.vetiver_endpoint("http://0.0.0.0:8080/predict")

response = vetiver.predict(data = X_test, endpoint = docker_endpoint)
response

Unnamed: 0,prediction
0,452.581548
1,15054.536775
2,8830.437135
3,9872.934486
4,181.150403
5,176.770317
6,417.763305
7,32092.437069
8,2791.377251
9,15054.536775
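
Under the hood, a prediction call like the one above is an HTTP `POST` to the `/predict` route. The sketch below shows roughly what such a request could look like using only the standard library. It assumes the endpoint accepts a JSON array of records, one object per row; that payload shape is an assumption here, and `build_predict_request` and `send` are hypothetical helpers, not part of vetiver.

```python
import json
import urllib.request


def build_predict_request(records, url):
    """Build (but do not send) a POST request carrying rows as JSON records."""
    body = json.dumps(records).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=10):
    """Send the request and decode the JSON response. Needs a running server."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# A real request must include every column the model was trained on;
# only two are shown here to keep the example short.
request = build_predict_request(
    [{"Austin": 1.463, "Quincy_Wells": 8.371}],
    url="http://0.0.0.0:8080/predict",
)
print(request.get_method(), request.full_url)
```

Calling `send(request)` only works while the API is running. Because the interface is plain HTTP and JSON, any language or tool that can make a `POST` request can consume the model.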

## Monitor a model

Once the model is deployed, we can track its performance over time by computing metrics on new data at a regular interval.

In [8]:
from sklearn import metrics
from datetime import timedelta

metric_set = [metrics.mean_absolute_error, metrics.mean_squared_error]
td = timedelta(weeks = 1)

In [9]:
monitored_data = pd.read_csv("monitored_data.csv").drop(columns = 'Unnamed: 0')
monitored_data.head(3)

Unnamed: 0,ridership,Austin,Quincy_Wells,Belmont,Archer_35th,Oak_Park,Western,Clark_Lake,Clinton,Merchandise_Mart,...,Bulls_Away,Bulls_Home,Bears_Away,Bears_Home,WhiteSox_Away,WhiteSox_Home,Cubs_Away,Cubs_Home,date,preds
0,20.588,2.484,8.415,5.743,3.419,2.151,5.322,20.547,3.839,8.155,...,0,0,0,0,0,0,0,0,2014-10-01,20.168709
1,20.561,2.45,8.489,5.865,3.472,2.146,5.394,20.832,3.956,7.89,...,0,0,0,0,0,0,0,0,2014-10-02,20.675898
2,20.26,2.443,7.818,6.042,3.349,2.066,5.105,20.354,3.913,7.247,...,0,0,0,0,0,0,0,0,2014-10-03,19.970072


In [10]:
m = vetiver.compute_metrics(data = monitored_data, 
                    date_var="date", 
                    period = td, 
                    metric_set=metric_set, 
                    truth="ridership", 
                    estimate="preds")
m

Unnamed: 0,index,n,metric,estimate
0,2014-10-01,7,mean_absolute_error,0.708226
1,2014-10-01,7,mean_squared_error,0.716034
2,2014-10-08,7,mean_absolute_error,1.464859
3,2014-10-08,7,mean_squared_error,3.385447
4,2014-10-15,7,mean_absolute_error,0.607319
...,...,...,...,...
195,2016-08-10,7,mean_squared_error,1.105440
196,2016-08-17,7,mean_absolute_error,0.889162
197,2016-08-17,7,mean_squared_error,0.888310
198,2016-08-24,5,mean_absolute_error,1.095029
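
To show what a table like the one above represents, here is a rough standard-library sketch of windowed metrics: rows are grouped into fixed-length periods starting from the earliest date, and each metric is computed once per period. This is an illustration of the idea, not vetiver's code. `windowed_metrics` is a hypothetical helper, the two metric functions are simple stand-ins for the scikit-learn ones, and the data are toy values chosen so the arithmetic is easy to check.

```python
from datetime import date, timedelta


def mean_absolute_error(truth, estimate):
    return sum(abs(t - e) for t, e in zip(truth, estimate)) / len(truth)


def mean_squared_error(truth, estimate):
    return sum((t - e) ** 2 for t, e in zip(truth, estimate)) / len(truth)


def windowed_metrics(rows, period, metric_set):
    """rows: (date, truth, estimate) tuples. Returns one dict per window and metric."""
    rows = sorted(rows, key=lambda row: row[0])
    start = rows[0][0]
    windows = {}
    for day, truth, estimate in rows:
        k = (day - start) // period  # index of the window this row falls into
        windows.setdefault(k, []).append((truth, estimate))
    results = []
    for k in sorted(windows):
        truths, estimates = zip(*windows[k])
        for metric in metric_set:
            results.append({
                "index": start + k * period,  # first day of the window
                "n": len(truths),
                "metric": metric.__name__,
                "estimate": metric(truths, estimates),
            })
    return results


# Toy data: nine days, off by +1 in the first week and by -2 afterwards.
rows = [
    (date(2014, 10, 1) + timedelta(days=i), 10.0, 11.0 if i < 7 else 8.0)
    for i in range(9)
]
results = windowed_metrics(
    rows, timedelta(weeks=1), [mean_absolute_error, mean_squared_error]
)
for row in results:
    print(row)
```

With these toy values the first window holds 7 rows and the second holds the remaining 2, which mirrors how the last window in the table above has a smaller `n` than the others.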


In [11]:
monitor_plot = vetiver.plot_metrics(m)
monitor_plot.update_yaxes(matches=None)
monitor_plot.show()

In [None]:
monitor_plot.write_html("images/monitor.html")