# MLflow 

The sktime custom model flavor enables logging of sktime models in MLflow format via the `sktime.utils.mlflow_sktime.save_model()` and `sktime.utils.mlflow_sktime.log_model()` functions. These functions also add the `pyfunc` flavor to the MLflow Models they produce, so that a model can be loaded as a generic Python function for inference via `sktime.utils.mlflow_sktime.pyfunc.load_model()`. A model loaded this way can only be scored with a DataFrame input. To load an MLflow Model with the sktime flavor as a native sktime object instead, use `sktime.utils.mlflow_sktime.load_model()`.

The `pyfunc` flavor of the model supports the sktime predict methods `predict`, `predict_interval`, `predict_quantiles` and `predict_var`.

To generate forecasts from a sktime model loaded as `pyfunc`, pass the exogenous regressors as a pandas DataFrame to the `pyfunc.predict()` method (pass an empty DataFrame if no exogenous regressors are used). Which predict methods are called, and with which parameter values, is configured by a dictionary saved as the `pyfunc_predict_conf` attribute of the fitted sktime model instance (see the code example below).

Signature logging from a non-pyfunc sktime artifact does not work correctly for `predict_interval` or `predict_quantiles`. With the native sktime flavor these methods return a DataFrame with MultiIndex columns, which is not a recognized signature type. MLflow's `infer_schema` does work correctly with the `pyfunc` flavor of the model.

## 1. Setup
### 1.1 Config

In [41]:
model_path = "model"

### 1.2 Imports

In [47]:
import mlflow

from sktime.datasets import load_longley
from sktime.forecasting.model_selection import temporal_train_test_split
from sktime.forecasting.naive import NaiveForecaster
from sktime.utils import mlflow_sktime

### 1.3 Load sample data

In [13]:
y, X = load_longley()
y_train, y_test, X_train, X_test = temporal_train_test_split(y, X)

## 2. Example usage of the native `sktime` flavor and the `pyfunc` flavor

### 2.1 Create prediction config for pyfunc flavor
- The pyfunc prediction config can be defined in two ways:
    - `Dict[str, dict]` if parameter values are to be passed to the arguments of the predict methods, for example `{"predict_method": {"predict": None, "predict_interval": {"coverage": [0.1, 0.9]}}}`
    - `Dict[str, list]` if the predict methods should use their default parameters, for example `{"predict_method": ["predict", "predict_interval"]}`

In [36]:
pyfunc_predict_conf = {
    "predict_method": {
        "predict": None,
        "predict_interval": {"coverage": [0.1, 0.9]},
        "predict_quantiles": {"alpha": [0.1, 0.9]},
        "predict_var": {"cov": False},
    }
}
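
Because the config accepts two shapes, any code that consumes it has to treat both uniformly. The sketch below shows one way to normalise either form into a mapping of method name to keyword arguments. `normalize_predict_conf` is a hypothetical helper written for illustration, not part of sktime, and reading `None` as "no arguments" is an assumption based on the example above.

```python
def normalize_predict_conf(conf):
    """Return {method_name: kwargs} for either supported config shape.

    Hypothetical helper for illustration only; not part of sktime.
    """
    methods = conf["predict_method"]
    if isinstance(methods, dict):
        # Dict form: None is assumed to mean "call with default arguments".
        return {name: dict(kwargs or {}) for name, kwargs in methods.items()}
    # List form: every listed method is called with its default arguments.
    return {name: {} for name in methods}


dict_conf = {
    "predict_method": {
        "predict": None,
        "predict_interval": {"coverage": [0.1, 0.9]},
    }
}
list_conf = {"predict_method": ["predict", "predict_interval"]}

print(normalize_predict_conf(dict_conf))
print(normalize_predict_conf(list_conf))
```

After normalisation, both forms can be iterated the same way, for example to call `getattr(model, name)(**kwargs)` for each entry.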

### 2.2 Train and save model

In [37]:
with mlflow.start_run():

    forecaster = NaiveForecaster()
    forecaster.fit(
        y_train,
        X=X_train,
        fh=[1, 2, 3],
    )
    forecaster.pyfunc_predict_conf = pyfunc_predict_conf

    mlflow_sktime.save_model(forecaster, model_path)

### 2.3 Load model

#### 2.3.1 Native sktime flavor

In [38]:
loaded_model = mlflow_sktime.load_model(model_path)

#### 2.3.2 Pyfunc flavor

In [39]:
loaded_pyfunc = mlflow_sktime.pyfunc.load_model(model_path)

Could not find sktime flavor configuration during model loading process. Assuming 'pickle' serialization format.


### 2.4 Generate predictions

#### 2.4.1 Native sktime flavor

In [40]:
print(loaded_model.predict(X=X_test))
print("\n", loaded_model.predict_interval(X=X_test))
print("\n", loaded_model.predict_quantiles(X=X_test))
print("\n", loaded_model.predict_var(X=X_test))

1959    66513.0
1960    66513.0
1961    66513.0
Freq: A-DEC, dtype: float64

           Coverage              
               0.9              
             lower         upper
1959  64211.598663  68814.401337
1960  63258.327017  69767.672983
1961  62526.855956  70499.144044

          Quantiles              
              0.05          0.95
1959  64211.598663  68814.401337
1960  63258.327017  69767.672983
1961  62526.855956  70499.144044

                  0
1959  1.957628e+06
1960  3.915256e+06
1961  5.872885e+06
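
In the output above, the 0.9 coverage interval and the 0.05/0.95 quantiles are identical. A symmetric prediction interval with coverage `c` corresponds to the quantile pair `0.5 - c/2` and `0.5 + c/2`. The sketch below checks this against the printed 1959 values. The normal predictive distribution is an assumption that happens to match the printed numbers, not something this notebook states, and `coverage_to_alphas` is a hypothetical helper name.

```python
from math import sqrt
from statistics import NormalDist


def coverage_to_alphas(coverage):
    """Map a symmetric interval coverage to its (lower, upper) quantile levels."""
    return 0.5 - coverage / 2, 0.5 + coverage / 2


# Values copied from the 1959 row of the printed output above.
point_forecast = 66513.0
variance = 1.957628e06  # predict_var, rounded as printed

lower_alpha, upper_alpha = coverage_to_alphas(0.9)

# Assumption: the predictive distribution is normal around the point forecast.
dist = NormalDist(mu=point_forecast, sigma=sqrt(variance))
lower, upper = dist.inv_cdf(lower_alpha), dist.inv_cdf(upper_alpha)

print(round(lower_alpha, 2), round(upper_alpha, 2))
print(round(lower, 1), round(upper, 1))
```

Under that assumption, the computed bounds should land within rounding distance of the `64211.598663` and `68814.401337` values printed for 1959.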


#### 2.4.2 Pyfunc flavor

In [42]:
loaded_pyfunc.predict(X_test)

| | `predict__0` | `predict_interval__Coverage__0.1__lower` | `predict_interval__Coverage__0.1__upper` | `predict_interval__Coverage__0.9__lower` | `predict_interval__Coverage__0.9__upper` | `predict_quantiles__Quantiles__0.1` | `predict_quantiles__Quantiles__0.9` | `predict_var__0` |
|---|---|---|---|---|---|---|---|---|
| 1959 | 66513.0 | 66337.180592 | 66688.819408 | 64211.598663 | 68814.401337 | 64719.913711 | 68306.086289 | 1957628.0 |
| 1960 | 66513.0 | 66264.353808 | 66761.646192 | 63258.327017 | 69767.672983 | 63977.193051 | 69048.806949 | 3915256.0 |
| 1961 | 66513.0 | 66208.471852 | 66817.528148 | 62526.855956 | 70499.144044 | 63407.283445 | 69618.716555 | 5872885.0 |
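
The column names above combine the predict method name with each level of that method's output columns, which for `predict_interval` and `predict_quantiles` are MultiIndex tuples. A minimal sketch of that naming scheme follows. `flatten_columns` is a hypothetical helper inferred from the printed names, not the actual sktime implementation.

```python
def flatten_columns(method, columns):
    """Join a method name and (possibly multi-level) column keys with "__"."""
    flat = []
    for col in columns:
        # MultiIndex column keys are tuples; plain column keys are scalars.
        levels = col if isinstance(col, tuple) else (col,)
        flat.append("__".join([method, *map(str, levels)]))
    return flat


interval_cols = [
    ("Coverage", 0.1, "lower"),
    ("Coverage", 0.1, "upper"),
    ("Coverage", 0.9, "lower"),
    ("Coverage", 0.9, "upper"),
]
quantile_cols = [("Quantiles", 0.1), ("Quantiles", 0.9)]

print(flatten_columns("predict", [0]))
print(flatten_columns("predict_interval", interval_cols))
print(flatten_columns("predict_quantiles", quantile_cols))
```

Flat string column names like these are what make the pyfunc output compatible with schema inference, in contrast to the MultiIndex columns of the native flavor.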


## 3. Model deployment example

### 3.1 Create experiment

In [56]:
artifact_path = "model"

mlflow.set_experiment("Test Sktime")

with mlflow.start_run() as run:

    forecaster = NaiveForecaster()
    forecaster.fit(y_train, X=X_train, fh=[1, 2, 3])
    forecaster.pyfunc_predict_conf = pyfunc_predict_conf

    mlflow_sktime.log_model(sktime_model=forecaster, artifact_path=artifact_path)

run_id = run.info.run_id
print(f"MLflow run id: {run_id}")

MLflow run id: 1fe2a80b9842443aa4041a447a21d734


### 3.2 Deploy pyfunc model to local REST API endpoint
- Open a terminal window and `cd` into the `examples` directory
- In the terminal run: `mlflow models serve -m runs:/<RUN_ID>/model --env-manager local --host <HOST>`
    - where you substitute `<RUN_ID>` by the `run_id` and `<HOST>` by the network address to listen on (e.g. `127.0.0.1`) 
- More details here: https://www.mlflow.org/docs/latest/cli.html#mlflow-models-serve
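
To avoid pasting the run id by hand, the serve command can be assembled in Python from values already defined in this notebook. This is a sketch using only the flags shown above; the run id is the one printed in section 3.1, and `build_serve_command` is a hypothetical helper.

```python
import shlex


def build_serve_command(run_id, host, artifact_path="model"):
    """Assemble the `mlflow models serve` command as an argument list."""
    return [
        "mlflow", "models", "serve",
        "-m", f"runs:/{run_id}/{artifact_path}",
        "--env-manager", "local",
        "--host", host,
    ]


command = build_serve_command("1fe2a80b9842443aa4041a447a21d734", "127.0.0.1")

# Print a shell-ready string to copy into the terminal.
print(shlex.join(command))
```

The same list could be passed to `subprocess.run(command)`, keeping in mind that the call blocks for as long as the server is running.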

### 3.3 Request predictions from local REST API endpoint

- For more details see: https://www.mlflow.org/docs/latest/models.html#built-in-deployment-tools

#### 3.3.1 JSON input using `dataframe_split` field with pandas DataFrame in the `split` orientation

In [49]:
host = "127.0.0.1"
url = f"http://{host}:5000/invocations"

X_test = X_test.reset_index(drop=True)
json_data = {"dataframe_split": X_test.to_dict(orient="split")}
print(json_data)

# Uncomment the lines below to run the prediction request
# import requests
# response = requests.post(url, json=json_data)
# response.json()

{'dataframe_split': {'index': [0, 1, 2, 3], 'columns': ['GNPDEFL', 'GNP', 'UNEMP', 'ARMED', 'POP'], 'data': [[112.6, 482704.0, 3813.0, 2552.0, 123366.0], [114.2, 502601.0, 3931.0, 2514.0, 125368.0], [115.7, 518173.0, 4806.0, 2572.0, 127852.0], [116.9, 554894.0, 4007.0, 2827.0, 130081.0]]}}
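
If the `requests` package is not installed, the same POST request can be prepared with the standard library. The sketch below only builds the request object and does not send it, so it runs without a server. The payload is a shortened copy of the printed output above, and actually sending it assumes the endpoint from section 3.2 is up.

```python
import json
from urllib.request import Request

url = "http://127.0.0.1:5000/invocations"

# First two rows of the payload printed above, in `split` orientation.
payload = {
    "dataframe_split": {
        "index": [0, 1],
        "columns": ["GNPDEFL", "GNP", "UNEMP", "ARMED", "POP"],
        "data": [
            [112.6, 482704.0, 3813.0, 2552.0, 123366.0],
            [114.2, 502601.0, 3931.0, 2514.0, 125368.0],
        ],
    }
}

request = Request(
    url,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)

print(request.get_method(), request.full_url)
# To send it once the server is running:
# from urllib.request import urlopen
# with urlopen(request) as response:
#     print(json.load(response))
```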


#### 3.3.2 JSON input using `dataframe_records` field with pandas DataFrame in the `records` orientation

In [52]:
json_data = {"dataframe_records": X_test.to_dict(orient="records")}
print(json_data)

# Uncomment the lines below to run the prediction request
# response = requests.post(url, json=json_data)
# response.json()

{'dataframe_records': [{'GNPDEFL': 112.6, 'GNP': 482704.0, 'UNEMP': 3813.0, 'ARMED': 2552.0, 'POP': 123366.0}, {'GNPDEFL': 114.2, 'GNP': 502601.0, 'UNEMP': 3931.0, 'ARMED': 2514.0, 'POP': 125368.0}, {'GNPDEFL': 115.7, 'GNP': 518173.0, 'UNEMP': 4806.0, 'ARMED': 2572.0, 'POP': 127852.0}, {'GNPDEFL': 116.9, 'GNP': 554894.0, 'UNEMP': 4007.0, 'ARMED': 2827.0, 'POP': 130081.0}]}
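
The two JSON layouts carry the same data: `split` stores the column names once and each row as a list, while `records` repeats the column names in every row. A small sketch of converting one into the other in plain Python, using the first two rows printed above; `split_to_records` is a hypothetical helper.

```python
def split_to_records(split):
    """Convert a `split`-oriented dict into a list of `records`-style dicts."""
    columns = split["columns"]
    return [dict(zip(columns, row)) for row in split["data"]]


split = {
    "index": [0, 1],
    "columns": ["GNPDEFL", "GNP", "UNEMP", "ARMED", "POP"],
    "data": [
        [112.6, 482704.0, 3813.0, 2552.0, 123366.0],
        [114.2, 502601.0, 3931.0, 2514.0, 125368.0],
    ],
}

records = split_to_records(split)
print({"dataframe_records": records})
```

Note that the `index` entry of the `split` layout has no counterpart in `records`, which is why the notebook resets the index before building either payload.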


#### 3.3.3 CSV input using valid `pd.DataFrame` csv representation

In [55]:
headers = {
    "Content-Type": "text/csv",
}
data = X_test.to_csv()
print(data)

# Uncomment the lines below to run the prediction request
# response = requests.post(url, headers=headers, data=data)
# response.json()

,GNPDEFL,GNP,UNEMP,ARMED,POP
0,112.6,482704.0,3813.0,2552.0,123366.0
1,114.2,502601.0,3931.0,2514.0,125368.0
2,115.7,518173.0,4806.0,2572.0,127852.0
3,116.9,554894.0,4007.0,2827.0,130081.0

