# Get started with Metrics Tracking

This notebook demonstrates how to use MLflow to:
- Log metrics, params, and artifacts to MLflow.
- Log, register, and load models using a local MLflow Tracking Server.
- Interact with the MLflow Tracking Server using the MLflow fluent API.
- Perform inference on Pandas DataFrames by loading models as generic Python Functions (pyfunc).

In [None]:
%load_ext autoreload
%autoreload 2

import joblib
import mlflow
from mlflow.models import infer_signature
import mlflow.sklearn
from mlflow.tracking import MlflowClient
import pandas as pd
from pathlib import Path
from sklearn import ensemble, model_selection
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, mean_absolute_error

# Train model and calculate metrics

## Load Data

More information about the dataset is available in the UCI Machine Learning Repository: https://archive.ics.uci.edu/ml/datasets/bike+sharing+dataset

Acknowledgement: Fanaee-T, Hadi, and Gama, Joao, 'Event labeling combining ensemble detectors and background knowledge', Progress in Artificial Intelligence (2013): pp. 1-15, Springer Berlin Heidelberg

In [None]:
# Download original dataset with: python src/pipelines/load_data.py 

raw_data = pd.read_csv("../data/raw_data.csv")
raw_data.head()

## Prepare data

In [None]:
target = 'cnt'
prediction = 'prediction'
numerical_features = ['temp', 'atemp', 'hum', 'windspeed', 'mnth', 'hr', 'weekday']
categorical_features = ['season', 'holiday', 'workingday']

In [None]:
sample_data = raw_data.set_index('dteday').loc['2011-01-01 00:00:00':'2011-01-28 23:00:00'].reset_index()

X_train, X_test, y_train, y_test = model_selection.train_test_split(
    sample_data[numerical_features + categorical_features],
    sample_data[target],
    test_size=0.3
)

print(X_train.shape)
print(X_test.shape)
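
A shuffled split with `test_size=0.3` holds out roughly 30% of the rows at random, so the exact rows in each part change between runs unless the shuffle is seeded (in scikit-learn, by passing `random_state` to `train_test_split`). The following is a minimal pure-Python sketch of that idea, not scikit-learn's implementation; `shuffle_split` and its rounding rule are assumptions made for illustration.

```python
import math
import random


def shuffle_split(rows, test_size=0.3, seed=0):
    """Shuffle row indices with a fixed seed, then cut off the test share."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)  # same seed -> same order on every run
    n_test = math.ceil(test_size * len(rows))  # rounding rule assumed for this sketch
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    return [rows[i] for i in train_idx], [rows[i] for i in test_idx]


rows = list(range(10))
train_a, test_a = shuffle_split(rows, test_size=0.3, seed=0)
train_b, test_b = shuffle_split(rows, test_size=0.3, seed=0)

print(len(train_a), len(test_a))
print(train_a == train_b and test_a == test_b)
```

With a fixed seed, the two calls return identical splits, which makes metrics comparable across reruns of the notebook.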

## Train a Model

In [None]:
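model = ensemble.RandomForestRegressor(random_state=0, n_estimators=50)
model.fit(X_train, y_train)

# Save the fitted model to disk, creating the target directory if needed
model_path = Path('../models/model.joblib')
model_path.parent.mkdir(parents=True, exist_ok=True)
joblib.dump(model, model_path)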

In [None]:
model

## Calculate Metrics

In [None]:
preds = model.predict(X_test)

# Note: `me` holds the mean squared error (MSE), not the mean error
me = mean_squared_error(y_test, preds)
mae = mean_absolute_error(y_test, preds)

print(me, mae)
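
To make the two metrics concrete, here is a small worked example in plain Python. The numbers are made up for illustration and are not taken from the bike sharing data; the helper names are hypothetical and only mirror what `mean_squared_error` and `mean_absolute_error` compute with their default arguments.

```python
import math


def mean_squared(actual, predicted):
    """Average of the squared differences between actual and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def mean_absolute(actual, predicted):
    """Average of the absolute differences between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Illustrative values only
actual = [10, 20, 30, 40]
predicted = [12, 18, 33, 35]

# Errors are -2, 2, -3, 5: squares sum to 42, absolute values sum to 12
mse_example = mean_squared(actual, predicted)   # 42 / 4 = 10.5
mae_example = mean_absolute(actual, predicted)  # 12 / 4 = 3.0
rmse_example = math.sqrt(mse_example)           # back in the units of the target

print(mse_example, mae_example, round(rmse_example, 3))
```

Because MSE squares each error, the single error of 5 contributes more than half of the total, while MAE weights all errors linearly.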

# Metrics Tracking with MLflow

## Set up MLflow

In [None]:
MLFLOW_TRACKING_URI = "http://localhost:5000"

mlflow.set_tracking_uri(MLFLOW_TRACKING_URI)
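
The logging cells that follow assume a tracking server is already listening at this URI (for example, one started locally with `mlflow server --host 127.0.0.1 --port 5000`). As a hedged convenience, the sketch below checks whether anything accepts TCP connections at the URI's host and port. `tracking_server_reachable` is a hypothetical helper, not part of MLflow, and an open port does not prove that the listener is an MLflow server.

```python
import socket
from urllib.parse import urlparse


def tracking_server_reachable(uri, timeout=1.0):
    """Return True if a TCP connection to the URI's host and port succeeds."""
    parsed = urlparse(uri)
    # Fall back to the scheme's default port when the URI does not name one
    port = parsed.port or (443 if parsed.scheme == "https" else 80)
    try:
        with socket.create_connection((parsed.hostname, port), timeout=timeout):
            return True
    except OSError:
        return False


print(tracking_server_reachable("http://localhost:5000"))
```

If this prints `False`, start the server before running the logging cells.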


## Log params, metrics and artifacts

In [None]:
with mlflow.start_run() as run: 

    # Log params 
    mlflow.log_param('model', 'RandomForest') 
    mlflow.log_params({'random_state': 0, 'n_estimators': 50})

    # Log metrics (`me` holds the mean squared error, so log it as 'mse')
    mlflow.log_metric('mse', round(me, 3))
    mlflow.log_metric('mae', round(mae, 3))

    # Log the raw dataset and the serialized model file as run artifacts
    mlflow.log_artifact("../data/raw_data.csv")
    mlflow.log_artifact("../models/model.joblib")

    # Set a tag that we can use to remind ourselves what this run was for
    mlflow.set_tag("random-forest", "Random Forest Regressor")

    # Infer the model signature
    signature = infer_signature(X_train, model.predict(X_train))

    # Log the model
    model_info = mlflow.sklearn.log_model(
        sk_model=model,
        artifact_path="rf_model",
        signature=signature,
        input_example=X_train.head(),  # a few rows are enough for an input example
        registered_model_name="1-get-started-random-forest",
    )
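
Once the run has finished, its logged values can be read back through the client API. The sketch below wraps `MlflowClient.get_run`, which returns a run whose `data` attribute carries `params`, `metrics`, and `tags`. `summarize_run` is a hypothetical helper, and `FakeClient` with its made-up values is only a stand-in so the example runs without a tracking server.

```python
from types import SimpleNamespace


def summarize_run(client, run_id):
    """Fetch one run and return its params, metrics, and tags as plain dicts."""
    run = client.get_run(run_id)
    return {
        "params": dict(run.data.params),
        "metrics": dict(run.data.metrics),
        "tags": dict(run.data.tags),
    }


class FakeClient:
    """Stand-in for MlflowClient with made-up values (illustration only)."""

    def get_run(self, run_id):
        data = SimpleNamespace(
            params={"model": "RandomForest", "n_estimators": "50"},
            metrics={"mae": 1.234},
            tags={},
        )
        return SimpleNamespace(data=data)


summary = summarize_run(FakeClient(), "hypothetical-run-id")
print(summary["params"]["n_estimators"])
```

In this notebook the real call would be `summarize_run(MlflowClient(), run.info.run_id)`. Note that MLflow stores params as strings, so `n_estimators` comes back as `"50"` rather than `50`.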

## Load our saved model as a Python Function

Although we can load the model back in its native scikit-learn format with `mlflow.sklearn.load_model()`, here we load it as a generic Python Function (`pyfunc`), which is how the model would be loaded for online serving. The `pyfunc` representation also works for batch use cases, as shown below.

In [None]:
model_info.model_uri
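
The exact value of `model_info.model_uri` depends on the MLflow version. Two URI schemes commonly accepted by `load_model` are `runs:/<run_id>/<artifact_path>` for a model logged under a run and `models:/<name>/<version>` for a registered model version. The helpers below are hypothetical and only format these strings; version `1` is an assumption, since each rerun of the logging cell registers a new version.

```python
def run_model_uri(run_id, artifact_path):
    """URI for a model logged under a run, e.g. runs:/<run_id>/rf_model."""
    return f"runs:/{run_id}/{artifact_path}"


def registered_model_uri(name, version):
    """URI for a specific version of a model in the Model Registry."""
    return f"models:/{name}/{version}"


# "<run_id>" is a placeholder; in this notebook it would be run.info.run_id
print(run_model_uri("<run_id>", "rf_model"))
print(registered_model_uri("1-get-started-random-forest", 1))
```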

In [None]:
loaded_model = mlflow.pyfunc.load_model(model_info.model_uri)

loaded_model

## Use `loaded_model` to get predictions for the `X_test` dataset

In [None]:
predictions = loaded_model.predict(X_test)

# X_test is already a pandas DataFrame; copy it so the original stays unchanged
result = X_test.copy()

# Add the actual target values (this is a regression task, so these are counts)
result["actual_cnt"] = y_test

# Add the model predictions to the DataFrame
result["predicted_cnt"] = predictions

result[:4]