# Fundamentals of Machine Learning - Exercise 12
The goal of this exercise is to learn how to save trained models and how to use selected advanced libraries such as Plotly and Optuna.


![meme01](https://github.com/rasvob/VSB-FEI-Fundamentals-of-Machine-Learning-Exercises/blob/master/images/fml_12_meme_01.png?raw=true)

In [None]:
# For Google Colab
!pip install optuna

In [None]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import plotly.express as px
import optuna
import joblib

import sklearn.datasets as skd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.metrics import make_scorer, accuracy_score

# 📊 Plotly
https://plotly.com/python/getting-started/

* The plotly Python library is an interactive, open-source plotting library that supports over 40 chart types covering a wide range of statistical, financial, geographic, and scientific use cases
* Built on top of the Plotly JavaScript library (plotly.js)
* Plotly enables Python users to create **interactive web-based visualizations** that can be displayed in Jupyter notebooks

## 📒 Here we have some examples of commonly used plots
* 💡 The Plotly Express API (`plotly.express`, imported here as `px`) is easy to grasp and very similar to Seaborn

## Scatter plot

In [None]:
df = px.data.iris()
df.head()

In [None]:
px.scatter(df, x="sepal_width", y="sepal_length", color="species", symbol="species")

## Line plot

In [None]:
df = px.data.gapminder().query("continent == 'Oceania'")
df.head()

In [None]:
px.line(df, x='year', y='lifeExp', color='country', markers=True)

## Bar plot

In [None]:
df = px.data.medals_long()
df.head()

In [None]:
px.bar(df, x="medal", y="count", color="nation", text="nation", barmode='group')

## Box plot

In [None]:
df = px.data.gapminder().query("continent == 'Oceania'")
df.head()

In [None]:
px.box(df, x='country', color="country", y="lifeExp")

## Heatmap

In [None]:
df = px.data.iris()
df.head()

In [None]:
df_corr = df.iloc[:, :-2].corr()
df_corr

In [None]:
fig = px.imshow(df_corr, text_auto=True, color_continuous_scale="blues", aspect="auto")
fig.update_xaxes(side="bottom")
fig.show()

## 📌 Parallel categories diagram
* How to read it?

In [None]:
df = pd.read_csv('https://raw.githubusercontent.com/rasvob/VSB-FEI-Fundamentals-of-Machine-Learning-Exercises/master/datasets/titanic.csv', index_col=0)
df.head()

In [None]:
px.parallel_categories(df, dimensions=['Embarked', 'Sex', 'Survived'], color="Survived", color_continuous_scale=px.colors.diverging.Spectral)
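One way to read the diagram: every ribbon connects one category per dimension, and its width is proportional to the number of rows with exactly that combination. The sketch below computes those counts with pandas on a tiny, made-up sample (the six rows are invented for illustration and only reuse the Titanic column names).

```python
import pandas as pd

# Made-up toy rows, NOT the real Titanic data
toy = pd.DataFrame({
    "Embarked": ["S", "S", "C", "S", "C", "Q"],
    "Sex": ["male", "female", "female", "male", "male", "female"],
    "Survived": [0, 1, 1, 0, 1, 1],
})

# One row per observed combination; "count" corresponds to the ribbon width
ribbon_counts = (
    toy.groupby(["Embarked", "Sex", "Survived"])
    .size()
    .reset_index(name="count")
)
print(ribbon_counts)
```

Running the same `groupby` on the real `df` should give the numbers behind each ribbon in the plot above.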

![meme02](https://github.com/rasvob/VSB-FEI-Fundamentals-of-Machine-Learning-Exercises/blob/master/images/fml_12_meme_02.jpg?raw=true)

# 🚀 Optuna
https://optuna.org/

* An open source hyperparameter optimization framework to automate hyperparameter search
* You can use it with any machine learning or deep learning framework
    * Scikit-learn, TF2, PyTorch, Keras, ...
 


## ⚡ Using Optuna is very simple
* You just need to define an `objective` function, which is called once for each trial
* Inside it, you define the parameter ranges through the `trial.suggest_XYZ` methods (e.g. `suggest_int`, `suggest_categorical`) and use the returned values as regular parameters
* After that, you can start tuning the parameters

In [None]:
X, y = skd.load_iris(return_X_y=True, as_frame=True)

In [None]:
X.head()

In [None]:
y.head()

In [None]:
def objective(trial, X, y):
    n_estimators = trial.suggest_int('n_estimators', 2, 20)
    max_depth = trial.suggest_int('max_depth', 1, 32)
    criterion = trial.suggest_categorical('criterion', ["gini", "entropy"])
    random_state = 13
    
    clf = RandomForestClassifier(n_estimators=n_estimators, max_depth=max_depth, criterion=criterion, random_state=random_state)

    acc_scorer = make_scorer(accuracy_score)
    cv_res = cross_val_score(clf, X, y, n_jobs=-1, cv=5, scoring=acc_scorer)

    return np.mean(cv_res)
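As a side note, scikit-learn also accepts the string `"accuracy"` for the `scoring` argument, which should behave the same as `make_scorer(accuracy_score)`. The sketch below checks this on a small stand-in forest; the variable names and the `n_estimators=10` setting are assumptions made for the illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, make_scorer
from sklearn.model_selection import cross_val_score

X_demo, y_demo = load_iris(return_X_y=True)
demo_clf = RandomForestClassifier(n_estimators=10, random_state=13)

# Explicit scorer object, as used in the objective function
scores_scorer = cross_val_score(
    demo_clf, X_demo, y_demo, cv=5, scoring=make_scorer(accuracy_score)
)

# Built-in scorer name
scores_string = cross_val_score(demo_clf, X_demo, y_demo, cv=5, scoring="accuracy")

print(np.allclose(scores_scorer, scores_string))
```

`make_scorer` becomes necessary mainly when the metric needs extra arguments or is a custom function.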

In [None]:
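# load_if_exists=True reuses the stored study, so re-running this cell does not fail with DuplicatedStudyError
study = optuna.create_study(direction='maximize', storage="sqlite:///db.sqlite3", study_name="Iris-RF-Tuning", load_if_exists=True)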
study.optimize(lambda trial: objective(trial, X, y), n_trials=100)

trial = study.best_trial

print('Accuracy: {}'.format(trial.value))
print("Best hyperparameters: {}".format(trial.params))

## 💡 Dashboard
* Logs are hard to read, so it is usually better to visualize the tuning process
* You have two options with `Optuna`
    * You can use the basic online tool https://optuna.github.io/optuna-dashboard/
    * You can run a local instance of https://github.com/optuna/optuna-dashboard for more advanced usage

![meme03](https://github.com/rasvob/VSB-FEI-Fundamentals-of-Machine-Learning-Exercises/blob/master/images/fml_12_meme_03.jpg?raw=true)

# ⚡ Model deployment
* How are ML/DL models used in production?
    * Do we train them from scratch every time?
* How would you deploy the model?

## Train the model on full data with the best parameter setup

In [None]:
params = study.best_trial.params
params

In [None]:
clf = RandomForestClassifier(**params, random_state=13)

In [None]:
clf.fit(X, y)

In [None]:
df_feat_imp = pd.DataFrame({'Feature': X.columns, 'Importance': clf.feature_importances_}).sort_values(by='Importance')
df_feat_imp

In [None]:
px.bar(df_feat_imp, y='Feature', x='Importance', orientation='h')

In [None]:
y_pred = clf.predict(X)
accuracy_score(y_true=y, y_pred=y_pred)
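Keep in mind that the accuracy above is measured on the same rows the model was trained on, so it tends to be optimistic. The sketch below contrasts it with a cross-validated estimate; the stand-in forest with `n_estimators=20` is an assumption for the illustration, not the tuned model.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import cross_val_score

X_demo, y_demo = load_iris(return_X_y=True)
demo_clf = RandomForestClassifier(n_estimators=20, random_state=13)

# Cross-validated accuracy: every fold is scored on rows the model did not see
cv_accuracy = cross_val_score(demo_clf, X_demo, y_demo, cv=5, scoring="accuracy").mean()

# Training accuracy: the model is scored on the very rows it was fitted on
demo_clf.fit(X_demo, y_demo)
train_accuracy = accuracy_score(y_demo, demo_clf.predict(X_demo))

print(f"train accuracy: {train_accuracy:.3f}, cross-validated accuracy: {cv_accuracy:.3f}")
```

The cross-validated number (here, the value Optuna reported for the best trial) is the more realistic estimate of how the model behaves on new data.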

# Save the model using `joblib`
* Alternatives to `joblib` include:
    * https://skops.readthedocs.io/en/stable/
    * https://onnx.ai/sklearn-onnx/

In [None]:
filename = 'rf_best.bin'
joblib.dump(clf, filename)

# 📈 Load the model from disk 

In [None]:
loaded_model = joblib.load(filename)

## Check if everything works fine 🙂

In [None]:
y_pred = loaded_model.predict(X)
accuracy_score(y_true=y, y_pred=y_pred)
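A stricter check than comparing accuracies is to verify that the restored model gives exactly the same predictions as the original one. The sketch below does a full save/load round trip in a temporary folder; the stand-in model and the file name `rf_demo.joblib` are assumptions for the illustration.

```python
import tempfile
from pathlib import Path

import joblib
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X_demo, y_demo = load_iris(return_X_y=True)
original = RandomForestClassifier(n_estimators=10, random_state=13).fit(X_demo, y_demo)

# Save to a temporary folder and load the model back
with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = Path(tmp_dir) / "rf_demo.joblib"
    joblib.dump(original, model_path)
    restored = joblib.load(model_path)

# The restored model should behave identically to the original
same_predictions = np.array_equal(original.predict(X_demo), restored.predict(X_demo))
same_importances = np.allclose(original.feature_importances_, restored.feature_importances_)
print(same_predictions, same_importances)
```

Note that `joblib` files are pickle-based: load them only from trusted sources, and preferably with the same scikit-learn version that was used for saving.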

In [None]:
df_feat_imp = pd.DataFrame({'Feature': X.columns, 'Importance': loaded_model.feature_importances_}).sort_values(by='Importance')
df_feat_imp

In [None]:
px.bar(df_feat_imp, y='Feature', x='Importance', orientation='h')
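In production, the loaded model typically receives new measurements rather than the training data. The sketch below shows one possible shape of such an inference step; `predict_species` is a hypothetical helper, and the model trained in memory stands in for the one returned by `joblib.load`.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

iris = load_iris(as_frame=True)
X_demo, y_demo = iris.data, iris.target

# Stand-in for a model loaded once at service start-up with joblib.load(...)
served_model = RandomForestClassifier(n_estimators=10, random_state=13).fit(X_demo, y_demo)

def predict_species(model, measurements):
    """Hypothetical helper: map four raw measurements to a species name."""
    # Reuse the training column names so the input matches what the model was fitted on
    row = pd.DataFrame([measurements], columns=X_demo.columns)
    class_id = int(model.predict(row)[0])
    return str(iris.target_names[class_id])

# Sepal length, sepal width, petal length, petal width (in cm)
label = predict_species(served_model, [5.1, 3.5, 1.4, 0.2])
print(label)
```

The key design point is that the model is loaded once and reused for every request; training happens offline and only the saved file is shipped.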

![meme04](https://github.com/rasvob/VSB-FEI-Fundamentals-of-Machine-Learning-Exercises/blob/master/images/thats_all.jpg?raw=true)