# <span style="font-weight:bold; font-size: 3rem; color:#1EB182;"><img src="../../images/icon102.png" width="38px"></img> **Hopsworks Feature Store** </span><span style="font-weight:bold; font-size: 3rem; color:#333;">- Part 04: Model Training</span>

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/logicalclocks/hopsworks-tutorials/blob/master/advanced_tutorials/{project_name}/{notebook_name}.ipynb)


## 🗒️ This notebook is divided into the following sections:

1. Load the training data
2. Train the model
3. Register the model in the Hopsworks Model Registry

![part3](../../images/03_model.png) 

## <span style="color:#ff5f27;"> 🔮 Connecting to Hopsworks Feature Store </span>

In [None]:
import hopsworks

project = hopsworks.login()

fs = project.get_feature_store() 
mr = project.get_model_registry()

## <span style="color:#ff5f27;"> ✨ Load Training Data </span>

First, we fetch the training dataset that was created in the previous notebook.

To do so, retrieve the **Feature View** from the Feature Store with the `FeatureStore.get_feature_view()` method.

Then use the **Feature View** to retrieve the **training dataset**, already split into train and test sets, with the `FeatureView.get_train_test_split()` method.

In [None]:
feature_view = fs.get_feature_view(
    name = "credit_scores",
    version = 1
)

In [None]:
X_train, X_test, y_train, y_test = feature_view.get_train_test_split(
    training_dataset_version = 1
)
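
A quick sanity check on the returned split can catch misaligned rows or mismatched columns before training. The sketch below assumes the four objects are pandas DataFrames with 0/1 labels, which is worth confirming for your Hopsworks version, and it runs on small synthetic stand-ins rather than the real data. `describe_split` is a hypothetical helper, not part of the Hopsworks API.

```python
import pandas as pd


def describe_split(X_train, X_test, y_train, y_test):
    """Return basic facts about a train/test split (hypothetical helper)."""
    # Features and labels must be row-aligned within each split
    assert len(X_train) == len(y_train), "train features/labels differ in length"
    assert len(X_test) == len(y_test), "test features/labels differ in length"
    # The model can only score the columns it saw during training
    assert list(X_train.columns) == list(X_test.columns), "feature columns differ"

    total = len(X_train) + len(X_test)
    return {
        "n_train": len(X_train),
        "n_test": len(X_test),
        "test_fraction": len(X_test) / total,
        # Share of positive labels in the training split (labels assumed to be 0/1)
        "train_positive_rate": float(y_train.iloc[:, 0].mean()),
    }


# Synthetic stand-ins for the objects returned by get_train_test_split()
X_train_demo = pd.DataFrame({"feature_a": [0.1, 0.4, 0.3, 0.9], "feature_b": [1, 0, 1, 1]})
y_train_demo = pd.DataFrame({"label": [0, 0, 0, 1]})
X_test_demo = pd.DataFrame({"feature_a": [0.2], "feature_b": [0]})
y_test_demo = pd.DataFrame({"label": [1]})

summary = describe_split(X_train_demo, X_test_demo, y_train_demo, y_test_demo)
print(summary)
```

In the notebook, the same helper could be called as `describe_split(X_train, X_test, y_train, y_test)`.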

---
## <span style="color:#ff5f27;"> 🤖 Modeling</span>

### <span style="color:#ff5f27;">📝 Imports</span>

In [None]:
import matplotlib.pyplot as plt
import pandas as pd

from sklearn.ensemble import RandomForestClassifier

from sklearn.metrics import accuracy_score

### <span style="color:#ff5f27;"> 🧑🏻‍🔬 RandomForestClassifier</span>

In [None]:
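# Weight the positive class (label 1) more heavily than the negative class (label 0)
pos_class_weight = 0.9

model = RandomForestClassifier(
    n_estimators = 25,
    max_features = 'sqrt',
    class_weight = {0: 1.0 - pos_class_weight, 1: pos_class_weight},
    n_jobs = -1,          # use all available CPU cores
    random_state = 42     # fixed seed for reproducible results
)

# Flatten the label column to 1-D, which scikit-learn expects for a single target
model.fit(X_train, y_train.values.ravel())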

In [None]:
preds = model.predict(X_test)

accuracy_score(y_test, preds)
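
Accuracy alone can be misleading when one class is much rarer than the other, which the class weights used above suggest may be the case here. The sketch below uses hand-built synthetic labels (not the credit-score data) to show how a predictor that never flags the positive class can still reach high accuracy while its F1 score is zero.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix, f1_score

# Synthetic labels: 90 negatives followed by 10 positives (illustration only)
y_true = np.array([0] * 90 + [1] * 10)

# Baseline that always predicts the majority (negative) class
always_negative = np.zeros_like(y_true)

# A predictor that finds 6 of the 10 positives and raises 4 false alarms
some_positives = y_true.copy()
some_positives[:4] = 1     # 4 false positives
some_positives[90:94] = 0  # 4 missed positives

for name, y_pred in [("always negative", always_negative), ("some positives", some_positives)]:
    acc = accuracy_score(y_true, y_pred)
    # zero_division=0 avoids a warning when no positives are predicted
    f1 = f1_score(y_true, y_pred, zero_division=0)
    print(f"{name}: accuracy={acc:.2f}, f1={f1:.2f}")

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(y_true, some_positives)
print(cm)
```

In the notebook, the same metric functions can be applied to `y_test` and `preds`.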

## <span style='color:#ff5f27'>👮🏼‍♀️ Model Registry</span>

In [None]:
from hsml.schema import Schema
from hsml.model_schema import ModelSchema

input_schema = Schema(X_train)
output_schema = Schema(y_train)
model_schema = ModelSchema(input_schema=input_schema, output_schema=output_schema)

model_schema.to_dict()

In [None]:
import joblib

joblib.dump(model,'credit_scores_model.pkl')
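
Before uploading the file, it can be worth confirming that it loads back into a working estimator. The following is a minimal sketch on synthetic data that writes to a temporary directory; the data, the small forest, and the file name `demo_model.pkl` are placeholders rather than the notebook's actual model.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for the training data
X_demo, y_demo = make_classification(n_samples=200, n_features=5, random_state=42)
clf = RandomForestClassifier(n_estimators=10, random_state=42).fit(X_demo, y_demo)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "demo_model.pkl")
    joblib.dump(clf, path)          # serialize the fitted estimator
    restored = joblib.load(path)    # load it back from disk

# The restored model should reproduce the original predictions exactly
same_predictions = np.array_equal(clf.predict(X_demo), restored.predict(X_demo))
print(same_predictions)
```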

In [None]:
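from sklearn.metrics import f1_score

# Report the F1 score measured on the test split rather than a hard-coded value
f1 = f1_score(y_test, preds)

# Use a separate name so the fitted estimator in `model` is not overwritten
mr_model = mr.sklearn.create_model(
    name="credit_scores_model",
    metrics={"f1": f1},
    description="Random Forest Classifier for Credit Scores Project",
    input_example=X_train.sample(),
    model_schema=model_schema
)

# Upload the serialized model file to the Model Registry
mr_model.save('credit_scores_model.pkl')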

---

## <span style="color:#ff5f27;">⏭️ **Next:** Part 05 </span>

In the following notebook, we will retrieve the trained model from the Model Registry and use it to make predictions.