# <span style="font-weight:bold; font-size: 3rem; color:#1EB182;"><img src="../../images/icon102.png" width="38px"></img> **Hopsworks Feature Store** </span><span style="font-weight:bold; font-size: 3rem; color:#333;">- Part 04: Model Training</span>

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/logicalclocks/hopsworks-tutorials/blob/master/advanced_tutorials/credit_scores/4_model_training.ipynb)


## 🗒️ This notebook is divided into the following sections:

1. Load the training data
2. Train the model
3. Register the model in the Hopsworks Model Registry

![part3](../../images/03_model.png) 

## <span style="color:#ff5f27;"> 🔮 Connecting to Hopsworks Feature Store </span>

In [1]:
import hopsworks

project = hopsworks.login()

fs = project.get_feature_store() 
mr = project.get_model_registry()

Connected. Call `.close()` to terminate connection gracefully.

Logged in to project, explore it here https://c.app.hopsworks.ai:443/p/398
Connected. Call `.close()` to terminate connection gracefully.
Connected. Call `.close()` to terminate connection gracefully.


## <span style="color:#ff5f27;"> ✨ Load Training Data </span>

First, we need to fetch the training dataset that we created in the previous notebook.

To do that, we first retrieve the **Feature View** from the Feature Store with the `FeatureStore.get_feature_view()` method.

We then call the `FeatureView.get_train_test_split()` method on that **Feature View** to load the **training dataset** as train and test splits.

In [2]:
feature_view = fs.get_feature_view(
    name = "credit_scores",
    version = 1
)

In [3]:
X_train, X_test, y_train, y_test = feature_view.get_train_test_split(
    training_dataset_version = 1
)

In [4]:
X_train

Unnamed: 0,credit_active,credit_currency,days_credit,credit_day_overdue,days_credit_enddate,cnt_credit_prolong,amt_credit_sum,amt_credit_sum_debt,amt_credit_sum_overdue,credit_type,...,cnt_drawings_atm_current,cnt_drawings_current,cnt_drawings_other_current,cnt_drawings_pos_current,cnt_instalment_mature_cum,name_contract_status,sk_dpd,sk_dpd_def,sk_id_curr,previous_loan_counts
0,0,0,-0.354918,,0.043456,,0.030196,-0.411252,,1,...,-0.349365,-0.317357,,-0.297573,0.066469,1,4.726566,4.726566,1.384651,-1.514632
1,0,0,-0.354918,,0.043456,,0.030196,-0.411252,,1,...,-0.349365,-0.317357,,-0.297573,0.066469,1,4.726566,4.726566,1.384651,-1.514632
2,0,0,-0.102718,,0.483802,,0.177937,-0.152704,,1,...,0.601637,-0.261681,,-0.297573,0.066469,1,-0.211570,-0.211570,0.040524,-1.514632
3,0,0,-0.102718,,0.483802,,0.177937,-0.152704,,1,...,0.601637,-0.261681,,-0.297573,0.066469,1,-0.211570,-0.211570,0.040524,-1.514632
4,0,0,-0.102718,,0.483802,,0.177937,-0.152704,,1,...,0.601637,-0.261681,,-0.297573,0.066469,1,-0.211570,-0.211570,0.040524,-1.514632
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1092,1,0,1.160908,,-0.175138,,-0.696197,-0.606118,,0,...,1.552639,0.406428,,0.316242,-1.187386,1,-0.211570,-0.211570,1.680781,1.405493
1093,1,0,1.160908,,-0.175138,,-0.696197,-0.606118,,0,...,1.552639,0.406428,,0.316242,-1.187386,1,-0.211570,-0.211570,1.680781,1.405493
1094,1,0,1.160908,,-0.175138,,-0.696197,-0.606118,,0,...,1.552639,0.406428,,0.316242,-1.187386,1,-0.211570,-0.211570,1.680781,1.405493
1095,1,0,1.160908,,-0.175138,,-0.696197,-0.606118,,0,...,1.552639,0.406428,,0.316242,-1.187386,1,-0.211570,-0.211570,1.680781,1.405493


In [5]:
X_train.isna().sum().sum()

15358
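The total above does not show how the missing values are distributed across columns, and that matters for the next step. The sketch below uses a small, made-up frame (column names are borrowed from the output above, the values are purely illustrative) to show that a single column with no values at all is enough for a row-wise `dropna()` to discard every row, while a column-wise drop keeps them.

```python
import numpy as np
import pandas as pd

# Toy data, not the real training set: one column is missing everywhere.
toy = pd.DataFrame({
    "days_credit": [-0.35, -0.10, 1.16],
    "credit_day_overdue": [np.nan, np.nan, np.nan],
    "amt_credit_sum": [0.03, 0.18, -0.70],
})

# Missing values per column show where the gaps are concentrated.
nan_per_column = toy.isna().sum()
print(nan_per_column)

# Row-wise drop: every row contains a NaN, so no row survives.
rows_dropped = toy.dropna()
print(rows_dropped.shape)

# Column-wise drop: only the incomplete column is removed.
cols_dropped = toy.dropna(axis=1)
print(cols_dropped.shape)
```

On the real data, `X_train.isna().sum().sort_values(ascending=False).head()` would list the columns that contribute most to the count.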

In [6]:
# A row-wise dropna() would remove every row here, since each row has at least one NaN.
# Drop the incomplete columns instead, then keep the test set aligned with them.
X_train = X_train.dropna(axis=1)
X_test = X_test[X_train.columns].dropna()
y_test = y_test.loc[X_test.index]
X_train.isna().sum().sum()

0

---
## <span style="color:#ff5f27;"> 🤖 Modeling</span>

### <span style="color:#ff5f27;">📝 Imports</span>

In [7]:
import matplotlib.pyplot as plt
import pandas as pd

from sklearn.ensemble import RandomForestClassifier

from sklearn.metrics import accuracy_score

### <span style="color:#ff5f27;"> 🧑🏻‍🔬 RandomForestClassifier</span>

In [8]:
pos_class_weight = 0.9

model = RandomForestClassifier(
    n_estimators = 25,
    max_features = 'sqrt',
    class_weight = {0: 1.0 - pos_class_weight, 1: pos_class_weight},
    n_jobs = -1,
    random_state = 42
)

model.fit(X_train,y_train)

In [None]:
preds = model.predict(X_test)

accuracy_score(y_test, preds)
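Accuracy can look good on imbalanced labels even when the positive class is rarely detected. The class weights passed to the classifier hint that this may apply here, though that is an assumption about the data. As a complement, precision, recall and F1 can be derived from the confusion counts. The sketch below does this in plain Python on made-up labels so the arithmetic stays visible; `sklearn.metrics.f1_score` computes the same quantity for binary labels. The helper name `binary_f1` and the demo lists are hypothetical.

```python
def binary_f1(y_true, y_pred, positive=1):
    """Return (precision, recall, f1) for a single positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Illustrative labels only: 8 negatives and 2 positives.
y_true_demo = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred_demo = [0, 0, 0, 0, 0, 0, 0, 0, 0, 1]  # one positive is missed

accuracy_demo = sum(t == p for t, p in zip(y_true_demo, y_pred_demo)) / len(y_true_demo)
precision_demo, recall_demo, f1_demo = binary_f1(y_true_demo, y_pred_demo)

print(accuracy_demo, precision_demo, recall_demo, round(f1_demo, 3))
```

In this toy case the accuracy is high although half of the positives are missed, which is the gap that recall and F1 make visible.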

## <span style='color:#ff5f27'>👮🏼‍♀️ Model Registry</span>

In [None]:
from hsml.schema import Schema
from hsml.model_schema import ModelSchema

input_schema = Schema(X_train)
output_schema = Schema(y_train)
model_schema = ModelSchema(input_schema=input_schema, output_schema=output_schema)

model_schema.to_dict()

In [None]:
import joblib

joblib.dump(model,'credit_scores_model.pkl')
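Before registering the file, it can be worth confirming that it loads back into a working estimator, since the next notebook relies on exactly that. A minimal round-trip sketch follows; the tiny synthetic dataset, the temporary directory and the file name `demo_model.pkl` are stand-ins for the real model and path.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in data: 20 samples, 3 features, label driven by the first feature.
rng = np.random.default_rng(42)
X_demo = rng.normal(size=(20, 3))
y_demo = (X_demo[:, 0] > 0).astype(int)

clf = RandomForestClassifier(n_estimators=5, random_state=42).fit(X_demo, y_demo)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "demo_model.pkl")
    joblib.dump(clf, path)        # serialize the fitted estimator
    restored = joblib.load(path)  # read it back from disk

# The restored estimator should reproduce the original predictions.
same_predictions = bool((restored.predict(X_demo) == clf.predict(X_demo)).all())
print(same_predictions)
```

The same check could be run on `credit_scores_model.pkl` with a few rows of `X_test`.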

In [None]:
model = mr.sklearn.create_model(
    name="credit_scores_model",
    # Report the accuracy measured on the test split instead of a hard-coded placeholder.
    metrics={"accuracy": accuracy_score(y_test, preds)},
    description="Random Forest Classifier for Credit Scores Project",
    input_example=X_train.sample().to_numpy(),
    model_schema=model_schema
)

model.save('credit_scores_model.pkl')

---

## <span style="color:#ff5f27;">⏭️ **Next:** Part 05 </span>

In the next notebook, we will retrieve the trained model from the Model Registry and use it for prediction.