# <span style="font-weight:bold; font-size: 3rem; color:#1EB182;"><img src="../../images/icon102.png" width="38px"></img> **Hopsworks Feature Store** </span><span style="font-weight:bold; font-size: 3rem; color:#333;">- Part 04: Model Training</span>

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/logicalclocks/hopsworks-tutorials/blob/master/advanced_tutorials/credit_scores/4_model_training.ipynb)


## 🗒️ This notebook is divided into the following sections:

1. Load the training data
2. Train the model
3. Register the model in the Hopsworks Model Registry

![part3](../../images/03_model.png) 

## <span style="color:#ff5f27;"> 🔮 Connecting to Hopsworks Feature Store </span>

In [1]:
import hopsworks

project = hopsworks.login()

fs = project.get_feature_store() 
mr = project.get_model_registry()

Connected. Call `.close()` to terminate connection gracefully.

Logged in to project, explore it here https://c.app.hopsworks.ai:443/p/398


## <span style="color:#ff5f27;"> ✨ Load Training Data </span>

First, we fetch the training dataset that we created in the previous notebook.

To do so, we retrieve the **Feature View** from the Feature Store with the `FeatureStore.get_feature_view()` method.

We then use the **Feature View** to retrieve the **training dataset**, already split into train and test sets, with the `FeatureView.get_train_test_split()` method.

In [2]:
feature_view = fs.get_feature_view(
    name = "credit_scores",
    version = 1
)

In [32]:
X_train, X_test, y_train, y_test = feature_view.get_train_test_split(
    training_dataset_version = 1
)

In [33]:
X_train

| | credit_active | credit_currency | days_credit | credit_day_overdue | days_credit_enddate | cnt_credit_prolong | amt_credit_sum | amt_credit_sum_debt | amt_credit_sum_overdue | credit_type | ... | cnt_drawings_atm_current | cnt_drawings_current | cnt_drawings_other_current | cnt_drawings_pos_current | cnt_instalment_mature_cum | name_contract_status | sk_dpd | sk_dpd_def | sk_id_curr | previous_loan_counts |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | -0.392316 | NaN | -0.330261 | NaN | 0.073673 | 0.076460 | NaN | 4 | ... | 0.407915 | -0.218790 | NaN | -0.319551 | 0.460995 | 1 | NaN | NaN | 0.649401 | -0.898296 |
| 1 | 0 | 0 | -0.392316 | NaN | -0.330261 | NaN | 0.073673 | 0.076460 | NaN | 4 | ... | 0.407915 | -0.218790 | NaN | -0.319551 | 0.460995 | 1 | NaN | NaN | 0.649401 | -0.898296 |
| 2 | 0 | 0 | -0.392316 | NaN | -0.330261 | NaN | 0.073673 | 0.076460 | NaN | 4 | ... | 0.407915 | -0.218790 | NaN | -0.319551 | 0.460995 | 1 | NaN | NaN | 0.649401 | -0.898296 |
| 3 | 0 | 0 | -0.392316 | NaN | -0.330261 | NaN | 0.073673 | 0.076460 | NaN | 4 | ... | 0.407915 | -0.218790 | NaN | -0.319551 | 0.460995 | 1 | NaN | NaN | 0.649401 | -0.898296 |
| 4 | 0 | 0 | -0.392316 | NaN | -0.330261 | NaN | 0.073673 | 0.076460 | NaN | 4 | ... | 0.407915 | -0.218790 | NaN | -0.319551 | 0.460995 | 1 | NaN | NaN | 0.649401 | -0.898296 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 3350 | 1 | 0 | 1.321252 | NaN | -0.315577 | NaN | -0.251525 | -0.168686 | NaN | 1 | ... | -0.420006 | -0.424635 | NaN | -0.319551 | 0.077404 | 1 | NaN | NaN | -0.239804 | 0.513312 |
| 3351 | 1 | 0 | 1.321252 | NaN | -0.315577 | NaN | -0.251525 | -0.168686 | NaN | 1 | ... | -0.420006 | -0.424635 | NaN | -0.319551 | 0.077404 | 1 | NaN | NaN | -0.239804 | 0.513312 |
| 3352 | 1 | 0 | 1.321252 | NaN | -0.315577 | NaN | -0.251525 | -0.168686 | NaN | 1 | ... | -0.420006 | -0.424635 | NaN | -0.319551 | 0.077404 | 1 | NaN | NaN | -0.239804 | 0.513312 |
| 3353 | 1 | 0 | 1.321252 | NaN | -0.315577 | NaN | -0.251525 | -0.168686 | NaN | 1 | ... | -0.420006 | -0.424635 | NaN | -0.319551 | 0.077404 | 1 | NaN | NaN | -0.239804 | 0.513312 |


In [34]:
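def get_cols_with_missing_values(df):
    """Return the names of all columns in `df` that contain at least one missing value."""
    cols = []
    for item in df.columns:
        if df[item].isna().sum() > 0:
            cols.append(item)
    return cols

# Drop the columns with missing values from both splits so they keep the same features.
cols = get_cols_with_missing_values(X_train)
X_train = X_train.drop(columns=cols)
X_test = X_test.drop(columns=cols)

# Confirm that no missing values remain in the training features.
X_train.isna().sum().sum()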

0
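The column-dropping helper in this cell can also be written with a vectorized pandas idiom. The sketch below is only an illustration on a made-up toy frame (the `toy` data and its values are assumptions, not rows from the real training set); it uses `DataFrame.isna().any()` to build the list of columns to drop.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for X_train; the values are made up for illustration.
toy = pd.DataFrame({
    "days_credit": [-0.39, 1.32, 0.10],
    "credit_day_overdue": [np.nan, np.nan, np.nan],  # entirely missing
    "amt_credit_sum": [0.07, np.nan, -0.25],         # partially missing
    "credit_type": [4, 1, 1],
})

# Boolean mask over the columns: True where at least one value is missing.
has_missing = toy.isna().any()
cols_with_missing = toy.columns[has_missing].tolist()

# Drop those columns; the same list would be applied to the test split
# so that both splits keep identical feature columns.
toy_clean = toy.drop(columns=cols_with_missing)

print(cols_with_missing)
print(toy_clean.columns.tolist())
```

Computing the list once on the training split and reusing it for the test split keeps the two feature sets aligned, which is what the model expects at prediction time.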

---
## <span style="color:#ff5f27;"> 🤖 Modeling</span>

### <span style="color:#ff5f27;">📝 Imports</span>

In [38]:
import matplotlib.pyplot as plt
import pandas as pd

from sklearn.ensemble import RandomForestClassifier

from sklearn.metrics import accuracy_score

### <span style="color:#ff5f27;"> 🧑🏻‍🔬 RandomForestClassifier</span>

In [39]:
pos_class_weight = 0.9

model = RandomForestClassifier(
    n_estimators = 25,
    max_features = 'sqrt',
    class_weight = {0: 1.0 - pos_class_weight, 1: pos_class_weight},
    n_jobs = -1,
    random_state = 42
)

# Flatten the labels to a 1-D array, the shape scikit-learn expects for `y`.
model.fit(X_train, y_train.values.ravel())



In [40]:
preds = model.predict(X_test)

accuracy_score(y_test, preds)

1.0
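Accuracy alone can hide poor performance on a minority class, which is presumably why the classifier above is trained with class weights. The following sketch uses made-up toy labels (not the notebook's data) to show how F1 and a confusion matrix may tell a different story than accuracy; the same functions could be applied to `y_test` and `preds`.

```python
from sklearn.metrics import accuracy_score, confusion_matrix, f1_score

# Toy labels, made up for illustration: 9 negatives and 1 positive.
y_true = [0] * 9 + [1]
# A degenerate classifier that always predicts the majority class.
y_pred = [0] * 10

acc = accuracy_score(y_true, y_pred)
# zero_division=0 makes the score 0.0 (without a warning) when no positives are predicted.
f1 = f1_score(y_true, y_pred, zero_division=0)
# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_true, y_pred)

print(f"accuracy={acc:.2f}, f1={f1:.2f}")
print(cm)
```

Here the accuracy is high even though the single positive example is missed, while the F1 score for the positive class drops to zero.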

## <span style='color:#ff5f27'>👮🏼‍♀️ Model Registry</span>

In [41]:
from hsml.schema import Schema
from hsml.model_schema import ModelSchema

input_schema = Schema(X_train)
output_schema = Schema(y_train)
model_schema = ModelSchema(input_schema=input_schema, output_schema=output_schema)

model_schema.to_dict()

{'input_schema': {'columnar_schema': [{'name': 'credit_active',
    'type': 'int64'},
   {'name': 'credit_currency', 'type': 'int64'},
   {'name': 'days_credit', 'type': 'float64'},
   {'name': 'days_credit_enddate', 'type': 'float64'},
   {'name': 'amt_credit_sum', 'type': 'float64'},
   {'name': 'amt_credit_sum_debt', 'type': 'float64'},
   {'name': 'credit_type', 'type': 'int64'},
   {'name': 'days_credit_update', 'type': 'float64'},
   {'name': 'name_contract_type', 'type': 'int64'},
   {'name': 'code_gender', 'type': 'int64'},
   {'name': 'flag_own_car', 'type': 'int64'},
   {'name': 'flag_own_realty', 'type': 'int64'},
   {'name': 'cnt_children', 'type': 'float64'},
   {'name': 'amt_income_total', 'type': 'float64'},
   {'name': 'amt_annuity', 'type': 'float64'},
   {'name': 'amt_goods_price', 'type': 'float64'},
   {'name': 'name_type_suite', 'type': 'int64'},
   {'name': 'name_income_type', 'type': 'int64'},
   {'name': 'name_education_type', 'type': 'int64'},
   {'name': 'name
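
The columnar schema printed above pairs each feature name with its dtype. As a rough illustration of that structure, and not a description of how `hsml` builds it internally, a similar list can be derived from a DataFrame's `dtypes`. The `toy` frame below is an assumption that reuses two column names from the output.

```python
import pandas as pd

# Toy frame with two of the column names shown in the schema output;
# the values are made up for illustration.
toy = pd.DataFrame({
    "credit_active": pd.Series([0, 1], dtype="int64"),
    "days_credit": pd.Series([-0.392316, 1.321252], dtype="float64"),
})

# One {'name', 'type'} entry per column, mirroring the printed layout.
columnar = [
    {"name": name, "type": str(dtype)}
    for name, dtype in toy.dtypes.items()
]

print(columnar)
```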

In [42]:
import joblib

# Serialize the trained model to a local file so it can be uploaded to the Model Registry.
joblib.dump(model, 'credit_scores_model.pkl')

['credit_scores_model.pkl']
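
Before uploading a serialized model, it can be useful to check that it survives a save and load round trip. The sketch below does this for a small stand-in classifier trained on made-up data (the toy `X`, `y`, and the file name `toy_model.pkl` are assumptions for illustration, not part of this project).

```python
import os
import tempfile

import joblib
from sklearn.ensemble import RandomForestClassifier

# Tiny made-up dataset: the label simply follows the first feature.
X = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
y = [0, 0, 1, 1]

clf = RandomForestClassifier(n_estimators=5, random_state=42).fit(X, y)

# Write the model to a temporary directory and load it back.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "toy_model.pkl")
    joblib.dump(clf, path)
    restored = joblib.load(path)

# The restored model should reproduce the original predictions.
same_predictions = (restored.predict(X) == clf.predict(X)).all()
print(same_predictions)
```

The same `joblib.load` call is the usual way to restore a `.pkl` file written by `joblib.dump`, provided a compatible scikit-learn version is installed where the model is loaded.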

In [43]:
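from sklearn.metrics import f1_score

# Use a separate name so the trained scikit-learn estimator in `model` is not overwritten.
mr_model = mr.sklearn.create_model(
    name="credit_scores_model",
    # Report the F1 score measured on the test split instead of a hard-coded value.
    metrics={"f1": f1_score(y_test, preds)},
    description="Random Forest Classifier for Credit Scores Project",
    input_example=X_train.sample().to_numpy(),
    model_schema=model_schema
)

mr_model.save('credit_scores_model.pkl')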

Model created, explore it at https://c.app.hopsworks.ai:443/p/398/models/credit_scores_model/1


Model(name: 'credit_scores_model', version: 1)

---

## <span style="color:#ff5f27;">⏭️ **Next:** Part 05 </span>

In the next notebook, we will retrieve the trained model from the Model Registry and use it for prediction.