# <span style="font-weight:bold; font-size: 3rem; color:#1EB182;"><img src="images/icon102.png" width="38px"></img> **Hopsworks Feature Store** </span><span style="font-weight:bold; font-size: 3rem; color:#333;">- Part 04: Batch Predictions</span>


## 🗒️ In this notebook we will train a model on a training dataset from the feature store and register it: 

1. Load the training data.
2. Train the model.
3. Register the model in the Hopsworks Model Registry.

![part3](images/03_model.png) 

## <span style='color:#ff5f27'> 📝 Imports </span>

In [None]:
import pandas as pd

from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import f1_score

import warnings
warnings.filterwarnings("ignore")

## <span style="color:#ff5f27;"> 🔮 Connecting to Hopsworks Feature Store </span>

In [None]:
import hopsworks

project = hopsworks.login() 

fs = project.get_feature_store() 

## <span style="color:#ff5f27;"> 🪝 Feature View and Training Dataset Retrieval </span>

In [None]:
feature_view = fs.get_feature_view(
    name = 'air_quality_fv',
    version = 1
)

In [None]:
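# get_training_data returns a (features, labels) tuple for training dataset version 1;
# keep only the features DataFrame.
train_data = feature_view.get_training_data(1)[0]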

train_data.head()

---
## <span style="color:#ff5f27;"> 🤖 Gradient Boosting model </span>

In [None]:
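# Sort newest first so that, within each city, the previous row is the following day
train_data = train_data.sort_values(by=["date", "city"], ascending=[False, True]).reset_index(drop=True)
# Target: the AQI observed one day later in the same city (NaN for the latest date)
train_data["aqi_next_day"] = train_data.groupby("city")["aqi"].shift(1)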

train_data.head(5)
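
The sort-then-shift step above can be checked on a toy frame. This is a minimal sketch: the column names mirror the notebook, but the dates and AQI values are made up for illustration.

```python
import pandas as pd

# Toy data; the values are invented, only the column names follow the notebook.
toy = pd.DataFrame({
    "date": pd.to_datetime(["2023-01-01", "2023-01-02", "2023-01-03"] * 2),
    "city": ["kyiv"] * 3 + ["stockholm"] * 3,
    "aqi": [10, 20, 30, 1, 2, 3],
})

# Newest rows first, so within each city the previous row is the following day.
toy = toy.sort_values(by=["date", "city"], ascending=[False, True]).reset_index(drop=True)
toy["aqi_next_day"] = toy.groupby("city")["aqi"].shift(1)

# The most recent day per city has no "next day" yet, so its target is NaN.
labelled = toy.dropna(subset=["aqi_next_day"])
print(toy)
```

The row for each city's latest date ends up without a target, which is why the notebook drops missing values before fitting.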

In [None]:
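# Drop the date column and any rows without a next-day target
X = train_data.drop(columns=["date"]).dropna()
# pop removes the target column from X and returns it as y
y = X.pop("aqi_next_day")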

### <span style='color:#ff5f27'> 🧑🏻‍🔬 Model Fitting </span>

In [None]:
gb = GradientBoostingRegressor()
gb.fit(X, y)

### <span style='color:#ff5f27'> 👨🏻‍⚖️ Model Validation </span>

In [None]:
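# Micro-averaged F1 on integer-truncated values equals exact-match accuracy.
# Note: this score is computed on the same rows the model was fitted on.
f1_score(y.astype('int'), gb.predict(X).astype('int'), average='micro')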
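
Since the model is a regressor, error metrics on held-out rows may be a useful complement to the score above. The following is a sketch on synthetic data, not the notebook's method: `X_demo`, `y_demo`, and the 80/20 split are assumptions made for illustration.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Synthetic stand-in for the notebook's X and y (values are made up).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 3))
y_demo = 50 + 10 * X_demo[:, 0] + rng.normal(scale=2.0, size=200)

# Keep the last 20% of rows as a hold-out set, without shuffling.
split = int(len(X_demo) * 0.8)
X_fit, X_holdout = X_demo[:split], X_demo[split:]
y_fit, y_holdout = y_demo[:split], y_demo[split:]

model_demo = GradientBoostingRegressor(random_state=0).fit(X_fit, y_fit)
pred = model_demo.predict(X_holdout)

# Mean absolute error and root mean squared error, in the units of the target.
mae = mean_absolute_error(y_holdout, pred)
rmse = np.sqrt(mean_squared_error(y_holdout, pred))
print(f"hold-out MAE: {mae:.2f}, RMSE: {rmse:.2f}")
```

In the notebook the rows are sorted newest first, so a time-aware hold-out there would take the leading rows rather than the trailing ones.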

In [None]:
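# Compare actual and predicted AQI for the first two rows.
# The index labels assume those rows belong to Kyiv and Stockholm.
pred_df = pd.DataFrame({
    'aqi_real': y.iloc[:2].values,
    'aqi_pred': gb.predict(X.iloc[:2]).astype(int)
    },
    index=["kyiv", "stockholm"]
)
pred_df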

## <span style='color:#ff5f27'>👮🏼‍♀️ Model Registry</span>

In [None]:
mr = project.get_model_registry()

In [None]:
from hsml.schema import Schema
from hsml.model_schema import ModelSchema

input_schema = Schema(X)
output_schema = Schema(y)
model_schema = ModelSchema(input_schema=input_schema, output_schema=output_schema)

model_schema.to_dict()

In [None]:
import joblib

joblib.dump(gb, 'model.pkl')
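
Before uploading a serialized model, it can be worth confirming that the file loads back and predicts identically. This is a minimal sketch on a made-up dataset; the file name `model_check.pkl` and the `X_demo`/`y_demo` arrays are hypothetical, and a temporary directory is used so the notebook's own `model.pkl` is left untouched.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Tiny made-up dataset standing in for the notebook's X and y.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(50, 2))
y_demo = X_demo[:, 0] * 3.0 + rng.normal(scale=0.1, size=50)

fitted = GradientBoostingRegressor(random_state=0).fit(X_demo, y_demo)

# Dump and reload inside a temporary directory that is cleaned up afterwards.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "model_check.pkl")
    joblib.dump(fitted, path)
    restored = joblib.load(path)

# The restored estimator should reproduce the original predictions.
same_predictions = np.allclose(fitted.predict(X_demo), restored.predict(X_demo))
print("round trip preserves predictions:", same_predictions)
```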

In [None]:
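# Recompute the validation score instead of hard-coding it
f1 = f1_score(y.astype('int'), gb.predict(X).astype('int'), average='micro')

model = mr.sklearn.create_model(
    name="gradient_boost_model",
    metrics={"f1": str(round(f1, 4))},
    description="Gradient Boosting Regressor.",
    input_example=X.sample(),
    model_schema=model_schema
)

# Upload the serialized model file to the registry
model.save('model.pkl')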

---