# <span style="font-weight:bold; font-size: 3rem; color:#1EB182;"><img src="../../images/icon102.png" width="38px"></img> **Hopsworks Feature Store** </span><span style="font-weight:bold; font-size: 3rem; color:#333;">- Part 04: Batch Predictions</span>

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/logicalclocks/hopsworks-tutorials/blob/master/advanced_tutorials/{project_name}/{notebook_name}.ipynb)


## 🗒️ This notebook is divided into the following sections:

1. Load the training data
2. Train the model
3. Register the model in the Hopsworks Model Registry

![part3](../../images/03_model.png) 

### <span style='color:#ff5f27'> 📝 Imports </span>

In [15]:
import pandas as pd

from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import f1_score

import warnings
warnings.filterwarnings("ignore")

---

## <span style="color:#ff5f27;"> 📡 Connecting to Hopsworks Feature Store </span>

In [16]:
import hopsworks

project = hopsworks.login() 

fs = project.get_feature_store() 

Connection closed.
Connected. Call `.close()` to terminate connection gracefully.

Multiple projects found. 

	 (1) ID2223_Ernest
	 (2) ID2223_Anton

Enter project to access: 1

Logged in to project, explore it here https://c.app.hopsworks.ai:443/p/5476
Connected. Call `.close()` to terminate connection gracefully.


---

## <span style="color:#ff5f27;">🪝 Feature View and Training Dataset Retrieval</span>

In [17]:
feature_view = fs.get_feature_view(
    name = 'poland_air_quality_fv',
    version = 1
)

In [None]:
# get_training_data returns a (features, labels) tuple; [0] keeps the features DataFrame
train_data = feature_view.get_training_data(1)[0]

train_data.head()

---

## <span style="color:#ff5f27;">🧬 Modeling</span>

In [6]:
X = train_data.drop(columns=["date"]).fillna(0)
y = X.pop("aqi_next_day")
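Note that `fillna(0)` runs before the target is popped, so any row with a missing `aqi_next_day` would end up with a target of 0. A minimal sketch of one alternative, assuming missing targets should be dropped rather than imputed. The toy DataFrame is invented for illustration and only borrows the column names `date`, `aqi`, and `aqi_next_day` from this notebook:

```python
import numpy as np
import pandas as pd

# Toy stand-in for train_data (values are invented for illustration).
toy = pd.DataFrame({
    "date": ["2023-01-01", "2023-01-02", "2023-01-03", "2023-01-04"],
    "aqi": [12.0, np.nan, 30.0, 8.0],
    "aqi_next_day": [15.0, 30.0, np.nan, 9.0],
})

# Drop rows whose target is missing, so no label is silently set to 0.
toy = toy.dropna(subset=["aqi_next_day"])

# Fill the remaining gaps in the features only, then split off the target.
X_toy = toy.drop(columns=["date"]).fillna(0)
y_toy = X_toy.pop("aqi_next_day")

print(X_toy)
print(y_toy.tolist())
```

Whether dropping or imputing is the better choice depends on how many targets are missing in the real training data, which this sketch does not inspect.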

In [7]:
gb = GradientBoostingRegressor()
gb.fit(X, y)

GradientBoostingRegressor()

### <span style='color:#ff5f27'> 👨🏻‍⚖️ Model Validation </span>

In [8]:
# Micro-averaged F1 on the training data, with targets and predictions truncated to integers
f1_score(y.astype('int'), [int(pred) for pred in gb.predict(X)], average='micro')

0.47368421052631576

In [9]:
y.iloc[4:10].values

array([10,  6,  0, 23,  4, 10])

In [10]:
pred_df = pd.DataFrame({
    'aqi_real': y.iloc[4:10].values,
    'aqi_pred': gb.predict(X.iloc[4:10]).astype(int),  # truncate to whole AQI values
})
pred_df

|   | aqi_real | aqi_pred |
|---|----------|----------|
| 0 | 10 | 9 |
| 1 | 6 | 6 |
| 2 | 0 | 0 |
| 3 | 23 | 22 |
| 4 | 4 | 3 |
| 5 | 10 | 9 |
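F1 is a classification metric, and the score above is computed on the same rows the model was fit on, after truncating predictions to integers. As a complement, here is a minimal sketch of scoring a regressor on a held-out split with standard regression metrics. It assumes synthetic data from `make_regression` in place of the air quality features, and the split ratio and random seeds are arbitrary choices, not values from this tutorial:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the feature matrix X and target y.
X_demo, y_demo = make_regression(n_samples=300, n_features=8, noise=10.0, random_state=0)

# Hold out 20% of the rows so the scores reflect unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=0
)

model_demo = GradientBoostingRegressor(random_state=0)
model_demo.fit(X_train, y_train)
y_pred = model_demo.predict(X_test)

mae = mean_absolute_error(y_test, y_pred)
rmse = mean_squared_error(y_test, y_pred) ** 0.5  # root of the mean squared error
r2 = r2_score(y_test, y_pred)

print(f"MAE:  {mae:.2f}")
print(f"RMSE: {rmse:.2f}")
print(f"R^2:  {r2:.3f}")
```

MAE and RMSE are expressed in the units of the target, so for AQI they read directly as "points off on average", which tends to be easier to interpret than an exact-match score.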


---

## <span style='color:#ff5f27'>🗄 Model Registry</span>

Hopsworks includes a model registry, where you can store versioned models and compare their performance. Models from the registry can then be served as API endpoints.

In [11]:
mr = project.get_model_registry()

Connected. Call `.close()` to terminate connection gracefully.


### <span style="color:#ff5f27;">⚙️ Model Schema</span>

The model needs a [Model Schema](https://docs.hopsworks.ai/machine-learning-api/latest/generated/model_schema/), which describes its inputs and outputs.

A Model Schema can be automatically generated from training examples, as shown below.

In [12]:
from hsml.schema import Schema
from hsml.model_schema import ModelSchema

input_schema = Schema(X)
output_schema = Schema(y)
model_schema = ModelSchema(input_schema=input_schema, output_schema=output_schema)

model_schema.to_dict()

{'input_schema': {'columnar_schema': [{'name': 'city', 'type': 'int64'},
   {'name': 'aqi', 'type': 'int64'},
   {'name': 'iaqi_h', 'type': 'float64'},
   {'name': 'iaqi_p', 'type': 'float64'},
   {'name': 'iaqi_pm10', 'type': 'float64'},
   {'name': 'iaqi_t', 'type': 'float64'},
   {'name': 'o3_avg', 'type': 'float64'},
   {'name': 'o3_max', 'type': 'float64'},
   {'name': 'o3_min', 'type': 'float64'},
   {'name': 'pm10_avg', 'type': 'float64'},
   {'name': 'pm10_max', 'type': 'float64'},
   {'name': 'pm10_min', 'type': 'float64'},
   {'name': 'pm25_avg', 'type': 'float64'},
   {'name': 'pm25_max', 'type': 'float64'},
   {'name': 'pm25_min', 'type': 'float64'},
   {'name': 'tempmax', 'type': 'float64'},
   {'name': 'tempmin', 'type': 'float64'},
   {'name': 'temp', 'type': 'float64'},
   {'name': 'feelslikemax', 'type': 'float64'},
   {'name': 'feelslikemin', 'type': 'float64'},
   {'name': 'feelslike', 'type': 'float64'},
   {'name': 'dew', 'type': 'float64'},
   {'name': 'humidity',
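
To make the structure of this output concrete, here is a small sketch that builds a similar `columnar_schema` dictionary straight from pandas dtypes. The helper `columnar_schema_from_frame` is hypothetical and written only for illustration. It is not part of `hsml`, and it is not necessarily how `Schema` infers types internally:

```python
import pandas as pd


def columnar_schema_from_frame(df: pd.DataFrame) -> dict:
    """Hypothetical helper: list each column's name and dtype as strings."""
    return {
        "columnar_schema": [
            {"name": name, "type": str(dtype)} for name, dtype in df.dtypes.items()
        ]
    }


# Toy frame reusing three column names from the output above (values invented).
toy = pd.DataFrame({"city": [1, 2], "aqi": [10, 6], "iaqi_h": [80.5, 77.0]})

print(columnar_schema_from_frame(toy))
```

Each entry pairs a column name with its dtype, which is the same shape as the `input_schema` entries printed by `model_schema.to_dict()`.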

In [13]:
import joblib

joblib.dump(gb, 'model.pkl')

['model.pkl']
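
Before uploading the file, it can be worth confirming that the pickle round-trips. A minimal sketch, assuming a small model fit on synthetic data and a temporary directory in place of the notebook's `gb` and working directory:

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor

# Synthetic stand-in for the training data used in this notebook.
X_demo, y_demo = make_regression(n_samples=100, n_features=5, random_state=0)
fitted = GradientBoostingRegressor(random_state=0).fit(X_demo, y_demo)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "model.pkl")
    joblib.dump(fitted, path)       # serialize the fitted estimator
    restored = joblib.load(path)    # read it back from disk

# The restored model should reproduce the original predictions.
same = np.allclose(fitted.predict(X_demo), restored.predict(X_demo))
print("Round trip OK:", same)
```

The same check could be run against `model.pkl` and `gb` in this notebook by loading the file and comparing predictions on `X`.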

In [14]:
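# Record the micro-averaged F1 computed on the training data instead of a hard-coded value
f1 = f1_score(y.astype('int'), gb.predict(X).astype('int'), average='micro')

model = mr.sklearn.create_model(
    name="gradient_boost_model",
    metrics={"f1": float(f1)},
    description="Gradient Boost Regressor.",
    input_example=X.sample().to_numpy(),
    model_schema=model_schema
)

model.save('model.pkl')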

Model created, explore it at https://c.app.hopsworks.ai:443/p/5476/models/gradient_boost_model/2


Model(name: 'gradient_boost_model', version: 2)