# <span style="font-weight:bold; font-size: 3rem; color:#1EB182;"><img src="images/icon102.png" width="38px"></img> **Hopsworks Feature Store** </span><span style="font-weight:bold; font-size: 3rem; color:#333;">- Part 04: Batch Predictions</span>


## 🗒️ In this notebook we will train a model on a training dataset built from the feature groups and register it: 

1. Load the training data.
2. Train the model.
3. Register the model in the Hopsworks model registry.

![part3](images/03_model.png) 

## <span style='color:#ff5f27'> 📝 Imports </span>

In [1]:
import pandas as pd

from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import f1_score

import warnings
warnings.filterwarnings("ignore")

## <span style="color:#ff5f27;"> 🔮 Connecting to Hopsworks Feature Store </span>

In [2]:
import hopsworks

project = hopsworks.login() 

fs = project.get_feature_store() 

Connected. Call `.close()` to terminate connection gracefully.

Logged in to project, explore it here https://c.app.hopsworks.ai:443/p/167




Connected. Call `.close()` to terminate connection gracefully.


## <span style="color:#ff5f27;"> 🪝 Feature View and Training Dataset Retrieval </span>

In [3]:
feature_view = fs.get_feature_view(
    name = 'air_quality_fv',
    version = 1
)

In [4]:
train_data = feature_view.get_training_data(1)[0]

train_data.head()

|  | city | aqi | date | iaqi_h | iaqi_p | iaqi_pm10 | iaqi_t | o3_avg | o3_max | o3_min | ... | windgust | windspeed | winddir | pressure | cloudcover | visibility | solarradiation | solarenergy | uvindex | conditions |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2 | 6 | 1663372800000 | 0.15132 | -1.493063 | -0.509993 | -0.447406 | 0.090878 | -0.879904 | 0.165723 | ... | 0.596078 | 0.898881 | 1.495199 | -1.30442 | 0.655855 | 0.509843 | -0.929005 | -0.908356 | -0.713147 | 1 |
| 1 | 2 | 16 | 1663027200000 | 0.167835 | -0.502415 | -0.083022 | 0.482885 | -0.142809 | 0.033842 | -0.192239 | ... | 1.162439 | 0.461859 | -0.946659 | -0.719795 | 1.160345 | -1.379013 | -1.617922 | -1.629403 | -1.845793 | 2 |
| 2 | 1 | 4 | 1663459200000 | -0.977159 | -0.460792 | -0.296508 | -0.019975 | -1.077559 | -1.10834 | -0.371219 | ... | -0.111874 | -0.231348 | 1.271256 | -0.788574 | -0.260959 | 0.223652 | 0.116486 | 0.133156 | 0.419498 | 0 |
| 3 | 3 | 3 | 1662940800000 | 1.109152 | -0.061202 | -0.936964 | -0.092889 | -1.544933 | -1.10834 | -1.445104 | ... | -0.874888 | -0.939625 | 0.474173 | -0.143768 | 1.121538 | 0.652938 | -1.332316 | -1.288908 | -1.27947 | 2 |
| 4 | 0 | 23 | 1662584400000 | -0.052357 | 0.413309 | -0.083022 | 0.332027 | 1.960378 | 1.861335 | 1.597569 | ... | 1.988383 | 1.818134 | -1.562817 | 0.475246 | 0.869293 | 0.509843 | -0.319712 | -0.347542 | 0.419498 | 2 |
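The `date` values in the preview above are large integers that look like Unix timestamps in milliseconds. Assuming that reading is correct, the following minimal sketch converts one of them to a readable UTC datetime using only the standard library. The helper name `epoch_ms_to_utc` is hypothetical and not part of the notebook.

```python
from datetime import datetime, timezone


def epoch_ms_to_utc(ms: int) -> datetime:
    """Convert a Unix timestamp in milliseconds to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)


# Value copied from the first row of the preview above.
first_row_date = epoch_ms_to_utc(1663372800000)
print(first_row_date.isoformat())
```

For a whole column, `pd.to_datetime(train_data["date"], unit="ms")` should give the equivalent result in pandas.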


---
## <span style="color:#ff5f27;"> 🤖 Gradient Boosting model </span>

In [5]:
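# Sort newest first so that, within each city, the previous row is the following day.
train_data = train_data.sort_values(by=["date", "city"], ascending=[False, True]).reset_index(drop=True)
# Target: each city's AQI on the next day (NaN for the most recent date).
train_data["aqi_next_day"] = train_data.groupby("city")["aqi"].shift(1)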

train_data.head(5)

|  | city | aqi | date | iaqi_h | iaqi_p | iaqi_pm10 | iaqi_t | o3_avg | o3_max | o3_min | ... | windspeed | winddir | pressure | cloudcover | visibility | solarradiation | solarenergy | uvindex | conditions | aqi_next_day |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 7 | 1663718400000 | 0.454083 | 1.195838 | -0.509993 | -2.081702 | -0.843871 | -0.879904 | -0.371219 | ... | -0.894416 | -0.016729 | 1.352183 | -2.705794 | 0.481224 | 1.070238 | 1.054494 | 0.985821 | 3 | NaN |
| 1 | 1 | 9 | 1663718400000 | 1.114657 | 1.28741 | 0.77092 | -2.659991 | -1.077559 | -1.565213 | -0.013258 | ... | -1.647902 | 0.751254 | 1.249014 | -2.429295 | 0.481224 | 0.58038 | 0.553767 | 0.419498 | 0 | NaN |
| 2 | 2 | 6 | 1663718400000 | 0.013701 | 1.11259 | -0.296508 | -1.905701 | -0.376496 | -1.336777 | 0.165723 | ... | -1.075253 | 0.376752 | 1.137247 | -0.746045 | 0.481224 | 0.062827 | 0.05304 | -0.146824 | 0 | NaN |
| 3 | 0 | 19 | 1663632000000 | 0.178844 | 0.413309 | -0.083022 | -0.39712 | -0.376496 | -0.651467 | -0.192239 | ... | -0.186139 | 1.490138 | 0.715974 | -1.750174 | 0.481224 | 0.592497 | 0.573796 | 0.419498 | 0 | 7.0 |
| 4 | 1 | 4 | 1663632000000 | 0.399035 | 0.371686 | -0.296508 | -0.774265 | -1.077559 | -0.423031 | -0.5502 | ... | 0.41665 | 1.587559 | 0.509636 | -0.202748 | 0.509843 | -0.470305 | -0.487745 | -0.146824 | 1 | 9.0 |
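The way the `aqi_next_day` target is built can be easier to follow on a tiny frame. The sketch below repeats the same sort-then-shift pattern on made-up data. The city labels, dates, and AQI values are illustrative assumptions, not values from the feature view.

```python
import pandas as pd

# Toy data: two hypothetical cities observed on three consecutive days.
toy = pd.DataFrame({
    "city": ["a", "b", "a", "b", "a", "b"],
    "date": [1, 1, 2, 2, 3, 3],
    "aqi": [10, 20, 11, 21, 12, 22],
})

# Newest rows first, so the previous row within a city is the following day.
toy = toy.sort_values(by=["date", "city"], ascending=[False, True]).reset_index(drop=True)

# shift(1) inside each city group copies the AQI of the later date onto the earlier one.
toy["aqi_next_day"] = toy.groupby("city")["aqi"].shift(1)
print(toy)
```

Rows for the most recent date have no following day, so their target is missing, which is why those rows are dropped before fitting.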


In [6]:
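# Drop the raw timestamp and any rows with missing values (including a missing target).
X = train_data.drop(columns=["date"]).dropna()
# Split the target off; X keeps the remaining columns as features.
y = X.pop("aqi_next_day")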

### <span style='color:#ff5f27'> 🧑🏻‍🔬 Model Fitting </span>

In [7]:
gb = GradientBoostingRegressor()
gb.fit(X, y)

### <span style='color:#ff5f27'> 👨🏻‍⚖️ Model Validation </span>

In [8]:
# Micro-averaged F1 on the training rows, after truncating targets and predictions to integers.
f1_score(y.astype('int'), [int(pred) for pred in gb.predict(X)], average='micro')

0.5
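The score above is computed on the same rows the model was fitted on, and F1 is a classification metric applied to truncated regression outputs. As a complementary check, one option is to hold out the most recent rows and report regression metrics such as MAE and RMSE. The sketch below does this on synthetic data. `X_demo`, `y_demo`, and the 80/20 split are assumptions for illustration, not part of the notebook.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Synthetic stand-in for the feature matrix; rows are assumed to be ordered oldest to newest.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 5))
y_demo = 10 + 3 * X_demo[:, 0] + rng.normal(scale=0.5, size=200)

# Hold out the most recent 20% of rows instead of scoring on the training rows.
split = int(len(X_demo) * 0.8)
X_train, X_test = X_demo[:split], X_demo[split:]
y_train, y_test = y_demo[:split], y_demo[split:]

model = GradientBoostingRegressor(random_state=0)
model.fit(X_train, y_train)
pred = model.predict(X_test)

# Both metrics are expressed in the units of the target.
mae = mean_absolute_error(y_test, pred)
rmse = float(np.sqrt(mean_squared_error(y_test, pred)))
print(f"MAE: {mae:.2f}, RMSE: {rmse:.2f}")
```

In the notebook, the same idea would require sorting `train_data` by `date` in ascending order before splitting, so that the held-out rows are the latest ones.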

In [9]:
pred_df = pd.DataFrame({
    'aqi_real': y.iloc[:2].values,
    # Truncate the first two predictions to integers for a side-by-side comparison.
    'aqi_pred': [int(pred) for pred in gb.predict(X.iloc[:2])]
    },
    index=["kyiv", "stockholm"]
)
pred_df

|  | aqi_real | aqi_pred |
|---|---|---|
| kyiv | 7 | 7 |
| stockholm | 9 | 8 |


## <span style='color:#ff5f27'>👮🏼‍♀️ Model Registry</span>

In [10]:
mr = project.get_model_registry()

Connected. Call `.close()` to terminate connection gracefully.


In [11]:
from hsml.schema import Schema
from hsml.model_schema import ModelSchema

input_schema = Schema(X)
output_schema = Schema(y)
model_schema = ModelSchema(input_schema=input_schema, output_schema=output_schema)

model_schema.to_dict()

{'input_schema': {'columnar_schema': [{'name': 'city', 'type': 'object'},
   {'name': 'aqi', 'type': 'object'},
   {'name': 'iaqi_h', 'type': 'float64'},
   {'name': 'iaqi_p', 'type': 'float64'},
   {'name': 'iaqi_pm10', 'type': 'float64'},
   {'name': 'iaqi_t', 'type': 'float64'},
   {'name': 'o3_avg', 'type': 'float64'},
   {'name': 'o3_max', 'type': 'float64'},
   {'name': 'o3_min', 'type': 'float64'},
   {'name': 'pm10_avg', 'type': 'float64'},
   {'name': 'pm10_max', 'type': 'float64'},
   {'name': 'pm10_min', 'type': 'float64'},
   {'name': 'pm25_avg', 'type': 'float64'},
   {'name': 'pm25_max', 'type': 'float64'},
   {'name': 'pm25_min', 'type': 'float64'},
   {'name': 'uvi_avg', 'type': 'float64'},
   {'name': 'uvi_max', 'type': 'float64'},
   {'name': 'uvi_min', 'type': 'float64'},
   {'name': 'tempmax', 'type': 'float64'},
   {'name': 'tempmin', 'type': 'float64'},
   {'name': 'temp', 'type': 'float64'},
   {'name': 'feelslikemax', 'type': 'float64'},
   {'name': 'feelslikemi
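The schema output above reports `city` and `aqi` with type `object` even though the previews show integer values. If that is unintended, a possible approach is to find the `object` columns and cast them before building the schema. The sketch below uses a small made-up frame, and the `demo` name and its values are assumptions.

```python
import pandas as pd

# Hypothetical frame that mimics integer codes stored with dtype `object`.
demo = pd.DataFrame({
    "city": pd.Series([2, 1, 3, 0], dtype="object"),
    "aqi": pd.Series([6, 16, 4, 3], dtype="object"),
    "iaqi_h": [0.15, 0.17, -0.98, 1.11],
})

# List the columns whose dtype is `object`.
object_cols = demo.select_dtypes(include="object").columns.tolist()

# Cast them one by one; pd.to_numeric raises if a value cannot be parsed.
for col in object_cols:
    demo[col] = pd.to_numeric(demo[col])

print(demo.dtypes)
```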

In [12]:
import joblib

joblib.dump(gb, 'model.pkl')

['model.pkl']
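Before uploading the pickle, it can be worth confirming that it loads back and predicts the same values. This is a minimal round-trip sketch on synthetic data. The `X_demo`, `y_demo`, and temporary path are assumptions that stand in for the notebook's `X`, `y`, and `model.pkl`.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Small synthetic regression problem standing in for the notebook's X and y.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(50, 3))
y_demo = X_demo[:, 0] * 2.0 + rng.normal(scale=0.1, size=50)

fitted = GradientBoostingRegressor(random_state=0).fit(X_demo, y_demo)

# Serialize to a temporary directory, then load the model back.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "model.pkl")
    joblib.dump(fitted, path)
    restored = joblib.load(path)

# The restored model should reproduce the original predictions.
same_predictions = np.array_equal(fitted.predict(X_demo), restored.predict(X_demo))
print(same_predictions)
```

Pickles are generally tied to the library versions that produced them, so loading in an environment with a different scikit-learn version may warn or fail.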

In [13]:
model = mr.sklearn.create_model(
    name="gradient_boost_model",
    metrics={"f1": "0.5"},
    description="Gradient Boost Regressor.",
    input_example=X.sample(),
    model_schema=model_schema
)

model.save('model.pkl')

Model created, explore it at https://c.app.hopsworks.ai:443/p/167/models/gradient_boost_model/1


Model(name: 'gradient_boost_model', version: 1)
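Once the model is registered, a later notebook can fetch it back from the registry. The sketch below wraps that in a helper. The function name `load_registered_model` is hypothetical, and it assumes the downloaded directory contains the same `model.pkl` file that was passed to `model.save(...)`.

```python
import os

import joblib


def load_registered_model(project, name="gradient_boost_model", version=1):
    """Download a registered model version and deserialize its pickle file.

    `project` is the object returned by `hopsworks.login()`. The file name
    `model.pkl` is assumed to match the file passed to `model.save(...)`.
    """
    mr = project.get_model_registry()
    registered = mr.get_model(name, version=version)
    model_dir = registered.download()  # local directory holding the model files
    return joblib.load(os.path.join(model_dir, "model.pkl"))
```

With the `project` handle from the login cell, `gb_loaded = load_registered_model(project)` would then be ready for `gb_loaded.predict(...)`.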

---