# <span style="font-weight:bold; font-size: 3rem; color:#1EB182;"><img src="../images/icon102.png" width="38px"></img> **Hopsworks Feature Store** </span><span style="font-weight:bold; font-size: 3rem; color:#333;">- Part 04: Batch Inference</span>

## 🗒️ This notebook is divided into the following sections:

1. Connect to the Hopsworks Feature Store and retrieve the feature view.
2. Retrieve your trained model from the Model Registry.
3. Load batch data.
4. Predict batch data.

## <span style='color:#ff5f27'> 📝 Imports </span>

In [1]:
import joblib
import datetime
import pandas as pd

## <span style="color:#ff5f27;"> 📡 Connect to Hopsworks Feature Store </span>

In [2]:
import hopsworks

project = hopsworks.login()

fs = project.get_feature_store()

Connected. Call `.close()` to terminate connection gracefully.

Logged in to project, explore it here https://snurran.hops.works/p/5242
Connected. Call `.close()` to terminate connection gracefully.


## <span style="color:#ff5f27;"> ⚙️ Feature View Retrieval</span>

In [3]:
# Retrieve the 'air_quality_fv' feature view
feature_view = fs.get_feature_view(
    name='air_quality_fv',
    version=1,
)

## <span style="color:#ff5f27;">🗄 Model Registry</span>

In [4]:
# Retrieve the model registry
mr = project.get_model_registry()

Connected. Call `.close()` to terminate connection gracefully.


## <span style="color:#ff5f27;">🪝 Retrieve model from Model Registry</span>

In [5]:
# Retrieve the 'air_quality_xgboost_model' from the model registry
retrieved_model = mr.get_model(
    name="air_quality_xgboost_model",
    version=1,
)

# Download the saved model artifacts to a local directory
saved_model_dir = retrieved_model.download()

Downloading model artifact (0 dirs, 6 files)... DONE

In [6]:
# Load the XGBoost regressor model and label encoder from the saved model directory
retrieved_xgboost_model = joblib.load(saved_model_dir + "/xgboost_regressor.pkl")
retrieved_encoder = joblib.load(saved_model_dir + "/label_encoder.pkl")

# Display the retrieved XGBoost regressor model
retrieved_xgboost_model
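
Before calling `joblib.load`, it can help to confirm that the downloaded directory contains the files the cell above expects. The following is a minimal sketch under that assumption: `missing_artifacts` is a hypothetical helper (not part of Hopsworks or joblib), and a temporary directory stands in for `saved_model_dir`.

```python
import os
import tempfile

# File names taken from the loading cell above
REQUIRED_ARTIFACTS = ("xgboost_regressor.pkl", "label_encoder.pkl")


def missing_artifacts(model_dir, required=REQUIRED_ARTIFACTS):
    """Return the required file names that are not present in model_dir."""
    return [
        name for name in required
        if not os.path.isfile(os.path.join(model_dir, name))
    ]


# Demo: a temporary directory stands in for saved_model_dir
with tempfile.TemporaryDirectory() as demo_dir:
    # Create only one of the two expected files
    open(os.path.join(demo_dir, "xgboost_regressor.pkl"), "wb").close()
    demo_missing = missing_artifacts(demo_dir)

print(demo_missing)  # ['label_encoder.pkl']
```

In the notebook, `missing_artifacts(saved_model_dir)` should return an empty list before the two `joblib.load` calls run.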

## <span style="color:#ff5f27;">✨ Load Batch Data for the Last 30 Days</span>

Next, fetch batch data for the last 30 days from the feature view you retrieved above.

In [7]:
# Get the current date
today = datetime.date.today()

# Calculate a date threshold 30 days ago from the current date
date_threshold = today - datetime.timedelta(days=30)

# Convert the date threshold to a string format
str(date_threshold)

'2024-02-11'
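
Because `datetime.date.today()` changes from run to run, the threshold above is not reproducible. The sketch below wraps the same arithmetic in a hypothetical helper, `batch_start_date`, that takes the reference date as an argument. The reference date 2024-03-12 is an assumption chosen because it reproduces the `'2024-02-11'` output shown above.

```python
import datetime


def batch_start_date(reference, lookback_days=30):
    """Return the date lookback_days before the reference date."""
    return reference - datetime.timedelta(days=lookback_days)


# A fixed reference date keeps the example reproducible
example_start = batch_start_date(datetime.date(2024, 3, 12))
print(example_start)  # 2024-02-11
```

Passing `datetime.date.today()` as `reference` gives the same behavior as the cell above.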

In [8]:
# Initialize batch scoring against training dataset version 1
feature_view.init_batch_scoring(1)

# Retrieve batch data from the feature view with a start time set to the date threshold
batch_data = feature_view.get_batch_data(
    start_time=date_threshold,
)

Finished: Reading data from Hopsworks, using ArrowFlight (7.68s) 


### <span style="color:#ff5f27;">🤖 Making the predictions</span>

In [9]:
# Transform the 'city_name' column in the batch data using the retrieved label encoder
encoded = retrieved_encoder.transform(batch_data['city_name'])

# Concatenate the label-encoded 'city_name' with the original batch data,
# reusing the batch index so rows stay aligned even if it is not 0..n-1
X_batch = pd.concat([batch_data, pd.DataFrame(encoded, index=batch_data.index)], axis=1)

# Drop unnecessary columns ('date', 'city_name', 'unix_time') from the batch data
X_batch = X_batch.drop(columns=['date', 'city_name', 'unix_time'])

# Rename the newly added column with label-encoded city names to 'city_name_encoded'
X_batch = X_batch.rename(columns={0: 'city_name_encoded'})

# Extract the target variable 'pm2_5' from the batch data
y_batch = X_batch.pop('pm2_5')

X_batch.head(3)

| | pm_2_5_previous_1_day | pm_2_5_previous_2_day | pm_2_5_previous_3_day | pm_2_5_previous_4_day | pm_2_5_previous_5_day | pm_2_5_previous_6_day | pm_2_5_previous_7_day | mean_7_days | mean_14_days | mean_28_days | ... | temperature_max | temperature_min | precipitation_sum | rain_sum | snowfall_sum | precipitation_hours | wind_speed_max | wind_gusts_max | wind_direction_dominant | city_name_encoded |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 5.3 | 16.2 | 16.3 | 15.4 | 25.0 | 17.2 | 10.2 | 15.085714 | 15.242857 | 12.939286 | ... | 14.2 | 8.4 | 0.0 | 0.0 | 0.0 | 0.0 | 34.7 | 59.4 | 302 | 20 |
| 1 | 12.2 | 17.5 | 12.1 | 14.5 | 12.3 | 16.8 | 14.3 | 14.242857 | 19.857143 | 18.432143 | ... | 13.1 | 10.5 | 20.9 | 20.9 | 0.0 | 22.0 | 40.1 | 64.8 | 212 | 24 |
| 2 | 11.6 | 8.0 | 18.2 | 12.7 | 9.9 | 6.8 | 12.5 | 11.385714 | 9.921429 | 8.271429 | ... | 10.8 | 5.2 | 9.1 | 13.65 | 0.0 | 5.0 | 14.8 | 43.2 | 180 | 30 |
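
The `city_name_encoded` column holds integer codes produced by the saved encoder. The sketch below illustrates the idea in plain Python, assuming the saved encoder behaves like scikit-learn's `LabelEncoder`, which assigns codes by sorted class order and raises an error on labels it did not see during fitting. The helper names and city names are hypothetical and are not the tutorial's actual data.

```python
def fit_label_mapping(labels):
    """Map each distinct label to its position in sorted order."""
    return {label: index for index, label in enumerate(sorted(set(labels)))}


def transform_labels(mapping, labels):
    """Encode labels, failing loudly on any label not seen during fitting."""
    unseen = sorted(set(labels) - set(mapping))
    if unseen:
        raise ValueError(f"Unseen labels: {unseen}")
    return [mapping[label] for label in labels]


# Illustrative city names, not the tutorial's actual city list
mapping = fit_label_mapping(["Paris", "Berlin", "Oslo", "Berlin"])
codes = transform_labels(mapping, ["Oslo", "Berlin", "Paris"])

print(mapping)  # {'Berlin': 0, 'Oslo': 1, 'Paris': 2}
print(codes)    # [1, 0, 2]
```

Under the same assumption, a city that appears in the batch data but was absent when the encoder was fitted would make the `transform` call fail, so the batch should only contain cities the encoder was fitted on.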


In [10]:
# Make predictions on the batch data using the retrieved XGBoost regressor model
predictions = retrieved_xgboost_model.predict(X_batch)

# Display the first 5 predictions
predictions[:5]

array([ 5.9190893,  6.7028375,  7.9574   , 15.73646  ,  8.050383 ],
      dtype=float32)
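
The notebook extracts `y_batch` but does not compare it with the predictions. As a hedged illustration of how such a comparison could look, the sketch below computes the root mean squared error and the mean absolute error in plain Python. The functions `rmse` and `mae` are hypothetical helpers, and the numbers are made up for the example rather than taken from the tutorial.

```python
import math


def _check_lengths(actual, predicted):
    """Reject empty or mismatched inputs before computing a metric."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if len(actual) == 0:
        raise ValueError("at least one value is required")


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    _check_lengths(actual, predicted)
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


def mae(actual, predicted):
    """Mean absolute error between two equal-length sequences."""
    _check_lengths(actual, predicted)
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Made-up values for illustration only
example_actual = [3.0, 5.0, 7.0]
example_predicted = [2.0, 5.0, 9.0]

example_rmse = rmse(example_actual, example_predicted)
example_mae = mae(example_actual, example_predicted)
print(example_rmse, example_mae)
```

With the notebook's variables, the equivalent call would be `rmse(y_batch.tolist(), predictions.tolist())`, provided `y_batch` contains an observed `pm2_5` value for every row.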

---
## <span style="color:#ff5f27;">👾 Now try out the Streamlit App!</span>

In [11]:
# !python3 -m streamlit run streamlit_app.py

---

### <span style="color:#ff5f27;">🥳 <b> Next Steps  </b> </span>
Congratulations, you've now completed the Air Quality tutorial for Managed Hopsworks.

Check out our other tutorials on ➡ https://github.com/logicalclocks/hopsworks-tutorials

Or documentation at ➡ https://docs.hopsworks.ai