# Demand Feature Pipeline Notebook

This notebook loads item demand data from a CSV file, adds a timestamp column, and uploads the result to the Hopsworks Feature Store. It is an interactive version of the feature_pipeline.py script.

In [1]:
# Import necessary libraries
import pandas as pd
import hopsworks
import os
from datetime import datetime
from dotenv import load_dotenv

## Load Environment Variables

Load the Hopsworks connection credentials from a `.env` file into environment variables, then set the pipeline parameters.

In [2]:
# Load environment variables
load_dotenv()

# Configure parameters (these can be modified as needed)
project_name = 'many_models'
feature_group_name = 'demand_features'
version = 1
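
`load_dotenv()` returns without raising when no `.env` file is found, so it can help to check that the expected credentials are actually set before connecting. The following is a minimal sketch, assuming the API key is supplied through a variable named `HOPSWORKS_API_KEY` (the variable name is an assumption about this setup, and `missing_env_vars` is a hypothetical helper, not part of any library):

```python
import os


def missing_env_vars(names, env=None):
    """Return the names that are unset or empty in the given environment mapping."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


# Assumed variable name; adjust to whatever your .env file defines.
required = ["HOPSWORKS_API_KEY"]
missing = missing_env_vars(required)
if missing:
    print(f"Missing environment variables: {', '.join(missing)}")
```

Reporting the missing names up front tends to give a clearer message than a failed login later in the notebook.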

## Connect to Hopsworks Feature Store

Establish connection to the Hopsworks Feature Store using credentials from environment variables.

In [3]:
print("Connecting to Hopsworks Feature Store")

# Connect to Hopsworks
project = hopsworks.login(
    project=project_name
)

fs = project.get_feature_store()
print(f"Connected to feature store in project: {project_name}")

Connecting to Hopsworks Feature Store
2025-05-09 10:14:21,166 INFO: Initializing external client
2025-05-09 10:14:21,166 INFO: Base URL: https://10.87.43.175:28181
2025-05-09 10:14:22,897 INFO: Python Engine initialized.

Logged in to project, explore it here https://10.87.43.175:28181/p/123
Connected to feature store in project: many_models


## Load Source Data

Load the demand data from a CSV file and inspect it before preparing it for the feature store.

In [4]:
print("Loading source data")
demand_df = pd.read_csv('../data/demand_qty_item_loc.csv')

# Display first few rows to inspect the data
display(demand_df.head())

Loading source data


Unnamed: 0,sp_id,loc_id,time_bucket,repetitive_demand_quantity
0,9684698,3,202104,55.0
1,9684698,3,202105,117.0
2,9684698,3,202106,62.0
3,9684698,3,202107,45.0
4,9684698,3,202108,77.0
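
Before uploading, it may be worth confirming that the columns later used as the primary key (`sp_id`, `loc_id`, `time_bucket`) are present, non-null, and unique per row. The sketch below uses a small hand-built frame shaped like the preview above rather than the real CSV. `check_primary_key` is a hypothetical helper. The drop step assumes that the leading `Unnamed: 0` column is a row index saved into the CSV, which may not hold if it is only how the index was rendered in this preview.

```python
import pandas as pd


def check_primary_key(df, key_cols):
    """Raise ValueError if key columns are missing, contain nulls, or are not unique."""
    missing = [c for c in key_cols if c not in df.columns]
    if missing:
        raise ValueError(f"Missing key columns: {missing}")
    if df[key_cols].isna().any().any():
        raise ValueError("Null values found in key columns")
    n_dupes = int(df.duplicated(subset=key_cols).sum())
    if n_dupes:
        raise ValueError(f"{n_dupes} duplicate key rows found")


# Small illustrative frame shaped like the preview (not the real data set)
sample_df = pd.DataFrame({
    "Unnamed: 0": [0, 1, 2],
    "sp_id": [9684698, 9684698, 9684698],
    "loc_id": [3, 3, 3],
    "time_bucket": [202104, 202105, 202106],
    "repetitive_demand_quantity": [55.0, 117.0, 62.0],
})

# Drop saved-index columns, if any (assumption: they carry no information)
clean_df = sample_df.loc[:, ~sample_df.columns.str.startswith("Unnamed:")]

# Passes silently when the key is valid; raises ValueError otherwise
check_primary_key(clean_df, ["sp_id", "loc_id", "time_bucket"])
```

The same check could be run on `demand_df` directly. Failing early here is usually easier to diagnose than a problem discovered after the upload.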


In [5]:

# Add an ingestion timestamp to serve as the feature group's event_time (same value for every row)
demand_df['datetime'] = datetime.now()

# Display the transformed dataframe
display(demand_df.head())

Unnamed: 0,sp_id,loc_id,time_bucket,repetitive_demand_quantity,datetime
0,9684698,3,202104,55.0,2025-05-09 10:14:27.519113
1,9684698,3,202105,117.0,2025-05-09 10:14:27.519113
2,9684698,3,202106,62.0,2025-05-09 10:14:27.519113
3,9684698,3,202107,45.0,2025-05-09 10:14:27.519113
4,9684698,3,202108,77.0,2025-05-09 10:14:27.519113
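
The cell above stamps every row with the same `datetime.now()` value, so the `datetime` column records when the data was ingested rather than which period the demand belongs to. If `time_bucket` encodes year and month as `YYYYMM` (an assumption based on values such as 202104), one alternative is to derive a per-row timestamp from it. This is a sketch of that option, not what the notebook does, and `bucket_start` is a hypothetical column name:

```python
import pandas as pd

# Small illustrative frame (not the real data set)
sample_df = pd.DataFrame({
    "time_bucket": [202104, 202105, 202106],
    "repetitive_demand_quantity": [55.0, 117.0, 62.0],
})

# Parse YYYYMM integers into a timestamp for the first day of that month
sample_df["bucket_start"] = pd.to_datetime(
    sample_df["time_bucket"].astype(str), format="%Y%m"
)
```

A column like this could then be named as the `event_time` of the feature group instead of the ingestion timestamp, which would let time-based queries follow the demand period.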


## Create Feature Group and Upload Data

Define the feature group schema and upload the prepared data to the feature store.

In [6]:
print("⬆ Creating/getting feature group")
# Define the feature group
demand_fg = fs.get_or_create_feature_group(
    name=feature_group_name,
    version=version,
    description="Item demand by location and time",
    primary_key=['sp_id', 'loc_id', 'time_bucket'],
    event_time='datetime',
)

⬆ Creating/getting feature group


In [7]:
print("⬆ Uploading data to the Feature Store")
# Upload data to the feature store
demand_fg.insert(demand_df, write_options={"wait_for_job": True})
print("Feature pipeline completed successfully")

⬆ Uploading data to the Feature Store
Feature Group created successfully, explore it at 
https://10.87.43.175:28181/p/123/fs/71/fg/47


Uploading Dataframe: 100.00% |██████████| Rows 9600/9600 | Elapsed Time: 00:07 | Remaining Time: 00:00


Launching job: demand_features_1_offline_fg_materialization
Job started successfully, you can follow the progress at 
https://10.87.43.175:28181/p/123/jobs/named/demand_features_1_offline_fg_materialization/executions
2025-05-09 10:14:55,180 INFO: Waiting for execution to finish. Current state: SUBMITTED. Final status: UNDEFINED
2025-05-09 10:14:58,413 INFO: Waiting for execution to finish. Current state: RUNNING. Final status: UNDEFINED
2025-05-09 10:16:23,004 INFO: Waiting for execution to finish. Current state: SUCCEEDING. Final status: UNDEFINED
2025-05-09 10:16:29,486 INFO: Waiting for execution to finish. Current state: FINISHED. Final status: SUCCEEDED
2025-05-09 10:16:29,984 INFO: Waiting for log aggregation to finish.
2025-05-09 10:16:29,986 INFO: Execution finished successfully.
Feature pipeline completed successfully
