# Demand Feature Pipeline Notebook

This notebook processes demand data and uploads it to the Hopsworks feature store. It replicates the functionality of the `feature_pipeline.py` script in an interactive format.

In [None]:
# Import necessary libraries
import pandas as pd
import hopsworks
import os
from datetime import datetime
from dotenv import load_dotenv

## Load Environment Variables

Load the Hopsworks connection credentials from environment variables; `load_dotenv()` also picks them up from a local `.env` file if one exists.

In [None]:
# Load environment variables
load_dotenv()

# Configure parameters (these can be modified as needed)
project_name = 'many_models'
feature_group_name = 'demand_features'
version = 1
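
Because `os.getenv` returns `None` for an unset variable and a string for everything else, it can help to validate the settings before attempting a connection. The following is a minimal sketch rather than part of the original pipeline: the helper name `read_hopsworks_settings` is hypothetical and the fallback port of 443 is an assumption, while `HOST`, `PORT`, `HOPSWORKS_API_KEY` and `PROJECT` are the variable names this notebook reads.

```python
import os


def read_hopsworks_settings(env=None, default_project="many_models"):
    """Collect connection settings from a mapping of environment variables.

    Hypothetical helper for illustration; the fallback port 443 is an assumption.
    """
    env = os.environ if env is None else env

    api_key = env.get("HOPSWORKS_API_KEY")
    if not api_key:
        raise ValueError("HOPSWORKS_API_KEY is not set")

    return {
        "host": env.get("HOST"),
        # Environment values are strings, so convert the port explicitly.
        "port": int(env.get("PORT") or 443),
        "api_key_value": api_key,
        "project": default_project or env.get("PROJECT"),
    }


# Demonstration with a plain dict instead of the real environment.
settings = read_hopsworks_settings(
    {"HOST": "example.invalid", "PORT": "443", "HOPSWORKS_API_KEY": "dummy-key"}
)
print(settings["host"], settings["port"])
```

Passing a plain dictionary keeps the check testable without touching the real environment; calling the helper with no arguments reads `os.environ` instead.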

## Connect to Hopsworks Feature Store

Establish connection to the Hopsworks Feature Store using credentials from environment variables.

In [None]:
print("Connecting to Hopsworks Feature Store")

# Connect to Hopsworks
project = hopsworks.login(
    host=os.getenv("HOST"),
    port=int(os.getenv("PORT", "443")),  # env values are strings; fall back to port 443 if PORT is unset
    api_key_value=os.getenv("HOPSWORKS_API_KEY"),
    project=project_name or os.getenv("PROJECT")
)

fs = project.get_feature_store()
print(f"Connected to feature store in project: {project_name}")

## Load Source Data

Load the demand data from a CSV file and prepare it for the feature store.

In [None]:
print("Loading source data")
demand_df = pd.read_csv('../data/demand_qty_item_loc.csv')

# Display first few rows to inspect the data
display(demand_df.head())

In [None]:
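# Rename columns to match the data model (positional: assumes the CSV has exactly these four columns, in this order)
demand_df.columns = ['sp_id', 'loc_id', 'time_bucket', 'repetitive_demand_quantity']

# Add an ingestion timestamp; the feature group uses this column as its event time
demand_df['datetime'] = datetime.now()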

# Display the transformed dataframe
display(demand_df.head())
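
Assigning to `demand_df.columns` renames by position, so a CSV with extra, missing, or reordered columns would either fail or be silently mislabeled. One way to guard against the column-count case is sketched below. The helper `rename_demand_columns` and the raw header names in the toy frame are hypothetical; only the four target names come from this notebook.

```python
import pandas as pd

EXPECTED_COLUMNS = ["sp_id", "loc_id", "time_bucket", "repetitive_demand_quantity"]


def rename_demand_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with the data-model column names applied positionally."""
    if df.shape[1] != len(EXPECTED_COLUMNS):
        raise ValueError(
            f"Expected {len(EXPECTED_COLUMNS)} columns, "
            f"found {df.shape[1]}: {list(df.columns)}"
        )
    renamed = df.copy()
    renamed.columns = EXPECTED_COLUMNS
    return renamed


# Toy frame standing in for the CSV contents (raw header names are made up).
raw = pd.DataFrame(
    {"item": ["A"], "location": ["X"], "week": ["2024-W01"], "qty": [5]}
)
print(rename_demand_columns(raw).columns.tolist())
```

The check only catches a wrong column count; a file with the right number of columns in a different order would still need a name-based mapping.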

## Analyze Data Dimensions

Explore the dimensionality of our demand data.

In [None]:
# Analyze data dimensions
unique_items = demand_df['sp_id'].nunique()
unique_locations = demand_df['loc_id'].nunique()
unique_time_periods = demand_df['time_bucket'].nunique()

print(f"🚀 Found {unique_items} items × {unique_locations} locations × {unique_time_periods} time periods")

# Optional: Create a more detailed summary
summary = {
    "Items": unique_items,
    "Locations": unique_locations,
    "Time Periods": unique_time_periods,
    "Total Records": len(demand_df)
}
pd.DataFrame([summary]).T.rename(columns={0: "Count"})
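
The three unique counts describe the size of the full item × location × time-period grid, but not how much of that grid actually has records. A rough density figure can be derived as sketched here; `grid_density` is a hypothetical helper and the toy frame is made-up data that only reuses this notebook's column names.

```python
import pandas as pd


def grid_density(df: pd.DataFrame) -> float:
    """Fraction of the item x location x time-period grid that has a record."""
    possible = (
        df["sp_id"].nunique()
        * df["loc_id"].nunique()
        * df["time_bucket"].nunique()
    )
    # Count each (item, location, period) combination once.
    observed = len(df.drop_duplicates(["sp_id", "loc_id", "time_bucket"]))
    return observed / possible if possible else 0.0


# Toy data: 2 items x 1 location x 2 periods = 4 possible cells, 3 observed.
toy = pd.DataFrame(
    {
        "sp_id": ["A", "A", "B"],
        "loc_id": ["X", "X", "X"],
        "time_bucket": [1, 2, 1],
        "repetitive_demand_quantity": [5, 0, 3],
    }
)
print(f"Grid density: {grid_density(toy):.2f}")  # prints "Grid density: 0.75"
```

A density well below 1 suggests that many item/location pairs have no record for some periods, which may matter when each pair is modeled separately.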

## Create Feature Group and Upload Data

Define the feature group schema and upload the prepared data to the feature store.

In [None]:
print("⬆ Creating/getting feature group")
# Define the feature group
demand_fg = fs.get_or_create_feature_group(
    name=feature_group_name,
    # version=version,  # Uncomment to pin a version; otherwise it will be auto-incremented
    description="Item demand by location and time",
    primary_key=['sp_id', 'loc_id', 'time_bucket'],
    event_time='datetime',
)
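
Since `sp_id`, `loc_id` and `time_bucket` are declared as the primary key, it may be worth confirming that the local frame has no missing or repeated keys before inserting. The check below is a sketch using plain pandas; `primary_key_problems` is a hypothetical helper and the toy frame is made-up data.

```python
import pandas as pd

PRIMARY_KEY = ["sp_id", "loc_id", "time_bucket"]


def primary_key_problems(df: pd.DataFrame, keys=PRIMARY_KEY) -> dict:
    """Count rows with a missing key value and rows that repeat an earlier key."""
    return {
        "rows_with_null_key": int(df[keys].isna().any(axis=1).sum()),
        "duplicate_key_rows": int(df.duplicated(subset=keys).sum()),
    }


# Toy data: the second row repeats the first key, the last row lacks sp_id.
toy = pd.DataFrame(
    {
        "sp_id": ["A", "A", "B", None],
        "loc_id": ["X", "X", "X", "Y"],
        "time_bucket": [1, 1, 1, 2],
    }
)
print(primary_key_problems(toy))
```

Both counts being zero is the expected outcome for a clean extract; non-zero counts are a cue to inspect the source data before uploading.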

In [None]:
print("⬆ Uploading data to the Feature Store")
# Upload data to the feature store
demand_fg.insert(demand_df, write_options={"wait_for_job": True})
print("Feature pipeline completed successfully")

## Verify Feature Group 

Retrieve and inspect the feature group schema and data to verify successful upload.

In [None]:
# Retrieve feature descriptions
print("Feature Group Schema:")
display(demand_fg.describe())

In [None]:
# Read the feature group back (this loads all rows) and show the first 10
print("Sample Data from Feature Group:")
sample_data = demand_fg.read()
display(sample_data.head(10))
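
Looking at the first rows confirms that something was written, but not that every key made it. A stricter check compares the key combinations of the uploaded frame with those read back, as in this sketch. `keys_missing_from_store` is a hypothetical helper, and the two toy frames stand in for `demand_df` and `sample_data`; with real data, the key columns may come back with different dtypes and would need aligning first.

```python
import pandas as pd


def keys_missing_from_store(local_df, stored_df, keys):
    """Return key combinations present locally but absent from the read-back frame."""
    local_keys = local_df[keys].drop_duplicates()
    stored_keys = stored_df[keys].drop_duplicates()
    # A left merge with indicator=True marks local keys that have no stored match.
    merged = local_keys.merge(stored_keys, on=keys, how="left", indicator=True)
    return merged.loc[merged["_merge"] == "left_only", keys].reset_index(drop=True)


keys = ["sp_id", "loc_id", "time_bucket"]

# Toy stand-ins: the read-back frame lacks the ("B", "X", 1) key.
local = pd.DataFrame({"sp_id": ["A", "B"], "loc_id": ["X", "X"], "time_bucket": [1, 1]})
stored = pd.DataFrame({"sp_id": ["A"], "loc_id": ["X"], "time_bucket": [1]})

print(keys_missing_from_store(local, stored, keys))
```

An empty result means every local key was found in the data read back from the feature group.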