# <span style="font-weight:bold; font-size: 3rem; color:#1EB182;"><img src="images/icon102.png" width="38px"></img> **Hopsworks Feature Store** </span><span style="font-weight:bold; font-size: 3rem; color:#333;">- Part 01: Feature Backfill</span>

## 🗒️ This notebook is divided into the following sections:
1. Connect to the Hopsworks AI Lakehouse.
2. Fetch users and sites.
3. Create feature groups and insert the data into the feature store.

![tutorial-flow](images/01_featuregroups.png)

In [2]:
# connect to Hopsworks

import hopsworks

project = hopsworks.login()
fs = project.get_feature_store()

2024-09-18 22:17:24,728 INFO: Python Engine initialized.

Logged in to project, explore it here https://hopsworks0.logicalclocks.com/p/119


In [3]:
# other imports

import pandas as pd

#### Fetch users

In [20]:
users_df = pd.read_csv("users.csv")

In [22]:
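# Normalize the index, store every feature as a string,
# then restore the integer type of the primary key column
users_df.reset_index(drop=True, inplace=True)
users_df = users_df.astype(str)
users_df["uid"] = users_df["uid"].astype(int)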
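The effect of the casts above can be seen on a small in-memory CSV. This is only a sketch: the column names other than `uid` (`home`, `office`, `time_optimist`) and the sample rows are hypothetical, since the layout of `users.csv` is not shown in this notebook.

```python
import io

import pandas as pd

# Hypothetical stand-in for users.csv; only `uid` is known from the notebook.
csv_text = (
    "uid,home,office,time_optimist\n"
    "1,Odenplan,Kista,True\n"
    "2,Slussen,Solna,False\n"
)

demo_users = pd.read_csv(io.StringIO(csv_text))

# Same steps as in the cell above: everything becomes a string,
# then the primary key is cast back to an integer.
demo_users.reset_index(drop=True, inplace=True)
demo_users = demo_users.astype(str)
demo_users["uid"] = demo_users["uid"].astype(int)

print(demo_users.dtypes)
```

After the casts, a boolean column such as the hypothetical `time_optimist` holds the strings `"True"` and `"False"`, so any downstream code would need to compare against strings rather than booleans.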

#### Fetch sites

In [None]:
sites_df = pd.read_json("https://transport.integration.sl.se/v1/sites")

In [None]:
# Feature engineering:
# - Extract the start of the validity period from the nested `valid` field
# - Drop unnecessary columns
# - Drop rows without coordinates
# - Fill missing notes with a placeholder

sites_df.reset_index(drop=True, inplace=True)
sites_df["valid_from"] = sites_df["valid"].apply(lambda x: x["from"])
sites_df.drop(columns=["gid", "abbreviation", "alias", "valid"], inplace=True)
sites_df.dropna(subset=["lat", "lon"], inplace=True)
sites_df.fillna({"note": "n/d"}, inplace=True)
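To check the cleaning steps without calling the API, the same transformations can be run on a couple of hand-written records. This is a minimal sketch: the helper name `clean_sites` and the sample values are hypothetical, and the records contain only the fields this notebook touches, so the real response may look different.

```python
import pandas as pd


def clean_sites(raw: pd.DataFrame) -> pd.DataFrame:
    """Apply the notebook's cleaning steps to a copy of `raw` (hypothetical helper)."""
    df = raw.copy()
    # Pull the start of the validity period out of the nested dict.
    df["valid_from"] = df["valid"].apply(lambda v: v["from"])
    df = df.drop(columns=["gid", "abbreviation", "alias", "valid"])
    # Sites without coordinates are dropped.
    df = df.dropna(subset=["lat", "lon"])
    # Missing notes get a placeholder value.
    df = df.fillna({"note": "n/d"})
    return df.reset_index(drop=True)


# Made-up sample records, not real API output.
sample = pd.DataFrame(
    [
        {"id": 1, "gid": 1001, "name": "Site A", "abbreviation": "A",
         "alias": ["Alpha"], "note": None, "lat": 59.33, "lon": 18.06,
         "valid": {"from": "2012-06-23T00:00:00"}},
        {"id": 2, "gid": 1002, "name": "Site B", "abbreviation": None,
         "alias": None, "note": "no coordinates", "lat": None, "lon": None,
         "valid": {"from": "2015-01-01T00:00:00"}},
    ]
)

cleaned = clean_sites(sample)
print(cleaned)
```

Returning a new DataFrame instead of using `inplace=True` keeps the raw data around for inspection, which is a design choice of this sketch rather than of the notebook.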

#### Create Feature Groups and write features into Hopsworks

In [23]:
users_fg = fs.get_or_create_feature_group(
    name="users",
    description="Users information including home location, office location and whether they are time optimists or not.",
    version=1,
    primary_key=["uid"],
    online_enabled=True
)

users_fg.insert(users_df)

Uploading Dataframe: 100.00% |██████████| Rows 5/5 | Elapsed Time: 00:00 | Remaining Time: 00:00


Launching job: users_1_offline_fg_materialization
Job started successfully, you can follow the progress at 
https://hopsworks0.logicalclocks.com/p/119/jobs/named/users_1_offline_fg_materialization/executions


(Job('users_1_offline_fg_materialization', 'SPARK'), None)

In [6]:
sites_fg = fs.get_or_create_feature_group(
    name="sites",
    description="Sites information including ID, latitude and longitude.",
    version=1,
    primary_key=["id"],
    online_enabled=False
)

sites_fg.insert(sites_df)

Feature Group created successfully, explore it at 
https://hopsworks0.logicalclocks.com/p/119/fs/67/fg/14


Uploading Dataframe: 100.00% |██████████| Rows 6470/6470 | Elapsed Time: 00:00 | Remaining Time: 00:00


Launching job: sites_1_offline_fg_materialization
Job started successfully, you can follow the progress at 
https://hopsworks0.logicalclocks.com/p/119/jobs/named/sites_1_offline_fg_materialization/executions


(Job('sites_1_offline_fg_materialization', 'SPARK'), None)

In [None]:
# def convert_site_id(site_id):
#     return int(str(site_id)[-5:])

# sites_df[sites_df["id"] == convert_site_id(300109192)]
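The commented-out helper above appears to map a longer site identifier to the short `id` stored in `sites_df` by keeping its last five digits. The sketch below runs that logic on its own; treating the result as a key into `sites_df` is an assumption, since the notebook does not say where the long identifiers come from.

```python
def convert_site_id(site_id: int) -> int:
    """Keep the last five digits of an identifier and drop leading zeros."""
    return int(str(site_id)[-5:])


# The value used in the commented-out lookup above.
short_id = convert_site_id(300109192)
print(short_id)  # 9192, because the last five digits are "09192"
```

With `sites_df` loaded, the lookup from the commented cell would then read `sites_df[sites_df["id"] == short_id]`.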