# <span style="font-weight:bold; font-size: 3rem; color:#1EB182;"><img src="images/icon102.png" width="38px"></img> **Hopsworks Feature Store** </span><span style="font-weight:bold; font-size: 3rem; color:#333;">- Part 01: Backfill Features to the Feature Store</span>


## 🗒️ This notebook is divided into 3 sections:
1. Load the data.
2. Connect to the Hopsworks Feature Store.
3. Create feature groups and insert them into the feature store.

![tutorial-flow](images/01_featuregroups.png)

## <span style='color:#ff5f27'> 📝 Imports</span>

In [1]:
import pandas as pd

## <span style='color:#ff5f27'> 💽 Loading Historical Data</span>


#### <span style='color:#ff5f27'> 🚖 Rides Data</span>

In [3]:
df_rides = pd.read_csv('data/rides500.csv', index_col=0)

df_rides.head()

| Unnamed: 0 | ride_id | pickup_datetime | pickup_longitude | dropoff_longitude | pickup_latitude | dropoff_latitude | passenger_count | taxi_id | driver_id | distance | pickup_distance_to_jfk | dropoff_distance_to_jfk | pickup_distance_to_ewr | dropoff_distance_to_ewr | pickup_distance_to_lgr | dropoff_distance_to_lgr | year | weekday | hour |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 9e68bedf5543e631ae5913e4f1582aae | 1592984700000 | -73.36097 | -73.90945 | 41.32026 | 41.67243 | 3 | 161 | 153 | 37.386229 | 51.711501 | 71.571093 | 60.81615 | 69.298319 | 46.087372 | 61.902562 | 2020 | 2 | 7 |
| 1 | 2b4ccdbf3227b7ee69570987a3a13292 | 1578172300000 | -72.89369 | -73.73435 | 40.56219 | 40.56483 | 4 | 34 | 148 | 44.125785 | 46.716115 | 5.760495 | 67.738962 | 24.636067 | 53.472777 | 16.378563 | 2020 | 5 | 21 |
| 2 | 3ea79ec54a53315835f1409fce08dae5 | 1598876300000 | -73.3931 | -73.92265 | 41.70746 | 40.63226 | 3 | 163 | 133 | 79.229918 | 76.336996 | 7.604696 | 81.222024 | 13.779885 | 68.97884 | 10.313417 | 2020 | 0 | 12 |
| 3 | cd44a15a227ecbdafadf2fe96d2abce8 | 1586180900000 | -74.00832 | -74.00135 | 41.45902 | 41.74677 | 2 | 57 | 28 | 19.884865 | 57.758348 | 77.257408 | 53.868622 | 73.602454 | 47.645692 | 67.337151 | 2020 | 0 | 13 |
| 4 | abf4265bc0d16ea15c1ecc2f4200d806 | 1591487300000 | -73.35522 | -72.98109 | 41.48516 | 41.11893 | 2 | 137 | 6 | 31.896808 | 62.327991 | 53.127891 | 69.587521 | 69.022923 | 55.8892 | 52.24746 | 2020 | 5 | 23 |
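The `pickup_datetime` values in the preview are large integers such as `1592984700000`. Their magnitude suggests Unix epoch timestamps in milliseconds, although the notebook does not state the unit, so treat that as an assumption. A minimal sketch for inspecting them as readable timestamps, using two rows copied by hand from the preview instead of the CSV file:

```python
import pandas as pd

# Two rows copied from the preview; the notebook itself reads data/rides500.csv.
sample = pd.DataFrame({
    "ride_id": [
        "9e68bedf5543e631ae5913e4f1582aae",
        "2b4ccdbf3227b7ee69570987a3a13292",
    ],
    "pickup_datetime": [1592984700000, 1578172300000],
})

# Assumption: the integers are milliseconds since the Unix epoch (UTC).
sample["pickup_ts"] = pd.to_datetime(sample["pickup_datetime"], unit="ms", utc=True)

# Derive the same calendar fields the dataset already carries, for comparison.
sample["derived_year"] = sample["pickup_ts"].dt.year
sample["derived_weekday"] = sample["pickup_ts"].dt.weekday  # Monday == 0
sample["derived_hour"] = sample["pickup_ts"].dt.hour

print(sample[["pickup_ts", "derived_year", "derived_weekday", "derived_hour"]])
```

For these two rows the derived fields agree with the `year`, `weekday`, and `hour` columns in the preview, which supports the millisecond reading. The raw integer column is left unchanged here, since that is what the notebook passes on.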


#### <span style='color:#ff5f27'> 💸 Fares Data</span>

In [4]:
df_fares = pd.read_csv('data/fares500.csv', index_col=0)

df_fares.head()

| Unnamed: 0 | total_fare | tip | tolls | taxi_id | driver_id | ride_id |
|---|---|---|---|---|---|---|
| 0 | 207.0 | 3.0 | 2.0 | 133 | 95 | 9e68bedf5543e631ae5913e4f1582aae |
| 1 | 211.0 | 13.0 | 4.0 | 67 | 104 | 2b4ccdbf3227b7ee69570987a3a13292 |
| 2 | 3.0 | 15.0 | 0.0 | 88 | 116 | 3ea79ec54a53315835f1409fce08dae5 |
| 3 | 31.0 | 33.0 | 4.0 | 47 | 92 | cd44a15a227ecbdafadf2fe96d2abce8 |
| 4 | 31.0 | 6.0 | 4.0 | 76 | 189 | abf4265bc0d16ea15c1ecc2f4200d806 |
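Both previews share a `ride_id` column, which is what links a fare to its ride. Before treating it as a key, it can help to confirm that it is non-null and unique on each side. The sketch below uses a few values copied from the previews (the frame names `rides` and `fares` are illustrative, not the notebook's variables) and relies on pandas' `validate` option to fail loudly on duplicate keys:

```python
import pandas as pd

# Small hand-copied samples standing in for df_rides and df_fares.
rides = pd.DataFrame({
    "ride_id": [
        "9e68bedf5543e631ae5913e4f1582aae",
        "2b4ccdbf3227b7ee69570987a3a13292",
    ],
    "distance": [37.386229, 44.125785],
})
fares = pd.DataFrame({
    "ride_id": [
        "9e68bedf5543e631ae5913e4f1582aae",
        "2b4ccdbf3227b7ee69570987a3a13292",
    ],
    "total_fare": [207.0, 211.0],
    "tip": [3.0, 13.0],
})

# A key column should contain no nulls and no repeated values.
for name, frame in [("rides", rides), ("fares", fares)]:
    assert frame["ride_id"].notna().all(), f"{name}: null ride_id"
    assert frame["ride_id"].is_unique, f"{name}: duplicate ride_id"

# validate="one_to_one" raises pandas.errors.MergeError if either side repeats a key.
joined = rides.merge(fares, on="ride_id", how="inner", validate="one_to_one")
print(joined)
```

Running the same checks on the full `df_rides` and `df_fares` frames would show whether every fare has exactly one matching ride.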


## <span style="color:#ff5f27;"> 🔮 Connecting to Hopsworks Feature Store </span>

In [5]:
import hopsworks

project = hopsworks.login()

fs = project.get_feature_store() 

Connected. Call `.close()` to terminate connection gracefully.

Logged in to project, explore it here https://c.app.hopsworks.ai:443/p/164




Connected. Call `.close()` to terminate connection gracefully.


In [6]:
project

Project('romankah', 'roman.kaharlytskyi@logicalclocks.com', 'Default project')
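Calling `hopsworks.login()` with no arguments works interactively, but scripts and scheduled jobs usually need to supply credentials explicitly. The sketch below collects optional `api_key_value` and `project` arguments from environment variables. The helper `build_login_kwargs` and the variable names `HOPSWORKS_API_KEY` and `HOPSWORKS_PROJECT` are assumptions made for this example; check the Hopsworks documentation for the arguments supported by your client version.

```python
import os


def build_login_kwargs(env=os.environ):
    """Collect optional keyword arguments for hopsworks.login() from a mapping.

    The environment variable names used here are assumptions for this sketch.
    """
    kwargs = {}
    if env.get("HOPSWORKS_API_KEY"):
        kwargs["api_key_value"] = env["HOPSWORKS_API_KEY"]
    if env.get("HOPSWORKS_PROJECT"):
        kwargs["project"] = env["HOPSWORKS_PROJECT"]
    return kwargs


# Demonstration with a plain dict so the sketch runs without real credentials.
demo_kwargs = build_login_kwargs(
    {"HOPSWORKS_API_KEY": "dummy-key", "HOPSWORKS_PROJECT": "my_project"}
)
print(sorted(demo_kwargs))

# In the notebook this might be used as:
#   import hopsworks
#   project = hopsworks.login(**build_login_kwargs())
#   fs = project.get_feature_store()
```

With neither variable set, the helper returns an empty dict, so the call falls back to the plain `hopsworks.login()` shown above.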

## <span style="color:#ff5f27;">🪄 Creating Feature Groups</span>

#### <span style='color:#ff5f27'> 🚖 Rides Data</span>

In [7]:
# expectation_suite is not passed here because no suite is defined in this notebook.
rides_fg = fs.get_or_create_feature_group(name="rides_fg",
                                          version=1,
                                          primary_key=["ride_id"],
                                          event_time="pickup_datetime",  # a single column name, not a list
                                          description="Rides features",
                                          time_travel_format="HUDI",
                                          online_enabled=True,
                                          statistics_config=True)
# Upload the historical rides data to the feature group.
rides_fg.insert(df_rides)

Feature Group created successfully, explore it at 
https://c.app.hopsworks.ai:443/p/167/fs/109/fg/871


Uploading Dataframe: 0.00% |          | Rows 0/16 | Elapsed Time: 00:00 | Remaining Time: ?

Launching offline feature group backfill job...
Backfill Job started successfully, you can follow the progress at 
https://c.app.hopsworks.ai/p/167/jobs/named/air_quality_fg_1_offline_fg_backfill/executions


(<hsfs.core.job.Job at 0x7f7b806cd1c0>, None)
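Because the feature group declares `ride_id` as its primary key and `pickup_datetime` as its event time, a quick local check of the DataFrame before calling `insert` can catch problems earlier than a failed backfill job. The helper below, `check_before_insert`, is a hypothetical pure-pandas sketch and is not part of the Hopsworks API:

```python
import pandas as pd


def check_before_insert(df, primary_key, event_time=None):
    """Return a list of problems found; an empty list means the checks passed."""
    required = list(primary_key) + ([event_time] if event_time else [])
    missing = [col for col in required if col not in df.columns]
    if missing:
        # Without the key columns the remaining checks cannot run.
        return [f"missing column: {col}" for col in missing]

    problems = []
    if df[list(primary_key)].isna().any().any():
        problems.append("null values in primary key")
    if df.duplicated(subset=list(primary_key)).any():
        problems.append("duplicate primary key values")
    if event_time and df[event_time].isna().any():
        problems.append("null values in event time")
    return problems


good = pd.DataFrame({
    "ride_id": ["a1", "b2"],
    "pickup_datetime": [1592984700000, 1578172300000],
})
bad = pd.DataFrame({
    "ride_id": ["a1", "a1"],
    "pickup_datetime": [1592984700000, 1578172300000],
})

print(check_before_insert(good, ["ride_id"], "pickup_datetime"))
print(check_before_insert(bad, ["ride_id"], "pickup_datetime"))
```

In the notebook, one might call `check_before_insert(df_rides, ["ride_id"], "pickup_datetime")` and insert only when the returned list is empty.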

#### <span style='color:#ff5f27'> 💸 Fares Data</span>

In [8]:
# expectation_suite is not passed here because no suite is defined in this notebook.
fares_fg = fs.get_or_create_feature_group(name="fares_fg",
                                          version=1,
                                          primary_key=["ride_id"],
                                          description="Taxi fares features",
                                          time_travel_format="HUDI",
                                          online_enabled=True,
                                          statistics_config=True)
# Upload the historical fares data to the feature group.
fares_fg.insert(df_fares)

Feature Group created successfully, explore it at 
https://c.app.hopsworks.ai:443/p/167/fs/109/fg/872


Uploading Dataframe: 0.00% |          | Rows 0/16 | Elapsed Time: 00:00 | Remaining Time: ?

Launching offline feature group backfill job...
Backfill Job started successfully, you can follow the progress at 
https://c.app.hopsworks.ai/p/167/jobs/named/weather_fg_1_offline_fg_backfill/executions


(<hsfs.core.job.Job at 0x7f7b806ecc70>, None)

---

## <span style="color:#ff5f27;">⏭️ **Next:** Part 02 </span>

In the next notebook, we will generate new data for our feature groups.

---