# <span style="font-weight:bold; font-size: 3rem; color:#1EB182;"><img src="../../images/icon102.png" width="38px"></img> **Hopsworks Feature Store** </span><span style="font-weight:bold; font-size: 3rem; color:#333;">- Part 01: Backfill Features to the Feature Store</span>

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/logicalclocks/hopsworks-tutorials/blob/master/advanced_tutorials/{project_name}/{notebook_name}.ipynb)


## 🗒️ This notebook is divided into the following sections:
1. Load historical data
2. Connect to the Hopsworks Feature Store
3. Create feature groups and insert the data into the feature store

![tutorial-flow](../../images/01_featuregroups.png)

In [None]:
!pip install hopsworks

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting hopsworks
  Downloading hopsworks-3.0.5.tar.gz (35 kB)
  Preparing metadata (setup.py) ... done
Collecting hsfs[python]<3.1.0,>=3.0.0
  Downloading hsfs-3.0.5.tar.gz (120 kB)
  Preparing metadata (setup.py) ... done
Collecting hsml<3.1.0,>=3.0.0
  Downloading hsml-3.0.3.tar.gz (50 kB)
  Preparing metadata (setup.py) ... done
Collecting pyhumps==1.6.1
  Downloading pyhumps-1.6.1-py3-none-any.whl (5.0 kB)
Collecting furl
  Downloading furl-2.1.3-py2.py3-none-any.whl (20 kB)
Collecting boto3
  Downloading boto3-1.26.47-py3-none-any.whl (132 kB)

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
%cd /content/drive/MyDrive/jim

/content/drive/MyDrive/jim


## <span style='color:#ff5f27'> 📝 Imports</span>

In [None]:
import pandas as pd

# Local helper module; `timestamp_2_time` is the only helper this notebook uses.
from functions import timestamp_2_time

---

## <span style='color:#ff5f27'> 💽 Loading Historical Data</span>


#### <span style='color:#ff5f27'> 👩🏻‍🔬 Air Quality Data</span>

In [None]:
df_air_quality = pd.read_csv('/content/drive/MyDrive/jim/aqi.csv')
df_air_quality.head()

Unnamed: 0,date,aqi
0,2014/1/1,136
1,2014/1/2,218
2,2014/1/3,127
3,2014/1/4,213
4,2014/1/5,168


In [None]:
# Convert date strings to Unix epoch milliseconds, then sort chronologically.
df_air_quality.date = df_air_quality.date.apply(timestamp_2_time)
df_air_quality.sort_values(by=['date'], inplace=True, ignore_index=True)

df_air_quality.head()

Unnamed: 0,date,aqi
0,1388534400000,136
1,1388620800000,218
2,1388707200000,127
3,1388793600000,213
4,1388880000000,168
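
The `timestamp_2_time` helper lives in the local `functions` module, which is not shown in this notebook. Judging only from the output above, it appears to map a `YYYY/M/D` string to Unix epoch milliseconds at midnight UTC. The sketch below is a hypothetical stand-in under that assumption (the name `date_to_epoch_ms` is invented here), not the tutorial's actual implementation:

```python
from datetime import datetime, timezone


def date_to_epoch_ms(date_str: str) -> int:
    """Hypothetical stand-in for `timestamp_2_time`: 'YYYY/M/D' -> epoch ms."""
    # strptime accepts non-zero-padded month and day values such as "2014/1/1".
    parsed = datetime.strptime(date_str, "%Y/%m/%d")
    # Assumption: each date is interpreted as midnight UTC.
    midnight_utc = parsed.replace(tzinfo=timezone.utc)
    return int(midnight_utc.timestamp() * 1000)


print(date_to_epoch_ms("2014/1/1"))  # 1388534400000, matching the first row above
```

Storing the date as an integer timestamp gives the feature group a numeric `event_time` column; if the real helper uses a different time zone, the values would shift accordingly.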


#### <span style='color:#ff5f27'> 🌦 Weather Data</span>

In [None]:
df_weather = pd.read_csv('/content/drive/MyDrive/jim/weather.csv')
df_weather.head(5)

Unnamed: 0,date,tempmax,tempmin,temp,feelslikemax,feelslikemin,feelslike,dew,humidity,precip,...,snowdepth,windspeed,winddir,sealevelpressure,cloudcover,visibility,solarradiation,solarenergy,uvindex,conditions
0,2014/1/1,11.4,-1.0,6.4,11.4,-2.5,4.5,-16.7,18.9,0.0,...,0.0,32.4,286.8,1015.2,0.0,13.5,110.8,9.6,5,Clear
1,2014/1/2,7.0,-5.0,0.5,7.0,-6.9,-0.5,-9.0,50.9,0.0,...,0.0,14.4,36.0,1018.5,6.5,6.6,94.1,8.1,4,Clear
2,2014/1/3,9.0,-2.0,3.1,8.0,-5.9,1.2,-13.2,32.4,0.0,...,0.0,16.1,7.7,1022.5,1.9,16.5,111.9,9.6,5,Clear
3,2014/1/4,2.0,-6.0,-1.9,2.0,-6.1,-3.1,-6.9,69.1,0.0,...,0.0,10.8,68.3,1022.8,12.1,4.8,91.0,7.9,4,Clear
4,2014/1/5,6.0,-7.0,-0.7,3.7,-12.9,-4.2,-10.9,51.8,0.0,...,0.0,18.0,16.0,1024.8,1.4,8.8,112.5,9.9,5,Clear


In [None]:
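# Convert date strings to Unix epoch milliseconds, then sort chronologically.
df_weather.date = df_weather.date.apply(timestamp_2_time)
df_weather.sort_values(by=['date'], inplace=True, ignore_index=True)
# Cast to float so both columns have a floating-point dtype.
df_weather['precipprob'] = df_weather['precipprob'].astype(float)
df_weather['uvindex'] = df_weather['uvindex'].astype(float)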



In [None]:
df_weather.drop(columns=['sealevelpressure'], inplace=True)
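
Both feature groups in this notebook use `date` as their primary key, so it can be worth confirming that the column has no missing or repeated values before inserting. The following is a minimal sketch of such a check using plain pandas; the helper name `check_primary_key` and the toy data are illustrative assumptions, not part of the tutorial:

```python
import pandas as pd


def check_primary_key(df: pd.DataFrame, key: str) -> dict:
    """Count null and repeated values in the column intended as primary key."""
    return {
        "null_keys": int(df[key].isna().sum()),
        # duplicated() flags every occurrence after the first one.
        "duplicate_keys": int(df[key].duplicated().sum()),
    }


# Toy frame standing in for df_air_quality; the repeated date is deliberate.
toy = pd.DataFrame(
    {
        "date": [1388534400000, 1388620800000, 1388620800000],
        "aqi": [136, 218, 127],
    }
)
report = check_primary_key(toy, "date")
print(report)  # {'null_keys': 0, 'duplicate_keys': 1}
```

In the notebook itself, the same helper could be called as `check_primary_key(df_air_quality, 'date')` and `check_primary_key(df_weather, 'date')`; both counts should be zero before proceeding.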

---

## <span style="color:#ff5f27;"> 🔮 Connecting to Hopsworks Feature Store </span>

In [None]:
import hopsworks

# Log in to Hopsworks and get a handle to the project's feature store.
project = hopsworks.login()

fs = project.get_feature_store()

Connection closed.
Connected. Call `.close()` to terminate connection gracefully.

Logged in to project, explore it here https://c.app.hopsworks.ai:443/p/5333
Connected. Call `.close()` to terminate connection gracefully.


---

## <span style="color:#ff5f27;">🪄 Creating Feature Groups</span>

#### <span style='color:#ff5f27'> 👩🏻‍🔬 Air Quality Data</span>

In [None]:
aqi_fg = fs.get_or_create_feature_group(
    name='aqi_fg',
    description='Air Quality characteristics of each day',
    version=1,
    primary_key=['date'],
    online_enabled=True,
    event_time='date',
)

aqi_fg.insert(df_air_quality)

Feature Group created successfully, explore it at 
https://c.app.hopsworks.ai:443/p/5333/fs/5253/fg/14723


Uploading Dataframe: 0.00% |          | Rows 0/3293 | Elapsed Time: 00:00 | Remaining Time: ?

Launching offline feature group backfill job...
Backfill Job started successfully, you can follow the progress at 
https://c.app.hopsworks.ai/p/5333/jobs/named/aqi_fg_1_offline_fg_backfill/executions


(<hsfs.core.job.Job at 0x7f68f1caf670>, None)

#### <span style='color:#ff5f27'> 🌦 Weather Data</span>

In [None]:
weather_data_fg = fs.get_or_create_feature_group(
    name='weather_data_fg',
    description='Weather characteristics of each day',
    version=1,
    primary_key=['date'],
    online_enabled=True,
    event_time='date',
)

weather_data_fg.insert(df_weather)

Feature Group created successfully, explore it at 
https://c.app.hopsworks.ai:443/p/5333/fs/5253/fg/14726


Uploading Dataframe: 0.00% |          | Rows 0/3293 | Elapsed Time: 00:00 | Remaining Time: ?

Launching offline feature group backfill job...
Backfill Job started successfully, you can follow the progress at 
https://c.app.hopsworks.ai/p/5333/jobs/named/weather_data_fg_1_offline_fg_backfill/executions


(<hsfs.core.job.Job at 0x7f6908335c40>, None)
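
To confirm that a backfill landed, one option is to fetch the feature group again and read it into a DataFrame. The sketch below wraps that in a small helper (the name `read_feature_group` is invented here) that takes the `fs` handle created earlier. It assumes the backfill job has already finished; a read issued before then may return incomplete data.

```python
def read_feature_group(fs, name: str, version: int = 1):
    """Fetch a feature group by name and version, then read its contents."""
    fg = fs.get_feature_group(name, version=version)
    return fg.read()


# Usage in this notebook, once the backfill jobs have finished:
# df_check = read_feature_group(fs, "aqi_fg")
# print(df_check.shape)
```

Comparing the number of rows read back with the 3293 rows reported in the upload output is a quick way to spot an incomplete backfill.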

---