# <span style="font-weight:bold; font-size: 3rem; color:#1EB182;"><img src="../images/icon102.png" width="38px"></img> **Hopsworks Feature Store** </span><span style="font-weight:bold; font-size: 3rem; color:#333;">- Part 01: Backfill Features to the Feature Store</span>

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/logicalclocks/hopsworks-tutorials/blob/master/bitcoin/1_backfill_feature_groups.ipynb)

## 🗒️ This notebook is divided into 3 sections:
1. Load the data.
2. Connect to the Hopsworks feature store.
3. Create feature groups and insert them into the feature store.

![tutorial-flow](../images/01_featuregroups.png)

## API keys are stored in a `.env` file in the following format:
`BINANCE_API_KEY = "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"`

`BINANCE_API_SECRET = "bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb"`


`TWITTER_API_KEY = "cccccccccccccccccccccccccccc"`

`TWITTER_API_SECRET = "ddddddddddddddddddddddddddddddddddd"`

### <span style="color:#ff5f27;"> 📝 Imports</span>

In [1]:
!pip install -U unicorn-binance-rest-api --quiet
!pip install -U python-dotenv --quiet



In [2]:
import pandas as pd

In [3]:
from functions import *

from dotenv import load_dotenv
load_dotenv()

[nltk_data] Downloading package stopwords to /Users/Max/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
[nltk_data] Downloading package wordnet to /Users/Max/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!
[nltk_data] Downloading package omw-1.4 to /Users/Max/nltk_data...
[nltk_data]   Package omw-1.4 is already up-to-date!


True
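
After `load_dotenv()` returns `True`, the keys from the `.env` file should be available as environment variables. A minimal sketch of reading them with a clear failure when one is missing is shown here. The helper name `require_env` is hypothetical and is not part of the tutorial's `functions` module:

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`, or raise if it is unset or empty."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set; check your .env file")
    return value


# Example usage (assumes load_dotenv() has already populated the environment):
# binance_key = require_env("BINANCE_API_KEY")
# binance_secret = require_env("BINANCE_API_SECRET")
```

Failing early like this tends to be easier to debug than a later authentication error from the Binance or Twitter client.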

___

## <span style="color:#ff5f27;"> 💽 Loading Data</span>

### <span style='color:#ff5f27'> 📈 Bitcoin Data</span>

In [4]:
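# Fetch BTC price data for the last 2000 days (helper assumed to come from the local `functions` module)
df_bitcoin = parse_btc_data(number_of_days_ago=2000)

# Keep only rows between 2021-02-05 10:00 and 2022-06-04 23:00, then renumber rows from 0
df_bitcoin = df_bitcoin[(df_bitcoin.date >= '2021-02-05 10:00:00') & (df_bitcoin.date <= '2022-06-04 23:00:00')]
df_bitcoin.reset_index(drop=True, inplace=True)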

df_bitcoin.head(3)

| | date | open | high | low | close | volume | quote_av | trades | tb_base_av | tb_quote_av | unix |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2021-02-06 00:00:00 | 38289.32 | 40955.51 | 38215.94 | 39186.94 | 98757.311183 | 3922095000.0 | 2291646 | 52015.513362 | 2065181000.0 | 1612562400000 |
| 1 | 2021-02-07 00:00:00 | 39181.01 | 39700.0 | 37351.0 | 38795.69 | 84363.679763 | 3256521000.0 | 1976357 | 40764.388959 | 1574483000.0 | 1612648800000 |
| 2 | 2021-02-08 00:00:00 | 38795.69 | 46794.45 | 37988.89 | 46374.87 | 138597.536914 | 5881537000.0 | 3230961 | 72345.891568 | 3069314000.0 | 1612735200000 |


In [5]:
df_bitcoin_processed = process_btc_data(df_bitcoin)
df_bitcoin_processed.tail(3)

| | date | open | high | low | close | volume | quote_av | trades | tb_base_av | tb_quote_av | ... | exp_std_14_days | momentum_14_days | rate_of_change_14_days | strength_index_14_days | std_56_days | exp_mean_56_days | exp_std_56_days | momentum_56_days | rate_of_change_56_days | strength_index_56_days |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 481 | 2022-06-02 | 29805.84 | 30689.0 | 29594.55 | 30452.62 | 56961.42928 | 1711653000.0 | 1086183 | 28555.06607 | 858193500.0 | ... | 1679.699391 | 133.39 | 4.286187 | 46.348521 | 5039.300314 | 34124.984906 | 5597.739238 | -12991.57 | -27.926222 | 43.640162 |
| 482 | 2022-06-03 | 30452.63 | 30699.0 | 29282.36 | 29700.21 | 54067.44727 | 1615617000.0 | 993769 | 26583.25141 | 794354800.0 | ... | 1576.588412 | 499.2 | 0.866529 | 43.654773 | 4996.379059 | 33969.729641 | 5559.688788 | -12551.8 | -30.532276 | 43.052983 |
| 483 | 2022-06-04 | 29700.21 | 29988.88 | 29485.0 | 29864.04 | 25617.90113 | 760874300.0 | 618037 | 12971.7246 | 385358200.0 | ... | 1472.337144 | 418.98 | -1.419096 | 44.412343 | 4925.337066 | 33825.670351 | 5514.223362 | -12889.93 | -29.163058 | 43.222349 |


### <span style='color:#ff5f27'> 💭 Tweets Data</span>

In [6]:
# tweets_textblob = pd.read_csv("https://repo.hops.works/dev/davit/bitcoin/tweets_textblob.csv")
tweets_textblob = pd.read_csv("data/tweets_textblob.csv")
tweets_textblob.head(3)

| | date | subjectivity | polarity | unix |
|---|---|---|---|---|
| 0 | 2021-02-05 00:00:00 | 462.983446 | 153.870358 | 1612476000000 |
| 1 | 2021-02-06 00:00:00 | 945.521424 | 356.941159 | 1612562400000 |
| 2 | 2021-02-07 00:00:00 | 1055.799641 | 446.821937 | 1612648800000 |


In [7]:
# tweets_vader = pd.read_csv("https://repo.hops.works/dev/davit/bitcoin/tweets_vader.csv")
tweets_vader = pd.read_csv("data/tweets_vader.csv")

tweets_vader.head(3)

| | date | compound | unix |
|---|---|---|---|
| 0 | 2021-02-05 00:00:00 | 229.4372 | 1612476000000 |
| 1 | 2021-02-06 00:00:00 | 464.11 | 1612562400000 |
| 2 | 2021-02-07 00:00:00 | 447.5697 | 1612648800000 |
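
The `unix` column in the tables above holds epoch timestamps in milliseconds, and it is later used as both primary key and event time. The sketch below shows one way such a value could be derived from a date string. The function name `to_unix_ms` and the fixed-offset approach are assumptions for illustration, not the tutorial's own code. The value `1612476000000` for `2021-02-05 00:00:00` matches midnight at UTC+2 rather than UTC, which suggests the timestamps were generated in a UTC+2 local time zone. That is an inference from the numbers, not something the notebook states.

```python
from datetime import datetime, timedelta, timezone


def to_unix_ms(date_str: str, utc_offset_hours: int = 0) -> int:
    """Convert 'YYYY-MM-DD HH:MM:SS' at a fixed UTC offset to epoch milliseconds."""
    tz = timezone(timedelta(hours=utc_offset_hours))
    dt = datetime.strptime(date_str, "%Y-%m-%d %H:%M:%S").replace(tzinfo=tz)
    return int(dt.timestamp() * 1000)


print(to_unix_ms("2021-02-05 00:00:00", utc_offset_hours=2))  # 1612476000000
print(to_unix_ms("2021-02-05 00:00:00"))                      # 1612483200000
```

Whichever offset is used, applying the same one to the price and tweet data keeps the `unix` keys aligned across the three feature groups.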


---
## <span style="color:#ff5f27;"> 📡 Connecting to Hopsworks Feature Store </span>

In [8]:
import hopsworks

project = hopsworks.login()

fs = project.get_feature_store()

Connected. Call `.close()` to terminate connection gracefully.

Logged in to project, explore it here https://c.app.hopsworks.ai:443/p/167






## <span style="color:#ff5f27;"> 🪄 Creating Feature Groups </span>

### <span style='color:#ff5f27'> 📈 Bitcoin Price Feature Group</span>

In [9]:
btc_price_fg = fs.get_or_create_feature_group(
    name='bitcoin_price',
    description='Bitcoin price aggregated for days',
    version=1,
    primary_key=['unix'],
    online_enabled=True,
    event_time='unix'
)

btc_price_fg.insert(df_bitcoin_processed)

Feature Group created successfully, explore it at 
https://c.app.hopsworks.ai:443/p/167/fs/109/fg/2676


Uploading Dataframe: 0.00% |          | Rows 0/484 | Elapsed Time: 00:00 | Remaining Time: ?

Launching offline feature group backfill job...
Backfill Job started successfully, you can follow the progress at 
https://c.app.hopsworks.ai/p/167/jobs/named/bitcoin_price_1_offline_fg_backfill/executions


(<hsfs.core.job.Job at 0x7fcd074b97c0>, None)
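
Since `unix` is the only primary key of this feature group, rows that share a `unix` value would be expected to overwrite one another in the online store, and missing keys are likely to cause trouble on write. A check along the following lines could be run before calling `insert`. The helper `find_key_problems` is hypothetical and works on a plain list so that it does not depend on any particular DataFrame library:

```python
from collections import Counter
import math


def find_key_problems(keys):
    """Return (missing_count, duplicated_values) for a sequence of primary-key values."""
    keys = list(keys)

    def is_missing(value):
        # Treat None and float NaN as missing keys
        return value is None or (isinstance(value, float) and math.isnan(value))

    missing = sum(1 for value in keys if is_missing(value))
    counts = Counter(value for value in keys if not is_missing(value))
    duplicates = sorted(value for value, n in counts.items() if n > 1)
    return missing, duplicates


# Example usage with the notebook's DataFrame (not executed here):
# missing, duplicates = find_key_problems(df_bitcoin_processed["unix"].tolist())
# assert missing == 0 and not duplicates
print(find_key_problems([1612476000000, 1612562400000, 1612562400000, None]))
# (1, [1612562400000])
```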

### <span style='color:#ff5f27'> 💭 Tweets Feature Groups</span>

In [10]:
tweets_textblob_fg = fs.get_or_create_feature_group(
    name='bitcoin_tweets_textblob',
    version=1,
    primary_key=['unix'],
    online_enabled=True,
    event_time='unix'
)

tweets_textblob_fg.insert(tweets_textblob)

Feature Group created successfully, explore it at 
https://c.app.hopsworks.ai:443/p/167/fs/109/fg/2677


Uploading Dataframe: 0.00% |          | Rows 0/528 | Elapsed Time: 00:00 | Remaining Time: ?

Launching offline feature group backfill job...
Backfill Job started successfully, you can follow the progress at 
https://c.app.hopsworks.ai/p/167/jobs/named/bitcoin_tweets_textblob_1_offline_fg_backfill/executions


(<hsfs.core.job.Job at 0x7fcd074439a0>, None)

In [11]:
tweets_vader_fg = fs.get_or_create_feature_group(
    name='bitcoin_tweets_vader',
    version=1,
    primary_key=['unix'],
    online_enabled=True,
    event_time='unix'
)

tweets_vader_fg.insert(tweets_vader)

Feature Group created successfully, explore it at 
https://c.app.hopsworks.ai:443/p/167/fs/109/fg/2678


Uploading Dataframe: 0.00% |          | Rows 0/528 | Elapsed Time: 00:00 | Remaining Time: ?

Launching offline feature group backfill job...
Backfill Job started successfully, you can follow the progress at 
https://c.app.hopsworks.ai/p/167/jobs/named/bitcoin_tweets_vader_1_offline_fg_backfill/executions


(<hsfs.core.job.Job at 0x7fcd074bae50>, None)

---