# <span style="font-weight:bold; font-size: 3rem; color:#1EB182;"><img src="../images/icon102.png" width="38px"></img> **Hopsworks Feature Store** </span><span style="font-weight:bold; font-size: 3rem; color:#333;">- Part 03: Training Data & Feature Views</span>

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/logicalclocks/hopsworks-tutorials/blob/master/bitcoin/3_feature_views_and_training_dataset.ipynb)

<span style="font-weight:bold; font-size: 1.4rem;">This is the third part of the advanced tutorial series on the Hopsworks Feature Store. This notebook explains how to read from feature groups and create a training dataset within the feature store.</span>

## 🗒️ In this notebook you will see how to create a training dataset from the feature groups: 

1. Retrieving Feature Groups.
2. Defining Transformation functions.
3. Feature View creation.
4. Training Dataset with training, validation and test data.

![part2](../images/02_training-dataset.png) 

### <span style="color:#ff5f27;"> 📝 Imports</span>

In [1]:
import pandas as pd

import datetime

import warnings
warnings.filterwarnings('ignore')

## <span style="color:#ff5f27;"> 📡 Connecting to Hopsworks Feature Store </span>

In [2]:
import hopsworks

project = hopsworks.login()

fs = project.get_feature_store()

Connected. Call `.close()` to terminate connection gracefully.

Logged in to project, explore it here https://c.app.hopsworks.ai:443/p/167






---

## <span style="color:#ff5f27;">🪝 Retrieving Feature Groups </span>

In [3]:
btc_price_fg = fs.get_or_create_feature_group(
    name='bitcoin_price',
    version=1
)

btc_price_fg.read().head(3)

2022-09-28 17:00:36,501 INFO: USE `maksym00_featurestore`
2022-09-28 17:00:37,608 INFO: SELECT `fg0`.`date` `date`, `fg0`.`open` `open`, `fg0`.`high` `high`, `fg0`.`low` `low`, `fg0`.`close` `close`, `fg0`.`volume` `volume`, `fg0`.`quote_av` `quote_av`, `fg0`.`trades` `trades`, `fg0`.`tb_base_av` `tb_base_av`, `fg0`.`tb_quote_av` `tb_quote_av`, `fg0`.`unix` `unix`, `fg0`.`mean_7_days` `mean_7_days`, `fg0`.`mean_14_days` `mean_14_days`, `fg0`.`mean_56_days` `mean_56_days`, `fg0`.`signal` `signal`, `fg0`.`std_7_days` `std_7_days`, `fg0`.`exp_mean_7_days` `exp_mean_7_days`, `fg0`.`exp_std_7_days` `exp_std_7_days`, `fg0`.`momentum_7_days` `momentum_7_days`, `fg0`.`rate_of_change_7_days` `rate_of_change_7_days`, `fg0`.`strength_index_7_days` `strength_index_7_days`, `fg0`.`std_14_days` `std_14_days`, `fg0`.`exp_mean_14_days` `exp_mean_14_days`, `fg0`.`exp_std_14_days` `exp_std_14_days`, `fg0`.`momentum_14_days` `momentum_14_days`, `fg0`.`rate_of_change_14_days` `rate_of_change_14_days`, `fg

| | date | open | high | low | close | volume | quote_av | trades | tb_base_av | tb_quote_av | ... | exp_std_14_days | momentum_14_days | rate_of_change_14_days | strength_index_14_days | std_56_days | exp_mean_56_days | exp_std_56_days | momentum_56_days | rate_of_change_56_days | strength_index_56_days |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2022-05-19 21:00:00 | 30319.22 | 30777.33 | 28730.0 | 29201.01 | 60517.25325 | 1800589000.0 | 1694004 | 30890.40127 | 919467700.0 | ... | 3309.834053 | -6812.76 | -17.679609 | 35.604076 | 5652.935995 | 36611.788364 | 5686.179927 | -15112.15 | -34.396367 | 41.258938 |
| 1 | 2021-05-20 21:00:00 | 40525.39 | 42200.0 | 33488.0 | 37252.01 | 202100.888258 | 7713347000.0 | 3993336 | 94002.431409 | 3591749000.0 | ... | 7554.338338 | -20062.74 | -36.713026 | 28.998528 | 6013.760637 | 52320.267459 | 6954.268184 | -17773.58 | -33.260626 | 44.87207 |
| 2 | 2021-12-07 22:00:00 | 50588.95 | 51200.0 | 48600.0 | 50471.19 | 38425.92466 | 1925459000.0 | 1118225 | 18126.84362 | 908410700.0 | ... | 4120.300702 | -6667.1 | -14.398097 | 35.5866 | 4388.713351 | 56447.953603 | 6174.286106 | -6895.81 | -11.991276 | 47.825775 |


In [5]:
tweets_textblob_fg = fs.get_or_create_feature_group(
    name='bitcoin_tweets_textblob',
    version=1
)

tweets_textblob_fg.show(3)

2022-09-28 17:00:47,714 INFO: USE `maksym00_featurestore`
2022-09-28 17:00:48,762 INFO: SELECT `fg0`.`date` `date`, `fg0`.`subjectivity` `subjectivity`, `fg0`.`polarity` `polarity`, `fg0`.`unix` `unix`
FROM `maksym00_featurestore`.`bitcoin_tweets_textblob_1` `fg0`


| | date | subjectivity | polarity | unix |
|---|---|---|---|---|
| 0 | 2022-05-20 00:00:00 | 0.0 | 0.0 | 1652994000000 |
| 1 | 2021-05-21 00:00:00 | 0.0 | 0.0 | 1621544400000 |
| 2 | 2021-12-08 00:00:00 | 0.0 | 0.0 | 1638914400000 |


In [6]:
tweets_vader_fg = fs.get_or_create_feature_group(
    name='bitcoin_tweets_vader',
    version=1
)

tweets_vader_fg.show(3)

2022-09-28 17:00:52,298 INFO: USE `maksym00_featurestore`
2022-09-28 17:00:53,370 INFO: SELECT `fg0`.`date` `date`, `fg0`.`compound` `compound`, `fg0`.`unix` `unix`
FROM `maksym00_featurestore`.`bitcoin_tweets_vader_1` `fg0`


| | date | compound | unix |
|---|---|---|---|
| 0 | 2022-05-20 00:00:00 | 0.0 | 1652994000000 |
| 1 | 2021-05-21 00:00:00 | 0.0 | 1621544400000 |
| 2 | 2021-12-08 00:00:00 | 0.0 | 1638914400000 |
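
The `unix` values in the outputs above look like Unix epoch timestamps in milliseconds. That is an inference from their magnitude, not something the notebook states. Under that assumption, a minimal sketch of converting them back to readable timestamps with pandas could look like this:

```python
import pandas as pd

# `unix` values copied from the output above; assumed to be epoch milliseconds.
unix_ms = pd.Series([1652994000000, 1621544400000, 1638914400000])

# Interpret the integers as milliseconds since 1970-01-01 00:00:00 UTC.
as_utc = pd.to_datetime(unix_ms, unit="ms", utc=True)

formatted = as_utc.dt.strftime("%Y-%m-%d %H:%M:%S").tolist()
print(formatted)
# ['2022-05-19 21:00:00', '2021-05-20 21:00:00', '2021-12-07 22:00:00']
```

The converted values are two to three hours behind the `date` column of the tweet feature groups, which would be consistent with `date` having been recorded in a local time zone rather than UTC.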


--- 

## <span style="color:#ff5f27;"> 🖍 Feature View Creation and Retrieving </span>

In [7]:
# Query preparation: join the price features with both tweet-sentiment feature groups.
fg_query = (
    btc_price_fg.select_except(["date", "unix"])
    .join(tweets_textblob_fg.select(["subjectivity", "polarity"]))
    .join(tweets_vader_fg.select(["compound"]))
)
fg_query.show(5)

2022-09-28 17:00:56,825 INFO: USE `maksym00_featurestore`
2022-09-28 17:00:58,058 INFO: WITH right_fg0 AS (SELECT *
FROM (SELECT `fg2`.`open` `open`, `fg2`.`high` `high`, `fg2`.`low` `low`, `fg2`.`close` `close`, `fg2`.`volume` `volume`, `fg2`.`quote_av` `quote_av`, `fg2`.`trades` `trades`, `fg2`.`tb_base_av` `tb_base_av`, `fg2`.`tb_quote_av` `tb_quote_av`, `fg2`.`mean_7_days` `mean_7_days`, `fg2`.`mean_14_days` `mean_14_days`, `fg2`.`mean_56_days` `mean_56_days`, `fg2`.`signal` `signal`, `fg2`.`std_7_days` `std_7_days`, `fg2`.`exp_mean_7_days` `exp_mean_7_days`, `fg2`.`exp_std_7_days` `exp_std_7_days`, `fg2`.`momentum_7_days` `momentum_7_days`, `fg2`.`rate_of_change_7_days` `rate_of_change_7_days`, `fg2`.`strength_index_7_days` `strength_index_7_days`, `fg2`.`std_14_days` `std_14_days`, `fg2`.`exp_mean_14_days` `exp_mean_14_days`, `fg2`.`exp_std_14_days` `exp_std_14_days`, `fg2`.`momentum_14_days` `momentum_14_days`, `fg2`.`rate_of_change_14_days` `rate_of_change_14_days`, `fg2`.`stre

| | open | high | low | close | volume | quote_av | trades | tb_base_av | tb_quote_av | mean_7_days | ... | strength_index_14_days | std_56_days | exp_mean_56_days | exp_std_56_days | momentum_56_days | rate_of_change_56_days | strength_index_56_days | subjectivity | polarity | compound |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 29700.21 | 29988.88 | 29485.0 | 29864.04 | 25617.90113 | 760874300.0 | 618037 | 12971.7246 | 385358200.0 | 30403.722857 | ... | 44.412343 | 4925.337066 | 33825.670351 | 5514.223362 | -12889.93 | -29.163058 | 43.222349 | 2147.097309 | 1169.112676 | 1437.81 |
| 1 | 29654.58 | 30223.74 | 29294.21 | 29542.15 | 59537.38659 | 1771708000.0 | 985440 | 28846.96589 | 858587300.0 | 29652.16 | ... | 39.077705 | 5642.588236 | 35465.943895 | 5822.881326 | -17525.84 | -35.086949 | 41.959219 | 4316.993556 | 1565.416369 | 2280.2427 |
| 2 | 29445.07 | 30487.99 | 29255.11 | 30293.94 | 36158.98748 | 1082531000.0 | 862880 | 18538.10071 | 555044200.0 | 29756.214286 | ... | 40.191047 | 5729.799991 | 36147.468601 | 5751.567999 | -16533.82 | -35.711971 | 42.346685 | 7015.49422 | 3411.991247 | 4862.2767 |
| 3 | 29201.01 | 29656.18 | 28947.28 | 29445.06 | 20987.13124 | 616194700.0 | 643486 | 10558.92886 | 310046100.0 | 29904.064286 | ... | 36.628172 | 5736.496208 | 36360.324198 | 5741.839573 | -15066.21 | -37.120503 | 41.501958 | 0.0 | 0.0 | 0.0 |
| 4 | 28715.33 | 30545.18 | 28691.38 | 30319.23 | 67877.36415 | 2014360000.0 | 1860780 | 35339.65787 | 1049163000.0 | 30008.024286 | ... | 38.232795 | 5543.601532 | 36881.271228 | 5603.663532 | -13672.23 | -31.579626 | 42.044964 | 0.0 | 0.0 | 0.0 |
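
For intuition, the query above can be compared to a key-based merge in pandas. The sketch below is only a rough analogue on made-up data: it assumes that `unix` acts as the shared join key, while the key and join type Hopsworks actually uses depend on how the feature groups were defined and on the library version.

```python
import pandas as pd

# Toy stand-ins for the three feature groups; every value here is made up.
price = pd.DataFrame({"unix": [1, 2, 3], "close": [100.0, 110.0, 105.0]})
textblob = pd.DataFrame(
    {"unix": [1, 2, 3], "subjectivity": [0.1, 0.4, 0.0], "polarity": [0.0, 0.2, 0.0]}
)
vader = pd.DataFrame({"unix": [1, 2, 3], "compound": [0.3, -0.1, 0.0]})

# Join on the assumed shared key, then drop the key from the final feature set,
# mirroring select_except(["date", "unix"]) in the query above.
joined = (
    price.merge(textblob, on="unix")
    .merge(vader, on="unix")
    .drop(columns=["unix"])
)

print(joined.columns.tolist())
# ['close', 'subjectivity', 'polarity', 'compound']
```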


In [9]:
# Load the transformation functions.
min_max_scaler = fs.get_transformation_function(name="min_max_scaler")

# Map features to transformation functions.
transformation_functions = {
    'open': min_max_scaler, 
    'high': min_max_scaler, 
    'low': min_max_scaler, 
    'close': min_max_scaler,
    'volume': min_max_scaler, 
    'quote_av': min_max_scaler, 
    'trades': min_max_scaler,
    'tb_base_av': min_max_scaler, 
    'tb_quote_av': min_max_scaler, 
    'mean_7_days': min_max_scaler, 
    'mean_14_days': min_max_scaler,
    'mean_56_days': min_max_scaler, 
    'signal': min_max_scaler, 
    'std_7_days': min_max_scaler, 
    'exp_mean_7_days': min_max_scaler,
    'exp_std_7_days': min_max_scaler, 
    'momentum_7_days': min_max_scaler,
    'rate_of_change_7_days': min_max_scaler,
    'strength_index_7_days': min_max_scaler, 
    'std_14_days': min_max_scaler, 
    'exp_mean_14_days': min_max_scaler,
    'exp_std_14_days': min_max_scaler, 
    'momentum_14_days': min_max_scaler, 
    'rate_of_change_14_days': min_max_scaler,
    'strength_index_14_days': min_max_scaler, 
    'std_56_days': min_max_scaler, 
    'exp_mean_56_days': min_max_scaler,
    'exp_std_56_days': min_max_scaler, 
    'momentum_56_days': min_max_scaler, 
    'rate_of_change_56_days': min_max_scaler,
    'strength_index_56_days': min_max_scaler, 
    'subjectivity': min_max_scaler, 
    'polarity': min_max_scaler, 
    'compound': min_max_scaler,
}
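
To make the effect of the mapping above concrete, here is a minimal pandas sketch of the standard min-max formula, `(x - min) / (max - min)`. It is not the Hopsworks implementation: `min_max_scale` is a hypothetical helper, and in the feature store the minimum and maximum are expected to come from statistics of the training data rather than from the values passed in.

```python
import pandas as pd


def min_max_scale(values: pd.Series) -> pd.Series:
    """Rescale a numeric series to the [0, 1] range (hypothetical helper)."""
    lo, hi = values.min(), values.max()
    if hi == lo:
        # A constant column carries no range information; map it to zeros.
        return pd.Series(0.0, index=values.index)
    return (values - lo) / (hi - lo)


# `close` prices taken from the first three rows of the price feature group output.
close = pd.Series([29201.01, 37252.01, 50471.19])
scaled = min_max_scale(close)
print(scaled.tolist())
```

The smallest value maps to 0, the largest to 1, and everything else falls in between, which keeps features with very different magnitudes (for example `volume` and `polarity`) on a comparable scale.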

In [10]:
feature_view = fs.create_feature_view(
    name='bitcoin_feature_view',
    version=1,
    transformation_functions=transformation_functions,
    query=fg_query
)

Feature view created successfully, explore it at 
https://c.app.hopsworks.ai:443/p/167/fs/109/fv/bitcoin_feature_view/version/1


---

## <span style="color:#ff5f27;"> 🏋️ Training Dataset Creation</span>
---

### <span style="color:#ff5f27;">🪓 TimeSeriesSplit</span>

In [11]:
from datetime import datetime
date_format = "%Y-%m-%d %H:%M:%S"
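
The cells in this section turn date strings into epoch milliseconds with `datetime.strptime(...).timestamp()`. For a naive `datetime`, `timestamp()` interprets the value in the machine's local time zone, so the resulting boundaries can shift depending on where the notebook runs. The sketch below pins the time zone explicitly; `to_epoch_ms` is a hypothetical helper, and whether UTC is the right zone for this data is an assumption.

```python
from datetime import datetime, timezone

date_format = "%Y-%m-%d %H:%M:%S"


def to_epoch_ms(text: str, tz: timezone = timezone.utc) -> int:
    """Parse `text` with `date_format` and return epoch milliseconds in `tz`."""
    parsed = datetime.strptime(text, date_format).replace(tzinfo=tz)
    return int(parsed.timestamp() * 1000)


print(to_epoch_ms("2021-02-05 10:00:00"))
# 1612519200000
```

With a helper like this, the same start and end strings yield the same millisecond values on every machine.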

In [14]:
# Create the training split using an event-time filter (timestamps in epoch milliseconds).
start_time = int(datetime.strptime("2021-02-05 10:00:00", date_format).timestamp() * 1000)
end_time = int(datetime.strptime("2022-01-01 23:59:59", date_format).timestamp() * 1000)

td_train_version, td_job = feature_view.create_training_data(
    start_time=start_time,
    end_time=end_time,
    description='bitcoin training dataset: 2021-02-05 to 2022-01-01',
    data_format="csv",
    coalesce=True,
    write_options={'wait_for_job': True},
)

Training dataset job started successfully, you can follow the progress at 
https://c.app.hopsworks.ai/p/167/jobs/named/bitcoin_feature_view_1_1_create_fv_td_28092022140508/executions




In [15]:
# Create the validation split using an event-time filter (timestamps in epoch milliseconds).
start_time = int(datetime.strptime("2022-01-02 00:00:00", date_format).timestamp() * 1000)
end_time = int(datetime.strptime("2022-04-30 23:59:59", date_format).timestamp() * 1000)

td_validation_version, td_job = feature_view.create_training_data(
    start_time=start_time,
    end_time=end_time,
    description='bitcoin validation dataset: 2022-01-02 to 2022-04-30',
    data_format="csv",
    coalesce=True,
    write_options={'wait_for_job': True},
)

Training dataset job started successfully, you can follow the progress at 
https://c.app.hopsworks.ai/p/167/jobs/named/bitcoin_feature_view_1_2_create_fv_td_28092022140638/executions




In [16]:
# Create the test split using an event-time filter (timestamps in epoch milliseconds).
start_time = int(datetime.strptime("2022-05-01 00:00:00", date_format).timestamp() * 1000)
end_time = int(datetime.strptime("2022-06-04 23:59:59", date_format).timestamp() * 1000)

td_test_version, td_job = feature_view.create_training_data(
    start_time=start_time,
    end_time=end_time,
    description='bitcoin test dataset: 2022-05-01 to 2022-06-04',
    data_format="csv",
    coalesce=True,
    write_options={'wait_for_job': True},
)

Training dataset job started successfully, you can follow the progress at 
https://c.app.hopsworks.ai/p/167/jobs/named/bitcoin_feature_view_1_3_create_fv_td_28092022140803/executions
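
A time-series split is only meaningful if the three windows are in chronological order and do not overlap, so that validation and test data always lie after the training data. The following sketch checks this for the boundaries used in the three cells above; it is a standalone sanity check and does not touch the feature store.

```python
from datetime import datetime

date_format = "%Y-%m-%d %H:%M:%S"

# (name, start, end) copied from the three training-data cells above.
splits = [
    ("train", "2021-02-05 10:00:00", "2022-01-01 23:59:59"),
    ("validation", "2022-01-02 00:00:00", "2022-04-30 23:59:59"),
    ("test", "2022-05-01 00:00:00", "2022-06-04 23:59:59"),
]

parsed = [
    (name, datetime.strptime(start, date_format), datetime.strptime(end, date_format))
    for name, start, end in splits
]

# Every window must have a positive duration.
for name, start, end in parsed:
    assert start < end, f"{name}: start is not before end"

# Each window must begin strictly after the previous one ends.
for (prev_name, _, prev_end), (name, start, _) in zip(parsed, parsed[1:]):
    assert prev_end < start, f"{name} overlaps {prev_name}"

# Gap in seconds between consecutive windows.
gaps = [(nxt[1] - prev[2]).total_seconds() for prev, nxt in zip(parsed, parsed[1:])]
print(gaps)
# [1.0, 1.0]
```

The one-second gaps show that the windows are contiguous: no event time is shared between two splits, and no full second of data is skipped.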




---