# <span style="font-weight:bold; font-size: 3rem; color:#1EB182;"><img src="../../images/icon102.png" width="38px"></img> **Hopsworks Feature Store** </span><span style="font-weight:bold; font-size: 3rem; color:#333;">- Part 03: Training Data & Feature Views</span>

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/logicalclocks/hopsworks-tutorials/blob/master/advanced_tutorials/bitcoin/3_feature_views_and_training_dataset.ipynb)

<span style="font-weight:bold; font-size: 1.4rem;">This is the third part of the advanced tutorial series on the Hopsworks Feature Store. This notebook shows how to read from feature groups, create a feature view, and create a training dataset within the feature store.</span>

## 🗒️ This notebook is divided into the following sections: 

1. Fetch Feature Groups
2. Define Transformation Functions
3. Create Feature Views
4. Create Training Dataset with training, validation and test splits

![part2](../../images/02_training-dataset.png) 

### <span style="color:#ff5f27;"> 📝 Imports</span>

In [None]:
import pandas as pd

import datetime

import warnings
warnings.filterwarnings('ignore')

---

## <span style="color:#ff5f27;"> 📡 Connecting to the Hopsworks Feature Store </span>

In [None]:
import hopsworks

project = hopsworks.login()

fs = project.get_feature_store()

In [None]:
btc_price_fg = fs.get_or_create_feature_group(
    name='bitcoin_price',
    version=1
)

btc_price_fg.read().head(3)

In [None]:
tweets_textblob_fg = fs.get_or_create_feature_group(
    name='bitcoin_tweets_textblob',
    version=1
)

tweets_textblob_fg.show(3)

In [None]:
tweets_vader_fg = fs.get_or_create_feature_group(
    name='bitcoin_tweets_vader',
    version=1
)

tweets_vader_fg.show(3)

--- 

## <span style="color:#ff5f27;"> 🖍 Feature View Creation and Retrieving </span>

In [None]:
# Query Preparation
# Join the price features with the two sentiment feature groups.
fg_query = (
    btc_price_fg.select_except(["date", "signal"])
    .join(tweets_textblob_fg.select(["subjectivity", "polarity"]))
    .join(tweets_vader_fg.select(["compound"]))
)

final_df = fg_query.read()

In [None]:
final_df.head(5)

In [None]:
# Check for NaNs: show only the columns that contain missing values
nan_counts = final_df.isna().sum()
nan_counts[nan_counts > 0]

In [None]:
# Scale every column except the "unix" timestamp column.
columns_to_transform = final_df.columns.tolist()
columns_to_transform.remove("unix")

In [None]:
# Map features to transformation functions.
transformation_functions = {col: fs.get_transformation_function(name="min_max_scaler") for col in columns_to_transform}
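
Min-max scaling maps each value `x` to `(x - min) / (max - min)`, so the scaled column lies in the range 0 to 1. The snippet below is a minimal plain-Python sketch of that arithmetic only. It is not the Hopsworks implementation of `min_max_scaler`, and the helper names `fit_min_max` and `min_max_scale` as well as the sample prices are hypothetical, introduced just for illustration.

```python
def fit_min_max(values):
    """Return the (min, max) statistics of a column."""
    return min(values), max(values)


def min_max_scale(value, col_min, col_max):
    """Scale a value into [0, 1] using previously computed statistics."""
    if col_max == col_min:
        # Constant column: return 0.0 to avoid division by zero
        # (a choice made for this sketch, not taken from Hopsworks).
        return 0.0
    return (value - col_min) / (col_max - col_min)


# Hypothetical closing prices, invented for illustration only.
train_close = [30000.0, 35000.0, 40000.0, 50000.0]

col_min, col_max = fit_min_max(train_close)
scaled = [min_max_scale(v, col_min, col_max) for v in train_close]
print(scaled)  # [0.0, 0.25, 0.5, 1.0]
```

Computing the statistics once and reusing them keeps the scaling of later data consistent with the data the statistics were fitted on.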

In [None]:
feature_view = fs.get_or_create_feature_view(
    name='bitcoin_feature_view',
    version=1,
    transformation_functions=transformation_functions,
    query=fg_query
)

---

## <span style="color:#ff5f27;"> 🏋️ Training Dataset Creation</span>

In [None]:
from datetime import datetime


date_format = '%Y-%m-%d'

# Split boundaries can be given as strings in different date formats or as datetime objects.
td_jan_feb_version, td_job = feature_view.create_train_validation_test_split(
    train_start="2021/02/05",
    train_end="2022-06-01",
    validation_start=datetime.strptime('2022-06-02', date_format),
    validation_end="20220901",
    test_start="20220902",
    test_end="2022/12/01",
    data_format="csv",
    coalesce=True,
    write_options={'wait_for_job': False},
)
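
The split call above partitions rows by time: each row goes to the training, validation, or test set depending on which date range its timestamp falls into. The following is a rough plain-Python sketch of that idea, not the way Hopsworks performs the split internally. The helper `assign_split`, the sample timestamps, and the choice of inclusive range boundaries are assumptions made for illustration.

```python
from datetime import datetime


def assign_split(ts, ranges):
    """Return the name of the first split whose [start, end] range contains ts."""
    for name, (start, end) in ranges.items():
        if start <= ts <= end:
            return name
    return None  # The timestamp falls outside every range.


# Boundaries follow the dates passed to the split call above;
# treating both ends as inclusive is an assumption of this sketch.
ranges = {
    "train": (datetime(2021, 2, 5), datetime(2022, 6, 1)),
    "validation": (datetime(2022, 6, 2), datetime(2022, 9, 1)),
    "test": (datetime(2022, 9, 2), datetime(2022, 12, 1)),
}

# Hypothetical event times, invented for illustration only.
samples = [
    datetime(2021, 3, 1),
    datetime(2022, 7, 15),
    datetime(2022, 11, 30),
    datetime(2020, 1, 1),
]

labels = [assign_split(ts, ranges) for ts in samples]
print(labels)  # ['train', 'validation', 'test', None]
```

Because the ranges do not overlap and follow each other in time, the validation and test sets contain only data that is newer than the training set, which is the usual choice for time series such as prices.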

## <span style="color:#ff5f27;">⏭️ **Next:** Part 04 </span>

In the next notebook, you will train a model on the training dataset created in this notebook.

---