# <span style="font-weight:bold; font-size: 3rem; color:#1EB182;">**Hopsworks Feature Store** </span> <span style="font-weight:bold; font-size: 3rem; color:#333;">- Part 03: Training Data & Feature Views</span>

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/logicalclocks/hopsworks-tutorials/blob/master/advanced_tutorials/nyc_taxi_fares/3_feature_view_and_dataset_creation.ipynb)



## 🗒️ This notebook is divided into 3 main sections:
1. **Feature selection**,
2. **Feature transformations**,
3. **Training datasets creation**.

![02_training-dataset](../../images/02_training-dataset.png)

In [None]:
import warnings

# Mute warnings
warnings.filterwarnings("ignore")

## <span style="color:#ff5f27;"> 📡 Connecting to the Hopsworks Feature Store </span>

In [None]:
import hopsworks

project = hopsworks.login()

fs = project.get_feature_store()

In [None]:
# Retrieve the existing feature groups (created in the previous notebooks).
rides_fg = fs.get_feature_group("nyc_taxi_rides", version=1)

In [None]:
fares_fg = fs.get_feature_group("nyc_taxi_fares", version=1)

---

## <span style="color:#ff5f27;"> 🖍 Feature View Creation and Retrieval </span>

First, build a query that selects the desired features from the feature groups.

In [None]:
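# Select features for training data: the label `total_fare` and `tolls` from the fares
# feature group, joined on `ride_id` with every rides feature except the IDs, the pickup
# timestamp and the raw coordinates.
fg_query = fares_fg.select(["total_fare", "tolls"]).join(
    rides_fg.select_except([
        "taxi_id", "driver_id", "pickup_datetime",
        "pickup_longitude", "pickup_latitude",
        "dropoff_longitude", "dropoff_latitude",
    ]),
    on=["ride_id"],
)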

fg_query.show(2)
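Conceptually, the query above projects two columns from the fares feature group, keeps every rides column except the excluded ones, and joins the two on `ride_id`. The sketch below reproduces that logic in plain Python on hypothetical toy rows (the `passenger_count` column is made up for illustration). It assumes an inner join, which is the default `join_type` of `Query.join` in hsfs, and a unique `ride_id` per ride; Hopsworks itself executes the real query in its own engine.

```python
from typing import Any, Dict, Iterable, List

Row = Dict[str, Any]


def select(row: Row, columns: Iterable[str]) -> Row:
    """Keep only the listed columns (mirrors what FeatureGroup.select projects)."""
    return {c: row[c] for c in columns}


def select_except(row: Row, excluded: Iterable[str]) -> Row:
    """Keep every column except the listed ones (mirrors FeatureGroup.select_except)."""
    drop = set(excluded)
    return {c: v for c, v in row.items() if c not in drop}


def join_query(left: List[Row], right: List[Row], left_cols: List[str],
               right_excluded: List[str], on: str) -> List[Row]:
    """Project both sides, then inner-join them on the key column `on`."""
    right_by_key = {r[on]: r for r in right}  # assumes `on` is unique on the right side
    joined = []
    for left_row in left:
        right_row = right_by_key.get(left_row[on])
        if right_row is None:
            continue  # inner join: a fare without a matching ride is dropped
        row = select(left_row, left_cols)
        row.update(select_except(right_row, right_excluded))
        joined.append(row)
    return joined


# Hypothetical toy rows; the real feature groups contain more rows and columns.
fares = [
    {"ride_id": 1, "total_fare": 12.5, "tolls": 0.0},
    {"ride_id": 2, "total_fare": 30.0, "tolls": 5.5},
    {"ride_id": 3, "total_fare": 8.0, "tolls": 0.0},  # no matching ride
]
rides = [
    {"ride_id": 1, "taxi_id": 7, "driver_id": 70, "passenger_count": 1},
    {"ride_id": 2, "taxi_id": 8, "driver_id": 80, "passenger_count": 3},
]
excluded = ["taxi_id", "driver_id", "pickup_datetime",
            "pickup_longitude", "pickup_latitude",
            "dropoff_longitude", "dropoff_latitude"]

result = join_query(fares, rides, ["total_fare", "tolls"], excluded, on="ride_id")
print(len(result))  # 2
print(result[0])    # {'total_fare': 12.5, 'tolls': 0.0, 'ride_id': 1, 'passenger_count': 1}
```

Note that `ride_id` survives in the output because it comes from the rides side, which only drops the explicitly excluded columns.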

A **Feature View** sits between **Feature Groups** and **Training Datasets**. By combining features from one or more **Feature Groups**, we create a **Feature View**, which stores metadata about our data rather than a copy of it. From a **Feature View** we can then create **Training Datasets**.

A Feature View defines its schema as a query (optionally with filters), specifies the model's target feature(s)/label(s), and can attach additional transformation functions.
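Transformation functions attached to a Feature View are meant to apply the same feature logic when training data is read and when features are served for inference. As an illustration, here is a hypothetical min-max scaler whose statistics are computed once on training values and then reused. Hopsworks ships a built-in transformation function named `min_max_scaler`; this sketch is not its implementation, only the idea behind it, and the `tolls` values are made up.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class MinMaxScaler:
    """Hypothetical scaler: maps x to (x - min) / (max - min) using training statistics."""
    min_value: float
    max_value: float

    @classmethod
    def fit(cls, values: List[float]) -> "MinMaxScaler":
        # Statistics come from the training split only, to avoid leaking test data.
        return cls(min(values), max(values))

    def transform(self, x: float) -> float:
        span = self.max_value - self.min_value
        if span == 0:
            return 0.0  # constant feature: avoid division by zero
        return (x - self.min_value) / span


train_tolls = [0.0, 2.5, 5.0, 10.0]  # hypothetical training values
scaler = MinMaxScaler.fit(train_tolls)

print([scaler.transform(x) for x in train_tolls])  # [0.0, 0.25, 0.5, 1.0]
# The same statistics are reused at serving time, so unseen values may fall outside [0, 1].
print(scaler.transform(12.5))  # 1.25
```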

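To create a Feature View, use the `FeatureStore.create_feature_view()` method. The cell below uses `FeatureStore.get_or_create_feature_view()`, which returns the existing Feature View with the same name and version if there is one, and creates it otherwise.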

You can specify the following parameters:

- `name` - name of the feature view.

- `version` - version of the feature view.

- `labels` - the target variable(s) the model should predict.

- `transformation_functions` - functions used to transform features.

- `query` - the query object that selects the features.
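To make these parameters concrete, the sketch below models a Feature View as plain metadata: a name, a version, the columns the query returns, the label columns, and optional per-feature transformation functions. `FeatureViewSpec` and `to_xy` are hypothetical names, not part of the Hopsworks API, and the `distance` column and the scaling lambda are made up. The point is that the Feature View holds no rows itself; it only describes how to turn query results into features `X` and labels `y`.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Tuple

Row = Dict[str, Any]


@dataclass
class FeatureViewSpec:
    """Hypothetical stand-in for Feature View metadata (no data is stored here)."""
    name: str
    version: int
    columns: List[str]  # columns returned by the query
    labels: List[str] = field(default_factory=list)
    transformation_functions: Dict[str, Callable[[Any], Any]] = field(default_factory=dict)

    def to_xy(self, rows: List[Row]) -> Tuple[List[Row], List[Row]]:
        """Split query results into transformed features X and (untransformed) labels y."""
        X, y = [], []
        for row in rows:
            features = {}
            for col in self.columns:
                if col in self.labels:
                    continue  # labels never end up in X
                fn = self.transformation_functions.get(col)
                features[col] = fn(row[col]) if fn else row[col]
            X.append(features)
            y.append({col: row[col] for col in self.labels})
        return X, y


spec = FeatureViewSpec(
    name="nyc_taxi_fares_fv",
    version=1,
    columns=["total_fare", "tolls", "distance"],
    labels=["total_fare"],
    transformation_functions={"tolls": lambda v: v / 10.0},  # hypothetical scaling
)
rows = [{"total_fare": 12.5, "tolls": 0.0, "distance": 2.1},
        {"total_fare": 30.0, "tolls": 5.0, "distance": 9.4}]
X, y = spec.to_xy(rows)
print(X)  # [{'tolls': 0.0, 'distance': 2.1}, {'tolls': 0.5, 'distance': 9.4}]
print(y)  # [{'total_fare': 12.5}, {'total_fare': 30.0}]
```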

In [None]:
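nyc_fares_fv = fs.get_or_create_feature_view(
    name="nyc_taxi_fares_fv",
    version=1,                # returned as-is if this name/version already exists
    query=fg_query,
    labels=["total_fare"],    # target variable, returned separately as y
)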
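The get-or-create behaviour of the cell above can be illustrated with a hypothetical in-memory registry keyed on `(name, version)`. `FeatureViewRegistry` is not part of Hopsworks and says nothing about how Hopsworks stores metadata; in this sketch, a second call with the same key returns the stored view unchanged, even if different metadata is passed.

```python
from typing import Any, Dict, Tuple


class FeatureViewRegistry:
    """Hypothetical in-memory registry illustrating get-or-create keyed on (name, version)."""

    def __init__(self) -> None:
        self._views: Dict[Tuple[str, int], Dict[str, Any]] = {}

    def get_or_create(self, name: str, version: int, **metadata: Any) -> Dict[str, Any]:
        key = (name, version)
        if key not in self._views:
            # First call for this key: register the metadata.
            self._views[key] = {"name": name, "version": version, **metadata}
        # Later calls: return the stored view; the new metadata is ignored.
        return self._views[key]


registry = FeatureViewRegistry()
first = registry.get_or_create("nyc_taxi_fares_fv", 1, labels=["total_fare"])
again = registry.get_or_create("nyc_taxi_fares_fv", 1, labels=["tolls"])
print(again is first, again["labels"])  # True ['total_fare']
```

To change the definition of a Feature View in this model, you would register a new version rather than call get-or-create again with the same key.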

In [None]:
nyc_fares_fv.version

---

## <span style="color:#ff5f27;">🏋️ Training Dataset Creation</span>
    
In Hopsworks, training data is defined by a query whose projection (the set of features) is determined by the parent Feature View, optionally materialized as a snapshot on disk of the data returned by that query.
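The difference between reading training data through the query and reading a materialized snapshot can be shown with a toy model. `TrainingDataSketch` is hypothetical: without a snapshot it re-runs the query on every read, so newly inserted feature data shows up; after `materialize()` it returns the copy taken at that moment.

```python
from typing import Any, Callable, Dict, List, Optional

Row = Dict[str, Any]


class TrainingDataSketch:
    """Hypothetical model of training data: a query plus an optional snapshot."""

    def __init__(self, query: Callable[[], List[Row]]) -> None:
        self._query = query
        self._snapshot: Optional[List[Row]] = None

    def materialize(self) -> None:
        # Copy the current query result; later feature-group inserts won't affect it.
        self._snapshot = [dict(row) for row in self._query()]

    def read(self) -> List[Row]:
        if self._snapshot is not None:
            return [dict(row) for row in self._snapshot]
        return self._query()  # no snapshot: evaluate the query on the latest data


feature_group: List[Row] = [{"ride_id": 1, "total_fare": 12.5}]  # hypothetical data
td = TrainingDataSketch(lambda: list(feature_group))

td.materialize()
feature_group.append({"ride_id": 2, "total_fare": 30.0})  # new data arrives

print(len(td.read()))  # 1 (the snapshot is frozen)
print(len(TrainingDataSketch(lambda: list(feature_group)).read()))  # 2 (the query sees new data)
```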

A training dataset may contain the following splits:

- **Training set**: the subset of training data used to train a model.
- **Validation set**: the subset of training data used to evaluate hyperparameters while training a model.
- **Test set**: the holdout subset of training data used to evaluate the final model.
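A minimal sketch of how rows can be assigned to these three splits with a seeded shuffle, assuming fractional split sizes like the `test_size` parameter Hopsworks accepts. `train_validation_test_split` is a hypothetical helper; Hopsworks performs the split itself, and its sampling method may differ.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def train_validation_test_split(
    rows: Sequence[T], validation_size: float, test_size: float, seed: int = 42
) -> Tuple[List[T], List[T], List[T]]:
    """Shuffle once with a fixed seed, then cut into train / validation / test."""
    if validation_size < 0 or test_size < 0 or validation_size + test_size >= 1:
        raise ValueError("split sizes must be non-negative and sum to less than 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded, so the split is reproducible
    n = len(shuffled)
    n_test = round(n * test_size)
    n_val = round(n * validation_size)
    test = shuffled[:n_test]
    validation = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]  # remaining rows go to training
    return train, validation, test


train, validation, test = train_validation_test_split(range(100), 0.1, 0.2)
print(len(train), len(validation), len(test))  # 70 10 20
```

Shuffling once and slicing guarantees that the three splits are disjoint and together cover every row.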

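A training dataset with a train/test split is created with the `FeatureView.create_train_test_split()` method; for a train/validation/test split, use `FeatureView.create_train_validation_test_split()`.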

In [None]:
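td_version, td_job = nyc_fares_fv.create_train_test_split(
    description="NYC taxi fares dataset",
    data_format="csv",
    test_size=0.2,                         # 20% of rows go to the test split
    write_options={"wait_for_job": True},  # block until the materialization job finishes
    coalesce=True,                         # write a single file per split
)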
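With `data_format="csv"` and `coalesce=True`, each split is written as a single file instead of many partition files. The sketch below mimics that outcome locally with Python's `csv` module; `write_coalesced_csv` and the `<split>.csv` naming are hypothetical and do not reflect the directory layout Hopsworks actually uses. The rows are made up.

```python
import csv
import os
import tempfile
from typing import Any, Dict, List

Row = Dict[str, Any]


def write_coalesced_csv(splits: Dict[str, List[Row]], out_dir: str) -> List[str]:
    """Write each split to exactly one CSV file (hypothetical layout: <out_dir>/<split>.csv)."""
    paths = []
    for split_name, rows in splits.items():
        path = os.path.join(out_dir, f"{split_name}.csv")
        with open(path, "w", newline="") as f:
            # Assumes every split is non-empty and all rows share the same columns.
            writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
            writer.writeheader()
            writer.writerows(rows)
        paths.append(path)
    return paths


splits = {
    "train": [{"tolls": 0.0, "total_fare": 12.5}, {"tolls": 5.5, "total_fare": 30.0}],
    "test": [{"tolls": 0.0, "total_fare": 8.0}],
}
with tempfile.TemporaryDirectory() as tmp:
    paths = write_coalesced_csv(splits, tmp)
    names = sorted(os.listdir(tmp))
    with open(os.path.join(tmp, "train.csv"), newline="") as f:
        train_rows = list(csv.DictReader(f))

print(names)          # ['test.csv', 'train.csv']
print(train_rows[1])  # {'tolls': '5.5', 'total_fare': '30.0'}
```

CSV stores everything as text, so numeric columns come back as strings unless they are parsed again on read.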

In [None]:
# Read back the split created above, using its version instead of hard-coding 1.
X_train, X_test, y_train, y_test = nyc_fares_fv.get_train_test_split(
    training_dataset_version=td_version
)

In [None]:
X_train.head(5)

In [None]:
y_train.head(5)

In [None]:
X_test.head(5)

In [None]:
y_test.head(5)

## <span style="color:#ff5f27;">⏭️ **Next:** Part 04 </span>

In the next notebook, you will train a model on the training dataset created in this notebook.