# <span style="font-weight:bold; font-size: 3rem; color:#1EB182;">**Hopsworks Feature Store** </span><span style="font-weight:bold; font-size: 3rem; color:#333;">- Part 02: Feature Pipeline</span>

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/logicalclocks/hopsworks-tutorials/blob/master/advanced_tutorials/nyc_taxi_fares/2_nyc_taxi_fares_feature_pipeline.ipynb)

## 🗒️ This notebook is divided into 2 sections:
1. Data generation.
2. Insert new data into the Feature Store.

In [None]:
!pip install -U hopsworks --quiet

# Hosted notebook environments may not have the local features package
import os

def need_download_modules():
    if 'google.colab' in str(get_ipython()):
        return True
    if 'HOPSWORKS_PROJECT_ID' in os.environ:
        return True
    return False

if need_download_modules():
    print("⚙️ Downloading modules...")
    os.system('mkdir -p features')
    os.system('cd features && wget https://raw.githubusercontent.com/logicalclocks/hopsworks-tutorials/master/advanced_tutorials/nyc_taxi_fares/features/nyc_taxi_fares.py')
    os.system('cd features && wget https://raw.githubusercontent.com/logicalclocks/hopsworks-tutorials/master/advanced_tutorials/nyc_taxi_fares/features/nyc_taxi_rides.py')
    print('✅ Done!')
else:
    print("Local environment")

### <span style='color:#ff5f27'> 📝 Imports</span>

In [None]:
import pandas as pd
from datetime import datetime
import time
import os

from features import nyc_taxi_rides, nyc_taxi_fares

# Mute warnings
import warnings
warnings.filterwarnings("ignore")

___

## <span style="color:#ff5f27;"> 🪄 Generating new data</span>

### <span style='color:#ff5f27'> 🚖 Rides Data</span>

In [None]:
df_rides = nyc_taxi_rides.generate_rides_data(150)
df_rides

In [None]:
# Calculate Distance Features
df_rides = nyc_taxi_rides.calculate_distance_features(df_rides)

# Calculate Datetime Features
df_rides = nyc_taxi_rides.calculate_datetime_features(df_rides)
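
The distance step lives in `features/nyc_taxi_rides.py` and is not shown in this notebook. As a rough illustration of what a distance feature can look like, here is a minimal haversine (great-circle) sketch. The function name `haversine_km` and the sample coordinates are assumptions for illustration and are not taken from the tutorial module, which may compute its distances differently.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Hypothetical pickup/dropoff pair: roughly Times Square to roughly JFK airport
trip_km = haversine_km(40.7580, -73.9855, 40.6413, -73.7781)
print(round(trip_km, 1))
```

A function like this can be applied row by row to pickup and dropoff columns, or vectorised with NumPy when the DataFrame is large.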

In [None]:
# Save the newly generated ride_ids;
# they are reused below when generating the fares data
ride_ids = df_rides.ride_id

In [None]:
for col in ["passenger_count", "taxi_id", "driver_id"]:
    df_rides[col] = df_rides[col].astype("int64")
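
The cast above keeps the integer columns at a fixed dtype from run to run, presumably so that they match the schema of the existing feature group. The same cast can be written as a single `astype` call with a column-to-dtype mapping. This is a minimal sketch on a toy frame; the sample values and the `INT_COLUMNS` name are assumptions for illustration.

```python
import pandas as pd

# Toy stand-in for df_rides; values are made up for illustration
toy_rides = pd.DataFrame(
    {
        "ride_id": ["a1", "b2", "c3"],
        "passenger_count": [1.0, 2.0, 4.0],
        "taxi_id": [11.0, 12.0, 13.0],
        "driver_id": [101.0, 102.0, 103.0],
    }
)

INT_COLUMNS = ["passenger_count", "taxi_id", "driver_id"]

# One astype call with a column -> dtype mapping instead of a loop
toy_rides = toy_rides.astype({col: "int64" for col in INT_COLUMNS})

print(toy_rides.dtypes)
```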

### <span style='color:#ff5f27'> 💸 Fares Data</span>

In [None]:
df_fares = nyc_taxi_fares.generate_fares_data(150)
df_fares

In [None]:
# Cast every generated column to int64 (ride_id is added in the next cell)
df_fares = df_fares.astype("int64")

In [None]:
# Attach the ride_ids that were created moments ago for rides_fg
df_fares["ride_id"] = ride_ids
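
Assigning a pandas Series as a column aligns on index labels, not on position. The assignment above therefore relies on `df_rides` and `df_fares` sharing the same index (for example a default `RangeIndex` of equal length), which is assumed here. A small sketch with made-up values shows what happens when the indexes differ, and how `to_numpy()` forces positional assignment:

```python
import pandas as pd

# Made-up ride ids with a non-default index
ride_ids = pd.Series(["r1", "r2", "r3"], index=[10, 11, 12])
fares = pd.DataFrame({"total_fare": [12.5, 30.0, 8.0]})  # index 0, 1, 2

# Label-based alignment: no index labels match, so every value is missing
aligned = fares.copy()
aligned["ride_id"] = ride_ids

# Positional assignment: drop the index by converting to a NumPy array
positional = fares.copy()
positional["ride_id"] = ride_ids.to_numpy()

print(aligned["ride_id"].isna().all())
print(positional["ride_id"].tolist())
```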

In [None]:
# Store the monetary columns as doubles
for col in ["tolls", "total_fare"]:
    df_fares[col] = df_fares[col].astype("double")
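
Once both DataFrames are prepared, it can be worth checking that they are consistent with each other before uploading anything. The helper below is a hypothetical sketch, not part of the tutorial modules or of the Hopsworks API; it only assumes that both frames carry a `ride_id` column, as they do above.

```python
import pandas as pd


def check_ride_ids(df_rides: pd.DataFrame, df_fares: pd.DataFrame) -> list:
    """Return a list of human-readable problems; an empty list means the frames look consistent."""
    problems = []
    for name, df in (("rides", df_rides), ("fares", df_fares)):
        if df["ride_id"].isna().any():
            problems.append(f"{name}: missing ride_id values")
        if df["ride_id"].duplicated().any():
            problems.append(f"{name}: duplicate ride_id values")
    if set(df_rides["ride_id"].dropna()) != set(df_fares["ride_id"].dropna()):
        problems.append("rides and fares do not cover the same ride_id values")
    return problems


# Toy frames with made-up values
good_rides = pd.DataFrame({"ride_id": ["r1", "r2"], "passenger_count": [1, 2]})
good_fares = pd.DataFrame({"ride_id": ["r1", "r2"], "total_fare": [10.0, 20.0]})
bad_fares = pd.DataFrame({"ride_id": ["r1", None], "total_fare": [10.0, 20.0]})

print(check_ride_ids(good_rides, good_fares))
print(check_ride_ids(good_rides, bad_fares))
```

In this notebook the call would be `check_ride_ids(df_rides, df_fares)`, and a non-empty result would be a reason to stop before inserting.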

___

## <span style="color:#ff5f27;"> 📡 Connecting to the Hopsworks Feature Store </span>

In [None]:
import hopsworks

project = hopsworks.login()

fs = project.get_feature_store()

In [None]:
# Retrieve the feature groups by name and version
rides_fg = fs.get_or_create_feature_group(
    name="nyc_taxi_rides",
    version=1,
)

fares_fg = fs.get_or_create_feature_group(
    name="nyc_taxi_fares",
    version=1,
)

---

## <span style="color:#ff5f27;">⬆️ Uploading new data to the Feature Store</span>

In [None]:
rides_fg.insert(df_rides)

In [None]:
fares_fg.insert(df_fares)

---

## <span style="color:#ff5f27;">⏭️ **Next:** Part 03: Training Pipeline </span>

In the next notebook, you will create a feature view and a training dataset, train a model, and save it to the Hopsworks Model Registry.

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/logicalclocks/hopsworks-tutorials/blob/master/advanced_tutorials/nyc_taxi_fares/3_nyc_taxi_fares_training_pipeline.ipynb)