# <span style="font-weight:bold; font-size: 3rem; color:#1EB182;">**Hopsworks Feature Store** </span><span style="font-weight:bold; font-size: 3rem; color:#333;">- Part 02: Feature Pipeline</span>


## 🗒️ This notebook is divided into 2 sections:
1. Parse Data.
2. Insert new data into the Feature Store.

### <span style='color:#ff5f27'> 📝 Imports</span>

In [None]:
from datetime import timedelta, datetime
import pandas as pd
import os

from features import (
    citibike, 
    meteorological_measurements,
)

# Mute warnings
import warnings
warnings.filterwarnings("ignore")

---

## <span style="color:#ff5f27;"> 📡 Connecting to Hopsworks Feature Store </span>

In [None]:
import hopsworks

project = hopsworks.login()

fs = project.get_feature_store()

In [None]:
# Retrieve citibike_usage and meteorological_measurements feature groups
citibike_usage_fg = fs.get_or_create_feature_group(
    name="citibike_usage",
    version=1,
)

meteorological_measurements_fg = fs.get_or_create_feature_group(
    name="meteorological_measurements",
    version=1,
)

### <span style="color:#ff5f27;">📅 Getting the last date</span>


In [None]:
# Get the last date in the 'citibike_usage_fg' Feature Group
last_date = citibike.get_last_date_in_fg(citibike_usage_fg)
last_date

In [None]:
# Get the next date after 'last_date' using the 'get_next_date' function
next_date = citibike.get_next_date(last_date).split("-")
next_date

In [None]:
# Extract the target year and month from the next date
target_year, target_month = int(next_date[0]), int(next_date[1])

print(f"Let's download citibike data for {target_month}/{target_year}")
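The three cells above derive the month to download from the last date already stored in the feature group. A minimal sketch of that logic, assuming `get_next_date` returns the day after `last_date` as a `YYYY-MM-DD` string (the real helper in the `citibike` module may behave differently); `next_day` is a hypothetical stand-in:

```python
from datetime import datetime, timedelta


def next_day(last_date: str) -> str:
    """Hypothetical stand-in for citibike.get_next_date: the day after last_date."""
    parsed = datetime.strptime(last_date, "%Y-%m-%d")
    return (parsed + timedelta(days=1)).strftime("%Y-%m-%d")


# Split the result the same way the notebook does and extract year and month
next_date = next_day("2023-04-30").split("-")
target_year, target_month = int(next_date[0]), int(next_date[1])
print(f"Let's download citibike data for {target_month}/{target_year}")
# prints "Let's download citibike data for 5/2023"
```

Under this assumption the month only advances when `last_date` is the final day of a month, which presumably holds here because each run downloads a whole month of data.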

---

## <span style="color:#ff5f27;"> 🪄 Parsing new data</span>

### <span style="color:#ff5f27;"> 🚲 Citibike usage info</span>

In [None]:
# Download Citibike data for the specified month and year
df_raw_batch = citibike.get_citibike_data(
    f"{target_month}/{target_year}", 
    f"{target_month}/{target_year}",
)
df_raw_batch.head(3)

In [None]:
# Engineer Citibike features for the downloaded batch of data
df_enhanced_batch = citibike.engineer_citibike_features(
    df_raw_batch,
)

In [None]:
# Drop rows with missing values in the enhanced batch DataFrame
df_enhanced_batch = df_enhanced_batch.dropna()

# Convert 'station_id' to string type for categorical representation
df_enhanced_batch.station_id = df_enhanced_batch.station_id.astype(str)

# Display the last three rows of the enhanced batch DataFrame
df_enhanced_batch.tail(3)
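The cleaning cell above drops incomplete rows and casts `station_id` to strings so it is treated as a categorical identifier rather than a number. A small illustration on toy data (the values and the `users_count` column name are made up for the example):

```python
import pandas as pd

# Toy stand-in for df_enhanced_batch; values and "users_count" are illustrative only
df = pd.DataFrame(
    {
        "date": ["2023-05-01", "2023-05-01", "2023-05-02"],
        "station_id": [72, 79, 82],
        "users_count": [15.0, None, 22.0],
    }
)

# Drop any row that contains a missing value
df = df.dropna()

# Store station IDs as strings so they behave as categorical identifiers
df["station_id"] = df["station_id"].astype(str)

print(df)
```

The notebook's attribute-style assignment (`df.station_id = ...`) works because the column already exists; bracket indexing, as used here, is the form that also creates new columns.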

### <span style="color:#ff5f27;"> 🌤 Meteorological measurements from VisualCrossing</span>

To parse weather data, you need an API key from [VisualCrossing](https://www.visualcrossing.com/). You can get one from [this page](https://www.visualcrossing.com/weather-api).

#### Don't forget to create a `.env` configuration file in this directory to store all the required environment variables:

`WEATHER_API_KEY = "YOUR_API_KEY"`

> If you create this file after starting the notebook, restart the Python kernel, because `functions.py` will not have these variables in its namespace.

![](images/api_keys_env_file.png)
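The helper code presumably reads `WEATHER_API_KEY` from the environment when it is imported, which is why a kernel restart is needed after editing `.env`. Projects commonly load such files with the `python-dotenv` package; the sketch below swaps in a minimal standard-library parser (`load_env_file` is hypothetical) that only handles simple `KEY = "VALUE"` lines:

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env") -> None:
    """Hypothetical minimal .env loader: copies KEY = "VALUE" lines into os.environ."""
    env_path = Path(path)
    if not env_path.exists():
        return
    for raw_line in env_path.read_text().splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments and lines without an assignment
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        # Strip surrounding whitespace and quotes from the value
        value = value.strip().strip('"').strip("'")
        # Do not overwrite variables that are already set
        os.environ.setdefault(key.strip(), value)


load_env_file()
api_key = os.getenv("WEATHER_API_KEY")
print("API key found" if api_key else "WEATHER_API_KEY is not set")
```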

In [None]:
# Convert the 'date' column to string type
df_enhanced_batch.date = df_enhanced_batch.date.astype(str)

# Find the minimum and maximum dates in the 'date' column of the enhanced batch DataFrame
start_date, end_date = df_enhanced_batch.date.min(), df_enhanced_batch.date.max()
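Taking `min()` and `max()` after casting `date` to strings still yields the true date range, because zero-padded `YYYY-MM-DD` strings sort lexicographically in the same order as the dates they represent. A quick check on toy values:

```python
import pandas as pd

# Toy date column; zero-padded ISO format is what makes string ordering safe
dates = pd.Series(["2023-05-09", "2023-05-10", "2023-04-30"])

start_date, end_date = dates.min(), dates.max()
print(start_date, end_date)  # prints "2023-04-30 2023-05-10"

# Without zero-padding the shortcut breaks: "2023-5-9" sorts after "2023-5-10"
print(max(["2023-5-9", "2023-5-10"]))  # prints "2023-5-9"
```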

In [None]:
# Get weather data for New York City within the date range of the enhanced batch DataFrame
df_weather_batch = meteorological_measurements.get_weather_data(
    city="nyc", 
    start_date=start_date, 
    end_date=end_date,
)
df_weather_batch.tail(3)
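The internals of `get_weather_data` are not shown here, and how it maps `city="nyc"` to a location is an open detail. VisualCrossing's Timeline API takes a location and a date range in the URL path; a sketch of how such a request URL could be built, assuming daily metric data in JSON (`build_timeline_url` is hypothetical and makes no network call):

```python
from urllib.parse import quote, urlencode

# Base URL of VisualCrossing's Timeline Weather API
BASE_URL = (
    "https://weather.visualcrossing.com/VisualCrossingWebServices"
    "/rest/services/timeline"
)


def build_timeline_url(location: str, start_date: str, end_date: str, api_key: str) -> str:
    """Hypothetical helper: build a Timeline API URL for daily data in a date range."""
    params = {
        "unitGroup": "metric",  # assumption: metric units
        "include": "days",      # one record per day
        "contentType": "json",
        "key": api_key,
    }
    # Percent-encode the location (spaces, commas) for use in the URL path
    path = f"{quote(location)}/{start_date}/{end_date}"
    return f"{BASE_URL}/{path}?{urlencode(params)}"


url = build_timeline_url("New York, NY", "2023-05-01", "2023-05-31", "YOUR_API_KEY")
print(url)
```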

In [None]:
# Cast the 'snowdepth' and 'snow' columns to float64 ('double')
for column in ["snowdepth", "snow"]:
    df_weather_batch[column] = df_weather_batch[column].astype("double")
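`astype("double")` relies on NumPy's `"double"` alias for 64-bit floats. The cast presumably keeps these columns consistent with a float type in the feature group schema even when a batch arrives as integers (for example, a month with little or no snow). A toy illustration:

```python
import numpy as np
import pandas as pd

# "double" is NumPy's alias for a 64-bit float
assert np.dtype("double") == np.float64

# Toy weather batch in which both columns came back as integers
df_weather = pd.DataFrame({"snow": [0, 0, 1], "snowdepth": [0, 2, 3]})
print(df_weather.dtypes)  # integer dtypes

for column in ["snowdepth", "snow"]:
    df_weather[column] = df_weather[column].astype("double")

print(df_weather.dtypes)  # float64 for both columns
```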

In [None]:
# Add a Unix 'timestamp' column, derived from 'date', to both DataFrames
df_enhanced_batch["timestamp"] = df_enhanced_batch["date"].apply(
    meteorological_measurements.convert_date_to_unix
)
df_weather_batch["timestamp"] = df_weather_batch["date"].apply(
    meteorological_measurements.convert_date_to_unix
)
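`convert_date_to_unix` comes from the project's `meteorological_measurements` module and its exact behaviour is not shown. A plausible sketch, assuming it turns a `YYYY-MM-DD` string into milliseconds since the Unix epoch at UTC midnight (the real helper may use seconds or local time instead); `date_to_unix_ms` is hypothetical:

```python
from datetime import datetime, timezone


def date_to_unix_ms(date_str: str) -> int:
    """Hypothetical stand-in for convert_date_to_unix: UTC midnight in epoch milliseconds."""
    dt = datetime.strptime(date_str, "%Y-%m-%d").replace(tzinfo=timezone.utc)
    return int(dt.timestamp() * 1000)


print(date_to_unix_ms("2023-01-01"))  # prints 1672531200000
```

Applied as `df["timestamp"] = df["date"].apply(date_to_unix_ms)`, this mirrors the notebook cell and gives both DataFrames a comparable numeric time column.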

---

## <span style="color:#ff5f27;">⬆️ Uploading new data to the Feature Store</span>

In [None]:
# Insert the new Citibike usage rows into the 'citibike_usage' feature group
citibike_usage_fg.insert(df_enhanced_batch)

In [None]:
# Insert the new weather rows into the 'meteorological_measurements' feature group
meteorological_measurements_fg.insert(df_weather_batch)

---

## <span style="color:#ff5f27;">⏭️ **Next:** Part 03: Training Pipeline </span>

In the next notebook you will create a feature view and a training dataset, train a model, and register it in the Hopsworks Model Registry.
