# <span style="font-weight:bold; font-size: 3rem; color:#1EB182;">**Hopsworks Feature Store** </span><span style="font-weight:bold; font-size: 3rem; color:#333;">- Part 02: Feature Pipeline</span>

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/logicalclocks/hopsworks-tutorials/blob/master/advanced_tutorials/electricity/2_feature_pipeline.ipynb)

## 🗒️ This notebook is divided into 2 sections:
1. **Parse Data**.
2. **Insert new data into the Feature Store**.

### <span style='color:#ff5f27'> 📝 Imports</span>

In [None]:
!pip install -U hopsworks --quiet

In [None]:
from datetime import timedelta, datetime
import pandas as pd

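# Import only the helpers this notebook uses
from functions import (
    get_last_date_in_fg,
    get_next_date,
    get_citibike_data,
    engineer_citibike_features,
    get_weather_data,
    convert_date_to_unix,
)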

import warnings

# Mute warnings
warnings.filterwarnings("ignore")

---

## <span style="color:#ff5f27;"> 📡 Connecting to Hopsworks Feature Store </span>

In [None]:
import hopsworks
project = hopsworks.login()
fs = project.get_feature_store()

In [None]:
citibike_usage_fg = fs.get_or_create_feature_group(
    name="citibike_usage",
    version=1
)

In [None]:
meteorological_measurements_fg = fs.get_or_create_feature_group(
    name="meteorological_measurements",
    version=1
)

### <span style="color:#ff5f27;">📅 Getting the last date</span>


In [None]:
last_date = get_last_date_in_fg(citibike_usage_fg)

In [None]:
last_date

In [None]:
next_date = get_next_date(last_date)

In [None]:
next_date = next_date.split("-")
next_date

In [None]:
target_year, target_month = int(next_date[0]), int(next_date[1])

In [None]:
print(f"So, now let's download citibike data for {target_month}/{target_year}")
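
The helpers `get_last_date_in_fg` and `get_next_date` are defined in `functions` and are not shown in this notebook. As a rough illustration of the month rollover such a step needs, here is a minimal sketch. The function name `next_month` and the `YYYY-MM-DD` input format are assumptions made for this example, not the tutorial's actual implementation.

```python
def next_month(last_date: str) -> str:
    """Return the month following `last_date` as 'YYYY-MM'.

    Hypothetical helper: assumes `last_date` starts with 'YYYY-MM'.
    """
    year, month = (int(part) for part in last_date.split("-")[:2])
    # After December, roll over to January of the following year.
    if month == 12:
        return f"{year + 1}-01"
    return f"{year}-{month + 1:02d}"


print(next_month("2023-11-30"))  # 2023-12
print(next_month("2023-12-31"))  # 2024-01
```

Splitting the result on `"-"` then yields the year and month as separate strings, which matches how `target_year` and `target_month` are derived above.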

---

## <span style="color:#ff5f27;"> 🪄 Parsing new data</span>

### <span style="color:#ff5f27;"> 🚲 Citibike usage info</span>

In [None]:
# get new month data
df_raw_batch = get_citibike_data(f"{target_month}/{target_year}", f"{target_month}/{target_year}")

In [None]:
df_raw_batch

In [None]:
df_enhanced_batch = engineer_citibike_features(df_raw_batch)

In [None]:
df_enhanced_batch = df_enhanced_batch.dropna()

In [None]:
df_enhanced_batch.station_id = df_enhanced_batch.station_id.astype(str)

In [None]:
df_enhanced_batch.tail(3)

### <span style="color:#ff5f27;"> 🌤 Meteorological measurements from VisualCrossing</span>

### To fetch weather data, you need an API key from [VisualCrossing](https://www.visualcrossing.com/). You can get one via [this link](https://www.visualcrossing.com/weather-api).

### Don't forget to create a `.env` configuration file that stores all the necessary environment variables (API keys):
![](images/api_keys_env_file.png)
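
How the tutorial's `functions` module reads the `.env` file is not shown here. The sketch below is one possible way to load `KEY=VALUE` pairs using only the standard library. The helper name `load_env_file` and the variable name `WEATHER_API_KEY` used in the comment are hypothetical, so check the image above for the names the tutorial actually expects.

```python
import os


def load_env_file(path: str = ".env") -> dict:
    """Parse KEY=VALUE lines from a .env file into a dict (hypothetical helper)."""
    values = {}
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            # Skip blank lines, comments, and lines without an assignment.
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            # Drop surrounding whitespace and optional quotes around the value.
            values[key.strip()] = value.strip().strip("'\"")
    return values


# Export the variables to the process environment if the file exists,
# e.g. a line such as WEATHER_API_KEY=... (hypothetical variable name).
if os.path.exists(".env"):
    os.environ.update(load_env_file(".env"))
```

Keeping the key in `.env` rather than in the notebook avoids committing it to version control along with the code.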

In [None]:
df_enhanced_batch.date = df_enhanced_batch.date.astype(str)

start_date, end_date = df_enhanced_batch.date.min(), df_enhanced_batch.date.max()

In [None]:
df_weather_batch = get_weather_data(city="nyc", start_date=start_date, end_date=end_date)

In [None]:
df_weather_batch.tail(5)

In [None]:
# let's fix datatypes: cast the snow columns to double
for column in ["snowdepth", "snow"]:
    df_weather_batch[column] = df_weather_batch[column].astype("double")

In [None]:
# create Unix timestamp columns from the date strings

df_enhanced_batch["timestamp"] = df_enhanced_batch["date"].apply(convert_date_to_unix)
df_weather_batch["timestamp"] = df_weather_batch["date"].apply(convert_date_to_unix)
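
`convert_date_to_unix` also comes from `functions`, so its exact behaviour is not visible in this notebook. A minimal sketch of such a conversion is shown below, under two assumptions: the date strings look like `YYYY-MM-DD`, and the timestamp is expressed in milliseconds since the Unix epoch in UTC. The name `date_to_unix_ms` is hypothetical.

```python
from datetime import datetime, timezone


def date_to_unix_ms(date_str: str) -> int:
    """Convert a 'YYYY-MM-DD' string to milliseconds since the Unix epoch (UTC).

    Hypothetical stand-in for the tutorial's `convert_date_to_unix`.
    """
    # Pin the parsed date to UTC so the result does not depend on the local timezone.
    dt = datetime.strptime(date_str, "%Y-%m-%d").replace(tzinfo=timezone.utc)
    return int(dt.timestamp() * 1000)


print(date_to_unix_ms("1970-01-02"))  # 86400000
```

Applying the same conversion to both DataFrames keeps their `timestamp` columns comparable when the two feature groups are joined later.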

---

## <span style="color:#ff5f27;">⬆️ Uploading new data to the Feature Store</span>

In [None]:
citibike_usage_fg.insert(df_enhanced_batch, write_options={"wait_for_job": False})

In [None]:
meteorological_measurements_fg.insert(df_weather_batch, write_options={"wait_for_job": False})

---

## <span style="color:#ff5f27;">⏭️ **Next:** Part 03 </span>

In the next notebook, you will create a feature view and training dataset.