# <span style="font-weight:bold; font-size: 3rem; color:#1EB182;">**Hopsworks Feature Store** </span> <span style="font-weight:bold; font-size: 3rem; color:#333;">- Part 01: Backfill Features to the Feature Store</span>

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/logicalclocks/hopsworks-tutorials/blob/master/advanced_tutorials/electricity/1_backfill_feature_groups.ipynb)

**Note**: You may see an error when installing hopsworks on Colab; it is safe to ignore.

This is the first part of the advanced series of tutorials about the Hopsworks Feature Store. In this first module, you will work with Citi Bike trip data from New York City.

The objective of this tutorial is to demonstrate how to work with the **Hopsworks Feature Store** for batch data, with the goal of training and deploying a model that predicts the number of Citi Bike users per station.

## 🗒️ This notebook is divided into 3 sections:
1. **Loading the data and feature engineering**,
2. **Connecting to the Hopsworks feature store**,
3. **Creating feature groups and uploading them to the feature store**.

### <span style='color:#ff5f27'> 📝 Imports </span>

In [1]:
!pip install -U hopsworks --quiet

In [1]:
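import pandas as pd

# Helper from the tutorial's functions.py that downloads and prepares the Citi Bike data
from functions import get_citibike_data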

---

## <span style="color:#ff5f27;"> 🚲 Load the Citi Bike Trip Histories</span>

Downloadable files of Citi Bike trip data are located [here](https://s3.amazonaws.com/tripdata/index.html). The data includes:

- Ride ID
- Rideable type
- Started at
- Ended at
- Start station name
- Start station ID
- End station name
- End station ID
- Start latitude
- Start longitude
- End latitude
- End longitude
- Member or casual ride
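
The `get_citibike_data` helper comes from the tutorial's `functions` module, whose code is not shown in this notebook. As a rough sketch of how raw trip records could be reduced to daily per-station counts, the following assumes the CSV columns `started_at` and `start_station_id`. The function name `daily_users_per_station` is hypothetical and is not the helper's actual implementation.

```python
import pandas as pd


def daily_users_per_station(trips: pd.DataFrame) -> pd.DataFrame:
    """Count trips per start station per day (hypothetical helper)."""
    # Work on an explicit copy so later assignments do not trigger
    # pandas' SettingWithCopyWarning.
    res = trips[["started_at", "start_station_id"]].copy()
    # Truncate each start time to midnight of its day.
    res["date"] = pd.to_datetime(res["started_at"]).dt.floor("d")
    # Trips without a start station cannot be attributed to a station.
    res = res.dropna(subset=["start_station_id"])
    counts = (
        res.groupby(["date", "start_station_id"])
        .size()
        .reset_index(name="users_count")
        .rename(columns={"start_station_id": "station_id"})
    )
    return counts


# Tiny made-up sample, only to illustrate the shape of the result.
trips = pd.DataFrame(
    {
        "started_at": [
            "2022-01-01 08:15:00",
            "2022-01-01 17:40:00",
            "2022-01-02 09:00:00",
        ],
        "start_station_id": ["2782.02", "2782.02", "2832.03"],
    }
)
daily = daily_users_per_station(trips)
print(daily)
```

The result has one row per (date, station) pair, which matches the `date`, `station_id`, `users_count` layout used in the rest of this notebook.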



In [2]:
df = get_citibike_data("12/2021", "01/2022")

_____ Processing 12/2021... _____
Downloading Started...⏳
Downloading Completed!👌
Retrieving DataFrame from the csv file...💿
________________________________


  original_df = pd.read_csv(filename)
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  res.started_at = pd.to_datetime(res.started_at)
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  res.started_at = res.started_at.dt.floor('d')


data/202112-citibike-tripdata.csv
_____ Processing 01/2022... _____
Downloading Started...⏳
Downloading Completed!👌
Retrieving DataFrame from the csv file...💿
________________________________


  original_df = pd.read_csv(filename)
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  res.started_at = pd.to_datetime(res.started_at)
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  res.started_at = res.started_at.dt.floor('d')


data/202201-citibike-tripdata.csv


In [3]:
df

| | date | station_id | users_count |
|---|---|---|---|
| 0 | 2022-01-01 | 2782.02 | 2 |
| 1 | 2022-01-01 | 2832.03 | 3 |
| 2 | 2022-01-01 | 2912.08 | 1 |
| 3 | 2022-01-01 | 2932.01 | 2 |
| 4 | 2022-01-01 | 2961.05 | 1 |
| ... | ... | ... | ... |
| 81572 | 2021-12-31 | 8778.01 | 1 |
| 81573 | 2021-12-31 | 8782.01 | 2 |
| 81574 | 2021-12-31 | 8795.01 | 2 |
| 81575 | 2021-12-31 | 8795.03 | 2 |


In [4]:
print("\033[4mNUMBER OF NULL VALUES PER COLUMN:\033[0m")

print(df.isnull().sum())

NUMBER OF NULL VALUES PER COLUMN:
date           0
station_id     0
users_count    0
dtype: int64


---

## <span style="color:#ff5f27;"> 📡 Connecting to Hopsworks Feature Store </span>

In [5]:
import hopsworks
project = hopsworks.login()
fs = project.get_feature_store()

Copy your Api Key (first register/login): https://c.app.hopsworks.ai/account/api/generated

Paste it here: [API key redacted]
Connected. Call `.close()` to terminate connection gracefully.

Logged in to project, explore it here https://c.app.hopsworks.ai:443/p/3235




Connected. Call `.close()` to terminate connection gracefully.
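
Pasting the key into the prompt can leave it in the notebook output. One way to avoid that, sketched here, is to read the key from an environment variable and pass it through the `api_key_value` argument of `hopsworks.login()`. The variable name `HOPSWORKS_API_KEY` and the helper `login_kwargs` are choices made for this example.

```python
import os


def login_kwargs(env_var: str = "HOPSWORKS_API_KEY") -> dict:
    """Build keyword arguments for hopsworks.login() from an environment variable.

    Returns an empty dict when the variable is unset, so login() falls back
    to its interactive prompt.
    """
    api_key = os.environ.get(env_var)
    return {"api_key_value": api_key} if api_key else {}


# Usage (needs network access and a Hopsworks account, so it is left commented out):
# import hopsworks
# project = hopsworks.login(**login_kwargs())
# fs = project.get_feature_store()
```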


---

## <span style="color:#ff5f27;"> 🪄 Creating Feature Groups </span>

A [feature group](https://docs.hopsworks.ai/feature-store-api/latest/generated/feature_group/) can be seen as a collection of conceptually related features. In this case, you will create a single feature group that holds the daily Citi Bike user counts per station.

In [26]:
citibike_stations_fg = fs.get_or_create_feature_group(
    name="citibike_stations",
    version=1,
    description="Daily Citi Bike user counts per station across NYC.",
    primary_key=["date", "station_id"],
    online_enabled=True
)
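
Because `date` and `station_id` together form the primary key, each (date, station) pair should appear only once in the DataFrame before it is inserted. The check below is a small sketch using pandas; `find_duplicate_keys` is a hypothetical helper and not part of the Hopsworks API.

```python
import pandas as pd


def find_duplicate_keys(frame: pd.DataFrame, keys: list) -> pd.DataFrame:
    """Return every row whose primary-key combination occurs more than once."""
    return frame[frame.duplicated(subset=keys, keep=False)]


# Made-up sample in which the first and last rows share the same key.
sample = pd.DataFrame(
    {
        "date": pd.to_datetime(["2022-01-01", "2022-01-01", "2022-01-01"]),
        "station_id": ["2782.02", "2832.03", "2782.02"],
        "users_count": [2, 3, 1],
    }
)
duplicates = find_duplicate_keys(sample, ["date", "station_id"])
print(duplicates)
```

An empty result means the key columns identify each row uniquely.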

The feature group stores `date` as a timestamp, so keep that column as a datetime. Casting it to a string makes `insert()` fail with `FeatureStoreException: Features are not compatible with Feature Group schema: date (expected type: 'timestamp', derived from input: 'string') has the wrong type.`

In [24]:
# Align the column types with the feature group schema.
df["date"] = pd.to_datetime(df["date"])            # timestamp, not string
df["station_id"] = df["station_id"].astype(str)    # station IDs are identifiers, not numbers
df["users_count"] = df["users_count"].astype(int)

In [27]:
df

| | date | station_id | users_count |
|---|---|---|---|
| 0 | 2022-01-01 | 2782.02 | 2 |
| 1 | 2022-01-01 | 2832.03 | 3 |
| 2 | 2022-01-01 | 2912.08 | 1 |
| 3 | 2022-01-01 | 2932.01 | 2 |
| 4 | 2022-01-01 | 2961.05 | 1 |
| ... | ... | ... | ... |
| 81572 | 2021-12-31 | 8778.01 | 1 |
| 81573 | 2021-12-31 | 8782.01 | 2 |
| 81574 | 2021-12-31 | 8795.01 | 2 |
| 81575 | 2021-12-31 | 8795.03 | 2 |


In [25]:
citibike_stations_fg.insert(df)
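
A type mismatch such as `date` arriving as a string where the feature group expects a timestamp can be caught locally before calling `insert()`. The sketch below compares DataFrame dtypes against an expected mapping. The helper `dtype_problems` and the `EXPECTED_KINDS` mapping are assumptions made for this example, and the check uses only pandas rather than any Hopsworks call.

```python
import pandas as pd
from pandas.api import types as ptypes

# Assumed expectations for this feature group (illustrative, not read from Hopsworks).
EXPECTED_KINDS = {
    "date": ptypes.is_datetime64_any_dtype,
    "station_id": ptypes.is_string_dtype,
    "users_count": ptypes.is_integer_dtype,
}


def dtype_problems(frame: pd.DataFrame) -> list:
    """List columns that are missing or whose dtype fails the expected check."""
    problems = []
    for column, check in EXPECTED_KINDS.items():
        if column not in frame.columns:
            problems.append(f"{column}: missing")
        elif not check(frame[column]):
            problems.append(f"{column}: unexpected dtype {frame[column].dtype}")
    return problems


good = pd.DataFrame(
    {
        "date": pd.to_datetime(["2022-01-01"]),
        "station_id": ["2782.02"],
        "users_count": [2],
    }
)
# Same data, but with the date column cast to string.
bad = good.assign(date=good["date"].astype(str))

print(dtype_problems(good))
print(dtype_problems(bad))
```

Running a check like this before `insert()` gives a local, readable report instead of a server-side exception.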

---

## <span style="color:#ff5f27;">⏭️ **Next:** Part 02 </span>

In the next notebook, you will generate new data for the feature groups.