# Feature store example
---

This notebook gives an example of how to:
- Transform raw data
- Ingest data in batch
- Retrieve data from the feature store

### Importing packages

In [1]:
%load_ext autoreload
%autoreload 2

In [2]:
import os
import random
from typing import List
from datetime import datetime

import feast
import pandas as pd
from elemeno_ai_sdk.ml.features.feature_store import FeatureStore
from elemeno_ai_sdk.ml.features.feature_table import FeatureTable
from elemeno_ai_sdk.ml.features.ingest.sink.ingestion_sink_builder import IngestionSinkType





### Dataframe to ingest

In [3]:
FILE_PATH = "./example_data/datasource.csv"
raw_data = pd.read_csv(FILE_PATH)

In [4]:
raw_data.head()

| Unnamed: 0 | sepal_length | sepal_width | petal_length | petal_width | id | target | created_timestamp | event_timestamp |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | 0 | 0 | 2022-07-14 18:08:05.487499 | 2022-07-14 18:08:05.488248 |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | 1 | 0 | 2022-07-14 18:08:05.487499 | 2022-07-14 18:08:05.488248 |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | 2 | 0 | 2022-07-14 18:08:05.487499 | 2022-07-14 18:08:05.488248 |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | 3 | 0 | 2022-07-14 18:08:05.487499 | 2022-07-14 18:08:05.488248 |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | 4 | 0 | 2022-07-14 18:08:05.487499 | 2022-07-14 18:08:05.488248 |


### Creating feature store

Instantiate the `FeatureStore` object. Because we are working with a CSV file, we do not need to pass a `source_type` to the constructor; we only need to pass the `sink_type`, which in our case is `REDSHIFT`.

In [5]:
feature_store = FeatureStore(
    sink_type=IngestionSinkType.REDSHIFT
)



Now we create a `FeatureTable` object. First, we convert the id and feature columns into Feast `Entity` and `Feature` objects:

In [6]:
def get_entities(id_columns: List[str]) -> List[feast.Entity]:
    # One Feast entity per id column.
    return [feast.Entity(name=id_col) for id_col in id_columns]

def get_features(feature_list: List[str]) -> List[feast.Feature]:
    # Choose each feature's Feast value type from its column name.
    features = []
    for feature in feature_list:
        if feature in ("created_timestamp", "event_timestamp"):
            dtype = feast.ValueType.BYTES  # timestamp columns
        elif feature == "target":
            dtype = feast.ValueType.INT32  # class label
        else:
            dtype = feast.ValueType.FLOAT  # every other column
        features.append(feast.Feature(name=feature, dtype=dtype))
    return features
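
The helper above chooses a type from the column name alone, so every column other than the timestamps and `target` (including the integer `Unnamed: 0` column) is declared as a float. As a hedged alternative, the sketch below infers a type name from each column's pandas dtype. The function `infer_value_type_names` is hypothetical, and the returned strings only mirror `feast.ValueType` member names so that the example runs without Feast installed; translating them into real `feast.ValueType` members is left as an assumption.

```python
import pandas as pd
from pandas.api import types as ptypes


def infer_value_type_names(data: pd.DataFrame, id_columns: list) -> dict:
    """Map each non-id column to a type name based on its pandas dtype."""
    type_names = {}
    for column in data.columns:
        if column in id_columns:
            continue  # id columns become entities, not features
        series = data[column]
        if ptypes.is_bool_dtype(series):
            type_names[column] = "BOOL"
        elif ptypes.is_integer_dtype(series):
            type_names[column] = "INT64"
        elif ptypes.is_float_dtype(series):
            type_names[column] = "DOUBLE"
        elif ptypes.is_datetime64_any_dtype(series):
            type_names[column] = "UNIX_TIMESTAMP"
        else:
            type_names[column] = "STRING"  # fallback for object/text columns
    return type_names


# Two rows shaped like the notebook's data source (timestamps already parsed).
sample = pd.DataFrame(
    {
        "sepal_length": [5.1, 4.9],
        "id": [0, 1],
        "target": [0, 0],
        "event_timestamp": pd.to_datetime(["2022-07-14 18:08:05.488248"] * 2),
    }
)
inferred = infer_value_type_names(sample, id_columns=["id"])
print(inferred)
```

Note that `pd.read_csv` loads timestamp columns as strings unless `parse_dates` is used, in which case they would fall into the `STRING` branch here.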

In [7]:
FEATURE_TABLE_NAME = "one_blinc_example"
FEATURES = [col for col in raw_data.columns if col != "id"]
IDS = ["id"]

In [8]:
feature_table = FeatureTable(
    name=FEATURE_TABLE_NAME,
    feature_store=feature_store,
    entities=get_entities(id_columns=IDS),
    features=get_features(feature_list=FEATURES),
    online=True
)

### Ingest features

To ingest features, we can call the `ingest` method or, if we want to transform the features before ingesting them, the `transform_and_ingest` method. We pass the `feature_table` we just created (argument `ft`) and the dataframe we want to save (argument `response`), together with a list of Python functions that transform the raw data. These transformations can create new features or simply clean the data.

We can also pass two optional parameters: `renames`, which renames features, and `all_columns`, which filters the columns of your data source before they are sent to the feature store. For this example we leave both as `None`.
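
To make the two optional parameters concrete, the sketch below reproduces the kind of effect they describe using plain pandas. It is an illustration only: the helper `rename_and_filter` is hypothetical, and the order used here (filter first, then rename) is an assumption, not something this notebook confirms about the SDK.

```python
from typing import Dict, List, Optional

import pandas as pd


def rename_and_filter(
    data: pd.DataFrame,
    renames: Optional[Dict[str, str]] = None,
    all_columns: Optional[List[str]] = None,
) -> pd.DataFrame:
    """Keep only `all_columns` (if given), then apply `renames` (if given)."""
    result = data
    if all_columns is not None:
        result = result[all_columns]  # column selection by original names
    if renames is not None:
        result = result.rename(columns=renames)
    return result.copy()  # never hand back a view of the caller's frame


sample = pd.DataFrame({"id": [0, 1], "sepal_length": [5.1, 4.9], "target": [0, 0]})
selected = rename_and_filter(
    sample,
    renames={"sepal_length": "sepal_len"},  # illustrative new name
    all_columns=["id", "sepal_length"],
)
print(list(selected.columns))
```

With both arguments left as `None`, the helper returns an unchanged copy, which matches how the parameters are used in this example.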

##### Feature engineering methods:

In [9]:
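# Each function adds one column to the frame it receives and returns that same frame.
def add_columns(data: pd.DataFrame) -> pd.DataFrame:
    # Sum of sepal length and sepal width.
    data["feature_one"] = data["sepal_length"] + data["sepal_width"]
    return data

def double_column(data: pd.DataFrame) -> pd.DataFrame:
    # Twice the sepal length.
    data["feature_two"] = data["sepal_length"] * 2
    return data

def multiply_columns(data: pd.DataFrame) -> pd.DataFrame:
    # Product of sepal length and sepal width.
    data["feature_three"] = data["sepal_length"] * data["sepal_width"]
    return data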

In [10]:
feature_store.transform_and_ingest(
    ft=feature_table, 
    response=raw_data, 
    transformations=[add_columns, double_column, multiply_columns]
)
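
How `transform_and_ingest` applies the list internally is not shown in this notebook. One way to picture the transformation step, assuming the functions run in list order and each receives the previous function's output, is the fold sketched below. The helper `apply_transformations` is hypothetical; it works on a copy because the feature engineering functions add columns to the frame they are given.

```python
from functools import reduce
from typing import Callable, List

import pandas as pd

Transformation = Callable[[pd.DataFrame], pd.DataFrame]


def add_columns(data: pd.DataFrame) -> pd.DataFrame:
    data["feature_one"] = data["sepal_length"] + data["sepal_width"]
    return data


def double_column(data: pd.DataFrame) -> pd.DataFrame:
    data["feature_two"] = data["sepal_length"] * 2
    return data


def multiply_columns(data: pd.DataFrame) -> pd.DataFrame:
    data["feature_three"] = data["sepal_length"] * data["sepal_width"]
    return data


def apply_transformations(
    data: pd.DataFrame, transformations: List[Transformation]
) -> pd.DataFrame:
    """Run each transformation in order, feeding each one the previous result."""
    # Start from a copy so the caller's frame is left untouched.
    return reduce(lambda frame, fn: fn(frame), transformations, data.copy())


sample = pd.DataFrame({"sepal_length": [5.1, 4.9], "sepal_width": [3.5, 3.0]})
transformed = apply_transformations(
    sample, [add_columns, double_column, multiply_columns]
)
print(transformed)
```

Because each step sees the output of the one before it, a later function could also build on a column created earlier in the list.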

### Retrieve features

In [11]:
retrieved_data = feature_store.get_training_features(feature_table=feature_table)



In [12]:
retrieved_data

| Unnamed: 0 | sepal_length | sepal_width | petal_length | petal_width | id | target | created_timestamp | event_timestamp | feature_one | feature_two | feature_three |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 82 | 5.8 | 2.7 | 3.9 | 1.2 | 82 | 1 | 2022-07-14 18:08:05.487499 | 2022-07-14 18:08:05.488248 | 8.5 | 11.6 | 15.66 |
| 81 | 5.5 | 2.4 | 3.7 | 1.0 | 81 | 1 | 2022-07-14 18:08:05.487499 | 2022-07-14 18:08:05.488248 | 7.9 | 11.0 | 13.20 |
| 80 | 5.5 | 2.4 | 3.8 | 1.1 | 80 | 1 | 2022-07-14 18:08:05.487499 | 2022-07-14 18:08:05.488248 | 7.9 | 11.0 | 13.20 |
| 79 | 5.7 | 2.6 | 3.5 | 1.0 | 79 | 1 | 2022-07-14 18:08:05.487499 | 2022-07-14 18:08:05.488248 | 8.3 | 11.4 | 14.82 |
| 78 | 6.0 | 2.9 | 4.5 | 1.5 | 78 | 1 | 2022-07-14 18:08:05.487499 | 2022-07-14 18:08:05.488248 | 8.9 | 12.0 | 17.40 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 146 | 6.3 | 2.5 | 5.0 | 1.9 | 146 | 2 | 2022-07-14 18:08:05.487499 | 2022-07-14 18:08:05.488248 | 8.8 | 12.6 | 15.75 |
| 145 | 6.7 | 3.0 | 5.2 | 2.3 | 145 | 2 | 2022-07-14 18:08:05.487499 | 2022-07-14 18:08:05.488248 | 9.7 | 13.4 | 20.10 |
| 144 | 6.7 | 3.3 | 5.7 | 2.5 | 144 | 2 | 2022-07-14 18:08:05.487499 | 2022-07-14 18:08:05.488248 | 10.0 | 13.4 | 22.11 |
| 143 | 6.8 | 3.2 | 5.9 | 2.3 | 143 | 2 | 2022-07-14 18:08:05.487499 | 2022-07-14 18:08:05.488248 | 10.0 | 13.6 | 21.76 |
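
A quick way to confirm that the engineered columns survived the round trip is to recompute them from the raw columns and compare. The sketch below does this on two rows copied from the output above (ids 82 and 146); the helper `check_engineered_features` is hypothetical and not part of the SDK.

```python
import numpy as np
import pandas as pd


def check_engineered_features(data: pd.DataFrame) -> bool:
    """Return True if the three engineered columns match their definitions."""
    expected_one = data["sepal_length"] + data["sepal_width"]
    expected_two = data["sepal_length"] * 2
    expected_three = data["sepal_length"] * data["sepal_width"]
    # allclose tolerates the small rounding differences of float arithmetic.
    return bool(
        np.allclose(data["feature_one"], expected_one)
        and np.allclose(data["feature_two"], expected_two)
        and np.allclose(data["feature_three"], expected_three)
    )


# Two rows taken from the retrieved output shown above.
rows = pd.DataFrame(
    {
        "id": [82, 146],
        "sepal_length": [5.8, 6.3],
        "sepal_width": [2.7, 2.5],
        "feature_one": [8.5, 8.8],
        "feature_two": [11.6, 12.6],
        "feature_three": [15.66, 15.75],
    }
)
features_ok = check_engineered_features(rows)
print(features_ok)
```

In the notebook itself, the same function could be called on `retrieved_data` directly.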
