# Feature store example
---

This notebook walks through an example of how we:
- Transform raw data
- Ingest data in batch
- Retrieve data from the feature store

### Importing packages

In [1]:
%load_ext autoreload
%autoreload 2

In [2]:
import os
import random
from typing import List
from datetime import datetime

import feast
import pandas as pd
from elemeno_ai_sdk.ml.features.feature_store import FeatureStore
from elemeno_ai_sdk.ml.features.feature_table import FeatureTable
from elemeno_ai_sdk.ml.features.ingest.sink.ingestion_sink_builder import IngestionSinkType
from elemeno_ai_sdk.ml.features.ingest.source.ingestion_source_builder import IngestionSourceType



### Dataframe to ingest

In [3]:
FILE_PATH = "./example_data/datasource.csv"
raw_data = pd.read_csv(FILE_PATH)

In [4]:
raw_data.head()

Unnamed: 0,sepal_length,sepal_width,petal_length,petal_width,id,target,created_timestamp,event_timestamp
0,5.1,3.5,1.4,0.2,0,0,2022-07-14 18:08:05.487499,2022-07-14 18:08:05.488248
1,4.9,3.0,1.4,0.2,1,0,2022-07-14 18:08:05.487499,2022-07-14 18:08:05.488248
2,4.7,3.2,1.3,0.2,2,0,2022-07-14 18:08:05.487499,2022-07-14 18:08:05.488248
3,4.6,3.1,1.5,0.2,3,0,2022-07-14 18:08:05.487499,2022-07-14 18:08:05.488248
4,5.0,3.6,1.4,0.2,4,0,2022-07-14 18:08:05.487499,2022-07-14 18:08:05.488248
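Before ingesting, it can help to confirm that the timestamp columns parse as datetimes and that the `id` column is unique. The following is a minimal sketch, with two of the rows shown above inlined as a string so it runs without the CSV file. The `check_schema` helper is hypothetical and not part of the SDK.

```python
import io

import pandas as pd

# Two rows copied from the head() output above, inlined so the sketch runs without the file.
SAMPLE_CSV = """Unnamed: 0,sepal_length,sepal_width,petal_length,petal_width,id,target,created_timestamp,event_timestamp
0,5.1,3.5,1.4,0.2,0,0,2022-07-14 18:08:05.487499,2022-07-14 18:08:05.488248
1,4.9,3.0,1.4,0.2,1,0,2022-07-14 18:08:05.487499,2022-07-14 18:08:05.488248
"""

TIMESTAMP_COLS = ["created_timestamp", "event_timestamp"]


def check_schema(data: pd.DataFrame) -> list:
    """Return a list of human-readable problems found in the frame (hypothetical helper)."""
    problems = []
    for col in TIMESTAMP_COLS:
        if col not in data.columns:
            problems.append(f"missing column: {col}")
        elif not pd.api.types.is_datetime64_any_dtype(data[col]):
            problems.append(f"{col} is not a datetime column")
    if "id" not in data.columns:
        problems.append("missing column: id")
    elif data["id"].duplicated().any():
        problems.append("duplicate ids")
    return problems


# parse_dates asks pandas to convert the two timestamp columns while reading.
sample = pd.read_csv(io.StringIO(SAMPLE_CSV), parse_dates=TIMESTAMP_COLS)
print(check_schema(sample))
```

Reading the same text without `parse_dates` leaves both timestamp columns as strings, which the helper reports as two problems.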


### Creating feature store

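Instantiate the `FeatureStore` object. When ingesting directly from a dataframe loaded from a CSV file, a `source_type` is not needed and only the `sink_type` is required, which in our case is `REDSHIFT`. Here we also set `source_type` to `REDSHIFT`, because the ingestion step later in this notebook reads its input with a SQL query.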

In [5]:
feature_store = FeatureStore(
    sink_type=IngestionSinkType.REDSHIFT,
    source_type=IngestionSourceType.REDSHIFT
)



Now we create a `FeatureTable` object. First, we convert the id and feature columns into Feast `Entity` and `Feature` objects:

In [6]:
def get_entities(id_columns: List[str]) -> List[feast.Entity]:
    """Wrap each id column name in a feast Entity."""
    return [feast.Entity(name=id_col) for id_col in id_columns]

def get_features(feature_list: List[str]) -> List[feast.Feature]:
    """Build one feast Feature per column, choosing the dtype by column name."""
    features = []
    for feature in feature_list:
        if feature in ("created_timestamp", "event_timestamp"):
            # Timestamp columns are declared as BYTES in this example.
            dtype = feast.ValueType.BYTES
        elif feature == "target":
            dtype = feast.ValueType.INT32
        else:
            # Every remaining column is treated as a float.
            dtype = feast.ValueType.FLOAT
        features.append(feast.Feature(name=feature, dtype=dtype))
    return features
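The helper above picks a dtype from the column name. An alternative is to derive it from the pandas dtype of each column, which avoids hard-coding names. The sketch below is an illustration only: it returns the names of `feast.ValueType` members as plain strings so that it runs without Feast installed, the mapping is an assumption that differs from the one used in this notebook (which declares timestamps as `BYTES`), and the `value_type_name` helper is hypothetical.

```python
import pandas as pd


def value_type_name(series: pd.Series) -> str:
    """Map a pandas column to the name of a feast.ValueType member (illustrative mapping)."""
    if pd.api.types.is_datetime64_any_dtype(series):
        return "UNIX_TIMESTAMP"
    if pd.api.types.is_bool_dtype(series):
        return "BOOL"
    if pd.api.types.is_integer_dtype(series):
        return "INT64"
    if pd.api.types.is_float_dtype(series):
        return "DOUBLE"
    # Anything else (for example free text) falls back to STRING.
    return "STRING"


frame = pd.DataFrame(
    {
        "sepal_length": [5.1, 4.9],
        "target": [0, 0],
        "event_timestamp": pd.to_datetime(["2022-07-14 18:08:05.488248"] * 2),
    }
)
type_names = {col: value_type_name(frame[col]) for col in frame.columns}
print(type_names)
```

With Feast available, `getattr(feast.ValueType, name)` would turn each returned name into the corresponding enum member.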

In [7]:
FEATURE_TABLE_NAME = "feature_table_test"
FEATURES = [col for col in raw_data.columns if col != "id"]
IDS = ["id"]

In [8]:
feature_table = FeatureTable(
    name=FEATURE_TABLE_NAME,
    feature_store=feature_store,
    entities=get_entities(id_columns=IDS),
    features=get_features(feature_list=FEATURES),
    online=True
)

### Ingest features

To ingest features we can call the `ingest` method or, if we want to transform the features before ingesting them, the `transform_and_ingest` method. We pass the `feature_table` we just created and the `dataframe` we want to save, together with a list of Python functions that transform the raw data. These transformations can create new features or simply clean the data. In this notebook we call `read_transform_and_ingest`, which takes a SQL `query` in place of the dataframe.

We can also pass two additional parameters: `renames`, which renames features, and `all_columns`, which filters the columns of the data source before they are sent to the feature store. For this example we leave both as ***None***.
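To make the effect of `renames` and `all_columns` concrete, here is a rough pandas equivalent. This is a sketch under assumptions: the notebook does not show how the SDK applies these parameters, so the order (filter first, then rename) and the `select_and_rename` helper are hypothetical.

```python
from typing import Dict, List, Optional

import pandas as pd


def select_and_rename(
    data: pd.DataFrame,
    all_columns: Optional[List[str]] = None,
    renames: Optional[Dict[str, str]] = None,
) -> pd.DataFrame:
    """Keep only `all_columns` (if given), then apply `renames` (if given)."""
    result = data if all_columns is None else data[all_columns]
    if renames is not None:
        result = result.rename(columns=renames)
    # Return a copy so the caller's frame is never modified.
    return result.copy()


frame = pd.DataFrame({"id": [0, 1], "sepal_length": [5.1, 4.9], "sepal_width": [3.5, 3.0]})
trimmed = select_and_rename(
    frame,
    all_columns=["id", "sepal_length"],
    renames={"sepal_length": "sl"},
)
print(list(trimmed.columns))
```

Passing `None` for both arguments, as this notebook does, returns the data unchanged.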

##### Feature engineering methods:

In [9]:
def add_columns(data: pd.DataFrame) -> pd.DataFrame:
    data["feature_one"] = data["sepal_length"] + data["sepal_width"] 
    return data

def double_column(data: pd.DataFrame) -> pd.DataFrame:
    data["feature_two"] = data["sepal_length"] * 2
    return data 

def multiply_columns(data: pd.DataFrame) -> pd.DataFrame:
    data["feature_three"] = data["sepal_length"] * data["sepal_width"]
    return data
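Functions like these can be tried locally before any ingestion by applying them in order to a small dataframe. The sketch below assumes the transformations are applied sequentially, each receiving the output of the previous one. `apply_transformations` is a hypothetical helper, not an SDK function, and two of the functions are repeated so the block runs on its own.

```python
from functools import reduce
from typing import Callable, List

import pandas as pd

Transformation = Callable[[pd.DataFrame], pd.DataFrame]


def add_columns(data: pd.DataFrame) -> pd.DataFrame:
    data["feature_one"] = data["sepal_length"] + data["sepal_width"]
    return data


def double_column(data: pd.DataFrame) -> pd.DataFrame:
    data["feature_two"] = data["sepal_length"] * 2
    return data


def apply_transformations(
    data: pd.DataFrame, transformations: List[Transformation]
) -> pd.DataFrame:
    """Apply each transformation in order, starting from a copy of `data`."""
    # The functions mutate their argument, so copy first to protect the caller's frame.
    return reduce(lambda frame, fn: fn(frame), transformations, data.copy())


frame = pd.DataFrame({"sepal_length": [5.0, 4.0], "sepal_width": [3.5, 3.0]})
transformed = apply_transformations(frame, [add_columns, double_column])
print(transformed)
```

Because the helper works on a copy, the original `frame` keeps only its two input columns.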

In [10]:
query = f"""
    SELECT *
    FROM {FEATURE_TABLE_NAME}
    """

In [11]:
feature_store.read_transform_and_ingest(
    ft=feature_table, 
    query=query, 
    transformations=[add_columns, double_column, multiply_columns],
    binary_cols=None
)



### Retrieve features

In [12]:
retrieved_data = feature_store.get_training_features(feature_table=feature_table)



In [13]:
retrieved_data

Unnamed: 0,sepal_length,sepal_width,petal_length,petal_width,id,target,created_timestamp,event_timestamp,feature_one,feature_two,feature_three
1065,5.7,4.4,1.5,0.4,15,0,2022-07-14 18:08:05.487499,2022-07-14 18:08:05.488248,10.1,11.4,25.08
1056,4.6,3.4,1.4,0.3,6,0,2022-07-14 18:08:05.487499,2022-07-14 18:08:05.488248,8.0,9.2,15.64
1057,5.0,3.4,1.5,0.2,7,0,2022-07-14 18:08:05.487499,2022-07-14 18:08:05.488248,8.4,10.0,17.00
1058,4.4,2.9,1.4,0.2,8,0,2022-07-14 18:08:05.487499,2022-07-14 18:08:05.488248,7.3,8.8,12.76
1059,4.9,3.1,1.5,0.1,9,0,2022-07-14 18:08:05.487499,2022-07-14 18:08:05.488248,8.0,9.8,15.19
...,...,...,...,...,...,...,...,...,...,...,...
1002,7.1,3.0,5.9,2.1,102,2,2022-07-14 18:08:05.487499,2022-07-14 18:08:05.488248,10.1,14.2,21.30
1003,6.3,2.9,5.6,1.8,103,2,2022-07-14 18:08:05.487499,2022-07-14 18:08:05.488248,9.2,12.6,18.27
1004,6.5,3.0,5.8,2.2,104,2,2022-07-14 18:08:05.487499,2022-07-14 18:08:05.488248,9.5,13.0,19.50
1005,7.6,3.0,6.6,2.1,105,2,2022-07-14 18:08:05.487499,2022-07-14 18:08:05.488248,10.6,15.2,22.80
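A quick sanity check on the retrieved data is to recompute the engineered features from the raw columns and compare. The sketch below uses three rows copied from the output above. Comparing with a tolerance through `numpy.allclose` is a choice made here, since floating-point sums and products rarely match the stored values exactly.

```python
import numpy as np
import pandas as pd

# Three rows copied from the retrieved output above (only the columns needed for the check).
retrieved_sample = pd.DataFrame(
    {
        "sepal_length": [5.7, 4.6, 7.1],
        "sepal_width": [4.4, 3.4, 3.0],
        "feature_one": [10.1, 8.0, 10.1],
        "feature_two": [11.4, 9.2, 14.2],
        "feature_three": [25.08, 15.64, 21.30],
    }
)

# Recompute each engineered feature the same way the transformation functions define it.
expected = pd.DataFrame(
    {
        "feature_one": retrieved_sample["sepal_length"] + retrieved_sample["sepal_width"],
        "feature_two": retrieved_sample["sepal_length"] * 2,
        "feature_three": retrieved_sample["sepal_length"] * retrieved_sample["sepal_width"],
    }
)

# Compare column by column with a floating-point tolerance.
matches = {
    col: bool(np.allclose(retrieved_sample[col], expected[col]))
    for col in expected.columns
}
print(matches)
```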
