# Feature Store Example (Stocks)

This notebook demonstrates the following:
- Generate features and feature-sets
- Build complex transformations and ingest to offline and real-time data stores
- Fetch feature vectors for training
- Save feature vectors for re-use in real-time pipelines
- Access features and their statistics in real-time

Install the latest MLRun package and restart the notebook before running the cells below.

Set up the environment and project:

In [None]:
import mlrun
mlrun.set_environment(project="stocks")

## Create Sample Data For Demo

In [None]:
import pandas as pd
quotes = pd.DataFrame(
    {
        "time": [
            pd.Timestamp("2016-05-25 13:30:00.023"),
            pd.Timestamp("2016-05-25 13:30:00.023"),
            pd.Timestamp("2016-05-25 13:30:00.030"),
            pd.Timestamp("2016-05-25 13:30:00.041"),
            pd.Timestamp("2016-05-25 13:30:00.048"),
            pd.Timestamp("2016-05-25 13:30:00.049"),
            pd.Timestamp("2016-05-25 13:30:00.072"),
            pd.Timestamp("2016-05-25 13:30:00.075"),
        ],
        "ticker": [
            "GOOG",
            "MSFT",
            "MSFT",
            "MSFT",
            "GOOG",
            "AAPL",
            "GOOG",
            "MSFT",
        ],
        "bid": [720.50, 51.95, 51.97, 51.99, 720.50, 97.99, 720.50, 52.01],
        "ask": [720.93, 51.96, 51.98, 52.00, 720.93, 98.01, 720.88, 52.03],
    }
)

trades = pd.DataFrame(
    {
        "time": [
            pd.Timestamp("2016-05-25 13:30:00.023"),
            pd.Timestamp("2016-05-25 13:30:00.038"),
            pd.Timestamp("2016-05-25 13:30:00.048"),
            pd.Timestamp("2016-05-25 13:30:00.048"),
            pd.Timestamp("2016-05-25 13:30:00.048"),
        ],
        "ticker": ["MSFT", "MSFT", "GOOG", "GOOG", "AAPL"],
        "price": [51.95, 51.95, 720.77, 720.92, 98.0],
        "quantity": [75, 155, 100, 100, 100],
    }
)

stocks = pd.DataFrame(
    {
        "ticker": ["MSFT", "GOOG", "AAPL"],
        "name": ["Microsoft Corporation", "Alphabet Inc", "Apple Inc"],
        "exchange": ["NASDAQ", "NASDAQ", "NASDAQ"],
    }
)

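import datetime


def move_date(df, col):
    """Shift every value in `col` by one constant offset so the latest value equals the current time."""
    max_date = df[col].max()
    now_date = datetime.datetime.now()
    delta = now_date - max_date
    df[col] = df[col] + delta  # note: modifies `df` in place
    return df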

quotes = move_date(quotes, "time")
trades = move_date(trades, "time")
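
The date shift above can be illustrated without pandas. The following is a minimal sketch, assuming a plain list of `datetime` values; `shift_to_now` and the fixed `now` argument are hypothetical names introduced here so the result is reproducible, and are not part of the notebook.

```python
import datetime


def shift_to_now(timestamps, now=None):
    """Shift all timestamps by one constant offset so the latest equals `now`."""
    now = now or datetime.datetime.now()
    delta = now - max(timestamps)
    return [ts + delta for ts in timestamps]


# A few of the millisecond offsets used in the sample quotes
base = datetime.datetime(2016, 5, 25, 13, 30, 0)
original = [base + datetime.timedelta(milliseconds=ms) for ms in (23, 30, 41)]

# Hypothetical fixed "now", used instead of the wall clock for repeatability
fixed_now = datetime.datetime(2024, 1, 1, 12, 0, 0)
shifted = shift_to_now(original, now=fixed_now)
```

The spacing between rows is preserved; only the absolute position on the time axis changes. Note that `quotes` and `trades` are shifted separately in the notebook, so each table's latest row lands on (approximately) the current time.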

### View Demo Data

In [None]:
quotes

In [None]:
trades

In [None]:
stocks

## Define, Infer and Ingest Feature Sets

In [None]:
import mlrun.feature_store as fstore
from mlrun.feature_store.steps import MapClass, FeaturesetValidator
from mlrun.features import MinMaxValidator

### Build & Ingest Simple Feature Set (stocks)

In [None]:
# add feature set without time column (stock ticker metadata) 
stocks_set = fstore.FeatureSet("stocks", entities=[fstore.Entity("ticker")])
fstore.ingest(stocks_set, stocks, infer_options=fstore.InferOptions.default())

### Build an Advanced Feature Set with a Feature Engineering Pipeline
Define a feature set with custom data processing and time aggregation functions.

In [None]:
# create a new feature set
quotes_set = fstore.FeatureSet("stock-quotes", entities=[fstore.Entity("ticker")])

**Define a custom pipeline step (Python class)**

In [None]:
class MyMap(MapClass):
    def __init__(self, multiplier=1, **kwargs):
        super().__init__(**kwargs)
        self._multiplier = multiplier

    def do(self, event):
        event["multi"] = event["bid"] * self._multiplier
        return event
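
To see what the step does to a single event without building a graph, here is a plain-Python stand-in. `MultiplyBid` is a hypothetical class written for this illustration; it mirrors the `do` logic of `MyMap` but does not inherit from `MapClass`, so it runs without MLRun.

```python
class MultiplyBid:
    """Plain-Python stand-in for the MyMap step (no MLRun base class)."""

    def __init__(self, multiplier=1):
        self._multiplier = multiplier

    def do(self, event):
        # Add a derived field and pass the (mutated) event downstream
        event["multi"] = event["bid"] * self._multiplier
        return event


step = MultiplyBid(multiplier=3)
result = step.do({"ticker": "MSFT", "bid": 52.0, "ask": 52.03})
```

The event keeps its original fields and gains a `multi` field, which is the feature later requested as `stock-quotes.multi`.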

**Build and show the transformation pipeline**

Use `storey` stream processing classes along with library and custom classes
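
As a rough mental model, the map, extend, and filter steps of such a graph behave like a chain of functions applied to each event in turn. The sketch below emulates that chain in plain Python under that assumption; the function names and the third sample row are made up for illustration, and this is not how `storey` is implemented.

```python
def my_map(event, multiplier=3):
    # Custom map step: add the "multi" field
    event = dict(event)
    event["multi"] = event["bid"] * multiplier
    return event


def extend(event):
    # Extend step: add the "extra" field
    return {**event, "extra": event["bid"] * 77}


def keep(event):
    # Filter step: only events with bid above 51.92 continue
    return event["bid"] > 51.92


def run_pipeline(events):
    out = []
    for event in events:
        event = extend(my_map(event))
        if keep(event):
            out.append(event)
    return out


events = [
    {"ticker": "GOOG", "bid": 720.5},
    {"ticker": "MSFT", "bid": 52.0},
    {"ticker": "MSFT", "bid": 51.5},  # made-up row that the filter drops
]
processed = run_pipeline(events)
```

In the sample `quotes` data every bid is above 51.92, so the filter passes all rows; the made-up third row is there only to show the drop.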

In [None]:
(
    quotes_set.graph.to("MyMap", multiplier=3)
    .to("storey.Extend", _fn="({'extra': event['bid'] * 77})")
    .to("storey.Filter", "filter", _fn="(event['bid'] > 51.92)")
    .to(FeaturesetValidator())
)

quotes_set.add_feature_aggregation("ask", ["sum", "max"], "1h", "10m", name="asks1")
quotes_set.add_feature_aggregation("ask", ["sum", "max"], "5h", "10m", name="asks5")
quotes_set.add_feature_aggregation("bid", ["min", "max"], "1h", "10m")

# add feature validation policy
quotes_set["bid"] = fstore.Feature(validator=MinMaxValidator(min=52, severity="info"))

# add default target definitions and plot
quotes_set.set_targets()
quotes_set.plot(rankdir="LR", with_targets=True)
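
The aggregations above compute per-ticker statistics over a time window (`1h` or `5h`). The sketch below shows the idea in plain Python, assuming an exact window ending at a query time; it ignores the `10m` period argument, and `window_aggregate` and the sample events are hypothetical, so treat it as an approximation of the concept rather than the engine's algorithm.

```python
from datetime import datetime, timedelta


def window_aggregate(events, key, column, window, now):
    """Sum and max of `column` for `key` over events in (now - window, now]."""
    values = [
        e[column]
        for e in events
        if e["ticker"] == key and now - window < e["time"] <= now
    ]
    if not values:
        return {"sum": None, "max": None}
    return {"sum": sum(values), "max": max(values)}


now = datetime(2024, 1, 1, 12, 0, 0)  # fixed query time for repeatability
events = [
    {"ticker": "MSFT", "time": now - timedelta(hours=3), "ask": 50.0},
    {"ticker": "MSFT", "time": now - timedelta(minutes=30), "ask": 52.0},
    {"ticker": "MSFT", "time": now - timedelta(minutes=5), "ask": 53.0},
    {"ticker": "GOOG", "time": now - timedelta(minutes=5), "ask": 720.0},
]

asks_1h = window_aggregate(events, "MSFT", "ask", timedelta(hours=1), now)
asks_5h = window_aggregate(events, "MSFT", "ask", timedelta(hours=5), now)
```

The resulting features are referenced later in the notebook by names such as `asks5_sum_5h` and `bid_min_1h`, which combine the aggregation name (or column), the operation, and the window.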

**Test and show the pipeline results locally (allows quick development and debugging)**

In [None]:
fstore.preview(
    quotes_set,
    quotes,
    entity_columns=["ticker"],
    timestamp_key="time",
    options=fstore.InferOptions.default(),
)

In [None]:
# print the feature set object
print(quotes_set.to_yaml())

### Ingest Data into Offline and Online Stores
This writes to both targets (Parquet and NoSQL).

In [None]:
# ingest the quotes through the pipeline and keep the returned dataframe
df = fstore.ingest(quotes_set, quotes)

## Get an Offline Feature Vector for Training
Example of combining features from 3 sources using a **time travel** join of 3 tables.

Specify a set of features and request the offline feature vector result as a dataframe.

In [None]:
features = [
    "stock-quotes.multi",
    "stock-quotes.asks5_sum_5h as total_ask",
    "stock-quotes.bid_min_1h",
    "stock-quotes.bid_max_1h",
    "stocks.*",
]

vector = fstore.FeatureVector("stocks-vec", features, description="stocks demo feature vector")
vector.save()
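
The feature references above follow the pattern `<feature-set>.<feature>`, with an optional `as <alias>` suffix and `*` to select all features of a set. The helper below is a small illustration of how such strings decompose; `parse_feature_ref` is a hypothetical function written for this note and is not MLRun's own parser.

```python
def parse_feature_ref(ref):
    """Split '<feature-set>.<feature> [as <alias>]' into its parts."""
    target, _, alias = ref.partition(" as ")
    feature_set, _, feature = target.partition(".")
    return {
        "feature_set": feature_set,
        "feature": feature,
        "alias": alias or None,
    }


aliased = parse_feature_ref("stock-quotes.asks5_sum_5h as total_ask")
wildcard = parse_feature_ref("stocks.*")
```

Here `aliased` describes a single aggregated feature that will appear in the vector as `total_ask`, while `wildcard` selects every feature of the `stocks` set.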

In [None]:
resp = fstore.get_offline_features(vector, entity_rows=trades, entity_timestamp_column="time")
resp.to_dataframe()
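
A time travel (as-of) join matches each entity row with the most recent feature values recorded at or before that row's timestamp, so no information from the future leaks into training data. The following is a minimal sketch of that idea in plain Python, assuming integer millisecond offsets as timestamps; `as_of_join`, the row layout, and the GOOG trade time are made up for illustration and do not reflect how `get_offline_features` is implemented.

```python
from bisect import bisect_right


def as_of_join(entity_rows, feature_rows):
    """Attach to each entity row the latest 'bid' seen at or before its time."""
    by_key = {}
    for row in sorted(feature_rows, key=lambda r: r["time"]):
        by_key.setdefault(row["ticker"], []).append(row)

    joined = []
    for entity in entity_rows:
        history = by_key.get(entity["ticker"], [])
        times = [r["time"] for r in history]
        idx = bisect_right(times, entity["time"])  # rows with time <= entity time
        match = history[idx - 1] if idx else None
        joined.append({**entity, "bid": match["bid"] if match else None})
    return joined


quote_rows = [
    {"ticker": "MSFT", "time": 23, "bid": 51.95},
    {"ticker": "MSFT", "time": 30, "bid": 51.97},
    {"ticker": "MSFT", "time": 41, "bid": 51.99},
    {"ticker": "GOOG", "time": 48, "bid": 720.50},
]
trade_rows = [
    {"ticker": "MSFT", "time": 38, "price": 51.95},
    {"ticker": "GOOG", "time": 40, "price": 720.77},  # made-up time, before any GOOG quote
]
result = as_of_join(trade_rows, quote_rows)
```

The MSFT trade at offset 38 picks up the quote from offset 30, not the later one at 41, and the GOOG trade gets no value because its only quote arrives afterwards.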

## Initialize an online feature service and use it for real-time inference

In [None]:
service = fstore.get_online_feature_service("stocks-vec")

**Request feature vector statistics, which can be used for imputing or validation**

In [None]:
service.vector.get_stats_table()
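
One way such statistics might be used is to fill missing values with a feature's mean and to flag values that fall outside the observed range. The sketch below assumes the statistics have been converted to a plain dictionary; `impute_and_check`, the dictionary layout, and all numbers are hypothetical and chosen only for illustration.

```python
def impute_and_check(row, stats):
    """Fill None values with the feature mean and list out-of-range features."""
    cleaned, warnings = {}, []
    for name, value in row.items():
        feature_stats = stats.get(name)
        if feature_stats is None:
            cleaned[name] = value  # no statistics for this field, keep as-is
            continue
        if value is None:
            value = feature_stats["mean"]
        elif not feature_stats["min"] <= value <= feature_stats["max"]:
            warnings.append(name)
        cleaned[name] = value
    return cleaned, warnings


# Made-up statistics, not actual output of the feature store
stats = {
    "bid_min_1h": {"mean": 275.0, "min": 51.95, "max": 720.5},
    "multi": {"mean": 800.0, "min": 155.85, "max": 2161.5},
}
row = {"ticker": "MSFT", "bid_min_1h": None, "multi": 5000.0}
cleaned, warnings = impute_and_check(row, stats)
```

Here the missing `bid_min_1h` is replaced by the mean, and `multi` is reported because it exceeds the recorded maximum.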

**Real-time feature vector request**

In [None]:
service.get([{"ticker": "GOOG"}, {"ticker": "MSFT"}])

In [None]:
service.get([{"ticker": "AAPL"}])

In [None]:
service.close()