# Qwak Feature Store Guide - Stream Feature Set with Pandas Transformations and Window Aggregations

Welcome to the Qwak Feature Store example! In this tutorial, we'll guide you through creating a sample Data Source, transforming it into a Feature Set, and leveraging its features for model training and inference using the Qwak Platform. 

Guides like this one aim to provide you with a starting point by offering a straightforward framework for working with Qwak. However, we encourage you to explore our [comprehensive documentation](https://docs-saas.qwak.com/docs/feature-store-overview) for more detailed and specific information.

Before diving in, make sure you have the Qwak SDK installed and authenticated. If you haven't done so already, follow these steps:

1. [Install the Qwak SDK](https://docs-saas.qwak.com/docs/installing-the-qwak-sdk) - Ensure you have the SDK installed on your local environment.
2. [Authenticate](https://docs-saas.qwak.com/docs/installing-the-qwak-sdk#1-via-qwak-cli) - Authenticate with a new Personal or Service Qwak API Key.

To gain a deeper understanding of Feature Stores and their importance in machine learning workflows, we recommend checking out our comprehensive [documentation](https://docs-saas.qwak.com/docs/feature-store-overview) and our blog article on [What is a Feature Store](https://www.qwak.com/post/what-is-a-feature-store-in-ml). Let's get started!


## Create the Kafka-based Data Source

In Qwak, a Data Source serves as a configuration object that specifies how to access and fetch your data. It includes metadata such as name and description, connection details to the data store/storage, the query or resource to retrieve, and the relevant time column for indexing time series features.

In the following example, we'll connect to a publicly accessible Kafka broker hosted on AWS via MSK which contains the data from our Churn Model example.

**Kafka Source supports authentication through user/password, so we will store those first in Qwak Secrets before creating the Data Source.**


In [3]:
!qwak secrets set --name 'sample-kafka-user' --value 'kafka-user'
!qwak secrets set --name 'sample-kafka-password' --value 'kafka-password'

Creating secret named 'sample-kafka-user' with value length of 10...
Created!
Creating secret named 'sample-kafka-password' with value length of 14...
Created!


In [26]:
%%writefile streaming_data_source.py
from qwak.feature_store.data_sources import *
import json

BOOTSTRAP_SERVERS = 'b-1-public.basicmskcluster.6o8ynq.c2.kafka.us-east-1.amazonaws.com:9196,b-2-public.basicmskcluster.6o8ynq.c2.kafka.us-east-1.amazonaws.com:9196' # comma-separated sequence of host:port entries, example: b-1-public.basicmskcluster.6o8ynq.c2.kafka.us-east-1.amazonaws.com:9196
KAFKA_TOPIC = 'users' # comma separated list of 1 or more topics

json_schema = {
        "type": "struct",
        "fields": [
            {"metadata": {}, "name": "churn", "nullable": True, "type": "string"},
            {"metadata": {}, "name": "User_Id", "nullable": False, "type": "string"},
            {"metadata": {}, "name": "State", "nullable": True, "type": "string"},
            {"metadata": {}, "name": "Account_Length", "nullable": True, "type": "integer"},
            {"metadata": {}, "name": "Area_Code", "nullable": True, "type": "integer"},
            {"metadata": {}, "name": "Phone", "nullable": True, "type": "string"},
            {"metadata": {}, "name": "Intl_Plan", "nullable": True, "type": "string"},
            {"metadata": {}, "name": "VMail_Plan", "nullable": True, "type": "string"},
            {"metadata": {}, "name": "VMail_Message", "nullable": True, "type": "integer"},
            {"metadata": {}, "name": "Day_Mins", "nullable": True, "type": "float"},
            {"metadata": {}, "name": "Day_Calls", "nullable": True, "type": "integer"},
            {"metadata": {}, "name": "Eve_Mins", "nullable": True, "type": "float"},
            {"metadata": {}, "name": "Eve_Calls", "nullable": True, "type": "integer"},
            {"metadata": {}, "name": "Night_Mins", "nullable": True, "type": "float"},
            {"metadata": {}, "name": "Night_Calls", "nullable": True, "type": "integer"},
            {"metadata": {}, "name": "Intl_Mins", "nullable": True, "type": "float"},
            {"metadata": {}, "name": "Intl_Calls", "nullable": True, "type": "integer"},
            {"metadata": {}, "name": "CustServ_Calls", "nullable": True, "type": "integer"},
            {"metadata": {}, "name": "event date", "nullable": True, "type": "string"},
            {"metadata": {}, "name": "Agitation_Level", "nullable": True, "type": "string"}
        ]
    }

# Create deserializer with JSON schema
deserializer = GenericDeserializer(
    message_format=MessageFormat.JSON,
    schema=json.dumps(json_schema),
)

kafka_source = KafkaSource(
    name='Risk_Data_Streaming',
    description='Risk Model Data Streamed from Kafka',
    bootstrap_servers=BOOTSTRAP_SERVERS,                    # Comma-separated list of HOST:PORT entries
    subscribe=KAFKA_TOPIC,                                  # Kafka topic(s) to read from
    authentication_method=SaslAuthentication(
        username_secret='sample-kafka-user',                # Qwak Secret where the Kafka SASL/SCRAM user is stored
        password_secret='sample-kafka-password',            # Qwak Secret where the Kafka SASL/SCRAM password is stored
        sasl_mechanism=SaslMechanism.SCRAMSHA512,           # Qwak supports SASL_SSL authentication with SCRAM and various SHA options
        security_protocol=SecurityProtocol.SASL_SSL,        # Qwak supports authentication via SASL_SSL or Plaintext
    ),
    deserialization=deserializer,                           # JSON and AVRO deserializers are currently supported, together with Custom Deserializers
)

Overwriting streaming_data_source.py
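
Before publishing messages to the topic, it can help to check that a message fits the struct schema handed to the deserializer. The snippet below is a minimal sketch in plain Python, not part of the Qwak SDK: `check_message` and `PYTHON_TYPES` are hypothetical names introduced for this example, and the schema is trimmed to three of the fields defined above.

```python
import json

# Struct schema in the same shape as above, trimmed to three fields.
json_schema = {
    "type": "struct",
    "fields": [
        {"metadata": {}, "name": "User_Id", "nullable": False, "type": "string"},
        {"metadata": {}, "name": "Day_Mins", "nullable": True, "type": "float"},
        {"metadata": {}, "name": "Day_Calls", "nullable": True, "type": "integer"},
    ],
}

# Python types accepted for each schema type (an assumption made for this sketch).
PYTHON_TYPES = {"string": (str,), "integer": (int,), "float": (int, float)}


def check_message(raw: str, schema: dict) -> list:
    """Return a list of problems found in a JSON message; empty means it fits."""
    record = json.loads(raw)
    problems = []
    for field in schema["fields"]:
        name = field["name"]
        value = record.get(name)
        if value is None:
            if not field["nullable"]:
                problems.append(f"{name}: missing or null but not nullable")
            continue
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, PYTHON_TYPES[field["type"]]):
            problems.append(f"{name}: expected {field['type']}, got {type(value).__name__}")
    return problems


good = json.dumps({"User_Id": "ed377768", "Day_Mins": 197.2, "Day_Calls": 118})
bad = json.dumps({"Day_Mins": "n/a", "Day_Calls": 118})

print(check_message(good, json_schema))
print(check_message(bad, json_schema))
```

The first message passes with no problems; the second is flagged twice, once for the missing non-nullable `User_Id` and once for the string in `Day_Mins`.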


### Additional Considerations for Registering Data Sources

When registering Data Sources in Qwak, it's essential to ensure that the underlying data store is accessible by the platform. Depending on your deployment model (Hybrid or SaaS), there are different ways to grant Qwak access to your data.

#### Accessing AWS Resources:

If your data is stored in AWS services, you can grant access to Qwak using an IAM role ARN. For detailed instructions, refer to our documentation on [Accessing AWS Resources with IAM Role](https://docs-saas.qwak.com/docs/accessing-aws-resources-with-iam-role).

#### Using Qwak Secrets:

You can pass the Kafka SASL credentials as Qwak Secrets. This approach provides a secure way to manage and authenticate access to your data. For more information, see [Qwak Secret Management](https://docs-saas.qwak.com/docs/secret-management).

For more information about the types of Data Sources supported by Qwak, refer to our documentation:
- [Batch Data Sources](https://docs-saas.qwak.com/docs/batch-data-sources)
- [Streaming Data Sources](https://docs-saas.qwak.com/docs/streaming-data-sources)

<br>

### Sampling Data from the Data Source

It's important to note that the data source object cannot be used as a query engine independently (for now). However, it can serve as a sampling mechanism to verify that the data is being fetched properly before registration to the Qwak Platform.

**The Kafka offset is stored internally in the Qwak Data Engine, and all the data in the topic will be read.**


In [27]:
%run streaming_data_source.py

df_sample = kafka_source.get_sample(2)
print (f"Data Source Data Types:\n\n{df_sample.dtypes}\n")
print (f"Data Source Sample :\n\n{df_sample.head(7).to_string()}\n")

Data Source Data Types:

churn               object
User_Id             object
State               object
Account_Length       int64
Area_Code            int64
Phone               object
Intl_Plan           object
VMail_Plan          object
VMail_Message        int64
Day_Mins           float64
Day_Calls            int64
Eve_Mins           float64
Eve_Calls            int64
Night_Mins         float64
Night_Calls          int64
Intl_Mins          float64
Intl_Calls           int64
CustServ_Calls       int64
event date          object
Agitation_Level     object
dtype: object

Data Source Sample :

  churn                               User_Id State  Account_Length  Area_Code     Phone Intl_Plan VMail_Plan  VMail_Message    Day_Mins  Day_Calls    Eve_Mins  Eve_Calls  Night_Mins  Night_Calls  Intl_Mins  Intl_Calls  CustServ_Calls           event date Agitation_Level
0     0  ed377768-cab6-433d-af34-88b693c72d67    ND              59        510  351-4226         0          0              0  

## Registering the Data Source with the Qwak Platform

After verifying that the Data Source returns the desired results, the next step is to register it with the Qwak Platform.

In [18]:
!echo "Y" | qwak features register -p streaming_data_source.py

✅ Finding Entities to register (0:00:00.00)
👀 Found 0 Entities
----------------------------------------
✅ Finding Data Sources to register (0:00:00.00)
👀 Found 1 Data Sources
Validating 'Risk_Data_Streaming' data source
✅ (0:00:04.98)
✅ Validation completed successfully, got data source columns:
column name      type
---------------  ------
churn            string
User_Id          string
State            string
Account_Length   int
Area_Code        int
Phone            string
Intl_Plan        string
VMail_Plan       string
VMail_Message    int
Day_Mins         float
Day_Calls        int
Eve_Mins         float
Eve_Calls        int
Night_Mins       float
Night_Calls      int
Intl_Mins        float
Intl_Calls       int
CustServ_Calls   int
event date       string
Agitation_Level  string
Update existing Data Source 'Risk_Data_Streaming' from source file '/Users/haha/Projects/qwak-examples/feature_store/streaming_data_source.py'?
contin

<hr><br>

## Creating the Streaming Feature Set from the Data Source

A Feature Set typically consists of the following components:

- **Metadata:** Includes name, key, data sources, and the timestamp column used for indexing.
- **Scheduling Expression:** For Batch Feature Sets, this defines when ingestion jobs should run.
- **Cluster Type:** Specifies the resources to use for running the ingestion job.
- **Backfill:** Determines how far back in time the Feature Set should ingest data.
- **Transformation:** Can be SQL-based or UDF-based (currently Koalas) for data transformation.

[Read Policies](https://docs-saas.qwak.com/docs/read-policies) instruct Qwak on which data to fetch from the Data Source. 
- **NewOnly:** Fetches records created after the last ingestion.
- **TimeFrame:** Fetches records within a specified timeframe.
- **FullRead:** Fetches all data from the Data Source in every ingestion job, which can be heavy for main tables but useful for foreign key-based tables.

The streaming Feature Sets in this guide read from a Kafka topic rather than a static file, and their definitions below don't set a read policy.
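
The difference between the three read policies listed above can be illustrated with a small plain-Python sketch. It only mimics the selection logic described in the list and is not how Qwak implements it: `select_records`, the `created_at` field, and the policy names passed as strings are assumptions made for this example.

```python
from datetime import datetime, timedelta


def select_records(records, policy, now, last_ingestion=None, window=None):
    """Pick the records one ingestion job would read under a given policy."""
    if policy == "FullRead":
        # Everything, on every run.
        return list(records)
    if policy == "NewOnly":
        # Only records created after the previous ingestion.
        return [r for r in records if r["created_at"] > last_ingestion]
    if policy == "TimeFrame":
        # Only records inside the trailing window that ends now.
        return [r for r in records if now - window <= r["created_at"] <= now]
    raise ValueError(f"unknown policy: {policy}")


records = [
    {"id": 1, "created_at": datetime(2024, 1, 1)},
    {"id": 2, "created_at": datetime(2024, 1, 5)},
    {"id": 3, "created_at": datetime(2024, 1, 9)},
]
now = datetime(2024, 1, 10)

full = select_records(records, "FullRead", now)
new_only = select_records(records, "NewOnly", now, last_ingestion=datetime(2024, 1, 4))
time_frame = select_records(records, "TimeFrame", now, window=timedelta(days=3))

print([r["id"] for r in full])
print([r["id"] for r in new_only])
print([r["id"] for r in time_frame])
```

With these sample records, `FullRead` returns all three, `NewOnly` returns the two created after January 4, and a three-day `TimeFrame` returns only the last one.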

The execution specification refers to the size of the cluster used for data ingestion. More information can be found in the [Qwak docs](https://docs-saas.qwak.com/docs/instance-sizes#feature-store).

### The feature engineering process

As you're aware, a Koalas DataFrame is akin to a pandas DataFrame but operates across a Spark cluster. Once your data resides in a Koalas dataframe, it's advisable to carry out transformations directly within Koalas rather than switching to other data types like pandas. This is because switching to other dataframes, such as pandas, consolidates the distributed data into a single node, leading to sequential computation rather than parallel.

The Feature Sets in this guide, however, are defined with Spark SQL and Pandas UDFs rather than Koalas. The `transform` function returns a `SparkSqlTransformation` object that holds the end-to-end transformation as a SQL query, with Pandas UDFs such as `parse_date` passed in through the `functions` argument. You're not restricted to a single UDF; you can divide the logic into as many as necessary and call them from the query.

The query reads from the Data Source by its registered name, in this case `Risk_Data_Streaming`, which is the name listed in the Feature Set's `data_sources` argument.

It's important to note that while this code is callable locally via `get_sample()`, as demonstrated in the next cell, it's not available for debugging, as the actual transformations will be executed remotely.

To utilize Spark Pandas Transformation effectively, **ensure you have Python 3.8 installed** and `cloudpickle` locked to version `2.2.1`, as your code will be pickled and provided to Qwak for registration and testing.

In [12]:
%%writefile streaming_agg_feature_set.py

import pandas as pd
from qwak.feature_store.feature_sets import streaming
from qwak.feature_store.feature_sets.transformations import (
    Column,
    Schema,
    QwakAggregation,
    SparkSqlTransformation,
    Type,
    qwak_pandas_udf
)
from qwak.feature_store.feature_sets.execution_spec import ClusterTemplate

@qwak_pandas_udf(output_schema=Schema([Column(type=Type.timestamp)]))
def parse_date(event_dates: pd.Series) -> pd.Series:
    return pd.to_datetime(event_dates)

@streaming.feature_set(
    name="telecom-aggregated-usage",
    key="user_id",
    data_sources=["Risk_Data_Streaming"],
    timestamp_column_name="timestamp",
)
@streaming.execution_specification(
    online_cluster_template=ClusterTemplate.NANO,
    offline_cluster_template=ClusterTemplate.NANO,
)

def transform():
    sql = """
        
        SELECT
            User_Id,
            Day_Mins,
            Day_Calls,
            Eve_Calls,
            Night_Calls,
            CustServ_Calls,
            Agitation_Level,
            
            -- Derived Features
            (Day_Calls + Eve_Calls + Night_Calls) AS Total_Calls,
            
            -- Date Column
            parse_date(`event date`) as timestamp,

            -- Required for the window aggregations
            topic,
            partition,
            offset

        FROM Risk_Data_Streaming
    """

    return (
        SparkSqlTransformation( sql,
                                functions=[parse_date])
        .aggregate(QwakAggregation.avg("Day_Mins"))
        .aggregate(QwakAggregation.avg("CustServ_Calls"))
        .aggregate(QwakAggregation.max("Agitation_Level"))
        .aggregate(QwakAggregation.sum("Total_Calls"))
        .by_windows("7 days", "30 days")
    )


Overwriting streaming_agg_feature_set.py
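
To make the window aggregations more concrete, here is a minimal plain-Python sketch of what features such as `avg_Day_Mins_7d` and `sum_Total_Calls_30d` represent. It is a conceptual illustration only, not how Qwak computes them: the `events` list, the `window_features` helper, and the window boundaries (start exclusive, end inclusive) are assumptions made for this example.

```python
from datetime import datetime, timedelta

# Made-up events for a single user.
events = [
    {"user_id": "u1", "ts": datetime(2020, 1, 1), "Day_Mins": 200.0, "Total_Calls": 290},
    {"user_id": "u1", "ts": datetime(2020, 1, 20), "Day_Mins": 100.0, "Total_Calls": 110},
    {"user_id": "u1", "ts": datetime(2020, 1, 25), "Day_Mins": 150.0, "Total_Calls": 100},
]

WINDOWS = {"7d": timedelta(days=7), "30d": timedelta(days=30)}


def window_features(events, user_id, as_of):
    """Aggregate one user's events over each trailing window ending at `as_of`."""
    features = {}
    for label, length in WINDOWS.items():
        in_window = [
            e for e in events
            if e["user_id"] == user_id and as_of - length < e["ts"] <= as_of
        ]
        mins = [e["Day_Mins"] for e in in_window]
        features[f"avg_Day_Mins_{label}"] = sum(mins) / len(mins) if mins else None
        features[f"sum_Total_Calls_{label}"] = sum(e["Total_Calls"] for e in in_window)
    return features


features = window_features(events, "u1", as_of=datetime(2020, 1, 26))
print(features)
```

Only the two most recent events fall inside the 7-day window, while all three fall inside the 30-day window, so the same key gets a different value per window.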


## Sampling the Data Source and Printing Data and Data Types

If your data source takes more than 5 minutes to query or fetch a sample of the data (for example, due to long-running queries), your sampling process may fail with a timeout error. In such cases, you can skip validation during registration with Qwak and proceed to register your feature set, allowing it to run an ingestion job.

### Note:
The sampling process is essential for verifying that the data is queried properly. However, if it takes too long, you can proceed with the registration without validation and rely on the ingestion job to ensure data correctness.


In [6]:
%run streaming_agg_feature_set.py

df_sample = transform.get_sample()
print (f"Window Aggregations Feature Set Data Types:\n\n{df_sample.dtypes}\n")
print (f"Window Aggregations Feature Set Sample :\n\n{df_sample.head(2).to_string()}\n")

Window Aggregations Feature Set Data Types:

user_id                     object
avg_CustServ_Calls_7d      float64
sum_Total_Calls_7d           int64
avg_Day_Mins_7d            float64
max_Agitation_Level_7d      object
avg_CustServ_Calls_30d     float64
sum_Total_Calls_30d          int64
avg_Day_Mins_30d           float64
max_Agitation_Level_30d     object
dtype: object

Window Aggregations Feature Set Sample :

                                user_id  avg_CustServ_Calls_7d  sum_Total_Calls_7d  avg_Day_Mins_7d max_Agitation_Level_7d  avg_CustServ_Calls_30d  sum_Total_Calls_30d  avg_Day_Mins_30d max_Agitation_Level_30d
0  d6e6de2d-3a4d-4795-82f7-317373eb9a33                    0.0                 294       207.699997                    135                     0.0                  294        207.699997                     135
1  2f370912-53c0-4cce-81ce-ea803f30207b                    2.0                 299       192.600006                    108                     2.0                  299        192.600006                  

### Creating a feature set with Pandas UDFs and SQL, without aggregations

In [23]:
%%writefile streaming_feature_set.py

import pandas as pd
from qwak.feature_store.feature_sets import streaming
from qwak.feature_store.feature_sets.transformations import (
    Column,
    Schema,
    SparkSqlTransformation,
    Type,
    qwak_pandas_udf
)
from qwak.feature_store.feature_sets.execution_spec import ClusterTemplate

@qwak_pandas_udf(output_schema=Schema([Column(type=Type.timestamp)]))
def parse_date(event_dates: pd.Series) -> pd.Series:
    return pd.to_datetime(event_dates)

@streaming.feature_set(
    name="telecom-usage-features",
    key="user_id",
    data_sources=["Risk_Data_Streaming"],
    timestamp_column_name="timestamp",
    offline_scheduling_policy="*/5 * * * *",
    online_trigger_interval=30
)
@streaming.execution_specification(
    online_cluster_template=ClusterTemplate.NANO,
    offline_cluster_template=ClusterTemplate.NANO,
)

def transform():
    sql = """
        
        SELECT
            User_Id,
            Day_Mins,
            Day_Calls,
            Eve_Calls,
            Night_Calls,
            CustServ_Calls,
            Agitation_Level,
            
            -- Derived Features
            (Day_Mins + Eve_Mins + Night_Mins) AS Total_Mins,
            (Day_Calls + Eve_Calls + Night_Calls) AS Total_Calls,
            (Day_Mins + Eve_Mins + Night_Mins) / NULLIF((Day_Calls + Eve_Calls + Night_Calls), 0) AS Mins_Per_Call,
            Intl_Mins / NULLIF((Day_Mins + Eve_Mins + Night_Mins), 0) AS Intl_Usage_Ratio,
            CustServ_Calls / NULLIF((Day_Calls + Eve_Calls + Night_Calls), 0) AS CustServ_Call_Ratio,
            
            -- Date Column
            parse_date(`event date`) as timestamp

        FROM Risk_Data_Streaming
    """

    return (
        SparkSqlTransformation( sql,
                                functions=[parse_date])
    )


Overwriting streaming_feature_set.py
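
The ratio features in the query divide by `NULLIF(..., 0)`, which turns a zero denominator into `NULL` so that the whole expression yields `NULL`. The sketch below mirrors that logic in plain Python for a single row; `safe_ratio`, `derive`, and both sample rows are made up for illustration and are not part of the Feature Set.

```python
def safe_ratio(numerator, denominator):
    """Mirror SQL `numerator / NULLIF(denominator, 0)`: None when the denominator is 0."""
    if numerator is None or denominator is None or denominator == 0:
        return None
    return numerator / denominator


def derive(row):
    """Compute the derived features of the SQL query for one row."""
    total_mins = row["Day_Mins"] + row["Eve_Mins"] + row["Night_Mins"]
    total_calls = row["Day_Calls"] + row["Eve_Calls"] + row["Night_Calls"]
    return {
        "Total_Mins": total_mins,
        "Total_Calls": total_calls,
        "Mins_Per_Call": safe_ratio(total_mins, total_calls),
        "Intl_Usage_Ratio": safe_ratio(row["Intl_Mins"], total_mins),
        "CustServ_Call_Ratio": safe_ratio(row["CustServ_Calls"], total_calls),
    }


active = {"Day_Mins": 200.0, "Eve_Mins": 300.0, "Night_Mins": 250.0, "Intl_Mins": 7.5,
          "Day_Calls": 100, "Eve_Calls": 100, "Night_Calls": 100, "CustServ_Calls": 3}
idle = {"Day_Mins": 0.0, "Eve_Mins": 0.0, "Night_Mins": 0.0, "Intl_Mins": 0.0,
        "Day_Calls": 0, "Eve_Calls": 0, "Night_Calls": 0, "CustServ_Calls": 0}

active_features = derive(active)
idle_features = derive(idle)
print(active_features)
print(idle_features)
```

For the idle row, every ratio comes back as `None`, so downstream code should be prepared for missing values in these columns.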


In [18]:
%run streaming_feature_set.py

df_sample = transform.get_sample()
print (f"SQL Feature Set Data Types:\n\n{df_sample.dtypes}\n")
print (f"SQL Feature Set Sample :\n\n{df_sample.head(2).to_string()}\n")

SQL Feature Set Data Types:

User_Id                        object
Day_Mins                      float64
Day_Calls                       int64
Eve_Calls                       int64
Night_Calls                     int64
CustServ_Calls                  int64
Agitation_Level                object
Total_Mins                    float64
Total_Calls                     int64
Mins_Per_Call                 float64
Intl_Usage_Ratio              float64
CustServ_Call_Ratio           float64
timestamp              datetime64[ns]
dtype: object

SQL Feature Set Sample :

                                User_Id    Day_Mins  Day_Calls  Eve_Calls  Night_Calls  CustServ_Calls Agitation_Level  Total_Mins  Total_Calls  Mins_Per_Call  Intl_Usage_Ratio  CustServ_Call_Ratio  timestamp
0  8fbe0c29-b2fd-4772-946a-5b4eb0087648  197.199997        118         70          104               0             162       746.0          292       2.554795          0.005228             0.000000 2020-01-01
1  2f370912-53c0-4

## Visualizing Data in the Feature Store

The displayed data represents the features stored in the feature store, which will be utilized in our Qwak ML model for both training and inference purposes.

Once we have confirmed that the data appears as expected and meets our requirements, we can proceed with registering the feature set in Qwak.


In [24]:
!echo "Y" | qwak features register -p streaming_agg_feature_set.py
!echo "Y" | qwak features register -p streaming_feature_set.py

Notice that BatchInferenceClient and FeedbackClient are not available in the skinny package. In order to use them, please install them as extras: pip install "qwak-inference[batch,feedback]".
✅ Finding Entities to register (0:00:00.10)
👀 Found 0 Entities
----------------------------------------
✅ Finding Data Sources to register (0:00:00.00)
👀 Found 0 Data Sources
----------------------------------------
✅ Finding Feature Sets to register (0:00:00.00)
👀 Found 1 Feature Set(s)
Update existing feature set 'telecom-aggregated-usage' from source file '/Users/haha/Projects/qwak-examples/feature_store/streaming_agg_feature_set.py'?
continue? [y/N]: Validating 'telecom-aggregated-usage' feature set
✅ (0:00:05.85)
✅ Validation completed successfully, got data source columns:
column name              type
-----------------------  ------
user_id                  string
avg_CustServ_Calls_7d    double
sum_Total_Calls_7d     

<br>

#### Verifying Feature Set Registration

To ensure that the Feature Set has been successfully registered and is valid, execute the following command to list all Feature Sets associated with your Qwak account:

<br>

In [None]:
!qwak features list

<br>

For more information on the available Feature Store SDK commands, please use the CLI help:

<br>

In [3]:
!qwak features --help

Notice that BatchInferenceClient and FeedbackClient are not available in the skinny package. In order to use them, please install them as extras: pip install "qwak-inference[batch,feedback]".
Usage: qwak features [OPTIONS] COMMAND [ARGS]...

  Commands for interacting with the Qwak Feature Store

Options:
  --help  Show this message and exit.

Commands:
  backfill          Trigger a backfill process for a Feature Set
  delete            Delete by name a feature store object - a feature...
  execution-status  Retrieve the current status of an execution...
  list              List registered feature sets
  pause             Pause a running feature set
  register          Register and deploy all feature store object under...
  resume            Resume a paused feature set
  trigger           Trigger a batch feature set job ingestion job


<hr><br>

## Consuming Features from the Offline Feature Store (Training/Batch Inference)

To retrieve features from the Offline Feature Store for training or batch inference, you can use two methods:

1. **By List of IDs and Timestamp**:
   - Fetches records associated with the provided set of keys, inserted at a specific timestamp.
   - Query date must fall between the start and end timestamp.

2. **By Date Range**:
   - Retrieves all records within the specified date range.
   - May include multiple records per key for time series data.


For simplicity we will focus on the second option, but you can find more information on the first one in [our docs](https://docs-saas.qwak.com/docs/getting-features-for-training#get-feature-values). 

In [22]:
# Importing the Feature Store clients used to fetch results
from qwak.feature_store.offline import OfflineClientV2
from qwak.feature_store.offline.feature_set_features import FeatureSetFeatures

from datetime import datetime
import pandas as pd

FEATURE_SET = 'telecom-usage-features'
FEATURES_LIST = ['Day_Mins', 'Day_Calls', 'Eve_Calls', 'Night_Calls', 'CustServ_Calls', 'Agitation_Level', 'Total_Mins', 'Total_Calls', 'Mins_Per_Call', 'Intl_Usage_Ratio', 'CustServ_Call_Ratio']

def fetch_training_features(start_time: datetime, end_time: datetime) -> pd.DataFrame: 

    offline_feature_store = OfflineClientV2()
    
    features = FeatureSetFeatures(
        feature_set_name= FEATURE_SET,
        feature_names= FEATURES_LIST)
    
    # In production code, wrap this call in a try/except block
    features_df: pd.DataFrame = offline_feature_store.get_feature_range_values(
        features=features,
        start_date=start_time,
        end_date=end_time
    )

    return features_df
    

if __name__ == '__main__':

    # Define the date range for feature retrieval
    feature_range_start = datetime(year=2019, month=12, day=1)
    feature_range_end = datetime.today()

    train_df = fetch_training_features(feature_range_start, feature_range_end)

    print(f"\n\nTraining data sample:\n\n{train_df.head().to_string()}\n")

QwakException: Got the following run-time exception: <_InactiveRpcError of RPC that terminated with:
	status = StatusCode.INTERNAL
	details = "There was a server error trying to handle an exception"
	debug_error_string = "UNKNOWN:Error received from peer  {grpc_message:"There was a server error trying to handle an exception", grpc_status:13, created_time:"2024-07-29T12:40:44.101109+03:00"}"
>

<br>

Please note that although the Feature Set has been registered, it usually takes a couple of minutes to run the first ingestion job. This means you might not have any data to fetch until the ingestion job runs at least once.

To verify the status of the ingestion, please refer to the Qwak Dashboard -> Feature Sets -> `telecom-usage-features` -> Jobs.

![Feature Store Dashboard](ingestion-job-finished.png)
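
Because the first ingestion job can take a few minutes, a fetch made right after registration may fail or come back empty. One way to handle this is to retry the fetch a few times before giving up. The helper below is a generic sketch, not part of the Qwak SDK: `wait_for_features` and the `fake_fetch` stub are hypothetical, and in real code the `fetch` argument would be a function that calls the offline client.

```python
import time


def wait_for_features(fetch, attempts=5, delay_seconds=60.0, sleep=time.sleep):
    """Call `fetch` until it returns a non-empty result or the attempts run out."""
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            result = fetch()
            last_error = None
            # len() works for lists and for pandas DataFrames (row count).
            if result is not None and len(result) > 0:
                return result
        except Exception as error:  # narrow this to the SDK's exception type in real code
            last_error = error
        if attempt < attempts:
            sleep(delay_seconds)
    if last_error is not None:
        raise last_error
    return None


# Stub that fails twice before returning data, standing in for a real fetch.
calls = {"count": 0}


def fake_fetch():
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("no data yet")
    return [{"user_id": "u1"}]


rows = wait_for_features(fake_fetch, attempts=5, delay_seconds=0)
print(rows, calls["count"])
```

The stub succeeds on its third call, so the helper returns after three attempts instead of exhausting all five.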


<br>

<hr><br>

## Consuming Features for Real-Time Inference from the Online Store

In the previous example, we retrieved historical data from the Offline Store, which stores all the historical data. Now we'll use the Online Store, which is optimized for real-time use cases and provides low-latency feature retrieval.
Qwak provides two ways to query the Online Store and look up the most recent feature vector for a given key:

###  1. Enriching Inference Requests with Features from Online Store

Qwak natively integrates the Model runtime with the Feature Store, offering an easy way to leverage very low-latency feature retrieval. You don't run an explicit query: you send the feature set key in the model request input, and Qwak automatically extracts the latest features for that key (in our case `user_id`) during the model serving request.


Note: The code below is an example for local use only. If you're using it for a live model, please remove the `run_local` import.

**The ModelSchema definition is mandatory to enable feature extraction via the OnlineClient or qwak.api decorator**.


In [27]:
from qwak.model.tools import run_local # utility tooling for local testing and debugging - REMOVE BEFORE BUILDING REMOTELY

from qwak.model.base import QwakModel
from qwak.model.adapters import DefaultOutputAdapter, DataFrameInputAdapter
from qwak.model.schema import ModelSchema, InferenceOutput
from qwak.model.schema_entities import FeatureStoreInput
import pandas as pd
import qwak

class ChurnPredictionModel(QwakModel):
   
    def __init__(self):
        pass

    def build(self):
        pass

    def schema(self) -> ModelSchema:
        model_schema = ModelSchema(
            inputs= [FeatureStoreInput(name=f'{FEATURE_SET}.{feature}') for feature in FEATURES_LIST],
            outputs=[InferenceOutput(name="churn_probability", type=float)]
        )
        return model_schema

    @qwak.api(
        feature_extraction=True,
        input_adapter=DataFrameInputAdapter(),
        output_adapter=DefaultOutputAdapter()
    )
    def predict(self, df: pd.DataFrame, extracted_df: pd.DataFrame) -> pd.DataFrame:
        print(f"\nInput dataframe df:\n{df}")
        print(f"\nFeature Set extracted dataframe:\n{extracted_df.to_string()}")
        return pd.DataFrame({"churn_probability": [0.5]})  # placeholder score, named after the schema output


Calling the model locally to test `predict()`:

In [29]:
def test_model_locally():
    # Create a new instance of the model
    m = ChurnPredictionModel()

    # Define the columns
    columns = ["user_id"]

    # Define the data
    data = [["0166f628-07a6-4461-9870-fc9df0df7a5b"]]
    
    # Create the DataFrame and convert it to JSON
    json_payload = pd.DataFrame(data=data, columns=columns).to_json()

    print("Predicting for: \n\n", json_payload)
    

    # Run local inference using the model and print the prediction
    # The run_local function is part of the qwak library and allows for local testing of the model
    prediction = run_local(m, json_payload)
    print("\nPrediction: ", prediction)

test_model_locally()

Predicting for: 

 {"user_id":{"0":"0166f628-07a6-4461-9870-fc9df0df7a5b"}}


QwakException: Failed to get online features results. Error is: <_InactiveRpcError of RPC that terminated with:
	status = StatusCode.UNAVAILABLE
	details = "upstream connect error or disconnect/reset before headers. reset reason: protocol error"
	debug_error_string = "UNKNOWN:Error received from peer  {grpc_message:"upstream connect error or disconnect/reset before headers. reset reason: protocol error", grpc_status:14, created_time:"2024-07-29T13:54:17.95888+03:00"}"
>

<br>
In this request, we only sent the `user_id`, and Qwak automatically extracts the relevant (latest) features for that key from the Feature Set specified in the Model Schema.

This approach automatically logs the extraction latency to the model Analytics.
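
The request payload printed in the local test is the column-oriented JSON that `DataFrame.to_json()` produces by default: one object per column, mapping the row index to the value. As a rough sketch, the same shape can be built with the standard library alone; `build_payload` is a hypothetical helper for this example, not a Qwak function.

```python
import json


def build_payload(user_ids, key="user_id"):
    """Build a column-oriented JSON payload: {column: {row_index: value}}."""
    return json.dumps({key: {str(i): uid for i, uid in enumerate(user_ids)}})


single = build_payload(["0166f628-07a6-4461-9870-fc9df0df7a5b"])
several = build_payload(["id-a", "id-b", "id-c"])

print(single)
print(several)
```

Each additional key becomes another entry under the same column, indexed by its row position.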

<br>

###  2. Features Lookup with the OnlineClient

The previous approach lets a QwakModel fetch features automatically, which works well for most cases. But what if we want more control over the keys we look up at runtime, for example looking up multiple keys for a single prediction request input?

That's what the `OnlineClient` is for: it lets you run explicit queries, as shown below:

<br>

In [15]:
import pandas as pd
from qwak.feature_store.online.client import OnlineClient
from qwak.model.schema_entities import FeatureStoreInput
from qwak.model.schema import ModelSchema, InferenceOutput

model_schema = ModelSchema(
    inputs= [FeatureStoreInput(name=f'{FEATURE_SET}.{feature}') for feature in FEATURES_LIST],
    outputs=[InferenceOutput(name="churn_probability", type=float)]
)
    
online_client = OnlineClient()

df = pd.DataFrame(columns=['user_id',],
                  data   =[['06cc255a-aa07-4ec9-ac69-b896ccf05322'],
                           ['46ad9e4b-1d0f-47b7-a73d-71cc66538b03'],
                           ['95ec0c53-4e27-4490-b85f-1448de70fc26']])
                  
online_features = online_client.get_feature_values(model_schema, df)


print(f"\n\nRealtime features extracted:\n\n{online_features.to_string()}\n")

QwakException: Failed to get online features results. Error is: <_InactiveRpcError of RPC that terminated with:
	status = StatusCode.UNAVAILABLE
	details = "upstream connect error or disconnect/reset before headers. reset reason: protocol error"
	debug_error_string = "UNKNOWN:Error received from peer  {grpc_message:"upstream connect error or disconnect/reset before headers. reset reason: protocol error", grpc_status:14, created_time:"2024-07-15T18:48:18.663342+03:00"}"
>

<br>

You may have noticed that the FeatureStoreInput names contain both the feature set name and the feature name. This design allows you to specify and utilize multiple feature sets within the same request.

Similar to the previous option, the `ModelSchema` is a required component. It informs Qwak about the features to include in the lookup.
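
To illustrate the `<feature set>.<feature>` naming convention, the sketch below builds qualified names for two feature sets and groups them back by feature set. It uses plain Python only: `qualified_names` and `group_by_feature_set` are hypothetical helpers, and the aggregated feature name is taken from the sample output earlier in this guide.

```python
from collections import defaultdict


def qualified_names(feature_set, features):
    """Prefix each feature with its feature set, as done for FeatureStoreInput names."""
    return [f"{feature_set}.{feature}" for feature in features]


def group_by_feature_set(names):
    """Split qualified names on the first dot and group features per feature set."""
    grouped = defaultdict(list)
    for name in names:
        feature_set, feature = name.split(".", 1)
        grouped[feature_set].append(feature)
    return dict(grouped)


names = (
    qualified_names("telecom-usage-features", ["Total_Mins", "Total_Calls"])
    + qualified_names("telecom-aggregated-usage", ["avg_Day_Mins_7d"])
)
grouped = group_by_feature_set(names)

print(names)
print(grouped)
```

Because every name carries its feature set, a single schema can mix inputs from both feature sets without ambiguity.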
