Welcome! Here we use a basic example to explain key concepts and user flows in Feast.
We focus on a specific example (that does not include online features or realtime models):
- Goal: build a platform for data scientists and engineers to share features for training offline models
- Use case: Predicting churn for drivers in a ride-sharing application.
- Stack: you have data in a combination of data warehouses (to be explored in a future module) and data lakes (e.g. S3)
- Installing Feast
- Exploring the data
- Reviewing Feast concepts
- User groups
- User group 1: ML Platform Team
- User group 2: ML Engineers
- User group 3: Data Scientists
- Exercise: merge a sample PR in your fork
- Conclusion
- FAQ
Before we get started, install Feast with the AWS or GCP dependencies:
- AWS
pip install "feast[aws]"
- GCP
pip install "feast[gcp]"
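To sanity check the install (optional; this snippet is not part of the workshop files), you can print the installed version from Python, or equivalently run feast version on the command line:

```python
# Quick check that Feast imports correctly and which version is installed.
import feast

print(feast.__version__)
```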
We've made some dummy data for this workshop in infra/driver_stats.parquet. Let's dive into what the data looks like. You can follow along in explore_data.ipynb:
import pandas as pd
pd.read_parquet("infra/driver_stats.parquet")
This is a set of time-series data keyed on driver_id (the primary key, representing the driver entity), with event_timestamp indicating when each event happened.
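If you want to poke at the data a bit further, a quick sanity check (a hypothetical snippet; the column names come from the feature definitions used later in this module) might look like:

```python
import pandas as pd

df = pd.read_parquet("infra/driver_stats.parquet")

# Inspect the schema; expect driver_id, event_timestamp, created, plus feature
# columns such as conv_rate and acc_rate (per the feature views defined below).
print(df.dtypes)

# Check the time range covered and how many distinct drivers (entities) exist.
print(df["event_timestamp"].min(), "->", df["event_timestamp"].max())
print(df["driver_id"].nunique(), "unique drivers")
```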
Let's quickly review some Feast concepts needed to build this ML platform / use case. Don't worry about memorizing all of this; we explore these concepts by example throughout the workshop.
Concept | Description |
---|---|
Data source | We need a FileSource (with an S3 path and endpoint override) to represent the driver_stats.parquet which will be hosted in S3. Data sources can generally be files or tables in data warehouses.Note: data sources in Feast need timestamps for now. For data that doesn't change, unfortunately this will require you to supply a dummy timestamp. |
Entity | The main entity we need here is a driver entity (with driver_id as the join key). Entities are used to construct composite keys to join feature views together. Typically, they map to the unique ids that you will have available at training / inference time. |
Feature view | We'll have various feature views corresponding to different logical groups of features and transformations from data sources keyed on entities. These can be shared / re-used by data scientists and engineers and are registered with feast apply . Feast also supports reusable last mile transformations with OnDemandFeatureView s. We explore this in Module 2 |
Feature service | We build different model versions with different sets of features using feature services (model_v1 , model_v2 ). Feature services group features a given model version depends on. It allows retrieving all necessary model features by using a feature service name. |
Registry | Where Feast tracks registered features, data sources, entities, feature services and metadata. Users + model servers will pull from this to discover registered features. This is by default a single file which is not human readable (it's a serialized protocol buffer). In the upcoming release, you can use SQLAlchemy to store this in a database like Postgres / MySQL. |
Provider | We use the AWS provider here. A provider is a customizable interface that Feast uses to orchestrate feature generation / retrieval. Specifying a built-in provider (e.g. aws ) ensures your registry can be stored in S3 (and also specifies default offline / online stores) |
Offline store | What Feast will use to execute point-in-time joins (typically a data warehouse). Here we use the file offline store. |
Online store | Low-latency data store that Feast uses to power online inference. Feast ingests offline feature values and streaming features into the online store. In this module, we do not need one. |
Let's look at a feature view we have in this module:
from datetime import timedelta

from feast import FeatureView, Field
from feast.types import Float32

from data_sources import driver_stats  # the FileSource defined in data_sources.py

driver_hourly_stats_view = FeatureView(
name="driver_hourly_stats",
description="Hourly features",
entities=["driver"],
ttl=timedelta(seconds=8640000000),
schema=[
Field(name="conv_rate", dtype=Float32),
Field(name="acc_rate", dtype=Float32),
],
online=True,
source=driver_stats,
tags={"production": "True"},
owner="test2@gmail.com",
)
Feature views are one of the most important concepts in Feast.
They represent a group of features that should be physically colocated (e.g. in the offline or online stores). More specifically:
- The above feature view comes from a single batch source (a FileSource).
- If we wanted to also use an online store, the above feature view would be colocated in e.g. a Redis hash or a DynamoDB table.
- A common need is to have both a batch and a stream source for a single set of features. You'd thus have a FeatureView that can have two different sources but maps to the same physical location in the online store.
It's worth noting that there are multiple types of feature views. OnDemandFeatureViews, for example, enable row-level transformations on data sources and request data, with the output features described in the schema parameter.
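For context, the other objects this feature view relies on look roughly like the sketch below. This is illustrative only: the real definitions live in feature_repo_aws/ / feature_repo_gcp/, constructor arguments vary slightly across Feast versions, and the per-model feature selections are assumptions. It sketches the driver entity (keyed on driver_id) and the model_v1 / model_v2 feature services referenced later:

```python
from feast import Entity, FeatureService

# Entity: maps the "driver" concept to its join key, driver_id.
# (Newer Feast versions use join_keys=[...]; older ones use join_key="...".)
driver = Entity(name="driver", join_keys=["driver_id"])

# Feature services: named groups of features a given model version depends on.
# The exact feature selections per version are illustrative.
model_v1 = FeatureService(
    name="model_v1",
    features=[driver_hourly_stats_view[["conv_rate"]]],  # a subset of the view's features
)
model_v2 = FeatureService(
    name="model_v2",
    features=[driver_hourly_stats_view],  # all features from the view
)
```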
There are three user groups here worth considering. The ML platform team, the ML engineers running batch inference on models, and the data scientists building models.
The team here sets up the centralized Feast feature repository and CI/CD in GitHub. This is what's seen in feature_repo_aws/ or feature_repo_gcp/.
This assumes you have an AWS account & Terraform setup. If you don't:
- Set up an AWS account, install the AWS CLI, and set up your credentials with aws configure as per the AWS credential quickstart
- Install Terraform
We've made a simple Terraform project to help with S3 bucket creation (and uploading the test data we need):
$ cd infra/aws
$ terraform init
$ terraform apply
var.project_name
The project identifier is used to uniquely namespace resources
Enter a value: danny
...
aws_s3_bucket.feast_bucket: Creating...
aws_s3_bucket.feast_bucket: Creation complete after 3s [id=feast-workshop-danny]
aws_s3_bucket_acl.feast_bucket_acl: Creating...
aws_s3_object.driver_stats_upload: Creating...
aws_s3_bucket_acl.feast_bucket_acl: Creation complete after 0s [id=feast-workshop-danny,private]
aws_s3_object.driver_stats_upload: Creation complete after 1s [id=driver_stats.parquet]
Apply complete! Resources: 3 added, 0 changed, 0 destroyed.
Outputs:
project_bucket = "s3://feast-workshop-danny"
project_name = "danny"
This assumes you have a GCP account & Terraform setup. If you don't:
- Set up a GCP account and install the gcloud CLI
- Install Terraform
We've made a simple Terraform project to help with GCS bucket creation (and uploading the test data we need). We'll also need a service account here, which the GitHub CI can later use to manage resources.
- Create a project in GCP (https://console.cloud.google.com/; we use feast-workshop as the name below)
- Set up credentials (e.g. install gcloud, then run):
  gcloud config configurations create personal
  gcloud config set project feast-workshop
  gcloud auth login
- Create a service account:
  $ gcloud iam service-accounts create feast-workshop-sa --display-name "Terraform and GitHub CI account"
  Created service account [feast-workshop-sa].
- Grant the service account access to your project:
  $ gcloud projects add-iam-policy-binding feast-workshop --member serviceAccount:feast-workshop-sa@feast-workshop.iam.gserviceaccount.com --role roles/editor
  Updated IAM policy for project [feast-workshop].
- Generate keys:
  $ gcloud iam service-accounts keys create ~/.config/gcloud/feast-workshop.json --iam-account feast-workshop-sa@feast-workshop.iam.gserviceaccount.com
  created key [YOUR PRIVATE KEY] of type [json] as [/Users/dannychiao/.config/gcloud/feast-workshop.json] for [feast-workshop-sa@feast-workshop.iam.gserviceaccount.com]
- Let terraform know where your credentials are:
export GOOGLE_APPLICATION_CREDENTIALS=~/.config/gcloud/feast-workshop.json
- Run the logic in infra/gcp:
  $ cd infra/gcp
  $ terraform init
  $ terraform apply
  var.gcp_project
  The GCP project id
  Enter a value: feast-workshop
  var.project_name
  The project identifier is used to uniquely namespace resources
  Enter a value: danny
  ...
  google_storage_bucket.feast_bucket: Creating...
  google_storage_bucket.feast_bucket: Creation complete after 1s [id=feast-workshop-danny]
  google_storage_bucket_object.driver_stats_upload: Creating...
  google_storage_bucket_object.driver_stats_upload: Creation complete after 0s [id=feast-workshop-danny-driver_stats.parquet]
  Apply complete! Resources: 2 added, 0 changed, 0 destroyed.
  Outputs:
  project_bucket = "gs://feast-workshop-danny"
  project_name = "danny"
The first thing a platform team needs to do is set up a feature_store.yaml file within a version-controlled repo like GitHub. feature_store.yaml is the primary way to configure an overall Feast project. We've set up a sample feature repository in feature_repo_aws/ or feature_repo_gcp/.

There are two files in feature_repo_aws or feature_repo_gcp you need to change to point to your bucket:
data_sources.py
from feast import FileSource

# AWS version
driver_stats = FileSource(
name="driver_stats_source",
path="s3://[INSERT YOUR BUCKET]/driver_stats.parquet",
s3_endpoint_override="http://s3.us-west-2.amazonaws.com", # Needed since s3fs defaults to us-east-1
timestamp_field="event_timestamp",
created_timestamp_column="created",
description="A table describing the stats of a driver based on hourly logs",
owner="test2@gmail.com",
)
# GCP version
driver_stats = FileSource(
name="driver_stats_source",
path="gs://feast-workshop-danny/driver_stats.parquet", # TODO: Replace with your bucket
timestamp_field="event_timestamp",
created_timestamp_column="created",
description="A table describing the stats of a driver based on hourly logs",
owner="test2@gmail.com",
)
feature_store.yaml
# AWS version
project: feast_demo_aws
provider: aws
registry: s3://[INSERT YOUR BUCKET]/registry.pb
online_store: null
offline_store:
type: file
flags:
alpha_features: true
on_demand_transforms: true
# GCP version
project: feast_demo_gcp
provider: gcp
registry: gs://feast-workshop-danny/registry.pb # TODO: Replace with your bucket
online_store: null
offline_store:
type: file
A quick explanation of what's happening in this feature_store.yaml:
Key | What it does | Example |
---|---|---|
project | Gives infrastructure isolation via namespacing (e.g. online stores + Feast objects). | Any unique name within your organization (e.g. feast_demo_aws) |
provider | Defines registry location & sets defaults for offline / online stores | gcp enables a GCS-based registry and sets BigQuery + Datastore as the default offline / online stores. |
registry | Defines the specific path for the registry (local, gcs, s3, etc) | s3://[YOUR BUCKET]/registry.pb |
online_store | Configures online store (if needed for supporting real-time models) | null, redis, dynamodb, datastore, postgres, hbase (each has its own extra configs) |
offline_store | Configures offline store, which executes point-in-time joins | bigquery, snowflake.offline, redshift, spark, trino (each has its own extra configs) |
flags | (legacy) Soon-to-be-deprecated way to enable experimental functionality. | |
- Project
  - Users can only request features from a single project.
- Provider
  - Default offline or online store choices can be easily overridden in feature_store.yaml.
  - For example, one can use the aws provider and specify Snowflake as the offline store:

    project: feast_demo_aws
    provider: aws
    registry: s3://[INSERT YOUR BUCKET]/registry.pb
    online_store: null
    offline_store:
      type: snowflake.offline
      account: SNOWFLAKE_DEPLOYMENT_URL
      user: SNOWFLAKE_USER
      password: SNOWFLAKE_PASSWORD
      role: SNOWFLAKE_ROLE
      warehouse: SNOWFLAKE_WAREHOUSE
      database: SNOWFLAKE_DATABASE

- Offline store
  - We recommend users use data warehouses or Spark as their offline store for performant training dataset generation.
  - In this workshop, we use file sources for instructional purposes. This will directly read from files (local or remote) and use Dask to execute point-in-time joins.
  - A project can only support one type of offline store (cannot mix Snowflake + file, for example).
  - Each offline store has its own configurations which map to YAML (e.g. see BigQueryOfflineStoreConfig).
- Online store
  - You only need this if you're powering real-time models (e.g. inferring in response to user requests).
  - If you are precomputing predictions in batch (aka batch scoring), then the online store is optional. You should be using the offline store and running feature_store.get_historical_features. We touch on this later in this module.
  - Each online store has its own configurations which map to YAML (e.g. RedisOnlineStoreConfig).
- Custom offline / online stores
  - Generally, custom offline + online stores and providers are supported and can plug in (e.g. see adding a new offline store, adding a new online store).
  - Example way of using a custom offline store you define:

    project: test_custom
    registry: data/registry.db
    provider: local
    offline_store:
      type: feast_custom_offline_store.CustomOfflineStore
With feature_store.yaml set up, you can now run feast plan to see what changes would happen with feast apply.
feast plan
Sample output:
$ feast plan
Created entity driver
Created feature view driver_hourly_stats
Created feature service model_v2
Created feature service model_v1
No changes to infrastructure
Now run feast apply.

This will parse the feature, data source, and feature service definitions and publish them to the registry. It may also set up some tables in the online store to materialize batch features to (in this case, we set the online store to null, so no online store changes will occur).
$ feast apply
Created entity driver
Created feature view driver_hourly_stats
Created feature service model_v1
Created feature service model_v2
Deploying infrastructure for driver_hourly_stats
You can now run Feast CLI commands to verify Feast knows about your features and data sources.
$ feast feature-views list
NAME ENTITIES TYPE
driver_hourly_stats {'driver'} FeatureView
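You can also inspect the registry programmatically. A minimal sketch (assuming you run it from the directory containing feature_store.yaml):

```python
from feast import FeatureStore

# Point at the directory containing feature_store.yaml
store = FeatureStore(repo_path=".")

for fv in store.list_feature_views():
    print(fv.name, [f.name for f in fv.features])

for fs in store.list_feature_services():
    print(fs.name)
```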
Feast comes with an experimental Web UI. Users can already spin this up locally with feast ui, but you may want to have a Web UI that is universally available. Here, you'd likely deploy a service that runs feast ui on top of a feature_store.yaml, with some configuration on how frequently the UI should refresh its registry.

Note: If you're using Windows, you may need to run feast ui -h localhost instead.
$ feast ui
INFO: Started server process [10185]
05/15/2022 04:35:58 PM INFO:Started server process [10185]
INFO: Waiting for application startup.
05/15/2022 04:35:58 PM INFO:Waiting for application startup.
INFO: Application startup complete.
05/15/2022 04:35:58 PM INFO:Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:8888 (Press CTRL+C to quit)
05/15/2022 04:35:58 PM INFO:Uvicorn running on http://0.0.0.0:8888 (Press CTRL+C to quit)
The feature repository is already created for you on GitHub. We set up CI/CD to automatically manage the registry. You'll want e.g. a GitHub workflow that:
- on pull request, runs feast plan
- on PR merge, runs feast apply
We recommend automatically running feast plan on incoming PRs to describe what changes will occur when the PR merges.
- This is useful for helping PR reviewers understand the effects of a change.
- This can prevent breaking models in production (e.g. catching PRs that would change features used by an existing model version, i.e. a FeatureService).
An example GitHub workflow that runs feast plan on PRs (see feast_plan_aws.yml or feast_plan_gcp.yml, which are set up in this workshop repo):
name: Feast plan
on: [pull_request] # Should be triggered once then manually if possible
jobs:
feast_plan:
runs-on: ubuntu-latest
steps:
- name: Setup Python
id: setup-python
uses: actions/setup-python@v2
with:
python-version: "3.7"
architecture: x64
- name: Set up AWS SDK
uses: aws-actions/configure-aws-credentials@v1
with:
aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
aws-region: us-west-2
# Run `feast plan`
- uses: actions/checkout@v2
- name: Install feast
run: pip install "feast[aws]"
- name: Capture `feast plan` in a variable
id: feast_plan
env:
FEAST_USAGE: "False"
FEAST_FORCE_USAGE_UUID: None
IS_TEST: "True"
run: |
body=$(cd module_0/feature_repo_aws; feast plan)
body="${body//'%'/'%25'}"
body="${body//$'\n'/'%0A'}"
body="${body//$'\r'/'%0D'}"
echo "::set-output name=body::$body"
# Post a comment on the PR with the results of `feast plan`
- name: Create comment
uses: peter-evans/create-or-update-comment@v1
if: ${{ steps.feast_plan.outputs.body }}
with:
issue-number: ${{ github.event.pull_request.number }}
body: |
${{ steps.feast_plan.outputs.body }}
You'll notice the above logic references two secrets in GitHub corresponding to your AWS credentials.
- To make this workflow work, create GitHub secrets with your own AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY.
- For GCP, you'll need a GCP_PROJECT_ID and GCP_SA_KEY for the service account key JSON (e.g. the contents of ~/.config/gcloud/feast-workshop.json in the above).
See the result on a PR opened in this repo: #3
When a pull request is merged to change the repo, we configure CI/CD to automatically run feast apply.

An example GitHub workflow which runs feast apply on PR merge (see feast_apply_aws.yml, which is set up in this workshop repo):
name: Feast apply
on:
push:
branches:
- main
jobs:
feast_apply:
runs-on: ubuntu-latest
steps:
- name: Setup Python
id: setup-python
uses: actions/setup-python@v2
with:
python-version: "3.7"
architecture: x64
- name: Set up AWS SDK
uses: aws-actions/configure-aws-credentials@v1
with:
aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
aws-region: us-west-2
# Run `feast apply`
- uses: actions/checkout@v2
- name: Install feast
run: pip install "feast[aws]"
- name: Run feast apply
env:
FEAST_USAGE: "False"
IS_TEST: "True"
run: |
cd module_0/feature_repo_aws
feast apply
Towards the end of the module, we will see this in action, but for now, we focus on understanding what Feast brings to the table.
We won't dive into this in detail here, but you don't want to allow arbitrary users to clone the feature repository, change definitions, and run feast apply. Thus, you should lock down your registry (e.g. with an S3 bucket policy) to only allow changes from your CI/CD user and perhaps some ML engineers.
Many Feast users use tags on objects extensively. Some examples of how this may be used:
- To give more detailed documentation on a FeatureView
- To highlight what groups you need to join to gain access to certain feature views
- To denote whether a feature service is in production or in staging
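Because tags are plain key/value metadata on registered objects, they're also easy to filter on from Python. A hypothetical sketch that lists production feature views (using the production tag set on driver_hourly_stats earlier):

```python
from feast import FeatureStore

store = FeatureStore(repo_path=".")

# Filter registered feature views by the "production" tag.
prod_fvs = [
    fv for fv in store.list_feature_views()
    if fv.tags.get("production") == "True"
]
print("Production feature views:", [fv.name for fv in prod_fvs])
```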
Additionally, users will often want to have a dev/staging environment that's separate from production. In this case, one pattern that works is to have separate projects:
├── .github
│ └── workflows
│ ├── production.yml
│ └── staging.yml
│
├── staging
│ ├── driver_repo.py
│ └── feature_store.yaml
│
└── production
├── driver_repo.py
└── feature_store.yaml
Data scientists or ML engineers can use the defined FeatureServices (corresponding to model versions) and schedule regular jobs that generate batch predictions (or regularly retrain).

get_historical_features is the API by which you can retrieve features (by referencing features directly or via feature services). Under the hood, it manages point-in-time joins and avoids data leakage to generate training datasets or power batch scoring.
For batch scoring, you want to get the latest feature values for your entities. Feast requires timestamps in get_historical_features, so what you'll need to do is append an event timestamp of now(). Don't bother running this code right now since we'll run it in the next step.
import pandas as pd

# `store` is a FeatureStore instance and `model` is a trained model you supply.

# Get the latest feature values for unique entities
entity_df = pd.DataFrame.from_dict({"driver_id": [1001, 1002, 1003, 1004, 1005]})
entity_df["event_timestamp"] = pd.to_datetime("now", utc=True)

# Because we're using the default FileOfflineStore, this executes on your machine
training_df = store.get_historical_features(
    entity_df=entity_df,
    features=store.get_feature_service("model_v2"),
).to_df()

# Make batch predictions
predictions = model.predict(training_df)
Note that we're using a FeatureService in this example. A feature service is the recommended way to version a model's feature dependencies. This is useful for many reasons:
- Data scientists can easily recall what features were used for a given model iteration, or learn what features were useful for a related model.
- Engineers can use that same feature service to retrieve features at serving time, or run A/B experiments across several feature services.
- Data lineage: feature services give visibility into what features are depended on in production, so we can prevent accidental changes to those features.
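As a side-by-side sketch (reusing store and entity_df from the snippet above), get_historical_features accepts either a feature service or explicit feature references, so adopting feature services is a drop-in change:

```python
# Versioned: all features model_v2 depends on, resolved by name from the registry.
features_via_service = store.get_feature_service("model_v2")

# Ad-hoc: explicit "<feature_view>:<feature>" references (easy to drift from what
# the model actually uses in production).
features_via_refs = [
    "driver_hourly_stats:conv_rate",
    "driver_hourly_stats:acc_rate",
]

training_df = store.get_historical_features(
    entity_df=entity_df,
    features=features_via_service,  # or features_via_refs
).to_df()
```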
Go into the module_0/client_aws or module_0/client_gcp directory and change the feature_store.yaml to use your S3/GCS bucket.

Then run python test_fetch.py, which runs the above code (printing out the dataframe instead of calling the model):
$ python test_fetch.py
driver_id event_timestamp conv_rate acc_rate
720 1002 2022-05-15 20:46:00.308163+00:00 0.465875 0.315721
1805 1005 2022-05-15 20:46:00.308163+00:00 0.394072 0.046118
1083 1003 2022-05-15 20:46:00.308163+00:00 0.869917 0.779562
359 1001 2022-05-15 20:46:00.308163+00:00 0.404588 0.407571
1444 1004 2022-05-15 20:46:00.308163+00:00 0.977276 0.051582
You can also skip the feature_store.yaml and directly instantiate a RepoConfig object in Python (which is the in-memory representation of the contents of feature_store.yaml). See the module_0/client_no_yaml directory for an example of this.
We are going to run one of the two Python files, depending on whether you're on GCP or AWS. The output will be identical to the previous step.
A quick snippet of the AWS code:
from feast import FeatureStore, RepoConfig
from feast.repo_config import RegistryConfig
repo_config = RepoConfig(
registry=RegistryConfig(path="s3://[YOUR BUCKET]/registry.pb"),
project="feast_demo_aws",
provider="aws",
offline_store="file", # Could also be the OfflineStoreConfig e.g. FileOfflineStoreConfig
online_store="null", # Could also be the OnlineStoreConfig e.g. RedisOnlineStoreConfig
)
store = FeatureStore(config=repo_config)
...
training_df = store.get_historical_features(...).to_df()
$ python test_fetch_aws.py
driver_id event_timestamp conv_rate acc_rate
720 1002 2022-05-15 20:46:00.308163+00:00 0.465875 0.315721
1805 1005 2022-05-15 20:46:00.308163+00:00 0.394072 0.046118
1083 1003 2022-05-15 20:46:00.308163+00:00 0.869917 0.779562
359 1001 2022-05-15 20:46:00.308163+00:00 0.404588 0.407571
1444 1004 2022-05-15 20:46:00.308163+00:00 0.977276 0.051582
You may note that the above example uses a to_df() method to load the training dataset into memory, and you may be wondering how this scales if you have very large datasets.

get_historical_features actually returns a RetrievalJob object that lazily executes the point-in-time join. The RetrievalJob class is extended by each offline store to allow flushing results to e.g. the data warehouse or data lake.
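As a rough sketch (reusing store and entity_df from earlier), the RetrievalJob itself is cheap to construct; the join only runs when you materialize the result:

```python
# Nothing executes yet: this just builds a RetrievalJob describing the join.
job = store.get_historical_features(
    entity_df=entity_df,
    features=store.get_feature_service("model_v2"),
)

# Materializing the result triggers the point-in-time join. For the file offline
# store this runs locally; offline-store-specific subclasses add sinks such as
# to_bigquery() (shown below), so large results never have to fit in memory here.
training_df = job.to_df()
```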
Let's look at an example with BigQuery as the offline store.
project: feast_demo_gcp
provider: gcp
registry: gs://[YOUR BUCKET]/registry.pb
offline_store:
type: bigquery
location: EU
Retrieving the data with get_historical_features gives a BigQueryRetrievalJob object (reference) which exposes a to_bigquery() method. Thus, you can do:
path = store.get_historical_features(
entity_df=entity_df, features=store.get_feature_service("model_v2"),
).to_bigquery()
# Continue with distributed training or batch predictions from the BigQuery dataset.
Data scientists will be using or authoring features in Feast. By using Feast, data scientists can:
- Re-use existing features that are already productionized
- Gain inspiration from other related models and the features they use
- Organize model experiments using FeatureServices
- (in upcoming modules) Directly author features or transformations that are used at serving time (instead of having ML engineers re-engineer them)

They can similarly generate in-memory dataframes using get_historical_features(...).to_df() or larger datasets with methods like get_historical_features(...).to_bigquery() as described above.
We don't need to do anything new here since data scientists will be doing many of the same steps you've seen in previous user flows.
There are two ways data scientists can use Feast:
- Use Feast primarily as a way of pulling production-ready features.
  - See the client_aws/, client_gcp/, or client_no_yaml/ folders for examples of how users can pull features by only having a feature_store.yaml or instantiating a RepoConfig object.
  - This is not recommended since data scientists cannot register feature services to indicate they depend on certain features in production.
- [Recommended] Have a local copy of the feature repository (e.g. git clone) and author / iterate / re-use features.
  - Data scientists can:
    - browse relevant features that are already productionized to re-use
    - iterate on new features locally
    - apply features to their own dev project with a local registry & experiment
    - build feature services in preparation for production
    - submit PRs to include features that should be used in production (including A/B experiments, or model training iterations)
Data scientists can also investigate other models and their dependent features / data sources / on-demand transformations through the repository or through the Web UI (by running feast ui).
In your own fork of the feast-workshop project, with the above setup (i.e. you've made GitHub secrets with your own AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY, or GCP_PROJECT_ID and GCP_SA_KEY), try making a change with a pull request! Then merge that pull request to see the change propagate to the registry.
Some ideas for what to try:
- Changing metadata (owner, description, tags) on an existing FeatureView
- Adding or removing a Field in an existing FeatureView
You can verify the change propagated by:
- Running the Feast CLI (e.g. feast feature-views list)
- Checking in the Feast UI (i.e. via feast ui)
- Fetching features directly (either by referencing the feature directly in get_historical_features or by querying a modified FeatureService)
As a result:
- You have file sources (possibly remote) and a remote registry (e.g. in S3)
- Data scientists are able to author + reuse features based on a centrally managed registry (visible through the Feast CLI or through the Web UI).
- ML engineers are able to use these same features, with a reference to the registry, to regularly generate predictions on the latest feature values or re-train models.
- You have CI/CD setup to automatically update the registry + online store infrastructure when changes are merged into the version controlled feature repo.
Continue with the workshop in module 1, which uses Feast (with Spark, Kafka, Redis) to generate fresh features and power online model inference.
We expect that there is already some way for you to retrieve labels at model training time. That same source of data should contain the relevant entities.
For example, you may have an events table like:
user_id | event_timestamp | action_type | action_id |
---|---|---|---|
123 | 2021-12-25 12:30:00+00 | add_item_to_cart | 1111 |
123 | 2021-12-25 15:30:00+00 | buy_item | 2222 |
234 | 2022-12-25 15:30:00+00 | sell_item | 3333 |
The first two columns of this example table would constitute your entity_df.
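As a hypothetical sketch (given a FeatureStore instance store and a feature service keyed on user_id, here called my_model_v1), building the entity_df is just selecting those two columns from your label data:

```python
import pandas as pd

# Hypothetical label / event source with the columns from the table above.
events = pd.read_parquet("events.parquet")

# The entity dataframe: one row per (entity, timestamp) to fetch features for.
# The entity column must match the join key of the requested features.
entity_df = events[["user_id", "event_timestamp"]]

# "my_model_v1" is a hypothetical feature service whose features are keyed on user_id.
training_df = store.get_historical_features(
    entity_df=entity_df,
    features=store.get_feature_service("my_model_v1"),
).to_df()
```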
Note: Feast, in the future, may support tracking these entity "data sources" to simplify get_historical_features calls.
Not today. See GitHub Issue #1611 for discussions on how we may implement this. Contributions are welcome!