Commit
1 parent
6d7678f
commit f20ccfa
Showing
2 changed files
with
128 additions
and
0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,2 +1,78 @@ | ||
# Create a feature repository | ||
|
||
We believe that the best way to keep track of your feature definitions is to manage them as code. To define features, you simply describe your feature and data source declarations in pure Python. Then Feast CLI can read Python files with feature definitions, parse the definitions and help you create and manage the infrastructure required to serve these features in production. | ||
|
||
## What is a Feature Repository? | ||
|
||
Feature Repository is nothing more than a collection of Python files containing feature declarations, and a config file with some Feast settings. Typically, Feast users store those files in a git repository, hence the name. Note, however, that Feast makes no hard assumptions about your source control repository structure and doesn't even require you to use git. | ||
|
||
## Creating a Feature Repository | ||
|
||
The easiest way to get started is to use the `feast init` command: | |
|
||
```bash | ||
$ mkdir my_feature_repo && cd my_feature_repo | ||
$ feast init | ||
Generated feature_store.yaml and example features in example.py | |
Now try running `feast apply` to apply, or `feast materialize` to sync data to the online store | |
``` | ||
|
||
This creates a Python file with feature definitions, some sample data, and a Feast configuration for local development: | |
|
||
```bash | ||
$ tree | ||
. | ||
├── data | ||
│ └── driver_stats.parquet | ||
├── example.py | ||
└── feature_store.yaml | ||
|
||
1 directory, 3 files | ||
``` | ||
|
||
## What's Inside a Feature Repository | ||
|
||
Feast configuration is stored in a file named `feature_store.yaml`. There are no restrictions on how Python feature definition files are named, as long as they're valid Python module names \(so no dashes\). A repository can also contain multiple definition files. | |
|
||
If you take a look at `feature_store.yaml` you'll see something like this: | ||
|
||
{% code title="feature\_store.yaml" %} | ||
```yaml | ||
project: robust_tortoise | ||
metadata_store: data/metadata.db | ||
provider: local | ||
online_store: | ||
  local: | |
    path: data/online_store.db | |
``` | ||
{% endcode %} | ||
|
||
Here `project` is a unique identifier for the Feature Repository, generated by `feast init`. The configuration uses the Local provider, which is most useful for development because all data is stored and served locally on your computer. With the Local provider, both the metadata store and the online feature store are just files on your local file system. | |
|
||
Now, if you open `example.py` you'll see some example Feature Views and Data Source definitions. The file is too large to quote here but you should see something like this when you open it: | ||
|
||
```python | ||
from feast import Entity, Feature, FeatureView, ValueType | ||
from feast.data_source import FileSource | ||
|
||
... | ||
|
||
driver_hourly_stats = FileSource( | ||
    ... | |
) | ||
|
||
driver = Entity(...) | ||
|
||
driver_hourly_stats_view = FeatureView( | ||
    name="driver_hourly_stats", | |
    entities=["driver_id"], | |
    ... | |
) | ||
``` | ||
|
||
To declare Feature Views and other objects in a Feast Feature Repository, write Python code that instantiates the objects, sets their parameters, and assigns each one to a top-level module variable. | |
|
||
The Feast CLI will process all Python files in the Feature Repository as modules and find all top-level variables. You don't need to name Python files or variables in a certain way; just make sure there is a separate variable for each Feast object. | |
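The scanning behaviour described here can be illustrated with a minimal sketch. It is not Feast's actual implementation: the `Entity` and `FeatureView` classes below are simplified stand-ins so the example runs without Feast installed, and `collect_objects` is a hypothetical helper name.

```python
import importlib.util
import tempfile
from pathlib import Path


# Hypothetical stand-ins for Feast classes, so the sketch runs without Feast.
class Entity:
    def __init__(self, name):
        self.name = name


class FeatureView:
    def __init__(self, name, entities):
        self.name = name
        self.entities = entities


def collect_objects(py_file, wanted_types):
    """Import py_file as a module and return its top-level variables of the wanted types."""
    spec = importlib.util.spec_from_file_location(py_file.stem, py_file)
    module = importlib.util.module_from_spec(spec)
    # Expose the stand-in classes so the example file needs no imports of its own.
    for cls in wanted_types:
        setattr(module, cls.__name__, cls)
    spec.loader.exec_module(module)
    return {
        var_name: value
        for var_name, value in vars(module).items()
        if isinstance(value, wanted_types)
    }


EXAMPLE_SOURCE = '''
driver = Entity(name="driver_id")
driver_hourly_stats_view = FeatureView(
    name="driver_hourly_stats",
    entities=["driver_id"],
)
not_a_feast_object = 42
'''

with tempfile.TemporaryDirectory() as repo_dir:
    example = Path(repo_dir) / "example.py"
    example.write_text(EXAMPLE_SOURCE)
    found = collect_objects(example, (Entity, FeatureView))

print(sorted(found))  # ['driver', 'driver_hourly_stats_view']
```

Only variables bound to instances of the wanted types are picked up, which is why each object needs its own top-level variable and why unrelated values such as `not_a_feast_object` are ignored.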
|
||
|
||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,2 +1,54 @@ | ||
# Deploy a feature store | ||
|
||
After creating a Feature Repository, we use the Feast CLI to create all the infrastructure required to serve the features we defined there. | |
|
||
{% hint style="info" %} | ||
Here we'll be using the example repository we created in the previous guide, [Create a feature repository](create-a-feature-repository.md). You can re-create it by running `feast init` in a new directory. | |
{% endhint %} | ||
|
||
## Deploying | ||
|
||
To have Feast create all infrastructure, you can just run `feast apply` in the Feature Repository directory. It should be a pretty straightforward process: | ||
|
||
``` | ||
$ feast apply | |
Processing example.py as example | ||
Done! | ||
``` | ||
|
||
Depending on whether the Feature Repository is configured to use the Local provider or one of the cloud providers like GCP or AWS, it may take from a couple of seconds to a minute. | ||
|
||
## What happens during `feast apply` | ||
|
||
#### 1. Scan the Feature Repository | ||
|
||
Feast will scan Python files in your Feature Repository, and find all Feast object definitions, such as Feature Views, Entities, and Data Sources. | ||
|
||
#### 2. Update metadata | ||
|
||
If all definitions look valid, Feast will sync the metadata about Feast objects to the Metadata Store. The Metadata Store is a small database that stores most of the same information you have in the Feature Repository, plus some state, in a more structured form. It is needed mainly because the production feature serving infrastructure won't be able to access Python files in the Feature Repository at run time, but it will be able to efficiently and securely read the feature definitions from the Metadata Store. | |
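To make the role of the Metadata Store more concrete, here is a minimal sketch under stated assumptions. It is not Feast's real schema or code: the `objects` table, the JSON-serialized `spec` column, and the `sync_metadata` and `read_feature_view` helpers are all hypothetical, and SQLite is used only because it ships with Python.

```python
import json
import sqlite3


def sync_metadata(conn, project, definitions):
    """Upsert one row per object definition (hypothetical schema)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS objects ("
        "project TEXT, kind TEXT, name TEXT, spec TEXT, "
        "PRIMARY KEY (project, kind, name))"
    )
    with conn:  # commit all rows together, or roll back on error
        for kind, name, spec in definitions:
            conn.execute(
                "INSERT OR REPLACE INTO objects VALUES (?, ?, ?, ?)",
                (project, kind, name, json.dumps(spec, sort_keys=True)),
            )


def read_feature_view(conn, project, name):
    """Look up a definition the way a serving process might, without any Python files."""
    row = conn.execute(
        "SELECT spec FROM objects "
        "WHERE project = ? AND kind = 'feature_view' AND name = ?",
        (project, name),
    ).fetchone()
    return json.loads(row[0]) if row else None


# In-memory database for illustration; a local store would be a file on disk.
conn = sqlite3.connect(":memory:")
sync_metadata(
    conn,
    "robust_tortoise",
    [
        ("entity", "driver_id", {}),
        ("feature_view", "driver_hourly_stats", {"entities": ["driver_id"]}),
    ],
)
spec = read_feature_view(conn, "robust_tortoise", "driver_hourly_stats")
print(spec)  # {'entities': ['driver_id']}
```

The point of the sketch is the separation of concerns: definitions are written once at apply time and can then be read by a process that never imports the repository's Python modules.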
|
||
#### 3. Create cloud infrastructure | ||
|
||
At this step, the Feast CLI will create all the infrastructure needed for feature serving and materialization to work. What exactly gets created depends on which provider is configured in `feature_store.yaml` in the Feature Repository. | |
|
||
For example, for the Local provider, it is as easy as creating a file on your local filesystem to act as a key-value store from which feature data is served. The Local provider is best suited to local testing; no real production serving happens there. | |
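As a rough illustration of what "a file as a key-value store" can mean, the sketch below creates one table per Feature View in a local SQLite file. The table layout, the `entity_key` format, and the `create_online_store` helper are assumptions made for this example, not a description of what Feast actually writes to disk.

```python
import sqlite3
import tempfile
from pathlib import Path


def create_online_store(path, feature_view_names):
    """Create one key-value table per Feature View in a local SQLite file (hypothetical layout)."""
    conn = sqlite3.connect(path)
    with conn:
        for name in feature_view_names:
            # Table names cannot be bound as parameters, so the identifier is quoted.
            conn.execute(
                f'CREATE TABLE IF NOT EXISTS "{name}" '
                "(entity_key TEXT PRIMARY KEY, value BLOB)"
            )
    return conn


with tempfile.TemporaryDirectory() as tmp:
    db_path = Path(tmp) / "online_store.db"
    conn = create_online_store(db_path, ["driver_hourly_stats"])
    # Write one serialized value and read it back by key.
    conn.execute(
        'INSERT OR REPLACE INTO "driver_hourly_stats" VALUES (?, ?)',
        ("driver_id=1001", b"serialized feature values"),
    )
    value = conn.execute(
        'SELECT value FROM "driver_hourly_stats" WHERE entity_key = ?',
        ("driver_id=1001",),
    ).fetchone()[0]
    file_created = db_path.exists()
    conn.close()
```

Because the whole store is a single file, nothing needs to be provisioned, which is what makes this kind of setup convenient for testing and unsuitable for production serving.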
|
||
A more interesting configuration is when we've configured Feast to use the GCP provider and Cloud Datastore to store feature data. When you run `feast apply`, Feast will make sure you have valid credentials and create some metadata objects in Datastore for each Feature View. | |
|
||
Similarly, when using AWS, Feast will make sure that resources like DynamoDB tables are created for every Feature View. | ||
|
||
{% hint style="warning" %} | ||
Since `feast apply` \(when configured to use a non-Local provider\) will create cloud infrastructure in your AWS or GCP account, it may incur some costs on your cloud bill. While we aim to design Feast so that its cloud resources don't cost much when not serving features, preferring "serverless" cloud services that bill per request, please refer to the specific Provider documentation to make sure there are no surprises. | |
{% endhint %} | ||
|
||
## Cleaning up | ||
|
||
If you no longer need the infrastructure, you can run `feast destroy` to clean up. **Note that this will irrevocably delete all data in the online store, so use it with care.** | ||
|
||
|
||
|
||
|
||
|
||
|