Allow Feast apply to import files recursively (and add .feastignore) #1482

Merged
merged 6 commits into from
Apr 27, 2021
1 change: 1 addition & 0 deletions docs/SUMMARY.md
Expand Up @@ -24,6 +24,7 @@
## Reference

* [feature\_store.yaml](reference/feature-store-yaml.md)
* [.feastignore](reference/feast-ignore.md)
* [Python API reference](http://rtd.feast.dev/)

## Feast on Kubernetes
39 changes: 31 additions & 8 deletions docs/concepts/feature-repository.md
Expand Up @@ -10,6 +10,7 @@ A feature repository consists of:

* A collection of Python files containing feature declarations.
* A `feature_store.yaml` file containing infrastructural configuration.
* A `.feastignore` file containing paths in the feature repository to ignore.

{% hint style="info" %}
Typically, users store their feature repositories in a Git repository, especially when working in teams. However, using Git is not a requirement.
Expand All @@ -19,27 +20,28 @@ Typically, users store their feature repositories in a Git repository, especiall

The structure of a feature repository is as follows:

* The root of the repository should contain a `feature_store.yaml` file.
* The root of the repository should contain a `feature_store.yaml` file and may contain a `.feastignore` file.
* The repository should contain Python files that contain feature definitions.
* The repository can contain other files as well, including documentation and potentially data files.

An example structure of a feature repository is shown below:

```text
$ tree
$ tree -a
.
├── data
│   └── driver_stats.parquet
├── driver_features.py
└── feature_store.yaml
├── feature_store.yaml
└── .feastignore

1 directory, 3 files
1 directory, 4 files
```

A couple of things to note about the feature repository:

* Feast does not currently read through subdirectories of the feature repository when running commands. All feature definition files must reside at the root of the repository.
* Feast reads _all_ Python files when `feast apply` is run, even if they don't contain feature definitions. For this reason, it's recommended to store imperative scripts outside the feature registry.
* Feast reads _all_ Python files recursively when `feast apply` is run, including those in subdirectories, even if they don't contain feature definitions.
* If you need to store imperative scripts inside the feature registry, it's recommended to add a `.feastignore` file that lists the paths to those scripts.
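
To illustrate the recursive behavior described above, here is a minimal sketch of how Python files in subdirectories get picked up by a recursive glob. The helper name `list_python_files` and the file names are hypothetical and used only for illustration; this is not Feast's actual entry point.

```python
import tempfile
from pathlib import Path
from typing import List


def list_python_files(repo_root: Path) -> List[Path]:
    """Return every .py file under repo_root (recursively), in sorted order."""
    return sorted(p.resolve() for p in repo_root.glob("**/*.py") if p.is_file())


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "scripts").mkdir()
    # A feature definition file at the root and a script in a subdirectory
    (root / "driver_features.py").write_text("# feature definitions\n")
    (root / "scripts" / "backfill.py").write_text("# imperative script\n")
    # Non-Python files are never collected
    (root / "feature_store.yaml").write_text("project: demo\n")

    found = [p.relative_to(root.resolve()).as_posix() for p in list_python_files(root)]
    print(found)
```

Both Python files are collected, including the script under `scripts/`, which is why imperative scripts kept inside the repository should be listed in `.feastignore`.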

## The feature\_store.yaml configuration file

Expand All @@ -57,6 +59,28 @@ online_store:

The `feature_store.yaml` file configures how the feature store should run. See [feature\_store.yaml](../reference/feature-store-yaml.md) for more details.

## The .feastignore file

This file contains paths that should be ignored when running `feast apply`. An example `.feastignore` is shown below:

{% code title=".feastignore" %}
```
# Ignore virtual environment
venv

# Ignore a specific Python file
scripts/foo.py

# Ignore all Python files directly under scripts directory
scripts/*.py

# Ignore all "foo.py" anywhere under scripts directory
scripts/**/foo.py
```
{% endcode %}

See [.feastignore](../reference/feast-ignore.md) for more details.
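
As a rough sketch of how such a file can be parsed, the snippet below drops comments and blank lines and keeps the remaining patterns. The function name `parse_feastignore` is hypothetical, and the rule that everything after the first `#` on a line is a comment is an assumption made for this example.

```python
from typing import List


def parse_feastignore(text: str) -> List[str]:
    """Return the ignore patterns in `text`, dropping comments and blank lines."""
    patterns = []
    for line in text.splitlines():
        # Assumption: everything after the first "#" is a comment
        line = line.split("#", 1)[0].strip()
        if line:
            patterns.append(line)
    return patterns


sample = """
# Ignore virtual environment
venv

scripts/*.py  # trailing comments are stripped too
"""
patterns = parse_feastignore(sample)
print(patterns)
```

Under this rule, a path that itself contains `#` would be cut short, so such paths cannot be expressed in this sketch.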

## Feature definitions

A feature repository can also contain one or more Python files that contain feature definitions. An example feature definition file is shown below:
Expand Down Expand Up @@ -97,5 +121,4 @@ To declare new feature definitions, just add code to the feature repository, eit
### Next steps

* See [Create a feature repository](../how-to-guides/create-a-feature-repository.md) to get started with an example feature repository.
* See [feature\_store.yaml](../reference/feature-store-yaml.md) or [Feature Views](feature-views.md) for more information on the configuration files that live in a feature registry.

* See [feature\_store.yaml](../reference/feature-store-yaml.md), [.feastignore](../reference/feast-ignore.md) or [Feature Views](feature-views.md) for more information on the configuration files that live in a feature registry.
32 changes: 32 additions & 0 deletions docs/reference/feast-ignore.md
@@ -0,0 +1,32 @@
# .feastignore

## Overview

`.feastignore` is a file that is placed at the root of the [Feature Repository](../concepts/feature-repository.md). This file contains paths that should be ignored when running `feast apply`. An example `.feastignore` is shown below:

{% code title=".feastignore" %}
```
# Ignore virtual environment
venv

# Ignore a specific Python file
scripts/foo.py

# Ignore all Python files directly under scripts directory
scripts/*.py

# Ignore all "foo.py" anywhere under scripts directory
scripts/**/foo.py
```
{% endcode %}

The `.feastignore` file is optional. If the file cannot be found, every Python file in the feature repo directory will be parsed by `feast apply`.

## Feast Ignore Patterns

| Pattern | Example matches | Explanation |
| ----------------- | -------------------------------------------------------- | -------------------------------------------------------------------------------------------------------------------- |
| venv | venv/foo.py<br>venv/a/foo.py | You can specify a path to a specific directory. Everything in that directory will be ignored. |
| scripts/foo.py | scripts/foo.py | You can specify a path to a specific file. Only that file will be ignored. |
| scripts/*.py | scripts/foo.py<br>scripts/bar.py | You can specify asterisk (*) anywhere in the expression. An asterisk matches zero or more characters, except "/". |
| scripts/**/foo.py | scripts/foo.py<br>scripts/a/foo.py<br>scripts/a/b/foo.py | You can specify double asterisk (**) anywhere in the expression. A double asterisk matches zero or more directories. |
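
The patterns in the table can be tried out with Python's `pathlib` globbing, which treats `*` and `**` the same way as described above. The sketch below is an illustration under that assumption; `expand_pattern` and the directory layout are hypothetical names chosen to mirror the table's examples.

```python
import tempfile
from pathlib import Path
from typing import List


def expand_pattern(repo_root: Path, pattern: str) -> List[str]:
    """Return the .py files (relative, POSIX-style) that `pattern` would ignore."""
    matched = set()
    for path in repo_root.glob(pattern):
        if path.is_file():
            matched.add(path)
        else:
            # A matched directory ignores every Python file beneath it
            matched.update(p for p in path.glob("**/*.py") if p.is_file())
    return sorted(p.relative_to(repo_root).as_posix() for p in matched)


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # Build the example tree used in the table
    for rel in ["venv/foo.py", "venv/a/foo.py", "scripts/foo.py",
                "scripts/bar.py", "scripts/a/foo.py", "scripts/a/b/foo.py"]:
        target = root / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text("")

    results = {
        pattern: expand_pattern(root, pattern)
        for pattern in ["venv", "scripts/foo.py", "scripts/*.py", "scripts/**/foo.py"]
    }
    for pattern, files in results.items():
        print(f"{pattern:20} -> {files}")
```

Note that `scripts/*.py` does not reach into `scripts/a/`, while `scripts/**/foo.py` matches `foo.py` at any depth under `scripts`, including directly inside it.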
61 changes: 55 additions & 6 deletions sdk/python/feast/repo_operations.py
Expand Up @@ -5,7 +5,7 @@
from datetime import timedelta
from importlib.abc import Loader
from pathlib import Path
from typing import List, NamedTuple, Union
from typing import List, NamedTuple, Set, Union

import click

Expand All @@ -31,15 +31,64 @@ class ParsedRepo(NamedTuple):
    entities: List[Entity]


def read_feastignore(repo_root: Path) -> List[str]:
    """Read .feastignore in the repo root directory (if exists) and return the list of user-defined ignore paths"""
    feast_ignore = repo_root / ".feastignore"
    if not feast_ignore.is_file():
        return []
    lines = feast_ignore.read_text().strip().split("\n")
    ignore_paths = []
    for line in lines:
        # Remove everything after the first occurrence of "#" symbol (comments)
        if line.find("#") >= 0:
            line = line[: line.find("#")]
        # Strip leading or trailing whitespace
        line = line.strip()
        # Add this processed line to ignore_paths if it's not empty
        if len(line) > 0:
            ignore_paths.append(line)
    return ignore_paths


def get_ignore_files(repo_root: Path, ignore_paths: List[str]) -> Set[Path]:
    """Get all ignore files that match any of the user-defined ignore paths"""
    ignore_files = set()
    for ignore_path in ignore_paths:
        # ignore_path may contain matchers (* or **). Use glob() to match user-defined path to actual paths
        for matched_path in repo_root.glob(ignore_path):
            if matched_path.is_file():
                # If the matched path is a file, add that to ignore_files set
                ignore_files.add(matched_path.resolve())
            else:
                # Otherwise, list all Python files in that directory and add all of them to ignore_files set
                ignore_files |= {
                    sub_path.resolve()
                    for sub_path in matched_path.glob("**/*.py")
                    if sub_path.is_file()
                }
    return ignore_files


def get_repo_files(repo_root: Path) -> List[Path]:
    """Get the list of all repo files, ignoring undesired files & directories specified in .feastignore"""
    # Read ignore paths from .feastignore and create a set of all files that match any of these paths
    ignore_paths = read_feastignore(repo_root)
    ignore_files = get_ignore_files(repo_root, ignore_paths)

    # List all Python files in the root directory (recursively)
    repo_files = {p.resolve() for p in repo_root.glob("**/*.py") if p.is_file()}
    # Ignore all files that match any of the ignore paths in .feastignore
    repo_files -= ignore_files

    # Sort repo_files to read them in the same order every time
    return sorted(repo_files)


def parse_repo(repo_root: Path) -> ParsedRepo:
    """ Collect feature table definitions from feature repo """
    res = ParsedRepo(feature_tables=[], entities=[], feature_views=[])

    # FIXME: process subdirs but exclude hidden ones
    repo_files = [p.resolve() for p in repo_root.glob("*.py")]

    for repo_file in repo_files:

    for repo_file in get_repo_files(repo_root):
        module_path = py_path_to_module(repo_file, repo_root)
        module = importlib.import_module(module_path)

3 changes: 2 additions & 1 deletion sdk/python/setup.py
Expand Up @@ -88,7 +88,8 @@
"adlfs==0.5.9",
"firebase-admin==4.5.2",
"google-cloud-datastore==2.1.0",
"pre-commit"
"pre-commit",
"assertpy==1.1",
]

# README file from Feast repo root directory
46 changes: 26 additions & 20 deletions sdk/python/tests/test_cli_local.py
Expand Up @@ -3,6 +3,8 @@
from pathlib import Path
from textwrap import dedent

import assertpy

from feast.feature_store import FeatureStore
from tests.cli_utils import CliRunner
from tests.online_read_write_test import basic_rw_test
Expand Down Expand Up @@ -39,39 +41,39 @@ def test_workflow() -> None:
)

result = runner.run(["apply"], cwd=repo_path)
assert result.returncode == 0
assertpy.assert_that(result.returncode).is_equal_to(0)

# entity & feature view list commands should succeed
result = runner.run(["entities", "list"], cwd=repo_path)
assert result.returncode == 0
assertpy.assert_that(result.returncode).is_equal_to(0)
result = runner.run(["feature-views", "list"], cwd=repo_path)
assert result.returncode == 0
assertpy.assert_that(result.returncode).is_equal_to(0)

# entity & feature view describe commands should succeed when objects exist
result = runner.run(["entities", "describe", "driver"], cwd=repo_path)
assert result.returncode == 0
assertpy.assert_that(result.returncode).is_equal_to(0)
result = runner.run(
["feature-views", "describe", "driver_locations"], cwd=repo_path
)
assert result.returncode == 0
assertpy.assert_that(result.returncode).is_equal_to(0)

# entity & feature view describe commands should fail when objects don't exist
result = runner.run(["entities", "describe", "foo"], cwd=repo_path)
assert result.returncode == 1
assertpy.assert_that(result.returncode).is_equal_to(1)
result = runner.run(["feature-views", "describe", "foo"], cwd=repo_path)
assert result.returncode == 1
assertpy.assert_that(result.returncode).is_equal_to(1)

# Doing another apply should be a no op, and should not cause errors
result = runner.run(["apply"], cwd=repo_path)
assert result.returncode == 0
assertpy.assert_that(result.returncode).is_equal_to(0)

basic_rw_test(
FeatureStore(repo_path=str(repo_path), config=None),
view_name="driver_locations",
)

result = runner.run(["teardown"], cwd=repo_path)
assert result.returncode == 0
assertpy.assert_that(result.returncode).is_equal_to(0)


def test_non_local_feature_repo() -> None:
Expand Down Expand Up @@ -104,13 +106,13 @@ def test_non_local_feature_repo() -> None:
)

result = runner.run(["apply"], cwd=repo_path)
assert result.returncode == 0
assertpy.assert_that(result.returncode).is_equal_to(0)

fs = FeatureStore(repo_path=str(repo_path))
assert len(fs.list_feature_views()) == 3
assertpy.assert_that(fs.list_feature_views()).is_length(3)

result = runner.run(["teardown"], cwd=repo_path)
assert result.returncode == 0
assertpy.assert_that(result.returncode).is_equal_to(0)


@contextmanager
Expand Down Expand Up @@ -150,19 +152,23 @@ def test_3rd_party_providers() -> None:
    # Check with incorrect built-in provider name (no dots)
    with setup_third_party_provider_repo("feast123") as repo_path:
        return_code, output = runner.run_with_output(["apply"], cwd=repo_path)
        assert return_code == 1
        assert b"Provider 'feast123' is not implemented" in output
        assertpy.assert_that(return_code).is_equal_to(1)
        assertpy.assert_that(output).contains(b"Provider 'feast123' is not implemented")
    # Check with incorrect third-party provider name (with dots)
    with setup_third_party_provider_repo("feast_foo.provider") as repo_path:
        return_code, output = runner.run_with_output(["apply"], cwd=repo_path)
        assert return_code == 1
        assert b"Could not import provider module 'feast_foo'" in output
        assertpy.assert_that(return_code).is_equal_to(1)
        assertpy.assert_that(output).contains(
            b"Could not import provider module 'feast_foo'"
        )
    # Check with incorrect third-party provider name (with dots)
    with setup_third_party_provider_repo("foo.provider") as repo_path:
    with setup_third_party_provider_repo("foo.FooProvider") as repo_path:
        return_code, output = runner.run_with_output(["apply"], cwd=repo_path)
        assert return_code == 1
        assert b"Could not import provider 'provider' from module 'foo'" in output
        assertpy.assert_that(return_code).is_equal_to(1)
        assertpy.assert_that(output).contains(
            b"Could not import provider 'FooProvider' from module 'foo'"
        )
    # Check with correct third-party provider name
    with setup_third_party_provider_repo("foo.provider.FooProvider") as repo_path:
        return_code, output = runner.run_with_output(["apply"], cwd=repo_path)
        assert return_code == 0
        assertpy.assert_that(return_code).is_equal_to(0)