[Feature Store] Read optimisation on partitioned target #5465

tomerm-iguazio (Contributor) commented Apr 25, 2024

  1. Add support for additional_filters:
    a) In local_merger and dask_merger.
    b) In ParquetSource.
    c) In ParquetTarget.
    d) In get_offline_features and more.
  2. Add a warning for sources, targets, and mergers that do not support additional_filters.
  3. Add tests for the dask and local engines.
  4. Bump storey.

ML-6161
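
To make the description above concrete, here is a minimal, self-contained sketch of how a list of filter tuples could be applied, and how a reader that cannot apply them might warn instead. It assumes each tuple has the shape `(column, operator, value)`, similar to the filter convention used by parquet readers; the PR text only says "conditions as tuples". The names `row_matches`, `read_rows`, `supports_filters`, and the operator table are hypothetical and are not taken from mlrun.

```python
import operator
import warnings

# Assumed operator set; the PR does not list which operators are supported.
_OPS = {
    "=": operator.eq,
    "==": operator.eq,
    "!=": operator.ne,
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
    "in": lambda value, options: value in options,
    "not in": lambda value, options: value not in options,
}


def row_matches(row, additional_filters):
    """Return True if the row satisfies every (column, op, value) tuple."""
    return all(
        _OPS[op](row[column], value) for column, op, value in additional_filters
    )


def read_rows(rows, additional_filters=None, supports_filters=True):
    """Hypothetical reader: filter rows, or warn when filters are unsupported."""
    if not additional_filters:
        return list(rows)
    if not supports_filters:
        # Mirrors item 2 of the description: warn rather than fail.
        warnings.warn("additional_filters is not supported here; ignoring it")
        return list(rows)
    return [row for row in rows if row_matches(row, additional_filters)]


rows = [
    {"room": 1, "bad": 95},
    {"room": 2, "bad": 40},
    {"room": 3, "bad": 99},
]
filtered = read_rows(
    rows, additional_filters=[("room", "in", [1, 3]), ("bad", ">", 96)]
)
print(filtered)
```

In this sketch all tuples are combined with a logical AND. Whether the merged code warns and ignores the filters, or handles unsupported sources some other way, is not stated in this conversation.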

gtopper changed the title from "[FeatureStore] Read optimisation on partitioned target." to "[Feature Store] Read optimisation on partitioned target" on Apr 30, 2024
gtopper (Collaborator) left a comment

Looks good in principle but I do have some comments. See below.

Resolved review threads (outdated): mlrun/datastore/base.py (6), tests/system/feature_store/test_feature_store.py (9), mlrun/datastore/sources.py (1).

@@ -175,6 +176,14 @@ def get_offline_features(
By default, the filter executes on the timestamp_key of each feature set.
Note: the time filtering is performed on each feature set before the
merge process using start_time and end_time params.
:param additional_filters: (list of tuples, optional): List of additional_filters conditions as tuples.

Would be nice to follow the convention of how we specify type and optionality in the rest of the code base. I guess there isn't exactly one way that we do this, but better not introduce one more way. Docs should be consistent.
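
One possible way to act on this comment, sketched under assumptions: keep type and optionality in the signature's type hints and leave the `:param` line as a plain description, as the `as_df` docstring in this PR does. The function name `as_df_sketch` and the `(str, str, object)` tuple shape are hypothetical; this is not mlrun's actual signature, and the code base may prefer a different convention.

```python
from typing import List, Optional, Tuple


def as_df_sketch(
    end_time=None,
    additional_filters: Optional[List[Tuple[str, str, object]]] = None,
):
    """Hypothetical stand-in for a reader method, not mlrun's actual API.

    :param end_time:            filters out data after this time
    :param additional_filters:  List of additional filter conditions as tuples.
    """
    # Type and optionality live in the annotations above, so the docstring
    # does not need a "(list of tuples, optional)" prefix.
    return additional_filters or []
```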

@@ -550,6 +575,12 @@ def as_df(
:param end_time: filters out data after this time
:param time_column: Store timestamp_key will be used if None.
The results will be filtered by this column and start_time & end_time.
:param additional_filters: List of additional_filters conditions as tuples.

Suggested change:
- :param additional_filters: List of additional_filters conditions as tuples.
+ :param additional_filters: List of additional filter conditions as tuples.

gtopper (Collaborator) left a comment

LGTM otherwise.

assaf758 merged commit b3ab5c7 into mlrun:development on May 4, 2024
10 checks passed