feat: DuckDB historical retrieval without entity dataframe #6108
Vperiodt wants to merge 9 commits into feast-dev:master
Conversation
Signed-off-by: Vanshika Vanshika <vvanshik@redhat.com> rh-pre-commit.version: 2.3.2 rh-pre-commit.check-secrets: ENABLED
```python
    full_feature_names: bool = False,
    **kwargs,
) -> RetrievalJob:
    start_date: Optional[datetime] = kwargs.get("start_date", None)
```
Instead, keep these as explicit optional parameters in `get_historical_features`.
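A minimal sketch of the reviewer's suggestion, with the dates promoted from `**kwargs` to explicit optional parameters (the body is a placeholder; only the signature shape is the point):

```python
from datetime import datetime
from typing import Optional


def get_historical_features(
    full_feature_names: bool = False,
    start_date: Optional[datetime] = None,  # explicit, instead of kwargs.get("start_date")
    end_date: Optional[datetime] = None,
) -> dict:
    # Placeholder body: the real method would build a RetrievalJob from these bounds.
    return {"start_date": start_date, "end_date": end_date}
```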
```python
DEFAULT_ENTITY_DF_EVENT_TIMESTAMP_COL = "event_timestamp"


def _build_entity_df_from_sources(
```
Import this instead: `from feast.infra.offline_stores.offline_utils import DEFAULT_ENTITY_DF_EVENT_TIMESTAMP_COL`.
@Vperiodt There is an integration test, `tests/integration/offline_store/test_non_entity_mode.py`; see if DuckDB coverage can be included there.
```python
    start_date = end_date - timedelta(seconds=max_ttl_seconds)
else:
    start_date = end_date - timedelta(days=30)
start_date = make_tzaware(start_date)
```
Lines 229-244 are common across the offline stores for computing `start_date` and `end_date`. If it's not too much work, is it possible to create a utility function that can be reused?
Yes, I'll refactor this into a reusable utility function and apply it to similar date-range calculations across the codebase in a follow-up PR.
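One possible shape for that shared utility (the name `resolve_date_range` and its defaults are hypothetical; `make_tzaware` is inlined here as a simplified stand-in for Feast's helper):

```python
from datetime import datetime, timedelta, timezone
from typing import Optional, Tuple


def make_tzaware(dt: datetime) -> datetime:
    # Simplified stand-in for feast.utils.make_tzaware.
    return dt if dt.tzinfo else dt.replace(tzinfo=timezone.utc)


def resolve_date_range(
    end_date: Optional[datetime] = None,
    max_ttl_seconds: Optional[int] = None,
    default_lookback_days: int = 30,
) -> Tuple[datetime, datetime]:
    # Consolidates the shared start/end date logic: fall back to "now" for the
    # end, then look back by the max feature-view TTL, or 30 days if no TTL.
    end_date = make_tzaware(end_date or datetime.now(timezone.utc))
    if max_ttl_seconds:
        start_date = end_date - timedelta(seconds=max_ttl_seconds)
    else:
        start_date = end_date - timedelta(days=default_lookback_days)
    return make_tzaware(start_date), end_date
```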
```python
"duckdb": (
    "local",
    importlib.import_module(
        "tests.universal.feature_repos.duckdb_repo_configuration"
    ).DuckDBDataSourceCreator,
),
```
🟡 Unconditional eager import of DuckDB config breaks test loading when ibis is not installed
The `duckdb` entry in `OFFLINE_STORE_TO_PROVIDER_CONFIG` uses `importlib.import_module("tests.universal.feature_repos.duckdb_repo_configuration")` at module level. This transitively imports `feast.infra.offline_stores.duckdb` (via `duckdb_repo_configuration.py:1`), which has `import ibis` at the top (`sdk/python/feast/infra/offline_stores/duckdb.py:6`). If ibis is not installed, loading `repo_configuration.py` will fail with `ModuleNotFoundError`, breaking all tests that import from this module, not just the DuckDB-specific ones. This breaks the established pattern where non-core offline stores (BigQuery, Redshift, Snowflake at `sdk/python/tests/universal/feature_repos/repo_configuration.py:116-137`) are imported conditionally behind environment variable checks.
Prompt for agents
In sdk/python/tests/universal/feature_repos/repo_configuration.py, the duckdb entry in OFFLINE_STORE_TO_PROVIDER_CONFIG (lines 94-99) should be moved out of the unconditional dict literal and added conditionally, similar to how BigQuery/Redshift/Snowflake are added at lines 135-137. One approach is to wrap it in a try/except ImportError block or check for ibis availability:
```python
try:
    from tests.universal.feature_repos.duckdb_repo_configuration import (
        DuckDBDataSourceCreator,
    )

    OFFLINE_STORE_TO_PROVIDER_CONFIG["duckdb"] = ("local", DuckDBDataSourceCreator)
except ImportError:
    pass
```
This should be placed after the initial OFFLINE_STORE_TO_PROVIDER_CONFIG dict definition (after line 94 in the original, which only has the "file" entry).
What this PR does / why we need it:
Adds date-range historical retrieval for the DuckDB offline store when `entity_df` is omitted.
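A sketch of how the new retrieval mode might be invoked. `FakeStore` below is a stub standing in for `feast.FeatureStore`, and the feature reference is a placeholder; only the call shape (no `entity_df`, dates passed instead) comes from this PR:

```python
from datetime import datetime, timezone


class FakeStore:
    # Stub standing in for feast.FeatureStore, to illustrate the call shape only.
    def get_historical_features(self, entity_df=None, features=(), **kwargs):
        assert entity_df is None  # date-range mode: no entity dataframe supplied
        return {"features": list(features), **kwargs}


store = FakeStore()
job = store.get_historical_features(
    entity_df=None,
    features=["driver_stats:conv_rate"],  # placeholder feature reference
    start_date=datetime(2024, 1, 1, tzinfo=timezone.utc),
    end_date=datetime(2024, 1, 31, tzinfo=timezone.utc),
)
```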
Which issue(s) this PR fixes:
Fixes #5832. Related to #1611.
Misc