
feat: DuckDB historical retrieval without entity dataframe#6108

Open
Vperiodt wants to merge 9 commits into feast-dev:master from Vperiodt:patch-dataframe

Conversation

Contributor

@Vperiodt Vperiodt commented Mar 14, 2026

What this PR does / why we need it:

Adds date-range historical retrieval for the DuckDB offline store when entity_df is omitted.

Which issue(s) this PR fixes:

Fixes #5832; related to #1611.

Misc



Vperiodt and others added 4 commits March 15, 2026 01:57
Signed-off-by: Vanshika Vanshika <vvanshik@redhat.com>

rh-pre-commit.version: 2.3.2
rh-pre-commit.check-secrets: ENABLED
@Vperiodt Vperiodt marked this pull request as ready for review March 15, 2026 18:17
@Vperiodt Vperiodt requested a review from a team as a code owner March 15, 2026 18:17
devin-ai-integration[bot]

This comment was marked as resolved.

Vperiodt and others added 2 commits March 16, 2026 18:33
devin-ai-integration[bot]

This comment was marked as resolved.

    full_feature_names: bool = False,
    **kwargs,
) -> RetrievalJob:
    start_date: Optional[datetime] = kwargs.get("start_date", None)
Member

Instead, keep these as explicit optional parameters of get_historical_features rather than reading them from **kwargs.
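A minimal sketch of what the reviewer is asking for, using a simplified, hypothetical stand-in for get_historical_features (the real Feast method takes more arguments and returns a RetrievalJob): start_date and end_date become named optional parameters. The check that rejects entity_df combined with a date range is an assumption made for this sketch, not confirmed PR behavior.

```python
from datetime import datetime, timezone
from typing import Any, Dict, List, Optional


def get_historical_features(
    features: List[str],
    entity_df: Optional[Any] = None,
    full_feature_names: bool = False,
    start_date: Optional[datetime] = None,
    end_date: Optional[datetime] = None,
) -> Dict[str, Any]:
    """Hypothetical signature: the date-range bounds are explicit, optional
    keyword parameters instead of values read out of **kwargs."""
    has_range = start_date is not None or end_date is not None
    if entity_df is not None and has_range:
        # Assumption for this sketch: the two retrieval modes are exclusive.
        raise ValueError("Pass either entity_df or start_date/end_date, not both")
    mode = "entity_df" if entity_df is not None else "date_range"
    # A real implementation would build a RetrievalJob; this sketch only
    # reports which mode was selected and the bounds it received.
    return {"mode": mode, "start_date": start_date, "end_date": end_date}
```

Because the function has no **kwargs, a misspelled keyword such as strat_date raises TypeError at the call site instead of being silently ignored, and type checkers can see the datetime annotations.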

DEFAULT_ENTITY_DF_EVENT_TIMESTAMP_COL = "event_timestamp"


def _build_entity_df_from_sources(
Member

Add a docstring for this function.

)


DEFAULT_ENTITY_DF_EVENT_TIMESTAMP_COL = "event_timestamp"
Member

Import this instead of redefining it:

from feast.infra.offline_stores.offline_utils import DEFAULT_ENTITY_DF_EVENT_TIMESTAMP_COL

@ntkathole
Member

@Vperiodt There is an integration test, tests/integration/offline_store/test_non_entity_mode.py; see whether DuckDB coverage can be included there.

        start_date = end_date - timedelta(seconds=max_ttl_seconds)
    else:
        start_date = end_date - timedelta(days=30)
    start_date = make_tzaware(start_date)
Contributor

Lines 229-244 are common across the offline stores for computing start_date and end_date. If it's not too much work, could you extract them into a reusable utility function?
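One possible shape for the shared helper suggested above, a sketch assuming the defaults visible in the reviewed snippet (look back by max_ttl_seconds when it is set, otherwise 30 days) and assuming that an omitted end_date means "now". resolve_date_range is a hypothetical name, and _make_tzaware is a local stand-in for Feast's make_tzaware so the sketch runs without Feast installed.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional, Tuple


def _make_tzaware(t: datetime) -> datetime:
    """Stand-in for Feast's make_tzaware: treat naive datetimes as UTC."""
    return t if t.tzinfo is not None else t.replace(tzinfo=timezone.utc)


def resolve_date_range(
    start_date: Optional[datetime],
    end_date: Optional[datetime],
    max_ttl_seconds: Optional[int],
    default_lookback: timedelta = timedelta(days=30),
) -> Tuple[datetime, datetime]:
    """Hypothetical shared helper returning tz-aware (start_date, end_date)."""
    if end_date is None:
        # Assumption: an omitted end_date means "now".
        end_date = datetime.now(tz=timezone.utc)
    end_date = _make_tzaware(end_date)
    if start_date is None:
        # Mirrors the reviewed snippet: look back by max_ttl_seconds when it
        # is set (non-zero), otherwise by a fixed 30-day window.
        if max_ttl_seconds:
            start_date = end_date - timedelta(seconds=max_ttl_seconds)
        else:
            start_date = end_date - default_lookback
    start_date = _make_tzaware(start_date)
    if start_date > end_date:
        raise ValueError("start_date must not be after end_date")
    return start_date, end_date
```

Each offline store could then call resolve_date_range once instead of repeating the branching, and the 30-day fallback lives in a single default argument.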

Contributor Author

Yes, I’ll refactor this into a reusable utility function and apply it to similar date-range calculations across the codebase in a follow-up PR.

Vperiodt and others added 2 commits March 24, 2026 10:44
Contributor

@devin-ai-integration devin-ai-integration bot left a comment

Devin Review found 1 new potential issue.

Comment on lines +94 to +99
    "duckdb": (
        "local",
        importlib.import_module(
            "tests.universal.feature_repos.duckdb_repo_configuration"
        ).DuckDBDataSourceCreator,
    ),
Contributor

🟡 Unconditional eager import of DuckDB config breaks test loading when ibis is not installed

The duckdb entry in OFFLINE_STORE_TO_PROVIDER_CONFIG uses importlib.import_module("tests.universal.feature_repos.duckdb_repo_configuration") at module level. This transitively imports feast.infra.offline_stores.duckdb (via duckdb_repo_configuration.py:1), which has import ibis at the top (sdk/python/feast/infra/offline_stores/duckdb.py:6). If ibis is not installed, loading repo_configuration.py will fail with ModuleNotFoundError, breaking all tests that import from this module — not just DuckDB-specific tests. This breaks the established pattern where non-core offline stores (BigQuery, Redshift, Snowflake at sdk/python/tests/universal/feature_repos/repo_configuration.py:116-137) are imported conditionally behind environment variable checks.

Prompt for agents
In sdk/python/tests/universal/feature_repos/repo_configuration.py, the duckdb entry in OFFLINE_STORE_TO_PROVIDER_CONFIG (lines 94-99) should be moved out of the unconditional dict literal and added conditionally, similar to how BigQuery/Redshift/Snowflake are added at lines 135-137. One approach is to wrap it in a try/except ImportError block or check for ibis availability:

try:
    from tests.universal.feature_repos.duckdb_repo_configuration import DuckDBDataSourceCreator
    OFFLINE_STORE_TO_PROVIDER_CONFIG["duckdb"] = ("local", DuckDBDataSourceCreator)
except ImportError:
    pass

This should be placed after the initial OFFLINE_STORE_TO_PROVIDER_CONFIG dict definition (after line 94 in the original, which only has the "file" entry).
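The prompt's second option, checking for ibis before registering the DuckDB entry, could look like the sketch below. register_if_available is a hypothetical helper, and the placeholder dict stands in for the real OFFLINE_STORE_TO_PROVIDER_CONFIG (object replaces FileDataSourceCreator so the sketch runs on its own).

```python
import importlib.util
from typing import Any, Callable, Dict, Tuple

# Placeholder for the real dict in repo_configuration.py, which starts with
# only the "file" entry.
OFFLINE_STORE_TO_PROVIDER_CONFIG: Dict[str, Tuple[str, Any]] = {
    "file": ("local", object),
}


def register_if_available(
    name: str,
    provider: str,
    required_module: str,
    load_creator: Callable[[], Any],
) -> bool:
    """Add an offline store entry only when its optional dependency exists.

    find_spec() locates the top-level module without importing it, and
    load_creator is only called after that check succeeds, so a missing
    dependency never raises while this module is being loaded.
    """
    if importlib.util.find_spec(required_module) is None:
        return False
    OFFLINE_STORE_TO_PROVIDER_CONFIG[name] = (provider, load_creator())
    return True
```

In repo_configuration.py this might be called as register_if_available("duckdb", "local", "ibis", lambda: importlib.import_module("tests.universal.feature_repos.duckdb_repo_configuration").DuckDBDataSourceCreator). Compared with try/except ImportError, the find_spec check does not swallow an ImportError caused by a genuine bug inside duckdb_repo_configuration.py, but it only guards the one dependency it names.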


Development

Successfully merging this pull request may close these issues.

DuckDB - historical retrieval without entity dataframe
