Is your feature request related to a problem? Please describe.
I'm investigating improvements to the DataHub Feast Integration https://datahubproject.io/docs/generated/ingestion/sources/feast/. To complete a data lineage graph, the ingestion plugin needs to be able to associate a feature view with the "fully qualified" name of its source table, i.e. something like `${db}.${schema}.${table}`.
Right now this only works by manually duplicating the database name from `feature_store.yaml` into the source definition, which feels, well... duplicative. Otherwise, the properties of the class are not fully reconstituted from the registry.
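To make the requirement concrete, here is a minimal sketch of the name the ingestion side is after: it joins the three parts into `database.schema.table` and refuses to guess when a part is missing. `SnowflakeTableRef` and `fully_qualified_name` are hypothetical names for illustration only; they are not part of Feast or DataHub, and the stand-in class merely mirrors the `database`/`schema`/`table` fields that appear in the `describe` output.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SnowflakeTableRef:
    """Hypothetical stand-in for the table-location fields of a SnowflakeSource."""
    table: str
    database: Optional[str] = None
    schema: Optional[str] = None


def fully_qualified_name(ref: SnowflakeTableRef) -> str:
    """Return "database.schema.table", or raise if any part is unknown."""
    missing = [part for part in ("database", "schema", "table") if not getattr(ref, part)]
    if missing:
        # Without every part the lineage target is ambiguous, so fail loudly
        raise ValueError(f"cannot build a fully qualified name; missing: {', '.join(missing)}")
    return f"{ref.database}.{ref.schema}.{ref.table}"


ref = SnowflakeTableRef(
    table="notable_hyena_feast_driver_hourly_stats",
    database="EXPERIMENTS_CHRIS",
    schema="PUBLIC",
)
print(fully_qualified_name(ref))
```

With the values from the `describe` output this prints `EXPERIMENTS_CHRIS.PUBLIC.notable_hyena_feast_driver_hourly_stats`; with `database` left unset it raises instead of emitting a partial name, which is exactly the situation the plugin currently runs into.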
(From the `feast init` Snowflake template:)
```python
import yaml
from feast import SnowflakeSource

project_name = yaml.safe_load(open("feature_store.yaml"))["project"]

driver_stats_source = SnowflakeSource(
    # The Snowflake table where features can be found
    database=yaml.safe_load(open("feature_store.yaml"))["offline_store"]["database"],
    table=f"{project_name}_feast_driver_hourly_stats",
    # The event timestamp is used for point-in-time joins and for ensuring only
    # features within the TTL are returned
    timestamp_field="event_timestamp",
    # The (optional) created timestamp is used to ensure there are no duplicate
    # feature rows in the offline store or when building training datasets
    created_timestamp_column="created",
)
```
This results in:
```
$ feast data-sources describe notable_hyena_feast_driver_hourly_stats
type: BATCH_SNOWFLAKE
timestampField: event_timestamp
createdTimestampColumn: created
snowflakeOptions:
  table: notable_hyena_feast_driver_hourly_stats
  schema: PUBLIC
  database: EXPERIMENTS_CHRIS
name: notable_hyena_feast_driver_hourly_stats
```
Whereas the same definition without `database`:
```python
driver_stats_source = SnowflakeSource(
    # The Snowflake table where features can be found
    # database=yaml.safe_load(open("feature_store.yaml"))["offline_store"]["database"],
    table=f"{project_name}_feast_driver_hourly_stats",
    # The event timestamp is used for point-in-time joins and for ensuring only
    # features within the TTL are returned
    timestamp_field="event_timestamp",
    # The (optional) created timestamp is used to ensure there are no duplicate
    # feature rows in the offline store or when building training datasets
    created_timestamp_column="created",
)
```
only gives a description without the `database` field. Note that even the `feast init` example just loads the database name from the YAML file.
Describe the solution you'd like
Since `get_historical_features` works in either case, I presume there is already a code path that fills in the missing values. I would like the `database` and `schema` properties of a `SnowflakeSource` to always return the same values that are used when `get_historical_features` is called.
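One possible shape for the requested behavior, sketched under assumptions: when a source leaves `database` or `schema` unset, fall back to the `offline_store` section of `feature_store.yaml`, and let values set explicitly on the source take precedence. `resolve_table_ref` and `SnowflakeTableRef` are hypothetical names, the config is passed as the plain dict that `yaml.safe_load` would return, and the `"PUBLIC"` fallback for `schema` is an assumption based on the `describe` output above, not confirmed Feast behavior.

```python
from dataclasses import dataclass, replace
from typing import Any, Mapping, Optional


@dataclass(frozen=True)
class SnowflakeTableRef:
    """Hypothetical stand-in for the table-location fields of a SnowflakeSource."""
    table: str
    database: Optional[str] = None
    schema: Optional[str] = None


def resolve_table_ref(ref: SnowflakeTableRef, repo_config: Mapping[str, Any]) -> SnowflakeTableRef:
    """Fill unset database/schema from the offline_store section of the repo config.

    `repo_config` is assumed to be the dict loaded from feature_store.yaml.
    The "PUBLIC" default for schema is an assumption for this sketch.
    """
    offline = repo_config.get("offline_store", {})
    return replace(
        ref,
        # Values set explicitly on the source win over repo-level defaults
        database=ref.database or offline.get("database"),
        schema=ref.schema or offline.get("schema", "PUBLIC"),
    )


config = {"offline_store": {"type": "snowflake.offline", "database": "EXPERIMENTS_CHRIS"}}
source = SnowflakeTableRef(table="notable_hyena_feast_driver_hourly_stats")
resolved = resolve_table_ref(source, config)
print(f"{resolved.database}.{resolved.schema}.{resolved.table}")
```

Returning a new, resolved object (rather than mutating the source) keeps the registry definition as the user wrote it while giving consumers such as the DataHub plugin a single place to read the effective values.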
Describe alternatives you've considered
The DataHub ingestion plugin could read `feature_store.yaml` and smoosh the strings together itself. That would work for this particular case, but I suspect it would be brittle in the long run.