Description
To achieve combined lineage tracking, we need data storage or data source details to be available in Presto query events.
Currently, Presto does not include these details in its query events. If the data source metadata were populated in these events, it could be mapped to OpenLineage input and output datasets, enabling effective lineage tracking.
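As a rough illustration of that mapping, the sketch below turns a data-source description into an OpenLineage-style dataset identifier (a namespace plus a name). `DataSourceInfo`, `LineageDataset`, `toDataset`, and the `presto://coordinator:8080` fallback namespace are all hypothetical names, not Presto or OpenLineage APIs. The namespace and name formats are assumptions loosely modeled on OpenLineage's naming convention and should be checked against its specification for each storage system.

```java
import java.net.URI;
import java.util.Optional;

public class LineageMapping {
    // Hypothetical stand-in for the data-source details this issue proposes to add
    // to Presto query events; not an actual Presto SPI type.
    record DataSourceInfo(String catalog, String schema, String table, Optional<String> location) {}

    // OpenLineage identifies each input/output dataset by a namespace and a name.
    record LineageDataset(String namespace, String name) {}

    // Prefer the physical storage location when one is reported, so that the same
    // table read through different connectors can resolve to the same dataset.
    // Otherwise fall back to a logical catalog.schema.table name under a
    // caller-supplied namespace (an assumption, e.g. the coordinator address).
    static LineageDataset toDataset(DataSourceInfo info, String fallbackNamespace) {
        if (info.location().isPresent()) {
            URI uri = URI.create(info.location().get());
            String authority = uri.getAuthority() == null ? "" : uri.getAuthority();
            return new LineageDataset(uri.getScheme() + "://" + authority, uri.getPath());
        }
        return new LineageDataset(fallbackNamespace,
                info.catalog() + "." + info.schema() + "." + info.table());
    }

    public static void main(String[] args) {
        DataSourceInfo hive = new DataSourceInfo("hive", "sales", "orders",
                Optional.of("hdfs://namenode:8020/warehouse/sales.db/orders"));
        DataSourceInfo jdbc = new DataSourceInfo("postgresql", "public", "users", Optional.empty());
        System.out.println(toDataset(hive, "presto://coordinator:8080"));
        System.out.println(toDataset(jdbc, "presto://coordinator:8080"));
    }
}
```

Keying on the storage location rather than the catalog name is one possible design choice for combined lineage; a JDBC source with no reported location simply keeps its logical name.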
Expected Behavior or Use Case
Presto Component, Service, or Connector
presto-base-jdbc, presto-hive, presto-iceberg
Possible Implementation
For the table or schema location used by the Hive and Lakehouse connectors, one potential place to store this information is QueryCompletedEvent->QueryIOMetadata->QueryInputMetadata.connectorInfo, which is populated by com.facebook.presto.metadata.Metadata#getInfo.
These changes would support data sources and storage systems accessed through the Iceberg, Hive, and JDBC connectors.
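A minimal sketch of how an event consumer might read a location back out of connectorInfo, assuming the connector's getInfo result were a `Map` carrying a `"location"` entry. That shape and key are purely hypothetical, since connectorInfo is connector-specific. `InputMetadata` is a simplified stand-in that loosely mirrors a few QueryInputMetadata fields, and `collectLocations` is an illustrative helper. In a real listener, the inputs would come from the completed event's QueryIOMetadata.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

public class ConnectorInfoLocations {
    // Hypothetical stand-in loosely mirroring QueryInputMetadata; the real SPI
    // class has more fields.
    record InputMetadata(String catalogName, String schema, String table, Optional<Object> connectorInfo) {}

    // Assumption: the connector reports a Map with a "location" entry. Anything
    // else (no info, another type, missing key) yields an empty result.
    static Optional<String> location(InputMetadata input) {
        return input.connectorInfo()
                .filter(info -> info instanceof Map<?, ?>)
                .map(info -> ((Map<?, ?>) info).get("location"))
                .filter(value -> value instanceof String)
                .map(value -> (String) value);
    }

    // Returns catalog.schema.table -> location for every input that reports one,
    // preserving the order of the inputs.
    static Map<String, String> collectLocations(List<InputMetadata> inputs) {
        Map<String, String> result = new LinkedHashMap<>();
        for (InputMetadata input : inputs) {
            location(input).ifPresent(loc ->
                    result.put(input.catalogName() + "." + input.schema() + "." + input.table(), loc));
        }
        return result;
    }

    public static void main(String[] args) {
        List<InputMetadata> inputs = List.of(
                new InputMetadata("hive", "sales", "orders",
                        Optional.of(Map.of("location", "s3://bucket/warehouse/sales.db/orders"))),
                new InputMetadata("postgresql", "public", "users", Optional.empty()));
        System.out.println(collectLocations(inputs));
    }
}
```

Because connectorInfo is untyped from the consumer's point of view, the defensive type checks keep inputs from connectors that do not report a location from breaking the listener.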