Populate the data storage location details in query events. #25123

@evanvdia

Description

To support combined lineage tracking, Presto query events need to carry data storage or data source details.
Presto does not currently include these details in its query events. If the data source metadata were populated in these events, it could be mapped to OpenLineage input and output datasets, which would enable lineage tracking.

Expected Behavior or Use Case

Presto Component, Service, or Connector

presto-base-jdbc, presto-hive, presto-iceberg

Possible Implementation

For the table and schema location in the Hive and lakehouse connectors, one potential place to store this information is
QueryCompletedEvent -> QueryIOMetadata -> QueryInputMetadata.connectorInfo, which is populated by com.facebook.presto.metadata.Metadata#getInfo.

These changes would cover data sources and storage systems accessed via the Iceberg, Hive, and JDBC connectors.
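
As a rough illustration of the mapping described above, here is a minimal sketch of how a consumer of query events might turn a storage location into an OpenLineage-style dataset identifier (namespace plus name). It does not use the Presto SPI: `InputMetadata` and `Dataset` are hypothetical stand-in types, the assumption that `connectorInfo` holds a location string such as `s3://warehouse/sales/orders` is mine, and the `presto://` fallback namespace is invented for illustration only.

```java
import java.net.URI;
import java.util.Optional;

public class LineageSketch {
    /** Hypothetical stand-in for the per-table input metadata of a query event. */
    public static final class InputMetadata {
        final String catalog;
        final String schema;
        final String table;
        final Optional<Object> connectorInfo; // assumed to hold a location string

        public InputMetadata(String catalog, String schema, String table,
                             Optional<Object> connectorInfo) {
            this.catalog = catalog;
            this.schema = schema;
            this.table = table;
            this.connectorInfo = connectorInfo;
        }
    }

    /** Hypothetical OpenLineage-style dataset identifier: a namespace plus a name. */
    public static final class Dataset {
        public final String namespace;
        public final String name;

        public Dataset(String namespace, String name) {
            this.namespace = namespace;
            this.name = name;
        }
    }

    /**
     * Uses the storage location when it parses as a hierarchical URI
     * (scheme://authority/path); otherwise falls back to catalog.schema.table.
     */
    public static Dataset toDataset(InputMetadata input) {
        Optional<String> location = input.connectorInfo
                .filter(String.class::isInstance)
                .map(String.class::cast);
        if (location.isPresent()) {
            try {
                URI uri = URI.create(location.get());
                if (uri.getScheme() != null && uri.getAuthority() != null) {
                    String path = uri.getPath() == null ? "" : uri.getPath();
                    String name = path.startsWith("/") ? path.substring(1) : path;
                    return new Dataset(uri.getScheme() + "://" + uri.getAuthority(), name);
                }
            } catch (IllegalArgumentException e) {
                // Not a parseable URI: fall through to the catalog-based identifier.
            }
        }
        // Illustrative fallback only; "presto://" is not an established namespace scheme.
        return new Dataset("presto://" + input.catalog, input.schema + "." + input.table);
    }

    public static void main(String[] args) {
        Dataset d = toDataset(new InputMetadata("hive", "sales", "orders",
                Optional.of("s3://warehouse/sales/orders")));
        System.out.println(d.namespace + " " + d.name);
    }
}
```

When no location is available, or the value is an opaque string such as a JDBC URL, the sketch falls back to the catalog, schema, and table names, so a consumer still gets a stable identifier. A real implementation would need to settle the actual shape of `connectorInfo` for each connector.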

Example Screenshots (if appropriate):

Context
