Is your feature request related to a problem? Please describe.
In 0.8 we added lineage APIs, and we currently support lineage extraction from several sources:
Airflow
BigQuery
Redshift
Snowflake
MSSQL
Vertica
View Level Lineage Capture
Metabase
Superset
Tableau
This list continues to grow with each release.
We should also allow capturing lineage at the column level, both through the APIs and from the sources above, so that the UI lineage editor can build column-level lineage as well.
Task breakdown
API Support
Currently, OpenMetadata supports the following lineage:
Table to Table
Tables to a pipeline and then to the output table
Nodes in the lineage graph can also include other data assets, such as dashboards and reports.
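To make the current table-level model concrete, here is a minimal sketch of a lineage graph with table, pipeline, and dashboard nodes, plus an upstream walk. The node names and the `type:name` identifier convention are hypothetical and chosen only for illustration; this is not how OpenMetadata stores lineage internally.

```python
from collections import defaultdict

# Hypothetical node identifiers; the "type:name" convention is illustrative only.
edges = [
    ("table:raw_orders", "pipeline:daily_etl"),            # table feeds a pipeline
    ("pipeline:daily_etl", "table:orders_clean"),          # pipeline writes the output table
    ("table:orders_clean", "table:orders_summary"),        # table-to-table lineage
    ("table:orders_summary", "dashboard:sales_overview"),  # other data assets as nodes
]


def build_upstream_index(edge_list):
    """Map each node to the set of nodes that feed it directly."""
    upstream = defaultdict(set)
    for src, dst in edge_list:
        upstream[dst].add(src)
    return upstream


def all_upstream(node, upstream):
    """Collect every transitive upstream node with an iterative depth-first walk."""
    seen = set()
    stack = [node]
    while stack:
        current = stack.pop()
        for parent in upstream.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


upstream_index = build_upstream_index(edges)
dashboard_sources = all_upstream("dashboard:sales_overview", upstream_index)
```

With edges like these, the graph can say which tables feed the dashboard, but not which columns do. That gap is what the proposal addresses.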
This is a proposal to add column-level lineage to the backend and to support it in the Lineage APIs. The lineage edge between two tables is enhanced with column-level lineage information as follows:
The column lineage details are stored as an additional property of the lineage edge between two tables, using the following new type, lineageDetails, with:
columnLineage, which holds a set of source columns and the function used to transform them into the destination column
sqlQuery, the SQL query that consumes a set of source tables to generate the destination table
pipeline, the pipeline that ran the SQL query to generate the destination table
The lineageDetails schema is shown below:
"columnLineage": {
  "type" : "object",
  "properties": {
    "fromColumns" : {
      "description": "One or more source columns identified by fully qualified column name used by transformation function to create destination column.",
      "type" : "array",
      "items" : {
        "$ref" : "../type/basic.json#/definitions/fullyQualifiedEntityName"
      }
    },
    "toColumn" : {
      "description": "Destination column identified by fully qualified column name created by the transformation of source columns.",
      "$ref" : "../type/basic.json#/definitions/fullyQualifiedEntityName"
    },
    "function" : {
      "description": "Transformation function applied to source columns to create destination column. That is `function(fromColumns) -> toColumn`.",
      "$ref" : "../type/basic.json#/definitions/sqlFunction"
    }
  }
},
"lineageDetails" : {
  "description" : "Lineage details including sqlQuery + pipeline + columnLineage.",
  "type" : "object",
  "properties": {
    "sqlQuery" : {
      "description": "SQL used for transformation.",
      "$ref" : "../type/basic.json#/definitions/sqlQuery"
    },
    "columnsLineage" : {
      "description" : "Lineage information of how upstream columns were combined to get downstream column.",
      "type" : "array",
      "items" : {
        "$ref" : "#/definitions/columnLineage"
      }
    },
    "pipeline" : {
      "description": "Pipeline where the sqlQuery is periodically run.",
      "$ref" : "../type/entityReference.json"
    }
  },
  "required": ["sqlQuery", "columnsLineage"]
},
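As an illustration of what a value of this type could look like, the sketch below builds one lineageDetails object and runs a small hand-written shape check against it. The check assumes that `fullyQualifiedEntityName` and `sqlQuery` resolve to plain strings (the referenced `basic.json` definitions are not shown here), and the column names, the SQL text, and the helper `check_lineage_details` are all hypothetical. It covers only the `required` list and the array/object shapes, not the full schema.

```python
def check_lineage_details(details):
    """Return a list of problems; an empty list means the basic shape looks fine."""
    problems = []
    # "required": ["sqlQuery", "columnsLineage"] in the schema.
    for key in ("sqlQuery", "columnsLineage"):
        if key not in details:
            problems.append(f"missing required property: {key}")
    # Assumption: sqlQuery resolves to a string.
    if "sqlQuery" in details and not isinstance(details["sqlQuery"], str):
        problems.append("sqlQuery must be a string")
    columns_lineage = details.get("columnsLineage", [])
    if not isinstance(columns_lineage, list):
        problems.append("columnsLineage must be an array")
        return problems
    for index, entry in enumerate(columns_lineage):
        if not isinstance(entry, dict):
            problems.append(f"columnsLineage[{index}] must be an object")
            continue
        # Assumption: fully qualified column names are strings.
        from_columns = entry.get("fromColumns", [])
        if not isinstance(from_columns, list) or not all(
            isinstance(column, str) for column in from_columns
        ):
            problems.append(f"columnsLineage[{index}].fromColumns must be an array of strings")
        if not isinstance(entry.get("toColumn", ""), str):
            problems.append(f"columnsLineage[{index}].toColumn must be a string")
    return problems


# Hypothetical example: total_amount is derived from orders.amount via SUM.
example_details = {
    "sqlQuery": (
        "INSERT INTO orders_summary "
        "SELECT customer_id, SUM(amount) AS total_amount "
        "FROM orders GROUP BY customer_id"
    ),
    "columnsLineage": [
        {
            "fromColumns": ["sample_service.shop.public.orders.amount"],
            "toColumn": "sample_service.shop.public.orders_summary.total_amount",
            "function": "SUM",
        }
    ],
}

example_problems = check_lineage_details(example_details)
```

The optional `pipeline` reference is left out of the example because the schema does not list it as required.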
All existing APIs remain as they are, except for the addition of lineageDetails to the edge.
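A rough sketch of what this could mean for callers: an existing table-to-table edge stays valid as it is, and lineageDetails is simply one more property on it. The edge shape used here (`fromEntity` and `toEntity` holding entity references), the placeholder ids, and the helper `with_lineage_details` are assumptions for illustration and are not taken from the text above.

```python
import copy


def with_lineage_details(edge, details):
    """Return a copy of `edge` with `lineageDetails` added; the input is not mutated."""
    enriched = copy.deepcopy(edge)
    enriched["lineageDetails"] = copy.deepcopy(details)
    return enriched


# Assumed shape of an existing table-to-table edge; the ids are placeholders.
existing_edge = {
    "fromEntity": {"id": "<source-table-id>", "type": "table"},
    "toEntity": {"id": "<destination-table-id>", "type": "table"},
}

# Hypothetical column-level details for that edge.
details = {
    "sqlQuery": "INSERT INTO orders_summary SELECT SUM(amount) AS total_amount FROM orders",
    "columnsLineage": [
        {
            "fromColumns": ["sample_service.shop.public.orders.amount"],
            "toColumn": "sample_service.shop.public.orders_summary.total_amount",
            "function": "SUM",
        }
    ],
}

enriched_edge = with_lineage_details(existing_edge, details)
```

Clients that do not send lineageDetails would keep sending the original edge unchanged, which is the backward-compatible behaviour the sentence above describes.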
Ingestion Support
UI Support