# OM ingestion quick start

In this notebook, we use a simple example to show how the OM Python SDK (the `openmetadata-ingestion` package) works and how to use it to ingest metadata into the OM server.

In [1]:
from metadata.ingestion.ometa.ometa_api import OpenMetadata
from metadata.generated.schema.entity.services.connections.metadata.openMetadataConnection import (OpenMetadataConnection, AuthProvider)
from metadata.generated.schema.security.client.openMetadataJWTClientConfig import OpenMetadataJWTClientConfig

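In the server configuration, we must define the server URL (`hostPort`), the authentication provider (`authProvider`), and the security configuration (`securityConfig`), which here carries a JWT token: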

In [3]:
from creds import om_admin_token
server_config = OpenMetadataConnection(
    hostPort="http://datacatalog.casd.local/api",
    authProvider=AuthProvider.openmetadata,
    securityConfig=OpenMetadataJWTClientConfig(
        jwtToken=om_admin_token,
    ),
)
metadata = OpenMetadata(server_config)

In [4]:
metadata.health_check()

True
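
A `True` result means the client can reach the server. In a script, it can help to stop early when the check fails. The sketch below assumes `health_check()` returns a boolean, as in the output above; `ensure_healthy` and `FakeClient` are hypothetical names introduced here so that the example runs without a server.

```python
class FakeClient:
    """Hypothetical stand-in for the OM client, so this sketch runs offline."""

    def __init__(self, healthy: bool) -> None:
        self._healthy = healthy

    def health_check(self) -> bool:
        return self._healthy


def ensure_healthy(client) -> None:
    """Raise early if the server does not pass the health check."""
    if not client.health_check():
        raise ConnectionError(
            "OM server health check failed; check hostPort and the JWT token"
        )


# Passes silently for a healthy client.
ensure_healthy(FakeClient(healthy=True))
```

With the real client, the call would be `ensure_healthy(metadata)`.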

## 2. Creating a Database Service

In [5]:

from metadata.generated.schema.api.services.createDatabaseService import (
    CreateDatabaseServiceRequest,
)
from metadata.generated.schema.entity.services.connections.database.common.basicAuth import (
    BasicAuth,
)
from metadata.generated.schema.entity.services.connections.database.mysqlConnection import (
    MysqlConnection,
)
from metadata.generated.schema.entity.services.databaseService import (
    DatabaseConnection,
    DatabaseService,
    DatabaseServiceType,
)

db_service = CreateDatabaseServiceRequest(
    name="test-service-db-lineage",
    serviceType=DatabaseServiceType.Mysql,
    connection=DatabaseConnection(
        config=MysqlConnection(
            username="db_login",
            authType=BasicAuth(password="db_password"),
            hostPort="http://db_url:1234",
        )
    ),
)

db_service_entity = metadata.create_or_update(data=db_service)

## 3. Creating the Database

Any Entity that is linked to another Entity has to hold the `fullyQualifiedName` of the Entity it relates to. In this case, a Database is bound to a specific service.

In [6]:
from metadata.generated.schema.api.data.createDatabase import CreateDatabaseRequest

create_db = CreateDatabaseRequest(
    name="test-db",
    service=db_service_entity.fullyQualifiedName,
)

create_db_entity = metadata.create_or_update(data=create_db)

## 4. Creating the Schema

The same happens with the Schemas. They are related to a Database.

In [7]:
from metadata.generated.schema.api.data.createDatabaseSchema import (
    CreateDatabaseSchemaRequest,
)

create_schema = CreateDatabaseSchemaRequest(
    name="test-schema", database=create_db_entity.fullyQualifiedName
)

# create_or_update returns the created schema entity, which carries its fullyQualifiedName (FQN)
create_schema_entity = metadata.create_or_update(data=create_schema)

## 5. Creating the Tables

And finally, Tables are contained in a specific Schema, so we use the fullyQualifiedName here as well.

We are doing a simple example with a single column.

In [8]:
from metadata.generated.schema.api.data.createTable import CreateTableRequest
from metadata.generated.schema.entity.data.table import Column, DataType

table_a = CreateTableRequest(
    name="tableA",
    databaseSchema=create_schema_entity.fullyQualifiedName,
    columns=[Column(name="id", dataType=DataType.BIGINT)],
)

table_b = CreateTableRequest(
    name="tableB",
    databaseSchema=create_schema_entity.fullyQualifiedName,
    columns=[Column(name="id", dataType=DataType.BIGINT)],
)

table_a_entity = metadata.create_or_update(data=table_a)
table_b_entity = metadata.create_or_update(data=table_b)
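
The entities created so far form a service, database, schema, table hierarchy, and the fully qualified names used in this notebook are the dot-joined path through it. The sketch below rebuilds such a name by hand. `build_fqn` is a hypothetical helper, and it assumes that no name contains a dot; names with dots need quoting, which this sketch does not handle.

```python
def build_fqn(*parts: str) -> str:
    """Join entity names into a dot-separated fully qualified name.

    Assumption: no part is empty or contains a dot.
    """
    for part in parts:
        if not part or "." in part:
            raise ValueError(f"unsupported name for this sketch: {part!r}")
    return ".".join(parts)


service, database, schema = "test-service-db-lineage", "test-db", "test-schema"
table_a_fqn = build_fqn(service, database, schema, "tableA")
print(table_a_fqn)
```

In practice, prefer reading `fullyQualifiedName` from the returned entity, as the cells above do.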

## 6. Adding Lineage

With everything prepared, we can now create the Lineage between both Entities. An `AddLineageRequest` type represents the `edge` between two `Entities`, typed under `EntitiesEdge`.

In the example below, we create an edge between table A and table B. The edge is directed: `fromEntity` is the source and `toEntity` is the destination.

In [10]:
from metadata.generated.schema.api.lineage.addLineage import AddLineageRequest
from metadata.generated.schema.type.entityLineage import EntitiesEdge 
from metadata.generated.schema.type.entityReference import EntityReference

add_lineage_request = AddLineageRequest(
    edge=EntitiesEdge(
        description="test lineage",
        fromEntity=EntityReference(id=table_a_entity.id, type="table"),
        toEntity=EntityReference(id=table_b_entity.id, type="table"),
    ),
)

created_lineage = metadata.add_lineage(data=add_lineage_request)

## 7. Fetching Lineage

Finally, let's fetch the lineage from the other node involved:

In [11]:
from metadata.generated.schema.entity.data.table import Table

metadata.get_lineage_by_name(
    entity=Table,
    fqn="test-service-db-lineage.test-db.test-schema.tableB",
    # Tune this to control how far in the lineage graph to go
    up_depth=1,
    down_depth=1
)

{'entity': {'id': 'd337b285-24d4-4dfa-9986-680f53ffc453',
  'type': 'table',
  'name': 'tableB',
  'fullyQualifiedName': 'test-service-db-lineage.test-db.test-schema.tableB',
  'displayName': 'tableB',
  'deleted': False,
  'href': 'http://datacatalog.casd.local/api/v1/tables/d337b285-24d4-4dfa-9986-680f53ffc453'},
 'nodes': [{'id': '3e0be9f7-df11-43b6-9132-549d4741c49c',
   'type': 'table',
   'name': 'tableA',
   'fullyQualifiedName': 'test-service-db-lineage.test-db.test-schema.tableA',
   'displayName': 'tableA',
   'deleted': False,
   'href': 'http://datacatalog.casd.local/api/v1/tables/3e0be9f7-df11-43b6-9132-549d4741c49c'}],
 'upstreamEdges': [{'fromEntity': '3e0be9f7-df11-43b6-9132-549d4741c49c',
   'toEntity': 'd337b285-24d4-4dfa-9986-680f53ffc453'}],
 'downstreamEdges': []}
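
The response is a plain dictionary in which edges refer to nodes by id. A small sketch of how to turn it into readable names, using a trimmed copy of the output above; `neighbours` is a hypothetical helper, not part of the SDK.

```python
# Trimmed copy of the response shown above.
lineage = {
    "entity": {
        "id": "d337b285-24d4-4dfa-9986-680f53ffc453",
        "fullyQualifiedName": "test-service-db-lineage.test-db.test-schema.tableB",
    },
    "nodes": [
        {
            "id": "3e0be9f7-df11-43b6-9132-549d4741c49c",
            "fullyQualifiedName": "test-service-db-lineage.test-db.test-schema.tableA",
        }
    ],
    "upstreamEdges": [
        {
            "fromEntity": "3e0be9f7-df11-43b6-9132-549d4741c49c",
            "toEntity": "d337b285-24d4-4dfa-9986-680f53ffc453",
        }
    ],
    "downstreamEdges": [],
}


def neighbours(lineage: dict, edges_key: str, id_key: str) -> list:
    """Resolve the node ids found in the given edge list to fully qualified names."""
    names_by_id = {node["id"]: node["fullyQualifiedName"] for node in lineage["nodes"]}
    return [
        names_by_id[edge[id_key]]
        for edge in lineage[edges_key]
        if edge[id_key] in names_by_id
    ]


upstream = neighbours(lineage, "upstreamEdges", "fromEntity")
downstream = neighbours(lineage, "downstreamEdges", "toEntity")
print(upstream, downstream)
```

With a depth greater than 1, the lists would also contain indirect ancestors or descendants, since the sketch does not filter on the queried entity.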

## 8. Lineage Details

Note that when adding lineage information, we send the API an [AddLineage](https://github.com/open-metadata/OpenMetadata/blob/main/openmetadata-spec/src/main/resources/json/schema/api/lineage/addLineage.json) request. It is composed of an Entity Edge, whose definition you can find [here](https://github.com/open-metadata/OpenMetadata/blob/main/openmetadata-spec/src/main/resources/json/schema/type/entityLineage.json).

In a nutshell, an Entity Edge has:

1. The Entity Reference as the lineage origin,
2. The Entity Reference as the lineage destination,
3. Optionally, Lineage Details.

In the Lineage Details property, we can pass further information specific to Table-to-Table lineage:
- `sqlQuery` specifying the transformation,
- An array of `columnsLineage` as an object with an array of source and destination columns, as well as their own specific transformation function,
- Optionally, the Entity Reference of a Pipeline powering the transformation from Table A to Table B.
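
To make the structure concrete, here is a sketch of an edge with optional lineage details, written with plain dictionaries rather than the SDK classes. The key names mirror the attribute names used in this notebook; `build_edge` and the placeholder ids are hypothetical, and the exact wire format sent to the server is an assumption.

```python
import json
from typing import Optional


def build_edge(
    from_id: str,
    to_id: str,
    sql_query: Optional[str] = None,
    columns_lineage: Optional[list] = None,
    pipeline_id: Optional[str] = None,
) -> dict:
    """Build a table-to-table edge, adding lineage details only when given."""
    edge = {
        "fromEntity": {"id": from_id, "type": "table"},
        "toEntity": {"id": to_id, "type": "table"},
    }
    details = {}
    if sql_query is not None:
        details["sqlQuery"] = sql_query
    if columns_lineage:
        details["columnsLineage"] = columns_lineage
    if pipeline_id is not None:
        details["pipeline"] = {"id": pipeline_id, "type": "pipeline"}
    if details:
        edge["lineageDetails"] = details
    return {"edge": edge}


payload = build_edge(
    "<table-a-uuid>",  # placeholder id
    "<table-c-uuid>",  # placeholder id
    sql_query="SELECT 1",
    columns_lineage=[{"fromColumns": ["s.d.sc.tableA.id"], "toColumn": "s.d.sc.tableC.id"}],
)
print(json.dumps(payload, indent=2))
```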

The API call will be exactly the same as before, but now we will add more ingredients when defining our objects. Let's see how to do that and play with the possible combinations:

First, import the required classes and create a new table:

In [13]:
from metadata.generated.schema.type.entityLineage import (
    ColumnLineage,
    EntitiesEdge,
    LineageDetails,
)

# Prepare a new table
table_c = CreateTableRequest(
    name="tableC",
    databaseSchema=create_schema_entity.fullyQualifiedName,
    columns=[Column(name="id", dataType=DataType.BIGINT)],
)

table_c_entity = metadata.create_or_update(data=table_c)

###  8.1 Column Level Lineage

We can start by linking our columns together. For that we are going to create:

- A `ColumnLineage` object, linking our Table A column ID -> Table C column ID. Note that this can be a list!
- A `LineageDetails` object, passing the column lineage and the SQL query that powers the transformation.

In [14]:
# A ColumnLineage object takes two arguments: fromColumns lists the source columns, toColumn is the single destination column
column_lineage = ColumnLineage(
    fromColumns=["test-service-db-lineage.test-db.test-schema.tableA.id"],
    toColumn="test-service-db-lineage.test-db.test-schema.tableC.id"
)

# A LineageDetails object holds the SQL query that performs the transformation;
# columnsLineage holds the list of column-level relations
lineage_details = LineageDetails(
    sqlQuery="SELECT * FROM AWESOME",
    columnsLineage=[column_lineage],
)

add_lineage_request = AddLineageRequest(
    edge=EntitiesEdge(
        fromEntity=EntityReference(id=table_a_entity.id, type="table"),
        toEntity=EntityReference(id=table_c_entity.id, type="table"),
        lineageDetails=lineage_details,
    ),
)

created_lineage = metadata.add_lineage(data=add_lineage_request)

After running the cell above, you can see the column lineage in the web UI.
> Open the `Lineage` tab, click the `Layers` button in the bottom-left corner, and choose the `column` option.
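
Since column lineage is a list, wider tables need one mapping per destination column. The sketch below generates these mappings for columns that keep the same name in both tables. It uses plain dictionaries with the same field names as `ColumnLineage`; `column_mappings` is a hypothetical helper, and the one-to-one, same-name mapping is an assumption made for illustration.

```python
def column_mappings(source_table_fqn: str, target_table_fqn: str, columns: list) -> list:
    """Map each column of the source table to the same-named target column."""
    return [
        {
            "fromColumns": [f"{source_table_fqn}.{column}"],
            "toColumn": f"{target_table_fqn}.{column}",
        }
        for column in columns
    ]


mappings = column_mappings(
    "test-service-db-lineage.test-db.test-schema.tableA",
    "test-service-db-lineage.test-db.test-schema.tableC",
    ["id"],
)
print(mappings)
```

Each dictionary could then be unpacked into the SDK class, for example `ColumnLineage(**mapping)`.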


### 8.2 Adding a Pipeline Reference

We can also pass a reference to the pipeline that produces the lineage (e.g., the ETL job feeding the tables).

To prepare this example, we need to create the `Pipeline Entity`. As before, we first need to prepare the `Pipeline Service` it belongs to:

In [15]:
from metadata.generated.schema.api.data.createPipeline import CreatePipelineRequest
from metadata.generated.schema.api.services.createPipelineService import (
    CreatePipelineServiceRequest,
)
from metadata.generated.schema.entity.services.pipelineService import (
    PipelineConnection,
    PipelineService,
    PipelineServiceType,
    airflowConnection
)

from metadata.generated.schema.entity.services.connections.pipeline.backendConnection import (
    BackendConnection,
)

pipeline_service = CreatePipelineServiceRequest(
    name="test-service-pipeline",
    serviceType=PipelineServiceType.Airflow,
    connection=PipelineConnection(
        config=airflowConnection.AirflowConnection(
            hostPort="http://localhost:8080",
            connection=BackendConnection(),
        ),
    ),
)

pipeline_service_entity = metadata.create_or_update(data=pipeline_service)

create_pipeline = CreatePipelineRequest(
    name="test",
    service=pipeline_service_entity.fullyQualifiedName,
)

pipeline_entity = metadata.create_or_update(data=create_pipeline)

With the newly created pipeline entity, we can now build a new `LineageDetails` object with three attributes:
- `sqlQuery`: the SQL query that performs the transformation,
- `columnsLineage`: the column-level lineage details,
- `pipeline`: a reference to the pipeline entity that powers the lineage.

In [17]:
column_lineage = ColumnLineage(
    fromColumns=["test-service-db-lineage.test-db.test-schema.tableA.id"],
    toColumn="test-service-db-lineage.test-db.test-schema.tableC.id"
)

lineage_details = LineageDetails(
    sqlQuery="SELECT * FROM AWESOME",
    columnsLineage=[column_lineage],
    pipeline=EntityReference(id=pipeline_entity.id, type="pipeline"),
)

add_lineage_request = AddLineageRequest(
    edge=EntitiesEdge(
        fromEntity=EntityReference(id=table_a_entity.id, type="table"),
        toEntity=EntityReference(id=table_c_entity.id, type="table"),
        lineageDetails=lineage_details,
        description="show how a lineage works",
    ),
)

created_lineage = metadata.add_lineage(data=add_lineage_request)

## 9. Automated SQL lineage

OM can also derive lineage automatically from a SQL query. Let's first create a new table, `tableD`, with two columns:

In [18]:
# Prepare a new table tableD
table_d = CreateTableRequest(
    name="tableD",
    databaseSchema=create_schema_entity.fullyQualifiedName,
    columns=[
        Column(name="id", dataType=DataType.BIGINT, description="id of the user"),
        Column(name="age", dataType=DataType.BIGINT, description="age of the user"),
    ],
)

table_d_entity = metadata.create_or_update(data=table_d)

In [19]:
lineage_service: DatabaseService = metadata.get_by_name(
    entity=DatabaseService, fqn="test-service-db-lineage"
)

metadata.add_lineage_by_query(
    database_service=lineage_service,
    timeout=200, # timeout in seconds
    sql="INSERT INTO tableD (id, age) SELECT id, id + 1 FROM tableA",  # your SQL query
)

The call above parses the SQL query (it does not execute it) and creates a lineage edge from `tableA` to `tableD`. Both tables must already exist in OM, which is why `tableD` was created in the previous cell.
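
To give an intuition of what query-based lineage involves, here is a toy illustration that extracts table-level edges from the text of an `INSERT ... SELECT` statement with regular expressions. This is not how the SDK works internally: real lineage parsing relies on a proper SQL parser and handles far more cases. `naive_insert_select_lineage` is a hypothetical name.

```python
import re


def naive_insert_select_lineage(sql: str) -> list:
    """Return (source, target) table pairs for a simple INSERT ... SELECT.

    Toy approach: the target is the name after INSERT INTO, the sources are
    the names after FROM or JOIN. Subqueries, CTEs and quoting are ignored.
    """
    target = re.search(r"insert\s+into\s+([\w.]+)", sql, flags=re.IGNORECASE)
    sources = re.findall(r"\b(?:from|join)\s+([\w.]+)", sql, flags=re.IGNORECASE)
    if target is None or not sources:
        return []
    return [(source, target.group(1)) for source in sources]


edges = naive_insert_select_lineage(
    "INSERT INTO tableD (id, age) SELECT id, id + 1 FROM tableA"
)
print(edges)
```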

## 10. The ingestion with CLI 

The Python package `openmetadata-ingestion`, which we installed via `pip install openmetadata-ingestion`, also provides a CLI. We can use it to ingest lineage into the OM server without writing any Python code.

The general form is:

```shell
metadata lineage -c path/to/config_yaml.yaml
```

Below is an example of a YAML file that connects to an OM server and creates a new lineage between two tables:

```yaml
serviceName: test-service-db-lineage
query: INSERT INTO tableD (id, age) SELECT id, id + 1 FROM tableA
# filePath: test.sql
# parseTimeout: 360 # timeout in seconds
workflowConfig:
  # loggerLevel: DEBUG  # DEBUG, INFO, WARN or ERROR
  openMetadataServerConfig:
    hostPort: <OpenMetadata host and port>
    authProvider: <OpenMetadata auth provider>
```

- **serviceName**: Name of the database service that contains the tables involved in the query.
- **query**: The raw SQL query, written directly in the YAML file.
- **filePath**: If the query is too long, save it in a file and give the path to that file in this field.
- **parseTimeout**: Timeout, in seconds, for the lineage parsing process.
- **workflowConfig**: The main property here is `openMetadataServerConfig`, where you define the host and the auth provider of your OpenMetadata installation.
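
When the query comes from another program, the configuration file can be generated instead of written by hand. The sketch below renders the fields described above with the standard library only. `render_lineage_config` is a hypothetical helper, the host and auth provider values are taken from the connection used earlier in this notebook, and values are written as JSON strings, which YAML also accepts as double-quoted scalars.

```python
import json


def render_lineage_config(
    service_name: str, query: str, host_port: str, auth_provider: str
) -> str:
    """Render a minimal lineage config; values are quoted to avoid YAML surprises."""
    return (
        f"serviceName: {json.dumps(service_name)}\n"
        f"query: {json.dumps(query)}\n"
        "workflowConfig:\n"
        "  openMetadataServerConfig:\n"
        f"    hostPort: {json.dumps(host_port)}\n"
        f"    authProvider: {json.dumps(auth_provider)}\n"
    )


config_text = render_lineage_config(
    service_name="test-service-db-lineage",
    query="INSERT INTO tableD (id, age) SELECT id, id + 1 FROM tableA",
    host_port="http://datacatalog.casd.local/api",
    auth_provider="openmetadata",
)
print(config_text)
```

The resulting text can be saved to a file and passed to `metadata lineage -c`.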