# OM ingestion quick start

This notebook walks through a simple example of how the OpenMetadata (OM) Python SDK works and how to use it to ingest metadata into the OM server.

In [1]:
from metadata.ingestion.ometa.ometa_api import OpenMetadata
from metadata.generated.schema.entity.services.connections.metadata.openMetadataConnection import (OpenMetadataConnection, AuthProvider)
from metadata.generated.schema.security.client.openMetadataJWTClientConfig import OpenMetadataJWTClientConfig

## 1. Create a connection to the OpenMetadata server

The server config must define the server `hostPort` and the credentials (in our case, a JWT token).

In [3]:
from creds import om_admin_token
server_config = OpenMetadataConnection(
    hostPort="http://datacatalog.casd.local/api",
    authProvider=AuthProvider.openmetadata,
    securityConfig=OpenMetadataJWTClientConfig(
        jwtToken=om_admin_token,
    ),
)
metadata = OpenMetadata(server_config)

In [4]:
# if it returns True, the connection succeeded
metadata.health_check()

True
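In a script, it can help to fail fast when the server is unreachable. The sketch below shows one possible way to do that. `ensure_healthy` is a hypothetical helper, not part of the SDK; it only assumes that the client exposes `health_check()` returning a boolean, as in the cell above. A stub client stands in for the real one so that the snippet runs without a server.

```python
class StubClient:
    """Stand-in for the OpenMetadata client; it only mimics health_check()."""

    def __init__(self, healthy: bool):
        self._healthy = healthy

    def health_check(self) -> bool:
        return self._healthy


def ensure_healthy(client) -> None:
    """Raise ConnectionError if the client's health check does not return True."""
    if not client.health_check():
        raise ConnectionError("OpenMetadata server health check failed")


# A healthy client passes silently.
ensure_healthy(StubClient(healthy=True))

# An unhealthy client raises, so later ingestion steps never run.
try:
    ensure_healthy(StubClient(healthy=False))
except ConnectionError as err:
    failure_message = str(err)
    print(failure_message)
```

With the real client, the call would be `ensure_healthy(metadata)` right after the connection is created.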

## 2. Create a database service

As mentioned before, entities form a hierarchy, so we cannot create an orphan table that belongs to no database and no database service. We must first create a database service (the server), then a database, then a schema (optional), then a table.

The code below creates a database service that represents a MySQL server. A service may contain one or more databases.

In [27]:
from metadata.generated.schema.api.services.createDatabaseService import CreateDatabaseServiceRequest
from metadata.generated.schema.entity.services.connections.database.common.basicAuth import BasicAuth
from metadata.generated.schema.entity.services.connections.database.mysqlConnection import MysqlConnection
from metadata.generated.schema.entity.services.databaseService import (DatabaseConnection, DatabaseService, DatabaseServiceType,)

db_service = CreateDatabaseServiceRequest(
    name="test-db-service",
    serviceType=DatabaseServiceType.Mysql,
    connection=DatabaseConnection(
        config=MysqlConnection(
            username="db_login",
            authType=BasicAuth(password="db_password"),
            hostPort="http://db_url:1234",
        )
    ),
)

# `create_or_update` sends the request and returns the created (or updated) entity
db_service_entity = metadata.create_or_update(data=db_service)

In [28]:
# let's check the content

print(f"the type is {type(db_service_entity)}")

print(f"the content is {db_service_entity}")

the type is <class 'metadata.generated.schema.entity.services.databaseService.DatabaseService'>
the content is id=Uuid(__root__=UUID('25bd2d97-2459-4890-a326-8c2bdf41847d')) name=EntityName(__root__='test-db-service') fullyQualifiedName=FullyQualifiedEntityName(__root__='test-db-service') displayName=None serviceType=<DatabaseServiceType.Mysql: 'Mysql'> description=None connection=DatabaseConnection(config=MysqlConnection(type=<MySQLType.Mysql: 'Mysql'>, scheme=<MySQLScheme.mysql_pymysql: 'mysql+pymysql'>, username='db_login', authType=BasicAuth(password=SecretStr('**********')), hostPort='http://db_url:1234', databaseName=None, databaseSchema=None, sslConfig=None, connectionOptions=None, connectionArguments=None, supportsMetadataExtraction=SupportsMetadataExtraction(__root__=True), supportsDBTExtraction=SupportsDBTExtraction(__root__=True), supportsProfiler=SupportsProfiler(__root__=True), supportsQueryComment=SupportsQueryComment(__root__=True), sampleDataStorageConfig=None)) pipelin

## 3. Creating a Database 

We have created a database service; now we need to create a database inside it.
Any entity that is linked to another entity must reference that entity by its fullyQualifiedName.
In our case, the new database must be bound to the service `test-db-service`.

In [31]:
from metadata.generated.schema.api.data.createDatabase import CreateDatabaseRequest

db_entity_req = CreateDatabaseRequest(
    name="test-db",
    service=db_service_entity.fullyQualifiedName,
)

db_entity = metadata.create_or_update(data=db_entity_req)

In [32]:
# let's check the content

print(f"the type is {type(db_entity)}")

print(f"the content is {db_entity}")

the type is <class 'metadata.generated.schema.entity.data.database.Database'>
the content is id=Uuid(__root__=UUID('2f24f0ac-903d-46db-9bf8-ba8873162b92')) name=EntityName(__root__='test-db') fullyQualifiedName=FullyQualifiedEntityName(__root__='test-db-service.test-db') displayName=None description=None dataProducts=None tags=None version=EntityVersion(__root__=0.1) updatedAt=Timestamp(__root__=1719305281875) updatedBy='ingestion-bot' href=Href(__root__=AnyUrl('http://datacatalog.casd.local/api/v1/databases/2f24f0ac-903d-46db-9bf8-ba8873162b92', scheme='http', host='datacatalog.casd.local', tld='local', host_type='domain', path='/api/v1/databases/2f24f0ac-903d-46db-9bf8-ba8873162b92')) owner=None service=EntityReference(id=Uuid(__root__=UUID('25bd2d97-2459-4890-a326-8c2bdf41847d')), type='databaseService', name='test-db-service', fullyQualifiedName='test-db-service', description=None, displayName='test-db-service', deleted=False, inherited=None, href=Href(__root__=AnyUrl('http://datac

## 4. Creating the Schema

The same applies to schemas: each schema belongs to a database and references it by its fullyQualifiedName.

In [33]:
from metadata.generated.schema.api.data.createDatabaseSchema import (
    CreateDatabaseSchemaRequest,
)

create_schema_req = CreateDatabaseSchemaRequest(
    name="test-schema", database=db_entity.fullyQualifiedName
)

# `create_or_update` returns the created schema entity; its fqn (fully qualified name) is in `fullyQualifiedName`
schema_entity = metadata.create_or_update(data=create_schema_req)

In [34]:
# let's check the content

print(f"the type is {type(schema_entity)}")

print(f"the content is {schema_entity}")

the type is <class 'metadata.generated.schema.entity.data.databaseSchema.DatabaseSchema'>
the content is id=Uuid(__root__=UUID('d29afd1a-a0c8-4de4-81f2-bee7e351b1b8')) name=EntityName(__root__='test-schema') fullyQualifiedName=FullyQualifiedEntityName(__root__='test-db-service.test-db.test-schema') displayName=None description=None dataProducts=None version=EntityVersion(__root__=0.1) updatedAt=Timestamp(__root__=1719305443257) updatedBy='ingestion-bot' href=Href(__root__=AnyUrl('http://datacatalog.casd.local/api/v1/databaseSchemas/d29afd1a-a0c8-4de4-81f2-bee7e351b1b8', scheme='http', host='datacatalog.casd.local', tld='local', host_type='domain', path='/api/v1/databaseSchemas/d29afd1a-a0c8-4de4-81f2-bee7e351b1b8')) owner=None service=EntityReference(id=Uuid(__root__=UUID('25bd2d97-2459-4890-a326-8c2bdf41847d')), type='databaseService', name='test-db-service', fullyQualifiedName='test-db-service', description=None, displayName='test-db-service', deleted=False, inherited=None, href=Href

## 5. Creating the Tables

Finally, tables are contained in a specific schema, so we use the schema's fullyQualifiedName here as well.

This simple example creates two tables with a few columns each.

In [36]:
from metadata.generated.schema.api.data.createTable import CreateTableRequest
from metadata.generated.schema.entity.data.table import Column, DataType

table_a = CreateTableRequest(
    name="test_user",
    databaseSchema=schema_entity.fullyQualifiedName,
    columns=[Column(name="id", dataType=DataType.BIGINT,description="id of the user"),
             Column(name="age", dataType=DataType.INT,description="age of the user")],
)

table_b = CreateTableRequest(
    name="test_order",
    databaseSchema=schema_entity.fullyQualifiedName,
    columns=[Column(name="id", dataType=DataType.BIGINT,description="id of the user"),
             Column(name="product_id", dataType=DataType.BIGINT,description="product id"),
             Column(name="uid", dataType=DataType.BIGINT,description="id of the user who placed the order"),],
)

table_a_entity = metadata.create_or_update(data=table_a)
table_b_entity = metadata.create_or_update(data=table_b)
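
The outputs above show that each fullyQualifiedName is the parent's FQN followed by a dot and the entity's own name. The sketch below reproduces that pattern in plain Python. `build_fqn` is a hypothetical helper written for illustration, not an SDK function, and it assumes that no name contains a dot (such names would need quoting, which is not handled here).

```python
def build_fqn(*names: str) -> str:
    """Join entity names, from the service down, into a dot-separated FQN.

    Assumption: no name contains a dot; quoting is not handled in this sketch.
    """
    for name in names:
        if "." in name:
            raise ValueError(f"name {name!r} contains a dot; quoting is not handled here")
    return ".".join(names)


# One call per level of the hierarchy used in this notebook.
service_fqn = build_fqn("test-db-service")
database_fqn = build_fqn("test-db-service", "test-db")
schema_fqn = build_fqn("test-db-service", "test-db", "test-schema")
table_fqn = build_fqn("test-db-service", "test-db", "test-schema", "test_order")

for fqn in (service_fqn, database_fqn, schema_fqn, table_fqn):
    print(fqn)
```

In practice it is safer to read `fullyQualifiedName` from the entity returned by `create_or_update`, as the cells above do, and to build the string by hand only when the entity object is not available.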

## 6. Fetching the created entities

We can use the function `get_by_name` to fetch any created entity. We need to specify:
 - the entity type
 - the entity fqn

In [39]:
from metadata.generated.schema.entity.data.table import Table

table_order_ref = metadata.get_by_name(entity=Table, fqn="test-db-service.test-db.test-schema.test_order")

if table_order_ref:
    print(f"The type of the response: {type(table_order_ref)}")
    print(f"Content of the response: {table_order_ref}")
else:
    print("Check if your fqn is valid or not")

The type of the response: <class 'metadata.generated.schema.entity.data.table.Table'>
Content of the response: id=Uuid(__root__=UUID('4c6eca8b-2edd-4f93-ae85-84ec8c197f23')) name=EntityName(__root__='test_order') displayName=None fullyQualifiedName=FullyQualifiedEntityName(__root__='test-db-service.test-db.test-schema.test_order') description=None version=EntityVersion(__root__=0.1) updatedAt=Timestamp(__root__=1719306099247) updatedBy='ingestion-bot' href=Href(__root__=AnyUrl('http://datacatalog.casd.local/api/v1/tables/4c6eca8b-2edd-4f93-ae85-84ec8c197f23', scheme='http', host='datacatalog.casd.local', tld='local', host_type='domain', path='/api/v1/tables/4c6eca8b-2edd-4f93-ae85-84ec8c197f23')) tableType=None columns=[Column(name=ColumnName(__root__='id'), displayName=None, dataType=<DataType.BIGINT: 'BIGINT'>, arrayDataType=None, dataLength=None, precision=None, scale=None, dataTypeDisplay='bigint', description=Markdown(__root__='id of the user'), fullyQualifiedName=FullyQualifiedEn
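The `if`/`else` check above relies on `get_by_name` returning nothing when the FQN does not match. When a missing entity should stop the script, that check can be wrapped in a small helper. The sketch below is one way to do it: `require_entity` is a hypothetical name, and the stub client only imitates the `get_by_name(entity=..., fqn=...)` call used above so that the snippet runs without a server.

```python
class StubClient:
    """Stand-in for the OpenMetadata client; it only mimics get_by_name()."""

    def __init__(self, entities: dict):
        self._entities = entities

    def get_by_name(self, entity, fqn: str):
        # Returns None when nothing matches, which is what the cell above relies on.
        return self._entities.get((entity, fqn))


def require_entity(client, entity, fqn: str):
    """Return the entity with the given FQN, or raise LookupError if it is missing."""
    found = client.get_by_name(entity=entity, fqn=fqn)
    if found is None:
        entity_label = getattr(entity, "__name__", str(entity))
        raise LookupError(f"no {entity_label} found for fqn {fqn!r}")
    return found


# The stub is keyed by (entity type, fqn); strings stand in for the SDK classes.
client = StubClient({("Table", "svc.db.schema.orders"): {"name": "orders"}})

order = require_entity(client, "Table", "svc.db.schema.orders")

try:
    require_entity(client, "Table", "svc.db.schema.missing")
except LookupError as err:
    missing_message = str(err)
    print(missing_message)
```

With the real client, the call would look like `require_entity(metadata, Table, "test-db-service.test-db.test-schema.test_order")`.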

## 7. Deleting entities

We can use the `delete` method to remove existing entities.

The example below shows how to delete a table.

> After this call, the table `test_order` (`table_b`) still appears in the lineage UI, marked as deleted. This suggests that `delete` performs a soft delete unless `hard_delete=True` is passed.

In [40]:
metadata.delete(entity=Table, entity_id=table_order_ref.id)

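To clean up a whole database service, we can use the code below. With `recursive=True` the databases, schemas and tables under the service are deleted as well, and `hard_delete=True` removes them permanently instead of marking them as deleted.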

In [41]:
service_id = metadata.get_by_name(
    entity=DatabaseService, fqn="test-db-service"
).id

metadata.delete(
    entity=DatabaseService,
    entity_id=service_id,
    recursive=True,
    hard_delete=True,
)
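
The cleanup cell above reads `.id` directly from the result of `get_by_name`, which fails with an `AttributeError` if the service has already been removed. The sketch below shows a more defensive variant. `delete_if_exists` is a hypothetical helper, and the stub classes only imitate the `get_by_name` and `delete` calls with the keyword arguments used above, so the snippet runs without a server.

```python
class StubEntity:
    """Minimal entity carrying only an id, like the objects returned by the SDK."""

    def __init__(self, entity_id: str):
        self.id = entity_id


class StubClient:
    """Stand-in for the OpenMetadata client; it records delete calls instead of sending them."""

    def __init__(self, entities: dict):
        self._entities = dict(entities)
        self.deleted = []

    def get_by_name(self, entity, fqn: str):
        return self._entities.get((entity, fqn))

    def delete(self, entity, entity_id, recursive=False, hard_delete=False):
        self.deleted.append((entity, entity_id, recursive, hard_delete))


def delete_if_exists(client, entity, fqn: str) -> bool:
    """Hard-delete the entity and its children if it exists; return whether a delete was issued."""
    found = client.get_by_name(entity=entity, fqn=fqn)
    if found is None:
        return False
    client.delete(entity=entity, entity_id=found.id, recursive=True, hard_delete=True)
    return True


client = StubClient({("DatabaseService", "test-db-service"): StubEntity("service-id")})

first_attempt = delete_if_exists(client, "DatabaseService", "test-db-service")
missing_attempt = delete_if_exists(client, "DatabaseService", "no-such-service")
print(first_attempt, missing_attempt)
```

With the real client, the call would be `delete_if_exists(metadata, DatabaseService, "test-db-service")`, which makes the cleanup cell safe to run more than once.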