# Customize an entity type

To create your own entity type, you first need to understand how an entity is created. We use the `MlModel` entity as an example.

An entity type is also called an entity definition: a `.json` file that declares all of the entity's required and optional attributes. For example, the definition of `MlModel` can be found [here](https://github.com/open-metadata/OpenMetadata/blob/main/openmetadata-spec/src/main/resources/json/schema/entity/data/mlmodel.json).

In the last section of the JSON file, you can find:
```json
"required": ["id", "name", "algorithm", "service"],
```
This means that `id`, `name`, `algorithm`, and `service` are the only required properties when creating an MlModel. Every other property is optional; the `definitions` block of the file declares the reusable types that those properties refer to.
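As a rough illustration of what the `required` list implies, the sketch below checks a payload against it using only the standard library. The `missing_required` helper and the sample payload are hypothetical and are not part of OpenMetadata; only the `required` list is copied from the excerpt above.

```python
import json

# Trimmed, illustrative stand-in for the MlModel definition; only the
# "required" list is taken from the schema excerpt above.
schema = json.loads('{"required": ["id", "name", "algorithm", "service"]}')


def missing_required(payload: dict, schema: dict) -> list:
    """Return the required property names that are absent from payload."""
    return [key for key in schema.get("required", []) if key not in payload]


# Hypothetical payload that omits "id" and "service".
payload = {"name": "test-model", "algorithm": "random_forest"}
print(missing_required(payload, schema))  # ['id', 'service']
```

A real validator would also check types and enum values; this sketch covers only the presence of required keys.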

For example, the code below shows the definition of `featureSourceDataType`. It is a string whose value must be one of the entries in the enum list.

```json
"featureSourceDataType": {
      "javaType": "org.openmetadata.schema.type.FeatureSourceDataType",
      "description": "This enum defines the type of data of a ML Feature source.",
      "type": "string",
      "enum": [
        "integer",
        "number",
        "string",
        "array",
        "date",
        "timestamp",
        "object",
        "boolean"
      ]
    },
```
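To make the enum constraint concrete, here is a minimal standard-library sketch that mirrors the list above. `FeatureSourceDataTypeSketch` and `parse_source_type` are hypothetical stand-ins written for this illustration; they are not the generated `FeatureSourceDataType` class.

```python
from enum import Enum


class FeatureSourceDataTypeSketch(Enum):
    """Illustrative mirror of the enum list in the JSON definition."""

    integer = "integer"
    number = "number"
    string = "string"
    array = "array"
    date = "date"
    timestamp = "timestamp"
    object = "object"
    boolean = "boolean"


def parse_source_type(value: str) -> FeatureSourceDataTypeSketch:
    """Convert a raw string to an enum member; unknown values raise ValueError."""
    return FeatureSourceDataTypeSketch(value)


print(parse_source_type("integer").value)  # integer

try:
    parse_source_type("float")  # "float" is not in the enum list
except ValueError as err:
    print(f"rejected: {err}")
```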

The code below shows the definition of the `dashboard` property, which is a reference to a Dashboard entity present in OpenMetadata (what we call an `EntityReference`).

```json
"dashboard": {
      "description": "Performance Dashboard URL to track metric evolution.",
      "$ref": "../../type/entityReference.json"
    },
```
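The idea behind a reference is that the model stores a pointer to the dashboard (an id and a type) rather than a copy of it. The sketch below models that with a plain dataclass. `EntityRefSketch`, the id, the `"dashboard"` type string, and the name are all made up for illustration; the field names follow the `EntityReference` shown in the printed model output later in this document.

```python
from dataclasses import dataclass
from typing import Optional
from uuid import UUID


@dataclass(frozen=True)
class EntityRefSketch:
    """Hypothetical, trimmed-down stand-in for an EntityReference."""

    id: UUID                                   # identifies the referenced entity
    type: str                                  # kind of entity being referenced
    fullyQualifiedName: Optional[str] = None   # optional, human-readable pointer


# Made-up values: a reference that points at a dashboard entity.
dashboard_ref = EntityRefSketch(
    id=UUID("00000000-0000-0000-0000-000000000001"),
    type="dashboard",
    fullyQualifiedName="demo-service.performance",
)
print(dashboard_ref.type, dashboard_ref.id)
```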

In [1]:
from metadata.ingestion.ometa.ometa_api import OpenMetadata
from metadata.generated.schema.entity.services.connections.metadata.openMetadataConnection import (OpenMetadataConnection, AuthProvider)
from metadata.generated.schema.security.client.openMetadataJWTClientConfig import OpenMetadataJWTClientConfig

In [2]:
from creds import om_admin_token
server_config = OpenMetadataConnection(
    hostPort="http://datacatalog.casd.local/api",
    authProvider=AuthProvider.openmetadata,
    securityConfig=OpenMetadataJWTClientConfig(
        jwtToken=om_admin_token,
    ),
)
metadata = OpenMetadata(server_config)

In [3]:
# returns True when the connection to the server succeeds
metadata.health_check()

True
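In a script, as opposed to a notebook, it can help to stop early when the health check fails. The sketch below shows one way to do that. `require_healthy` and `StubClient` are hypothetical names; the helper relies only on the `health_check()` method used above, and the stub exists so that the example runs without a server.

```python
def require_healthy(client):
    """Return the client if its health check passes, otherwise raise."""
    if not client.health_check():
        raise ConnectionError(
            "OpenMetadata health check failed; verify hostPort and the JWT token"
        )
    return client


class StubClient:
    """Stand-in for the real client so the sketch runs without a server."""

    def __init__(self, healthy: bool):
        self._healthy = healthy

    def health_check(self) -> bool:
        return self._healthy


require_healthy(StubClient(healthy=True))  # passes silently
```

With the real client, the call would be `require_healthy(metadata)`.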

## 1. Create an ML model service

An ML model service is an abstraction of a server that tracks ML models. In this example, the service represents an MLflow server.

In [4]:
from metadata.generated.schema.api.services.createMlModelService import CreateMlModelServiceRequest
from metadata.generated.schema.entity.services.mlmodelService import (
    MlModelConnection,
    MlModelService,
    MlModelServiceType,
)
from metadata.generated.schema.entity.services.connections.mlmodel.mlflowConnection import MlflowConnection
from metadata.generated.schema.entity.data.mlmodel import (
    FeatureSource,
    FeatureSourceDataType,
    FeatureType,
    MlFeature,
    MlHyperParameter,
    MlModel,
)
# create the ML model service
ml_service_create = CreateMlModelServiceRequest(
    name="test-model-service",
    serviceType=MlModelServiceType.Mlflow,
    connection=MlModelConnection(
        config=MlflowConnection(
            trackingUri="http://localhost:1234",
            registryUri="http://localhost:4321",
        )
    ),
)

ml_service_entity = metadata.create_or_update(ml_service_create)


In [5]:

print(ml_service_entity)

id=Uuid(__root__=UUID('3756066a-706b-4410-ac35-9e2566cfd397')) name=EntityName(__root__='test-model-service') fullyQualifiedName=FullyQualifiedEntityName(__root__='test-model-service') serviceType=<MlModelServiceType.Mlflow: 'Mlflow'> description=None displayName=None version=EntityVersion(__root__=0.1) updatedAt=Timestamp(__root__=1719910991286) updatedBy='ingestion-bot' pipelines=None connection=MlModelConnection(config=MlflowConnection(type=<MlflowType.Mlflow: 'Mlflow'>, trackingUri='http://localhost:1234', registryUri='http://localhost:4321', supportsMetadataExtraction=SupportsMetadataExtraction(__root__=True))) testConnectionResult=None tags=None owner=None href=Href(__root__=AnyUrl('http://datacatalog.casd.local/api/v1/services/mlmodelServices/3756066a-706b-4410-ac35-9e2566cfd397', scheme='http', host='datacatalog.casd.local', tld='local', host_type='domain', path='/api/v1/services/mlmodelServices/3756066a-706b-4410-ac35-9e2566cfd397')) changeDescription=None deleted=False dataPr

## 2. Create a data source

The features used to train the model come from various data sources. In this example, we assume the data source is a table in a MySQL database.

In [9]:
from metadata.generated.schema.api.data.createMlModel import CreateMlModelRequest
from metadata.generated.schema.entity.data.table import Table
from metadata.generated.schema.api.services.createDatabaseService import CreateDatabaseServiceRequest
from metadata.generated.schema.entity.services.connections.database.common.basicAuth import BasicAuth
from metadata.generated.schema.entity.services.connections.database.mysqlConnection import MysqlConnection
from metadata.generated.schema.entity.services.databaseService import (DatabaseConnection, DatabaseService, DatabaseServiceType,)
from metadata.generated.schema.api.data.createDatabase import CreateDatabaseRequest
from metadata.generated.schema.api.data.createTable import CreateTableRequest
from metadata.generated.schema.entity.data.table import Column, DataType
from metadata.generated.schema.api.data.createDatabaseSchema import CreateDatabaseSchemaRequest

# create database service
db_service = CreateDatabaseServiceRequest(
    name="test-db-service",
    serviceType=DatabaseServiceType.Mysql,
    connection=DatabaseConnection(
        config=MysqlConnection(
            username="db_login",
            authType=BasicAuth(password="db_password"),  # placeholder password
            hostPort="http://db_url:1234",
        )
    ),
)
# `create_or_update` returns the entity that was created (or updated) on the server
db_service_entity = metadata.create_or_update(data=db_service)

# create a database 
db_entity_req = CreateDatabaseRequest(
    name="test-db",
    service=db_service_entity.fullyQualifiedName,
)

db_entity = metadata.create_or_update(data=db_entity_req)

# create a schema
create_schema_req = CreateDatabaseSchemaRequest(
    name="test-schema", 
    database=db_entity.fullyQualifiedName)

# the call returns the created schema entity; its fullyQualifiedName (FQN) is used below
schema_entity = metadata.create_or_update(data=create_schema_req)

# create a table
table_a = CreateTableRequest(
    name="test_user",
    databaseSchema=schema_entity.fullyQualifiedName,
    columns=[
        Column(name="id", dataType=DataType.BIGINT, description="id of the user"),
        Column(name="age", dataType=DataType.INT, description="age of the user"),
    ],
)

table_a_entity = metadata.create_or_update(data=table_a)



In [10]:
print(table_a_entity)

id=Uuid(__root__=UUID('246a3a35-3565-41b4-b5e2-cea0ba1270ee')) name=EntityName(__root__='test_user') displayName=None fullyQualifiedName=FullyQualifiedEntityName(__root__='test-db-service.test-db.test-schema.test_user') description=None version=EntityVersion(__root__=0.1) updatedAt=Timestamp(__root__=1719912639212) updatedBy='ingestion-bot' href=Href(__root__=AnyUrl('http://datacatalog.casd.local/api/v1/tables/246a3a35-3565-41b4-b5e2-cea0ba1270ee', scheme='http', host='datacatalog.casd.local', tld='local', host_type='domain', path='/api/v1/tables/246a3a35-3565-41b4-b5e2-cea0ba1270ee')) tableType=None columns=[Column(name=ColumnName(__root__='id'), displayName=None, dataType=<DataType.BIGINT: 'BIGINT'>, arrayDataType=None, dataLength=None, precision=None, scale=None, dataTypeDisplay='bigint', description=Markdown(__root__='id of the user'), fullyQualifiedName=FullyQualifiedEntityName(__root__='test-db-service.test-db.test-schema.test_user.id'), tags=None, constraint=None, ordinalPositio
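The printed output shows how fully qualified names are composed: the service, database, schema, and table names are joined with dots, and a column name is appended in the same way. The sketch below reproduces that pattern. `build_fqn` is a hypothetical helper, and it assumes that no name contains a dot; it does not attempt to reproduce how OpenMetadata handles such names.

```python
def build_fqn(*parts: str) -> str:
    """Join name parts with dots, assuming no part itself contains a dot."""
    if any("." in part for part in parts):
        raise ValueError("this sketch does not handle names that contain dots")
    return ".".join(parts)


table_fqn = build_fqn("test-db-service", "test-db", "test-schema", "test_user")
column_fqn = build_fqn("test-db-service", "test-db", "test-schema", "test_user", "id")

print(table_fqn)   # test-db-service.test-db.test-schema.test_user
print(column_fqn)  # test-db-service.test-db.test-schema.test_user.id
```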

## 3. Create a model

Now we have everything we need to create a model.

In [16]:
model_create=CreateMlModelRequest(
            name="test-model",
            algorithm="random_forest",
            mlFeatures=[
                MlFeature(
                    name="age",
                    dataType=FeatureType.numerical,
                    featureSources=[
                        FeatureSource(
                            name="age",
                            dataType=FeatureSourceDataType.integer,
                            dataSource=metadata.get_entity_reference(
                                entity=Table, fqn=table_a_entity.fullyQualifiedName
                            ),
                        )
                    ],
                    featureAlgorithm="Bucketing",
                    description="feature to show the age",
                ),
                MlFeature(
                    name="persona",
                    dataType=FeatureType.categorical,
                    featureSources=[
                        FeatureSource(
                            name="age",
                            dataType=FeatureSourceDataType.integer,
                            dataSource=metadata.get_entity_reference(
                                entity=Table, fqn=table_a_entity.fullyQualifiedName
                            ),
                        ),
                        FeatureSource(
                            name="id",
                            dataType=FeatureSourceDataType.integer,
                            dataSource=metadata.get_entity_reference(
                                entity=Table, fqn=table_a_entity.fullyQualifiedName
                            ),
                        ),
                        FeatureSource(
                            name="city", dataType=FeatureSourceDataType.string
                        ),
                    ],
                    featureAlgorithm="PCA",
                    description="feature to identify the person",
                ),
            ],
            mlHyperParameters=[
                MlHyperParameter(name="regularisation", value="0.5"),
                MlHyperParameter(name="random", value="hello"),
            ],
            target="myTarget",
            service=ml_service_entity.fullyQualifiedName,
        )

model_entity = metadata.create_or_update(model_create)

In [14]:
print(model_entity)

id=Uuid(__root__=UUID('c6f67a98-122b-49f4-bc67-2816a8921d91')) name=EntityName(__root__='test-model') fullyQualifiedName=FullyQualifiedEntityName(__root__='test-model-service.test-model') displayName=None description=None algorithm='random_forest' mlFeatures=[MlFeature(name=EntityName(__root__='age'), dataType=<FeatureType.numerical: 'numerical'>, description=None, fullyQualifiedName=FullyQualifiedEntityName(__root__='test-model-service.test-model.age'), featureSources=[FeatureSource(name=EntityName(__root__='age'), dataType=<FeatureSourceDataType.integer: 'integer'>, description=None, fullyQualifiedName=FullyQualifiedEntityName(__root__='test-db-service.test-db.test-schema.test_user.age'), dataSource=EntityReference(id=Uuid(__root__=UUID('246a3a35-3565-41b4-b5e2-cea0ba1270ee')), type='table', name=None, fullyQualifiedName='test-db-service.test-db.test-schema.test_user', description=None, displayName=None, deleted=None, inherited=None, href=Href(__root__=AnyUrl('http://datacatalog.casd

## 4. Clean up

In [17]:
# remove the database service
service_id = metadata.get_by_name(
    entity=DatabaseService, fqn="test-db-service"
).id

metadata.delete(
    entity=DatabaseService,
    entity_id=service_id,
    recursive=True,
    hard_delete=True,
)

In [18]:
# remove the ml model service
service_id = metadata.get_by_name(
    entity=MlModelService, fqn="test-model-service"
).id

metadata.delete(
    entity=MlModelService,
    entity_id=service_id,
    recursive=True,
    hard_delete=True,
)
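The two clean-up cells follow the same pattern, so they could be folded into one helper. The sketch below is one possible shape. `delete_service_by_name` is a hypothetical function, it uses only the `get_by_name` and `delete` calls shown above, and it assumes that `get_by_name` returns `None` when no entity matches the name.

```python
def delete_service_by_name(metadata, entity, fqn: str) -> bool:
    """Hard-delete a service and everything under it.

    Returns False when no service with that name is found
    (assumption: get_by_name returns None in that case).
    """
    service = metadata.get_by_name(entity=entity, fqn=fqn)
    if service is None:
        return False
    metadata.delete(
        entity=entity,
        entity_id=service.id,
        recursive=True,    # also remove child entities
        hard_delete=True,  # remove permanently instead of soft-deleting
    )
    return True
```

With the objects from this notebook, the calls would be `delete_service_by_name(metadata, DatabaseService, "test-db-service")` and `delete_service_by_name(metadata, MlModelService, "test-model-service")`.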