# Pushing Metadata to DataHub through the API

This notebook shows how to push metadata to DataHub through the API, using the Python REST emitter class (`DatahubRestEmitter`).

The functions shown here are used in `populate_descriptions.py`, in the `modules` folder.

An important thing to note before pushing table and column descriptions: once you push these descriptions to DataHub, they replace any table or column descriptions currently in place.

This matters most for column descriptions. Even if you push a description for only one column, the emitter replaces all other column descriptions with blanks.
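
One way to avoid blanking out other columns is to merge the descriptions you want to keep with the ones you want to change, and push the full dictionary. The sketch below is only an illustration: `merge_column_descriptions` and the sample dictionaries are hypothetical names, and how you obtain the existing descriptions (for example, from an export you already have) is left open.

```python
def merge_column_descriptions(existing: dict[str, str], updates: dict[str, str]) -> dict[str, str]:
    """Return a full column -> description mapping: existing values, overridden by updates."""
    merged = dict(existing)   # copy, so the input dictionary is not modified
    merged.update(updates)    # new descriptions win over existing ones
    return merged


# Hypothetical descriptions that are already in place for the table
existing_descriptions = {
    "Id": "Primary key",
    "FK_test": "Foreign key to the test table",
}

# The single column we actually want to change
new_descriptions = {"Id": "Surrogate primary key"}

# This full dictionary is what you would pass on as column_dict
full_column_dict = merge_column_descriptions(existing_descriptions, new_descriptions)
print(full_column_dict)
```

Pushing `full_column_dict` instead of `new_descriptions` keeps the description for `FK_test` in place.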

In [None]:
# Import modules
from datahub.emitter.rest_emitter import DatahubRestEmitter
from datahub.metadata.schema_classes import (
    DatasetSnapshotClass,
    MetadataChangeEventClass,
    EditableSchemaMetadataClass,
    EditableDatasetPropertiesClass,
)
import requests
import pandas as pd
import logging

# Configure logging
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')

In [None]:
# Set the connection settings we need
gms_server = 'http://localhost:8080' # The GMS server; in this case we have to use port 8080 for populating data
token = 'TOKEN' # Get your token from Settings, or ask an admin to create one for you if you are using the on-premises deployment
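
If you would rather not keep the token in the notebook itself, the same two settings can be read from environment variables. This is a minimal sketch under assumptions: the variable names `DATAHUB_GMS_SERVER` and `DATAHUB_TOKEN` are made up for illustration, so use whatever names your setup defines.

```python
import os

# Hypothetical environment variable names (assumptions, not DataHub requirements)
gms_server = os.environ.get("DATAHUB_GMS_SERVER", "http://localhost:8080")
token = os.environ.get("DATAHUB_TOKEN", "")

# Warn early instead of failing later when emitting
if not token:
    print("No token found in DATAHUB_TOKEN; set it before emitting metadata.")
```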

In [None]:
# We define our function for emitting table descriptions.

def table_desc_emitter(gms_server: str, token: str, urn: str, table_description: str) -> None:
    """
    Function to emit metadata to the PROD environment.

    Input:
    - gms_server        : the host for DataHub
    - token             : authentication token
    - urn               : the PRODUCTION urn we will be pushing to
    - table_description : description of the table

    Output:
    - None, emits metadata

    """
    emitter = None
    try:
        logging.info("Emitting table description for URN: %s", urn)
        emitter = DatahubRestEmitter(gms_server=gms_server, token=token)

        editable_dataset_properties_aspect = EditableDatasetPropertiesClass(
            description=table_description
        )

        dataset_snapshot = DatasetSnapshotClass(
            urn=urn,
            aspects=[
                editable_dataset_properties_aspect,
            ]
        )

        mce = MetadataChangeEventClass(proposedSnapshot=dataset_snapshot)

        emitter.emit(mce)
        logging.info("Successfully emitted table description metadata")

    except Exception as e:
        logging.error("Error emitting table description for URN %s: %s", urn, e)

    finally:
        # Close the emitter even if emitting failed
        if emitter is not None:
            emitter.close()

In [None]:
# Set the urn for the entity you will be adding metadata to
# (you can find it in the url in DataHub, it should be in the following format)

urn = 'urn:li:dataset:(urn:li:dataPlatform:mssql,server.db.schema.table,DEV)'

# Let's also set some table description
table_description = "Placeholder description for testing <3"

# Call the function
table_desc_emitter(gms_server=gms_server,
                   token=token,
                   urn=urn,
                   table_description=table_description)

We have successfully pushed a table description to DataHub :)
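
The URN used above follows a fixed pattern: platform, dataset name, and environment. As a small sketch, the string can be assembled from its parts. The helper `build_dataset_urn` is a hypothetical name written for this notebook. The DataHub SDK offers a comparable helper, `make_dataset_urn` in `datahub.emitter.mce_builder`, which is probably the better choice in real code.

```python
def build_dataset_urn(platform: str, name: str, env: str = "PROD") -> str:
    """Assemble a dataset URN in the format shown in this notebook."""
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{name},{env})"


# Rebuild the URN used in the cell above from its three parts
example_urn = build_dataset_urn("mssql", "server.db.schema.table", "DEV")
print(example_urn)
```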

Now let's see how it works with column descriptions.

In [None]:
# We define our function for emitting column descriptions

def column_desc_emitter(gms_server: str, token: str, urn: str, column_dict: dict[str, str]) -> None:

    """
    Function to emit metadata to the PROD environment.

    Input:
    - gms_server  : the host for DataHub
    - token       : authentication token
    - urn         : the PRODUCTION urn we will be pushing to
    - column_dict : a dictionary of column names and their associated descriptions

    Output:
    - None, emits metadata

    """
    emitter = None
    try:
        logging.info("Emitting column descriptions for URN: %s", urn)
        emitter = DatahubRestEmitter(gms_server=gms_server, token=token)

        # Build one entry per column; columns not listed here end up with blank descriptions
        data_to_push = []
        for column_name, column_description in column_dict.items():
            column_data = {
                "fieldPath": column_name,
                "description": column_description
            }
            data_to_push.append(column_data)

        editable_schema_metadata_aspect = EditableSchemaMetadataClass(
            editableSchemaFieldInfo=data_to_push
        )

        dataset_snapshot = DatasetSnapshotClass(
            urn=urn,
            aspects=[
                editable_schema_metadata_aspect,
            ]
        )

        mce = MetadataChangeEventClass(proposedSnapshot=dataset_snapshot)

        emitter.emit(mce)
        logging.info("Successfully emitted column description metadata.")

    except Exception as e:
        logging.error("Error emitting column descriptions for URN %s: %s", urn, e)

    finally:
        # Close the emitter even if emitting failed
        if emitter is not None:
            emitter.close()

In [None]:
# Let's use the same URN as before
urn = 'urn:li:dataset:(urn:li:dataPlatform:mssql,server.db.schema.table,DEV)'

# Now let's set some column descriptions for the columns in this table
column_descriptions = {'Id': 'some description for Id',
                       'FK_test': 'some description for FK_test'}

# And now we call the function:
column_desc_emitter(gms_server=gms_server,
                    token=token,
                    urn=urn,
                    column_dict=column_descriptions)