# Querying DataHub with the REST API

This notebook shows some of the ways to query DataHub through its REST API.

In [None]:
# Set the connection details we need
base_url = 'http://localhost:9002' # Location of DataHub
token = 'TOKEN' # Generate a token under Settings, or ask an admin to create one for you
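
A token hard-coded in a notebook is easy to leak when the notebook is shared. As a minimal sketch, the same two values could instead be read from environment variables. The names `DATAHUB_BASE_URL` and `DATAHUB_TOKEN` are assumptions made for this example, not names that DataHub itself defines.

```python
import os

# Hypothetical variable names; use whatever your own setup defines.
# The fallbacks mirror the placeholder values used in this notebook.
base_url = os.environ.get("DATAHUB_BASE_URL", "http://localhost:9002")
token = os.environ.get("DATAHUB_TOKEN", "TOKEN")

print(f"Querying DataHub at {base_url}")
```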

In [None]:
# GET ENTITY

import requests
from urllib.parse import quote

# Set the URN of the entity you want to look up
# (you can find it in the entity's page URL in DataHub; it should be in the following format)
urn = 'urn:li:dataset:(urn:li:dataPlatform:mssql,ekofisk.RECubeDataRelease.dbo.vDataFeed_UCube_Asset,DEV)'

# Endpoint for GET request
endpoint = f'/openapi/entities/v1/latest?urns={quote(urn)}'

url = f"{base_url}{endpoint}"

headers = {
    "Authorization": f"Bearer {token}",
    "Content-Type": "application/json"
}

response = requests.get(
    url=url,
    headers=headers
)
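
The URN contains colons, commas and parentheses, so it has to be percent-encoded before it goes into the query string. The sketch below wraps the URL construction from this cell in a small helper. The name `entity_url` and the sample URN are made up for illustration, and `safe=''` is a choice made here so that every reserved character is encoded.

```python
from urllib.parse import quote, unquote


def entity_url(base_url: str, urn: str) -> str:
    """Build the 'latest entity' URL for one URN (hypothetical helper)."""
    # safe='' percent-encodes every reserved character, including '/'
    encoded_urn = quote(urn, safe="")
    return f"{base_url.rstrip('/')}/openapi/entities/v1/latest?urns={encoded_urn}"


# Made-up URN in the same format as the one used in this notebook
example_urn = "urn:li:dataset:(urn:li:dataPlatform:mssql,my_db.dbo.my_table,DEV)"
example_url = entity_url("http://localhost:9002", example_urn)
print(example_url)

# Decoding the query value gives back the original URN
encoded_part = example_url.split("?urns=", 1)[1]
print(unquote(encoded_part) == example_urn)
```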


In [None]:
response

In [None]:
response.json()

In [None]:
# Let's extract the column descriptions from this response
response_dict = response.json()

column_desc = response_dict['responses'][urn]['aspects']['editableSchemaMetadata']['value']['editableSchemaFieldInfo']

In [None]:
column_desc

As we can see, column_desc is now a list of dictionaries, with 'fieldPath' holding the name of each column and 'description' holding its description.

The same response also contains a wide range of other metadata. One part that may be useful is the column types.
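
A list of dictionaries is awkward to search by column name. As a rough sketch, it can be turned into a single dictionary keyed by 'fieldPath'. The sample list below is made up for illustration and only mimics the shape described above. The third entry assumes that a column may come back without a 'description' key.

```python
# Hypothetical sample shaped like the column_desc list described above
sample_column_desc = [
    {"fieldPath": "asset_id", "description": "Unique identifier of the asset"},
    {"fieldPath": "asset_name", "description": "Display name of the asset"},
    {"fieldPath": "country"},  # assumed case: no description present
]

# Map each column name to its description, using "" when none is present
descriptions_by_column = {
    entry["fieldPath"]: entry.get("description", "")
    for entry in sample_column_desc
}

for column, description in descriptions_by_column.items():
    print(f"{column}: {description or '(no description)'}")
```

With the real data, `sample_column_desc` would simply be replaced by `column_desc`.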

In [None]:
# Let's now get the rest of the column metadata, which is captured at ingestion time

column_types = response_dict['responses'][urn]['aspects']['schemaMetadata']['value']['fields']

In [None]:
column_types

Here we get another list of dictionaries, containing the column type information.
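
To pull out only the type of each column, the list can be reduced to a name-to-type mapping. This is a sketch under assumptions: it presumes each field dictionary has a 'fieldPath' key and a 'nativeDataType' key, so check the output of the previous cell for the keys your instance actually returns. The sample values are made up.

```python
# Hypothetical sample shaped like the column_types list; keys and values are assumed
sample_column_types = [
    {"fieldPath": "asset_id", "nativeDataType": "INTEGER()", "nullable": False},
    {"fieldPath": "asset_name", "nativeDataType": "NVARCHAR(length=255)", "nullable": True},
]

# Map each column name to its source-system type (None if the key is absent)
types_by_column = {
    field["fieldPath"]: field.get("nativeDataType")
    for field in sample_column_types
}

for column, native_type in types_by_column.items():
    print(f"{column}: {native_type}")
```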

Now let's get the table description.

In [None]:
# Let's now get the table description

table_description = response_dict['responses'][urn]['aspects']['editableDatasetProperties']['value']['description']

In [None]:
table_description
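
The chained lookups used throughout this notebook raise a `KeyError` as soon as one key is missing, for example if an entity has no 'editableDatasetProperties' aspect (assumed here to be possible when nobody has edited the description). A minimal sketch of a more forgiving lookup follows. The helper name `get_nested` and the sample response are hypothetical.

```python
from typing import Any


def get_nested(data: Any, *keys: str, default: Any = None) -> Any:
    """Walk nested dictionaries key by key; return default if any key is missing."""
    current = data
    for key in keys:
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


# Made-up response that has schemaMetadata but no editableDatasetProperties
sample_urn = "urn:li:dataset:(urn:li:dataPlatform:mssql,my_db.dbo.my_table,DEV)"
sample_response = {
    "responses": {
        sample_urn: {"aspects": {"schemaMetadata": {"value": {"fields": []}}}}
    }
}

sample_description = get_nested(
    sample_response, "responses", sample_urn, "aspects",
    "editableDatasetProperties", "value", "description",
    default="",
)
print(repr(sample_description))
```

The same helper works for the column lookups earlier in the notebook, with `default=[]` so that later loops still run on an empty list.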