# Notebook to match the mismatched columns

Find details on mismatched columns on Confluence:
https://rystadenergy.atlassian.net/wiki/spaces/LP/pages/1575747640/Missing+Column+Descriptions

Note: to use this notebook, you may need to move it into the datahub_descriptions directory.

Let's start by getting the column descriptions.

In [13]:
from modules.scrape_tables import TableScraperDC
from modules.populate_descriptions import CustomDescriptionsDC
import pandas as pd
import logging
import configparser

logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')

config = configparser.ConfigParser()
config.read('src/config.conf')

['src/config.conf']
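
For reference, the cells in this notebook read two sections from `src/config.conf`: `FILES` (`pdf_path`, `xlsx_path`) and `DATAHUB` (`base_url`, `gms_server`, `token`). Below is a minimal sketch of a check that those keys are present before running anything else. The helper name `missing_config_keys` and all placeholder values are assumptions for illustration and are not part of the project.

```python
import configparser

# Sections and keys that the cells in this notebook read from the config file.
REQUIRED_KEYS = {
    "FILES": ["pdf_path", "xlsx_path"],
    "DATAHUB": ["base_url", "gms_server", "token"],
}

def missing_config_keys(cfg):
    """Return 'SECTION.key' strings for every required key absent from cfg."""
    missing = []
    for section, keys in REQUIRED_KEYS.items():
        for key in keys:
            if not cfg.has_option(section, key):
                missing.append(f"{section}.{key}")
    return missing

# Placeholder config (hypothetical values) with the token deliberately left out.
example = configparser.ConfigParser()
example.read_string("""
[FILES]
pdf_path = /path/to/structure.pdf
xlsx_path = /path/to/structure.xlsx

[DATAHUB]
base_url = http://localhost:9002
gms_server = http://localhost:8080
""")

print(missing_config_keys(example))
```

With the real file, `missing_config_keys(config)` should return an empty list; anything else points at the entry to add.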

In [14]:
def get_column_descriptions():
    # Set pdf path location to the desired pdf file, in this case the UCube structure
    pdf_path = config['FILES']['pdf_path']
    xlsx_path = config['FILES']['xlsx_path']
    scraper = TableScraperDC(pdf_path=pdf_path, xlsx_path=xlsx_path)
    
    formatted_tables = scraper.extract_tables_from_xlsx(xlsx_path=xlsx_path)
    logging.info("Number of tables extracted: %s.", len(formatted_tables.keys()))

    if formatted_tables:
        logging.info("Formatted tables are ready for use.")
        return formatted_tables
    else:
        logging.info("No tables were extracted or formatted.")
        return {}

In [15]:
column_desc_tables = get_column_descriptions()

2024-02-15 15:04:54,949 - INFO - TableScraperDC initialized
2024-02-15 15:04:54,951 - INFO - Starting table extraction from XLSX: /home/rin/code/llm_proj/datahub_dev/Skuld-LLM/datahub_descriptions/src/UCube structure.xlsx
2024-02-15 15:04:55,014 - INFO - Starting to capture tables...
2024-02-15 15:04:55,015 - INFO - Processing new table: Asset
2024-02-15 15:04:55,016 - INFO - Starting to capture tables...
2024-02-15 15:04:55,017 - INFO - Finished processing table: Asset
2024-02-15 15:04:55,018 - INFO - Processing new table: AssetCompany
2024-02-15 15:04:55,019 - INFO - Finished processing table: AssetCompany
2024-02-15 15:04:55,019 - INFO - Processing new table: AssetLifeCycle
2024-02-15 15:04:55,021 - INFO - Finished processing table: AssetLifeCycle
2024-02-15 15:04:55,022 - INFO - Processing new table: BreakevenPrices
2024-02-15 15:04:55,023 - INFO - Finished processing table: BreakevenPrices
2024-02-15 15:04:55,023 - INFO - Processing new table: Company
2024-02-15 15:04:55,025 - INF

## Case 1 - mismatched capitalizations

This case uses the table vDataFeed_UCube_AssetCompany. One of its columns is called FK_owner in MSSQL, but the description tables list it as FK_Owner.

After the initial ingestion, the description is already stored under FK_Owner. To move it to the column name used in MSSQL (which is also what the UI shows), we fetch the full GET response and compare the two sets of field names.

In [20]:
# Set the environment variables we need
base_url = config['DATAHUB']['base_url'] # Location of DataHub
gms_server = config['DATAHUB']['gms_server']
token = config['DATAHUB']['token']
env = 'DEV'

urn = 'urn:li:dataset:(urn:li:dataPlatform:mssql,ekofisk.RECubeDataRelease.dbo.vDataFeed_UCube_AssetCompany,DEV)'

In [17]:
# GET ENTITY

import requests
from urllib.parse import quote

# Endpoint for GET request
endpoint = f'/openapi/entities/v1/latest?urns={quote(urn)}'

url = f"{gms_server}{endpoint}"

headers = {
    "Authorization": f"Bearer {token}",
    "Content-Type": "application/json"
}

response = requests.get(
    url=url,
    headers=headers
)


In [18]:
response_dict = response.json()

# Column descriptions
column_desc = response_dict['responses'][urn]['aspects']['editableSchemaMetadata']['value']['editableSchemaFieldInfo']

# Column types
column_types = response_dict['responses'][urn]['aspects']['schemaMetadata']['value']['fields']
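
The chained lookups above raise a `KeyError` as soon as one level is missing, for example if the response carries no `editableSchemaMetadata` aspect for the URN. A minimal sketch of a more forgiving extraction follows; the function name `extract_field_lists` and the sample response are hypothetical, and only the key names are taken from the cell above.

```python
def extract_field_lists(response_dict, urn):
    """Return (column_desc, column_types) for urn.

    Any aspect missing from the response is replaced by an empty list
    instead of raising a KeyError.
    """
    aspects = response_dict.get("responses", {}).get(urn, {}).get("aspects", {})
    column_desc = (
        aspects.get("editableSchemaMetadata", {})
        .get("value", {})
        .get("editableSchemaFieldInfo", [])
    )
    column_types = (
        aspects.get("schemaMetadata", {}).get("value", {}).get("fields", [])
    )
    return column_desc, column_types

# Hand-built response without an editableSchemaMetadata aspect (illustrative data only).
example_urn = "urn:li:dataset:example"
example_response = {
    "responses": {
        example_urn: {
            "aspects": {
                "schemaMetadata": {"value": {"fields": [{"fieldPath": "Id"}]}}
            }
        }
    }
}

desc, types = extract_field_lists(example_response, example_urn)
print(desc)
print(types)
```

Returning empty lists keeps the later comparison cells runnable; whether a silent empty result or a loud error is preferable depends on how the notebook is used.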


In [19]:
column_desc

[{'fieldPath': 'Id',
  'description': 'Links to FK_AssetCompany in other tables'},
 {'fieldPath': 'FK_Owner',
  'description': 'Links to FK_Company  in other tables'},
 {'fieldPath': 'FK_Asset', 'description': 'Links to FK_Asset in other tables'},
 {'fieldPath': 'Asset Share',
  'description': 'Asset Share splits all asset data onto the individual shares of the owners according to their working interest, so that these asset shares can be treated independent of the Asset and Company dimensions. The asset-share memberss are named "Asset"-"Company"._x000D__x000D_'},
 {'fieldPath': 'NOC in homecountry',
  'description': 'The NOC in Homecountry variable is "NOC at home" for a NOC in its home country where it has priviligies, otherwise "Other companies".'},
 {'fieldPath': 'Operator Non-operator',
  'description': 'Operator - Non-operator splits the production (or Economics) into the share of the operators and the share of the non-operating owners. Use Operator-Non-operator e.g. to see for a 

In [21]:
# Extract fieldPath values and lowercase them for comparison
desc_field_paths = {d['fieldPath'].lower(): d['fieldPath'] for d in column_desc}
type_field_paths = {t['fieldPath'].lower(): t['fieldPath'] for t in column_types}

# Find common fieldPath values (case-insensitive) using sets
common_field_paths_lower = set(desc_field_paths.keys()) & set(type_field_paths.keys())

# Separate exact matches from capitalization mismatches (case-sensitive comparison)
exact_matches = []
mismatches = []

for common_lower in common_field_paths_lower:
    desc_original = desc_field_paths[common_lower]
    type_original = type_field_paths[common_lower]
    if desc_original == type_original:
        # Means that capitalization is the same and we are good to move on
        exact_matches.append(desc_original)
    else:
        # There is a mismatch between the capitalizations, so we will have to append the description column
        mismatches.append((desc_original, type_original))

exact_matches, mismatches

(['Asset Share',
  'NOC in homecountry',
  'Operator Non-operator',
  'Id',
  'FK_Asset'],
 [('FK_Owner', 'FK_owner')])

In [22]:
mismatches

[('FK_Owner', 'FK_owner')]

In [23]:
# PUSH DESCRIPTIONS TO DATAHUB
# Note that this is for only one column and needs to be reworked for multiple

desc_original = mismatches[0][0] #the capitalization from the xlsx, which we need to use to get the description
type_original = mismatches[0][1] #the correct capitalization as ingested into DataHub

# desc_original = 'FK_Owner'
# type_original = 'FK_owner'

# Call the metadata emitter
metadata_emitter = CustomDescriptionsDC(tables_dict={}, env=env, gms_server=gms_server, token=token)

# Set up the dictionary to be pushed - remember that this needs to be complete! We can't just push the individual column, we have to push the rest too.
# So we will grab them from column_desc and then modify
column_description_dict = {item['fieldPath']: item['description'] for item in column_desc}

# Replace the key 'FK_Owner' with 'FK_owner' in the transformed dictionary
if desc_original in column_description_dict:
    column_description_dict[type_original] = column_description_dict.pop(desc_original)

# Emit the metadata
metadata_emitter.column_desc_emitter(gms_server=gms_server, token=token, urn=urn, column_dict=column_description_dict)

2024-02-15 15:06:04,202 - INFO - CustomDescriptionsDC initialized for environment: DEV
2024-02-15 15:06:04,205 - INFO - Emitting column descriptions for URN: urn:li:dataset:(urn:li:dataPlatform:mssql,ekofisk.RECubeDataRelease.dbo.vDataFeed_UCube_AssetCompany,DEV)
2024-02-15 15:06:04,235 - INFO - Successfully emitted column description metadata.
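
The cell above handles only the first entry of `mismatches`. One way to rework it for several columns is to apply every `(desc_original, type_original)` pair to a copy of the description dictionary before emitting. This is a sketch only; `apply_renames` is a hypothetical helper and the sample descriptions are made up.

```python
def apply_renames(column_description_dict, mismatches):
    """Return a copy of the dictionary with each mismatched key renamed.

    mismatches is a list of (name_in_xlsx, name_in_datahub) tuples, in the
    same shape as the list built in the comparison cell.
    """
    renamed = dict(column_description_dict)  # leave the input untouched
    for desc_name, type_name in mismatches:
        if desc_name in renamed:
            renamed[type_name] = renamed.pop(desc_name)
    return renamed

# Illustrative data only.
example_descriptions = {
    "Id": "primary key",
    "FK_Owner": "owner link",
    "FK_Asset": "asset link",
}
example_mismatches = [("FK_Owner", "FK_owner")]

result = apply_renames(example_descriptions, example_mismatches)
print(sorted(result))
```

The returned dictionary can then be passed as `column_dict` to `column_desc_emitter` in place of the single-column version.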


## Case 2 - Abbreviations vol. 1

Some column names have been abbreviated in MSSQL.

For example, DataHub shows 'NPVCurrentYearLLLC', while the XLSX table has 'NPV Current Year Low Low Low Case'. We have to match these columns.

This kind of abbreviation occurs in the NPV and Production tables.

In [24]:
# We have our descriptions for each column in the tables. Let's take Production as an example.
column_desc_tables['Production']

Unnamed: 0,column_name,description
0,Id,all tables have Id as Primary Key
1,Fk_Asset,Links to FK_Asset from other tables
2,FK_OtherParameter,Links to FK_OtherParameter from other tables
3,FK_Company,Links to FK_Company from other tables
4,FK_OilAndGas,Links to FK_OilAndGas from other tables
5,FK_HistoricCompany,Links to FK_HistoricCompany from other tables
6,FK_DataQuality,Links to FK_DataQuality from other tables
7,FK_LifeCycleTS,Links to FK_LifeCycleTS from other tables
8,FK_LifeCycle,Links to FK_LifeCycle from other tables
9,FK_OilAndGasType,Links to FK_EOilAndGasTypee from other tables


In [33]:
# Set the environment variables we need

# LOCAL
base_url = config['DATAHUB']['base_url'] # Location of DataHub
gms_server = config['DATAHUB']['gms_server']
token = config['DATAHUB']['token']
env = 'DEV'

urn = 'urn:li:dataset:(urn:li:dataPlatform:mssql,ekofisk.RECubeDataRelease.dbo.vDataFeed_UCube_Production,DEV)'

In [34]:
# Start by getting the entities as before

import requests
from urllib.parse import quote

# Endpoint for GET request
endpoint = f'/openapi/entities/v1/latest?urns={quote(urn)}'

url = f"{gms_server}{endpoint}"

headers = {
    "Authorization": f"Bearer {token}",
    "Content-Type": "application/json"
}

response = requests.get(
    url=url,
    headers=headers
)

response_dict = response.json()

# Column descriptions
column_desc = response_dict['responses'][urn]['aspects']['editableSchemaMetadata']['value']['editableSchemaFieldInfo']
# Column types
column_types = response_dict['responses'][urn]['aspects']['schemaMetadata']['value']['fields']


In [35]:
column_types

[{'fieldPath': 'Id',
  'nullable': False,
  'type': {'type': {'__type': 'NumberType'}},
  'nativeDataType': 'BIGINT()',
  'recursive': False,
  'isPartOfKey': False},
 {'fieldPath': 'FK_Asset',
  'nullable': True,
  'type': {'type': {'__type': 'NumberType'}},
  'nativeDataType': 'BIGINT()',
  'recursive': False,
  'isPartOfKey': False},
 {'fieldPath': 'Year',
  'nullable': False,
  'type': {'type': {'__type': 'NumberType'}},
  'nativeDataType': 'SMALLINT()',
  'recursive': False,
  'isPartOfKey': False},
 {'fieldPath': 'FK_Company',
  'nullable': True,
  'type': {'type': {'__type': 'NumberType'}},
  'nativeDataType': 'BIGINT()',
  'recursive': False,
  'isPartOfKey': False},
 {'fieldPath': 'FK_OilAndGas',
  'nullable': True,
  'type': {'type': {'__type': 'NumberType'}},
  'nativeDataType': 'BIGINT()',
  'recursive': False,
  'isPartOfKey': False},
 {'fieldPath': 'FK_DataQuality',
  'nullable': True,
  'type': {'type': {'__type': 'NumberType'}},
  'nativeDataType': 'BIGINT()',
  'recurs

In [36]:
# Function to correct the abbreviations

def correct_field_name(field_name):
    parts = field_name.split()  # Split the field name into parts
    corrected_parts = []
    for part in parts:
        if part in ('Low', 'Mid', 'Case'):
            corrected_parts.append(part[0])  # Keep only the first letter of Low, Mid and Case
        else:
            corrected_parts.append(part)  # Keep the other parts
    corrected_name = ''.join(corrected_parts)  # Join the parts without spaces
    return corrected_name

In [37]:
# Get existing column names from DataHub
existing_column_names = {item['fieldPath'] for item in column_types}

In [38]:
# Iterate over column_desc and correct the `fieldPath` values if they do not exist in column_types
for field in column_desc:
    if field['fieldPath'] not in existing_column_names:
        corrected_field_path = correct_field_name(field['fieldPath'])
        # Update fieldPath in column_desc with the corrected name
        field['fieldPath'] = corrected_field_path
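
The loop above rewrites `fieldPath` without checking that the corrected name actually exists in DataHub. A dry run that separates resolvable names from unresolved ones can be sketched as below. `plan_renames` and `abbreviate` are hypothetical names, `abbreviate` is a compact restatement of the rule in `correct_field_name`, and the sample data is made up.

```python
def abbreviate(field_name):
    """Shorten Low/Mid/Case to their initials and join the parts without spaces."""
    return "".join(
        part[0] if part in ("Low", "Mid", "Case") else part
        for part in field_name.split()
    )

def plan_renames(column_desc, existing_column_names, correct_fn):
    """Return (renames, unresolved) without modifying column_desc.

    renames maps an original fieldPath to a corrected name that exists in
    DataHub; unresolved lists fieldPaths whose corrected name does not.
    """
    renames, unresolved = {}, []
    for field in column_desc:
        name = field["fieldPath"]
        if name in existing_column_names:
            continue  # already matches a DataHub column
        candidate = correct_fn(name)
        if candidate in existing_column_names:
            renames[name] = candidate
        else:
            unresolved.append(name)
    return renames, unresolved

# Illustrative data only.
example_existing = {"Id", "NPVCurrentYearLLLC"}
example_desc = [
    {"fieldPath": "Id", "description": "primary key"},
    {"fieldPath": "NPV Current Year Low Low Low Case", "description": "..."},
    {"fieldPath": "Some Unknown Column", "description": "..."},
]

renames, unresolved = plan_renames(example_desc, example_existing, abbreviate)
print(renames)
print(unresolved)
```

Anything that ends up in `unresolved` would otherwise be pushed under a name that no DataHub column has, so it is worth inspecting before emitting.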

In [39]:
column_desc

[{'fieldPath': 'Id', 'description': 'all tables have Id as Primary Key'},
 {'fieldPath': 'FK_Asset',
  'description': 'Links to FK_Asset from other tables'},
 {'fieldPath': 'Year', 'description': 'Year'},
 {'fieldPath': 'FK_Company',
  'description': 'Links to FK_Company from other tables'},
 {'fieldPath': 'FK_OilAndGas',
  'description': 'Links to FK_OilAndGas from other tables'},
 {'fieldPath': 'FK_DataQuality',
  'description': 'Links to FK_DataQuality from other tables'},
 {'fieldPath': 'FK_OtherParameter',
  'description': 'Links to FK_OtherParameter from other tables'},
 {'fieldPath': 'FK_HistoricCompany',
  'description': 'Links to FK_HistoricCompany from other tables'},
 {'fieldPath': 'FK_LifeCycleTS',
  'description': 'Links to FK_LifeCycleTS from other tables'},
 {'fieldPath': 'FK_LifeCycle',
  'description': 'Links to FK_LifeCycle from other tables'},
 {'fieldPath': 'FK_OilAndGasType',
  'description': 'Links to FK_EOilAndGasTypee from other tables'},
 {'fieldPath': 'Product

In [40]:
# Call the metadata emitter
metadata_emitter = CustomDescriptionsDC(tables_dict={}, env=env, gms_server=gms_server, token=token)

# Create dictionary based on the list column_desc
column_description_dict = {item['fieldPath']: item['description'] for item in column_desc}

# Emit the metadata
metadata_emitter.column_desc_emitter(gms_server=gms_server, token=token, urn=urn, column_dict=column_description_dict)

2024-02-15 15:08:38,989 - INFO - CustomDescriptionsDC initialized for environment: DEV
2024-02-15 15:08:38,990 - INFO - Emitting column descriptions for URN: urn:li:dataset:(urn:li:dataPlatform:mssql,ekofisk.RECubeDataRelease.dbo.vDataFeed_UCube_Production,DEV)
2024-02-15 15:08:39,017 - INFO - Successfully emitted column description metadata.


## Case 3 - Abbreviations vol. 2

The Prices table uses a different abbreviation. Here we have 'OilLLLC' in MSSQL and 'Brent Oil price Low Low Low Case' in the XLSX file. So we need to drop the words Brent and price and then apply the same function as above.

Similarly, 'GasLLLC' stands for 'Henry Hub Gas price Low Low Low Case', so we need to drop Henry Hub and price.

In [41]:
column_desc_tables['Prices']

Unnamed: 0,column_name,description
0,Id,Links to FK_Prices in other tables
1,Brent oil price,The Brent oil price is applied for all assest ...
2,East Asia LNG price,East Asian LNG: this price is a combination of...
3,Europe Continental gas price,European continental: historic (prior to 2017)...
4,Henry Hub Gas price Forward Case,Gas price Forward Case is the market gas price...
5,Henry Hub Gas price High Case,Gas price High Case (120 USD/boe) is the marke...
6,Henry Hub Gas price Low Case,Gas price High Case (50 USD/boe) is the market...
7,Henry Hub Gas price Mid Case,Gas Price Mid Case (100 USD/boe) is the refere...
8,UK gas price,United Kingdom: historical gas prices correspo...
9,Henry Hub gas price,"Henry Hub gas price, historical, and future no..."


In [42]:
# Set the environment variables we need

# LOCAL
base_url = config['DATAHUB']['base_url'] # Location of DataHub
gms_server = config['DATAHUB']['gms_server']
token = config['DATAHUB']['token']
env = 'DEV'

# # DEPLOYMENT
# base_url = 'http://10.10.55.28:9002'
# gms_server = 'http://10.10.55.28:8080'
# token = "TOKEN"
# env = 'DEV'

urn = 'urn:li:dataset:(urn:li:dataPlatform:mssql,ekofisk.RECubeDataRelease.dbo.vDataFeed_UCube_Prices,DEV)'

In [43]:
# Start by getting the entities as before

import requests
from urllib.parse import quote

# Endpoint for GET request
endpoint = f'/openapi/entities/v1/latest?urns={quote(urn)}'

url = f"{gms_server}{endpoint}"

headers = {
    "Authorization": f"Bearer {token}",
    "Content-Type": "application/json"
}

response = requests.get(
    url=url,
    headers=headers
)

response_dict = response.json()

# Column descriptions
column_desc = response_dict['responses'][urn]['aspects']['editableSchemaMetadata']['value']['editableSchemaFieldInfo']
# Column types
column_types = response_dict['responses'][urn]['aspects']['schemaMetadata']['value']['fields']


In [44]:
# Get existing column names from DataHub
existing_column_names = {item['fieldPath'] for item in column_types}

In [45]:
existing_column_names

{'Brent Oil price Forward Case',
 'Brent Oil price High Case',
 'Brent Oil price Low Case',
 'Brent Oil price Low Low Case',
 'Brent Oil price Mid Case',
 'Brent Oil price Mid Low Case',
 'Brent oil price',
 'CPI',
 'East Asia LNG price',
 'Europe Continental gas price',
 'GasLLLC',
 'GasMLLC',
 'GasMMLC',
 'GasMMLLC',
 'Henry Hub Gas price Forward Case',
 'Henry Hub Gas price High Case',
 'Henry Hub Gas price Low Case',
 'Henry Hub Gas price Mid Case',
 'Henry Hub gas price',
 'Id',
 'OilLLLC',
 'OilMLLC',
 'OilMMLC',
 'OilMMLLC',
 'UK gas price',
 'WTI Cushing oil price',
 'Year'}

In [46]:
# Function to correct the abbreviations

# Define the list of allowed keywords
allowed_keywords = ['Oil', 'Gas', 'Low', 'Mid', 'Case']

def correct_field_name(field_name):
    parts = field_name.split()  # Split the field name into parts
    # Keep only the parts that are in the list of allowed keywords
    filtered_parts = [part for part in parts if part in allowed_keywords]

    corrected_parts = []
    for part in filtered_parts:
        if part in ('Low', 'Mid', 'Case'):
            corrected_parts.append(part[0])  # Keep only the first letter of Low, Mid and Case
        else:
            corrected_parts.append(part)  # Keep the other parts
    corrected_name = ''.join(corrected_parts)  # Join the parts without spaces
    return corrected_name
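
One thing to watch with the keyword filter: a name that contains none of the allowed keywords is reduced to an empty string. The variant below is a sketch that returns the original name in that situation; `correct_price_field_name` is a hypothetical name, and the rest follows the function above.

```python
ALLOWED_KEYWORDS = ("Oil", "Gas", "Low", "Mid", "Case")

def correct_price_field_name(field_name):
    """Abbreviate a Prices column name, leaving it unchanged if no keyword matches."""
    kept = [part for part in field_name.split() if part in ALLOWED_KEYWORDS]
    if not kept:
        # Nothing recognisable: return the input instead of an empty string.
        return field_name
    return "".join(
        part[0] if part in ("Low", "Mid", "Case") else part for part in kept
    )

print(correct_price_field_name("Brent Oil price Low Low Low Case"))
print(correct_price_field_name("Henry Hub Gas price Low Low Low Case"))
print(correct_price_field_name("WTI Cushing oil price"))
```

The first two calls reproduce the examples from the text ('OilLLLC' and 'GasLLLC'). The third name has a lowercase 'oil', so no keyword matches and the name comes back unchanged.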

In [47]:
# Iterate over column_desc and correct the `fieldPath` values if they do not exist in column_types
for field in column_desc:
    if field['fieldPath'] not in existing_column_names:
        corrected_field_path = correct_field_name(field['fieldPath'])
        # Update fieldPath in column_desc with the corrected name
        field['fieldPath'] = corrected_field_path

In [48]:
column_desc

[{'fieldPath': 'Id', 'description': 'Links to FK_Prices in other tables'},
 {'fieldPath': 'Brent oil price',
  'description': 'The Brent oil price is applied for all assest outside North America. At asset level price discounts are made for different oil qualities (API, sulphur, total acid), and geographies. The oil price determines the economy, and when production is cut-off.'},
 {'fieldPath': 'East Asia LNG price',
  'description': 'East Asian LNG: this price is a combination of both spot and long-term contracts. Historical prices are based on company reported average realized prices. For the future, Rystad Energy assesses East Asian spot LNG using in-house forecasts of global LNG supply and demand. We calculate long term oil-linked contract prices using our Brent crude price forecasts. '},
 {'fieldPath': 'Europe Continental gas price',
  'description': 'European continental: historic (prior to 2017) gas prices for the European continental market correspond to German border prices (av

In [49]:
# Call the metadata emitter
metadata_emitter = CustomDescriptionsDC(tables_dict={}, env=env, gms_server=gms_server, token=token)

# Create dictionary based on the list column_desc
column_description_dict = {item['fieldPath']: item['description'] for item in column_desc}

# Emit the metadata
metadata_emitter.column_desc_emitter(gms_server=gms_server, token=token, urn=urn, column_dict=column_description_dict)

2024-02-15 15:09:21,672 - INFO - CustomDescriptionsDC initialized for environment: DEV
2024-02-15 15:09:21,674 - INFO - Emitting column descriptions for URN: urn:li:dataset:(urn:li:dataPlatform:mssql,ekofisk.RECubeDataRelease.dbo.vDataFeed_UCube_Prices,DEV)
2024-02-15 15:09:21,708 - INFO - Successfully emitted column description metadata.


## Case 4 - Abbreviations vol. 3

Finally, we have the Resources table, where we see the following abbreviations:

| MSSQL | XLSX |
|-------|------|
| 1P Reserves | 1P Reserves Forward Case |
| 1P Reserves HC | 1P Reserves High Case |
| 1P Reserves LC | 1P Reserves Low Case |
| 1P Reserves MC | 1P Reserves Mid Case |

... and so on. The same repeats for 2P Reserves, Original Resources and Remaining Resources.

Let's fix it :)

In [50]:
# Set the environment variables we need

# LOCAL
base_url = config['DATAHUB']['base_url'] # Location of DataHub
gms_server = config['DATAHUB']['gms_server']
token = config['DATAHUB']['token']
env = 'DEV'

# # DEPLOYMENT
# base_url = 'http://10.10.55.28:9002'
# gms_server = 'http://10.10.55.28:8080'
# token = "TOKEN"
# env = 'DEV'

urn = 'urn:li:dataset:(urn:li:dataPlatform:mssql,ekofisk.RECubeDataRelease.dbo.vDataFeed_UCube_Resources,DEV)'

In [52]:
# Start by getting the entities as before

import requests
from urllib.parse import quote

# Endpoint for GET request
endpoint = f'/openapi/entities/v1/latest?urns={quote(urn)}'

url = f"{gms_server}{endpoint}"

headers = {
    "Authorization": f"Bearer {token}",
    "Content-Type": "application/json"
}

response = requests.get(
    url=url,
    headers=headers
)

response_dict = response.json()

# Column descriptions
column_desc = response_dict['responses'][urn]['aspects']['editableSchemaMetadata']['value']['editableSchemaFieldInfo']
# Column types
column_types = response_dict['responses'][urn]['aspects']['schemaMetadata']['value']['fields']


In [53]:
# Get existing column names from DataHub
existing_column_names = {item['fieldPath'] for item in column_types}

In [54]:
def correct_reserves_field_name_v2(field_name):
    # Mapping for the specific keywords to their abbreviations
    keyword_mappings = {
        'Forward': 'F',
        'High': 'H',
        'Low': 'L',
        'Mid': 'M',
        'Case': 'C'  # Change 'Case' to 'C' instead of removing it
    }
    
    # Prefixes to check for in field names
    prefixes = ['1P Reserves', '2P Reserves', 'Original Resources', 'Remaining Resources']
    
    # Find which prefix the field name starts with, if any
    prefix = next((p for p in prefixes if field_name.startswith(p)), None)
    
    if prefix:
        # Remove the prefix from the field name to work with the rest
        suffix = field_name[len(prefix):].strip()
        
        # Split the rest of the field name into parts and apply the mapping
        parts = suffix.split()
        corrected_parts = [keyword_mappings[part] for part in parts if part in keyword_mappings]
        
        # Rejoin the corrected parts with the prefix, adding a space before the abbreviations if there are any
        corrected_name = prefix + (' ' + ''.join(corrected_parts) if corrected_parts else '')
    else:
        # If the field name doesn't start with any known prefix, return it unchanged
        corrected_name = field_name
    
    return corrected_name
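
Read literally, the table at the start of this case maps the Forward Case to the bare prefix ('1P Reserves Forward Case' becomes '1P Reserves'), whereas mapping every keyword to its initial turns it into '1P Reserves FC'. The sketch below follows the table, under the assumption that the table reflects the MSSQL names exactly. `abbreviate_reserves` is a hypothetical name, and the sketch is not a replacement for the function above.

```python
PREFIXES = ("1P Reserves", "2P Reserves", "Original Resources", "Remaining Resources")

# Suffix mapping as listed in the table; Forward Case maps to no suffix (assumption).
SUFFIXES = {
    "Forward Case": "",
    "High Case": "HC",
    "Low Case": "LC",
    "Mid Case": "MC",
}

def abbreviate_reserves(field_name):
    """Map an XLSX Resources column name to the MSSQL form shown in the table."""
    for prefix in PREFIXES:
        if field_name.startswith(prefix):
            suffix = field_name[len(prefix):].strip()
            if suffix in SUFFIXES:
                # strip() removes the trailing space when the suffix is empty
                return f"{prefix} {SUFFIXES[suffix]}".strip()
    # Unknown prefix or suffix: leave the name unchanged
    return field_name

for name in ("1P Reserves Forward Case", "1P Reserves High Case",
             "2P Reserves Low Case", "Remaining Resources Mid Case"):
    print(name, "->", abbreviate_reserves(name))
```

Which of the two forms is correct can be checked against `existing_column_names` before anything is emitted.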

In [55]:
# Iterate over column_desc and correct the `fieldPath` values if they do not exist in column_types
for field in column_desc:
    if field['fieldPath'] not in existing_column_names:
        corrected_field_path = correct_reserves_field_name_v2(field['fieldPath'])
        # Update fieldPath in column_desc with the corrected name
        field['fieldPath'] = corrected_field_path

In [56]:
column_desc

[{'fieldPath': 'Id', 'description': 'all tables have Id as Primary Key'},
 {'fieldPath': 'FK_OilAndGas',
  'description': 'Links to FK_OilAndGas from other tables'},
 {'fieldPath': 'FK_Asset',
  'description': 'Links to FK_Asset from other tables'},
 {'fieldPath': 'FK_OtherParameter',
  'description': 'Links to FK_OtherParameter from other tables'},
 {'fieldPath': 'FK_LifeCycleTS',
  'description': 'Links to FK_LifeCycleTS from other tables'},
 {'fieldPath': 'FK_LifeCycle',
  'description': 'Links to FK_LifeCycle from other tables'},
 {'fieldPath': 'FK_Company',
  'description': 'Links to FK_Company from other tables'},
 {'fieldPath': 'FK_HistoricCompany',
  'description': 'Links to FK_HistoricCompany from other tables'},
 {'fieldPath': 'FK_Reserves',
  'description': 'Links to FK_reserves from other tables'},
 {'fieldPath': 'FK_OilAndGasType',
  'description': 'Links to FK_EOilAndGasTypee from other tables'},
 {'fieldPath': '1P Reserves',
  'description': '1P Reserve is the proven res

In [57]:
# Call the metadata emitter
metadata_emitter = CustomDescriptionsDC(tables_dict={}, env=env, gms_server=gms_server, token=token)

# Create dictionary based on the list column_desc
column_description_dict = {item['fieldPath']: item['description'] for item in column_desc}

# Emit the metadata
metadata_emitter.column_desc_emitter(gms_server=gms_server, token=token, urn=urn, column_dict=column_description_dict)

2024-02-15 15:09:50,353 - INFO - CustomDescriptionsDC initialized for environment: DEV
2024-02-15 15:09:50,354 - INFO - Emitting column descriptions for URN: urn:li:dataset:(urn:li:dataPlatform:mssql,ekofisk.RECubeDataRelease.dbo.vDataFeed_UCube_Resources,DEV)
2024-02-15 15:09:50,389 - INFO - Successfully emitted column description metadata.
