## Steps to create and quickly debug conversion configs

* Write a sample config as a string variable, which can then be written to a temporary path for processing.
* Create a sample dataframe containing only the columns required to carry out the conversion.
* Once we are satisfied with the result, write the config to the conversion_configs folder.

In [17]:
import tempfile

import polars as pl

from focus_converter.configs.base_config import ConversionPlan
from focus_converter.converter import FocusConverter

In [18]:
sample_converter_config = """
plan_name: Generate ServiceCategory for aws data using a map.
column: line_item_product_code
conversion_type: lookup
focus_column: ServiceCategory
conversion_args:
    reference_dataset_path: "conversion_configs/aws/mapping_files/aws_category_mapping.csv"
    source_value: product_code
    destination_value: ServiceCategory
"""
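
To make the `lookup` conversion type concrete, here is a minimal conceptual sketch in plain Python. It is not how `focus_converter` implements the lookup. The inline CSV is a stand-in for the reference dataset, its single row is taken from the result shown at the end of this notebook, and the helper name `lookup_convert` is hypothetical.

```python
import csv
import io

# Hypothetical stand-in for the reference dataset referenced by
# reference_dataset_path in the config above.
MAPPING_CSV = """product_code,ServiceCategory
AWSCloudWAN,Networking
"""


def lookup_convert(rows, column, source_value, destination_value, focus_column, mapping_csv):
    """Add focus_column to each row by looking up row[column] in the mapping."""
    reader = csv.DictReader(io.StringIO(mapping_csv))
    mapping = {r[source_value]: r[destination_value] for r in reader}
    # In this sketch, values without a match are set to None.
    return [{**row, focus_column: mapping.get(row[column])} for row in rows]


rows = [{"a": 1, "line_item_product_code": "AWSCloudWAN"}]
converted = lookup_convert(
    rows,
    column="line_item_product_code",
    source_value="product_code",
    destination_value="ServiceCategory",
    focus_column="ServiceCategory",
    mapping_csv=MAPPING_CSV,
)
print(converted)
```

The keyword arguments mirror the keys of the YAML config: `column` names the input column, `source_value` and `destination_value` name the two columns of the mapping file, and `focus_column` names the output column.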

In [24]:
# Proposed conversion file name.
# Config file names follow the pattern "D{0-9}{0-9}{0-9}_S{0-9}{0-9}{0-9}.yaml", where the first set of digits is the dimension id and the
# second set is the order in which the step should be executed. Some dimensions may have more than one step, so it is important to
# assign a priority.

proposed_file_name = "D012_S001.yaml"
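
The naming pattern described above can be checked before a config is saved. The following is a small sketch, assuming exactly three digits in each group. The names `CONFIG_NAME_RE` and `parse_config_name` are hypothetical helpers and are not part of `focus_converter`.

```python
import re

# Assumed pattern: "D" + three digits, "_S" + three digits, then ".yaml".
CONFIG_NAME_RE = re.compile(r"D([0-9]{3})_S([0-9]{3})\.yaml")


def parse_config_name(file_name):
    """Return (dimension_id, step) as integers, or raise ValueError."""
    match = CONFIG_NAME_RE.fullmatch(file_name)
    if match is None:
        raise ValueError(f"unexpected config file name: {file_name!r}")
    return int(match.group(1)), int(match.group(2))


print(parse_config_name("D012_S001.yaml"))

# Sorting by the parsed tuple orders configs by dimension, then by step.
names = ["D012_S002.yaml", "D001_S001.yaml", "D012_S001.yaml"]
ordered = sorted(names, key=parse_config_name)
print(ordered)
```

Parsing the two numbers rather than comparing raw strings keeps the ordering explicit and rejects misnamed files early.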

In [25]:
# Sample dataframe with one row, containing the column that we want to test the conversion on.

test_data_frame = pl.DataFrame(
    [
        {"a": 1, "line_item_product_code": "AWSCloudWAN"}
    ]
).lazy()

# .lazy() returns a LazyFrame, which records operations as a query plan and defers execution until collect() is called, as opposed to eager computation.

In [28]:
from os import path

with tempfile.TemporaryDirectory() as temp_directory:
    # Write the conversion config to a temporary path. This way it is always refreshed and we don't have to restart the notebook.
    conversion_file_path = path.join(temp_directory, proposed_file_name)
    with open(conversion_file_path, "wb") as fd:
        fd.write(sample_converter_config.encode())

    # Now we load the config, ensuring it is valid and the conversion_args are validated.
    conversion_plan = ConversionPlan.load_yaml(
        conversion_file_path
    )

focus_converter = FocusConverter()
focus_converter.plans = {"aws": [conversion_plan]}
column_exprs = focus_converter.prepare_horizontal_conversion_plan(
    provider="aws"
)
converted_lf = focus_converter.apply_plan(lf=test_data_frame)

In [29]:
converted_lf.collect()

| a | line_item_product_code | Provider | ServiceCategory |
| --- | --- | --- | --- |
| i64 | str | str | str |
| 1 | "AWSCloudWAN" | "aws" | "Networking" |
