# 📊 02_Manage_Config_Store

This notebook allows you to **safely insert or update (upsert)** configuration key-value pairs  
into the `aerodemo_config_store` Delta table without dropping or recreating it.

✅ Use this notebook to:
- Add new configuration keys.
- Update existing configuration values.
- Manage environment-specific configs (`dev`, `staging`, `prod`).

## 🔧 Upsert Single Config Key-Value

The cell below will:
- Define the target environment (`env`).
- Provide the `config_key` and new `config_value`.
- Perform a Delta Lake `MERGE` (safe upsert) into the config store.

Make sure to update the variables (`env`, `config_key`, `config_value`) as needed.
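
To make the upsert semantics concrete, here is a plain-Python sketch that mimics what `MERGE` does for one row. It does not touch Spark or the Delta table: the dictionary `store` and the function `upsert` are hypothetical stand-ins, assuming rows are uniquely identified by `(env, config_key)` as in the `ON` clause of the cell.

```python
def upsert(store, env, config_key, config_value):
    """Mimic MERGE on a dict keyed by (env, config_key) (illustration only)."""
    key = (env, config_key)
    # WHEN MATCHED -> update, WHEN NOT MATCHED -> insert
    action = "updated" if key in store else "inserted"
    store[key] = config_value
    return action


# Hypothetical in-memory stand-in for the config table
store = {("dev", "catalog"): "arao"}

first = upsert(store, "dev", "e2e_workflow_job_id", "NEW_JOB_ID_999999")
second = upsert(store, "dev", "e2e_workflow_job_id", "ANOTHER_JOB_ID")

print(first, second, len(store))  # inserted updated 2
```

Running the same upsert twice never creates a second row for the same key, which is why the operation is safe to repeat.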

In [0]:
# The `spark` session is predefined in Databricks notebooks, so no import is needed

# Set catalog and schema
CATALOG = "arao"
SCHEMA = "aerodemo"
TABLE_NAME = f"{CATALOG}.{SCHEMA}.aerodemo_config_store"

# Target values to upsert
env = "dev"
config_key = "e2e_workflow_job_id"
config_value = "NEW_JOB_ID_999999"

print(f"✅ Upserting → env: {env}, key: {config_key}, value: {config_value}")

# Create a single-row DataFrame holding the value to upsert
from pyspark.sql import Row

new_row = [Row(env=env, config_key=config_key, config_value=config_value)]
df = spark.createDataFrame(new_row)

# Expose the row as a temp view so the MERGE reads the values as data.
# Interpolating them into the SQL text would break on a value containing a single quote.
source_view = "temp_config_upsert"
df.createOrReplaceTempView(source_view)

# Perform MERGE (upsert)
spark.sql(f"""
MERGE INTO {TABLE_NAME} AS target
USING {source_view} AS source
ON target.env = source.env AND target.config_key = source.config_key
WHEN MATCHED THEN UPDATE SET target.config_value = source.config_value
WHEN NOT MATCHED THEN INSERT (env, config_key, config_value) VALUES (source.env, source.config_key, source.config_value)
""")

print(f"✅ Successfully upserted {config_key} for environment '{env}' into {TABLE_NAME}")

> 💡 Tip: You can copy this cell and change the `env`, `config_key`, and `config_value`  
> to manage multiple configs as needed.
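
As an alternative to copying the cell, the same logic could be wrapped in a small function. The sketch below is one possible shape, not part of the original notebook: the name `upsert_config` and the temp view name are hypothetical, and it assumes the table has the `env`, `config_key`, and `config_value` columns used above and that `spark` is the session Databricks provides.

```python
def upsert_config(spark, table_name, env, config_key, config_value,
                  view_name="temp_single_config_upsert"):
    """Upsert one (env, config_key, config_value) row into table_name.

    Hypothetical helper: `view_name` is an arbitrary temp view name.
    Returns the MERGE statement that was executed.
    """
    # Build a one-row DataFrame so the values travel as data, not as SQL text
    df = spark.createDataFrame(
        [(env, config_key, str(config_value))],
        ["env", "config_key", "config_value"],
    )
    df.createOrReplaceTempView(view_name)

    merge_sql = f"""
    MERGE INTO {table_name} AS target
    USING {view_name} AS source
    ON target.env = source.env AND target.config_key = source.config_key
    WHEN MATCHED THEN UPDATE SET target.config_value = source.config_value
    WHEN NOT MATCHED THEN INSERT (env, config_key, config_value)
      VALUES (source.env, source.config_key, source.config_value)
    """
    spark.sql(merge_sql)
    return merge_sql


# Example usage in a Databricks notebook, where `spark` is predefined:
# upsert_config(spark, "arao.aerodemo.aerodemo_config_store",
#               "dev", "e2e_workflow_job_id", "NEW_JOB_ID_999999")
```

Each additional key then becomes a one-line call instead of a copied cell.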

## 🔧 Batch Upsert Multiple Configs

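This cell will:
- Flatten the nested `CONFIG` dictionary into `(env, config_key, config_value)` entries, joining nested keys with dots (for example, `workflow_configs.workflow_name`).
- Optionally delete all existing rows first when `CLEAR_BEFORE_INSERT` is `True`, which is how the cell is written.
- Convert the entries into a DataFrame.
- Perform a single Delta Lake `MERGE` to upsert all keys into the config table.

This avoids issuing one `MERGE` per row.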
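
The flattening step can be illustrated on its own, without Spark. The sketch below is a variant that returns the triples instead of appending to a module-level list; the name `flatten_config_pure` and the `sample` dictionary are hypothetical and used only for illustration.

```python
def flatten_config_pure(env, d, parent_key=""):
    """Return (env, dotted_key, value_as_str) triples for a nested dict."""
    rows = []
    for k, v in d.items():
        # Nested keys are joined with dots, e.g. "workflow_configs.workflow_name"
        new_key = f"{parent_key}.{k}" if parent_key else k
        if isinstance(v, dict):
            rows.extend(flatten_config_pure(env, v, new_key))
        else:
            rows.append((env, new_key, str(v)))
    return rows


# Small sample shaped like one environment of the config dictionary
sample = {
    "catalog": "arao",
    "workflow_configs": {"workflow_name": "AeroDemo End to End Workflow"},
}

rows = flatten_config_pure("dev", sample)
for row in rows:
    print(row)
# ('dev', 'catalog', 'arao')
# ('dev', 'workflow_configs.workflow_name', 'AeroDemo End to End Workflow')
```

Returning a list rather than mutating shared state means re-running the code does not accumulate duplicate entries.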

In [0]:
from pyspark.sql import Row

# Full CONFIG dictionary (same as before)
CONFIG = {
    "dev": {
        "catalog": "arao",
        "schema": "aerodemo",
        "pat_token": "YOUR_DEV_PAT_TOKEN",
        "databricks_instance": "https://e2-demo-field-eng.cloud.databricks.com/",
        "e2e_workflow_job_id": "962701319088715",
        "workflow_configs": {
            "workflow_name": "AeroDemo End to End Workflow",
            "existing_cluster_id": "0527-220936-f3oreeiv",
            "dlt_pipeline_id": "a2ccd850-4b28-4f30-9a53-0fd5f5499713"
        }
    },
    "staging": {
        "catalog": "arao_staging",
        "schema": "aerodemo_staging",
        "pipeline_ids": {
            "full_pipeline": "staging-pipeline-id-here",
            "registration_pipeline": "staging-registration-pipeline-id-here"
        },
        "pat_token": "dapi-STAGING-XXXXXX",
        "databricks_instance": "https://e2-demo-field-eng.cloud.databricks.com/",
        "e2e_workflow_job_id": "staging-864722071013094",
        "workflow_configs": {
            "workflow_name": "AeroDemo_DataPipeline_Staging",
            "existing_cluster_id": "staging-cluster-id",
            "dlt_pipeline_id": "staging-dlt-pipeline-id"
        }
    },
    "prod": {
        "catalog": "arao",
        "schema": "aerodemo",
        "pipeline_ids": {
            "full_pipeline": "prod-pipeline-id-here",
            "registration_pipeline": "prod-registration-pipeline-id-here"
        },
        "pat_token": "dapi-PROD-XXXXXX",
        "databricks_instance": "https://e2-demo-field-eng.cloud.databricks.com/",
        "e2e_workflow_job_id": "864722071013094",
        "workflow_configs": {
            "workflow_name": "AeroDemo_DataPipeline_Prod",
            "existing_cluster_id": "prod-cluster-id",
            "dlt_pipeline_id": "prod-dlt-pipeline-id"
        }
    }
}

# Flatten configs into (env, key, value) triples
configs_to_upsert = []

def flatten_config(env, d, parent_key=""):
    for k, v in d.items():
        new_key = f"{parent_key}.{k}" if parent_key else k
        if isinstance(v, dict):
            flatten_config(env, v, new_key)
        else:
            configs_to_upsert.append((env, new_key, str(v)))

for env, config_dict in CONFIG.items():
    flatten_config(env, config_dict)

print(f"🔍 Prepared {len(configs_to_upsert)} flattened configs for upsert.")

# Set catalog, schema, and table name
CATALOG = "arao"
SCHEMA = "aerodemo"
TABLE_NAME = f"{CATALOG}.{SCHEMA}.aerodemo_config_store"

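# ⚠️ Optional: delete ALL existing rows (every environment) before inserting.
# Set to False to keep existing rows and perform a pure upsert.
CLEAR_BEFORE_INSERT = True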

if CLEAR_BEFORE_INSERT:
    spark.sql(f"DELETE FROM {TABLE_NAME}")
    print(f"⚠️ Cleared all existing configs from {TABLE_NAME}")

# Convert list to DataFrame
df = spark.createDataFrame([Row(env=env, config_key=key, config_value=value) for env, key, value in configs_to_upsert])

# Perform MERGE (upsert)
temp_view = "temp_config_upserts"
df.createOrReplaceTempView(temp_view)

spark.sql(f"""
MERGE INTO {TABLE_NAME} AS target
USING {temp_view} AS source
ON target.env = source.env AND target.config_key = source.config_key
WHEN MATCHED THEN UPDATE SET target.config_value = source.config_value
WHEN NOT MATCHED THEN INSERT (env, config_key, config_value) VALUES (source.env, source.config_key, source.config_value)
""")

print(f"✅ Successfully upserted {len(configs_to_upsert)} configs into {TABLE_NAME}")

> 💡 To add or change entries, edit the `CONFIG` dictionary with as many key-value pairs  
> as you want across different environments; `configs_to_upsert` is rebuilt from it when the cell runs.
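
Duplicate `(env, config_key)` pairs in the source are worth catching before the `MERGE` runs: Delta Lake raises an error when several source rows match the same target row, and when nothing matches, each duplicate is inserted as its own row. Duplicates could arise if entries are appended to `configs_to_upsert` by hand or if two nested paths flatten to the same dotted key. A minimal pre-check, with the hypothetical helper name `find_duplicate_keys`, might look like this:

```python
from collections import Counter


def find_duplicate_keys(triples):
    """Return the sorted (env, config_key) pairs that occur more than once."""
    counts = Counter((env, key) for env, key, _ in triples)
    return sorted(pair for pair, n in counts.items() if n > 1)


# Hypothetical entries; the second and third collide on ("dev", "schema")
entries = [
    ("dev", "catalog", "arao"),
    ("dev", "schema", "aerodemo"),
    ("dev", "schema", "aerodemo_v2"),
    ("prod", "schema", "aerodemo"),
]

duplicates = find_duplicate_keys(entries)
print(duplicates)  # [('dev', 'schema')]
```

In the notebook, the check could run on `configs_to_upsert` just before the DataFrame is created, stopping the cell if the returned list is not empty.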