<h1>Auto ETL Assistant</h1>

This notebook is the entry point for the Auto ETL Assistant. The assistant writes metadata descriptions for input tables and columns from ingested `yml` files structured as follows:
<br>

```
tables:
  - name: foo
    description: Description of the table
    columns:
      - name: column1
        description: Description of column1
      - name: column2
        description: Description of column2
  - name: bar
    description: Description of another table
    columns:
      - name: columnA
        description: Description of columnA
```
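
As a quick sanity check before ingestion, a file in this layout can be validated with a few lines of Python. This is a minimal sketch and not the assistant's own loader: `load_metadata` and `validate_metadata` are hypothetical helper names, the only library call used is PyYAML's `yaml.safe_load`, and treating every `description` as required is an assumption made for illustration.

```python
from pathlib import Path


def load_metadata(path):
    """Parse one metadata file into a dict (requires PyYAML)."""
    import yaml  # imported lazily so the validator below works without PyYAML

    return yaml.safe_load(Path(path).read_text())


def validate_metadata(doc):
    """Return a list of problems found in a parsed metadata document.

    An empty list means the document matches the layout shown above.
    """
    problems = []
    tables = doc.get("tables") if isinstance(doc, dict) else None
    if not isinstance(tables, list):
        return ["top-level 'tables' key must hold a list"]

    for i, table in enumerate(tables):
        if not isinstance(table, dict) or not table.get("name"):
            problems.append(f"tables[{i}]: missing 'name'")
            continue
        name = table["name"]
        if not table.get("description"):
            problems.append(f"{name}: missing 'description'")
        for j, column in enumerate(table.get("columns") or []):
            if not isinstance(column, dict) or not column.get("name"):
                problems.append(f"{name}.columns[{j}]: missing 'name'")
            elif not column.get("description"):
                problems.append(f"{name}.{column['name']}: missing 'description'")
    return problems
```

Calling `validate_metadata(load_metadata("./metadata/example.yml"))` (the file name is hypothetical) returns an empty list when the file follows the structure above.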

The underlying libraries are included. To process a directory of `yml` metadata files and apply them to the associated tables in a single `catalog.schema`, set the `SOURCE_CATALOG` and `SOURCE_SCHEMA` configuration variables in the configuration cell below and place your metadata in the `./metadata` directory of this repository. Alternatively, point the `PATH_TO_METADATA` configuration variable at a different directory.
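
To make the step of applying metadata to tables more concrete, the sketch below turns one parsed metadata document into Databricks SQL comment statements. It is an illustration only and is not the code behind `process_metadata`. The helper names `sql_literal` and `build_comment_statements` are hypothetical, and the sketch assumes that table and column names are plain identifiers and that string literals use backslash escaping (the Spark SQL default).

```python
def sql_literal(text):
    """Quote text as a SQL string literal, escaping backslashes and single quotes."""
    escaped = text.replace("\\", "\\\\").replace("'", "\\'")
    return f"'{escaped}'"


def build_comment_statements(doc, catalog, schema):
    """Build one statement per described table and per described column."""
    statements = []
    for table in doc.get("tables", []):
        full_name = f"{catalog}.{schema}.{table['name']}"
        if table.get("description"):
            statements.append(
                f"COMMENT ON TABLE {full_name} IS {sql_literal(table['description'])}"
            )
        for column in table.get("columns", []):
            if column.get("description"):
                statements.append(
                    f"ALTER TABLE {full_name} ALTER COLUMN {column['name']} "
                    f"COMMENT {sql_literal(column['description'])}"
                )
    return statements
```

In a Databricks notebook, each returned statement could then be executed with `spark.sql(statement)`.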

Use the `DESTINATION_TABLES` configuration variable (a Python list) to point the assistant at the destination tables (silver, etc.) that the ETL process should target. Each entry must be a fully qualified `catalog.schema.table` name.
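
A small check like the following can catch entries that are not fully qualified before the assistant runs. It is a sketch under a simplifying assumption: names contain no backtick-quoted parts with embedded dots. `split_table_name` is a hypothetical helper, not part of the included libraries.

```python
def split_table_name(full_name):
    """Split 'catalog.schema.table' into its three parts, or raise ValueError."""
    parts = full_name.split(".")
    if len(parts) != 3 or not all(part.strip() for part in parts):
        raise ValueError(
            f"expected a fully qualified 'catalog.schema.table' name, got {full_name!r}"
        )
    return tuple(parts)


# Example values taken from the configuration cell in this notebook
destination_tables = [
    "catalog_foo.schema_foo.destination_table1",
    "catalog_bar.schema_bar.destination_table2",
]
parsed = [split_table_name(name) for name in destination_tables]
```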

Finally, the ETL Assistant needs a Genie space to work with. Create one in the Databricks UI and provide its URL in the `GENIE_SPACE_URL` configuration variable below.
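
To confirm that the pasted URL actually points at a Genie space, the space ID can be pulled out of it. This sketch assumes the URL has the `.../genie/rooms/<id>` shape used in the example comment of the configuration cell; `genie_space_id` is a hypothetical helper, and whether the assistant itself needs the ID or the full URL is not specified here.

```python
from urllib.parse import urlparse


def genie_space_id(url):
    """Return the path segment that follows 'rooms' in a Genie space URL."""
    segments = [s for s in urlparse(url).path.split("/") if s]
    if "rooms" not in segments:
        raise ValueError(f"no 'rooms' segment in {url!r}")
    index = segments.index("rooms")
    if index + 1 >= len(segments):
        raise ValueError(f"no space ID after 'rooms' in {url!r}")
    return segments[index + 1]
```

A placeholder that was never filled in raises `ValueError`, which surfaces the mistake before the agent is registered.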

In [None]:
%pip install pyyaml
%pip install -U -qqqq mlflow-skinny[databricks] langgraph==0.3.4 databricks-langchain databricks-agents uv
dbutils.library.restartPython()

In [None]:
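SOURCE_CATALOG = "< your catalog name here >"
SOURCE_SCHEMA = "< your schema name here >"

DESTINATION_TABLES = ["catalog_foo.schema_foo.destination_table1", "catalog_bar.schema_bar.destination_table2"]  # Fully qualified names of the destination tables to write to

# Destination catalog and schema, passed to the agent registration cell at the end of this notebook
DESTINATION_CATALOG = "< your destination catalog name here >"
DESTINATION_SCHEMA = "< your destination schema name here >"

GENIE_SPACE_URL = "< your genie space URL here >"  # e.g., "https://genie.example.com/genie/rooms/1234567890abcdef"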

# Only change the following path if you have a different directory structure
PATH_TO_METADATA = "./metadata"

In [None]:
from utils.register_sql_functions import register_describe_table_function
register_describe_table_function(catalog=SOURCE_CATALOG)

In [None]:
from utils.metadata_processor import process_metadata
process_metadata(PATH_TO_METADATA, SOURCE_CATALOG, SOURCE_SCHEMA)

In [None]:
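# Run this cell to register the LLM ETL Agent.
# DESTINATION_CATALOG and DESTINATION_SCHEMA must be set in the configuration cell above.
agent_args = {
    "source_catalog": SOURCE_CATALOG,
    "source_schema": SOURCE_SCHEMA,
    "destination_catalog": DESTINATION_CATALOG,
    "destination_schema": DESTINATION_SCHEMA,
    "destination_tables": DESTINATION_TABLES,
}
# Arguments: notebook path, timeout in seconds, parameters for the agent notebook
result = dbutils.notebook.run("./agents/ua-genie-agents-etl-assist", 500, agent_args)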