# 05.2 - Staging zone (Bronze)

## 1. Setting up the bronze schema

To create the Iceberg tables, we first need the schema that will hold the bronze tables. Because we use a full-load approach, we drop the existing schema and all of its tables before recreating it. The `clean_schema` function in the `utils/iceberg.py` file does this.


In [None]:
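def clean_schema(spark, schema):
    """Drop `schema` and all of its tables if it exists, then recreate it empty."""
    if spark.catalog.databaseExists(schema):
        # Drop the tables first so the schema is empty when it is dropped
        for t in spark.catalog.listTables(schema):
            spark.sql(f"DROP TABLE IF EXISTS {schema}.{t.name}")

        spark.sql(f"DROP DATABASE {schema}")

    spark.sql(f"CREATE DATABASE {schema}")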
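
To see the order of statements a full load issues without starting a Spark cluster, one option is to pass `clean_schema` a stand-in object that only records the SQL it receives. The sketch below repeats the function so it runs on its own; `FakeSpark` and `FakeCatalog` are hypothetical helpers written for this illustration and are not part of PySpark or of `utils/iceberg.py`.

```python
from types import SimpleNamespace


def clean_schema(spark, schema):
    # Same logic as the cell above, repeated so this block is self-contained
    if spark.catalog.databaseExists(schema):
        for t in spark.catalog.listTables(schema):
            spark.sql(f"DROP TABLE IF EXISTS {schema}.{t.name}")
        spark.sql(f"DROP DATABASE {schema}")
    spark.sql(f"CREATE DATABASE {schema}")


class FakeCatalog:
    """Hypothetical stand-in for spark.catalog, backed by a plain dict."""

    def __init__(self, schemas):
        self.schemas = schemas  # {schema name: [table names]}

    def databaseExists(self, name):
        return name in self.schemas

    def listTables(self, name):
        # The real API returns objects with a .name attribute; mimic only that
        return [SimpleNamespace(name=t) for t in self.schemas[name]]


class FakeSpark:
    """Hypothetical stand-in for a SparkSession that only records SQL."""

    def __init__(self, schemas):
        self.catalog = FakeCatalog(schemas)
        self.statements = []

    def sql(self, statement):
        self.statements.append(statement)


spark = FakeSpark({"bronze": ["customer", "product"]})
clean_schema(spark, "bronze")
for statement in spark.statements:
    print(statement)
```

With an existing `bronze` schema the recorded statements are the two table drops, the schema drop, and the create, in that order. With no existing schema only the `CREATE DATABASE` statement is issued.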

## 2. Writing tables to Iceberg 

To write our Iceberg tables, we can use an IOManager, as we did for the landing zone.

In [None]:
import dagster as dg
from dagster_pyspark import PySparkResource
from pyspark.sql import DataFrame


class IcebergIOManager(dg.ConfigurableIOManager):
    """Store a PySpark DataFrame in an Iceberg table.

    Downstream assets load the table back into the Spark session as a DataFrame.
    """

    pyspark: dg.ResourceDependency[PySparkResource]
    catalog: str

    def handle_output(self, context: dg.OutputContext, obj: DataFrame):
        if isinstance(obj, DataFrame):
            row_count = obj.count()
            # Table name is <first key part>.<last key part>, lowercased
            obj.writeTo(f"{context.asset_key.parts[0]}.{context.asset_key.parts[-1]}".lower()).create()
        else:
            raise Exception(f"Outputs of type {type(obj)} not supported.")

        context.add_output_metadata({"row_count": row_count})

    def load_input(self, context: dg.InputContext) -> DataFrame:
        if context.dagster_type.typing_type == DataFrame:
            # Return the Iceberg table as a PySpark DataFrame
            return self.pyspark.spark_session.read.format("iceberg").load(
                f"iceberg.{context.asset_key.parts[0]}.{context.asset_key.parts[-1]}".lower()
            )

        raise Exception(
            f"Inputs of type {context.dagster_type} not supported. Please specify a valid type "
            "for this input on the argument of the @asset-decorated function."
        )
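
Both methods derive the table name from the first and last parts of the asset key. The sketch below reproduces that string logic in isolation so the mapping is easy to inspect. `table_identifier` is a hypothetical helper and the asset key is an invented example; neither comes from the project code.

```python
def table_identifier(asset_key_parts, catalog=None):
    """Build the lowercased table name from the first and last asset key parts.

    When `catalog` is given it is prepended, the way load_input prepends "iceberg".
    """
    pieces = [asset_key_parts[0], asset_key_parts[-1]]
    if catalog:
        pieces.insert(0, catalog)
    return ".".join(pieces).lower()


# Hypothetical three-part asset key; the middle part does not appear in the name
parts = ["bronze", "aw_core", "SalesOrderHeader"]

print(table_identifier(parts))             # name used when writing
print(table_identifier(parts, "iceberg"))  # name used when loading
```

Because only the first and last parts are used, two asset keys that differ only in their middle parts map to the same table, so the last part needs to be unique within a schema.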


## 3. Defining the table assets

With the schema cleaned and an IOManager that can write Iceberg tables, we now define the table assets for the bronze schema.
Define your bronze assets in the `assets/bronze` module. The `assets/bronze/aw_core.py` file provides a template to complete.

In [None]:
# assets/bronze/aw_core.py

def get_bronze_aw_core_table_asset(table: str):
    # define the table asset here
    raise NotImplementedError()

def get_bronze_aw_core_assets():
    return [get_bronze_aw_core_table_asset(table) for table in aw_core_tables]


ASSETS = get_bronze_aw_core_assets()
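
The template builds one asset per table by calling a factory function for each name. The plain-Python sketch below, which does not use Dagster, illustrates one reason for that shape: each call to the factory captures its own table name, whereas functions created directly in a loop all share the loop variable. `make_table_loader` and the `tables` list are hypothetical stand-ins for the template's function and for `aw_core_tables`.

```python
def make_table_loader(table: str):
    """Hypothetical factory: return a function bound to a single table name."""

    def load():
        # `table` is captured once per call to make_table_loader
        return f"bronze.{table.lower()}"

    # Give each generated function a distinct name, as each asset needs one
    load.__name__ = f"bronze_{table.lower()}"
    return load


tables = ["Customer", "Product"]  # hypothetical stand-in for aw_core_tables

loaders = [make_table_loader(t) for t in tables]
print([f.__name__ for f in loaders])
print([f() for f in loaders])

# For contrast: lambdas created directly in the loop all read the same variable,
# so every one of them returns the last table name
naive = [lambda: t for t in tables]
print([f() for f in naive])
```

The factory version yields a distinct function per table, while the naive version returns the last name for every entry.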

## 4. Registering the asset job

After defining the table assets, we need to register them in the `definitions.py` file and define the job that materializes them in `jobs.py`.

In [None]:
# definitions.py

from .assets import bronze

@dg.definitions
def defs():
    # ...
    bronze_assets = dg.load_assets_from_package_module(bronze)

    # ...
    
    return dg.Definitions(
        assets=[*sample_assets, *landing_assets, *bronze_assets],
        #...
    )

In [3]:
# jobs.py

#...

bronze_job = dg.define_asset_job(
    "bronze_assets_job",
    selection=dg.AssetSelection.groups(ASSET_GROUP_BRONZE),
    executor_def=dg.multiprocess_executor.configured({"max_concurrent": 1}),
)

#...

jobs = [sample_job, landing_job, bronze_job]