<center><a href="https://ilum.cloud"><img src="../logo.svg" alt="ILUM Logo"></a></center>

<center><h1 style="padding-left: 32px;">Silver to Gold</h1></center>
<center>Welcome to the Ilum Interactive Capabilities Tutorial! In this section, you will transform data from the Silver Layer to meet the requirements of the Gold Layer. Let's dive in!</center>
<br>

### The Gold Layer

The **Gold Layer** is the topmost layer of the Medallion architecture. It stores data that has been cleansed, transformed, and enriched, ready for analysis and consumption by business users.

#### **Typical Data in the Gold Layer**
The data stored in the Gold Layer often includes:
- **De-duplicated and Reconciled Data**: Ensures accuracy and consistency.
- **Complete and Accurate Data**: Contains all necessary fields for analysis.
- **Business-Aligned Data**: Meets specific business requirements and objectives.
- **Enriched Data**: Augmented with additional information for deeper insights.

This data is typically stored in relational databases, data warehouses, or data marts, either locally or in the cloud.

#### **Purpose of the Gold Layer**
The Gold Layer serves several essential purposes:
- **Single Source of Truth**: Provides a reliable and consistent view of data for business users.
- **Facilitates Analysis and Reporting**: Ensures data is ready for advanced analytics and reporting.
- **Supports Decision-Making**: Powers strategic and operational decisions across the organization.

#### **Applications of Gold Layer Data**
Data from the Gold Layer can be used for:
- Trend analysis.
- Customer behavior analysis.
- Financial and operational analysis.
- Risk management and forecasting.


#### **Summary**
The **Gold Layer** is the final step in the Medallion architecture, where data is prepared for use in real-world applications. Here, the data is fully cleaned, organized, and optimized to answer specific business questions or power reports, dashboards, and machine learning models. Building on the structured data from the Silver Layer, the Gold Layer ensures that data is easy to understand, reliable, and ready to drive decisions.

As a next step, let’s walk through an example of transforming data into the Gold Layer.

---

### Example: Creating the Gold Layer from the Silver Layer

In this example, we will transform data from the Silver Layer into the Gold Layer by applying business-specific transformations, aggregations, and enrichment.

#### **Step 1: Set Up the Environment**

First, we need to load the `sparkmagic` extension. You can do this by running the following command:

In [None]:
%load_ext sparkmagic.magics

Ilum's bundled Jupyter works out of the box and has a predefined endpoint address that points to `livy-proxy`.

Use **%manage_spark** to create a new session.

Choose between Scala or Python, adjust Spark settings if necessary, and then click the `Create Session` button. As simple as that. 

The following example is written in `Python`.

In [None]:
%manage_spark

Before we start processing, we need to import the necessary libraries.

In [None]:
%%spark

    from pyspark.sql.functions import sort_array, collect_list, concat_ws, count

**Creating a Dedicated Database for the Use Case**

A good practice in data engineering is to separate data within dedicated databases for specific use cases. This approach helps maintain data organization and makes it easier to manage, query, and scale.

For this use case, we will create a database named `example_gold`. This will ensure that all data related to this use case is stored in a structured and isolated manner.

To create the database, we use the following command:

In [None]:
%%spark

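    # IF NOT EXISTS lets the cell be re-run without failing when the database already exists
    spark.sql("CREATE DATABASE IF NOT EXISTS example_gold")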

#### **Step 2: Load Data from the Silver Layer**
The first stage of processing in this layer is to read data from the silver layer.

In [None]:
%%spark

    animals_df = spark.read.table("example_silver.animals")
    owners_silver_df = spark.read.table("example_silver.owners")

#### **Step 3: Transform Data for Business Needs**
One of the business requirements is to count the number of animals per owner and to list the animals' names in a single column.

In [None]:
%%spark

    animals_count = (
        animals_df.groupby("owner_id")
        .agg(
            concat_ws(", ", sort_array(collect_list("animal_name"))).alias("animals_names"),
            count("animal_name").alias("animals_qty"),
        )
    )
    
    animals_count.sort("owner_id").show(5)
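
To make the aggregation above easier to follow, here is a minimal plain-Python sketch of the same logic on a tiny, made-up sample. It is not Spark code and does not read the tutorial tables: `sample_animals` and `summarize_animals` are hypothetical names introduced only for illustration, and the sketch assumes every row has an animal name.

```python
from collections import defaultdict

# Hypothetical sample rows (owner_id, animal_name); not the tutorial's data.
sample_animals = [
    (1, "Rex"),
    (2, "Milo"),
    (1, "Bella"),
    (2, "Coco"),
    (2, "Ace"),
]


def summarize_animals(rows):
    """Group animal names per owner, mirroring the Spark aggregation."""
    names_by_owner = defaultdict(list)
    for owner_id, animal_name in rows:
        # Corresponds to groupby("owner_id") + collect_list("animal_name")
        names_by_owner[owner_id].append(animal_name)

    summary = {}
    for owner_id, names in names_by_owner.items():
        summary[owner_id] = {
            # Corresponds to concat_ws(", ", sort_array(...))
            "animals_names": ", ".join(sorted(names)),
            # Corresponds to count("animal_name")
            "animals_qty": len(names),
        }
    return summary


animals_summary = summarize_animals(sample_animals)
for owner_id in sorted(animals_summary):
    print(owner_id, animals_summary[owner_id])
```

In Spark, both `count("animal_name")` and `collect_list("animal_name")` skip null values, so rows with a missing name would not contribute to either column.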

Next, let's join the aggregated counts with the owner details to build the result table.

In [None]:
%%spark

    owners_df = (
        owners_silver_df.join(animals_count, animals_count.owner_id == owners_silver_df.owner_id, "right")
        .select(
            owners_silver_df.owner_id,
            owners_silver_df.first_name,
            owners_silver_df.last_name,
            animals_count.animals_names,
            animals_count.animals_qty,
            owners_silver_df.mobile,
            owners_silver_df.email,
        )
        .sort("owner_id")
    )

    owners_df.show(5)
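
The join above uses the `"right"` join type, so every row of `animals_count` is kept and owners without animals are dropped. The following plain-Python sketch illustrates how that choice differs from a left join. The data and the helper names (`sample_owners`, `sample_counts`, `right_join`, `left_join`) are hypothetical and exist only for this illustration.

```python
# Hypothetical sample data; not the tutorial's tables.
sample_owners = {1: "Ann", 2: "Bob", 3: "Cy"}  # owner 3 has no animals
sample_counts = {1: 2, 2: 3, 9: 1}             # owner 9 is missing from owners


def right_join(owners, counts):
    """Keep every counts row; owner-side fields are None without a match."""
    rows = []
    for owner_id, qty in sorted(counts.items()):
        matched = owner_id in owners
        # owner_id is taken from the owners side, as in the select() above
        rows.append((owner_id if matched else None, owners.get(owner_id), qty))
    return rows


def left_join(owners, counts):
    """Keep every owner; the quantity is None when the owner has no animals."""
    return [
        (owner_id, name, counts.get(owner_id))
        for owner_id, name in sorted(owners.items())
    ]


right_rows = right_join(sample_owners, sample_counts)
left_rows = left_join(sample_owners, sample_counts)
print("right:", right_rows)
print("left: ", left_rows)
```

If the Silver Layer guarantees that every animal references an existing owner, the right join never produces empty owner fields. Whether owners without animals should appear in the Gold table is a business decision; a left join would keep them with an empty quantity.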

#### **Step 4: Save Data to the Gold Layer**
Save the transformed and enriched data to the Gold Layer in Delta format. \
Delta keeps a transaction log, which gives access to the table's history of changes, and stores the data as columnar Parquet files, which are compact on disk.

In [None]:
%%spark

    animals_df.write.format("delta").saveAsTable("example_gold.animals")
    owners_df.write.format("delta").saveAsTable("example_gold.owners")

#### **Summary**
In this example:

 - **We loaded data** from the Silver Layer.
 - **We transformed and enriched the data** by applying business-specific aggregations and calculations.
 - **We saved the final data** to the Gold Layer in Delta format, making it ready for business consumption. 

By structuring data in the Gold Layer, businesses can leverage it for trend analysis, customer behavior insights, financial forecasting, and more, enabling smarter, data-driven decisions.

### Cleaning up

Now that you're done with your work, clean up your Spark sessions to free resources that are no longer in use.
Run `%manage_spark` and click the Delete button next to each session.

![Ilum session clean](../../images/clean_ilum_jupyter_session.png)

In [None]:
%manage_spark