<center><a href="https://ilum.cloud"><img src="../logo.svg" alt="ILUM Logo"></a></center>

<center><h1 style="padding-left: 32px;">Raw data to Bronze</h1></center>
<center>Welcome to the Ilum Interactive Capabilities Tutorial! In this section, you will load the first batch of data into the Bronze Layer. Let's dive in!</center>
<br>

### The Bronze Layer

The **Bronze Layer** is the foundational level of the Medallion architecture, designed to store raw, unprocessed data collected from diverse sources such as ERP systems, CRM platforms, analytical databases, and more.

#### **Typical Data in the Bronze Layer**
The data stored in the Bronze Layer often includes:
- **Transactional Data**: Sales, purchases, production records, etc.
- **Demographic Data**: Customer, employee, and supplier information.
- **Financial Data**: Accounts, balance sheets, and financial statements.
- **Operational Data**: Machine performance metrics and production process details.

This data is typically stored in formats such as CSV, JSON, or XML and can reside in local storage or in cloud object storage.
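Because the Bronze Layer keeps data as it arrived, a loader can avoid interpreting values at all. The standard-library sketch below parses CSV text while keeping every field, including empty ones, as the original string. The `RAW_CSV` payload and the `read_raw_csv` helper are illustrative assumptions, not part of Ilum or of this tutorial's data.

```python
import csv
import io

# Illustrative payload only; not one of the tutorial's files
RAW_CSV = "id,name,weight_kg\n1,Rex,12.5\n2,Tom,\n"


def read_raw_csv(text):
    """Parse CSV text into dicts without casting: every value stays a string."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]


rows = read_raw_csv(RAW_CSV)
print(rows)
```

Casting `weight_kg` to a number and deciding what an empty value means would then be left to later layers.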

#### **Purpose of the Bronze Layer**
The Bronze Layer serves several critical purposes:
- **Raw Data Access**: Acts as a repository for raw, unprocessed data.
- **Landing Zone**: Serves as the initial storage location for data from various sources.
- **Foundation for Further Processing**: Forms the basis for the Silver Layer in the Medallion architecture.
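A common landing-zone convention, assumed here rather than taken from this tutorial's code, is to append lineage columns such as the source name and ingestion time while leaving the payload untouched. The helper `land_raw_csv` and the column names `_source` and `_ingested_at` are hypothetical.

```python
import csv
import io
from datetime import datetime, timezone


def land_raw_csv(text, source, ingested_at=None):
    """Return raw CSV rows plus hypothetical lineage columns `_source` and `_ingested_at`."""
    # Default to the current UTC time; callers can pin it for reproducible loads
    if ingested_at is None:
        ingested_at = datetime.now(timezone.utc).isoformat()
    landed = []
    for row in csv.DictReader(io.StringIO(text)):
        record = dict(row)  # original values, still plain strings
        record["_source"] = source
        record["_ingested_at"] = ingested_at
        landed.append(record)
    return landed


batch = land_raw_csv("id,species\n1,dog\n", source="animals.csv",
                     ingested_at="2024-01-01T00:00:00+00:00")
print(batch)
```

Keeping lineage next to the raw values lets later layers trace any record back to the file and load that produced it.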

#### **Applications of Bronze Layer Data**
Data from the Bronze Layer can be leveraged for:
- Analyzing trends and customer behavior.
- Identifying opportunities to enhance efficiency.
- Supporting informed business decision-making.

#### **Summary**
The Bronze Layer is a vital component of the Medallion architecture. By providing access to raw data, it enables subsequent layers to process, analyze, and derive valuable insights, ultimately supporting data-driven decision-making across organizations.


Next, let's walk through an example of loading data into the **Bronze Layer**.

---

### Example: Loading Data into the Bronze Layer

In this example, we will demonstrate how to load raw data into the Bronze Layer of the Medallion architecture. We will use three small sample CSV files (animals, owners, and species) from the `ilum-python-examples` GitHub repository.


#### **Step 1: Set Up the Environment**

First, load the `sparkmagic` extension by running the following command:

In [None]:
%load_ext sparkmagic.magics

Ilum's bundled Jupyter works out of the box and has a predefined endpoint address that points to `livy-proxy`.

Use **%manage_spark** to create a new session.

Choose Scala or Python, adjust the Spark settings if necessary, and then click the `Create Session` button. As simple as that.

The following example is written in `Python`.

In [None]:
%manage_spark

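Before we start processing, we need to import the necessary libraries. pandas is used to download the CSV files over HTTPS before they are converted into Spark DataFrames.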

In [None]:
%%spark
   
    import pandas as pd

**Creating a Dedicated Database for the Use Case**

A good practice in data engineering is to separate data within dedicated databases for specific use cases. This approach helps maintain data organization and makes it easier to manage, query, and scale.

For this use case, we will create a database named `example_bronze`. This will ensure that all data related to this use case is stored in a structured and isolated manner.

To create the database, we use the following command:

In [None]:
%%spark

    # IF NOT EXISTS lets this cell be re-run without failing
    spark.sql("CREATE DATABASE IF NOT EXISTS example_bronze")
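If the database name is ever parameterized (for example, one database per use case), it is worth validating it before interpolating it into SQL. The sketch below assumes a conservative identifier rule (letters, digits, underscores, not starting with a digit); `create_database_sql` is a hypothetical helper, not an Ilum or Spark API. The Spark call is guarded so the snippet also runs as plain Python.

```python
import re

# Conservative identifier rule (assumption): letters, digits, underscores, no leading digit
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def create_database_sql(name):
    """Build an idempotent CREATE DATABASE statement for a validated name."""
    if not _IDENTIFIER.fullmatch(name):
        raise ValueError(f"invalid database name: {name!r}")
    return f"CREATE DATABASE IF NOT EXISTS {name}"


statement = create_database_sql("example_bronze")

# Runs only where a SparkSession named `spark` exists, e.g. inside a %%spark cell
if "spark" in globals():
    spark.sql(statement)
```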

#### **Step 2: Load Raw Data**
The second step is to bring the raw data into Spark. In production, ingestion into the Bronze Layer is usually automated from many different sources; in this notebook, the sample data is loaded manually.

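Below, each of the three sample CSV files is downloaded from the remote repository and converted into a Spark DataFrame without any cleaning or transformation. Note that `pd.read_csv` infers column types (for example, numeric columns become numbers), so the schema shown by `printSchema()` reflects that inference.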

**Animals:**

In [None]:
%%spark

    animals_url = 'https://raw.githubusercontent.com/ilum-cloud/ilum-python-examples/main/animals.csv'
    
    animals_df = spark.createDataFrame(pd.read_csv(animals_url))
    animals_df.printSchema()
    animals_df.show(5)

**Owners:**

In [None]:
%%spark

    owners_url = 'https://raw.githubusercontent.com/ilum-cloud/ilum-python-examples/main/owners.csv'

    owners_df = spark.createDataFrame(pd.read_csv(owners_url))
    owners_df.printSchema()
    owners_df.show(5)

**Species:**

In [None]:
%%spark
    
    species_url = 'https://raw.githubusercontent.com/ilum-cloud/ilum-python-examples/main/species.csv'

    species_df = spark.createDataFrame(pd.read_csv(species_url))
    species_df.printSchema()
    species_df.show(5)
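The three cells above repeat one pattern. A compact variant, sketched below under the assumption that every file follows the `<base URL>/<name>.csv` layout seen in those cells, builds the URLs from a list of dataset names. `source_urls` and `raw_dfs` are hypothetical names; the Spark part only runs when a session named `spark` exists, so the URL logic can also be checked as plain Python.

```python
# Hypothetical consolidation of the three load cells; URL layout taken from the cells above
BASE_URL = "https://raw.githubusercontent.com/ilum-cloud/ilum-python-examples/main"
DATASETS = ("animals", "owners", "species")


def source_urls(base_url, names):
    """Map each dataset name to the URL of its raw CSV file."""
    return {name: f"{base_url}/{name}.csv" for name in names}


urls = source_urls(BASE_URL, DATASETS)

# Runs only where a SparkSession named `spark` exists, e.g. inside a %%spark cell
if "spark" in globals():
    import pandas as pd

    raw_dfs = {}
    for name, url in urls.items():
        # pandas downloads the file; Spark then builds a DataFrame from it
        raw_dfs[name] = spark.createDataFrame(pd.read_csv(url))
        print(name)
        raw_dfs[name].printSchema()
```

Adding a fourth dataset would then only require appending its name to `DATASETS`.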

#### **Step 3: Save Data to the Bronze Layer**
In this step, each DataFrame is saved as a CSV-backed table in the `example_bronze` database. Since Ilum provides integrated S3 storage, no credentials are required to access the storage.

In [None]:
%%spark

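    # mode("overwrite") replaces existing tables so the cell can be re-run;
    # with the default mode, saveAsTable fails if the table already exists
    animals_df.write.mode("overwrite").format("csv").saveAsTable("example_bronze.animals")
    owners_df.write.mode("overwrite").format("csv").saveAsTable("example_bronze.owners")
    species_df.write.mode("overwrite").format("csv").saveAsTable("example_bronze.species")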
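A quick sanity check after writing is to read each table back and compare its row count with the source DataFrame. The helper `count_mismatches` is hypothetical; `spark.table` and `DataFrame.count` are standard PySpark calls. The Spark part is guarded so the comparison logic also runs as plain Python.

```python
def count_mismatches(expected, actual):
    """Return the names whose stored row count differs from the expected count."""
    return sorted(name for name in expected if actual.get(name) != expected[name])


# Runs only where the tutorial's SparkSession and DataFrames exist
if "spark" in globals():
    sources = {"animals": animals_df, "owners": owners_df, "species": species_df}
    expected = {}
    actual = {}
    for name, df in sources.items():
        expected[name] = df.count()
        actual[name] = spark.table(f"example_bronze.{name}").count()
    print(count_mismatches(expected, actual))  # an empty list means every table matches
```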

#### **Summary**
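In this example:
 - **We created a dedicated database**, `example_bronze`, for this use case.
 - **We loaded raw data** from three CSV files (animals, owners, and species) into Spark DataFrames.
 - **We saved the data in CSV format** as tables in the `example_bronze` database, backed by Ilum's integrated S3 storage.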

Storing raw data in the Bronze Layer this way ensures a solid foundation for further processing and analysis in the higher layers of the Medallion architecture.

### Cleaning up

Now that you're done with your work, clean up your Spark session to free resources it no longer needs.
Run the cell below to open `%manage_spark`, then click the Delete button for each session you created.

![Ilum session clean](../../images/clean_ilum_jupyter_session.png)

In [None]:
%manage_spark

#### [Click here to proceed to the "Bronze to silver" section.](2_Bronze_to_silver.ipynb)

