# 🥉 Bronze Layer - Data Ingestion (Raw Layer)

The **Bronze Layer** is the raw data ingestion layer in the Medallion Architecture. In this layer, data is loaded as-is from the source CSV files into Delta tables with minimal transformation. The only change made here is normalizing the Medals column names. Below is the implementation for the Bronze Layer using Databricks and Apache Spark.

---

## 📁 Step 1: Create the Bronze Database

```sql
%sql
CREATE DATABASE IF NOT EXISTS bronze;
```

## 📥 Step 2: Ingest Raw CSV Files into Delta Tables

Each dataset is read from DBFS (Databricks File System) and written in Delta Lake format as a table in the `bronze` schema.

### 🏃‍♂️ Athletes

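```python
# Read the raw CSV: first row as column names, column types inferred by Spark
df_athletes = (
    spark.read
    .format("csv")
    .option("header", "true")
    .option("inferSchema", "true")
    .load("dbfs:/FileStore/tokyo_project_data1/Athletes.csv")
)

# Persist as a managed Delta table, replacing any previous load
df_athletes.write.format("delta").mode("overwrite").saveAsTable("bronze.athletes")
```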

### 🧢 Coaches

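```python
# Read the raw CSV: first row as column names, column types inferred by Spark
df_coaches = (
    spark.read
    .format("csv")
    .option("header", "true")
    .option("inferSchema", "true")
    .load("dbfs:/FileStore/tokyo_project_data1/Coaches.csv")
)

# Persist as a managed Delta table, replacing any previous load
df_coaches.write.format("delta").mode("overwrite").saveAsTable("bronze.coaches")
```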

### 🚻 Entries Gender

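```python
# Read the raw CSV: first row as column names, column types inferred by Spark
df_entriesGender = (
    spark.read
    .format("csv")
    .option("header", "true")
    .option("inferSchema", "true")
    .load("dbfs:/FileStore/tokyo_project_data1/EntriesGender.csv")
)

# Persist as a managed Delta table, replacing any previous load
df_entriesGender.write.format("delta").mode("overwrite").saveAsTable("bronze.entriesGender")
```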

### 🏅 Medals

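```python
# Read the raw CSV: first row as column names, column types inferred by Spark
df_medals = (
    spark.read
    .format("csv")
    .option("header", "true")
    .option("inferSchema", "true")
    .load("dbfs:/FileStore/tokyo_project_data1/Medals.csv")
)

# Normalize column names (trim, lowercase, spaces -> underscores, drop parentheses).
# Delta rejects spaces and parentheses in column names unless column mapping is enabled.
df_medals = df_medals.toDF(
    *[c.strip().lower().replace(" ", "_").replace("(", "").replace(")", "") for c in df_medals.columns]
)

df_medals.write.format("delta").mode("overwrite").saveAsTable("bronze.medals")
```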
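The renaming expression above only handles surrounding whitespace, spaces, and parentheses. The sketch below pulls it into a plain Python helper so it can be tested without a Spark session, and adds a check for any remaining characters that Delta would reject. The character set ` ,;{}()\n\t=` is an assumption taken from Delta Lake's "invalid character(s) in column names" error, which applies when column mapping is not enabled. The helper names and the sample headers are hypothetical.

```python
# Characters Delta Lake rejects in column names when column mapping is off
# (assumption: taken from Delta's invalid-character error; verify for your runtime).
DELTA_INVALID_CHARS = set(" ,;{}()\n\t=")


def normalize_column(name):
    """Same renaming as the Medals cell: trim, lowercase, spaces -> '_', drop parentheses."""
    return name.strip().lower().replace(" ", "_").replace("(", "").replace(")", "")


def invalid_columns(columns):
    """Return the column names that still contain a Delta-invalid character."""
    return [c for c in columns if any(ch in DELTA_INVALID_CHARS for ch in c)]


# Hypothetical headers, for illustration only
raw = ["Rank", "Rank by Total", "Gold Medals (Count)", "Points=Total"]
cleaned = [normalize_column(c) for c in raw]
# "points=total" is still flagged: the renaming does not handle "="
leftover = invalid_columns(cleaned)
```

In the notebook, the same helpers could be applied as `df_medals.toDF(*[normalize_column(c) for c in df_medals.columns])`, followed by a check that `invalid_columns(df_medals.columns)` is empty before writing.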

### 👥 Teams

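```python
# Read the raw CSV: first row as column names, column types inferred by Spark
df_teams = (
    spark.read
    .format("csv")
    .option("header", "true")
    .option("inferSchema", "true")
    .load("dbfs:/FileStore/tokyo_project_data1/Teams.csv")
)

# Persist as a managed Delta table, replacing any previous load
df_teams.write.format("delta").mode("overwrite").saveAsTable("bronze.teams")
```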
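All five cells above follow the same read-CSV, write-Delta pattern. A minimal sketch of driving them from one list instead of copy-pasted cells is shown below. `bronze_targets` and `ingest_to_bronze` are hypothetical helpers, and the sketch assumes the notebook's predefined `spark` session. It deliberately leaves out the Medals column normalization, which would still need its own step. It lowercases table names (`bronze.entriesgender`); metastore table names are case-insensitive, so this refers to the same table as `bronze.entriesGender`.

```python
# Base path and dataset names taken from the cells above
BASE_PATH = "dbfs:/FileStore/tokyo_project_data1"
DATASETS = ["Athletes", "Coaches", "EntriesGender", "Medals", "Teams"]


def bronze_targets(base_path, datasets):
    """Map each source CSV path to its bronze table name (e.g. Teams.csv -> bronze.teams)."""
    return {f"{base_path}/{name}.csv": f"bronze.{name.lower()}" for name in datasets}


def ingest_to_bronze(spark, targets):
    """Read each CSV and overwrite the matching Delta table, as the per-dataset cells do."""
    for path, table in targets.items():
        df = (
            spark.read
            .format("csv")
            .option("header", "true")
            .option("inferSchema", "true")
            .load(path)
        )
        df.write.format("delta").mode("overwrite").saveAsTable(table)


targets = bronze_targets(BASE_PATH, DATASETS)
# In a Databricks notebook, where `spark` is predefined:
# ingest_to_bronze(spark, targets)
```

Keeping the path-to-table mapping as plain data makes it easy to add a dataset or review the targets before any Spark job runs.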

## ✅ Output

All ingested tables are saved as Delta tables under the `bronze` schema. These raw tables will be used as input for further processing in the **Silver Layer** (Cleaned & Enriched Data).
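Before moving on to the Silver Layer, a quick sanity check can confirm that every bronze table actually received rows. The sketch below assumes the notebook's `spark` session and uses `spark.table(name).count()`. The helper names are hypothetical, and table names are written in lowercase, which the metastore treats the same as the mixed-case name used above.

```python
# Tables written by the ingestion cells (metastore names are case-insensitive)
BRONZE_TABLES = [
    "bronze.athletes",
    "bronze.coaches",
    "bronze.entriesgender",
    "bronze.medals",
    "bronze.teams",
]


def bronze_row_counts(spark, tables=BRONZE_TABLES):
    """Count rows in each bronze table via spark.table(name).count()."""
    return {table: spark.table(table).count() for table in tables}


def empty_tables(counts):
    """Return the names of tables that loaded zero rows, sorted alphabetically."""
    return sorted(table for table, n in counts.items() if n == 0)


# In a Databricks notebook:
# counts = bronze_row_counts(spark)
# assert not empty_tables(counts), f"Empty bronze tables: {empty_tables(counts)}"
```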