### Medallion Architecture — Spark SQL Version (Colab)

This notebook demonstrates the **Medallion Architecture** using **Spark SQL** in Google Colab.
All datasets are expected under **`/content`**.

- No notebook magics are used.
- SQL is executed with `spark.sql(""" ... """)`.
- Each code cell has a preceding Markdown explanation.
- Logic mirrors the working PySpark DataFrame notebooks, expressed in **readable SQL**.

#### Environment Setup
Create or reuse a Spark session. Keep this minimal to avoid distractions for learners.

In [None]:

# Minimal Spark setup (Colab): assumes Java/Spark already installed per your environment bootstrap.
from pyspark.sql import SparkSession
spark = SparkSession.builder.appName("Medallion-SQL").getOrCreate()

# Print the Spark version to confirm the session is live
print("Spark version:", spark.version)

# Base directory for the data files. The SQL cells below hard-code this path, so update both if you change it.
DATA_BASE = "/content"
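
Since every later cell assumes the CSVs already sit under `/content`, a quick pre-flight check can save a confusing path error further down. The sketch below is one possible way to do it; the helper name `find_missing_files` and the `EXPECTED_FILES` list are illustrative and not part of the original notebook, and the file names are simply the ones referenced by the Bronze cells.

```python
import os

# Assumption: these are the six files the Bronze cells in this notebook read.
EXPECTED_FILES = [
    "cust_info.csv",
    "prd_info.csv",
    "sales_details.csv",
    "PX_CAT_G1V2.csv",
    "CUST_AZ12.csv",
    "LOC_A101.csv",
]


def find_missing_files(base_dir, file_names):
    """Return the names in file_names that are not regular files under base_dir."""
    return [
        name
        for name in file_names
        if not os.path.isfile(os.path.join(base_dir, name))
    ]


# "/content" mirrors DATA_BASE from the setup cell.
missing = find_missing_files("/content", EXPECTED_FILES)
if missing:
    print("Missing files:", ", ".join(missing))
else:
    print("All expected files are present.")
```

If anything is reported missing, upload it to `/content` before running the Bronze cells.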


#### Create dedicated databases for the project
One database per layer keeps Bronze, Silver, and Gold separated for clarity.

In [None]:
# spark.sql() parses a single statement per call, so each database is created separately.
for layer in ("bronze", "silver", "gold"):
    spark.sql(f"CREATE DATABASE IF NOT EXISTS {layer}")

#### Register raw CSVs as external Bronze tables (schema-on-read)
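Each table only points at its CSV through `path`, so the original files stay untouched and the tables can be recreated at any time. The first row is read as the header, and column types are inferred from the data.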

In [None]:
# spark.sql() runs one statement per call, so DROP and CREATE are issued separately.
spark.sql("DROP TABLE IF EXISTS bronze.cust_info_csv")
spark.sql("""
CREATE TABLE bronze.cust_info_csv
USING csv
OPTIONS (
  header 'true',
  inferSchema 'true',
  path '/content/cust_info.csv'
)
""")

In [None]:
# spark.sql() runs one statement per call, so DROP and CREATE are issued separately.
spark.sql("DROP TABLE IF EXISTS bronze.prd_info_csv")
spark.sql("""
CREATE TABLE bronze.prd_info_csv
USING csv
OPTIONS (
  header 'true',
  inferSchema 'true',
  path '/content/prd_info.csv'
)
""")

In [None]:
# spark.sql() runs one statement per call, so DROP and CREATE are issued separately.
spark.sql("DROP TABLE IF EXISTS bronze.sales_details_csv")
spark.sql("""
CREATE TABLE bronze.sales_details_csv
USING csv
OPTIONS (
  header 'true',
  inferSchema 'true',
  path '/content/sales_details.csv'
)
""")

In [None]:
# Register the three remaining source files; one statement per spark.sql() call.
spark.sql("DROP TABLE IF EXISTS bronze.px_cat_g1v2_csv")
spark.sql("""
CREATE TABLE bronze.px_cat_g1v2_csv
USING csv
OPTIONS (header 'true', inferSchema 'true', path '/content/PX_CAT_G1V2.csv')
""")

spark.sql("DROP TABLE IF EXISTS bronze.cust_az12_csv")
spark.sql("""
CREATE TABLE bronze.cust_az12_csv
USING csv
OPTIONS (header 'true', inferSchema 'true', path '/content/CUST_AZ12.csv')
""")

spark.sql("DROP TABLE IF EXISTS bronze.loc_a101_csv")
spark.sql("""
CREATE TABLE bronze.loc_a101_csv
USING csv
OPTIONS (header 'true', inferSchema 'true', path '/content/LOC_A101.csv')
""")

#### Quick sanity checks on Bronze tables
We avoid any transformations here—only basic previews.

In [None]:
# .show() prints the rows; without it the cell only displays the DataFrame's schema summary.
spark.sql("""
SELECT * FROM bronze.cust_info_csv LIMIT 10
""").show()

In [None]:
spark.sql("""
SELECT * FROM bronze.prd_info_csv LIMIT 10
""").show()

In [None]:
spark.sql("""
SELECT * FROM bronze.sales_details_csv LIMIT 10
""").show()