Bronze Layer
Let's start with the first step of the medallion architecture: the Bronze layer. Setting up this layer involves the following steps:

1. Imports + Setup Logging

2. Load .env & MongoDB URI

3. MongoDB connection

4. CSV load

5. Insert into MongoDB

6. Preview records

In [8]:
# 🔹 Purpose: Load raw CSV and insert into MongoDB Bronze Layer

import pandas as pd
from pymongo import MongoClient
from dotenv import load_dotenv
import os
import logging

In [9]:
# Step 1: Setup & Logging

logging.basicConfig(level=logging.INFO, format="%(asctime)s - %(levelname)s - %(message)s")

load_dotenv()
MONGO_URI = os.getenv("MONGO_URI")
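
`os.getenv` returns `None` when the variable is missing, in which case the client may quietly target a default local host instead of Atlas. A minimal sketch of failing early is given here; the helper name `require_env` is hypothetical and not part of the original notebook.

```python
import os

def require_env(name, env=None):
    """Return the value of an environment variable, or raise if it is unset or empty."""
    env = os.environ if env is None else env
    value = env.get(name)
    if not value:
        raise RuntimeError(f"Environment variable {name!r} is not set; check your .env file")
    return value

# Assumed usage, after load_dotenv() has run:
# MONGO_URI = require_env("MONGO_URI")
```

Raising here keeps a configuration problem separate from a genuine network or authentication failure in the connection step.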

In [10]:
# Step 2: Connect to MongoDB
try:
    client = MongoClient(MONGO_URI)
    # MongoClient connects lazily, so ping the server to confirm it is reachable
    client.admin.command("ping")
    db = client["healthcare"]
    bronze_collection = db["heart_disease_bronze"]
    logging.info(" Connected to MongoDB Atlas")
except Exception:
    logging.error(" Failed to connect to MongoDB", exc_info=True)
    raise

2025-06-10 02:48:57,283 - INFO -  Connected to MongoDB Atlas


In [11]:
# Step 3: Load Raw Data

csv_path = "../data/heart_disease.csv"

try:
    df_raw = pd.read_csv(csv_path)
    logging.info(f" Loaded data with shape: {df_raw.shape}")
except FileNotFoundError:
    logging.error(f" CSV file not found at path: {csv_path}")
    raise
except Exception:
    logging.error(" Error reading CSV", exc_info=True)
    raise

2025-06-10 02:48:57,301 - INFO -  Loaded data with shape: (920, 16)


In [12]:
# Step 4: Insert into MongoDB

try:
    # Convert each DataFrame row into a dict so it can be stored as a document
    records = df_raw.to_dict(orient="records")
    bronze_collection.delete_many({})  # Clear previous data so reruns do not duplicate records
    bronze_collection.insert_many(records)
    logging.info(f"Inserted {len(records)} records into 'heart_disease_bronze' collection")
except Exception:
    logging.error("Failed to insert records into MongoDB", exc_info=True)
    raise

2025-06-10 02:49:00,889 - INFO - Inserted 920 records into 'heart_disease_bronze' collection
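
The insert step passes the output of `to_dict(orient="records")` straight to MongoDB. If the CSV contains empty cells, pandas represents them as float `NaN`, and they would be stored as `NaN` rather than `null`. Whether that matters depends on how later layers query the data. A minimal sketch of converting them first, using a hypothetical helper `nan_to_none` that is not part of the original notebook:

```python
import math

def nan_to_none(records):
    """Return a copy of the records with float NaN values replaced by None."""
    cleaned = []
    for record in records:
        cleaned.append({
            key: None if isinstance(value, float) and math.isnan(value) else value
            for key, value in record.items()
        })
    return cleaned

# Illustrative rows only; in the notebook the input would be `records`
sample = [
    {"id": 1, "chol": 233.0, "thal": "normal"},
    {"id": 2, "chol": float("nan"), "thal": float("nan")},
]
print(nan_to_none(sample))
```

For a Bronze layer that is meant to keep the data as raw as possible, leaving the values untouched is also a defensible choice.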


In [13]:
# Step 5: Preview

df_raw.head()

,id,age,sex,dataset,cp,trestbps,chol,fbs,restecg,thalch,exang,oldpeak,slope,ca,thal,num
0,1,63,Male,Cleveland,typical angina,145.0,233.0,True,lv hypertrophy,150.0,False,2.3,downsloping,0.0,fixed defect,0
1,2,67,Male,Cleveland,asymptomatic,160.0,286.0,False,lv hypertrophy,108.0,True,1.5,flat,3.0,normal,2
2,3,67,Male,Cleveland,asymptomatic,120.0,229.0,False,lv hypertrophy,129.0,True,2.6,flat,2.0,reversable defect,1
3,4,37,Male,Cleveland,non-anginal,130.0,250.0,False,normal,187.0,False,3.5,downsloping,0.0,normal,0
4,5,41,Female,Cleveland,atypical angina,130.0,204.0,False,lv hypertrophy,172.0,False,1.4,upsloping,0.0,normal,0


In [14]:
print(f" Inserted {len(records)} documents into heart_disease_bronze.")

 Inserted 920 documents into heart_disease_bronze.
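
The preview and the count above come from the in-memory DataFrame and the `records` list, not from MongoDB itself. A minimal sketch of reading the data back from the collection to confirm what was stored; `summarize_collection` is a hypothetical helper, and it assumes a PyMongo collection such as `bronze_collection`:

```python
def summarize_collection(collection, n=5):
    """Return the document count and the first n documents (without _id) from a collection."""
    total = collection.count_documents({})
    # The projection {"_id": 0} hides the ObjectId that MongoDB adds to each document
    preview = list(collection.find({}, {"_id": 0}).limit(n))
    return total, preview

# Assumed usage in the notebook:
# total, preview = summarize_collection(bronze_collection)
# pd.DataFrame(preview)
```

Comparing `total` with `len(records)` gives a quick check that the load completed as expected.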


Bronze Layer
- Raw heart disease dataset loaded from CSV (`heart_disease.csv`): 920 rows, 16 columns

- Connected to MongoDB Atlas using `pymongo`

- Stored as one document per row in the `heart_disease_bronze` collection of the `healthcare` database

- Code includes logging, error handling, and connection validation
