<a href="https://colab.research.google.com/drive/1d5HqYy7v3fKrU-zRBrxEW8AnAtk_gic8?usp=sharing" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 🏗️ Stage 0: Data Generation Pipeline & Quality Audit

**Project:** End-to-End Retail Data Engineering Platform
**Author:** Isabella Rondón

## 📌 Overview
This notebook automates the creation of the transactional database environment. Because Colab does not provide a persistent database, every run rebuilds it from scratch in four steps:
1.  **Install & Configure** a PostgreSQL server instance.
2.  **Define Schema:** Execute DDL commands to create the Relational Model.
3.  **Populate Data:** Use Python `Faker` to generate realistic records.
4.  **Audit:** Run SQL consistency checks to ensure Referential Integrity.

## 1. Install PostgreSQL and Python Libraries

In [None]:
# Update package lists to fix 404 errors
!sudo apt-get update > /dev/null

# Install Postgres Server
!sudo apt-get -y -q install postgresql postgresql-contrib libpq-dev > /dev/null

# Install Python drivers and Faker (note: we use jupysql here)
!pip install psycopg2-binary faker tqdm jupysql > /dev/null

print("✅ Installation complete.")
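
Because the install commands redirect their standard output to `/dev/null`, it can be worth confirming that the packages are importable before moving on. The following is a minimal sketch; `missing_modules` and `REQUIRED` are hypothetical names introduced for illustration, and `sql` is assumed to be the import name for jupysql, matching the `%load_ext sql` call used later in this notebook.

```python
import importlib.util


def missing_modules(names):
    """Return the module names from `names` that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Import names for the packages installed above (assumption: jupysql loads as `sql`).
REQUIRED = ["psycopg2", "faker", "tqdm", "sql"]

missing = missing_modules(REQUIRED)
print("Missing modules:", missing if missing else "none")
```

An empty result suggests the environment is ready; any listed name points at an install step that likely failed.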

## 2. Start Database Service & Configure User

In [None]:
# Start the service
!service postgresql start

# Create the database and configure the user 'postgres' with password 'postgres'
!sudo -u postgres psql -c "ALTER USER postgres PASSWORD 'postgres';"
!sudo -u postgres psql -c "CREATE DATABASE walgreens_dataset;"

print("✅ Database 'walgreens_dataset' is running on localhost.")
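
The service may need a moment before it accepts connections. As a hedged sketch, the helper below polls the TCP port until it opens or a timeout expires. `wait_for_port` is a hypothetical function, not part of the notebook, and it assumes the server listens on `localhost:5432`, the host and port used in `DB_CONFIG`.

```python
import socket
import time


def wait_for_port(host="localhost", port=5432, timeout=30.0, interval=0.5):
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # create_connection raises OSError while nothing is listening yet.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            time.sleep(interval)
    return False
```

Calling `wait_for_port()` before the first `psycopg2.connect` would turn a confusing connection error into an explicit `False`.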

## 3. Define Database Schema (DDL)

In [None]:
import psycopg2

# Configuration to connect to the Colab Postgres instance
DB_CONFIG = {
    'dbname': 'walgreens_dataset',
    'user': 'postgres',
    'password': 'postgres', # We set this in the previous step
    'host': 'localhost',
    'port': '5432'
}

ddl_script = """
-- Main Tables
CREATE TABLE Category (category_id SERIAL PRIMARY KEY, category_name VARCHAR(255), description VARCHAR(255));
CREATE TABLE Branch (branch_id SERIAL PRIMARY KEY, country VARCHAR(100), state VARCHAR(100), city VARCHAR(100), street VARCHAR(255), phone VARCHAR(50), active BOOLEAN);
CREATE TABLE Customer (customer_id SERIAL PRIMARY KEY, first_name VARCHAR(100), last_name VARCHAR(100), country VARCHAR(100), state VARCHAR(100), city VARCHAR(100), street VARCHAR(255), is_member BOOLEAN, phone VARCHAR(50), points_available INT);
CREATE TABLE Coupon (coupon_id SERIAL PRIMARY KEY, code VARCHAR(50), type VARCHAR(50), value INT, min_purchase INT, max_usage_global INT, max_usage_customer INT, valid_from DATE, valid_until DATE, status VARCHAR(50), description VARCHAR(255));
CREATE TABLE Payment_Method (payment_method_id SERIAL PRIMARY KEY, method VARCHAR(100), active BOOLEAN);

-- Dependent Tables
CREATE TABLE Product (product_id SERIAL PRIMARY KEY, category_id INT REFERENCES Category(category_id), name VARCHAR(255), description VARCHAR(255), brand VARCHAR(100), unit_price FLOAT, stock INT, active BOOLEAN);
CREATE TABLE Employee (employee_id SERIAL PRIMARY KEY, branch_id INT REFERENCES Branch(branch_id), first_name VARCHAR(100), last_name VARCHAR(100), country VARCHAR(100), state VARCHAR(100), city VARCHAR(100), street VARCHAR(255), phone VARCHAR(50), position VARCHAR(100), active BOOLEAN);
CREATE TABLE Sale (sale_id SERIAL PRIMARY KEY, employee_id INT REFERENCES Employee(employee_id), customer_id INT REFERENCES Customer(customer_id), branch_id INT REFERENCES Branch(branch_id), date DATE, channel VARCHAR(100), points_generated INT);
CREATE TABLE Sale_Detail (detail_id SERIAL PRIMARY KEY, product_id INT REFERENCES Product(product_id), sale_id INT REFERENCES Sale(sale_id), quantity INT);
CREATE TABLE Redeemed_Coupon (redemption_id SERIAL PRIMARY KEY, sale_id INT REFERENCES Sale(sale_id), coupon_id INT REFERENCES Coupon(coupon_id), amount FLOAT);
CREATE TABLE Invoice (invoice_id SERIAL PRIMARY KEY, sale_id INT REFERENCES Sale(sale_id));
CREATE TABLE Invoice_Detail (invoice_detail_id SERIAL PRIMARY KEY, invoice_id INT REFERENCES Invoice(invoice_id), payment_method_id INT REFERENCES Payment_Method(payment_method_id), payment_date DATE, subtotal FLOAT, tax FLOAT, total FLOAT, discount FLOAT);
CREATE TABLE Points_Movement (movement_id SERIAL PRIMARY KEY, customer_id INT REFERENCES Customer(customer_id), sale_id INT REFERENCES Sale(sale_id), points INT);
"""

conn = None  # Defined up front so the finally block is safe if connect() fails
try:
    conn = psycopg2.connect(**DB_CONFIG)
    cursor = conn.cursor()
    cursor.execute(ddl_script)
    conn.commit()
    print("✅ Tables created successfully.")
except Exception as e:
    print(f"Error: {e}")
finally:
    if conn:
        conn.close()
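
The DDL lists main tables before dependent ones because a `REFERENCES` clause needs its target table to exist already. The sketch below checks that ordering on a DDL string without touching the database. `check_creation_order` is a hypothetical helper, and its regular expressions only cover the simple inline `REFERENCES Table(column)` style used in this schema.

```python
import re


def check_creation_order(ddl):
    """Return (table, referenced_table) pairs where the reference precedes its definition."""
    created = set()
    problems = []
    # Split on semicolons so each CREATE TABLE statement is examined in order.
    for statement in ddl.split(";"):
        match = re.search(r"CREATE\s+TABLE\s+(\w+)", statement, re.IGNORECASE)
        if not match:
            continue
        table = match.group(1)
        for ref in re.findall(r"REFERENCES\s+(\w+)", statement, re.IGNORECASE):
            # Self-references are allowed; anything else must already exist.
            if ref not in created and ref != table:
                problems.append((table, ref))
        created.add(table)
    return problems


sample_ddl = """
CREATE TABLE Category (category_id SERIAL PRIMARY KEY);
CREATE TABLE Product (product_id SERIAL PRIMARY KEY, category_id INT REFERENCES Category(category_id));
"""
print(check_creation_order(sample_ddl))  # → []
```

Passing `ddl_script` to the same function would list any table that references another table created later in the script.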

# ⚙️ Configuration & Scaling
**Customize your scenario:**

This pipeline is fully parametric. You can adjust the dictionary below to simulate different operational scales, from a ***small pilot test*** to a ***high-volume stress test***.

* **Demo Mode:** Keep default values for quick execution.
* **Stress Test:** Increase `'sales'` to `100000` or more to test database performance under load.

### Define Dataset Volume

In [None]:

# Feel free to modify these integers to generate more data.
QUANTITIES = {
    'branches': 50,         # Number of physical stores
    'employees': 200,       # Total staff across all branches
    'categories': 30,       # Product categories (fixed: the script inserts CATEGORIES_LIST)
    'products': 1000,       # Total SKU count
    'customers': 2000,      # Customers (is_member is assigned at random)
    'coupons': 50,          # Active promotional codes
    'payment_methods': 5,   # Cash, credit, etc. (fixed: the script inserts PAYMENT_METHODS_LIST)
    'sales': 6000           # Total transactions to generate
}

print(f"⚙️ Configuration loaded. Will generate {QUANTITIES['sales']} transactions.")
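
To move between demo and stress-test scale without editing each value by hand, the counts can be multiplied by a single factor. This is a sketch only: `scale_quantities` is a hypothetical helper, and it leaves `categories` and `payment_methods` unchanged because the population script inserts those from fixed lists.

```python
def scale_quantities(base, factor, fixed=("categories", "payment_methods")):
    """Return a copy of `base` with every non-fixed count multiplied by `factor`."""
    return {
        key: value if key in fixed else max(1, round(value * factor))
        for key, value in base.items()
    }


# A reduced copy of the configuration, used here only for illustration.
demo = {"branches": 50, "categories": 30, "payment_methods": 5, "sales": 6000}
stress = scale_quantities(demo, 20)
print(stress)
# → {'branches': 1000, 'categories': 30, 'payment_methods': 5, 'sales': 120000}
```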

## 4. Run Data Population Script (Faker)

In [None]:
import psycopg2
import psycopg2.extras
from faker import Faker
import random
from datetime import datetime, timedelta

# --- COLAB DB CONFIGURATION ---
DB_CONFIG = {
    'dbname': 'walgreens_dataset',
    'user': 'postgres',
    'password': 'postgres',
    'host': 'localhost',
    'port': '5432'
}

# --- REALISTIC DATA FOR WALGREENS (USA Context) ---
WALGREENS_STATES = ['FL', 'TX', 'CA', 'IL', 'NY', 'PA', 'NC', 'GA', 'OH', 'MI']
PAYMENT_METHODS_LIST = ['Visa Credit Card', 'Mastercard Credit Card', 'Debit Card', 'Walgreens Pay', 'Cash']
EMPLOYEE_POSITIONS = ['Cashier', 'Pharmacist', 'Pharmacy Technician', 'Shift Lead', 'Store Manager', 'Beauty Consultant', 'Customer Service Associate']

# 30 Categories
CATEGORIES_LIST = [
    "Prescription Drugs (Rx)", "Over-the-Counter (OTC)", "Pain Relief", "Vitamins & Supplements",
    "Digestive Health", "Allergy & Sinus", "Wound Care", "First Aid",
    "Skin Care", "Hair Care", "Oral Hygiene", "Feminine Care",
    "Deodorants", "Makeup", "Facial Care", "Fragrances", "Baby Formula & Food",
    "Diapers & Wipes", "Baby Care", "Household Cleaning", "Paper & Plastic",
    "Batteries & Bulbs", "Snacks", "Beverages", "Candy", "Frozen Food",
    "Breakfast & Cereal", "Photo & Electronics", "Contact Lenses", "Cards & Gifts"
]

# Product Templates (English)
PRODUCT_TEMPLATES = {
    "Prescription Drugs (Rx)": ["Metformin 500mg", "Lisinopril 10mg", "Atorvastatin 20mg", "Amlodipine 5mg", "Losartan 50mg"],
    "Over-the-Counter (OTC)": ["Allegra 24hr", "Zyrtec 10mg", "Claritin 24hr", "Pepto-Bismol", "Tums Antacid"],
    "Pain Relief": ["Ibuprofen 200mg (200 ct)", "Advil Liqui-Gels", "Tylenol Extra Strength 500mg", "Aleve PM", "Motrin IB"],
    "Vitamins & Supplements": ["Vitamin C 1000mg", "Vitamin D3 2000IU", "Omega-3 Fish Oil", "Multivitamin (Men)", "Melatonin 5mg"],
    "Digestive Health": ["Probiotic 10 Billion", "Metamucil Fiber", "Lactaid Pills", "Imodium A-D", "Dulcolax Laxative"],
    "Allergy & Sinus": ["Flonase Nasal Spray", "Benadryl Allergy", "Sudafed PE Congestion", "Xyzal 24hr", "Nasacort 24hr"],
    "Wound Care": ["Band-Aid Variety Pack", "Neosporin Ointment", "Isopropyl Alcohol", "Sterile Gauze Pads", "Medical Tape"],
    "First Aid": ["First Aid Kit", "Digital Thermometer", "ACE Elastic Bandage", "Medical Scissors", "Reusable Ice Pack"],
    "Skin Care": ["CeraVe Moisturizing Cream", "Neutrogena Hydro Boost", "Vaseline Petroleum Jelly", "Aquaphor Healing Ointment", "Gold Bond Lotion"],
    "Hair Care": ["Head & Shoulders Shampoo", "Pantene Conditioner", "L'Oréal Hair Color", "TRESemmé Hairspray", "Batiste Dry Shampoo"],
    "Oral Hygiene": ["Crest 3D White Toothpaste", "Listerine Mouthwash", "Oral-B Glide Floss", "Philips Sonicare Toothbrush", "Baking Soda"],
    "Feminine Care": ["Tampax Pearl Tampons", "Always Ultra Thin Pads", "Carefree Pantiliners", "DivaCup Menstrual Cup", "Summer's Eve Wash"],
    "Deodorants": ["Dove Antiperspirant", "Old Spice Deodorant", "Secret Clinical Strength", "Degree Men", "Native Natural Deodorant"],
    "Makeup": ["Maybelline Great Lash Mascara", "L'Oréal Infallible Foundation", "NYX Brow Pencil", "Revlon Lipstick", "e.l.f. Concealer"],
    "Facial Care": ["CeraVe Facial Cleanser", "Garnier Micellar Water", "St. Ives Apricot Scrub", "Thayers Witch Hazel Toner", "Bioré Pore Strips"],
    "Fragrances": ["Chanel No. 5", "Dior Sauvage", "Versace Bright Crystal", "Polo Blue", "Ariana Grande Cloud"],
    "Baby Formula & Food": ["Enfamil NeuroPro Formula", "Similac Pro-Advance", "Gerber Puffs", "Organic Apple Sauce (Baby)", "Nursery Water"],
    "Diapers & Wipes": ["Pampers Swaddlers (Size 1)", "Huggies Little Movers (Size 3)", "WaterWipes", "Pampers Sensitive Wipes"],
    "Baby Care": ["Johnson's Baby Shampoo", "Desitin Diaper Rash Cream", "Avent Pacifier", "Dr. Brown's Bottle", "VTech Baby Monitor"],
    "Household Cleaning": ["Clorox Wipes", "Lysol Disinfectant Spray", "Tide Liquid Detergent", "Downy Fabric Softener", "Windex Glass Cleaner"],
    "Paper & Plastic": ["Charmin Ultra Soft (12 Rolls)", "Bounty Paper Towels (6 Rolls)", "Glad Trash Bags (13 Gal)", "Disposable Paper Plates", "Solo Cups (50 ct)"],
    "Batteries & Bulbs": ["Duracell AA Batteries (16 pk)", "Energizer AAA Batteries (12 pk)", "GE LED Bulb 60W", "Philips Hue Smart Bulb", "CR2032 Lithium Batteries"],
    "Snacks": ["Lay's Classic Potato Chips", "Doritos Nacho Cheese", "Cheetos Crunchy", "Oreo Cookies", "Pringles Original"],
    "Beverages": ["Coca-Cola 2L", "Pepsi 2L", "Bottled Water 1L", "Orange Juice 1.5L", "Gatorade Fruit Punch"],
    "Candy": ["M&M's Milk Chocolate", "Snickers Bar", "Haribo Goldbears", "Reese's Peanut Butter Cups", "Skittles Original"],
    "Frozen Food": ["DiGiorno Pepperoni Pizza", "Ben & Jerry's Ice Cream", "Tyson Chicken Nuggets", "Stouffer's Mac & Cheese", "Hot Pockets"],
    "Breakfast & Cereal": ["Cheerios Cereal", "Frosted Flakes", "Quaker Oats", "Folgers Classic Roast Coffee", "Nutella Hazelnut Spread"],
    "Photo & Electronics": ["USB-C Cable (1m)", "Anker Wall Charger", "Apple EarPods", "SanDisk 64GB MicroSD", "Portable Power Bank"],
    "Contact Lenses": ["Renu Contact Solution", "Saline Solution", "Lubricating Eye Drops", "Contact Lens Case"],
    "Cards & Gifts": ["Amazon Gift Card $25", "Starbucks Gift Card $15", "Hallmark Birthday Card", "PlayStation Store Card $20", "Generic Product"]
}

# Product Descriptions Map (English)
PRODUCT_DESCRIPTIONS = {
    "Metformin 500mg": "Glucose control for type 2 diabetes.", "Lisinopril 10mg": "Treatment for high blood pressure.",
    "Atorvastatin 20mg": "Lowers high cholesterol.", "Amlodipine 5mg": "Treatment for high blood pressure and chest pain.",
    "Losartan 50mg": "Treatment for high blood pressure.", "Allegra 24hr": "Non-drowsy allergy relief. 24 hours.",
    "Zyrtec 10mg": "24-hour allergy relief, indoor and outdoor.", "Claritin 24hr": "Non-drowsy allergy relief.",
    "Pepto-Bismol": "Relief for indigestion, heartburn, and upset stomach.", "Tums Antacid": "Chewable antacid for fast heartburn relief.",
    "Ibuprofen 200mg (200 ct)": "Temporary relief of pain and fever. 200 tablets.", "Advil Liqui-Gels": "Liquid filled capsules for fast pain relief.",
    "Tylenol Extra Strength 500mg": "Pain reliever and fever reducer.", "Aleve PM": "Sleep aid plus pain relief for nighttime.",
    "Motrin IB": "Ibuprofen tablets for pain relief.", "Vitamin C 1000mg": "Immune support supplement.",
    "Vitamin D3 2000IU": "Support for bone and immune health.", "Omega-3 Fish Oil": "Fish oil supplement for heart health.",
    "Multivitamin (Men)": "Complete daily multivitamin for men.", "Melatonin 5mg": "Natural sleep aid supplement.",
    "Probiotic 10 Billion": "Supports digestive health and intestinal balance.", "Metamucil Fiber": "Psyllium fiber supplement for regularity.",
    "Lactaid Pills": "Helps digest dairy products.", "Imodium A-D": "Anti-diarrheal control.",
    "Dulcolax Laxative": "Gentle and predictable constipation relief.", "Flonase Nasal Spray": "24-hour allergy relief nasal spray.",
    "Benadryl Allergy": "Antihistamine for allergy and cold relief.", "Sudafed PE Congestion": "Non-drowsy nasal decongestant.",
    "Xyzal 24hr": "24-hour allergy relief tablets.", "Nasacort 24hr": "No-drip allergy nasal spray.",
    "Band-Aid Variety Pack": "Assorted sizes of adhesive bandages.", "Neosporin Ointment": "Antibiotic protection for minor wounds.",
    "Isopropyl Alcohol": "First aid antiseptic.", "Sterile Gauze Pads": "Sterile pads for wound cleaning and covering.",
    "Medical Tape": "Paper tape for securing bandages.", "First Aid Kit": "Compact kit with essential emergency supplies.",
    "Digital Thermometer": "Fast read oral, rectal, or underarm thermometer.", "ACE Elastic Bandage": "Compression bandage with clips.",
    "Medical Scissors": "Shears for cutting bandages and clothing.", "Reusable Ice Pack": "Flexible ice pack for cold therapy.",
    "CeraVe Moisturizing Cream": "Daily moisturizing cream for dry skin.", "Neutrogena Hydro Boost": "Hydrating water gel with hyaluronic acid.",
    "Vaseline Petroleum Jelly": "100% pure white petroleum jelly.", "Aquaphor Healing Ointment": "Advanced therapy for dry, cracked skin.",
    "Gold Bond Lotion": "Medicated body lotion for itch relief.", "Head & Shoulders Shampoo": "Dandruff shampoo for daily use.",
    "Pantene Conditioner": "Repairing and hydrating hair conditioner.", "L'Oréal Hair Color": "Permanent hair color kit for home use.",
    "TRESemmé Hairspray": "Extra hold hair spray.", "Batiste Dry Shampoo": "Dry shampoo to refresh hair between washes.",
    "Crest 3D White Toothpaste": "Whitening toothpaste with fluoride.", "Listerine Mouthwash": "Antiseptic mouthwash for fresh breath.",
    "Oral-B Glide Floss": "Mint flavored dental floss.", "Philips Sonicare Toothbrush": "Rechargeable electric toothbrush.",
    "Baking Soda": "Pure baking soda for cleaning and deodorizing.", "Tampax Pearl Tampons": "Tampons with plastic applicator.",
    "Always Ultra Thin Pads": "Ultra thin sanitary pads with wings.", "Carefree Pantiliners": "Thin daily pantiliners.",
    "DivaCup Menstrual Cup": "Reusable silicone menstrual cup.", "Summer's Eve Wash": "Hypoallergenic feminine cleansing wash.",
    "Dove Antiperspirant": "Stick antiperspirant, 48h protection.", "Old Spice Deodorant": "Men's deodorant stick, fresh scent.",
    "Secret Clinical Strength": "Clinical strength soft solid antiperspirant.", "Degree Men": "MotionSense antiperspirant stick for sports.",
    "Native Natural Deodorant": "Aluminum and paraben free deodorant.", "Maybelline Great Lash Mascara": "Washable mascara, very black.",
    "L'Oréal Infallible Foundation": "Longwear liquid foundation.", "NYX Brow Pencil": "Micro brow pencil with spoolie.",
    "Revlon Lipstick": "Creamy lipstick, various shades.", "e.l.f. Concealer": "High coverage liquid concealer.",
    "CeraVe Facial Cleanser": "Hydrating facial cleanser for normal to dry skin.", "Garnier Micellar Water": "All-in-one micellar cleansing water.",
    "St. Ives Apricot Scrub": "Apricot facial scrub for exfoliation.", "Thayers Witch Hazel Toner": "Alcohol-free facial toner with aloe vera.",
    "Bioré Pore Strips": "Deep cleansing nose strips.", "Chanel No. 5": "Classic floral fragrance for women.",
    "Dior Sauvage": "Fresh and spicy cologne for men.", "Versace Bright Crystal": "Floral and fruity perfume for women.",
    "Polo Blue": "Classic aquatic cologne for men.", "Ariana Grande Cloud": "Sweet and airy perfume for women.",
    "Enfamil NeuroPro Formula": "Infant formula powder, 0-12 months.", "Similac Pro-Advance": "Infant formula with HMO, 0-12 months.",
    "Gerber Puffs": "Cereal snack for crawlers, strawberry apple.", "Organic Apple Sauce (Baby)": "Organic apple puree pouch.",
    "Nursery Water": "Purified water with minerals for mixing formula.", "Pampers Swaddlers (Size 1)": "Disposable diapers, size 1.",
    "Huggies Little Movers (Size 3)": "Active baby diapers, size 3.", "WaterWipes": "Baby wipes made with 99.9% water.",
    "Pampers Sensitive Wipes": "Hypoallergenic wipes for sensitive skin.", "Johnson's Baby Shampoo": "Tear-free baby shampoo.",
    "Desitin Diaper Rash Cream": "Rapid relief cream for diaper rash.", "Avent Pacifier": "Silicone pacifier, 0-6 months.",
    "Dr. Brown's Bottle": "Anti-colic baby bottle, 8 oz.", "VTech Baby Monitor": "Digital audio baby monitor.",
    "Clorox Wipes": "Disinfecting wipes, lemon scent.", "Lysol Disinfectant Spray": "Kills 99.9% of viruses and bacteria.",
    "Tide Liquid Detergent": "Original scent liquid laundry detergent.", "Downy Fabric Softener": "Liquid fabric conditioner, April Fresh.",
    "Windex Glass Cleaner": "Streak-free shine for glass and surfaces.", "Charmin Ultra Soft (12 Rolls)": "Ultra soft toilet paper, mega rolls.",
    "Bounty Paper Towels (6 Rolls)": "Absorbent paper towels, select-a-size.", "Glad Trash Bags (13 Gal)": "ForceFlex tall kitchen drawstring bags.",
    "Disposable Paper Plates": "Heavy duty paper plates, 50 count.", "Solo Cups (50 ct)": "Red plastic party cups, 16 oz.",
    "Duracell AA Batteries (16 pk)": "Alkaline AA batteries, long lasting.", "Energizer AAA Batteries (12 pk)": "Max alkaline AAA batteries.",
    "GE LED Bulb 60W": "Soft white LED general purpose bulb.", "Philips Hue Smart Bulb": "White and color ambiance LED bulb.",
    "CR2032 Lithium Batteries": "3V lithium coin batteries, 2 pack.", "Lay's Classic Potato Chips": "Classic salted potato chips.",
    "Doritos Nacho Cheese": "Nacho cheese flavored tortilla chips.", "Cheetos Crunchy": "Crunchy cheese flavored snacks.",
    "Oreo Cookies": "Chocolate sandwich cookies with cream filling.", "Pringles Original": "Original flavored potato crisps in a can.",
    "Coca-Cola 2L": "Carbonated cola soft drink. 2 Liter bottle.", "Pepsi 2L": "Carbonated cola soft drink. 2 Liter bottle.",
    "Bottled Water 1L": "Purified drinking water, 1 liter.", "Orange Juice 1.5L": "100% pure orange juice, pulp free.",
    "Gatorade Fruit Punch": "Thirst quencher sports drink.", "M&M's Milk Chocolate": "Milk chocolate candies with candy shell.",
    "Snickers Bar": "Chocolate bar with peanuts, caramel, and nougat.", "Haribo Goldbears": "Gummy bears, assorted fruit flavors.",
    "Reese's Peanut Butter Cups": "Milk chocolate cups with peanut butter.", "Skittles Original": "Chewy fruit flavored candies.",
    "DiGiorno Pepperoni Pizza": "Rising crust frozen pepperoni pizza.", "Ben & Jerry's Ice Cream": "Cherry Garcia ice cream pint.",
    "Tyson Chicken Nuggets": "Breaded chicken nuggets.", "Stouffer's Mac & Cheese": "Frozen macaroni and cheese.",
    "Hot Pockets": "Frozen sandwiches, pepperoni pizza.", "Cheerios Cereal": "Whole grain oat cereal, gluten free.",
    "Frosted Flakes": "Sugar frosted corn flakes cereal.", "Quaker Oats": "Old fashioned rolled oats.",
    "Folgers Classic Roast Coffee": "Medium roast ground coffee.", "Nutella Hazelnut Spread": "Hazelnut spread with cocoa.",
    "USB-C Cable (1m)": "USB-C to USB-C charging and data cable.", "Anker Wall Charger": "Dual port USB wall charger.",
    "Apple EarPods": "Wired headphones with Lightning connector.", "SanDisk 64GB MicroSD": "Memory card with adapter.",
    "Portable Power Bank": "10000mAh external battery pack.", "Renu Contact Solution": "Multi-purpose contact lens solution.",
    "Saline Solution": "Sterile saline solution for rinsing.", "Lubricating Eye Drops": "Relief for dry eyes.",
    "Contact Lens Case": "Screw-top contact lens storage case.", "Amazon Gift Card $25": "Gift card redeemable at Amazon.com.",
    "Starbucks Gift Card $15": "Gift card redeemable at Starbucks.", "Hallmark Birthday Card": "Happy Birthday greeting card.",
    "PlayStation Store Card $20": "Gift card for PlayStation Network.", "Generic Product": "Generic description for this item."
}

# -----------------------------------------------------------------

# Initialize Faker for USA
fake = Faker('en_US')

def connect_db():
    """Establishes a connection to the PostgreSQL database."""
    try:
        conn = psycopg2.connect(**DB_CONFIG)
        print("Database connection successful.")
        return conn
    except psycopg2.Error as e:
        print(f"Error connecting to database: {e}")
        raise  # Let main() handle the failure; exit() is meant for scripts, not notebook cells

def populate_simple_tables(cursor):
    """Populates tables without foreign keys."""

    print("Populating Payment Methods...")
    payment_method_ids = []
    for name in PAYMENT_METHODS_LIST:
        cursor.execute(
            "INSERT INTO Payment_Method (method, active) VALUES (%s, %s) RETURNING payment_method_id",
            (name, True)
        )
        payment_method_ids.append(cursor.fetchone()[0])

    print("Populating Coupons...")
    coupon_ids = []
    for _ in range(QUANTITIES['coupons']):
        cursor.execute(
            """
            INSERT INTO Coupon (code, type, value, min_purchase, max_usage_global, max_usage_customer, valid_from, valid_until, status, description)
            VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s, %s) RETURNING coupon_id
            """,
            (
                fake.unique.bothify(text='PROMO-????-####'), 'Percentage', random.randint(5, 20), random.randint(20, 50),
                random.randint(1000, 5000), random.randint(1, 3), fake.past_date(start_date='-1y'),
                fake.future_date(end_date='+1y'), 'Active', fake.sentence(nb_words=6)
            )
        )
        coupon_ids.append(cursor.fetchone()[0])

    print("Populating Categories...")
    category_ids = []
    for name in CATEGORIES_LIST:
        cursor.execute(
            "INSERT INTO Category (category_name, description) VALUES (%s, %s) RETURNING category_id",
            (name, fake.sentence(nb_words=4))
        )
        category_ids.append(cursor.fetchone()[0])

    print("Populating Branches...")
    branch_ids = []
    for _ in range(QUANTITIES['branches']):
        cursor.execute(
            """
            INSERT INTO Branch (country, state, city, street, phone, active)
            VALUES (%s, %s, %s, %s, %s, %s) RETURNING branch_id
            """,
            (
                'USA', random.choice(WALGREENS_STATES), fake.city(),
                fake.street_address(), fake.phone_number(), True
            )
        )
        branch_ids.append(cursor.fetchone()[0])

    print("Populating Customers...")
    customer_ids = []
    # Removed TQDM (Progress Bar) for GitHub compatibility
    for _ in range(QUANTITIES['customers']):
        cursor.execute(
            """
            INSERT INTO Customer (first_name, last_name, country, state, city, street, is_member, phone, points_available)
            VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s) RETURNING customer_id
            """,
            (
                fake.first_name(), fake.last_name(), 'USA', random.choice(WALGREENS_STATES),
                fake.city(), fake.street_address(), random.choice([True, False]),
                fake.phone_number(), random.randint(0, 5000)
            )
        )
        customer_ids.append(cursor.fetchone()[0])

    return payment_method_ids, coupon_ids, category_ids, branch_ids, customer_ids

def populate_dependent_tables(cursor, category_ids, branch_ids):
    """
    Populates Employee (depends on Branch) and Product (depends on Category).
    Returns a map of employees per branch.
    """

    print("Populating Employees...")

    employees_per_branch = {branch_id: [] for branch_id in branch_ids}

    # Removed TQDM (Progress Bar) for GitHub compatibility
    for _ in range(QUANTITIES['employees']):
        assigned_branch = random.choice(branch_ids)

        cursor.execute(
            """
            INSERT INTO Employee (branch_id, first_name, last_name, country, state, city, street, phone, position, active)
            VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s, %s) RETURNING employee_id
            """,
            (
                assigned_branch, fake.first_name(), fake.last_name(), 'USA',
                random.choice(WALGREENS_STATES), fake.city(), fake.street_address(),
                fake.phone_number(), random.choice(EMPLOYEE_POSITIONS), True
            )
        )
        employee_id = cursor.fetchone()[0]
        employees_per_branch[assigned_branch].append(employee_id)

    print("Populating Products...")
    product_data = {} # Dict to store {id: price}

    # Map category IDs to names
    cat_id_to_name = dict(zip(category_ids, CATEGORIES_LIST))

    # Removed TQDM (Progress Bar) for GitHub compatibility
    for _ in range(QUANTITIES['products']):
        cat_id = random.choice(category_ids)
        cat_name = cat_id_to_name[cat_id]

        # --- ENGLISH LOGIC ---
        templates = PRODUCT_TEMPLATES.get(cat_name, ["Generic Product"])
        product_name = random.choice(templates)

        description = PRODUCT_DESCRIPTIONS.get(product_name, "Description not available.")

        brand = random.choice(["Walgreens", "Nice!", "Finest Nutrition", "Advil", "Tylenol", "CeraVe", "Dove", "L'Oréal", "Coca-Cola", "PepsiCo"])
        # ---------------------

        price = round(random.uniform(1.99, 89.99), 2)
        stock = random.randint(50, 500)

        cursor.execute(
            """
            INSERT INTO Product (category_id, name, description, brand, unit_price, stock, active)
            VALUES (%s, %s, %s, %s, %s, %s, %s) RETURNING product_id
            """,
            (
                cat_id, product_name, description, brand, price, stock, True
            )
        )
        product_id = cursor.fetchone()[0]
        product_data[product_id] = price

    return employees_per_branch, product_data

def populate_transactions(cursor, customer_ids, employees_per_branch, branch_ids, product_data, payment_method_ids, coupon_ids):
    """
    Populates transactional core: Sale, Sale_Detail, Invoice, etc.
    """

    print(f"Populating {QUANTITIES['sales']} Sales and details...")

    product_ids = list(product_data.keys())

    print("Generating customer activity distribution (long tail)...")
    customer_weights = [random.lognormvariate(0, 1.5) for _ in range(len(customer_ids))]

    print("Generating branch performance distribution (long tail)...")
    branch_weights = [random.lognormvariate(0, 1.5) for _ in range(len(branch_ids))]

    # Removed TQDM (Progress Bar) for GitHub compatibility
    for _ in range(QUANTITIES['sales']):
        try:
            # --- 1. Create Sale ---
            sale_date = fake.date_time_between(start_date='-2y', end_date='now')
            customer_id = random.choices(customer_ids, weights=customer_weights, k=1)[0]

            # --- Branch and Employee Logic (fixed) ---
            branch_id = random.choices(branch_ids, weights=branch_weights, k=1)[0]
            valid_employees = employees_per_branch[branch_id]

            if not valid_employees:
                alt_branch = random.choice([b for b in branch_ids if employees_per_branch[b]])
                valid_employees = employees_per_branch[alt_branch]
                branch_id = alt_branch # Fix Geo-Consistency

            employee_id = random.choice(valid_employees)

            points = random.randint(10, 100)

            cursor.execute(
                """
                INSERT INTO Sale (employee_id, customer_id, branch_id, date, channel, points_generated)
                VALUES (%s, %s, %s, %s, %s, %s) RETURNING sale_id
                """,
                (
                    employee_id, customer_id, branch_id, sale_date,
                    random.choice(['Store', 'Online', 'App']), points
                )
            )
            sale_id = cursor.fetchone()[0]

            # --- 2. Create Sale Details ---
            num_products = random.randint(1, 6)
            sale_subtotal = 0.0
            details_to_insert = []

            selected_products = random.sample(product_ids, num_products)

            for prod_id in selected_products:
                quantity = random.randint(1, 3)
                price = product_data[prod_id]
                sale_subtotal += (price * quantity)
                details_to_insert.append((prod_id, sale_id, quantity))

            psycopg2.extras.execute_batch(
                cursor,
                "INSERT INTO Sale_Detail (product_id, sale_id, quantity) VALUES (%s, %s, %s)",
                details_to_insert
            )

            # --- 3. Create Invoice ---
            cursor.execute(
                "INSERT INTO Invoice (sale_id) VALUES (%s) RETURNING invoice_id",
                (sale_id,)
            )
            invoice_id = cursor.fetchone()[0]

            # --- 4. Apply Coupon (Optional) ---
            discount_amount = 0.0
            if random.random() < 0.20:
                coupon_id = random.choice(coupon_ids)
                discount_amount = round(sale_subtotal * 0.10, 2)

                cursor.execute(
                    "INSERT INTO Redeemed_Coupon (sale_id, coupon_id, amount) VALUES (%s, %s, %s)",
                    (sale_id, coupon_id, discount_amount)
                )

            # --- 5. Create Invoice Detail (Payment) ---
            tax_amount = round(sale_subtotal * 0.08, 2)
            total_amount = round(sale_subtotal + tax_amount - discount_amount, 2)

            cursor.execute(
                """
                INSERT INTO Invoice_Detail (invoice_id, payment_method_id, payment_date, subtotal, tax, total, discount)
                VALUES (%s, %s, %s, %s, %s, %s, %s)
                """,
                (
                    invoice_id, random.choice(payment_method_ids), sale_date + timedelta(minutes=random.randint(1, 5)),
                    round(sale_subtotal, 2), tax_amount, total_amount, discount_amount
                )
            )

            # --- 6. Points Movement ---
            if points > 0:
                cursor.execute(
                    "INSERT INTO Points_Movement (customer_id, sale_id, points) VALUES (%s, %s, %s)",
                    (customer_id, sale_id, points * -1)
                )

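        except Exception as e:
            # A failed statement aborts the open PostgreSQL transaction, and a
            # full rollback here would also discard the master data inserted
            # earlier, so stop and let main() roll back and report the error.
            print(f"Error generating transaction: {e}. Aborting.")
            raise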

def main():
    conn = None
    try:
        conn = connect_db()
        cursor = conn.cursor()

        # --- Clear Tables and Reset IDs (English Table Names) ---
        print("--- [!] CLEARING AND RESETTING TABLES (RESTART IDENTITY) ---")
        cursor.execute("""
            TRUNCATE TABLE
                Category, Branch, Customer, Coupon, Payment_Method,
                Product, Employee, Sale, Sale_Detail, Redeemed_Coupon,
                Invoice, Invoice_Detail, Points_Movement
            RESTART IDENTITY CASCADE;
        """)
        print("--- Tables cleared and sequences reset to 1. ---")
        # -----------------------------------------------

        print("--- Starting Database Population ---")

        # 1. Populate simple master tables
        payment_method_ids, coupon_ids, category_ids, branch_ids, customer_ids = populate_simple_tables(cursor)

        # 2. Populate dependent tables
        employees_per_branch, product_data = populate_dependent_tables(cursor, category_ids, branch_ids)

        # 3. Populate transactional core
        populate_transactions(
            cursor, customer_ids, employees_per_branch, branch_ids,
            product_data, payment_method_ids, coupon_ids
        )

        print("\n--- Committing changes (COMMIT) ---")
        conn.commit()
        print("\nSUCCESS! The database has been populated successfully.")

    except psycopg2.Error as e:
        print(f"\nError during population: {e}")
        if conn:
            print("Rolling back changes (ROLLBACK)...")
            conn.rollback()

    finally:
        if conn:
            cursor.close()
            conn.close()
            print("Database connection closed.")

if __name__ == "__main__":
    main()
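
The transaction loop handles errors per sale. In PostgreSQL a failed statement aborts the open transaction, and a plain rollback discards everything not yet committed, so one way to skip only the failing sale is a savepoint. The sketch below shows that pattern with Python's built-in `sqlite3` as a stand-in for PostgreSQL, so it runs without a server. The `sale` table and the `one_sale` savepoint name are illustrative and not part of the notebook's schema.

```python
import sqlite3

# isolation_level=None lets the script issue BEGIN/COMMIT explicitly.
conn = sqlite3.connect(":memory:", isolation_level=None)
cur = conn.cursor()
cur.execute("CREATE TABLE sale (sale_id INTEGER PRIMARY KEY, amount REAL NOT NULL)")

rows = [10.0, None, 25.5]  # the None row violates NOT NULL and should be skipped

cur.execute("BEGIN")
for amount in rows:
    cur.execute("SAVEPOINT one_sale")
    try:
        cur.execute("INSERT INTO sale (amount) VALUES (?)", (amount,))
        cur.execute("RELEASE SAVEPOINT one_sale")
    except sqlite3.Error as exc:
        # Undo only this sale; earlier inserts in the transaction survive.
        cur.execute("ROLLBACK TO SAVEPOINT one_sale")
        cur.execute("RELEASE SAVEPOINT one_sale")
        print(f"Skipped one sale: {exc}")
cur.execute("COMMIT")

inserted = cur.execute("SELECT COUNT(*) FROM sale").fetchone()[0]
print(inserted)  # → 2
```

PostgreSQL accepts the same `SAVEPOINT`, `ROLLBACK TO SAVEPOINT`, and `RELEASE SAVEPOINT` statements, so the pattern could be issued through the existing psycopg2 cursor around each sale.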

# 🔍 Data Quality & Consistency Audit
We run a series of SQL queries to verify referential integrity and business-logic consistency.
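
The orphan checks rely on an anti-join: a `LEFT JOIN` to the parent table, keeping only rows where no parent matched. Here is a small, self-contained illustration using `sqlite3` as a stand-in for PostgreSQL. The two toy tables are declared without foreign keys on purpose, so that an orphan row can exist.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE category (category_id INTEGER PRIMARY KEY, category_name TEXT);
    CREATE TABLE product (product_id INTEGER PRIMARY KEY, category_id INTEGER, name TEXT);
    INSERT INTO category VALUES (1, 'Snacks');
    INSERT INTO product VALUES (1, 1, 'Oreo Cookies');
    INSERT INTO product VALUES (2, 99, 'Orphan Item');  -- category 99 does not exist
""")

# Anti-join: rows of product with no matching category come back with c.* NULL.
orphans = conn.execute("""
    SELECT p.product_id, p.name
    FROM product p
    LEFT JOIN category c ON p.category_id = c.category_id
    WHERE c.category_id IS NULL
""").fetchall()
print(orphans)  # → [(2, 'Orphan Item')]
```

In the real schema the foreign key constraints already reject references to missing parents, but the foreign key columns are nullable, so the same query also surfaces rows whose key was left `NULL`.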

## 5. Execute Consistency Checks

In [None]:
%load_ext sql
%sql postgresql://postgres:postgres@localhost:5432/walgreens_dataset

print("--- 1. Master Level (Orphans Check) ---")
# 1. Orphan Products
%sql SELECT 'Product without Category' AS Error, p.product_id, p.name FROM Product p LEFT JOIN Category c ON p.category_id = c.category_id WHERE c.category_id IS NULL;

# 2. Orphan Employees
%sql SELECT 'Employee without Branch' AS Error, e.employee_id, e.first_name FROM Employee e LEFT JOIN Branch b ON e.branch_id = b.branch_id WHERE b.branch_id IS NULL;

print("--- 2. Transaction Level (Ghost Records) ---")
# 3. Sales with ghost Customer
%sql SELECT 'Sale without Customer' AS Error, s.sale_id FROM Sale s LEFT JOIN Customer c ON s.customer_id = c.customer_id WHERE c.customer_id IS NULL;

# 4. Sales with ghost Branch
%sql SELECT 'Sale without Branch' AS Error, s.sale_id FROM Sale s LEFT JOIN Branch b ON s.branch_id = b.branch_id WHERE b.branch_id IS NULL;

print("--- 3. Business Logic Audit ---")
# Case A: Geo-Consistency (Employee Branch vs Sale Branch)
%sql SELECT s.sale_id, 'Geo Inconsistency' AS Error, s.branch_id AS Sale_Branch, e.branch_id AS Employee_Branch FROM Sale s JOIN Employee e ON s.employee_id = e.employee_id WHERE s.branch_id != e.branch_id LIMIT 10;
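
Another business-logic check in the same spirit would be to reconcile invoice totals, since the generator computes `total` as `subtotal + tax - discount`. The sketch below is an assumption about how such a check might look, not one of the notebook's audits. It uses `sqlite3` as a stand-in for PostgreSQL, and the table contents and the `TOLERANCE` value are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE invoice_detail (
        invoice_detail_id INTEGER PRIMARY KEY,
        subtotal REAL, tax REAL, total REAL, discount REAL
    );
    INSERT INTO invoice_detail VALUES (1, 100.00, 8.00, 98.00, 10.00);  -- consistent
    INSERT INTO invoice_detail VALUES (2, 50.00, 4.00, 60.00, 0.00);    -- total is off by 6.00
""")

TOLERANCE = 0.01  # allow for rounding to cents
mismatches = conn.execute(
    """
    SELECT invoice_detail_id, total, subtotal + tax - discount AS expected
    FROM invoice_detail
    WHERE ABS(total - (subtotal + tax - discount)) > ?
    """,
    (TOLERANCE,),
).fetchall()
print(mismatches)  # → [(2, 60.0, 54.0)]
```

The `WHERE` clause could be run against `Invoice_Detail` through `%sql` in the same way as the checks above; an empty result would indicate that every stored total matches its components.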