**1) Environment Setup and Drive Mount**

This cell prepares the environment: it installs the Faker library, mounts Google Drive, and imports the required Python modules.

In [None]:
### 1. Install Library, Mount Drive, and Import Modules ###

# Install the Faker library (used to generate synthetic data)
%pip install faker

# Mount Google Drive so the project folder is accessible
from google.colab import drive
drive.mount('/content/drive')

import csv
import random
import os
from datetime import datetime, timedelta
from faker import Faker

Collecting faker
  Downloading faker-38.0.0-py3-none-any.whl.metadata (15 kB)
Downloading faker-38.0.0-py3-none-any.whl (2.0 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.0/2.0 MB 16.5 MB/s eta 0:00:00
Installing collected packages: faker
Successfully installed faker-38.0.0
Mounted at /content/drive
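
The setup cell above only works inside Colab, because `google.colab` is not importable elsewhere. As a minimal sketch, the helper below falls back to a local folder when the import fails. The function name `resolve_output_root` and the temporary fallback path are assumptions made for illustration; they are not part of the original notebook.

```python
import os
import tempfile


def resolve_output_root(local_fallback):
    """Return a base directory: Google Drive on Colab, otherwise a local folder.

    Hypothetical helper for illustration only.
    """
    try:
        # google.colab is only importable inside a Colab runtime.
        from google.colab import drive
    except ImportError:
        os.makedirs(local_fallback, exist_ok=True)
        return local_fallback
    drive.mount('/content/drive')
    return '/content/drive/MyDrive'


# Assumed fallback location: a folder under the system temp directory.
output_root = resolve_output_root(
    os.path.join(tempfile.gettempdir(), 'simulated_data'))
print(output_root)
```

With a helper like this, the rest of the notebook could build `output_dir` from `output_root` and run unchanged on a laptop.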


**2) Configuration and Utility Functions**

This cell sets the project's output directory, seeds the random number generators so that data generation is reproducible, and defines the core `write_csv` utility.

In [None]:
### 2. Configuration, Seeding, and Data Utility ###

# --- CONFIGURATION ---
fake = Faker()
# Set seeds for Faker and Python's random module
Faker.seed(42)
random.seed(42)

# Set OUTPUT DIRECTORY
output_dir = '/content/drive/MyDrive/SQL Project (Group 2)/Simulating Data (Maya)/Simulated Data Files'
os.makedirs(output_dir, exist_ok=True)

# --- Data Writing Utility ---
def write_csv(filename, headers, data_rows):
    """
    Write rows to a CSV file in the configured output directory.

    Empty strings ('') are normalized to None. csv.writer writes both as an
    empty field, so the file is the same either way. Whether that field
    becomes SQL NULL depends on the loader; PostgreSQL's COPY in CSV mode
    reads an unquoted empty field as NULL by default.
    """
    filepath = os.path.join(output_dir, filename)
    with open(filepath, 'w', newline='', encoding='utf-8') as f:
        writer = csv.writer(f)
        writer.writerow(headers)
        # Normalize empty strings to None (csv.writer writes None as an empty field)
        clean_rows = [[(val if val != '' else None) for val in row] for row in data_rows]
        writer.writerows(clean_rows)
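
To see what actually lands in the file for a missing value, the short check below writes one row with `''` and one with `None` to an in-memory buffer and compares the results. It is a sketch that is independent of `write_csv`; the helper name `render_rows` and the sample row are made up for illustration.

```python
import csv
import io


def render_rows(rows):
    """Write rows with csv.writer into an in-memory buffer and return the text."""
    buffer = io.StringIO(newline='')
    writer = csv.writer(buffer)
    writer.writerows(rows)
    return buffer.getvalue()


# Same row, once with an empty string and once with None in the 'line2' slot.
with_empty_string = render_rows([[1, '12 Main St', '', 'Newark']])
with_none = render_rows([[1, '12 Main St', None, 'Newark']])

print(repr(with_empty_string))         # '1,12 Main St,,Newark\r\n'
print(with_empty_string == with_none)  # True
```

Both rows produce an empty, unquoted field, so the NULL handling is decided when the CSV is loaded into the database, not when it is written.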

**3) Data Generation**

This section generates 150 unique address records. Their `address_id` values serve as the **primary key** of the `addresses` table.

IDs 1-10 are reserved for **Offices**, and IDs 11-110 are used for **Client Mailing Addresses**. IDs 111-150 are intentionally reused across properties to simulate **Property** density (multi-unit buildings).
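
The ID ranges described above can be sketched as follows. The constant names, the `address_role` helper, the figure of 60 properties, and the use of sampling with replacement are all assumptions chosen for illustration; the later notebooks may assign property addresses differently.

```python
import random

# ID ranges taken from the description above (range end is exclusive).
OFFICE_IDS = range(1, 11)            # 1-10
CLIENT_MAILING_IDS = range(11, 111)  # 11-110
PROPERTY_IDS = range(111, 151)       # 111-150


def address_role(address_id):
    """Return which group an address_id belongs to (hypothetical helper)."""
    if address_id in OFFICE_IDS:
        return 'office'
    if address_id in CLIENT_MAILING_IDS:
        return 'client_mailing'
    if address_id in PROPERTY_IDS:
        return 'property'
    raise ValueError(f"address_id {address_id} is outside 1-150")


# Assumed example: 60 properties drawn from only 40 address IDs.
# Sampling with replacement guarantees that some addresses repeat,
# which is one way to mimic several units in the same building.
rng = random.Random(42)
property_address_ids = rng.choices(PROPERTY_IDS, k=60)

print(len(OFFICE_IDS), len(CLIENT_MAILING_IDS), len(PROPERTY_IDS))  # 10 100 40
print(len(set(property_address_ids)) < len(property_address_ids))   # True
```

A local `random.Random(42)` instance is used here so the sketch does not disturb the notebook's global seed.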

**4) Generate and Save Addresses**

This cell generates the data for the `addresses` table and saves it to a CSV file.

In [None]:
# --- Generate Addresses (150) ---
addresses = []

for i in range(150):
    street = fake.street_address()
    # Randomly select a Tri-State city together with its state, so the two always match.
    city, state = random.choice([
        ('New York', 'NY'), ('Brooklyn', 'NY'), ('Queens', 'NY'), ('Bronx', 'NY'),
        ('Staten Island', 'NY'), ('Yonkers', 'NY'), ('Newark', 'NJ'),
        ('Jersey City', 'NJ'), ('Stamford', 'CT'), ('Bridgeport', 'CT')])
    zip_code = fake.postcode()
    # Simulate coordinates for Tri-State area
    lat = round(random.uniform(40.5, 41.5), 6)
    lon = round(random.uniform(-74.5, -73.5), 6)

    # Note: 'line2' is left as '' so it is written as an empty CSV field, which can be loaded as SQL NULL.
    addresses.append([i+1, street, '', city, state, zip_code, lat, lon])

# Save the addresses.csv file
write_csv('addresses.csv',
    ['address_id', 'line1', 'line2', 'city', 'state_code', 'postal_code', 'latitude', 'longitude'],
    addresses)

print("SUCCESS: addresses.csv (150 records) saved to Google Drive.")

SUCCESS: addresses.csv (150 records) saved to Google Drive.
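
Before moving on, it can be worth reading the file back and checking it. The sketch below shows one possible check: row count and uniqueness of `address_id`. The `validate_addresses` helper is hypothetical, and the demo runs against a tiny temporary file rather than the real `addresses.csv` on Drive.

```python
import csv
import os
import tempfile


def validate_addresses(filepath, expected_rows):
    """Return a list of problems found in an addresses CSV (empty list means OK)."""
    with open(filepath, newline='', encoding='utf-8') as f:
        rows = list(csv.DictReader(f))
    problems = []
    if len(rows) != expected_rows:
        problems.append(f"expected {expected_rows} rows, found {len(rows)}")
    ids = [row['address_id'] for row in rows]
    if len(set(ids)) != len(ids):
        problems.append("duplicate address_id values")
    return problems


# Demo on a small temporary file with the same leading columns.
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, 'addresses_demo.csv')
with open(demo_path, 'w', newline='', encoding='utf-8') as f:
    writer = csv.writer(f)
    writer.writerow(['address_id', 'line1', 'line2'])
    writer.writerows([[1, '12 Main St', None], [2, '34 Oak Ave', None]])

print(validate_addresses(demo_path, expected_rows=2))  # []
print(validate_addresses(demo_path, expected_rows=3))  # ['expected 3 rows, found 2']
```

In the notebook itself, the same function could be pointed at `os.path.join(output_dir, 'addresses.csv')` with `expected_rows=150`.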


**5) Next Steps**

The foundational `addresses.csv` file has been saved.

**Proceed to Notebook 2: `02_People_Org.ipynb`**