# Generate CustomerTradeName Sample Data

## Overview
This notebook generates sample CustomerTradeName data that links to the Customer table, providing trade names for Business and Government customers only.

## Business Rules
- **Individual customers**: No trade name records (realistic business logic)
- **Business customers**: Compliance-approved trade names (Contoso, Fabrikam)
- **Government customers**: Fictional planetary authorities (avoids mapping to any real government entity)
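
The three rules above amount to a lookup from customer type to a pool of approved names, with no pool for Individual customers. The following is a minimal sketch in plain Python, not the notebook's own code: `NAME_POOLS` and `pick_trade_name` are hypothetical names, and the pools are shortened to two entries each for illustration.

```python
import random

# Hypothetical, shortened name pools; the notebook defines the full lists.
NAME_POOLS = {
    "Business": ["Contoso Corp", "Fabrikam Inc"],
    "Government": ["Planet Mars Authority", "Europa Space Authority"],
}

def pick_trade_name(customer_type, rng):
    """Return a trade name for an eligible type, or None when no record applies."""
    pool = NAME_POOLS.get(customer_type)
    if pool is None:
        return None  # Individual (or any unlisted type) gets no trade name record
    return rng.choice(pool)

rng = random.Random(42)  # local generator, so the global random state is untouched
for ctype in ["Individual", "Business", "Government"]:
    print(ctype, "->", pick_trade_name(ctype, rng))
```

Keeping the rule in a single dictionary means a new customer type is excluded by default until a pool is added for it.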

## Input
- Reads Customer_Samples.csv (created by Generate_Customer_Core_Fields.ipynb) to get CustomerId and CustomerTypeId

## Output
- **CustomerTradeName_Samples.csv**: Trade name records for Business and Government customers
- Columns: CustomerId, CustomerTypeId, TradeNameId, TradeName, PeriodStartDate, PeriodEndDate, CustomerTradeNameNote

## Integration
- Referential integrity with the Customer table: every CustomerId in the output is copied from the customer input file
- Only creates records for non-Individual customers
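
Both integration points can be checked explicitly rather than assumed. The following is a minimal sketch in plain Python, assuming the records are held as lists of dicts; `check_referential_integrity` and the sample rows are hypothetical and not part of the notebook.

```python
def check_referential_integrity(customers, trade_names):
    """Return (orphans, individuals): trade name rows whose CustomerId is
    unknown, and rows that point at an Individual customer."""
    types_by_id = {c["CustomerId"]: c["CustomerTypeId"] for c in customers}
    orphans = [t["CustomerId"] for t in trade_names
               if t["CustomerId"] not in types_by_id]
    individuals = [t["CustomerId"] for t in trade_names
                   if types_by_id.get(t["CustomerId"]) == "Individual"]
    return orphans, individuals

# Hypothetical sample rows for illustration only.
customers = [
    {"CustomerId": "CID-001", "CustomerTypeId": "Individual"},
    {"CustomerId": "CID-002", "CustomerTypeId": "Government"},
    {"CustomerId": "CID-003", "CustomerTypeId": "Business"},
]
good_rows = [{"CustomerId": "CID-002"}, {"CustomerId": "CID-003"}]
bad_rows = good_rows + [{"CustomerId": "CID-001"}, {"CustomerId": "CID-999"}]

print(check_referential_integrity(customers, good_rows))  # ([], [])
print(check_referential_integrity(customers, bad_rows))   # (['CID-999'], ['CID-001'])
```

Two empty lists mean both integration rules hold for the generated file.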

---

In [5]:
import pandas as pd
import numpy as np
import random
import os
from datetime import datetime, date, timedelta

# Set seed for reproducible results
random.seed(42)
np.random.seed(42)

# Configuration
INPUT_FOLDER = "C:\\temp\\samples\\output_as_input"  # Use output generated by another notebook
INPUT_FILE = "Customer_Samples.csv"
OUTPUT_FOLDER = "C:\\temp\\samples\\output"
OUTPUT_FILE = "CustomerTradeName_Samples.csv"

# Create output directory
os.makedirs(OUTPUT_FOLDER, exist_ok=True)

# Remove existing output file if it exists
output_path = os.path.join(OUTPUT_FOLDER, OUTPUT_FILE)
if os.path.exists(output_path):
    os.remove(output_path)
    print(f"🗑️ Removed existing file: {output_path}")

print(f"🎯 GENERATING CUSTOMER TRADE NAME SAMPLE DATA")
print(f"Input: {INPUT_FOLDER}\\{INPUT_FILE}")
print(f"Output: {OUTPUT_FOLDER}\\{OUTPUT_FILE}")
print("="*60)

# Read customer data
try:
    input_path = os.path.join(INPUT_FOLDER, INPUT_FILE)
    print(f"📂 Reading customer input: {input_path}")
    
    if not os.path.exists(input_path):
        print(f"❌ File not found: {input_path}")
        print("💡 Please run Generate_Customer_Core_Fields.ipynb first to create customer data")
        raise FileNotFoundError(f"{INPUT_FILE} not found in {INPUT_FOLDER}")
    
    df_customers = pd.read_csv(input_path)
    
    print(f"✅ Customer file read successfully!")
    print(f"📊 Shape: {df_customers.shape}")
    print(f"📋 Required columns: CustomerId, CustomerTypeId")
    
    # Verify required columns
    required_columns = ['CustomerId', 'CustomerTypeId']
    missing_columns = [col for col in required_columns if col not in df_customers.columns]
    
    if missing_columns:
        print(f"❌ Missing required columns: {missing_columns}")
        raise ValueError(f"Missing columns: {missing_columns}")
    
    print(f"✅ All required columns found")
    
except Exception as e:
    print(f"❌ Error reading customer file: {e}")
    raise

print("✅ Customer data loaded successfully!")

🎯 GENERATING CUSTOMER TRADE NAME SAMPLE DATA
Input: C:\temp\samples\output_as_input\Customer_Samples.csv
Output: C:\temp\samples\output\CustomerTradeName_Samples.csv
📂 Reading customer input: C:\temp\samples\output_as_input\Customer_Samples.csv
✅ Customer file read successfully!
📊 Shape: (513, 14)
📋 Required columns: CustomerId, CustomerTypeId
✅ All required columns found
✅ Customer data loaded successfully!


In [6]:
# Generate CustomerTradeName data
print("\n🔄 GENERATING TRADE NAME DATA")
print("="*50)

# Business trade names (using Contoso/Fabrikam per compliance)
business_names = [
    "Contoso Corp", "Contoso Industries", "Contoso Solutions", "Contoso Enterprises",
    "Contoso Manufacturing", "Contoso Technology", "Contoso Services", "Contoso Group",
    "Fabrikam Inc", "Fabrikam Systems", "Fabrikam Group", "Fabrikam Technologies",
    "Fabrikam Manufacturing", "Fabrikam Solutions", "Fabrikam Services", "Fabrikam Industries"
]

# Government trade names (using fictional planetary authorities per compliance)
government_names = [
    "Planet Mars Authority", "Jupiter Department of Commerce", "Saturn Municipal Services",
    "Venus Regional Office", "Neptune State Agency", "Mercury City Government",
    "Uranus Federal Bureau", "Pluto District Office", "Europa Space Authority",
    "Titan Regional Services", "Ganymede Municipal Office", "Callisto State Department"
]

def generate_trade_name_records(customers_df):
    """Generate trade name records for Business and Government customers only"""
    
    # Filter to only Business and Government customers
    eligible_customers = customers_df[customers_df['CustomerTypeId'].isin(['Business', 'Government'])]
    
    print(f"📊 Total customers: {len(customers_df)}")
    print(f"📊 Eligible for trade names: {len(eligible_customers)}")
    
    business_customers = eligible_customers[eligible_customers['CustomerTypeId'] == 'Business']
    government_customers = eligible_customers[eligible_customers['CustomerTypeId'] == 'Government']
    
    print(f"  └─ Business customers: {len(business_customers)}")
    print(f"  └─ Government customers: {len(government_customers)}")
    
    trade_name_records = []
    trade_name_counter = 1
    
    # Generate trade names for each eligible customer
    for idx, customer in eligible_customers.iterrows():
        customer_id = customer['CustomerId']
        customer_type = customer['CustomerTypeId']
        
        # Generate TradeNameId
        trade_name_id = f"TN-{trade_name_counter:03d}"
        
        # Select appropriate trade name based on customer type
        if customer_type == 'Business':
            trade_name = random.choice(business_names)
        else:  # Government
            trade_name = random.choice(government_names)
        
        # Generate realistic period dates
        # Most trade names start when customer was established and are ongoing
        start_date = date(2018, 1, 1) + timedelta(days=random.randint(0, 1460))  # Random date from 2018-01-01 through 2021-12-31
        
        # 95% of trade names are still active (no end date)
        end_date = None
        if random.random() < 0.05:  # 5% have ended
            end_date = start_date + timedelta(days=random.randint(365, 1095))  # 1-3 years later
        
        trade_name_records.append({
            'CustomerId': customer_id,
            'CustomerTypeId': customer_type,
            'TradeNameId': trade_name_id,
            'TradeName': trade_name,
            'PeriodStartDate': start_date,
            'PeriodEndDate': end_date,
            'CustomerTradeNameNote': f'Generated {customer_type.lower()} trade name for compliance testing'
        })
        
        trade_name_counter += 1
    
    return trade_name_records

# Generate the trade name records
trade_name_records = generate_trade_name_records(df_customers)

# Create DataFrame
df_trade_names = pd.DataFrame(trade_name_records)

print(f"\n✅ Trade name generation complete!")
print(f"📊 Total trade name records: {len(df_trade_names)}")


🔄 GENERATING TRADE NAME DATA
📊 Total customers: 513
📊 Eligible for trade names: 160
  └─ Business customers: 102
  └─ Government customers: 58

✅ Trade name generation complete!
📊 Total trade name records: 160
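
The period logic inside `generate_trade_name_records` can be isolated to make its bounds easy to inspect. The sketch below mirrors the same constants (a 1460-day start window from 2018-01-01, a 5% chance of an end date 365 to 1095 days later); `generate_period` and its `ended_share` parameter are hypothetical names introduced for illustration, and the function takes its own `random.Random` instance instead of the global generator.

```python
import random
from datetime import date, timedelta

WINDOW_START = date(2018, 1, 1)
WINDOW_DAYS = 1460  # 2018-01-01 + 1460 days lands on 2021-12-31 (2020 is a leap year)

def generate_period(rng, ended_share=0.05):
    """Return (start_date, end_date); end_date is None for an ongoing trade name."""
    start = WINDOW_START + timedelta(days=rng.randint(0, WINDOW_DAYS))
    end = None
    if rng.random() < ended_share:
        # Ended trade names last between one and three years.
        end = start + timedelta(days=rng.randint(365, 1095))
    return start, end

print(WINDOW_START + timedelta(days=WINDOW_DAYS))  # 2021-12-31

rng = random.Random(42)
periods = [generate_period(rng) for _ in range(160)]
ended = sum(1 for _, end in periods if end is not None)
print(f"Ended: {ended} of {len(periods)}")
```

With 160 records, a 5% rate corresponds to about 8 ended trade names on average, so the count in any single run will vary around that figure.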


In [7]:
# Display analysis and save data
print("\n📊 TRADE NAME DATA ANALYSIS")
print("="*50)

# Distribution by customer type
print("🎯 Trade Names by Customer Type:")
type_dist = df_trade_names['CustomerTypeId'].value_counts()
for ctype, count in type_dist.items():
    print(f"  {ctype:12}: {count:3d} trade names")

# Business trade name distribution
print("\n🎯 Business Trade Name Distribution:")
business_trades = df_trade_names[df_trade_names['CustomerTypeId'] == 'Business']
if len(business_trades) > 0:
    business_name_dist = business_trades['TradeName'].value_counts().head(10)
    for name, count in business_name_dist.items():
        print(f"  {name}: {count}")

# Government trade name distribution
print("\n🎯 Government Trade Name Distribution:")
gov_trades = df_trade_names[df_trade_names['CustomerTypeId'] == 'Government']
if len(gov_trades) > 0:
    gov_name_dist = gov_trades['TradeName'].value_counts().head(10)
    for name, count in gov_name_dist.items():
        print(f"  {name}: {count}")

# Period analysis
active_trades = len(df_trade_names[df_trade_names['PeriodEndDate'].isnull()])
ended_trades = len(df_trade_names[df_trade_names['PeriodEndDate'].notnull()])

print(f"\n🎯 Trade Name Period Status:")
print(f"  Active (no end date): {active_trades:3d} ({active_trades/len(df_trade_names)*100:5.1f}%)")
print(f"  Ended (has end date): {ended_trades:3d} ({ended_trades/len(df_trade_names)*100:5.1f}%)")

# Date ranges
start_dates = pd.to_datetime(df_trade_names['PeriodStartDate'])
print(f"\n🎯 Period Start Dates:")
print(f"  Earliest: {start_dates.min().strftime('%Y-%m-%d')}")
print(f"  Latest: {start_dates.max().strftime('%Y-%m-%d')}")

# Sample records
print(f"\n📋 Sample Trade Name Records (First 10):")
sample_df = df_trade_names[['CustomerId', 'CustomerTypeId', 'TradeNameId', 'TradeName', 'PeriodStartDate', 'PeriodEndDate']].head(10)
for idx, row in sample_df.iterrows():
    end_info = f"to {row['PeriodEndDate']}" if pd.notnull(row['PeriodEndDate']) else "ongoing"
    print(f"  {row['CustomerId']} ({row['CustomerTypeId']}): {row['TradeName']} [{row['PeriodStartDate']} {end_info}]")

# Save to CSV
output_path = os.path.join(OUTPUT_FOLDER, OUTPUT_FILE)
df_trade_names.to_csv(output_path, index=False)

print(f"\n💾 SAVED TO: {output_path}")
print(f"📊 Total Records: {len(df_trade_names)}")
print(f"📈 Columns: {', '.join(df_trade_names.columns)}")

print(f"\n✅ COMPLIANCE VERIFICATION:")
print(f"  ✅ Business names: Contoso/Fabrikam approved names only")
print(f"  ✅ Government names: Fictional planetary authorities")
print(f"  ✅ Individual customers: Correctly excluded (no trade names)")
print(f"  ✅ Referential integrity: All CustomerIds match customer data")

print(f"\n✅ CustomerTradeName sample data generation complete!")
print(f"📋 Ready for database import with perfect referential integrity!")


📊 TRADE NAME DATA ANALYSIS
🎯 Trade Names by Customer Type:
  Business    : 102 trade names
  Government  :  58 trade names

🎯 Business Trade Name Distribution:
  Contoso Group: 12
  Contoso Solutions: 11
  Fabrikam Inc: 11
  Fabrikam Group: 8
  Fabrikam Manufacturing: 7
  Fabrikam Solutions: 7
  Contoso Services: 6
  Contoso Technology: 6
  Fabrikam Systems: 5
  Fabrikam Industries: 5

🎯 Government Trade Name Distribution:
  Ganymede Municipal Office: 10
  Europa Space Authority: 8
  Uranus Federal Bureau: 7
  Jupiter Department of Commerce: 6
  Neptune State Agency: 5
  Callisto State Department: 4
  Venus Regional Office: 4
  Titan Regional Services: 4
  Saturn Municipal Services: 3
  Planet Mars Authority: 3

🎯 Trade Name Period Status:
  Active (no end date): 150 ( 93.8%)
  Ended (has end date):  10 (  6.2%)

🎯 Period Start Dates:
  Earliest: 2018-01-04
  Latest: 2021-12-31

📋 Sample Trade Name Records (First 10):
  CID-002 (Government): Ganymede Municipal Office [2018-08-17 to 20