# 🏞️ Woodland Play Cafe - Clean Data Analysis

**Purpose**: Find missing records in tax report and identify amount mismatches

**Key Features**:
- ✅ Proper grouping by Order ID (CRITICAL FIX)
- ✅ Comprehensive missing data analysis
- ✅ Amount mismatch detection
- ✅ Clean, organized output
- ✅ No duplicate code


In [62]:
# Import required libraries
import pandas as pd
import numpy as np
from datetime import datetime

print("📊 Libraries imported successfully!")


📊 Libraries imported successfully!


In [63]:
# Load data files and prepare for August 2025 analysis
print("📂 Loading data files...")

WoodTrans = pd.read_excel("/Users/vijayaraghavandevaraj/Downloads/Wood - TransReport.xlsx")
WoodTax = pd.read_excel("/Users/vijayaraghavandevaraj/Downloads/Wood - Tax Report.xlsx")

print(f"   • Transaction Report: {len(WoodTrans):,} records")
print(f"   • Tax Report: {len(WoodTax):,} records")

# Prepare data for August 2025
trans_pos_df = WoodTrans[WoodTrans['Source'] == 'pos'].copy()
trans_pos_df['Transaction Type'] = 'sale'
trans_pos_df['Module'] = np.where(
    trans_pos_df['Order ID'].astype(str).str.startswith('MEM'),
    'memberships', 
    trans_pos_df['Source']
)
trans_pos_module_df = trans_pos_df[trans_pos_df['Module'] == 'pos'].copy()
trans_pos_module_df['Transaction Date'] = pd.to_datetime(trans_pos_module_df['Transaction Date'])
trans_pos_module_df['Month'] = trans_pos_module_df['Transaction Date'].dt.to_period('M')

tax_pos_df = WoodTax[WoodTax['Module Name'] == 'pos'].copy()
tax_pos_df['Date'] = pd.to_datetime(tax_pos_df['Date'])
tax_pos_df['Month'] = tax_pos_df['Date'].dt.to_period('M')

target_month = "2025-08"
trans_aug = trans_pos_module_df[trans_pos_module_df['Month'] == target_month]
tax_aug = tax_pos_df[tax_pos_df['Month'] == target_month]

print(f"   • August transactions: {len(trans_aug):,} records")
print(f"   • August tax records: {len(tax_aug):,} records")


📂 Loading data files...
   • Transaction Report: 1,836 records
   • Tax Report: 717 records
   • August transactions: 700 records
   • August tax records: 692 records


# Tax data preparation and filtering
tax_pos_df = WoodTax[WoodTax['Module Name'] == 'pos'].copy()
tax_pos_df['Date'] = pd.to_datetime(tax_pos_df['Date'])
tax_pos_df['Month'] = tax_pos_df['Date'].dt.to_period('M')

# Filter for August 2025
target_month = "2025-08"
trans_aug = trans_pos_module_df[trans_pos_module_df['Month'] == target_month]
tax_aug = tax_pos_df[tax_pos_df['Month'] == target_month]

print(f"   • August transactions: {len(trans_aug):,} records")
print(f"   • August tax records: {len(tax_aug):,} records")


In [64]:
# CRITICAL: Group transaction data by Order ID
print("🔄 Grouping transaction data by Order ID (CRITICAL STEP)...")

trans_aug_grouped = trans_aug.groupby('Order ID').agg({
    'Amount': 'sum',
    'Transaction Date': 'first',
    'Location': 'first',
    'Payment Type': 'first',
    'Payment Gateway': 'first'
}).reset_index()

print(f"   • Original transaction records: {len(trans_aug):,}")
print(f"   • Unique Order IDs (grouped): {len(trans_aug_grouped):,}")
print("   ✅ Now properly comparing unique orders vs tax records")


🔄 Grouping transaction data by Order ID (CRITICAL STEP)...
   • Original transaction records: 700
   • Unique Order IDs (grouped): 696
   ✅ Now properly comparing unique orders vs tax records


In [65]:
# Find missing records in tax report
print("🔍 Finding missing records in tax report...")

missing_in_tax = trans_aug_grouped[~trans_aug_grouped['Order ID'].isin(tax_aug['Order ID'])]

if len(missing_in_tax) > 0:
    print(f"   • Missing records: {len(missing_in_tax)}")
    print(f"   • Missing amount: ${missing_in_tax['Amount'].sum():.2f}")
    
    print(f"\n   📋 Missing Order IDs:")
    missing_details = missing_in_tax[['Order ID', 'Transaction Date', 'Amount', 'Location', 'Payment Type']].copy()
    missing_details = missing_details.sort_values('Transaction Date')
    
    for i, (_, row) in enumerate(missing_details.iterrows(), 1):
        print(f"      {i:2d}. {row['Order ID']} | ${row['Amount']:6.2f} | {row['Transaction Date'].strftime('%m/%d')} | {row['Payment Type']}")
else:
    print("   ✅ No missing records found!")


🔍 Finding missing records in tax report...
   • Missing records: 4
   • Missing amount: $414.26

   📋 Missing Order IDs:
       1. 1754496643962 | $ 14.27 | 08/06 | physicalCard
       2. 1754594250742 | $ 14.54 | 08/07 | physicalCard
       3. 1756325127801 | $ 31.82 | 08/27 | physicalCard
       4. 1755643008336 | $353.63 | 08/31 | physicalCard


In [66]:
# Find amount mismatches and complete summary
print("💰 Finding amount mismatches...")

# Find common orders and merge for comparison
common_orders = trans_aug_grouped[trans_aug_grouped['Order ID'].isin(tax_aug['Order ID'])].copy()
common_tax_orders = tax_aug[tax_aug['Order ID'].isin(trans_aug_grouped['Order ID'])].copy()

merged_comparison = pd.merge(
    common_orders[['Order ID', 'Amount', 'Transaction Date', 'Location']], 
    common_tax_orders[['Order ID', 'Total Sum', 'Tip', 'Tax']], 
    on='Order ID', 
    how='inner'
)

# Calculate differences
merged_comparison['Amount_Diff'] = merged_comparison['Amount'] - merged_comparison['Total Sum']
merged_comparison['Abs_Diff'] = abs(merged_comparison['Amount_Diff'])

# Find significant mismatches
significant_mismatches = merged_comparison[merged_comparison['Abs_Diff'] > 0.01]

print(f"   • Total matching orders: {len(merged_comparison):,}")
print(f"   • Orders with amount differences: {len(significant_mismatches)}")

if len(significant_mismatches) > 0:
    print(f"   • Total discrepancy: ${merged_comparison['Amount_Diff'].sum():.2f}")
    
    print(f"\n   🔍 Amount Differences:")
    mismatch_details = significant_mismatches[['Order ID', 'Amount', 'Total Sum', 'Amount_Diff', 'Abs_Diff']].copy()
    mismatch_details = mismatch_details.sort_values('Abs_Diff', ascending=False)
    
    for i, (_, row) in enumerate(mismatch_details.iterrows(), 1):
        print(f"      {i:2d}. {row['Order ID']} | Trans: ${row['Amount']:6.2f} | Tax: ${row['Total Sum']:6.2f} | Diff: ${row['Amount_Diff']:+6.2f}")
else:
    print("   ✅ No significant amount mismatches found!")

# Complete summary
print("\n" + "="*80)
print("📊 COMPLETE ANALYSIS SUMMARY")
print("="*80)

print(f"\n📈 TRANSACTION vs TAX REPORT COMPARISON:")
print(f"   • Transaction Report Total: ${trans_aug_grouped['Amount'].sum():,.2f}")
print(f"   • Tax Report Total:        ${tax_aug['Total Sum'].sum():,.2f}")
print(f"   • Difference:              ${trans_aug_grouped['Amount'].sum() - tax_aug['Total Sum'].sum():,.2f}")

print(f"\n🔍 MISSING RECORDS:")
print(f"   • Number of missing orders: {len(missing_in_tax)}")
print(f"   • Missing amount: ${missing_in_tax['Amount'].sum():,.2f}")

print(f"\n💰 AMOUNT DIFFERENCES:")
print(f"   • Number of different amounts: {len(significant_mismatches)}")
print(f"   • Total amount difference: ${merged_comparison['Amount_Diff'].sum():,.2f}")
print(f"   • Average difference: ${merged_comparison['Amount_Diff'].mean():.2f}")
print(f"   • Largest positive difference: ${merged_comparison['Amount_Diff'].max():,.2f}")
print(f"   • Largest negative difference: ${merged_comparison['Amount_Diff'].min():,.2f}")

print(f"\n📊 SUMMARY TOTALS:")
print(f"   • Total Transaction Orders: {len(trans_aug_grouped):,}")
print(f"   • Total Tax Report Orders:  {len(tax_aug):,}")
print(f"   • Matching Orders:          {len(merged_comparison):,}")
print(f"   • Missing Orders:           {len(missing_in_tax):,}")
print(f"   • Orders with Differences:  {len(significant_mismatches):,}")

print("\n" + "="*80)
print(f"✅ Analysis completed: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")


💰 Finding amount mismatches...
   • Total matching orders: 692
   • Orders with amount differences: 1
   • Total discrepancy: $12.88

   🔍 Amount Differences:
       1. 1754080488535 | Trans: $ 25.76 | Tax: $ 12.88 | Diff: $+12.88

📊 COMPLETE ANALYSIS SUMMARY

📈 TRANSACTION vs TAX REPORT COMPARISON:
   • Transaction Report Total: $9,853.98
   • Tax Report Total:        $9,426.84
   • Difference:              $427.14

🔍 MISSING RECORDS:
   • Number of missing orders: 4
   • Missing amount: $414.26

💰 AMOUNT DIFFERENCES:
   • Number of different amounts: 1
   • Total amount difference: $12.88
   • Average difference: $0.02
   • Largest positive difference: $12.88
   • Largest negative difference: $-0.00

📊 SUMMARY TOTALS:
   • Total Transaction Orders: 696
   • Total Tax Report Orders:  692
   • Matching Orders:          692
   • Missing Orders:           4
   • Orders with Differences:  1

✅ Analysis completed: 2025-09-14 15:52:54


In [67]:
# DETAILED ORDER ID REPORTING
print("📋 DETAILED ORDER ID REPORT")
print("="*80)

print(f"\n🔍 MISSING RECORDS DETAILED LIST:")
print(f"Number of missing orders: {len(missing_in_tax)}")
print(f"Total missing amount: ${missing_in_tax['Amount'].sum():,.2f}")
print("-" * 80)

if len(missing_in_tax) > 0:
    missing_details = missing_in_tax[['Order ID', 'Transaction Date', 'Amount', 'Location', 'Payment Type', 'Payment Gateway']].copy()
    missing_details = missing_details.sort_values('Transaction Date')
    
    for i, (_, row) in enumerate(missing_details.iterrows(), 1):
        print(f"{i}. Order ID: {row['Order ID']}")
        print(f"   Date: {row['Transaction Date'].strftime('%Y-%m-%d')}")
        print(f"   Amount: ${row['Amount']:,.2f}")
        print(f"   Location: {row['Location']}")
        print(f"   Payment Type: {row['Payment Type']}")
        print(f"   Payment Gateway: {row['Payment Gateway']}")
        print()
else:
    print("✅ No missing records found!")

print(f"\n💰 AMOUNT DIFFERENCES DETAILED LIST:")
print(f"Number of different amounts: {len(significant_mismatches)}")
print(f"Total amount difference: ${merged_comparison['Amount_Diff'].sum():,.2f}")
print(f"Average difference: ${merged_comparison['Amount_Diff'].mean():.2f}")
print(f"Largest positive difference: ${merged_comparison['Amount_Diff'].max():,.2f}")
print(f"Largest negative difference: ${merged_comparison['Amount_Diff'].min():,.2f}")
print("-" * 80)

if len(significant_mismatches) > 0:
    mismatch_details = significant_mismatches[['Order ID', 'Amount', 'Total Sum', 'Amount_Diff', 'Tip', 'Tax']].copy()
    mismatch_details = mismatch_details.sort_values('Abs_Diff', ascending=False)
    
    for i, (_, row) in enumerate(mismatch_details.iterrows(), 1):
        print(f"{i}. Order ID: {row['Order ID']}")
        print(f"   Transaction Amount: ${row['Amount']:,.2f}")
        print(f"   Tax Report Amount:  ${row['Total Sum']:,.2f}")
        print(f"   Difference:         ${row['Amount_Diff']:+,.2f}")
        print(f"   Tip: ${row['Tip']:,.2f}")
        print(f"   Tax: ${row['Tax']:,.2f}")
        print()
else:
    print("✅ No significant amount differences found!")

print(f"\n📊 TRANSACTION vs TAX REPORT COMPARISON:")
print(f"   • Transaction Report Total: ${trans_aug_grouped['Amount'].sum():,.2f}")
print(f"   • Tax Report Total:        ${tax_aug['Total Sum'].sum():,.2f}")
print(f"   • Difference:              ${trans_aug_grouped['Amount'].sum() - tax_aug['Total Sum'].sum():,.2f}")

print(f"\n🎯 RECONCILIATION BREAKDOWN:")
print(f"   • Missing Records Amount:    ${missing_in_tax['Amount'].sum():,.2f}")
print(f"   • Amount Differences Total:  ${merged_comparison['Amount_Diff'].sum():,.2f}")
print(f"   • Combined Impact:           ${missing_in_tax['Amount'].sum() + merged_comparison['Amount_Diff'].sum():,.2f}")

print("\n" + "="*80)
print("✅ DETAILED REPORT COMPLETE")


📋 DETAILED ORDER ID REPORT

🔍 MISSING RECORDS DETAILED LIST:
Number of missing orders: 4
Total missing amount: $414.26
--------------------------------------------------------------------------------
1. Order ID: 1754496643962
   Date: 2025-08-06
   Amount: $14.27
   Location: East Nashville
   Payment Type: physicalCard
   Payment Gateway: square

2. Order ID: 1754594250742
   Date: 2025-08-07
   Amount: $14.54
   Location: East Nashville
   Payment Type: physicalCard
   Payment Gateway: square

3. Order ID: 1756325127801
   Date: 2025-08-27
   Amount: $31.82
   Location: East Nashville
   Payment Type: physicalCard
   Payment Gateway: square

4. Order ID: 1755643008336
   Date: 2025-08-31
   Amount: $353.63
   Location: East Nashville
   Payment Type: physicalCard
   Payment Gateway: square


💰 AMOUNT DIFFERENCES DETAILED LIST:
Number of different amounts: 1
Total amount difference: $12.88
Average difference: $0.02
Largest positive difference: $12.88
Largest negative difference: $-0.

KeyError: 'Abs_Diff'