# 🏞️ Woodland Play Cafe - Clean Data Analysis

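**Purpose**: For August 2025 POS sales, find orders that appear in the transaction report but are missing from the tax report, and flag orders whose amounts differ between the two reports.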

**Key Features**:
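- Transaction rows are grouped by Order ID before any comparison
- Orders missing from the tax report are listed with date, amount, and payment type
- Amount differences greater than $0.01 between the two reports are flagged
- A tally summary collects the record counts and dollar totals in one place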


In [45]:
# Import required libraries
import pandas as pd
import numpy as np

print("📊 Libraries imported successfully!")


📊 Libraries imported successfully!


In [46]:
# Load data files
print("📂 Loading data files...")

WoodTrans = pd.read_excel("/Users/vijayaraghavandevaraj/Downloads/Wood - TransReport.xlsx")
WoodTax = pd.read_excel("/Users/vijayaraghavandevaraj/Downloads/Wood - Tax Report.xlsx")

print(f"   • Transaction Report: {len(WoodTrans):,} records")
print(f"   • Tax Report: {len(WoodTax):,} records")


📂 Loading data files...
   • Transaction Report: 1,913 records
   • Tax Report: 918 records


In [47]:
# Prepare and filter data for August 2025
print("🔧 Preparing data for August 2025 analysis...")

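# Transaction data preparation: keep POS-source rows, tag Order IDs starting
# with 'MEM' as memberships, then keep only the remaining POS rows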
trans_pos_df = WoodTrans[WoodTrans['Source'] == 'pos'].copy()
trans_pos_df['Transaction Type'] = 'sale'
trans_pos_df['Module'] = np.where(
    trans_pos_df['Order ID'].astype(str).str.startswith('MEM'),
    'memberships', 
    trans_pos_df['Source']
)
trans_pos_module_df = trans_pos_df[trans_pos_df['Module'] == 'pos'].copy()
trans_pos_module_df['Transaction Date'] = pd.to_datetime(trans_pos_module_df['Transaction Date'])
trans_pos_module_df['Month'] = trans_pos_module_df['Transaction Date'].dt.to_period('M')

# Tax data preparation
tax_pos_df = WoodTax[WoodTax['Module Name'] == 'pos'].copy()
tax_pos_df['Date'] = pd.to_datetime(tax_pos_df['Date'])
tax_pos_df['Month'] = tax_pos_df['Date'].dt.to_period('M')

# Filter for August 2025
target_month = "2025-08"
trans_aug = trans_pos_module_df[trans_pos_module_df['Month'] == target_month]
tax_aug = tax_pos_df[tax_pos_df['Month'] == target_month]

print(f"   • August transactions: {len(trans_aug):,} records")
print(f"   • August tax records: {len(tax_aug):,} records")


🔧 Preparing data for August 2025 analysis...
   • August transactions: 699 records
   • August tax records: 693 records
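
The month filter above can be tried in isolation on a few made-up rows. This is a minimal sketch, not the notebook's data: the `toy` frame, its order IDs, and its dates are hypothetical. It compares against an explicit `pd.Period` rather than a string, which is one way to make the intended monthly frequency visible.

```python
import pandas as pd

# Hypothetical rows spanning the July/August boundary.
toy = pd.DataFrame({
    "Order ID": [1, 2, 3],
    "Date": ["2025-07-31", "2025-08-01", "2025-08-31"],
})

# Same steps as the cell above: parse dates, then derive a monthly period.
toy["Date"] = pd.to_datetime(toy["Date"])
toy["Month"] = toy["Date"].dt.to_period("M")

# Compare against an explicit monthly Period (an assumption of this sketch;
# the notebook compares against the string "2025-08").
target = pd.Period("2025-08", freq="M")
toy_aug = toy[toy["Month"] == target]

print(toy_aug[["Order ID", "Date"]])
```

Only the two August rows should remain; the July 31 row falls in a different period and is dropped.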


In [48]:
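# CRITICAL: Group transaction rows by Order ID so each order appears once.
# Amounts are summed per order; descriptive columns keep their first value.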
print("🔄 Grouping transaction data by Order ID (CRITICAL STEP)...")

trans_aug_grouped = trans_aug.groupby('Order ID').agg({
    'Amount': 'sum',
    'Transaction Date': 'first',
    'Location': 'first',
    'Payment Type': 'first',
    'Payment Gateway': 'first'
}).reset_index()

print(f"   • Original transaction records: {len(trans_aug):,}")
print(f"   • Unique Order IDs (grouped): {len(trans_aug_grouped):,}")
print("   ✅ Now properly comparing unique orders vs tax records")


🔄 Grouping transaction data by Order ID (CRITICAL STEP)...
   • Original transaction records: 699
   • Unique Order IDs (grouped): 696
   ✅ Now properly comparing unique orders vs tax records
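
To illustrate why the grouping step changes the row count, here is a minimal sketch on made-up data. The order IDs, amounts, and the split-payment scenario are hypothetical; the notebook does not say why three Order IDs repeat, and an order paid with two tenders is just one plausible cause.

```python
import pandas as pd

# Hypothetical rows: order "A1" appears twice (for example, a split payment).
toy_trans = pd.DataFrame({
    "Order ID": ["A1", "A1", "B2"],
    "Amount": [10.00, 5.50, 7.25],
    "Payment Type": ["physicalCard", "cash", "physicalCard"],
})

# Same aggregation pattern as the cell above: sum the amounts per order
# and keep the first value of the descriptive column.
toy_grouped = toy_trans.groupby("Order ID").agg({
    "Amount": "sum",
    "Payment Type": "first",
}).reset_index()

print(toy_grouped)
```

Three rows collapse to two orders, and "A1" carries the combined amount. Note that `'first'` keeps only one payment type per order, so a mixed-tender order would be reported under whichever tender happens to come first.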


In [49]:
# Find missing records in tax report
print("🔍 Finding missing records in tax report...")

missing_in_tax = trans_aug_grouped[~trans_aug_grouped['Order ID'].isin(tax_aug['Order ID'])]

if len(missing_in_tax) > 0:
    print(f"   • Missing records: {len(missing_in_tax)}")
    print(f"   • Missing amount: ${missing_in_tax['Amount'].sum():.2f}")
    
    print(f"\n   📋 Missing Order IDs:")
    missing_details = missing_in_tax[['Order ID', 'Transaction Date', 'Amount', 'Location', 'Payment Type']].copy()
    missing_details = missing_details.sort_values('Transaction Date')
    
    for i, (_, row) in enumerate(missing_details.iterrows(), 1):
        print(f"      {i:2d}. {row['Order ID']} | ${row['Amount']:6.2f} | {row['Transaction Date'].strftime('%m/%d')} | {row['Payment Type']}")
else:
    print("   ✅ No missing records found!")


🔍 Finding missing records in tax report...
   • Missing records: 3
   • Missing amount: $399.99

   📋 Missing Order IDs:
       1. 1754594250742 | $ 14.54 | 08/07 | physicalCard
       2. 1756325127801 | $ 31.82 | 08/27 | physicalCard
       3. 1755643008336 | $353.63 | 08/31 | physicalCard
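
The same "in transactions but not in tax" lookup can also be written as a left merge with `indicator=True`. The sketch below is an alternative to the `isin` filter, not the method used in this notebook, and the two small frames are hypothetical stand-ins for `trans_aug_grouped` and `tax_aug`.

```python
import pandas as pd

# Hypothetical stand-ins for the grouped transactions and the tax records.
toy_trans = pd.DataFrame({"Order ID": [101, 102, 103], "Amount": [14.0, 20.0, 31.5]})
toy_tax = pd.DataFrame({"Order ID": [101, 103], "Total Sum": [14.0, 31.5]})

# Left merge with indicator=True adds a "_merge" column whose value is
# "both" for matched orders and "left_only" for orders absent from toy_tax.
flagged = toy_trans.merge(
    toy_tax[["Order ID"]].drop_duplicates(),
    on="Order ID",
    how="left",
    indicator=True,
)

# Keep the orders that exist only on the transaction side.
toy_missing = flagged.loc[flagged["_merge"] == "left_only", ["Order ID", "Amount"]]

print(toy_missing)
```

One reason to consider this form is that the `_merge` column stays available for inspection, so matched and unmatched orders can be counted from a single frame.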


In [50]:
# Find amount mismatches between reports
print("💰 Finding amount mismatches...")

# Find common orders and merge for comparison
common_orders = trans_aug_grouped[trans_aug_grouped['Order ID'].isin(tax_aug['Order ID'])].copy()
common_tax_orders = tax_aug[tax_aug['Order ID'].isin(trans_aug_grouped['Order ID'])].copy()

merged_comparison = pd.merge(
    common_orders[['Order ID', 'Amount', 'Transaction Date', 'Location']], 
    common_tax_orders[['Order ID', 'Total Sum', 'Tip', 'Tax']], 
    on='Order ID', 
    how='inner'
)

# Calculate differences
merged_comparison['Amount_Diff'] = merged_comparison['Amount'] - merged_comparison['Total Sum']
merged_comparison['Abs_Diff'] = merged_comparison['Amount_Diff'].abs()

# Find significant mismatches
significant_mismatches = merged_comparison[merged_comparison['Abs_Diff'] > 0.01]

print(f"   • Total matching orders: {len(merged_comparison):,}")
print(f"   • Orders with amount differences: {len(significant_mismatches)}")

if len(significant_mismatches) > 0:
    print(f"   • Total discrepancy: ${merged_comparison['Amount_Diff'].sum():.2f}")
    
    print(f"\n   🔍 Amount Differences:")
    mismatch_details = significant_mismatches[['Order ID', 'Amount', 'Total Sum', 'Amount_Diff', 'Abs_Diff']].copy()
    mismatch_details = mismatch_details.sort_values('Abs_Diff', ascending=False)
    
    for i, (_, row) in enumerate(mismatch_details.iterrows(), 1):
        print(f"      {i:2d}. {row['Order ID']} | Trans: ${row['Amount']:6.2f} | Tax: ${row['Total Sum']:6.2f} | Diff: ${row['Amount_Diff']:+6.2f}")
else:
    print("   ✅ No significant amount mismatches found!")


💰 Finding amount mismatches...
   • Total matching orders: 693
   • Orders with amount differences: 0
   ✅ No significant amount mismatches found!
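
Two details of the comparison above may be worth guarding against. First, if the tax data ever held a repeated Order ID, the inner merge would silently duplicate rows. Second, with floating-point amounts a difference of exactly one cent can land on either side of the `> 0.01` threshold. The sketch below shows one possible way to handle both, on hypothetical data; it is an alternative under those assumptions, not what the notebook does.

```python
import pandas as pd

# Hypothetical data: order 2 differs by 50 cents between the two sides.
toy_trans = pd.DataFrame({"Order ID": [1, 2, 3], "Amount": [10.00, 20.00, 30.10]})
toy_tax = pd.DataFrame({"Order ID": [1, 2, 3], "Total Sum": [10.00, 19.50, 30.10]})

# validate="one_to_one" makes pandas raise a MergeError if either side
# contains a repeated Order ID, instead of quietly multiplying rows.
toy_merged = toy_trans.merge(toy_tax, on="Order ID", how="inner", validate="one_to_one")

# Express the difference in whole cents so the comparison is on integers.
diff_cents = ((toy_merged["Amount"] - toy_merged["Total Sum"]) * 100).round().astype(int)

# Flag any order whose amounts differ by at least one cent.
toy_mismatches = toy_merged[diff_cents.abs() >= 1]

print(toy_mismatches)
```

Unlike the `> 0.01` threshold, this version flags a one-cent difference, so the choice between the two depends on whether single-cent rounding differences should count as mismatches.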


In [51]:
# TALLY SUMMARY FOR EASY REFERENCE

print("="*80)
print("TALLY SUMMARY - EASY REFERENCE")
print("="*80)

print("📊 OVERALL TALLY:")
print(f"   • Total records in trans_aug: {len(trans_aug)}")
print(f"   • Total records in tax_aug: {len(tax_aug)}")
print(f"   • Total matching records: {len(merged_comparison)}")
print(f"   • Records missing in tax_aug: {len(missing_in_tax)}")
print(f"   • Records with amount differences: {len(significant_mismatches)}")

print(f"\n💰 AMOUNT TALLY:")
print(f"   • Total transaction amount (August): ${trans_aug_grouped['Amount'].sum():.2f}")
print(f"   • Total tax report amount (August): ${tax_aug['Total Sum'].sum():.2f}")
print(f"   • Difference (Trans - Tax): ${trans_aug_grouped['Amount'].sum() - tax_aug['Total Sum'].sum():.2f}")
print(f"   • Total missing transaction value: ${missing_in_tax['Amount'].sum():.2f}")
print(f"   • Total amount discrepancy: ${merged_comparison['Amount_Diff'].sum():.2f}")

print(f"\n📋 MISSING RECORD TALLY:")
if len(missing_in_tax) > 0:
    for i, (idx, row) in enumerate(missing_in_tax.iterrows(), 1):
        print(f"   {i:2d}. Order ID: {row['Order ID']} | Amount: ${row['Amount']:6.2f} | Date: {row['Transaction Date'].strftime('%m/%d')} | Payment: {row['Payment Type']}")

print(f"\n🔍 AMOUNT DIFFERENCE TALLY:")
if len(significant_mismatches) > 0:
    for i, (idx, row) in enumerate(significant_mismatches.iterrows(), 1):
        print(f"   {i:2d}. Order ID: {row['Order ID']} | Trans: ${row['Amount']:6.2f} | Tax: ${row['Total Sum']:6.2f} | Diff: ${row['Amount_Diff']:+6.2f}")

print("\n" + "="*80)


TALLY SUMMARY - EASY REFERENCE
📊 OVERALL TALLY:
   • Total records in trans_aug: 699
   • Total records in tax_aug: 693
   • Total matching records: 693
   • Records missing in tax_aug: 3
   • Records with amount differences: 0

💰 AMOUNT TALLY:
   • Total transaction amount (August): $9841.10
   • Total tax report amount (August): $9441.11
   • Difference (Trans - Tax): $399.99
   • Total missing transaction value: $399.99
   • Total amount discrepancy: $-0.00

📋 MISSING RECORD TALLY:
    1. Order ID: 1754594250742 | Amount: $ 14.54 | Date: 08/07 | Payment: physicalCard
    2. Order ID: 1755643008336 | Amount: $353.63 | Date: 08/31 | Payment: physicalCard
    3. Order ID: 1756325127801 | Amount: $ 31.82 | Date: 08/27 | Payment: physicalCard

🔍 AMOUNT DIFFERENCE TALLY:
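
The figures in the tally can be cross-checked by hand. The sketch below copies the totals, the three missing amounts, and the record counts from the printed output above and uses `Decimal` so the cent arithmetic is exact. The variable names are chosen for this illustration only.

```python
from decimal import Decimal

# Figures copied from the printed output above.
trans_total = Decimal("9841.10")
tax_total = Decimal("9441.11")
missing_amounts = [Decimal("14.54"), Decimal("353.63"), Decimal("31.82")]

# The gap between the two report totals...
gap = trans_total - tax_total
# ...should equal the combined value of the missing orders.
missing_total = sum(missing_amounts)

# Record counts from the outputs: raw rows, unique orders, matched orders.
raw_rows, unique_orders, matched_orders = 699, 696, 693

print(gap, missing_total)
print(raw_rows - unique_orders, unique_orders - matched_orders)
```

The dollar gap and the missing total agree, and the 696 unique orders split into 693 matched plus 3 missing. The raw count of 699 is higher only because three transaction rows share an Order ID with another row.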

