# IS_WINNER Analysis

## Objective
Analyze the IS_WINNER flag in AUCTIONS_RESULTS and its relationships across the auction funnel.

## Research Questions
1. What is the distribution of IS_WINNER in bids?
2. How does IS_WINNER relate to RANKING, FINAL_BID, QUALITY, and PACING?
3. Do IS_WINNER=TRUE bids reliably map to IMPRESSIONS?
4. Do IS_WINNER=FALSE bids incorrectly appear in IMPRESSIONS (data quality check)?
5. What is the full funnel conversion from winning bids to purchases?
6. What are win rates by vendor, campaign, and product?
7. Are there temporal patterns in win rates?
8. What data quality issues exist (duplicate winners, orphaned records, etc.)?

## Data Period
October 11, 2025 snapshot

In [2]:
import pandas as pd
import numpy as np
import warnings
from datetime import datetime
warnings.filterwarnings('ignore')

print("="*80)
print("IS_WINNER ANALYSIS")
print("="*80)
print(f"Analysis started: {datetime.now()}")
print()

print("Loading data...")
print("="*80)

auctions_results = pd.read_parquet('data/raw_auctions_results_20251011.parquet')
print(f"✓ Loaded AUCTIONS_RESULTS: {len(auctions_results):,} rows")

auctions_users = pd.read_parquet('data/raw_auctions_users_20251011.parquet')
print(f"✓ Loaded AUCTIONS_USERS: {len(auctions_users):,} rows")

impressions = pd.read_parquet('data/raw_impressions_20251011.parquet')
print(f"✓ Loaded IMPRESSIONS: {len(impressions):,} rows")

clicks = pd.read_parquet('data/raw_clicks_20251011.parquet')
print(f"✓ Loaded CLICKS: {len(clicks):,} rows")

purchases = pd.read_parquet('data/raw_purchases_20251011.parquet')
print(f"✓ Loaded PURCHASES: {len(purchases):,} rows")

catalog = pd.read_parquet('data/catalog_20251011.parquet')
print(f"✓ Loaded CATALOG: {len(catalog):,} products")

print()
print("Data loaded successfully.")
print("="*80)

IS_WINNER ANALYSIS
Analysis started: 2025-10-12 19:38:35.241271

Loading data...
✓ Loaded AUCTIONS_RESULTS: 18,838,670 rows
✓ Loaded AUCTIONS_USERS: 413,457 rows
✓ Loaded IMPRESSIONS: 533,146 rows
✓ Loaded CLICKS: 16,706 rows
✓ Loaded PURCHASES: 2,188 rows
✓ Loaded CATALOG: 2,007,695 products

Data loaded successfully.
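AUCTIONS_RESULTS holds 18.8M rows for roughly 413K auctions (about 46 bids per auction, per Section 7), so each AUCTION_ID string is repeated many times. A minimal sketch, on hypothetical toy data, of how converting such repeated ID columns to pandas' `category` dtype reduces memory before the heavy groupbys below:

```python
import pandas as pd

# Toy stand-in for AUCTIONS_RESULTS (hypothetical IDs): many bids per auction.
n_auctions, bids_per_auction = 1_000, 40
toy = pd.DataFrame({
    "AUCTION_ID": [f"auction-{i:08d}" for i in range(n_auctions) for _ in range(bids_per_auction)],
    "RANKING": [r + 1 for _ in range(n_auctions) for r in range(bids_per_auction)],
})

before = toy.memory_usage(deep=True).sum()
# Each distinct string is stored once as a category; rows keep a small integer code.
toy["AUCTION_ID"] = toy["AUCTION_ID"].astype("category")
after = toy.memory_usage(deep=True).sum()

print(f"{before:,} bytes -> {after:,} bytes")
```

When grouping on several categorical columns, pass `observed=True` to `groupby` explicitly so unobserved category combinations are not emitted.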


## Section 1: IS_WINNER Distribution

In [3]:
print("="*80)
print("SECTION 1: IS_WINNER DISTRIBUTION IN AUCTIONS_RESULTS")
print("="*80)
print()

print("IS_WINNER value counts:")
winner_counts = auctions_results['IS_WINNER'].value_counts(dropna=False)
print(winner_counts)
print()

print("IS_WINNER proportions:")
winner_props = auctions_results['IS_WINNER'].value_counts(normalize=True, dropna=False)
print(winner_props)
print()

print("NULL/missing IS_WINNER values:")
null_count = auctions_results['IS_WINNER'].isnull().sum()
print(f"Count: {null_count:,}")
print(f"Proportion: {null_count / len(auctions_results):.4%}")
print()

print("SUMMARY:")
print(f"Total bids: {len(auctions_results):,}")
print(f"Winners (TRUE): {(auctions_results['IS_WINNER'] == True).sum():,} ({(auctions_results['IS_WINNER'] == True).mean():.4%})")
print(f"Losers (FALSE): {(auctions_results['IS_WINNER'] == False).sum():,} ({(auctions_results['IS_WINNER'] == False).mean():.4%})")
print(f"NULL: {null_count:,}")
print()
print("="*80)

SECTION 1: IS_WINNER DISTRIBUTION IN AUCTIONS_RESULTS

IS_WINNER value counts:
IS_WINNER
True     15509104
False     3329566
Name: count, dtype: int64

IS_WINNER proportions:
IS_WINNER
True     0.823259
False    0.176741
Name: proportion, dtype: float64

NULL/missing IS_WINNER values:
Count: 0
Proportion: 0.0000%

SUMMARY:
Total bids: 18,838,670
Winners (TRUE): 15,509,104 (82.3259%)
Losers (FALSE): 3,329,566 (17.6741%)
NULL: 0
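Section 1 prints counts and proportions separately and partitions rows with `== True` / `== False`, which only splits the data cleanly if the column is genuinely boolean. A small sketch, on a hypothetical toy column, that guards the dtype and produces both figures in one table:

```python
import pandas as pd

# Toy IS_WINNER column (hypothetical values).
flag = pd.Series([True, True, False, True], name="IS_WINNER")

# Guard: a string or object-typed flag would make `== True` silently match nothing.
assert pd.api.types.is_bool_dtype(flag), f"unexpected dtype {flag.dtype}"

summary = (
    flag.value_counts(dropna=False)
    .to_frame("count")
    .assign(proportion=lambda d: d["count"] / d["count"].sum())
)
print(summary)
```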



## Section 2: IS_WINNER vs RANKING

In [4]:
print("="*80)
print("SECTION 2: IS_WINNER BY RANKING")
print("="*80)
print()
print("Expected: IS_WINNER=TRUE should correspond to low RANKING values (1=best)")
print()

print("RANKING statistics by IS_WINNER:")
ranking_stats = auctions_results.groupby('IS_WINNER')['RANKING'].describe()
print(ranking_stats)
print()

print("IS_WINNER rate by RANKING value (top 20 ranks):")
ranking_winner = auctions_results[auctions_results['RANKING'] <= 20].groupby('RANKING').agg({
    'IS_WINNER': ['sum', 'count', 'mean']
})
ranking_winner.columns = ['winners', 'total_bids', 'win_rate']
print(ranking_winner)
print()

print("Bids per auction by IS_WINNER:")
bids_per_auction = auctions_results.groupby(['AUCTION_ID', 'IS_WINNER']).size().reset_index(name='bid_count')
bids_summary = bids_per_auction.groupby('IS_WINNER')['bid_count'].describe()
print(bids_summary)
print()

print("Winners per auction distribution:")
winners_per_auction = auctions_results[auctions_results['IS_WINNER'] == True].groupby('AUCTION_ID').size()
print("Statistics:")
print(winners_per_auction.describe())
print()
print("Value counts (top 20):")
print(winners_per_auction.value_counts().head(20))
print()

print("RANKING=1 vs IS_WINNER alignment:")
rank1_bids = auctions_results[auctions_results['RANKING'] == 1]
rank1_winners = rank1_bids['IS_WINNER'].sum()
print(f"Total RANKING=1 bids: {len(rank1_bids):,}")
print(f"RANKING=1 bids that are winners: {rank1_winners:,} ({rank1_winners/len(rank1_bids):.4%})")
print(f"RANKING=1 bids that are NOT winners: {len(rank1_bids) - rank1_winners:,}")
print()
print("="*80)

SECTION 2: IS_WINNER BY RANKING

Expected: IS_WINNER=TRUE should correspond to low RANKING values (1=best)

RANKING statistics by IS_WINNER:
                count       mean        std  min   25%   50%   75%   max
IS_WINNER                                                               
False       3329566.0  46.882975  10.621021  1.0  42.0  50.0  54.0  74.0
True       15509104.0  21.909778  13.223720  1.0  11.0  21.0  32.0  64.0

IS_WINNER rate by RANKING value (top 20 ranks):
         winners  total_bids  win_rate
RANKING                               
1         408642      410373  0.995782
2         399757      402777  0.992502
3         392311      396057  0.990542
4         388309      392973  0.988132
5         385143      390713  0.985744
6         381869      388138  0.983849
7         379879      386811  0.982079
8         377389      385077  0.980035
9         375184      383497  0.978323
10        373369      382310  0.976613
11        371914      380631  0.977099
12        3
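The output above shows RANKING=1 bids winning only 99.58% of the time, losers starting at rank 1 and winners reaching rank 64, so IS_WINNER does not look like a pure top-k cut on RANKING. A sketch, on hypothetical toy bids, that tests this per auction by comparing the worst winning rank with the best losing rank:

```python
import pandas as pd

# Toy bids (hypothetical): in auction "a2" the rank-2 bid loses while rank 3 wins,
# so winners there are not a clean top-k block.
bids = pd.DataFrame({
    "AUCTION_ID": ["a1", "a1", "a1", "a2", "a2", "a2"],
    "RANKING":    [1, 2, 3, 1, 2, 3],
    "IS_WINNER":  [True, True, False, True, False, True],
})

# Per auction: worst (highest) rank among winners, best (lowest) rank among losers.
worst_winner = bids[bids["IS_WINNER"]].groupby("AUCTION_ID")["RANKING"].max()
best_loser = bids[~bids["IS_WINNER"]].groupby("AUCTION_ID")["RANKING"].min()

cutoff = pd.concat({"worst_winner_rank": worst_winner, "best_loser_rank": best_loser}, axis=1)
# NaN on either side (no winners or no losers) compares False, so those auctions count as consistent.
cutoff["is_top_k"] = ~(cutoff["worst_winner_rank"] > cutoff["best_loser_rank"])
print(cutoff)
print(f"Auctions where winners are not a top-k block: {(~cutoff['is_top_k']).sum()}")
```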

## Section 3: IS_WINNER vs Bid Characteristics

In [5]:
print("="*80)
print("SECTION 3: IS_WINNER BY BID CHARACTERISTICS")
print("="*80)
print()

print("FINAL_BID statistics by IS_WINNER:")
bid_stats = auctions_results.groupby('IS_WINNER')['FINAL_BID'].describe()
print(bid_stats)
print()

if 'QUALITY' in auctions_results.columns:
    print("QUALITY statistics by IS_WINNER:")
    quality_stats = auctions_results.groupby('IS_WINNER')['QUALITY'].describe()
    print(quality_stats)
    print()
    
    print("QUALITY non-null counts:")
    quality_counts = auctions_results.groupby('IS_WINNER')['QUALITY'].count()  # count() excludes NaN
    print(quality_counts)
    print()

if 'PACING' in auctions_results.columns:
    print("PACING statistics by IS_WINNER:")
    pacing_stats = auctions_results.groupby('IS_WINNER')['PACING'].describe()
    print(pacing_stats)
    print()
    
    print("PACING non-null counts:")
    pacing_counts = auctions_results.groupby('IS_WINNER')['PACING'].count()  # count() excludes NaN
    print(pacing_counts)
    print()

if 'PRICE' in auctions_results.columns:
    print("PRICE statistics by IS_WINNER:")
    price_stats = auctions_results.groupby('IS_WINNER')['PRICE'].describe()
    print(price_stats)
    print()

if 'CONVERSION_RATE' in auctions_results.columns:
    print("CONVERSION_RATE statistics by IS_WINNER:")
    cvr_stats = auctions_results.groupby('IS_WINNER')['CONVERSION_RATE'].describe()
    print(cvr_stats)
    print()

print("="*80)

SECTION 3: IS_WINNER BY BID CHARACTERISTICS

FINAL_BID statistics by IS_WINNER:
                count       mean        std  min  25%  50%   75%    max
IS_WINNER                                                              
False       3329566.0   8.375678  10.822076  0.0  2.0  5.0  11.0  100.0
True       15509104.0  12.537361  15.008167  1.0  3.0  7.0  17.0  100.0

QUALITY statistics by IS_WINNER:
                count      mean       std       min      25%       50%  \
IS_WINNER                                                                
False       3329566.0  0.035637  0.027196  0.000001  0.01137  0.034112   
True       15509104.0  0.036890  0.028282  0.000001  0.01389  0.032409   

                75%       max  
IS_WINNER                      
False      0.052003  0.706957  
True       0.053439  0.847945  

QUALITY non-null counts:
IS_WINNER
False     3329566
True     15509104
Name: QUALITY, dtype: int64

PACING statistics by IS_WINNER:
                count      mean       st
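Winners have a higher mean FINAL_BID than losers (12.54 vs 8.38) but nearly identical QUALITY, so neither column alone explains RANKING. Ad auctions commonly rank by bid multiplied by a quality estimate; whether this platform does so is an assumption the source does not confirm. A sketch, on hypothetical toy bids, that tests the hypothesis by re-ranking within each auction and measuring agreement with RANKING:

```python
import pandas as pd

# Toy bids (hypothetical values). ASSUMPTION under test: rank is driven by FINAL_BID * QUALITY.
# Auction "a2" deliberately disagrees with that rule.
bids = pd.DataFrame({
    "AUCTION_ID": ["a1", "a1", "a1", "a2", "a2"],
    "FINAL_BID":  [10.0, 5.0, 20.0, 3.0, 4.0],
    "QUALITY":    [0.05, 0.20, 0.01, 0.10, 0.05],
    "RANKING":    [2, 1, 3, 2, 1],
})

bids["score"] = bids["FINAL_BID"] * bids["QUALITY"]
# Rank 1 = highest score within each auction; ties share the best rank ("min").
bids["score_rank"] = (
    bids.groupby("AUCTION_ID")["score"].rank(ascending=False, method="min").astype(int)
)
agreement = (bids["score_rank"] == bids["RANKING"]).mean()
print(bids)
print(f"Share of bids where score rank == RANKING: {agreement:.1%}")
```

Low agreement on the real data would suggest PACING or another factor enters the ranking.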

## Section 4: IS_WINNER to IMPRESSIONS Linkage

In [6]:
print("="*80)
print("SECTION 4: IS_WINNER TO IMPRESSIONS LINKAGE")
print("="*80)
print()
print("Expected: IS_WINNER=TRUE bids should have corresponding IMPRESSIONS records")
print("Join key: (AUCTION_ID, PRODUCT_ID) as per data documentation")
print()

winners = auctions_results[auctions_results['IS_WINNER'] == True].copy()
print(f"Total IS_WINNER=TRUE bids: {len(winners):,}")
print()

print("Creating composite keys...")
winners['composite_key'] = winners['AUCTION_ID'].astype(str) + '||' + winners['PRODUCT_ID'].astype(str)
impressions['composite_key'] = impressions['AUCTION_ID'].astype(str) + '||' + impressions['PRODUCT_ID'].astype(str)
print()

print("Unique composite keys:")
print(f"Winners: {winners['composite_key'].nunique():,}")
print(f"Impressions: {impressions['composite_key'].nunique():,}")
print()

print("Join analysis:")
winners_with_impressions = winners['composite_key'].isin(impressions['composite_key'])
print(f"Winners WITH matching impressions: {winners_with_impressions.sum():,} ({winners_with_impressions.mean():.4%})")
print(f"Winners WITHOUT matching impressions: {(~winners_with_impressions).sum():,} ({(~winners_with_impressions).mean():.4%})")
print()

impressions_with_winners = impressions['composite_key'].isin(winners['composite_key'])
print(f"Impressions WITH matching winners: {impressions_with_winners.sum():,} ({impressions_with_winners.mean():.4%})")
print(f"Impressions WITHOUT matching winners: {(~impressions_with_winners).sum():,} ({(~impressions_with_winners).mean():.4%})")
print()

print("="*80)

SECTION 4: IS_WINNER TO IMPRESSIONS LINKAGE

Expected: IS_WINNER=TRUE bids should have corresponding IMPRESSIONS records
Join key: (AUCTION_ID, PRODUCT_ID) as per data documentation

Total IS_WINNER=TRUE bids: 15,509,104

Creating composite keys...

Unique composite keys:
Winners: 15,508,376
Impressions: 529,184

Join analysis:
Winners WITH matching impressions: 529,178 (3.4120%)
Winners WITHOUT matching impressions: 14,979,926 (96.5880%)

Impressions WITH matching winners: 533,098 (99.9910%)
Impressions WITHOUT matching winners: 48 (0.0090%)
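Building `AUCTION_ID||PRODUCT_ID` strings allocates a new string per row for 15.5M winners. A sketch, on hypothetical toy tables, that joins on the two columns directly and reports both directions of the linkage in one pass via `merge(..., indicator=True)`. Note it counts distinct keys, whereas the output above counts rows (IMPRESSIONS has 533,146 rows but 529,184 distinct keys).

```python
import pandas as pd

# Toy tables (hypothetical IDs): one winner has no impression,
# and one impression has no winning bid.
winners = pd.DataFrame({"AUCTION_ID": ["a1", "a1", "a2"], "PRODUCT_ID": ["p1", "p2", "p3"]})
impressions = pd.DataFrame({"AUCTION_ID": ["a1", "a2", "a3"], "PRODUCT_ID": ["p1", "p3", "p9"]})

keys = ["AUCTION_ID", "PRODUCT_ID"]
# Deduplicate keys on both sides so the outer join yields one row per (auction, product).
linkage = winners[keys].drop_duplicates().merge(
    impressions[keys].drop_duplicates(), on=keys, how="outer", indicator=True
)
# both = winner with impression, left_only = winner without, right_only = impression without winner.
print(linkage["_merge"].value_counts())
```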



## Section 5: Data Quality - Losers with Impressions

In [7]:
print("="*80)
print("SECTION 5: LOSERS (IS_WINNER=FALSE) TO IMPRESSIONS")
print("="*80)
print()
print("Expected: IS_WINNER=FALSE bids should NOT have impressions")
print()

losers = auctions_results[auctions_results['IS_WINNER'] == False].copy()
print(f"Total IS_WINNER=FALSE bids: {len(losers):,}")
print()

losers['composite_key'] = losers['AUCTION_ID'].astype(str) + '||' + losers['PRODUCT_ID'].astype(str)

losers_with_impressions = losers['composite_key'].isin(impressions['composite_key'])
print(f"Losers WITH impressions (unexpected): {losers_with_impressions.sum():,} ({losers_with_impressions.mean():.4%})")
print(f"Losers WITHOUT impressions (expected): {(~losers_with_impressions).sum():,} ({(~losers_with_impressions).mean():.4%})")
print()

if losers_with_impressions.sum() > 0:
    print("ANOMALY DETECTED: Some losing bids have impressions")
    print("Sample of losers with impressions:")
    anomaly_sample = losers[losers_with_impressions].head(10)[['AUCTION_ID', 'PRODUCT_ID', 'RANKING', 'IS_WINNER', 'FINAL_BID']]
    print(anomaly_sample)
    print()
else:
    print("✓ DATA QUALITY OK: No losing bids have impressions")
    print()

print("="*80)

SECTION 5: LOSERS (IS_WINNER=FALSE) TO IMPRESSIONS

Expected: IS_WINNER=FALSE bids should NOT have impressions

Total IS_WINNER=FALSE bids: 3,329,566

Losers WITH impressions (unexpected): 0 (0.0000%)
Losers WITHOUT impressions (expected): 3,329,566 (100.0000%)

✓ DATA QUALITY OK: No losing bids have impressions
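Sections 4 and 5 each run a separate `isin` pass. A sketch, on hypothetical toy data, that labels every impression in one left join against all bids as winner, loser, or no matching bid; a key counts as a winner if any of its bid rows won, which sidesteps the duplicated winner keys seen in Section 4.

```python
import pandas as pd

# Toy data (hypothetical IDs): all bids with their IS_WINNER flag, plus impressions.
bids = pd.DataFrame({
    "AUCTION_ID": ["a1", "a1", "a2"],
    "PRODUCT_ID": ["p1", "p2", "p3"],
    "IS_WINNER":  [True, False, True],
})
impressions = pd.DataFrame({
    "AUCTION_ID": ["a1", "a1", "a2", "a9"],
    "PRODUCT_ID": ["p1", "p2", "p3", "p9"],
})

keys = ["AUCTION_ID", "PRODUCT_ID"]
# One row per key; max() over booleans is True if any bid row for that key won.
winner_by_key = bids.groupby(keys, as_index=False)["IS_WINNER"].max()
labeled = impressions.merge(winner_by_key, on=keys, how="left")
labeled["bid_status"] = (
    labeled["IS_WINNER"].map({True: "winner", False: "loser"}).fillna("no matching bid")
)
print(labeled["bid_status"].value_counts())
```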



## Section 6: Full Funnel Flow

In [8]:
print("="*80)
print("SECTION 6: FULL FUNNEL FLOW FROM IS_WINNER")
print("="*80)
print()
print("Tracing: AUCTIONS_RESULTS (IS_WINNER) -> IMPRESSIONS -> CLICKS -> PURCHASES")
print()

print("Creating composite keys for clicks...")
clicks['composite_key'] = clicks['AUCTION_ID'].astype(str) + '||' + clicks['PRODUCT_ID'].astype(str)
print()

print("Funnel metrics:")
print(f"1. Bids (IS_WINNER=TRUE): {len(winners):,}")
print(f"2. Impressions: {len(impressions):,}")
print(f"3. Clicks: {len(clicks):,}")
print(f"4. Purchases: {len(purchases):,}")
print()

print("Conversion rates:")
print(f"Winners -> Impressions: {winners_with_impressions.sum() / len(winners):.4%}")

impressions_with_clicks = impressions['composite_key'].isin(clicks['composite_key'])
print(f"Impressions -> Clicks: {impressions_with_clicks.sum() / len(impressions):.4%}")
print()

print("Detailed click analysis:")
# A click is "matched" if its (AUCTION_ID, PRODUCT_ID) key belongs to a winning bid.
# Unmatched clicks are not necessarily from losing bids: they only lack a winner key.
clicks_from_winners = clicks['composite_key'].isin(winners['composite_key'])
print(f"Clicks matching an IS_WINNER=TRUE bid: {clicks_from_winners.sum():,} ({clicks_from_winners.mean():.4%})")
print(f"Clicks with no matching IS_WINNER=TRUE bid: {(~clicks_from_winners).sum():,} ({(~clicks_from_winners).mean():.4%})")
print()

print("Funnel summary:")
print(f"  Winning bids: {len(winners):,}")
print(f"  -> Generated impressions: {winners_with_impressions.sum():,} ({winners_with_impressions.sum()/len(winners):.2%})")
print(f"  -> Generated clicks: {clicks_from_winners.sum():,} ({clicks_from_winners.sum()/len(winners):.2%})")
print()
print("="*80)

SECTION 6: FULL FUNNEL FLOW FROM IS_WINNER

Tracing: AUCTIONS_RESULTS (IS_WINNER) -> IMPRESSIONS -> CLICKS -> PURCHASES

Creating composite keys for clicks...

Funnel metrics:
1. Bids (IS_WINNER=TRUE): 15,509,104
2. Impressions: 533,146
3. Clicks: 16,706
4. Purchases: 2,188

Conversion rates:
Winners -> Impressions: 3.4120%
Impressions -> Clicks: 2.8035%

Detailed click analysis:
Clicks matching an IS_WINNER=TRUE bid: 16,704 (99.9880%)
Clicks with no matching IS_WINNER=TRUE bid: 2 (0.0120%)

Funnel summary:
  Winning bids: 15,509,104
  -> Generated impressions: 529,178 (3.41%)
  -> Generated clicks: 16,704 (0.11%)
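The funnel summary divides click rows and impression-matched winner rows by winning-bid rows, while IMPRESSIONS repeats some keys (533,146 rows vs 529,184 distinct keys in Section 4), so the stage ratios mix units. A sketch, on hypothetical toy data, that measures every stage in distinct (AUCTION_ID, PRODUCT_ID) keys so each ratio compares like with like:

```python
import pandas as pd

keys = ["AUCTION_ID", "PRODUCT_ID"]
# Toy data (hypothetical IDs), including a duplicated impression and two clicks on one key.
winners = pd.DataFrame({"AUCTION_ID": ["a1", "a1", "a2", "a3"], "PRODUCT_ID": ["p1", "p2", "p3", "p4"]})
impressions = pd.DataFrame({"AUCTION_ID": ["a1", "a1", "a2"], "PRODUCT_ID": ["p1", "p1", "p3"]})
clicks = pd.DataFrame({"AUCTION_ID": ["a1", "a1"], "PRODUCT_ID": ["p1", "p1"]})

def key_set(df):
    """Distinct (AUCTION_ID, PRODUCT_ID) pairs as a set of plain tuples."""
    return set(df[keys].itertuples(index=False, name=None))

won = key_set(winners)
shown = won & key_set(impressions)   # winning keys with at least one impression
clicked = shown & key_set(clicks)    # ...and at least one click

funnel = pd.Series({"winning keys": len(won), "with impression": len(shown), "with click": len(clicked)})
print(funnel)
print((funnel / funnel.shift(1)).dropna().map("{:.1%}".format))
```

On 15.5M winner keys a Python set of tuples is memory-hungry; inner merges on deduplicated key columns compute the same stage counts.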



## Section 7: Auction-Level Analysis

In [9]:
print("="*80)
print("SECTION 7: AUCTION-LEVEL ANALYSIS")
print("="*80)
print()
print("Analyzing auctions by winner characteristics")
print()

print("Auctions with at least one winner:")
auctions_with_winners = winners['AUCTION_ID'].unique()
print(f"Count: {len(auctions_with_winners):,}")
print(f"Proportion: {len(auctions_with_winners) / auctions_users['AUCTION_ID'].nunique():.4%}")
print()

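print("Auctions without any winners:")
# Counted against AUCTIONS_USERS, so this includes auctions with no rows in
# AUCTIONS_RESULTS at all, not only auctions whose bids all lost.
all_auctions = auctions_users['AUCTION_ID'].unique()
auctions_no_winners = set(all_auctions) - set(auctions_with_winners)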
print(f"Count: {len(auctions_no_winners):,}")
print(f"Proportion: {len(auctions_no_winners) / len(all_auctions):.4%}")
print()

print("Bid characteristics for auctions with vs without winners:")
auctions_results['has_winner'] = auctions_results['AUCTION_ID'].isin(auctions_with_winners)
print("\nBid count by auction winner status:")
bid_counts = auctions_results.groupby(['AUCTION_ID', 'has_winner']).size().reset_index(name='bid_count')
print(bid_counts.groupby('has_winner')['bid_count'].describe())
print()

print("Winners per auction (for auctions with winners):")
winners_per_auction_df = winners.groupby('AUCTION_ID').size().reset_index(name='n_winners')
print(winners_per_auction_df['n_winners'].describe())
print()
print("="*80)

SECTION 7: AUCTION-LEVEL ANALYSIS

Analyzing auctions by winner characteristics

Auctions with at least one winner:
Count: 408,634
Proportion: 98.8376%

Auctions without any winners:
Count: 4,806
Proportion: 1.1624%

Bid characteristics for auctions with vs without winners:

Bid count by auction winner status:
               count       mean        std  min   25%   50%   75%    max
has_winner                                                              
False         1731.0   5.745234   4.160831  1.0   1.0   5.0  10.0   10.0
True        408634.0  46.077235  16.059581  1.0  41.0  50.0  58.0  198.0

Winners per auction (for auctions with winners):
count    408634.000000
mean         37.953533
std          13.595209
min           1.000000
25%          34.000000
50%          40.000000
75%          48.000000
max         168.000000
Name: n_winners, dtype: float64
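The 4,806 "without any winners" auctions are counted against AUCTIONS_USERS, yet only 1,731 auction IDs in AUCTIONS_RESULTS lack a winner, so that bucket mixes auctions whose bids all lost with auctions that have no bid rows. A sketch, on hypothetical toy data, that splits auctions into three explicit buckets:

```python
import pandas as pd

# Toy data (hypothetical IDs): a3 has bids but no winner; a4 has no bid rows at all.
auctions = pd.DataFrame({"AUCTION_ID": ["a1", "a2", "a3", "a4"]})
bids = pd.DataFrame({
    "AUCTION_ID": ["a1", "a1", "a2", "a3", "a3"],
    "IS_WINNER":  [True, False, True, False, False],
})

# "count" counts non-null flags (equal to bids, since IS_WINNER has no NULLs per Section 1);
# summing booleans counts the winners.
per_auction = (
    bids.groupby("AUCTION_ID")["IS_WINNER"]
    .agg(n_bids="count", n_winners="sum")
    .reset_index()
)
status = auctions.drop_duplicates("AUCTION_ID").merge(per_auction, on="AUCTION_ID", how="left")

status["bucket"] = "has winner"
status.loc[status["n_bids"].isna(), "bucket"] = "no rows in AUCTIONS_RESULTS"
status.loc[status["n_bids"].notna() & (status["n_winners"] == 0), "bucket"] = "bids but no winner"
print(status["bucket"].value_counts())
```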



## Section 8: Win Rates by Vendor/Campaign/Product

In [10]:
print("="*80)
print("SECTION 8: VENDOR/CAMPAIGN/PRODUCT LEVEL PATTERNS")
print("="*80)
print()

print("Win rates by vendor:")
vendor_stats = auctions_results.groupby('VENDOR_ID').agg({
    'IS_WINNER': ['sum', 'count', 'mean']
}).reset_index()
vendor_stats.columns = ['VENDOR_ID', 'wins', 'total_bids', 'win_rate']
vendor_stats = vendor_stats.sort_values('win_rate', ascending=False)
print("Top 20 vendors by win rate (min 100 bids):")
vendor_stats_filtered = vendor_stats[vendor_stats['total_bids'] >= 100]
print(vendor_stats_filtered.head(20).to_string(index=False))
print()
print("Overall vendor win rate statistics:")
print(vendor_stats['win_rate'].describe())
print()

print("Win rates by campaign:")
campaign_stats = auctions_results.groupby('CAMPAIGN_ID').agg({
    'IS_WINNER': ['sum', 'count', 'mean']
}).reset_index()
campaign_stats.columns = ['CAMPAIGN_ID', 'wins', 'total_bids', 'win_rate']
print("Campaign win rate statistics:")
print(campaign_stats['win_rate'].describe())
print()
print("Top 20 campaigns by total wins:")
print(campaign_stats.sort_values('wins', ascending=False).head(20).to_string(index=False))
print()

print("Win rates by product:")
product_stats = auctions_results.groupby('PRODUCT_ID').agg({
    'IS_WINNER': ['sum', 'count', 'mean']
}).reset_index()
product_stats.columns = ['PRODUCT_ID', 'wins', 'total_bids', 'win_rate']
print("Product win rate statistics:")
print(product_stats['win_rate'].describe())
print()
print("Products with 100% win rate (top 20 by bid count):")
perfect_winners = product_stats[product_stats['win_rate'] == 1.0].sort_values('total_bids', ascending=False)
print(perfect_winners.head(20).to_string(index=False))
print()
print("="*80)

SECTION 8: VENDOR/CAMPAIGN/PRODUCT LEVEL PATTERNS

Win rates by vendor:
Top 20 vendors by win rate (min 100 bids):
                       VENDOR_ID  wins  total_bids  win_rate
019736f3dbe97263b48e047eb5c6ea67   273         273       1.0
0199c1f8f5a27e71b8bc151b75b226f5   167         167       1.0
0199c4434a9b7ad097364dad595dda7b   247         247       1.0
0199c117112c7c33a584a6b38db4cad2   294         294       1.0
0199c10108c27672aad2ea92905bf945   135         135       1.0
0199c0fad9367cb2aed735019c051426   156         156       1.0
0198978e216a7b80b21d908f8e73f71e   107         107       1.0
01904119bc0d7a19a8f0e474248632e1   194         194       1.0
0199c4ce935c77f2a0bde922ab8ad8d1   111         111       1.0
0199c560855f76638d3a8b546ab1c9bc   241         241       1.0
0199c53b25f8761281fee8bbe22cf6ce   150         150       1.0
0199bc6528747dd39cecf7cae15fd308   110         110       1.0
0193a3ac66037a829643d76e86e6dc98   108         108       1.0
0199bc4b7cd174808dfe3d74dfee894
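Sorting by raw `win_rate` with a 100-bid floor surfaces many vendors at exactly 1.0 with only 107 to 294 bids, in no meaningful order. One standard alternative, swapped in here rather than taken from the source, is to rank by the lower bound of the Wilson score interval, which discounts small samples. A sketch on a hypothetical toy vendor table:

```python
import numpy as np
import pandas as pd

def wilson_lower_bound(wins, n, z=1.96):
    """Lower end of the Wilson score interval for a binomial proportion (z=1.96 ~ 95%)."""
    wins = np.asarray(wins, dtype=float)
    n = np.asarray(n, dtype=float)
    p = wins / n
    z2 = z * z
    centre = p + z2 / (2 * n)
    margin = z * np.sqrt(p * (1 - p) / n + z2 / (4 * n * n))
    return (centre - margin) / (1 + z2 / n)

# Toy vendor table (hypothetical IDs and counts).
vendor_stats = pd.DataFrame({
    "VENDOR_ID": ["v_small", "v_large", "v_mixed"],
    "wins": [107, 273, 9_950],
    "total_bids": [107, 273, 10_000],
})
vendor_stats["win_rate"] = vendor_stats["wins"] / vendor_stats["total_bids"]
vendor_stats["win_rate_lb"] = wilson_lower_bound(vendor_stats["wins"], vendor_stats["total_bids"])
print(vendor_stats.sort_values("win_rate_lb", ascending=False).to_string(index=False))
```

Here the vendor winning 99.5% of 10,000 bids outranks both vendors at 100%, and among the perfect vendors the one with more bids ranks higher.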

## Section 9: Temporal Patterns

In [11]:
print("="*80)
print("SECTION 9: TIME-BASED PATTERNS")
print("="*80)
print()

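print("Merging with auction timestamps...")
# AUCTIONS_RESULTS evidently also has a CREATED_AT column: a plain merge then emits
# CREATED_AT_x / CREATED_AT_y, which is why the lookup raised KeyError: 'CREATED_AT'.
# The merge also returned more rows than AUCTIONS_RESULTS has, so AUCTIONS_USERS repeats
# some AUCTION_IDs. Rename the auction timestamp and keep one row per auction.
auction_times = (
    auctions_users[['AUCTION_ID', 'CREATED_AT']]
    .drop_duplicates(subset='AUCTION_ID')
    .rename(columns={'CREATED_AT': 'AUCTION_CREATED_AT'})
)
auctions_full = auctions_results.merge(
    auction_times,
    on='AUCTION_ID',
    how='left',
    validate='many_to_one'  # fail loudly if AUCTION_ID is still duplicated
)
print(f"Merged {len(auctions_full):,} records")
print()

print("Converting timestamps...")
auctions_full['created_at'] = pd.to_datetime(auctions_full['AUCTION_CREATED_AT'])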

print("Date range:")
print(f"Min: {auctions_full['created_at'].min()}")
print(f"Max: {auctions_full['created_at'].max()}")
print()

auctions_full['date'] = auctions_full['created_at'].dt.date
auctions_full['hour'] = auctions_full['created_at'].dt.hour
auctions_full['day_of_week'] = auctions_full['created_at'].dt.dayofweek

print("Win rate by hour of day:")
hourly_stats = auctions_full.groupby('hour').agg({
    'IS_WINNER': ['sum', 'count', 'mean']
})
hourly_stats.columns = ['wins', 'total_bids', 'win_rate']
print(hourly_stats)
print()

print("Win rate by day of week (0=Monday, 6=Sunday):")
dow_stats = auctions_full.groupby('day_of_week').agg({
    'IS_WINNER': ['sum', 'count', 'mean']
})
dow_stats.columns = ['wins', 'total_bids', 'win_rate']
print(dow_stats)
print()

print("Win rate by date (first 10 days):")
daily_stats = auctions_full.groupby('date').agg({
    'IS_WINNER': ['sum', 'count', 'mean']
})
daily_stats.columns = ['wins', 'total_bids', 'win_rate']
print(daily_stats.head(10))
print()
print("="*80)

SECTION 9: TIME-BASED PATTERNS

Merging with auction timestamps...
Merged 18,840,598 records

Converting timestamps...


KeyError: 'CREATED_AT'
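Two symptoms in this output point at the merge: it returned 18,840,598 rows from 18,838,670 input rows, and the following lookup failed with `KeyError: 'CREATED_AT'`. AUCTIONS_USERS has 413,457 rows but 413,440 distinct auctions (408,634 + 4,806 in Section 7), and the selection of `CREATED_AT` from it succeeded, so the column most likely exists in both tables. A sketch reproducing both effects on hypothetical toy frames, along with the rename, de-duplication, and `validate` guard:

```python
import pandas as pd

# Toy frames (hypothetical values): both carry CREATED_AT, and the
# auction table repeats AUCTION_ID "a2".
results = pd.DataFrame({"AUCTION_ID": ["a1", "a1", "a2"], "CREATED_AT": ["t0", "t0", "t1"]})
auctions = pd.DataFrame({"AUCTION_ID": ["a1", "a2", "a2"], "CREATED_AT": ["t0", "t1", "t1"]})

naive = results.merge(auctions, on="AUCTION_ID", how="left")
print(naive.columns.tolist())   # overlapping non-key columns get _x / _y suffixes
print(len(results), "->", len(naive))  # duplicate right-side keys add rows

fixed = results.merge(
    auctions.drop_duplicates("AUCTION_ID").rename(columns={"CREATED_AT": "AUCTION_CREATED_AT"}),
    on="AUCTION_ID",
    how="left",
    validate="many_to_one",  # raises pandas.errors.MergeError if right-side keys repeat
)
print(fixed.columns.tolist(), len(fixed))
```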

## Section 10: Data Quality Checks

In [None]:
print("="*80)
print("SECTION 10: DATA QUALITY AND INTEGRITY CHECKS")
print("="*80)
print()

print("Checking for duplicate auction/product combinations in winners:")
winner_dupes = winners.groupby(['AUCTION_ID', 'PRODUCT_ID']).size()
duplicate_winners = winner_dupes[winner_dupes > 1]
print(f"Duplicate winner entries: {len(duplicate_winners):,}")
if len(duplicate_winners) > 0:
    print("Sample duplicates:")
    print(duplicate_winners.head(20))
    print()
else:
    print("✓ No duplicate winners for same auction/product")
    print()

print("Checking for winners without AUCTION_ID in auctions_users:")
orphan_winners = ~winners['AUCTION_ID'].isin(auctions_users['AUCTION_ID'])
print(f"Winners without auction record: {orphan_winners.sum():,} ({orphan_winners.mean():.4%})")
print()

print("Checking for losers without AUCTION_ID in auctions_users:")
orphan_losers = ~losers['AUCTION_ID'].isin(auctions_users['AUCTION_ID'])
print(f"Losers without auction record: {orphan_losers.sum():,} ({orphan_losers.mean():.4%})")
print()

print("Checking for auctions in impressions not in auctions_users:")
orphan_impressions = ~impressions['AUCTION_ID'].isin(auctions_users['AUCTION_ID'])
print(f"Impressions without auction record: {orphan_impressions.sum():,} ({orphan_impressions.mean():.4%})")
print()

print("Summary of data integrity:")
if len(duplicate_winners) == 0:
    print("  ✓ No duplicate winners")
else:
    print(f"  ⚠ {len(duplicate_winners):,} duplicate winner combinations")

if orphan_winners.sum() == 0:
    print("  ✓ All winners have auction records")
else:
    print(f"  ⚠ {orphan_winners.sum():,} orphaned winner records")

if losers_with_impressions.sum() == 0:
    print("  ✓ No losing bids have impressions")
else:
    print(f"  ⚠ {losers_with_impressions.sum():,} losing bids with impressions")

print()
print("="*80)
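Section 4's output implies 728 more winner rows than distinct (AUCTION_ID, PRODUCT_ID) keys (15,509,104 vs 15,508,376). `groupby(...).size()` reports how often each key repeats but not whether the repeated rows disagree. A sketch, on hypothetical toy winners, that pulls the full conflicting rows with `duplicated(keep=False)`:

```python
import pandas as pd

keys = ["AUCTION_ID", "PRODUCT_ID"]
# Toy winners (hypothetical IDs): (a1, p1) appears twice with different rankings.
winners = pd.DataFrame({
    "AUCTION_ID": ["a1", "a1", "a2"],
    "PRODUCT_ID": ["p1", "p1", "p3"],
    "RANKING":    [1, 4, 1],
})

# keep=False marks every member of a duplicated group, not just the later repeats,
# so all conflicting rows can be compared side by side.
dupe_rows = winners[winners.duplicated(subset=keys, keep=False)].sort_values(keys + ["RANKING"])
print(dupe_rows)
print(f"Duplicated keys: {dupe_rows[keys].drop_duplicates().shape[0]}, rows involved: {len(dupe_rows)}")
```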

## Summary

In [None]:
print("="*80)
print("SUMMARY STATISTICS")
print("="*80)
print()

print("Total records processed:")
print(f"  Auction results: {len(auctions_results):,}")
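print(f"  Auction records (AUCTIONS_USERS rows): {len(auctions_users):,}")
print(f"  Distinct auctions: {auctions_users['AUCTION_ID'].nunique():,}")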
print(f"  Impressions: {len(impressions):,}")
print(f"  Clicks: {len(clicks):,}")
print(f"  Purchases: {len(purchases):,}")
print()

print("IS_WINNER distribution:")
print(f"  Winners (TRUE): {(auctions_results['IS_WINNER'] == True).sum():,} ({(auctions_results['IS_WINNER'] == True).mean():.4%})")
print(f"  Losers (FALSE): {(auctions_results['IS_WINNER'] == False).sum():,} ({(auctions_results['IS_WINNER'] == False).mean():.4%})")
print(f"  NULL: {auctions_results['IS_WINNER'].isnull().sum():,}")
print()

print("Funnel linkage quality:")
print(f"  Winners with impressions: {winners_with_impressions.sum() / len(winners):.4%}")
print(f"  Impressions with winner bids: {impressions_with_winners.sum() / len(impressions):.4%}")
print(f"  Impressions with clicks: {impressions_with_clicks.sum() / len(impressions):.4%}")
print()

print(f"Analysis completed: {datetime.now()}")
print("="*80)