# GameInsights Integration Tests

This notebook tests the library's **actual functionality** against real APIs, complementing the mocked unit tests in `tests/`.

## Test Layers

| Layer | Purpose | Speed |
|-------|---------|-------|
| Pytest | Unit testing, mocked responses | Fast (~10s) |
| **This Notebook** | Integration testing, real APIs | Slower (~1-2 min) |

## Prerequisites

```bash
# Install with notebook dependencies
poetry install --with notebooks

# Set API keys (optional - some tests will skip without them)
export STEAM_WEB_API_KEY="your_key_here"
export GAMALYTIC_API_KEY="your_key_here"
```

## Running Tests

1. Run cells sequentially (they build on each other)
2. Some cells may fail if APIs are down - this is expected
3. Toggle `RUN_EXPENSIVE_TESTS` for large batch operations

---

## Section 1: Setup & Configuration

In [1]:
# Cell 1: Imports & environment setup
import os
import time
from datetime import datetime
from typing import Any, Dict, List

import pandas as pd

# Import gameinsights
from gameinsights import Collector
from gameinsights.model.game_data import GameDataModel
from gameinsights.sources import (
    SteamStore,
    SteamSpy,
    Gamalytic,
    SteamCharts,
    SteamReview,
    SteamAchievements,
    HowLongToBeat,
    SteamUser,
)
from gameinsights.sources.base import BaseSource

# Check for API keys
STEAM_KEY = os.getenv("STEAM_WEB_API_KEY")
GAMALYTIC_KEY = os.getenv("GAMALYTIC_API_KEY")

print("=" * 50)
print("GameInsights Integration Tests")
print("=" * 50)
print(f"\nEnvironment Check:")
print(f"  Steam API Key: {'✓ Configured' if STEAM_KEY else '✗ Not set (limited tests)'}")
print(f"  Gamalytic API Key: {'✓ Configured' if GAMALYTIC_KEY else '✗ Not set (works without key)'}")
print(f"\nTest started at: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")

GameInsights Integration Tests

Environment Check:
  Steam API Key: ✗ Not set (limited tests)
  Gamalytic API Key: ✗ Not set (works without key)

Test started at: 2026-01-20 03:06:39


In [2]:
# Cell 2: Test configuration

# Well-known games with reliable data for testing
TEST_APPIDS = {
    "Dota 2": "570",              # Free-to-play, very popular
    "Counter-Strike 2": "730",    # Popular multiplayer
    "Elden Ring": "1245620",      # Paid, recent AAA
    "Stardew Valley": "413150",   # Popular indie
    "Portal 2": "620",            # Classic single-player
    "Team Fortress 2": "440",     # Free, classic
    "Half-Life 2": "220",         # Classic single-player
}

# Default test game (Stardew Valley - paid game with complete data)
DEFAULT_GAME = "Stardew Valley"
DEFAULT_APPID = TEST_APPIDS[DEFAULT_GAME]

# Test user (for SteamUser tests - needs to be a real Steam ID)
# Using Valve employee (public profile)
TEST_STEAM_ID = "76561197974192246"  # Gabe Newell's Steam ID (public profile)

# Toggle for expensive tests (large batch operations)
RUN_EXPENSIVE_TESTS = False

# Number of games for batch tests
BATCH_SIZE_SMALL = 5
BATCH_SIZE_LARGE = 20 if RUN_EXPENSIVE_TESTS else 5

print(f"\nConfiguration:")
print(f"  Test games: {len(TEST_APPIDS)}")
print(f"  Default game: {DEFAULT_GAME} ({DEFAULT_APPID})")
print(f"  Expensive tests: {RUN_EXPENSIVE_TESTS}")
print(f"  Small batch size: {BATCH_SIZE_SMALL}")
print(f"  Large batch size: {BATCH_SIZE_LARGE}")
print(f"\nTest games:")
for name, appid in list(TEST_APPIDS.items())[:5]:
    print(f"  - {name} ({appid})")
if len(TEST_APPIDS) > 5:
    print(f"  ... and {len(TEST_APPIDS) - 5} more")


Configuration:
  Test games: 7
  Default game: Stardew Valley (413150)
  Expensive tests: False
  Small batch size: 5
  Large batch size: 5

Test games:
  - Dota 2 (570)
  - Counter-Strike 2 (730)
  - Elden Ring (1245620)
  - Stardew Valley (413150)
  - Portal 2 (620)
  ... and 2 more


In [3]:
# Cell 3: Helper functions

def time_it(func, *args, **kwargs):
    """Time a function execution and return elapsed time and result."""
    start = time.time()
    result = func(*args, **kwargs)
    elapsed = time.time() - start
    return elapsed, result


def validate_game_data(data: Dict[str, Any], appid: str) -> List[str]:
    """Validate game data and return list of issues."""
    issues = []
    
    # Required fields
    required_fields = ["steam_appid", "name"]
    for field in required_fields:
        if field not in data or data[field] is None:
            issues.append(f"Missing required field: {field}")
    
    # Validate appid matches
    if "steam_appid" in data and str(data["steam_appid"]) != appid:
        issues.append(f"AppID mismatch: expected {appid}, got {data.get('steam_appid')}")
    
    return issues


def print_test_result(test_name: str, passed: bool, details: str = ""):
    """Print a formatted test result."""
    status = "✓ PASS" if passed else "✗ FAIL"
    print(f"{status}: {test_name}")
    if details:
        print(f"       {details}")


def display_all_fields(data: Dict[str, Any], title: str = "Fetched Data"):
    """Display all fields from fetched data."""
    print(f"\n{title}")
    print("-" * 50)
    for key, value in data.items():
        if value is not None:
            value_str = str(value)
            if len(value_str) > 80:
                value_str = value_str[:77] + "..."
            print(f"  {key}: {value_str}")
        else:
            print(f"  {key}: (null/empty)")


def display_game_summary(game: Dict[str, Any]) -> None:
    """Display a summary of game data."""
    print(f"\n  Game: {game.get('name', 'Unknown')}")
    print(f"  {'-' * 40}")
    
    summary_fields = [
        ('steam_appid', 'AppID'),
        ('developers', 'Developers'),
        ('price_final', 'Price'),
        ('owners', 'Owners'),
        ('ccu', 'Current Players'),
        ('comp_main', 'Main Story Time'),
        ('positive_reviews', 'Positive Reviews'),
    ]
    
    for field, label in summary_fields:
        value = game.get(field)
        if value is not None:
            print(f"    {label}: {value}")


# Test results tracking
test_results = []

def record_test(test_name: str, passed: bool, source: str, duration: float, details: str = ""):
    """Record a test result."""
    test_results.append({
        "test_name": test_name,
        "passed": passed,
        "source": source,
        "duration_seconds": round(duration, 2),
        "details": details,
        "timestamp": datetime.now().isoformat()
    })

print("\nHelper functions loaded successfully.")


Helper functions loaded successfully.


---

## Section 2: Single Source Tests

Tests each data source individually to ensure they can fetch and parse real data.

In [4]:
# Cell 4: SteamStore - Game metadata
print("\n" + "="*50)
print("Testing: SteamStore")
print("="*50)

appid = DEFAULT_APPID
store = SteamStore()

try:
    elapsed, result = time_it(store.fetch, appid, verbose=False)
    
    if result["success"]:
        data = result["data"]
        print(f"✓ Fetched data for {data.get('name')} ({appid}) in {elapsed:.2f}s")
        
        issues = validate_game_data(data, appid)
        
        if issues:
            print_test_result("SteamStore fetch", False, "; ".join(issues))
            record_test("SteamStore fetch", False, "SteamStore", elapsed, "; ".join(issues))
        else:
            print_test_result("SteamStore fetch", True)
            record_test("SteamStore fetch", True, "SteamStore", elapsed)
            display_all_fields(data, "All fields from SteamStore:")
    else:
        print_test_result("SteamStore fetch", False, result.get("error", "Unknown error"))
        record_test("SteamStore fetch", False, "SteamStore", elapsed, result.get("error"))
except Exception as e:
    print_test_result("SteamStore fetch", False, f"Exception: {str(e)}")
    record_test("SteamStore fetch", False, "SteamStore", 0, str(e))

# Cleanup
BaseSource.close_session()



Testing: SteamStore
✓ Fetched data for Stardew Valley (413150) in 0.41s
✓ PASS: SteamStore fetch

All fields from SteamStore:
--------------------------------------------------
  steam_appid: 413150
  name: Stardew Valley
  type: game
  is_coming_soon: False
  release_date: Feb 26, 2016
  is_free: False
  price_currency: USD
  price_initial: 14.99
  price_final: 14.99
  developers: ['ConcernedApe']
  publishers: ['ConcernedApe']
  platforms: ['windows', 'mac', 'linux']
  categories: ['Single-player', 'Multi-player', 'Co-op', 'Online Co-op', 'LAN Co-op', 'Shar...
  genres: ['Indie', 'RPG', 'Simulation']
  metacritic_score: 89
  recommendations: 831196
  achievements: 49
  content_rating: [{'rating_type': 'esrb', 'rating': 'e10'}, {'rating_type': 'pegi', 'rating': ...


In [5]:
# Cell 5: SteamSpy - Ownership and review data
print("\n" + "="*50)
print("Testing: SteamSpy")
print("="*50)

appid = DEFAULT_APPID
spy = SteamSpy()

try:
    elapsed, result = time_it(spy.fetch, appid, verbose=False)
    
    if result["success"]:
        data = result["data"]
        print(f"✓ Fetched data for {data.get('name')} ({appid}) in {elapsed:.2f}s")
        
        issues = validate_game_data(data, appid)
        
        if issues:
            print_test_result("SteamSpy fetch", False, "; ".join(issues))
            record_test("SteamSpy fetch", False, "SteamSpy", elapsed, "; ".join(issues))
        else:
            print_test_result("SteamSpy fetch", True)
            record_test("SteamSpy fetch", True, "SteamSpy", elapsed)
            display_all_fields(data, "All fields from SteamSpy:")
    else:
        print_test_result("SteamSpy fetch", False, result.get("error", "Unknown error"))
        record_test("SteamSpy fetch", False, "SteamSpy", elapsed, result.get("error"))
except Exception as e:
    print_test_result("SteamSpy fetch", False, f"Exception: {str(e)}")
    record_test("SteamSpy fetch", False, "SteamSpy", 0, str(e))

BaseSource.close_session()



Testing: SteamSpy
✓ Fetched data for Stardew Valley (413150) in 0.16s
✓ PASS: SteamSpy fetch

All fields from SteamSpy:
--------------------------------------------------
  steam_appid: 413150
  name: Stardew Valley
  developers: ConcernedApe
  publishers: ConcernedApe
  positive_reviews: 872384
  negative_reviews: 13811
  owners: 20,000,000 .. 50,000,000
  average_forever: 4628
  average_2weeks: 626
  median_forever: 1698
  median_2weeks: 243
  price: 1499
  initial_price: 1499
  discount: 0
  ccu: 50662
  languages: English, German, Spanish - Spain, Japanese, Portuguese - Brazil, Russian, Sim...
  genres: Indie, RPG, Simulation
  tags: ['Farming Sim', 'Pixel Graphics', 'Multiplayer', 'Life Sim', 'RPG', 'Relaxing...


In [6]:
# Cell 6: Gamalytic - Sales data
print("\n" + "="*50)
print("Testing: Gamalytic")
print("="*50)

appid = DEFAULT_APPID
gamalytic = Gamalytic(api_key=GAMALYTIC_KEY) if GAMALYTIC_KEY else Gamalytic()

try:
    elapsed, result = time_it(gamalytic.fetch, appid, verbose=False)
    
    if result["success"]:
        data = result["data"]
        print(f"✓ Fetched data for {data.get('name')} ({appid}) in {elapsed:.2f}s")
        
        issues = validate_game_data(data, appid)
        
        if issues:
            print_test_result("Gamalytic fetch", False, "; ".join(issues))
            record_test("Gamalytic fetch", False, "Gamalytic", elapsed, "; ".join(issues))
        else:
            print_test_result("Gamalytic fetch", True)
            record_test("Gamalytic fetch", True, "Gamalytic", elapsed)
            display_all_fields(data, "All fields from Gamalytic:")
    else:
        print_test_result("Gamalytic fetch", False, result.get("error", "Unknown error"))
        record_test("Gamalytic fetch", False, "Gamalytic", elapsed, result.get("error"))
except Exception as e:
    print_test_result("Gamalytic fetch", False, f"Exception: {str(e)}")
    record_test("Gamalytic fetch", False, "Gamalytic", 0, str(e))

BaseSource.close_session()



Testing: Gamalytic
✓ Fetched data for Stardew Valley (413150) in 1.11s
✓ PASS: Gamalytic fetch

All fields from Gamalytic:
--------------------------------------------------
  steam_appid: 413150
  name: Stardew Valley
  price: 14.99
  reviews: 968487
  reviews_steam: 830939
  followers: 1029397
  average_playtime_h: 65.53708629132689
  review_score: 98
  tags: ['Farming Sim', 'Pixel Graphics', 'Multiplayer', 'Life Sim', 'RPG', 'Relaxing...
  genres: ['Indie', 'RPG', 'Simulation']
  features: ['Single-player', 'Online Co-op', 'LAN Co-op', 'Shared/Split Screen Co-op', '...
  languages: ['English', 'German', 'Spanish - Spain', 'Japanese', 'Portuguese - Brazil', '...
  developers: ['ConcernedApe']
  publishers: ['ConcernedApe']
  release_date: 1456462800000
  first_release_date: 1456462800000
  unreleased: False
  early_access: False
  copies_sold: 30131536
  estimated_revenue: 280277328
  players: 32434788
  owners: 37055718


In [7]:
# Cell 7: SteamCharts - Player count from HTML
print("\n" + "="*50)
print("Testing: SteamCharts")
print("="*50)

appid = DEFAULT_APPID
charts = SteamCharts()

try:
    elapsed, result = time_it(charts.fetch, appid, verbose=False)
    
    if result["success"]:
        data = result["data"]
        print(f"✓ Fetched data for {data.get('name')} ({appid}) in {elapsed:.2f}s")
        
        # SteamCharts returns appid as integer, convert for comparison
        data_appid = str(data.get('steam_appid', data.get('appid', '')))
        if data_appid != appid:
            print_test_result("SteamCharts fetch", False, f"AppID mismatch: {data_appid} vs {appid}")
            record_test("SteamCharts fetch", False, "SteamCharts", elapsed, f"AppID mismatch")
        else:
            print_test_result("SteamCharts fetch", True)
            record_test("SteamCharts fetch", True, "SteamCharts", elapsed)
            display_all_fields(data, "All fields from SteamCharts:")
    else:
        print_test_result("SteamCharts fetch", False, result.get("error", "Unknown error"))
        record_test("SteamCharts fetch", False, "SteamCharts", elapsed, result.get("error"))
except Exception as e:
    print_test_result("SteamCharts fetch", False, f"Exception: {str(e)}")
    record_test("SteamCharts fetch", False, "SteamCharts", 0, str(e))

BaseSource.close_session()



Testing: SteamCharts
✓ Fetched data for Stardew Valley (413150) in 0.54s
✓ PASS: SteamCharts fetch

All fields from SteamCharts:
--------------------------------------------------
  steam_appid: 413150
  name: Stardew Valley
  active_player_24h: 116146
  peak_active_player_all_time: 236614
  monthly_active_player: [{'month': '2025-12', 'average_players': 58487.3, 'gain': 4498.66, 'percentag...


In [8]:
# Cell 8: SteamReview - User reviews with pagination
print("\n" + "="*50)
print("Testing: SteamReview")
print("="*50)

appid = TEST_APPIDS["Portal 2"]  # Has good reviews
review = SteamReview()

try:
    # Test summary mode
    elapsed, result = time_it(
        review.fetch, 
        steam_appid=appid, 
        verbose=False,
        filter="recent",
        mode="summary"
    )
    
    if result["success"]:
        data = result["data"]
        print(f"✓ Fetched review summary for {appid} in {elapsed:.2f}s")
        
        # Check for expected fields
        expected_fields = ["total_positive", "total_negative", "total_reviews", "review_score"]
        missing = [f for f in expected_fields if f not in data]
        
        if missing:
            print_test_result("SteamReview summary", False, f"Missing: {missing}")
            record_test("SteamReview summary", False, "SteamReview", elapsed, f"Missing: {missing}")
        else:
            print_test_result("SteamReview summary", True)
            record_test("SteamReview summary", True, "SteamReview", elapsed)
            print(f"  - Total reviews: {data.get('total_reviews', 0):,}")
            print(f"  - Positive: {data.get('total_positive', 0):,}")
            print(f"  - Negative: {data.get('total_negative', 0):,}")
            print(f"  - Review score: {data.get('review_score_desc', 'N/A')}")
    else:
        print_test_result("SteamReview summary", False, result.get("error", "Unknown error"))
        record_test("SteamReview summary", False, "SteamReview", elapsed, result.get("error"))
except Exception as e:
    print_test_result("SteamReview summary", False, f"Exception: {str(e)[:50]}...")
    record_test("SteamReview summary", False, "SteamReview", 0, str(e))

BaseSource.close_session()


Testing: SteamReview
✓ Fetched review summary for 620 in 0.46s
✓ PASS: SteamReview summary
  - Total reviews: 451,776
  - Positive: 445,914
  - Negative: 5,862
  - Review score: Overwhelmingly Positive


In [9]:
# Cell 9: SteamAchievements - Achievement data (requires Steam API key)
print("\n" + "="*50)
print("Testing: SteamAchievements")
print("="*50)

if not STEAM_KEY:
    print("⊘ SKIP: Steam API key not set (limited to public percentage data)")
    record_test("SteamAchievements fetch", None, "SteamAchievements", 0, "Skipped: No API key")
else:
    appid = TEST_APPIDS["Portal 2"]  # Has many achievements
    achievements = SteamAchievements(api_key=STEAM_KEY)
    
    try:
        elapsed, result = time_it(achievements.fetch, appid, verbose=False)
        
        if result["success"]:
            data = result["data"]
            print(f"✓ Fetched achievement data for {appid} in {elapsed:.2f}s")
            
            # Check for achievement data
            achievement_list = data.get('achievements', [])
            
            if achievement_list:
                print_test_result("SteamAchievements fetch", True)
                record_test("SteamAchievements fetch", True, "SteamAchievements", elapsed)
                print(f"  - Total achievements: {len(achievement_list)}")
                print(f"  - Avg completion: {data.get('achievements_percentage_average', 'N/A')}%")
                if achievement_list:
                    first = achievement_list[0]
                    print(f"  - Sample achievement: {first.get('name', 'N/A')}")
            else:
                print_test_result("SteamAchievements fetch", False, "No achievements found")
                record_test("SteamAchievements fetch", False, "SteamAchievements", elapsed, "No achievements")
        else:
            print_test_result("SteamAchievements fetch", False, result.get("error", "Unknown error"))
            record_test("SteamAchievements fetch", False, "SteamAchievements", elapsed, result.get("error"))
    except Exception as e:
        print_test_result("SteamAchievements fetch", False, f"Exception: {str(e)[:50]}...")
        record_test("SteamAchievements fetch", False, "SteamAchievements", 0, str(e))

# Test without API key (public percentage data only)
print("\nTesting public percentage data (no API key required):")
achievements_public = SteamAchievements()

try:
    elapsed, result = time_it(achievements_public.fetch, appid, verbose=False)
    
    if result["success"]:
        data = result["data"]
        achievement_list = data.get('achievements', [])
        print(f"✓ Public data: {len(achievement_list)} achievements (percentages only)")
        record_test("SteamAchievements public", True, "SteamAchievements", elapsed)
    else:
        print(f"✗ Public data failed: {result.get('error', 'Unknown')}")
        record_test("SteamAchievements public", False, "SteamAchievements", elapsed, result.get("error"))
except Exception as e:
    print(f"✗ Exception: {str(e)[:50]}...")
    record_test("SteamAchievements public", False, "SteamAchievements", 0, str(e))

BaseSource.close_session()


Testing: SteamAchievements
⊘ SKIP: Steam API key not set (limited to public percentage data)

Testing public percentage data (no API key required):
✓ Public data: 0 achievements (percentages only)


In [10]:
# Cell 10: HowLongToBeat - Game search and completion times
print("\n" + "="*50)
print("Testing: HowLongToBeat")
print("="*50)

# Use DEFAULT_GAME which is Stardew Valley
game_name = DEFAULT_GAME

# Close any existing session before starting fresh
BaseSource.close_session()

hltb = HowLongToBeat()

try:
    # Add a small delay to avoid rate limiting from previous cells
    import time as time_module
    time_module.sleep(2)
    
    elapsed, result = time_it(hltb.fetch, game_name, verbose=True)
    
    if result["success"]:
        data = result["data"]
        print(f"✓ Found game: {data.get('name')} (game_id: {data.get('game_id')}) in {elapsed:.2f}s")
        
        # HLTB-specific validation: check for required fields
        hltb_required = ["game_id", "game_name"]
        hltb_issues = []
        for field in hltb_required:
            if field not in data or data[field] is None:
                hltb_issues.append(f"Missing required field: {field}")
        
        # Check for at least one completion time
        time_fields = ["comp_main", "comp_plus", "comp_100", "comp_all"]
        has_time_data = any(data.get(f) is not None for f in time_fields)
        if not has_time_data:
            hltb_issues.append("No completion time data available")
        
        if hltb_issues:
            print_test_result("HowLongToBeat fetch", False, "; ".join(hltb_issues))
            record_test("HowLongToBeat fetch", False, "HowLongToBeat", elapsed, "; ".join(hltb_issues))
        else:
            print_test_result("HowLongToBeat fetch", True)
            record_test("HowLongToBeat fetch", True, "HowLongToBeat", elapsed)
            
            # Show completion times in readable format
            print(f"\n  Completion Times:")
            for field in time_fields:
                value = data.get(field)
                if value is not None:
                    hours = value // 60
                    mins = value % 60
                    label = field.replace("comp_", "").replace("_", " ").title()
                    print(f"    - {label}: {hours}h {mins}m")
            
            display_all_fields(data, "All fields from HowLongToBeat:")
    else:
        error_msg = result.get("error", "Unknown error")
        print(f"✗ Error: {error_msg}")
        print_test_result("HowLongToBeat fetch", False, error_msg)
        record_test("HowLongToBeat fetch", False, "HowLongToBeat", elapsed, error_msg)
except Exception as e:
    import traceback
    print(f"✗ Exception: {str(e)}")
    print("Traceback:")
    traceback.print_exc()
    print_test_result("HowLongToBeat fetch", False, f"Exception: {str(e)}")
    record_test("HowLongToBeat fetch", False, "HowLongToBeat", 0, str(e))
finally:
    BaseSource.close_session()


Testing: HowLongToBeat


INFO - HowLongToBeat: Fetching data for game 'Stardew Valley'


✓ Found game: None (game_id: 34716) in 3.76s
✓ PASS: HowLongToBeat fetch

  Completion Times:
    - Main: 53h 47m
    - Plus: 98h 43m
    - 100: 183h 30m
    - All: 108h 18m

All fields from HowLongToBeat:
--------------------------------------------------
  game_id: 34716
  game_name: Stardew Valley
  game_type: game
  comp_main: 3227
  comp_plus: 5923
  comp_100: 11010
  comp_all: 6498
  comp_main_count: 9
  comp_plus_count: 14
  comp_100_count: 8
  comp_all_count: 32
  invested_co: 4782
  invested_mp: 7983
  invested_co_count: 1
  invested_mp_count: 0
  count_comp: 8006
  count_speed_run: (null/empty)
  count_backlog: 18001
  count_review: 2968
  review_score: 88
  count_playing: 601
  count_retired: 2559


---

## Section 3: Collector Integration Tests

Tests the `Collector` class which orchestrates all sources.

In [11]:
# Cell 11: Single game fetch via Collector
print("\n" + "="*50)
print("Testing: Collector - Single Game")
print("="*50)

appid = DEFAULT_APPID

with Collector() as collector:
    try:
        elapsed, result = time_it(collector.get_games_data, [appid], recap=False, verbose=False)
        
        if result and len(result) > 0:
            game = result[0]
            issues = validate_game_data(game, appid)
            
            if issues:
                print_test_result("Collector single game", False, "; ".join(issues))
                record_test("Collector single game", False, "Collector", elapsed, "; ".join(issues))
            else:
                print_test_result("Collector single game", True)
                record_test("Collector single game", True, "Collector", elapsed)
                display_game_summary(game)
                
                # Check data from multiple sources
                sources_found = []
                if game.get('developers'): sources_found.append('SteamStore')
                if game.get('owners'): sources_found.append('SteamSpy')
                if game.get('active_player_24h'): sources_found.append('SteamCharts')
                if game.get('comp_main'): sources_found.append('HowLongToBeat')
                
                print(f"\n  Sources contributing data: {', '.join(sources_found)}")
                
                # Display all non-null fields
                print(f"\n  All non-null fields:")
                for key, value in game.items():
                    if value is not None:
                        value_str = str(value)
                        if len(value_str) > 50:
                            value_str = value_str[:47] + "..."
                        print(f"    - {key}: {value_str}")
        else:
            print_test_result("Collector single game", False, "No data returned")
            record_test("Collector single game", False, "Collector", elapsed, "No data")
    except Exception as e:
        print_test_result("Collector single game", False, f"Exception: {str(e)}")
        record_test("Collector single game", False, "Collector", 0, str(e))

print("\n  Session cleaned up via context manager ✓")


Testing: Collector - Single Game
✓ PASS: Collector single game

  Game: Stardew Valley
  ----------------------------------------
    AppID: 413150
    Developers: ['ConcernedApe']
    Price: 14.99
    Owners: 37055718
    Current Players: 50662
    Main Story Time: 3227

  Sources contributing data: SteamStore, SteamSpy, SteamCharts, HowLongToBeat

  All non-null fields:
    - steam_appid: 413150
    - name: Stardew Valley
    - developers: ['ConcernedApe']
    - publishers: ['ConcernedApe']
    - type: game
    - is_free: False
    - is_coming_soon: False
    - recommendations: 831186
    - price_currency: USD
    - price_initial: 14.99
    - price_final: 14.99
    - metacritic_score: 89
    - release_date: 2016-02-26 00:00:00
    - days_since_release: 3616
    - average_playtime: 235933
    - copies_sold: 30131536
    - estimated_revenue: 280277328
    - owners: 37055718
    - ccu: 50662
    - active_player_24h: 116146
    - peak_active_player_all_time: 236614
    - monthly_active_

In [12]:
# Cell 12: Multiple games (small batch)
print("\n" + "="*50)
print(f"Testing: Collector - Small Batch ({BATCH_SIZE_SMALL} games)")
print("="*50)

# Get a subset of test appids
batch_appids = list(TEST_APPIDS.values())[:BATCH_SIZE_SMALL]

with Collector() as collector:
    try:
        print(f"Fetching {len(batch_appids)} games...")
        elapsed, result = time_it(collector.get_games_data, batch_appids, recap=False, verbose=False)
        
        if result and len(result) > 0:
            successful = len([r for r in result if r.get('steam_appid')])
            
            print_test_result(
                f"Collector batch ({BATCH_SIZE_SMALL} games)", 
                successful > 0,
                f"{successful}/{len(batch_appids)} games fetched"
            )
            record_test(
                f"Collector batch ({BATCH_SIZE_SMALL})",
                successful > 0,
                "Collector",
                elapsed,
                f"{successful}/{len(batch_appids)} games"
            )
            
            print(f"  - Total time: {elapsed:.2f}s")
            print(f"  - Average per game: {elapsed/len(batch_appids):.2f}s")
            print(f"  - Successfully fetched: {successful}")
            
            # Show summary of fetched games
            print("\n  Fetched games:")
            for game in result:
                if game.get('name'):
                    print(f"    - {game['name']} ({game.get('steam_appid', 'N/A')})")
        else:
            print_test_result("Collector batch", False, "No data returned")
            record_test(f"Collector batch ({BATCH_SIZE_SMALL})", False, "Collector", elapsed, "No data")
    except Exception as e:
        print_test_result("Collector batch", False, f"Exception: {str(e)[:50]}...")
        record_test(f"Collector batch ({BATCH_SIZE_SMALL})", False, "Collector", 0, str(e))

print("\n  Session cleaned up via context manager ✓")


Testing: Collector - Small Batch (5 games)
Fetching 5 games...
✓ PASS: Collector batch (5 games)
       5/5 games fetched
  - Total time: 24.82s
  - Average per game: 4.96s
  - Successfully fetched: 5

  Fetched games:
    - Dota 2 (570)
    - Counter-Strike 2 (730)
    - ELDEN RING (1245620)
    - Stardew Valley (413150)
    - Portal 2 (620)

  Session cleaned up via context manager ✓


In [13]:
# Cell 13: Recap mode (summary data)
print("\n" + "="*50)
print("Testing: Collector - Recap Mode")
print("="*50)

appid = TEST_APPIDS["Elden Ring"]

with Collector() as collector:
    try:
        # Fetch full data first
        elapsed_full, full_data = time_it(collector.get_games_data, [appid], recap=False, verbose=False)
        # Fetch recap data
        elapsed_recap, recap_data = time_it(collector.get_games_data, [appid], recap=True, verbose=False)
        
        if full_data and recap_data:
            full_fields = len(full_data[0])
            recap_fields = len(recap_data[0])
            
            print(f"✓ Full data: {full_fields} fields")
            print(f"✓ Recap data: {recap_fields} fields")
            print(f"✓ Recap reduces fields by {(1 - recap_fields/full_fields)*100:.1f}%")
            
            print_test_result("Collector recap mode", True)
            record_test("Collector recap mode", True, "Collector", elapsed_full + elapsed_recap)
            
            # Show some recap fields
            recap = recap_data[0]
            print("\n  Recap fields sample:")
            recap_sample_fields = ['steam_appid', 'name', 'developers', 'price_final', 'owners', 'ccu']
            for f in recap_sample_fields:
                if f in recap:
                    print(f"    - {f}: {recap[f]}")
        else:
            print_test_result("Collector recap mode", False, "No data returned")
            record_test("Collector recap mode", False, "Collector", 0, "No data")
    except Exception as e:
        print_test_result("Collector recap mode", False, f"Exception: {str(e)[:50]}...")
        record_test("Collector recap mode", False, "Collector", 0, str(e))

print("\n  Session cleaned up via context manager ✓")


Testing: Collector - Recap Mode
✓ Full data: 54 fields
✓ Recap data: 31 fields
✓ Recap reduces fields by 42.6%
✓ PASS: Collector recap mode

  Recap fields sample:
    - steam_appid: 1245620
    - name: ELDEN RING
    - developers: ['FromSoftware, Inc.']
    - price_final: 59.99
    - owners: 24816504

  Session cleaned up via context manager ✓


In [14]:
# Cell 14: Error handling - invalid appid
print("\n" + "="*50)
print("Testing: Collector - Error Handling")
print("="*50)

invalid_appid = "999999999"  # Non-existent appid

with Collector() as collector:
    try:
        elapsed, result = time_it(collector.get_games_data, [invalid_appid], recap=False, verbose=False)
        
        # Should return empty list or gracefully handle error
        if not result or len(result) == 0:
            print_test_result("Collector error handling", True, "Correctly handled invalid appid")
            record_test("Collector error handling", True, "Collector", elapsed)
        else:
            # Some data might still be returned (e.g., from sources that don't validate)
            print(f"⚠ Partial data returned for invalid appid")
            print_test_result("Collector error handling", True, "Partial handling")
            record_test("Collector error handling", True, "Collector", elapsed, "Partial data")
    except Exception as e:
        # Exception is acceptable for invalid appid
        print_test_result("Collector error handling", True, f"Exception raised: {str(e)[:50]}...")
        record_test("Collector error handling", True, "Collector", 0, "Exception OK")

print("\n  Session cleaned up via context manager ✓")


Testing: Collector - Error Handling
⚠ Partial data returned for invalid appid
✓ PASS: Collector error handling
       Partial handling

  Session cleaned up via context manager ✓


---

## Section 4: Performance Benchmarking

Tests to measure and demonstrate performance improvements.

In [15]:
# Cell 15: Connection pooling benefit
print("\n" + "="*50)
print("Testing: Connection Pooling Benefit")
print("="*50)

# Test appids
test_ids = list(TEST_APPIDS.values())[:3]

print("\nScenario 1: Fresh session (cold starts)")
# First request after closing session
times_cold = []
for appid in test_ids:
    BaseSource.close_session()  # Force fresh session
    collector = Collector()
    elapsed, _ = time_it(collector.get_games_data, [appid], verbose=False)
    times_cold.append(elapsed)
    collector.close()

print("\nScenario 2: Reused session (warm connection)")
# All requests share the same session
times_warm = []
with Collector() as collector:
    for appid in test_ids:
        elapsed, _ = time_it(collector.get_games_data, [appid], verbose=False)
        times_warm.append(elapsed)

# Analysis
avg_cold = sum(times_cold) / len(times_cold)
avg_warm = sum(times_warm) / len(times_warm)
speedup = avg_cold / avg_warm if avg_warm > 0 else 0

print(f"\n{'Request':<20} {'Cold (s)':<12} {'Warm (s)':<12} {'Speedup':<10}")
print("-" * 54)
for i, appid in enumerate(test_ids):
    print(f"{f'Request {i+1}':<20} {times_cold[i]:<12.2f} {times_warm[i]:<12.2f} {times_cold[i]/times_warm[i] if times_warm[i] > 0 else 0:<10.2f}x")

print(f"{'Average':<20} {avg_cold:<12.2f} {avg_warm:<12.2f} {speedup:<10.2f}x")

if speedup > 1.2:
    print(f"\n✓ Connection pooling provides {speedup:.1f}x speedup on subsequent requests")
    record_test("Connection pooling", True, "Performance", avg_cold + avg_warm, f"{speedup:.1f}x speedup")
elif speedup > 1.0:
    print(f"\n✓ Connection pooling provides {speedup:.1f}x speedup (minor)")
    record_test("Connection pooling", True, "Performance", avg_cold + avg_warm, f"{speedup:.1f}x speedup")
else:
    print(f"\n⚠ No significant speedup detected (network variance likely)")
    record_test("Connection pooling", True, "Performance", avg_cold + avg_warm, "No significant speedup")


Testing: Connection Pooling Benefit

Scenario 1: Fresh session (cold starts)

Scenario 2: Reused session (warm connection)

Request              Cold (s)     Warm (s)     Speedup   
------------------------------------------------------
Request 1            3.83         4.15         0.92      x
Request 2            4.76         4.82         0.99      x
Request 3            4.37         2.65         1.65      x
Average              4.32         3.88         1.11      x

✓ Connection pooling provides 1.1x speedup (minor)


In [16]:
# Cell 16: Source timing breakdown
print("\n" + "="*50)
print("Testing: Per-Source Timing Breakdown")
print("="*50)

appid = DEFAULT_APPID

# Test each source individually
sources_to_test = [
    ("SteamStore", SteamStore()),
    ("SteamSpy", SteamSpy()),
    ("Gamalytic", Gamalytic(api_key=GAMALYTIC_KEY) if GAMALYTIC_KEY else Gamalytic()),
    ("SteamCharts", SteamCharts()),
    ("HowLongToBeat", HowLongToBeat()),
]

timing_results = []

print(f"\nFetching data for {DEFAULT_GAME} ({appid}) from each source:\n")
print(f"{'Source':<20} {'Time (s)':<12} {'Status':<10}")
print("-" * 42)

for name, source in sources_to_test:
    try:
        if name == "HowLongToBeat":
            elapsed, result = time_it(source.fetch, DEFAULT_GAME, verbose=False)
        else:
            elapsed, result = time_it(source.fetch, appid, verbose=False)
        
        status = "✓ Success" if result.get("success") else "✗ Failed"
        timing_results.append({"source": name, "elapsed": elapsed, "success": result.get("success", False)})
        print(f"{name:<20} {elapsed:<12.2f} {status:<10}")
    except Exception as e:
        timing_results.append({"source": name, "elapsed": 0, "success": False})
        print(f"{name:<20} {'0.00':<12} ✗ Error")

BaseSource.close_session()

# Summary
total_time = sum(r["elapsed"] for r in timing_results)
successful = sum(1 for r in timing_results if r["success"])
print(f"\nTotal time: {total_time:.2f}s")
print(f"Successful: {successful}/{len(timing_results)} sources")

record_test("Source timing breakdown", successful > 0, "Performance", total_time, f"{successful}/{len(sources_to_test)} sources")


Testing: Per-Source Timing Breakdown

Fetching data for Stardew Valley (413150) from each source:

Source               Time (s)     Status    
------------------------------------------
SteamStore           0.31         ✓ Success 
SteamSpy             0.11         ✓ Success 
Gamalytic            1.09         ✓ Success 
SteamCharts          0.45         ✓ Success 
HowLongToBeat        0.92         ✓ Success 

Total time: 2.88s
Successful: 5/5 sources


In [17]:
# Cell 17: Batch performance scaling
print("\n" + "="*50)
print("Testing: Batch Performance Scaling")
print("="*50)

batch_sizes = [1, 2, 3, 5]
batch_results = []

print(f"\nTesting Collector performance with different batch sizes:\n")
print(f"{'Games':<10} {'Total (s)':<12} {'Avg/Game (s)':<15} {'Status':<10}")
print("-" * 47)

for size in batch_sizes:
    try:
        test_appids = list(TEST_APPIDS.values())[:size]
        
        with Collector() as collector:
            elapsed, result = time_it(collector.get_games_data, test_appids, recap=False, verbose=False)
        
        successful = len([r for r in result if r.get('steam_appid')]) if result else 0
        avg_time = elapsed / size if size > 0 else 0
        status = f"{successful}/{size}" if result else "0/0"
        
        batch_results.append({"size": size, "elapsed": elapsed, "avg": avg_time, "successful": successful})
        print(f"{size:<10} {elapsed:<12.2f} {avg_time:<15.2f} {status:<10}")
    except Exception as e:
        print(f"  Size {size}: ERROR - {str(e)}")

# Analysis
if len(batch_results) >= 2:
    first_avg = batch_results[0]["avg"]
    last_avg = batch_results[-1]["avg"]
    
    if last_avg < first_avg:
        efficiency = ((first_avg - last_avg) / first_avg) * 100
        print(f"\n✓ Efficiency gain: {efficiency:.1f}% better per-game time at scale")
    else:
        print(f"\n⚠ No efficiency gain detected (may indicate sequential processing)")
    
    record_test("Batch scaling", True, "Performance", sum(r["elapsed"] for r in batch_results), f"{len(batch_results)} batch sizes")


Testing: Batch Performance Scaling

Testing Collector performance with different batch sizes:

Games      Total (s)    Avg/Game (s)    Status    
-----------------------------------------------
1          4.34         4.34            1/1       




2          20.10        10.05           2/2       
3          11.16        3.72            3/3       
5          20.80        4.16            5/5       

✓ Efficiency gain: 4.0% better per-game time at scale


---

## Section 5: Data Quality Validation

In [18]:
# Cell 18: GameDataModel validation
print("\n" + "="*50)
print("Testing: GameDataModel Schema Validation")
print("="*50)

appid = DEFAULT_APPID

with Collector() as collector:
    elapsed, result = time_it(collector.get_games_data, [appid], recap=False, verbose=False)

if result and len(result) > 0:
    raw_data = result[0]
    
    print(f"\nTesting GameDataModel with data from {raw_data.get('name', 'Unknown')}:\n")
    
    # Try to create model instance
    try:
        model = GameDataModel(**raw_data)
        print("✓ GameDataModel validation PASSED")
        print(f"  - All {len(model.model_fields)} fields validated successfully")
        record_test("GameDataModel validation", True, "Data Quality", elapsed, "Schema valid")
        
        # Check field types
        print("\n  Sample field types:")
        sample_fields = ['steam_appid', 'name', 'developers', 'price_final', 'owners']
        for field in sample_fields:
            if field in raw_data:
                value = raw_data[field]
                print(f"    - {field}: {type(value).__name__} = {value}")
                
    except Exception as e:
        print(f"✗ GameDataModel validation FAILED")
        print(f"  Error: {str(e)}")
        record_test("GameDataModel validation", False, "Data Quality", elapsed, str(e))
        
        # Show problematic fields
        print("\n  Checking individual field types:")
        model_fields = GameDataModel.model_fields
        for field, field_info in list(model_fields.items())[:10]:
            if field in raw_data:
                value = raw_data[field]
                print(f"    - {field}: {type(value).__name__} = {value}")
else:
    print("✗ No data available for validation")
    record_test("GameDataModel validation", False, "Data Quality", 0, "No data")


Testing: GameDataModel Schema Validation

Testing GameDataModel with data from Stardew Valley:

✓ GameDataModel validation PASSED
  - All 56 fields validated successfully

  Sample field types:
    - steam_appid: str = 413150
    - name: str = Stardew Valley
    - developers: list = ['ConcernedApe']
    - price_final: float = 14.99
    - owners: int = 37055718


/tmp/ipykernel_1045229/3666314736.py:20: PydanticDeprecatedSince211: Accessing the 'model_fields' attribute on the instance is deprecated. Instead, you should access this attribute from the model class. Deprecated in Pydantic V2.11 to be removed in V3.0.
  print(f"  - All {len(model.model_fields)} fields validated successfully")


In [19]:
# Cell 19: Cross-source consistency
print("\n" + "="*50)
print("Testing: Cross-Source Data Consistency")
print("="*50)

appid = TEST_APPIDS["Counter-Strike 2"]

# Fetch from individual sources
results = {}

sources = [
    ("SteamStore", SteamStore(), appid),
    ("SteamSpy", SteamSpy(), appid),
]

for name, source, identifier in sources:
    try:
        elapsed, result = time_it(source.fetch, identifier, verbose=False)
        if result.get("success"):
            results[name] = result["data"]
    except Exception:
        pass

BaseSource.close_session()

# Check consistency
checks = []

if "SteamStore" in results and "SteamSpy" in results:
    store_name = results["SteamStore"].get("name")
    spy_name = results["SteamSpy"].get("name")
    
    if store_name and spy_name:
        if store_name.lower() == spy_name.lower():
            checks.append(("Name consistency", True, f"Both agree: {store_name}"))
        else:
            checks.append(("Name consistency", False, f"SteamStore: '{store_name}' vs SteamSpy: '{spy_name}'"))
    
    # AppID consistency
    store_id = str(results["SteamStore"].get("steam_appid", ""))
    spy_id = str(results["SteamSpy"].get("steam_appid", results["SteamSpy"].get("appid", "")))
    
    if store_id and spy_id:
        if store_id == spy_id:
            checks.append(("AppID consistency", True, f"Both agree: {store_id}"))
        else:
            checks.append(("AppID consistency", False, f"SteamStore: {store_id} vs SteamSpy: {spy_id}"))

# Print results
print("\nCross-Source Consistency Checks:")
for check_name, passed, details in checks:
    status = "✓" if passed else "✗"
    print(f"  {status} {check_name}: {details}")
    record_test(f"Cross-source: {check_name}", passed, "Data Quality", 0, details)

if not checks:
    print("  ⚠ Unable to perform consistency checks (data missing)")


Testing: Cross-Source Data Consistency

Cross-Source Consistency Checks:
  ✗ Name consistency: SteamStore: 'Counter-Strike 2' vs SteamSpy: 'Counter-Strike: Global Offensive'
  ✓ AppID consistency: Both agree: 730


---

## Section 6: Summary & Export

In [20]:
# Cell 20: Test results summary
print("\n" + "="*50)
print("Test Results Summary")
print("="*50)

if test_results:
    results_df = pd.DataFrame(test_results)
    
    # Overall stats
    total_tests = len(results_df)
    passed = len(results_df[results_df['passed'] == True])
    failed = len(results_df[results_df['passed'] == False])
    skipped = len(results_df[results_df['passed'].isna()])
    
    print(f"\n{'Category':<20} {'Count':<10}")
    print("-" * 30)
    print(f"{'Total Tests':<20} {total_tests:<10}")
    print(f"{'✓ Passed':<20} {passed:<10}")
    print(f"{'✗ Failed':<20} {failed:<10}")
    print(f"{'⊘ Skipped':<20} {skipped:<10}")
    print(f"{'Pass Rate':<20} {(passed/total_tests*100 if total_tests > 0 else 0):.1f}%")
    
    # Results by source
    print(f"\n{'Results by Source':<40}")
    print("-" * 40)
    by_source = results_df.groupby('source').agg({
        'passed': lambda x: (x == True).sum(),
        'test_name': 'count'
    }).rename(columns={'passed': 'passed', 'test_name': 'total'})
    
    for source, row in by_source.iterrows():
        total = int(row['total'])
        passed_count = int(row['passed'])
        print(f"  {source:<30} {passed_count}/{total}")
    
    # Failed tests
    if failed > 0:
        failed_df = results_df[results_df['passed'] == False]
        print(f"\n❌ Failed Tests:")
        for _, row in failed_df.iterrows():
            print(f"  - {row['test_name']}: {row['details']}")
    
    # Timing summary
    print(f"\n⏱ Total Test Time: {results_df['duration_seconds'].sum():.2f}s")
    
    record_test("Test suite summary", failed == 0, "Summary", results_df['duration_seconds'].sum(), f"{passed}/{total_tests} passed")
else:
    print("\nNo test results recorded.")


Test Results Summary

Category             Count     
------------------------------
Total Tests          18        
✓ Passed             16        
✗ Failed             1         
⊘ Skipped            1         
Pass Rate            88.9%

Results by Source                       
----------------------------------------
  Collector                      4/4
  Data Quality                   2/3
  Gamalytic                      1/1
  HowLongToBeat                  1/1
  Performance                    3/3
  SteamAchievements              1/2
  SteamCharts                    1/1
  SteamReview                    1/1
  SteamSpy                       1/1
  SteamStore                     1/1

❌ Failed Tests:
  - Cross-source: Name consistency: SteamStore: 'Counter-Strike 2' vs SteamSpy: 'Counter-Strike: Global Offensive'

⏱ Total Test Time: 120.87s
