# API Explorer and Debugger

This notebook tests and debugs the SoundByMood API backend, from individual services up to a full endpoint simulation.

## Sections:
1. **Direct Search Service Testing** - Manual testing of search functionality
2. **Service Component Debugging** - Test music_service and llm_service individually
3. **Storage Persistence Debugging** - Inspect in-memory storage state
4. **End-to-End API Testing** - Full endpoint simulation with storage inspection

# Section 1: Direct Search Service Testing

Manually test the search functionality without going through the API endpoints.
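
The cells in this section call `await` at the top level, which IPython and Jupyter support inside cells. Outside a notebook, the same coroutine calls need an explicit entry point. The following is a minimal sketch, assuming a hypothetical stand-in coroutine `fake_query_llm` in place of `llm_service.query_llm` (whose real behavior is not reproduced here):

```python
import asyncio


async def fake_query_llm(prompt: str) -> dict:
    """Hypothetical stand-in for llm_service.query_llm; returns a canned response."""
    await asyncio.sleep(0)  # yield to the event loop, as a real network call would
    return {"user_message": f"Filters for: {prompt}", "reflection": "canned"}


async def main() -> dict:
    # In a notebook cell this would simply be: await fake_query_llm("...")
    return await fake_query_llm("brooding electronic music for coding")


if __name__ == "__main__":
    # asyncio.run starts a fresh event loop: use it in scripts,
    # not inside an already-running notebook kernel.
    print(asyncio.run(main())["user_message"])
```

In a script, `asyncio.run(main())` plays the role that the notebook's top-level `await` plays in the cells below.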

In [None]:
# Setup: Add API directory to path and import services
import sys
import os
sys.path.append('../api')

# Environment setup
from dotenv import load_dotenv
load_dotenv('../.env', override=True)

# Import services
from music_service import MusicService
from llm_service import LLMService
from search_service import initialize_services

# Initialize services
music_service = MusicService()
llm_service = LLMService()

print("Initializing services...")
music_service.initialize('../data/main_df.csv')
llm_service.initialize()
print("✅ Services initialized successfully!")

Initializing services...
✅ Services initialized successfully!


In [2]:
# Test 1: Manual query (single LLM pass, no auto-refinement loop)
test_query = "brooding electronic music for coding"

print(f"🔍 Testing query: '{test_query}'")
print("="*60)

# Step 1: Get initial LLM response
initial_prompt = llm_service.create_initial_prompt(test_query)
print(f"📝 Initial prompt: {initial_prompt}")

filters_json = await llm_service.query_llm(initial_prompt)
print(f"🤖 LLM Response:")
print(f"   User Message: {filters_json.get('user_message')}")
print(f"   Reflection: {filters_json.get('reflection')}")

# Step 2: Apply filters and get results
search_result = music_service.search(filters_json)
results_df = search_result["results"]
summary = search_result["summary"]

print(f"📊 Search Results:")
print(f"   Total Results: {len(results_df)}")
print(f"   Summary: {summary}")

# Step 3: Show top results
if len(results_df) > 0:
    top_5 = results_df.sort_values('relevance_score', ascending=False).head(5)
    print(f"🎵 Top 5 Results:")
    for idx, (_, row) in enumerate(top_5.iterrows(), 1):
        print(f"   {idx}. {row['track']} by {row['artist']} (Score: {row['relevance_score']:.1f})")
else:
    print("❌ No results found")


🔍 Testing query: 'brooding electronic music for coding'
📝 Initial prompt: User query: brooding electronic music for coding
Return ONLY JSON per schema.
🤖 LLM Response:
   User Message: I've set filters to find music that is less positive and energetic, and more instrumental with minimal vocals, fitting the 'brooding' and 'for coding' aspects. It's specifically tailored to electronic genres like EDM, ambient, house, and techno.
   Reflection: The most important decisions were interpreting 'brooding' as low valence and low energy, and 'for coding' as highly instrumental with low speechiness to minimize distraction. 'Electronic music' was directly mapped to genre filters. Key filters set were: valence_max_decile to 4 and valence_decile_weight to -50; energy_max_decile to 5 and energy_decile_weight to -40; instrumentalness_min_decile to 7 and instrumentalness_decile_weight to 50; and speechiness_max_decile to 3 and speechiness_decile_weight to -50. Additionally, genre filters for electroni

In [3]:
# Test 2: Try different queries
test_queries = [
    "period drama soundtrack",
    "upbeat pop for workout",
    "jazz for late night study",
    "ambient instrumental background music"
]

for query in test_queries:
    print(f"\n{'='*40}")
    print(f"🔍 Query: '{query}'")
    
    try:
        # Get LLM response
        prompt = llm_service.create_initial_prompt(query)
        filters = await llm_service.query_llm(prompt)
        
        # Get results
        result = music_service.search(filters)
        count = len(result["results"])
        
        print(f"📊 Results: {count} tracks")
        print(f"💬 LLM: {filters.get('user_message', 'No message')}")
        
    except Exception as e:
        print(f"❌ Error: {str(e)}")


🔍 Query: 'period drama soundtrack'
📊 Results: 54 tracks
💬 LLM: These filters are set to find music that sounds like a 'period drama soundtrack', focusing on instrumental tracks with high acoustic quality and a more calm, less danceable feel. We've specifically included genres like soundtrack, orchestra, and classical to ensure relevant results.

🔍 Query: 'upbeat pop for workout'
📊 Results: 518 tracks
💬 LLM: I've set filters to find upbeat pop music ideal for workouts by prioritizing high danceability, energy, and positive mood, along with a fast tempo. Pop genre tracks will also receive a relevance boost.

🔍 Query: 'jazz for late night study'
📊 Results: 2 tracks
💬 LLM: These filters are set to find instrumental jazz music that is calm and mellow, perfect for a focused late-night study session. They prioritize lower energy, minimal vocals, and a quieter, more acoustic sound.

🔍 Query: 'ambient instrumental background music'
📊 Results: 175 tracks
💬 LLM: These filters are set to find amb

# Section 2: Service Component Debugging

Test individual components of music_service and llm_service.

In [None]:
# Music Service Debugging
print("🎵 MUSIC SERVICE DEBUGGING")
print("="*50)

# Test 1: Check dataset loading
print(f"📊 Dataset loaded: {music_service.main_df is not None}")
if music_service.main_df is not None:
    print(f"   Shape: {music_service.main_df.shape}")
    print(f"   Columns: {list(music_service.main_df.columns[:10])}...")  # First 10 columns

# Test 2: Manual filter creation
print(f"\n🔧 Testing manual filters...")
manual_filters = {
    'danceability_min_decile': 5,
    'danceability_max_decile': 10,
    'danceability_decile_weight': 30,
    'energy_min_decile': 7,
    'energy_max_decile': 10,
    'energy_decile_weight': 50,
    'speechiness_min_decile': 1,
    'speechiness_max_decile': 3,
    'speechiness_decile_weight': -30,
    'acousticness_min_decile': 1,
    'acousticness_max_decile': 10,
    'acousticness_decile_weight': 0,
    'instrumentalness_min_decile': 7,
    'instrumentalness_max_decile': 10,
    'instrumentalness_decile_weight': 60,
    'liveness_min_decile': 1,
    'liveness_max_decile': 10,
    'liveness_decile_weight': 0,
    'valence_min_decile': 1,
    'valence_max_decile': 10,
    'valence_decile_weight': 0,
    'views_min_decile': 1,
    'views_max_decile': 10,
    'views_decile_weight': 0,
    'loudness_min': -60,
    'loudness_max': 0,
    'loudness_decile_weight': 0,
    'tempo_min': 120,
    'tempo_max': 140,
    'tempo_decile_weight': 20,
    'duration_ms_min': 30000,
    'duration_ms_max': 600000,
    'duration_ms_decile_weight': 0,
    'album_release_year_min': 2010,
    'album_release_year_max': 2025,
    'track_is_explicit_min': 0,
    'track_is_explicit_max': 1,
    'spotify_artist_genres_include_any': 'electronic,edm',
    'spotify_artist_genres_exclude_any': '',
    'spotify_artist_genres_boosted': 'ambient,lo-fi',
    'debug_tag': 'manual_test',
    'reflection': 'Manual test filters',
    'user_message': 'Testing manual filters'
}

# Test filter application
filter_mask = music_service.llm_to_filters(manual_filters)
print(f"   Filter mask created: {type(filter_mask)}, {filter_mask.sum()} tracks match")

# Test results generation
results_df = music_service.filters_to_results_df(filter_mask, manual_filters)
print(f"   Results generated: {len(results_df)} tracks with scores")

# Test summary creation
summary = music_service.make_summary(results_df)
print(f"   Summary created: {list(summary.keys())}")
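
The manual filter dictionary above follows a `<feature>_min_decile` / `<feature>_max_decile` / `<feature>_decile_weight` naming pattern. How `music_service` actually turns these keys into a mask and a relevance score is not shown in this notebook, so the sketch below is only one plausible reading: a track passes if each feature's decile lies inside the range, and its score is the weighted sum of its deciles. The helper names `passes_filters` and `relevance_score` and the scoring rule are assumptions for illustration.

```python
from typing import Dict


def passes_filters(deciles: Dict[str, int], filters: Dict[str, int]) -> bool:
    """Keep a track only if every feature's decile lies within [min, max]."""
    for feature, decile in deciles.items():
        low = filters.get(f"{feature}_min_decile", 1)
        high = filters.get(f"{feature}_max_decile", 10)
        if not low <= decile <= high:
            return False
    return True


def relevance_score(deciles: Dict[str, int], filters: Dict[str, int]) -> float:
    """Assumed scoring rule: sum of weight * (decile / 10) over all features."""
    return sum(
        filters.get(f"{feature}_decile_weight", 0) * decile / 10
        for feature, decile in deciles.items()
    )


filters = {
    "energy_min_decile": 7, "energy_max_decile": 10, "energy_decile_weight": 50,
    "instrumentalness_min_decile": 7, "instrumentalness_max_decile": 10,
    "instrumentalness_decile_weight": 60,
}
# Hypothetical tracks, described only by their per-feature deciles.
tracks = {
    "track_a": {"energy": 8, "instrumentalness": 9},
    "track_b": {"energy": 3, "instrumentalness": 9},  # energy below the minimum decile
}

kept = {
    name: relevance_score(deciles, filters)
    for name, deciles in tracks.items()
    if passes_filters(deciles, filters)
}
print(kept)
```

Under this reading, a negative weight (such as `speechiness_decile_weight: -30` above) lowers the score of tracks with a high decile for that feature rather than excluding them.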

In [None]:
# LLM Service Debugging
print("🤖 LLM SERVICE DEBUGGING")
print("="*50)

# Test 1: Check client initialization
print(f"🔌 LLM client initialized: {llm_service.client is not None}")

# Test 2: Test prompt creation
test_query = "sad piano music"
initial_prompt = llm_service.create_initial_prompt(test_query)
print(f"\n📝 Initial prompt created: {len(initial_prompt)} characters")
print(f"   Preview: {initial_prompt[:100]}...")

# Test 3: Test refinement prompt
dummy_filters = {'test': 'data'}
dummy_summary = {'result_count': 100}
refine_prompt = llm_service.create_refine_prompt(
    original_query=test_query,
    previous_filters=dummy_filters,
    result_summary=dummy_summary,
    user_feedback="make it more upbeat"
)
print(f"\n🔄 Refinement prompt created: {len(refine_prompt)} characters")
print(f"   Preview: {refine_prompt[:100]}...")

# Test 4: Real LLM call (makes a live API request; skip this block to stay offline)
print(f"\n⚡ Testing LLM call...")
try:
    llm_response = await llm_service.query_llm(initial_prompt)
    print(f"   ✅ LLM responded successfully")
    print(f"   Response keys: {list(llm_response.keys())}")
    print(f"   User message: {llm_response.get('user_message', 'None')}")
except Exception as e:
    print(f"   ❌ LLM call failed: {str(e)}")

# Section 3: Storage Persistence Debugging

Inspect and manipulate the in-memory storage state.
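
The `storage` module itself is not shown in this notebook. The sketch below assumes it is backed by plain module-level dictionaries; the function names mirror the ones called in the cells that follow, but the bodies and the trimmed-down `Job` dataclass are stand-ins, not the project's implementation.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Job:
    """Hypothetical, trimmed-down stand-in for the notebook's JobData model."""
    query_text: str
    status: str = "queued"
    result_count: Optional[int] = None


# Module-level dict: it lives only as long as the Python process (or kernel) does.
JOB_STORE: Dict[str, Job] = {}


def store_job(job_id: str, job: Job) -> None:
    JOB_STORE[job_id] = job


def get_job(job_id: str) -> Optional[Job]:
    return JOB_STORE.get(job_id)


def job_exists(job_id: str) -> bool:
    return job_id in JOB_STORE


store_job("abc", Job(query_text="test query for storage"))
job = get_job("abc")
job.status = "done"  # the dict holds a reference, so later reads see this change
print(job_exists("abc"), get_job("abc").status)
```

If the real store works this way, two consequences follow for debugging: restarting the kernel empties it, and mutating a retrieved job changes the stored object even without a second `store_job` call.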

In [4]:
# Storage Debugging
import storage
from models import JobData, JobStatus, ConversationHistory, RefinementStep
from datetime import datetime
import uuid

print("💾 STORAGE DEBUGGING")
print("="*50)

# Check initial state
print(f"📊 Initial storage state:")
print(f"   JOB_STORE entries: {len(storage.JOB_STORE)}")
print(f"   RESULT_STORE entries: {len(storage.RESULT_STORE)}")

# Create test job data
test_job_id = str(uuid.uuid4())
test_job_data = JobData(
    status=JobStatus.QUEUED,
    query_text="test query for storage",
    started_at=datetime.now(),
    finished_at=None,
    error_message=None,
    conversation_history=None,
    current_filters_json=None,
    result_count=None
)

print(f"\n🧪 Testing storage operations...")

# Test store job
storage.store_job(test_job_id, test_job_data)
print(f"   ✅ Job stored with ID: {test_job_id[:8]}...")

# Test retrieve job
retrieved_job = storage.get_job(test_job_id)
print(f"   ✅ Job retrieved: {retrieved_job.query_text}")

# Test job exists
exists = storage.job_exists(test_job_id)
print(f"   ✅ Job exists check: {exists}")

# Check storage state after operations
print(f"\n📊 Storage state after test:")
print(f"   JOB_STORE entries: {len(storage.JOB_STORE)}")
print(f"   Job IDs: {list(storage.JOB_STORE.keys())}")

💾 STORAGE DEBUGGING
📊 Initial storage state:
   JOB_STORE entries: 0
   RESULT_STORE entries: 0

🧪 Testing storage operations...
   ✅ Job stored with ID: 7f703445...
   ✅ Job retrieved: test query for storage
   ✅ Job exists check: True

📊 Storage state after test:
   JOB_STORE entries: 1
   Job IDs: ['7f703445-d6d0-4352-9d1e-d2fc7361c6e2']


In [None]:
# Test conversation history storage
print("💬 Testing conversation history storage...")

# Create test conversation history
test_step = RefinementStep(
    step_number=1,
    step_type="initial",
    user_input="test query",
    filters_json={"test": "filters"},
    result_count=42,
    user_message="Test user message",
    rationale="Test rationale",
    result_summary={"result_count": 42},
    timestamp=datetime.now(),
    target_range="50-150"
)

test_conversation = ConversationHistory(
    original_query="test query",
    steps=[test_step],
    current_step=1,
    total_auto_refinements=0
)

# Update job with conversation history
test_job_data.conversation_history = test_conversation
storage.store_job(test_job_id, test_job_data)

# Retrieve and verify
updated_job = storage.get_job(test_job_id)
print(f"   ✅ Conversation history stored")
print(f"   Steps: {len(updated_job.conversation_history.steps)}")
print(f"   Original query: {updated_job.conversation_history.original_query}")
print(f"   Step 1 message: {updated_job.conversation_history.steps[0].user_message}")

In [5]:
# Storage inspection utility
def inspect_storage():
    """Utility function to inspect current storage state"""
    print("🔍 CURRENT STORAGE STATE")
    print("="*40)
    
    print(f"📋 Jobs in storage: {len(storage.JOB_STORE)}")
    for job_id, job_data in storage.JOB_STORE.items():
        print(f"   🆔 {job_id[:8]}... - {job_data.status.value} - '{job_data.query_text}'")
        if job_data.conversation_history:
            steps = len(job_data.conversation_history.steps)
            print(f"      💬 {steps} conversation steps")
    
    print(f"📊 Results in storage: {len(storage.RESULT_STORE)}")
    for job_id, results in storage.RESULT_STORE.items():
        print(f"   🆔 {job_id[:8]}... - {results.result_count} total, {len(results.tracks)} tracks returned")

# Run inspection
inspect_storage()

🔍 CURRENT STORAGE STATE
📋 Jobs in storage: 1
   🆔 7f703445... - queued - 'test query for storage'
📊 Results in storage: 0


In [20]:
# DEBUG: Let's add error logging to find the issue
import traceback

print("🔧 DEBUGGING SEARCH SERVICE")
print("="*50)

# Test service initialization
print("Testing service initialization...")
try:
    from search_service import initialize_services, music_service, llm_service
    print(f"   Music service main_df loaded: {music_service.main_df is not None}")
    print(f"   LLM service client loaded: {llm_service.client is not None}")

    # Test if services need initialization
    if music_service.main_df is None:
        print("   ⚠️  Music service not initialized!")
        music_service.initialize('../data/main_df.csv')
        print(f"   ✅ Music service initialized: {music_service.main_df is not None}")

    if llm_service.client is None:
        print("   ⚠️  LLM service not initialized!")
        llm_service.initialize()
        print(f"   ✅ LLM service initialized: {llm_service.client is not None}")

except Exception as e:
    print(f"   ❌ Service initialization error: {str(e)}")
    traceback.print_exc()

# Test a simple LLM call
print(f"\n🤖 Testing LLM call...")
try:
    test_prompt = llm_service.create_initial_prompt("test music")
    test_response = await llm_service.query_llm(test_prompt)
    print(f"   ✅ LLM call successful: {list(test_response.keys())}")
except Exception as e:
    print(f"   ❌ LLM call failed: {str(e)}")
    traceback.print_exc()

# Test music search
print(f"\n🎵 Testing music search...")
try:
    if 'test_response' in locals():
        search_result = music_service.search(test_response)
        print(f"   ✅ Music search successful: {len(search_result['results'])} results")
    else:
        print("   ⏭️  Skipping music search (no LLM response)")
except Exception as e:
    print(f"   ❌ Music search failed: {str(e)}")
    traceback.print_exc()

🔧 DEBUGGING SEARCH SERVICE
Testing service initialization...
   Music service main_df loaded: False
   LLM service client loaded: False
   ⚠️  Music service not initialized!
   ✅ Music service initialized: True
   ⚠️  LLM service not initialized!
   ✅ LLM service initialized: True

🤖 Testing LLM call...
   ✅ LLM call successful: ['danceability_min_decile', 'danceability_max_decile', 'danceability_decile_weight', 'energy_min_decile', 'energy_max_decile', 'energy_decile_weight', 'speechiness_min_decile', 'speechiness_max_decile', 'speechiness_decile_weight', 'acousticness_min_decile', 'acousticness_max_decile', 'acousticness_decile_weight', 'instrumentalness_min_decile', 'instrumentalness_max_decile', 'instrumentalness_decile_weight', 'liveness_min_decile', 'liveness_max_decile', 'liveness_decile_weight', 'valence_min_decile', 'valence_max_decile', 'valence_decile_weight', 'views_min_decile', 'views_max_decile', 'views_decile_weight', 'loudness_min', 'loudness_max', 'loudness_decile_weig

# Section 4: End-to-End API Testing

Simulate full API endpoint behavior with storage inspection at each step.
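
The simulation in this section relies on a recording stand-in for FastAPI's `BackgroundTasks`: tasks are captured when the endpoint function queues them and executed later by hand. The sketch below shows that pattern in isolation. `RecordingBackgroundTasks`, `process_job`, and `log_job` are hypothetical names for this example only. Because FastAPI accepts both plain functions and coroutine functions as background tasks, the runner handles both.

```python
import asyncio
import inspect
from typing import Any, Callable, List, Tuple


class RecordingBackgroundTasks:
    """Records tasks instead of running them, so a test can run them on demand."""

    def __init__(self) -> None:
        self.tasks: List[Tuple[Callable[..., Any], tuple, dict]] = []

    def add_task(self, func: Callable[..., Any], *args: Any, **kwargs: Any) -> None:
        self.tasks.append((func, args, kwargs))

    async def run_all(self) -> None:
        # Await coroutine functions; call plain functions directly.
        for func, args, kwargs in self.tasks:
            if inspect.iscoroutinefunction(func):
                await func(*args, **kwargs)
            else:
                func(*args, **kwargs)
        self.tasks.clear()


async def process_job(status: dict, job_id: str) -> None:
    """Hypothetical async worker that marks a job as done."""
    await asyncio.sleep(0)
    status[job_id] = "done"


def log_job(status: dict, job_id: str) -> None:
    """Hypothetical synchronous task that records which jobs were processed."""
    status.setdefault("log", []).append(job_id)


async def demo() -> dict:
    status = {"job-1": "queued"}
    bg = RecordingBackgroundTasks()
    bg.add_task(process_job, status, "job-1")
    bg.add_task(log_job, status, "job-1")
    # Nothing has run yet: the job is still "queued" at this point.
    await bg.run_all()
    return status


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Separating "queue" from "run" is what lets the cells below inspect storage once while the job is still queued and again after processing completes.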

In [24]:
# End-to-End API Simulation
from search_service import create_search_job, get_job_status, process_search_job
from models import SearchRequest

print("🚀 END-TO-END API TESTING")
print("="*50)

# The service instances exported by search_service were initialized in the
# debugging cell at the end of Section 3, so initialize_services() is not
# called again here.


# Clear storage for clean test
storage.JOB_STORE.clear()
storage.RESULT_STORE.clear()
print("🧹 Cleared storage for clean test")

# Test 1: Simulate POST /search endpoint
print("📤 Testing POST /search endpoint simulation...")

test_request = SearchRequest(query_text="period drama soundtrack")

# Mock BackgroundTasks (since we're not in FastAPI context)
class MockBackgroundTasks:
    def __init__(self):
        self.tasks = []
    
    def add_task(self, func, *args, **kwargs):
        self.tasks.append((func, args, kwargs))
        print(f"   📋 Background task added: {func.__name__}")

mock_bg_tasks = MockBackgroundTasks()

# Call create_search_job
response = await create_search_job(test_request, mock_bg_tasks)
job_id = response["job_id"]

print(f"   ✅ Job created with ID: {job_id[:8]}...")
print(f"   📋 Background tasks queued: {len(mock_bg_tasks.tasks)}")

# Inspect storage after job creation
print("🔍 Storage after job creation:")
inspect_storage()

🚀 END-TO-END API TESTING
🧹 Cleared storage for clean test
📤 Testing POST /search endpoint simulation...
   📋 Background task added: process_search_job
   ✅ Job created with ID: c4caf0ab...
   📋 Background tasks queued: 1
🔍 Storage after job creation:
🔍 CURRENT STORAGE STATE
📋 Jobs in storage: 1
   🆔 c4caf0ab... - queued - 'period drama soundtrack'
📊 Results in storage: 0


In [25]:
# Test 2: Simulate GET /jobs/{id} endpoint (before processing)
print("📥 Testing GET /jobs/{id} endpoint (before processing)...")

job_response = await get_job_status(job_id)
print(f"   ✅ Job status retrieved")
print(f"   Status: {job_response.status}")
print(f"   Query: {job_response.query_text}")
print(f"   Results: {job_response.results}")
print(f"   Conversation history: {job_response.conversation_history}")

📥 Testing GET /jobs/{id} endpoint (before processing)...
   ✅ Job status retrieved
   Status: JobStatus.QUEUED
   Query: period drama soundtrack
   Results: None
   Conversation history: None


In [26]:
# Test 3: Simulate background job processing
print("⚙️ Simulating background job processing...")

# Manually execute the background task
if mock_bg_tasks.tasks:
    func, args, kwargs = mock_bg_tasks.tasks[0]
    print(f"   🏃 Executing: {func.__name__} with args {args}")
    
    # Execute the background processing
    await func(*args, **kwargs)
    
    print(f"   ✅ Background processing completed")

# Inspect storage after processing
print("🔍 Storage after background processing:")
inspect_storage()

⚙️ Simulating background job processing...
   🏃 Executing: process_search_job with args ('c4caf0ab-7400-4b0f-9d18-a519df7c9359', 'period drama soundtrack')
   ✅ Background processing completed
🔍 Storage after background processing:
🔍 CURRENT STORAGE STATE
📋 Jobs in storage: 1
   🆔 c4caf0ab... - done - 'period drama soundtrack'
      💬 3 conversation steps
📊 Results in storage: 1
   🆔 c4caf0ab... - 197 total, 50 tracks returned


In [27]:
# Test 4: Simulate GET /jobs/{id} endpoint (after processing)
print("📥 Testing GET /jobs/{id} endpoint (after processing)...")

final_job_response = await get_job_status(job_id)
print(f"   ✅ Final job status retrieved")
print(f"   Status: {final_job_response.status}")
print(f"   Result count: {final_job_response.result_count}")
print(f"   Has results: {final_job_response.results is not None}")
print(f"   Has conversation history: {final_job_response.conversation_history is not None}")

if final_job_response.conversation_history:
    history = final_job_response.conversation_history
    print(f"   📜 Conversation steps: {len(history.steps)}")
    print(f"   🔄 Auto refinements: {history.total_auto_refinements}")
    
    for i, step in enumerate(history.steps):
        print(f"      Step {i+1} ({step.step_type}): {step.result_count} results")
        print(f"         💬 {step.user_message}")

if final_job_response.results:
    results = final_job_response.results
    print(f"   🎵 Top 3 tracks:")
    for i, track in enumerate(results.tracks[:3]):
        print(f"      {i+1}. {track.track} by {track.artist} (Score: {track.relevance_score:.1f})")

📥 Testing GET /jobs/{id} endpoint (after processing)...
   ✅ Final job status retrieved
   Status: JobStatus.DONE
   Result count: 197
   Has results: True
   Has conversation history: True
   📜 Conversation steps: 3
   🔄 Auto refinements: 2
      Step 1 (initial): 202 results
         💬 These filters aim to find music that sounds like a 'period drama soundtrack' by prioritizing instrumental, acoustic, and classical-sounding tracks, while excluding modern, dance-oriented, or vocal-heavy genres.
      Step 2 (auto_refine): 202 results
         💬 These filters are refined to strongly prioritize instrumental, acoustic, and classical-style music typical of period drama soundtracks, by more strictly excluding upbeat, vocal, or overly cheerful tracks. We've also added a preference for slower tempos and boosted more popular tracks to enhance relevance and quality.
      Step 3 (auto_refine): 197 results
         💬 These refined filters more precisely target period drama soundtracks by strictl
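
The run above took two auto-refinement steps and ended at 197 results. The service's actual stopping rule is not visible in this notebook. As one possible reading, the sketch below assumes the loop refines while the result count falls outside a target range and stops after a fixed number of attempts. The `"50-150"` range is borrowed from the `target_range` value used in the Section 3 test step; the cap of 2 and the helper names are assumptions.

```python
from typing import Callable, List, Tuple

MAX_AUTO_REFINEMENTS = 2  # assumed cap, chosen to match the run shown above


def parse_target_range(text: str) -> Tuple[int, int]:
    """Parse a range string such as '50-150' into (low, high)."""
    low, high = text.split("-")
    return int(low), int(high)


def refine_until_in_range(run_search: Callable[[int], int], target_range: str) -> List[int]:
    """Run the initial search, then refine while the count is out of range."""
    low, high = parse_target_range(target_range)
    counts = [run_search(0)]  # step 0 is the initial search
    while not (low <= counts[-1] <= high) and len(counts) - 1 < MAX_AUTO_REFINEMENTS:
        counts.append(run_search(len(counts)))
    return counts


# Replay the result counts observed in the output above.
observed = [202, 202, 197]
counts = refine_until_in_range(lambda step: observed[step], "50-150")
print(counts, "auto-refinements:", len(counts) - 1)
```

Under these assumptions the loop stops because the refinement cap is reached, not because the count entered the target range, which would be consistent with a final count of 197.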

In [28]:
# Test 5: Error handling simulation
print("❌ Testing error handling...")

# Test non-existent job
fake_job_id = "non-existent-job-id"
try:
    await get_job_status(fake_job_id)
    print(f"   ❌ Should have failed for non-existent job")
except Exception as e:
    print(f"   ✅ Correctly handled non-existent job: {type(e).__name__}")

print("🏁 End-to-end testing completed!")

❌ Testing error handling...
   ✅ Correctly handled non-existent job: HTTPException
🏁 End-to-end testing completed!


In [29]:
# Final storage inspection
print("📊 FINAL STORAGE STATE")
print("="*50)
inspect_storage()

# Performance summary
print("⚡ PERFORMANCE SUMMARY")
print("="*50)
for job_id, job_data in storage.JOB_STORE.items():
    if job_data.started_at and job_data.finished_at:
        duration = job_data.finished_at - job_data.started_at
        print(f"   Job {job_id[:8]}... took {duration.total_seconds():.2f} seconds")
        if job_data.conversation_history:
            steps = len(job_data.conversation_history.steps)
            auto_refines = job_data.conversation_history.total_auto_refinements
            print(f"      {steps} total steps, {auto_refines} auto-refinements")

📊 FINAL STORAGE STATE
🔍 CURRENT STORAGE STATE
📋 Jobs in storage: 1
   🆔 c4caf0ab... - done - 'period drama soundtrack'
      💬 3 conversation steps
📊 Results in storage: 1
   🆔 c4caf0ab... - 197 total, 50 tracks returned
⚡ PERFORMANCE SUMMARY
   Job c4caf0ab... took 81.23 seconds
      3 total steps, 2 auto-refinements
