Add rich_examples autointerp strategy + compare tab #455

Merged
ocg-goodfire merged 2 commits into dev from feature/rich-examples-autointerp
Mar 18, 2026

@ocg-goodfire
Collaborator

Description

Two features:

  1. Rich examples autointerp strategy (rich_examples): New interpretation strategy that shows per-token CI and activation values inline (e.g. <<<token (ci:0.8, act:0.12)>>>), letting the LLM judge evidence quality directly from examples rather than relying on aggregate token statistics.

  2. Autointerp Compare tab: New app tab for side-by-side comparison of interpretation results across different autointerp subruns (different strategies, models, etc.) for the same component.
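The inline annotation format described in item 1 can be sketched as follows. This is a minimal illustration, not the PR's actual implementation: the `TokenRecord` type, the `render_rich_example` function, and the `ci_threshold` rule for deciding which tokens get annotated are all assumptions made for the example. Only the `<<<token (ci:…, act:…)>>>` shape comes from the description above.

```python
from dataclasses import dataclass


@dataclass
class TokenRecord:
    """Hypothetical per-token record: text plus CI and activation values."""
    text: str
    ci: float
    act: float


def render_rich_example(tokens: list[TokenRecord], ci_threshold: float = 0.0) -> str:
    """Render a token sequence, wrapping tokens whose CI exceeds the threshold.

    The threshold rule is an assumption; the PR may select tokens differently.
    """
    parts = []
    for tok in tokens:
        if tok.ci > ci_threshold:
            # Two significant digits keeps the prompt compact (0.8, 0.12).
            parts.append(f"<<<{tok.text} (ci:{tok.ci:.2g}, act:{tok.act:.2g})>>>")
        else:
            parts.append(tok.text)
    return "".join(parts)


example = [
    TokenRecord("The", 0.0, 0.0),
    TokenRecord(" cat", 0.8, 0.12),
    TokenRecord(" sat", 0.0, 0.0),
]
rendered = render_rich_example(example)
```

Here `rendered` places the CI and activation values next to the token they belong to, which is what lets the LLM weigh each example on its own evidence.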

Compare tab layout

  • Subrun selector: Multiselect chips showing strategy, model, timestamp, completion count, and eval scores
  • Two-panel view: Stacked interpretation cards (left) + full component data (right: activation examples, token stats, correlations, dataset attributions)
  • Navigation: Layer dropdown, pagination, go-to-index search

Backend

  • GET /api/autointerp_compare/subruns — lists all completed subruns with metadata
  • GET /api/autointerp_compare/subruns/{id}/interpretations — bulk headlines for a subrun
  • GET /api/autointerp_compare/subruns/{id}/interpretations/{layer}/{idx} — reasoning + prompt detail
  • Robustly handles corrupted/empty DBs from cancelled jobs

Other changes

  • InterpDB.has_interpretations_table() for safe schema validation
  • Multi-provider LLM config support (Anthropic, OpenAI, OpenRouter)
  • RichExamplesConfig added to strategy discriminated union
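A schema check in the spirit of `InterpDB.has_interpretations_table()` might look roughly like the sketch below. It is written as a standalone function using only the standard library; the table name `interpretations`, the function signature, and the decision to treat empty or unreadable files as "no table" are assumptions, not details confirmed by the PR.

```python
import sqlite3
from pathlib import Path


def has_interpretations_table(db_path: Path, table: str = "interpretations") -> bool:
    """Return True only if db_path is a readable SQLite DB containing `table`."""
    # A cancelled job may leave a missing or zero-byte file behind.
    if not db_path.exists() or db_path.stat().st_size == 0:
        return False
    try:
        # Open read-only so the check can never modify or create a database.
        conn = sqlite3.connect(db_path.resolve().as_uri() + "?mode=ro", uri=True)
    except sqlite3.Error:
        return False
    try:
        row = conn.execute(
            "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?",
            (table,),
        ).fetchone()
        return row is not None
    except sqlite3.DatabaseError:
        # Raised, for example, when the file is not a valid SQLite database.
        return False
    finally:
        conn.close()
```

A caller listing subruns could skip any directory for which this returns `False` instead of failing the whole request.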

Motivation and Context

We're running the new rich_examples strategy against existing dual_view and compact_skeptical runs on Jose (s-55ea3f9b) and need an easy way to compare results side-by-side in the app.

How Has This Been Tested?

  • basedpyright — 0 errors on all modified Python files
  • svelte-check — 0 errors, 0 warnings
  • eslint — clean
  • prettier — clean
  • Manual testing with Jose run in the app

Does this PR introduce a breaking change?

No.

New autointerp strategy (rich_examples) that shows per-token CI and activation
values inline, letting the LLM judge evidence quality directly. Also adds an
Autointerp Compare tab to the app for side-by-side comparison of interpretation
results across different strategies/models/subruns.

Backend: 3 new endpoints for listing subruns, bulk headlines, and detail.
Frontend: SubrunSelector (multiselect chips), stacked SubrunInterpCard, two-panel
AutointerpComparer with full component data on the right panel.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@ocg-goodfire
Collaborator Author

@claude can you review pls

@claude
Contributor

claude Bot commented Mar 18, 2026

Claude finished @ocg-goodfire's task


PR Review: Rich Examples Autointerp Strategy + Compare Tab

Todo List

  • Read repository CLAUDE.md guidelines
  • Review backend changes (API endpoints, database modifications)
  • Review frontend changes (Svelte components, TypeScript)
  • Review autointerp strategy implementation
  • Check for security issues, bugs, performance concerns
  • Verify code follows repository standards
  • Provide comprehensive feedback

Review Summary

This is a well-implemented PR that adds two significant features to the SPD autointerp system. The code quality is high and follows the repository's established patterns and conventions. I found no security issues or major bugs.

✅ Strengths

Backend Implementation

  • Clean API design: The three new endpoints in autointerp_compare.py follow RESTful conventions and use proper FastAPI patterns
  • Robust error handling: InterpDB.has_interpretations_table() safely validates schema before operations, preventing crashes from corrupted DBs
  • Proper key mapping: Canonical↔concrete layer key conversion maintains consistency across the app architecture
  • SQLite safety: Read-only database connections and proper resource cleanup with try/finally blocks

Rich Examples Strategy

  • Innovative approach: Dropping aggregate statistics in favor of inline per-token annotations (<<<token (ci:0.8, act:0.12)>>>) is a clever way to provide richer evidence to LLMs
  • Type-safe implementation: Proper discriminated union integration with StrategyConfig and clean dispatch logic
  • Consistent prompting: Follows established prompt structure patterns while innovating on data presentation

Frontend Implementation

  • Modern Svelte patterns: Excellent use of Svelte 5 features like $state, $derived, and SvelteMap for reactivity
  • Smart caching: Hierarchical cache structure (detailCache) prevents redundant API calls
  • Responsive UI: Clean two-panel layout with debounced loading and proper loading states
  • Type safety: Strong TypeScript types throughout the frontend API layer

🔍 Areas for Improvement

Minor Code Quality Issues

  1. Error handling in autointerp_compare.py:149,182:

    assert db_path.exists(), f"No interp.db at {subrun_dir}"

    The repository guidelines emphasize fail-fast assertions, so this is consistent with them. However, consider whether a missing interp.db should instead return a 404 HTTP error for better API semantics.

  2. URL encoding redundancy in frontend API:

    `/api/autointerp_compare/subruns/${encodeURIComponent(subrunId)}/interpretations/${encodeURIComponent(layer)}/${componentIdx}`

    The subrunId (format: a-YYYYMMDD_HHMMSS) and layer names are safe identifiers that don't need URI encoding, though encoding doesn't hurt.

  3. Potential memory growth in AutointerpComparer.svelte:
    The detailCache grows indefinitely as users navigate between components. Consider implementing cache eviction for very long sessions, though this is likely not a practical concern.
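The cache eviction suggested in item 3 could follow a least-recently-used policy. The sketch below shows that policy in Python for illustration only; the real `detailCache` lives in a Svelte component, and the `BoundedCache` class, its `max_entries` default, and the `(subrun_id, layer, idx)` key shape are all hypothetical.

```python
from collections import OrderedDict
from typing import Any, Hashable, Optional


class BoundedCache:
    """Least-recently-used cache with a fixed maximum number of entries."""

    def __init__(self, max_entries: int = 500) -> None:
        assert max_entries > 0, "max_entries must be positive"
        self._max = max_entries
        self._data: OrderedDict[Hashable, Any] = OrderedDict()

    def get(self, key: Hashable) -> Optional[Any]:
        if key not in self._data:
            return None
        # Mark the entry as most recently used.
        self._data.move_to_end(key)
        return self._data[key]

    def put(self, key: Hashable, value: Any) -> None:
        self._data[key] = value
        self._data.move_to_end(key)
        # Drop the oldest entries once the limit is exceeded.
        while len(self._data) > self._max:
            self._data.popitem(last=False)

    def __len__(self) -> int:
        return len(self._data)


cache = BoundedCache(max_entries=2)
cache.put(("a-1", "layer0", 0), "detail A")
cache.put(("a-1", "layer0", 1), "detail B")
cache.get(("a-1", "layer0", 0))              # touch A so B becomes the oldest
cache.put(("a-1", "layer0", 2), "detail C")  # evicts B
```

With a limit in the hundreds, memory stays bounded in long sessions while recently viewed components are still served without a second API call.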

Design Considerations

  1. Rich examples max limit: max_examples: int = 30 may produce very long prompts now that every annotated token carries inline CI and activation values. Consider whether the default should be lowered.

  2. Score display precision: Scores are displayed with Math.round(score * 100); showing one decimal place would preserve more precision (e.g., 85.7% vs 86%).
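The difference in item 2 is easy to see with a small calculation. The sketch below reproduces both display options in Python; the function names are hypothetical, and `math.floor(x + 0.5)` is used to approximate JavaScript's `Math.round`, since Python's built-in `round` rounds halves to the nearest even number instead.

```python
import math


def percent_rounded(score: float) -> str:
    """Whole-percent display, approximating JS Math.round(score * 100)."""
    return f"{math.floor(score * 100 + 0.5)}%"


def percent_one_decimal(score: float) -> str:
    """One-decimal display, as suggested in the review."""
    return f"{score * 100:.1f}%"


whole = percent_rounded(0.857)        # "86%"
precise = percent_one_decimal(0.857)  # "85.7%"
```

Two subruns scoring 0.857 and 0.862 would both show as 86% under whole-percent rounding, which matters when the compare tab is used to rank strategies with similar scores.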

🎯 Architecture Assessment

The implementation follows SPD's architectural principles well:

  • Fail-fast design: Proper assertions and error propagation
  • Type safety: Discriminated unions, Pydantic models, and TypeScript interfaces
  • Separation of concerns: Clear separation between API, business logic, and UI layers
  • Consistency: Follows established patterns in existing autointerp code

🛡️ Security & Performance

  • No security issues: Proper input validation, read-only database access, and safe URL construction
  • Performance considerations: Efficient caching, debounced loading, and minimal API calls
  • Resource management: Proper SQLite connection cleanup and error boundaries

📋 Testing Notes

The PR mentions manual testing was performed. For future iterations, consider:

  • Unit tests for the key mapping functions (_concrete_to_canonical_key, etc.)
  • Integration tests for the new API endpoints
  • Frontend component tests for the compare tab

Verdict: APPROVED

This is a high-quality implementation that meaningfully extends the autointerp system. The rich examples strategy provides a novel approach to component interpretation, and the compare tab addresses a clear user need. The code follows repository standards and introduces no breaking changes.

The minor improvements suggested above are not blocking issues - they're small optimizations that could be addressed in future iterations if needed.

Great work on this feature! 🚀


@ocg-goodfire ocg-goodfire merged commit a67e43c into dev Mar 18, 2026
2 checks passed