Add rich_examples autointerp strategy + compare tab by ocg-goodfire · Pull Request #455 · goodfire-ai/param-decomp

ocg-goodfire · 2026-03-18T12:05:47Z

Description

Two features:

Rich examples autointerp strategy (rich_examples): New interpretation strategy that shows per-token CI and activation values inline (e.g. <<<token (ci:0.8, act:0.12)>>>), letting the LLM judge evidence quality directly from examples rather than relying on aggregate token statistics.
Autointerp Compare tab: New app tab for side-by-side comparison of interpretation results across different autointerp subruns (different strategies, models, etc.) for the same component.
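
As a rough illustration of the inline format described above, the following Python sketch renders a tokenized example with `<<<token (ci:…, act:…)>>>` markers. The `TokenRecord` class, the `format_rich_example` function, and the `ci_threshold` cutoff are hypothetical names introduced here; the PR description does not say which tokens are annotated or how the numbers are rounded, so treat both as assumptions.

```python
from dataclasses import dataclass


@dataclass
class TokenRecord:
    """One token plus its per-token scores (hypothetical structure)."""
    text: str
    ci: float   # CI value for this token
    act: float  # component activation for this token


def format_rich_example(tokens: list[TokenRecord], ci_threshold: float = 0.1) -> str:
    """Render an example, wrapping tokens at or above the CI threshold.

    Assumption: only tokens with ci >= ci_threshold get the inline marker;
    all other tokens are emitted as plain text.
    """
    parts = []
    for tok in tokens:
        if tok.ci >= ci_threshold:
            parts.append(f"<<<{tok.text} (ci:{tok.ci:.2g}, act:{tok.act:.2g})>>>")
        else:
            parts.append(tok.text)
    return "".join(parts)


example = [
    TokenRecord("The", 0.0, 0.0),
    TokenRecord(" cat", 0.8, 0.12),
    TokenRecord(" sat", 0.02, 0.01),
]
rendered = format_rich_example(example)
```

With this layout the LLM sees the evidence strength next to each token instead of a separate table of aggregate token statistics.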

Compare tab layout

Subrun selector: Multiselect chips showing strategy, model, timestamp, completion count, and eval scores
Two-panel view: Stacked interpretation cards (left) + full component data (right: activation examples, token stats, correlations, dataset attributions)
Navigation: Layer dropdown, pagination, go-to-index search

Backend

GET /api/autointerp_compare/subruns — lists all completed subruns with metadata
GET /api/autointerp_compare/subruns/{id}/interpretations — bulk headlines for a subrun
GET /api/autointerp_compare/subruns/{id}/interpretations/{layer}/{idx} — reasoning + prompt detail
Robustly handles corrupted/empty DBs from cancelled jobs

Other changes

InterpDB.has_interpretations_table() for safe schema validation
Multi-provider LLM config support (Anthropic, OpenAI, OpenRouter)
RichExamplesConfig added to strategy discriminated union
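
A minimal sketch of what a safe schema check like `InterpDB.has_interpretations_table()` might look like, written as a standalone function using only the standard library. The table name `interpretations` is an assumption inferred from the method name, and the real method's signature and behavior may differ.

```python
import sqlite3
from pathlib import Path


def has_interpretations_table(db_path: Path, table: str = "interpretations") -> bool:
    """Return True only if db_path is a readable SQLite file containing `table`.

    Assumption: the table is called "interpretations"; adjust to the real schema.
    """
    if not db_path.is_file() or db_path.stat().st_size == 0:
        return False  # missing or empty file, e.g. left behind by a cancelled job
    try:
        # mode=ro opens the file read-only, so the check never creates or modifies it
        conn = sqlite3.connect(f"{db_path.resolve().as_uri()}?mode=ro", uri=True)
    except sqlite3.Error:
        return False
    try:
        row = conn.execute(
            "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?",
            (table,),
        ).fetchone()
        return row is not None
    except sqlite3.DatabaseError:
        return False  # corrupted file: not a valid SQLite database
    finally:
        conn.close()
```

A subrun listing could call a check of this kind before reading any rows and skip directories whose database fails it, rather than raising.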

Motivation and Context

We're running the new rich_examples strategy against existing dual_view and compact_skeptical runs on Jose (s-55ea3f9b) and need an easy way to compare results side-by-side in the app.

How Has This Been Tested?

basedpyright — 0 errors on all modified Python files
svelte-check — 0 errors, 0 warnings
eslint — clean
prettier — clean
Manual testing with Jose run in the app

Does this PR introduce a breaking change?

No.

New autointerp strategy (rich_examples) that shows per-token CI and activation values inline, letting the LLM judge evidence quality directly. Also adds an Autointerp Compare tab to the app for side-by-side comparison of interpretation results across different strategies/models/subruns.

Backend: 3 new endpoints for listing subruns, bulk headlines, and detail.
Frontend: SubrunSelector (multiselect chips), stacked SubrunInterpCard, two-panel AutointerpComparer with full component data on the right panel.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

ocg-goodfire · 2026-03-18T12:16:08Z

@claude can you review pls

claude · 2026-03-18T12:16:26Z

Claude finished @ocg-goodfire's task

PR Review: Rich Examples Autointerp Strategy + Compare Tab

Todo List

Read repository CLAUDE.md guidelines
Review backend changes (API endpoints, database modifications)
Review frontend changes (Svelte components, TypeScript)
Review autointerp strategy implementation
Check for security issues, bugs, performance concerns
Verify code follows repository standards
Provide comprehensive feedback

Review Summary

This is a well-implemented PR that adds two significant features to the SPD autointerp system. The code quality is high and follows the repository's established patterns and conventions. I found no security issues or major bugs.

✅ Strengths

Backend Implementation

Clean API design: The three new endpoints in autointerp_compare.py follow RESTful conventions and use proper FastAPI patterns
Robust error handling: InterpDB.has_interpretations_table() safely validates schema before operations, preventing crashes from corrupted DBs
Proper key mapping: Canonical↔concrete layer key conversion maintains consistency across the app architecture
SQLite safety: Read-only database connections and proper resource cleanup with try/finally blocks

Rich Examples Strategy

Innovative approach: Dropping aggregate statistics in favor of inline per-token annotations (<<<token (ci:0.8, act:0.12)>>>) is a clever way to provide richer evidence to LLMs
Type-safe implementation: Proper discriminated union integration with StrategyConfig and clean dispatch logic
Consistent prompting: Follows established prompt structure patterns while innovating on data presentation

Frontend Implementation

Modern Svelte patterns: Excellent use of Svelte 5 features like $state, $derived, and SvelteMap for reactivity
Smart caching: Hierarchical cache structure (detailCache) prevents redundant API calls
Responsive UI: Clean two-panel layout with debounced loading and proper loading states
Type safety: Strong TypeScript types throughout the frontend API layer

🔍 Areas for Improvement

Minor Code Quality Issues

Error handling in autointerp_compare.py:149,182:
```
assert db_path.exists(), f"No interp.db at {subrun_dir}"
```
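The repository guidelines emphasize fail-fast with assertions, so this is consistent with them. However, a failed assertion inside a request handler reaches the client as a generic 500 error; consider whether a missing interp.db should return a 404 instead for clearer API semantics.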
URL encoding redundancy in frontend API:
```
`/api/autointerp_compare/subruns/${encodeURIComponent(subrunId)}/interpretations/${encodeURIComponent(layer)}/${componentIdx}`
```
The subrunId (format: a-YYYYMMDD_HHMMSS) and layer names are safe identifiers that don't need URI encoding, though encoding doesn't hurt.
Potential memory growth in AutointerpComparer.svelte:
The detailCache grows indefinitely as users navigate between components. Consider implementing cache eviction for very long sessions, though this is likely not a practical concern.
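
The cache-eviction suggestion can be sketched as a small least-recently-used (LRU) cache. The actual `detailCache` lives in the Svelte frontend; the Python below only illustrates the eviction policy, and the `BoundedCache` class, its methods, and the key shape are hypothetical.

```python
from collections import OrderedDict
from typing import Any, Hashable, Optional


class BoundedCache:
    """LRU cache that evicts the least recently used entry once it is full."""

    def __init__(self, max_entries: int) -> None:
        assert max_entries > 0, "cache must hold at least one entry"
        self._max_entries = max_entries
        self._data: OrderedDict = OrderedDict()

    def get(self, key: Hashable) -> Optional[Any]:
        """Return the cached value, or None on a miss; a hit marks the key as recent."""
        if key not in self._data:
            return None
        self._data.move_to_end(key)
        return self._data[key]

    def put(self, key: Hashable, value: Any) -> None:
        """Insert or update an entry, evicting the oldest one if over capacity."""
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self._max_entries:
            self._data.popitem(last=False)  # drop the least recently used entry

    def __len__(self) -> int:
        return len(self._data)


# Keys here are (subrun id, layer, component index) tuples, chosen for illustration.
cache = BoundedCache(max_entries=2)
cache.put(("subrun-a", "layer0", 0), "detail 0")
cache.put(("subrun-a", "layer0", 1), "detail 1")
cache.get(("subrun-a", "layer0", 0))              # touch entry 0 so entry 1 is oldest
cache.put(("subrun-a", "layer0", 2), "detail 2")  # evicts entry 1
```

A fixed entry limit keeps memory bounded while still serving recently viewed components from cache.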

Design Considerations

Rich examples max limit: max_examples: int = 30 may produce very long prompts now that each token carries an inline annotation. Consider whether the default should be lowered.
Score display precision: Math.round(score * 100) rounds scores to whole percentages; showing one decimal place would preserve more precision (e.g., 85.7% vs 86%).
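
The precision difference is easy to see with a small example. The frontend uses JavaScript's `Math.round`; the Python below is only an illustration of the two display choices, and both function names are hypothetical.

```python
def format_score_rounded(score: float) -> str:
    """Whole-percent display, analogous to Math.round(score * 100) in the frontend.

    Note: Python's round() rounds exact halves to even, unlike Math.round,
    so the two can differ on values such as 0.5; that does not affect 85.7.
    """
    return f"{round(score * 100)}%"


def format_score_one_decimal(score: float) -> str:
    """One-decimal display, keeping an extra digit of precision."""
    return f"{score * 100:.1f}%"


score = 0.857
rounded = format_score_rounded(score)
one_decimal = format_score_one_decimal(score)
```

Here `rounded` is `"86%"` and `one_decimal` is `"85.7%"`, matching the example in the review comment.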

🎯 Architecture Assessment

The implementation follows SPD's architectural principles well:

Fail-fast design: Proper assertions and error propagation
Type safety: Discriminated unions, Pydantic models, and TypeScript interfaces
Separation of concerns: Clear separation between API, business logic, and UI layers
Consistency: Follows established patterns in existing autointerp code

🛡️ Security & Performance

No security issues: Proper input validation, read-only database access, and safe URL construction
Performance considerations: Efficient caching, debounced loading, and minimal API calls
Resource management: Proper SQLite connection cleanup and error boundaries

📋 Testing Notes

The PR mentions manual testing was performed. For future iterations, consider:

Unit tests for the key mapping functions (_concrete_to_canonical_key, etc.)
Integration tests for the new API endpoints
Frontend component tests for the compare tab

Verdict: APPROVED ✅

This is a high-quality implementation that meaningfully extends the autointerp system. The rich examples strategy provides a novel approach to component interpretation, and the compare tab addresses a clear user need. The code follows repository standards and introduces no breaking changes.

The minor improvements suggested above are not blocking issues - they're small optimizations that could be addressed in future iterations if needed.

Great work on this feature! 🚀

Restrict Anthropic autointerp models and use structured outputs

921e410

ocg-goodfire merged commit a67e43c into dev Mar 18, 2026
2 checks passed

ocg-goodfire merged 2 commits into dev from feature/rich-examples-autointerp
