Add rich_examples autointerp strategy + compare tab #455
New autointerp strategy (rich_examples) that shows per-token CI and activation values inline, letting the LLM judge evidence quality directly. Also adds an Autointerp Compare tab to the app for side-by-side comparison of interpretation results across different strategies/models/subruns. Backend: 3 new endpoints for listing subruns, bulk headlines, and detail. Frontend: SubrunSelector (multiselect chips), stacked SubrunInterpCard, two-panel AutointerpComparer with full component data on the right panel. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@claude can you review pls
Claude finished @ocg-goodfire's task — View job

PR Review: Rich Examples Autointerp Strategy + Compare Tab
Review Summary

This is a well-implemented PR that adds two significant features to the SPD autointerp system. The code quality is high and follows the repository's established patterns and conventions. I found no security issues or major bugs.

✅ Strengths

- Backend Implementation
- Rich Examples Strategy
- Frontend Implementation
🔍 Areas for Improvement

- Minor Code Quality Issues
- Design Considerations
🎯 Architecture Assessment

The implementation follows SPD's architectural principles well:
🛡️ Security & Performance
📋 Testing Notes

The PR mentions that manual testing was performed. For future iterations, consider:
Verdict: APPROVED ✅

This is a high-quality implementation that meaningfully extends the autointerp system. The rich examples strategy provides a novel approach to component interpretation, and the compare tab addresses a clear user need. The code follows repository standards and introduces no breaking changes.

The minor improvements suggested above are not blocking issues; they're small optimizations that could be addressed in future iterations if needed.

Great work on this feature! 🚀
Description

Two features:

- Rich examples autointerp strategy (rich_examples): New interpretation strategy that shows per-token CI and activation values inline (e.g. <<<token (ci:0.8, act:0.12)>>>), letting the LLM judge evidence quality directly from examples rather than relying on aggregate token statistics.
- Autointerp Compare tab: New app tab for side-by-side comparison of interpretation results across different autointerp subruns (different strategies, models, etc.) for the same component.
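To make the inline annotation format concrete, here is a minimal sketch of a formatter producing the `<<<token (ci:0.8, act:0.12)>>>` style shown above. The function name, the CI threshold, and the per-token input lists are illustrative assumptions, not the PR's actual implementation.

```python
def format_rich_example(
    tokens: list[str],
    ci: list[float],
    act: list[float],
    ci_threshold: float = 0.1,
) -> str:
    """Render tokens with inline CI/activation annotations.

    Tokens whose causal importance clears the (hypothetical) threshold are
    wrapped as <<<token (ci:..., act:...)>>> so the LLM can judge evidence
    quality directly; low-CI tokens are left bare.
    """
    parts: list[str] = []
    for tok, c, a in zip(tokens, ci, act):
        if c >= ci_threshold:
            parts.append(f"<<<{tok} (ci:{c:.1f}, act:{a:.2f})>>>")
        else:
            parts.append(tok)
    return " ".join(parts)

# Example: only "cat" clears the CI threshold.
print(format_rich_example(["the", "cat", "sat"], [0.02, 0.8, 0.05], [0.0, 0.12, 0.01]))
# -> the <<<cat (ci:0.8, act:0.12)>>> sat
```

The point of the format is that aggregate statistics are replaced by per-token evidence the judging model can weigh itself.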
[Screenshot: Compare tab layout]
Backend

- GET /api/autointerp_compare/subruns — lists all completed subruns with metadata
- GET /api/autointerp_compare/subruns/{id}/interpretations — bulk headlines for a subrun
- GET /api/autointerp_compare/subruns/{id}/interpretations/{layer}/{idx} — reasoning + prompt detail

Other changes
- InterpDB.has_interpretations_table() for safe schema validation
- RichExamplesConfig added to strategy discriminated union

Motivation and Context
We're running the new rich_examples strategy against existing dual_view and compact_skeptical runs on Jose (s-55ea3f9b) and need an easy way to compare results side-by-side in the app.

How Has This Been Tested?
- basedpyright — 0 errors on all modified Python files
- svelte-check — 0 errors, 0 warnings
- eslint — clean
- prettier — clean

Does this PR introduce a breaking change?
No.
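As a closing sketch: the InterpDB.has_interpretations_table() helper listed under "Other changes" can be implemented as a defensive sqlite_master lookup. This assumes a SQLite-backed store and an "interpretations" table name; the class shape here is a minimal stand-in, not the PR's actual code.

```python
import sqlite3


class InterpDB:
    """Minimal stand-in for the interpretation store (sketch only)."""

    def __init__(self, path: str = ":memory:") -> None:
        self.conn = sqlite3.connect(path)

    def has_interpretations_table(self) -> bool:
        # Query sqlite_master rather than SELECTing from the table itself,
        # so a missing table yields False instead of an OperationalError.
        row = self.conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type='table' AND name='interpretations'"
        ).fetchone()
        return row is not None


db = InterpDB()
print(db.has_interpretations_table())  # -> False before the table exists
db.conn.execute(
    "CREATE TABLE interpretations (layer INTEGER, idx INTEGER, headline TEXT)"
)
print(db.has_interpretations_table())  # -> True
```

Checking sqlite_master up front is what makes the schema validation "safe": callers can branch on a boolean instead of catching exceptions from queries against a table that may not exist yet.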