fix: improve sentiment scoring accuracy by protostatis · Pull Request #50 · protostatis/panicradar

protostatis · 2026-02-10T00:09:18Z

Summary

Fix duplicate segments bug in backfill that scored every comment twice
Add paragraph deduplication for historical data in _extract_user_info()
Reduce title weight from 0.4 to 0.2 (comments are the signal, not clickbait titles)
Reduce tanh amplification from 5x to 3x for more gradual score curve
Add 33 neutral anchor phrases for Q&A, tool comparisons, and token burns
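
To make the title-weight, tanh-gain, and deduplication changes above concrete, here is a minimal sketch. It is not the repository's code: the function names `dedupe_paragraphs` and `combine_scores`, the blank-line paragraph split, and the weighted-mean blend are all assumptions for illustration. Only the constants (0.4 → 0.2 and 5x → 3x) come from this PR.

```python
import math

TITLE_WEIGHT = 0.2  # from this PR (previously 0.4)
TANH_GAIN = 3.0     # from this PR (previously 5.0)


def dedupe_paragraphs(text: str) -> list[str]:
    """Split on blank lines and drop repeated paragraphs, keeping first-seen order.

    Hypothetical helper: the real _extract_user_info() may split differently.
    """
    seen = set()
    unique = []
    for para in text.split("\n\n"):
        key = " ".join(para.split()).lower()  # normalize whitespace and case
        if key and key not in seen:
            seen.add(key)
            unique.append(para.strip())
    return unique


def combine_scores(title_raw: float, comment_raw: list[float],
                   title_weight: float = TITLE_WEIGHT,
                   gain: float = TANH_GAIN) -> float:
    """Blend title and mean comment score, then squash into (-1, 1) with tanh.

    Assumed formula: tanh(gain * (w * title + (1 - w) * mean(comments))).
    """
    if not comment_raw:
        blended = title_raw  # nothing to blend with, fall back to the title
    else:
        comment_mean = sum(comment_raw) / len(comment_raw)
        blended = title_weight * title_raw + (1.0 - title_weight) * comment_mean
    return math.tanh(gain * blended)


if __name__ == "__main__":
    # A bearish-sounding title over mildly positive comments.
    old = combine_scores(-0.5, [0.1, 0.1], title_weight=0.4, gain=5.0)
    new = combine_scores(-0.5, [0.1, 0.1])
    print(f"old settings: {old:.3f}, new settings: {new:.3f}")
```

Under these assumptions, a lower title weight and a lower gain both pull a clickbait-driven score toward neutral, which is the direction of the bias reduction reported below.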

Evaluation (154 human-labeled posts)

Metric	Before	After	Change
Pearson r	0.43	0.44	+0.01
MAE	0.131	0.086	-34%
Categorical agreement	46.8%	51.3%	+4.5pp
Mean bias	-0.050	-0.029	42% less bearish

Full methodology and analysis in docs/scoring_accuracy_audit.md.
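
For readers who want to reproduce the four metrics in the table, the following is a rough sketch of how they could be computed from paired model scores and human labels. The actual procedure lives in docs/scoring_accuracy_audit.md. The sign convention for bias (predicted minus label, so negative means more bearish than the labelers), the three-way bucketing, and the `threshold` of 0.1 are assumptions, not values taken from the audit.

```python
import math


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def mae(preds: list[float], labels: list[float]) -> float:
    """Mean absolute error."""
    return sum(abs(p - l) for p, l in zip(preds, labels)) / len(preds)


def mean_bias(preds: list[float], labels: list[float]) -> float:
    """Mean signed error; negative means the scorer is more bearish (assumed convention)."""
    return sum(p - l for p, l in zip(preds, labels)) / len(preds)


def categorize(score: float, threshold: float = 0.1) -> str:
    """Bucket a score into bearish / neutral / bullish (threshold is hypothetical)."""
    if score > threshold:
        return "bullish"
    if score < -threshold:
        return "bearish"
    return "neutral"


def categorical_agreement(preds: list[float], labels: list[float],
                          threshold: float = 0.1) -> float:
    """Fraction of posts where the model and the human label share a bucket."""
    hits = sum(categorize(p, threshold) == categorize(l, threshold)
               for p, l in zip(preds, labels))
    return hits / len(preds)
```

A low MAE alongside a modest Pearson r, as in the table, is consistent with scores that sit close to the labels on average without tracking their ordering well.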

Test plan

All 59 existing tests pass (uv run pytest tests/ -x)
Re-scored 154 labeled posts — MAE, bias, and agreement all improved
All 5 spot-check divergence posts improved
Monitor live scoring after deploy for unexpected distribution shifts

🤖 Generated with Claude Code

Address systematic bearish bias identified by audit (r=0.43, 50% agreement):

- Fix duplicate segments bug in backfill: store only selftext in content field instead of pre-concatenating comments (which were scored twice)
- Add paragraph deduplication in _extract_user_info() for historical data
- Reduce title weight from 0.4 to 0.2 so comment consensus drives scores
- Reduce tanh amplification from 5x to 3x for more gradual score curve
- Add 33 neutral anchor phrases for Q&A, tool comparisons, and token burns that were being mis-classified as bearish

Eval on 154 labeled posts: MAE 0.131→0.086 (-34%), categorical agreement 46.8%→51.3%, bearish bias -0.050→-0.029. Pearson r ~0.44 (unchanged, limited by cosine-similarity approach ceiling).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Documents the 154-post labeled evaluation, metrics (Pearson r, MAE, categorical agreement, bias), root causes found, fixes applied, and before/after results. Includes reproduction steps and next-steps analysis for the scoring approach ceiling.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

protostatis and others added 2 commits February 9, 2026 18:06

protostatis merged commit 1b57573 into main Feb 10, 2026
3 checks passed

protostatis merged 2 commits into main from fix/sentiment-scoring-pipeline
