fix: improve sentiment scoring accuracy#50

Merged: protostatis merged 2 commits into main from fix/sentiment-scoring-pipeline on Feb 10, 2026

protostatis commented on Feb 10, 2026

Summary

  • Fix duplicate segments bug in backfill that scored every comment twice
  • Add paragraph deduplication for historical data in _extract_user_info()
  • Reduce title weight from 0.4 to 0.2 (comments are the signal, not clickbait titles)
  • Reduce tanh amplification from 5x to 3x for more gradual score curve
  • Add 33 neutral anchor phrases for Q&A, tool comparisons, and token burns
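To illustrate why the title-weight and tanh changes pull scores toward neutral, here is a minimal sketch. The PR does not show the actual scoring formula, so the function `combine_score`, its parameters, and the linear blend of title and mean comment sentiment are all assumptions made for illustration only.

```python
import math


def combine_score(title_score, comment_scores, title_weight=0.2, gain=3.0):
    """Hypothetical blend: weighted title + mean comment sentiment, squashed by tanh.

    This is NOT the repository's implementation; it only mirrors the two
    parameters named in the summary (title weight and tanh amplification).
    """
    if comment_scores:
        comment_mean = sum(comment_scores) / len(comment_scores)
        raw = title_weight * title_score + (1.0 - title_weight) * comment_mean
    else:
        # With no comments, the title is the only available signal.
        raw = title_score
    # tanh keeps the result in (-1, 1); a smaller gain gives a more gradual curve.
    return math.tanh(gain * raw)


# A negative, clickbait-style title over mildly positive comments.
old = combine_score(-0.5, [0.1, 0.1], title_weight=0.4, gain=5.0)
new = combine_score(-0.5, [0.1, 0.1], title_weight=0.2, gain=3.0)
print(f"old settings: {old:.3f}, new settings: {new:.3f}")
```

Under these assumptions, the new settings leave the same post much closer to neutral, because the title contributes less to the raw score and the lower gain amplifies the remainder less.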

Evaluation (154 human-labeled posts)

| Metric | Before | After | Change |
|---|---|---|---|
| Pearson r | 0.43 | 0.44 | +0.01 |
| MAE | 0.131 | 0.086 | -34% |
| Categorical agreement | 46.8% | 51.3% | +4.5pp |
| Mean bias | -0.050 | -0.029 | 42% less bearish |

Full methodology and analysis in docs/scoring_accuracy_audit.md.
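For readers who want to reproduce metrics of this kind, the following is a rough sketch of how the four evaluation numbers could be computed from paired model and human scores. The function names, the definition of bias as mean(predicted - labeled), and the 0.05 label threshold are assumptions; the audit document is the authoritative source for the method actually used.

```python
import math


def _check(pred, gold):
    if len(pred) != len(gold) or not pred:
        raise ValueError("pred and gold must be non-empty and the same length")


def pearson_r(pred, gold):
    """Pearson correlation; returns NaN if either series has zero variance."""
    _check(pred, gold)
    mp = sum(pred) / len(pred)
    mg = sum(gold) / len(gold)
    cov = sum((p - mp) * (g - mg) for p, g in zip(pred, gold))
    var_p = sum((p - mp) ** 2 for p in pred)
    var_g = sum((g - mg) ** 2 for g in gold)
    if var_p == 0 or var_g == 0:
        return float("nan")
    return cov / math.sqrt(var_p * var_g)


def mae(pred, gold):
    """Mean absolute error between predicted and labeled scores."""
    _check(pred, gold)
    return sum(abs(p - g) for p, g in zip(pred, gold)) / len(pred)


def mean_bias(pred, gold):
    """Assumed definition: mean(pred - gold). Negative means too bearish."""
    _check(pred, gold)
    return sum(p - g for p, g in zip(pred, gold)) / len(pred)


def to_label(score, threshold=0.05):
    """Map a score to a category; the threshold is a hypothetical choice."""
    if score > threshold:
        return "bullish"
    if score < -threshold:
        return "bearish"
    return "neutral"


def categorical_agreement(pred, gold, threshold=0.05):
    """Fraction of posts where predicted and labeled categories match."""
    _check(pred, gold)
    hits = sum(to_label(p, threshold) == to_label(g, threshold)
               for p, g in zip(pred, gold))
    return hits / len(pred)


# Toy data only; these are not the 154 labeled posts.
pred = [0.1, -0.2, 0.3]
gold = [0.2, -0.1, 0.4]
print(pearson_r(pred, gold), mae(pred, gold),
      mean_bias(pred, gold), categorical_agreement(pred, gold))
```

The toy data shows how a scorer can correlate perfectly with the labels while still carrying a constant bearish offset, which is why correlation, MAE, and bias are worth reporting separately.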

Test plan

  • All 59 existing tests pass (uv run pytest tests/ -x)
  • Re-scored 154 labeled posts — MAE, bias, and agreement all improved
  • All 5 spot-check divergence posts improved
  • Monitor live scoring after deploy for unexpected distribution shifts

🤖 Generated with Claude Code

protostatis and others added 2 commits February 9, 2026 18:06
Address systematic bearish bias identified by audit (r=0.43, 50% agreement):

- Fix duplicate segments bug in backfill: store only selftext in content
  field instead of pre-concatenating comments (which were scored twice)
- Add paragraph deduplication in _extract_user_info() for historical data
- Reduce title weight from 0.4 to 0.2 so comment consensus drives scores
- Reduce tanh amplification from 5x to 3x for more gradual score curve
- Add 33 neutral anchor phrases for Q&A, tool comparisons, and token burns
  that were being mis-classified as bearish
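
The paragraph deduplication step could look roughly like the sketch below. The real `_extract_user_info()` is not shown in this PR, so the helper name `dedupe_paragraphs`, the blank-line paragraph delimiter, and the whitespace-only normalization are assumptions.

```python
def dedupe_paragraphs(text):
    """Drop repeated paragraphs, keeping the first occurrence of each.

    Hypothetical helper: paragraphs are assumed to be separated by blank
    lines, and two paragraphs count as duplicates when they are equal
    after collapsing runs of whitespace.
    """
    seen = set()
    kept = []
    for para in text.split("\n\n"):
        key = " ".join(para.split())  # collapse whitespace for comparison
        if not key or key in seen:
            continue
        seen.add(key)
        kept.append(para.strip())
    return "\n\n".join(kept)


# A comment that was concatenated into the content field twice.
sample = "Great tool\n\nToo slow\n\nGreat  tool"
print(dedupe_paragraphs(sample))
```

Deduplicating at read time handles historical rows that already contain the concatenated comments, while the backfill fix prevents new rows from being stored that way.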

Eval on 154 labeled posts: MAE 0.131→0.086 (-34%), categorical agreement
46.8%→51.3%, bearish bias -0.050→-0.029. Pearson r ~0.44 (unchanged —
limited by cosine-similarity approach ceiling).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Documents the 154-post labeled evaluation, metrics (Pearson r, MAE,
categorical agreement, bias), root causes found, fixes applied, and
before/after results. Includes reproduction steps and next-steps
analysis for the scoring approach ceiling.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@protostatis protostatis merged commit 1b57573 into main Feb 10, 2026
3 checks passed