Add note: Practical product-name extraction with a compound pipeline by herohua · Pull Request #7 · herohua/tech-notes

herohua · 2026-05-27T11:43:26Z

Summary

Adds a new tech note distilling the compound named-entity recognition (NER) + entity-linking pipeline used to extract product-name mentions from documentation content. Written for software engineers and technical PMs without an NLP background.

The note covers:

- Why naive approaches (a single regex, or an LLM alone) don't survive contact with real content
- A staged cascade: dictionary spotter → exclusion masks → boundary expansion → dedupe → fuzzy match → common-word filter → LLM verifier → tagged emission
- Why the LLM runs last (with explicit comparison to "LLM first" and "LLM only" alternatives)
- Honest limits of the design
- When to use this shape and when to reach for a state-of-the-art (SOTA) alternative instead
- How each stage maps to standard names in the entity-linking literature
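The cascade listed above can be sketched in a few dozen lines of Python. This is a toy under stated assumptions, not the note's implementation: the catalog (`CATALOG`, `COMMON_WORDS`), the `Mention` type, the mask patterns (inline code and URLs), the version-suffix rule used for boundary expansion, `difflib.get_close_matches` as a stand-in fuzzy matcher, and the `<product id="…">` tag format are all hypothetical, and the LLM verifier is stubbed as a plain `verify(text, mention)` callable. In this sketch the common-word stage routes ambiguous names onward to the verifier rather than dropping them outright. What it does show is the ordering: cheap deterministic stages prune candidates first, so the expensive verifier only sees what they could not settle.

```python
import difflib
import re
from dataclasses import dataclass, replace
from typing import Callable, List, Optional, Tuple


@dataclass(frozen=True)
class Mention:
    start: int
    end: int
    surface: str
    canonical: Optional[str] = None


# Hypothetical catalog: canonical product name -> surface variants seen in docs.
CATALOG = {"Acme Cloud": ["Acme Cloud", "AcmeCloud"], "Acme": ["Acme"], "Bolt": ["Bolt"]}
VARIANT_TO_CANONICAL = {v: c for c, vs in CATALOG.items() for v in vs}
# Hypothetical: canonical names that are also ordinary English words.
COMMON_WORDS = {"Bolt"}
# Assumed boundary rule: a trailing version token such as " 2" or " v2.1".
VERSION = re.compile(r"\s+v?\d+(?:\.\d+)*\b")
# Assumed exclusion masks: inline code spans and URLs.
MASKS = re.compile(r"`[^`]*`|https?://\S+")


def spot(text: str) -> List[Mention]:
    """Stage 1: exact, case-sensitive lookup of every known variant."""
    return [Mention(m.start(), m.end(), m.group())
            for v in VARIANT_TO_CANONICAL
            for m in re.finditer(r"\b" + re.escape(v) + r"\b", text)]


def apply_masks(text: str, ms: List[Mention]) -> List[Mention]:
    """Stage 2: drop candidates that overlap a masked region."""
    masks = [(m.start(), m.end()) for m in MASKS.finditer(text)]
    return [x for x in ms if not any(x.start < e and s < x.end for s, e in masks)]


def expand(text: str, ms: List[Mention]) -> List[Mention]:
    """Stage 3: widen each span to absorb a trailing version token."""
    out = []
    for x in ms:
        v = VERSION.match(text, x.end)
        end = v.end() if v else x.end
        out.append(replace(x, end=end, surface=text[x.start:end]))
    return out


def dedupe(ms: List[Mention]) -> List[Mention]:
    """Stage 4: left to right, keep the earliest span (longest on ties), drop overlaps."""
    kept: List[Mention] = []
    for x in sorted(ms, key=lambda m: (m.start, m.start - m.end)):
        if not kept or x.start >= kept[-1].end:
            kept.append(x)
    return kept


def fuzzy_link(ms: List[Mention], cutoff: float = 0.8) -> List[Mention]:
    """Stage 5: link each surface form (version stripped) to the closest known variant."""
    out = []
    for x in ms:
        best = difflib.get_close_matches(
            VERSION.sub("", x.surface), list(VARIANT_TO_CANONICAL), n=1, cutoff=cutoff)
        if best:
            out.append(replace(x, canonical=VARIANT_TO_CANONICAL[best[0]]))
    return out


def split_common(ms: List[Mention]) -> Tuple[List[Mention], List[Mention]]:
    """Stage 6: names that are also common words are ambiguous; route them onward."""
    sure = [x for x in ms if x.canonical not in COMMON_WORDS]
    unsure = [x for x in ms if x.canonical in COMMON_WORDS]
    return sure, unsure


def emit(text: str, ms: List[Mention]) -> str:
    """Stage 8: wrap kept mentions in tags, right to left so offsets stay valid."""
    for x in sorted(ms, key=lambda m: m.start, reverse=True):
        text = (text[:x.start] + f'<product id="{x.canonical}">' + x.surface
                + "</product>" + text[x.end:])
    return text


def extract(text: str, verify: Callable[[str, Mention], bool]) -> str:
    ms = fuzzy_link(dedupe(expand(text, apply_masks(text, spot(text)))))
    sure, unsure = split_common(ms)
    # Stage 7: the expensive verifier (an LLM call in practice) sees only the leftovers.
    verified = [x for x in unsure if verify(text, x)]
    return emit(text, sure + verified)


if __name__ == "__main__":
    doc = "Deploy AcmeCloud v2 with Bolt. See `Acme Cloud` docs at https://example.com/AcmeCloud."
    # A stub verifier that rejects everything: "Bolt" is dropped, "AcmeCloud v2" is tagged.
    print(extract(doc, verify=lambda text, m: False))
```

With the stub verifier, only `Bolt` ever reaches `verify`; the masked mentions inside backticks and the URL never become candidates at all. Swapping the lambda for a real model call changes the cost of the verifier stage but not the shape of the pipeline.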

Dated 2025-11-07 to reflect when the pattern was identified in the source projects.

Test plan

- Frontmatter renders correctly on the site (title, date, tags, publish flag)
- All inline citation links resolve
- ASCII pipeline diagram renders in a monospace block
- Tables render correctly (literature/SOTA tables)

🤖 Generated with Claude Code


herohua added 4 commits May 27, 2026 19:42

Add note: Practical product-name extraction with a compound pipeline

3ba5195

Refine 'Just ask the LLM' alternative: lead with attention-bounded recall, concede position accuracy is solvable via placeholder rewrite

8d36663

Refine large-taxonomy guidance: name fuzzy stage as actual bottleneck, not whole pipeline

266068a

Gloss 'cross-encoder reranker' on first use so 'reranker' isn't undefined when it appears

5db12e0

herohua merged commit 3ba8fb2 into main May 28, 2026

herohua deleted the add-compound-product-name-extraction branch May 28, 2026 06:44

herohua merged 4 commits into main from add-compound-product-name-extraction
