fix(fetchers): avoid utf-8 panic in hn html stripping#115

Merged
chaliy merged 3 commits into main from 2026-05-17-fix-vulnerability-in-hackernewsfetcher
May 17, 2026

@chaliy chaliy commented May 17, 2026

Motivation

  • Prevent a panic when stripping HTML from untrusted Hacker News item/comment text by never slicing the string at a byte offset that is not a UTF-8 character boundary.

Description

  • Iterate the input with html.char_indices() and start the lookahead slice at idx + c.len_utf8() (the byte offset just past the current character) instead of at result.len(). Add a regression assertion: strip_html_tags("ab<é>xy<") == "abxy".
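
The approach described above can be sketched roughly as follows. This is a minimal illustration, not the actual fetchkit code: the function body, the in_tag state flag, and the "p>" / "br>" lookahead patterns are assumptions made for the example. Only the use of html.char_indices(), the idx + c.len_utf8() offset, and the regression input come from this PR.

```rust
/// Hypothetical sketch of a tag stripper; the real fetchkit function may differ.
fn strip_html_tags(html: &str) -> String {
    let mut result = String::with_capacity(html.len());
    let mut in_tag = false;

    // char_indices() yields (byte offset into `html`, char) pairs, so `idx`
    // is always a valid char boundary of the input string.
    for (idx, c) in html.char_indices() {
        match c {
            '<' => {
                in_tag = true;
                // The lookahead starts just past the current char. Adding
                // len_utf8() to a char boundary gives the next char boundary
                // (or html.len()), so this slice cannot panic.
                let rest = &html[idx + c.len_utf8()..];
                // Assumed behavior: turn paragraph and line-break tags into newlines.
                if rest.starts_with("p>") || rest.starts_with("br>") {
                    result.push('\n');
                }
            }
            '>' if in_tag => in_tag = false,
            _ if !in_tag => result.push(c),
            _ => {} // character inside a tag: drop it
        }
    }
    result
}
```

The design point is that result.len() measures the output, not a position in the input. Once any character has been dropped the two diverge, and an output length used as an input offset can land inside a multi-byte character. In the regression input "ab<é>xy<", for example, é occupies bytes 3 and 4, so byte offset 4 is not a character boundary and slicing there would panic.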

Testing

  • Added a focused unit assertion in test_strip_html_tags and ran cargo test -p fetchkit hackernews::tests::test_strip_html_tags, which passed.

Codex Task

@chaliy chaliy merged commit ac94ad4 into main May 17, 2026
11 checks passed
@chaliy chaliy deleted the 2026-05-17-fix-vulnerability-in-hackernewsfetcher branch May 17, 2026 18:02