BUG: Search highlight regions break across multi-byte UTF-8 boundaries #1178

@andrinoff

Describe the bug

When matcha highlights a query match in a message body or subject, the highlight start/end indices are byte offsets. A hit inside multi-byte UTF-8 text (CJK, accented characters, emoji) can split a rune in half: lipgloss then renders a broken rune, followed by the styled remainder of that rune's bytes and the prefix of the next rune.
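A minimal sketch of the failure mode, using only the Go standard library. `highlightParts` is a hypothetical helper, not a function in matcha, and the offsets are picked by hand to land inside a rune. How matcha arrives at misaligned offsets is not shown here.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// highlightParts splits s into before/match/after using raw byte offsets,
// the way a byte-indexed highlighter would. Hypothetical helper for
// illustration only.
func highlightParts(s string, start, end int) (before, match, after string) {
	return s[:start], s[start:end], s[end:]
}

func main() {
	subject := "héllo" // 'é' is two bytes (0xC3 0xA9), so len(subject) == 6

	// Byte region [0, 2) ends in the middle of 'é'.
	_, match, after := highlightParts(subject, 0, 2)
	fmt.Println(utf8.ValidString(match)) // false: "h" plus the first half of 'é'
	fmt.Println(utf8.ValidString(after)) // false: starts with the second half of 'é'

	// Byte region [0, 3) ends on a rune boundary.
	_, match, after = highlightParts(subject, 0, 3)
	fmt.Println(utf8.ValidString(match), utf8.ValidString(after)) // true true
}
```

Styling `match` and `after` separately in the first case is what produces the broken glyph: neither piece is valid UTF-8 on its own.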

Expected behavior

Track highlight regions as rune offsets, not byte offsets. Convert to byte offsets only when slicing for lipgloss.Render.
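One possible shape for this, as a sketch rather than a proposed patch: regions are stored in runes and converted to byte offsets only at the point of slicing. The names `runeRegion`, `byteOffset`, and `split` are assumptions made for this example and do not exist in matcha. The `match` piece is the one that would be handed to a lipgloss style's `Render`.

```go
package main

import "fmt"

// runeRegion is a hypothetical highlight region measured in runes,
// half-open: [Start, End). It assumes 0 <= Start <= End.
type runeRegion struct{ Start, End int }

// byteOffset returns the byte index at which the runeIdx-th rune of s begins.
// A runeIdx at or past the rune count yields len(s); a negative one yields 0.
func byteOffset(s string, runeIdx int) int {
	if runeIdx <= 0 {
		return 0
	}
	n := 0
	for i := range s { // ranging over a string yields the byte index of each rune
		if n == runeIdx {
			return i
		}
		n++
	}
	return len(s)
}

// split converts the rune region to byte offsets at the last moment and
// returns the three pieces a renderer would style separately.
func split(s string, r runeRegion) (before, match, after string) {
	start := byteOffset(s, r.Start)
	end := byteOffset(s, r.End)
	return s[:start], s[start:end], s[end:]
}

func main() {
	before, match, after := split("日本語のメール", runeRegion{Start: 1, End: 3})
	fmt.Printf("%q %q %q\n", before, match, after) // "日" "本語" "のメール"
}
```

`byteOffset` walks the string on every call, so each conversion is linear in the string length. Whether that matters depends on how often regions are re-sliced, which ties into the cache layer.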

Why it's hard

The match-finder, the renderer, and the cache layer all use byte offsets today. Threading rune offsets through requires touching several layers and adding tests on Greek / Japanese / Arabic samples.
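For the tests, one invariant that could be asserted at every layer that still hands out byte offsets is that both ends of a region fall on a rune boundary. The sketch below assumes a hypothetical helper `onRuneBoundary` and hand-written sample strings; it is not taken from matcha's test suite.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// onRuneBoundary reports whether byte offset i in s is a safe place to cut:
// either end of the string, or the first byte of a rune.
// Hypothetical helper for illustration.
func onRuneBoundary(s string, i int) bool {
	if i < 0 || i > len(s) {
		return false
	}
	return i == 0 || i == len(s) || utf8.RuneStart(s[i])
}

func main() {
	// Greek, Japanese, and Arabic samples, each with a query that occurs in the text.
	samples := []struct{ text, query string }{
		{"καλημέρα κόσμε", "κόσμε"},
		{"日本語のメール", "メール"},
		{"مرحبا بالعالم", "بالعالم"},
	}
	for _, c := range samples {
		start := strings.Index(c.text, c.query) // byte offset of the match
		end := start + len(c.query)
		fmt.Println(c.query, onRuneBoundary(c.text, start), onRuneBoundary(c.text, end))
	}

	// An offset one byte into a three-byte rune is not a valid cut point.
	fmt.Println(onRuneBoundary("日本語", 1))
}
```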

Metadata

Labels: bug
Status: Backlog