
Expose byte metadata from HtmlHandlers.extract_html_content #6

@nshkrdotcom

Summary

Layer 3's binary optimization still has to convert the grapheme-based chars_consumed returned by HtmlHandlers.extract_html_content/2 into byte offsets by hand. We just fixed one caller by recomputing the byte span, but every other caller has to duplicate that logic.
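A minimal illustration of the mismatch, using only the Elixir standard library. String.length/1 and String.slice/3 count graphemes, while binary_part/3 and binary pattern matching count bytes. A CRLF (one grapheme, two bytes) or an emoji (one grapheme, four bytes) therefore shifts every later offset. GraphemeOffsets.graphemes_to_bytes/2 is a hypothetical helper standing in for the per-caller conversion; it is not taken from the codebase.

```elixir
defmodule GraphemeOffsets do
  # Hypothetical helper (not from the codebase): converts a grapheme count,
  # such as chars_consumed, into the byte length of that prefix of input.
  def graphemes_to_bytes(input, grapheme_count) when is_binary(input) do
    input
    |> String.slice(0, grapheme_count)
    |> byte_size()
  end
end

# "a", CRLF, "b", an emoji, then "<p>": 7 graphemes but 11 bytes.
input = "a\r\nb\u{1F600}<p>"
7 = String.length(input)
11 = byte_size(input)

# Using the grapheme count 4 directly as a byte offset lands on the emoji.
"\u{1F600}<p>" = binary_part(input, 4, byte_size(input) - 4)

# Converting the grapheme count to bytes first gives the correct split point.
bytes = GraphemeOffsets.graphemes_to_bytes(input, 4)
8 = bytes
"<p>" = binary_part(input, bytes, byte_size(input) - bytes)
```

Every caller that needs byte offsets has to repeat this slice-and-measure step, or something equivalent. That repetition is the duplication this issue is about.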

Proposed Fix

Have HtmlHandlers.extract_html_content/2 return both grapheme and byte counts (for example {content, chars_consumed, bytes_consumed}) so every caller can rely on a single source of truth. Update the IO-list and binary paths to use the new bytes_consumed field and drop their ad-hoc conversion math.
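A sketch of the proposed return shape, assuming the handler can hand back the exact prefix it consumed. HtmlHandlersSketch is a hypothetical stand-in. Its stopping rule (end the HTML block at the first blank line) is invented only so the example has something to consume; the real extract_html_content/2 has its own parsing rules and arguments. The key property is that chars_consumed and bytes_consumed are both derived from the same consumed binary, so they cannot drift apart.

```elixir
defmodule HtmlHandlersSketch do
  # Hypothetical stand-in for HtmlHandlers.extract_html_content/2.
  # Assumed stopping rule (illustration only): the block ends after the
  # first blank line ("\n\n"), or at end of input if there is none.
  def extract_html_content(input, _opts \\ []) when is_binary(input) do
    consumed =
      case :binary.match(input, "\n\n") do
        {pos, len} -> binary_part(input, 0, pos + len)
        :nomatch -> input
      end

    content = String.trim_trailing(consumed)

    # Both counts come from the same consumed prefix: graphemes for
    # character-based callers, bytes for the binary and IO-list paths.
    {content, String.length(consumed), byte_size(consumed)}
  end
end

input = "<div>caf\u00e9 \u{1F600}</div>\n\nrest"
{content, chars_consumed, bytes_consumed} = HtmlHandlersSketch.extract_html_content(input)

# The binary path skips exactly bytes_consumed bytes; no grapheme math needed.
<<_consumed::binary-size(bytes_consumed), rest::binary>> = input
IO.inspect({content, chars_consumed, bytes_consumed, rest})
```

Here the precomposed "é" and the emoji make the two counts differ. The caller never has to reconcile them, because it simply picks the field that matches its addressing mode.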

Benefits

  • Eliminates duplicated slicing logic
  • Prevents future drift when we add new consumers
  • Makes it harder to regress on multi-byte graphemes (CRLF, emoji, etc.)

Metadata

Assignees

No one assigned

Labels

No labels

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests