Fix encoding error when C parser reads external source files #1657

Merged
st0012 merged 1 commit into master from fix-c-parser-encoding-external-source on Mar 22, 2026
st0012 (Member) commented Mar 22, 2026

When a C file references another source file via `/* in file.c */`, the parser read that file with a bare `File.read`, which uses `Encoding.default_external`. On systems where that is US-ASCII (e.g. Debian CI), non-ASCII bytes in the source file cause `ArgumentError: invalid byte sequence in US-ASCII` in `String#scan`.

Use `RDoc::Encoding.read_file` instead, which reads in binary mode and properly handles encoding detection and transcoding.

This was triggered by Ruby commit a2531ba293, which added UTF-8 right arrows (→) to comments in `class.c`; `object.c` references that file via `/* in class.c */`.
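
The failure mode can be reproduced with the standard library alone. The following is a minimal sketch, not RDoc's implementation: the file contents, the `rb_example` identifier, and both helper methods are hypothetical, and the binary-read helper simply assumes the file is UTF-8, whereas `RDoc::Encoding.read_file` also performs encoding detection and transcoding.

```ruby
require "tempfile"

# Hypothetical stand-in for a C source file whose comment contains a
# UTF-8 right arrow, similar to the comments described above.
SOURCE = "/* a \u2192 b */\nVALUE rb_example(void);\n"

# Reads the file tagged with the given encoding and scans it with a regexp.
# Returns the matches, or a description of the error if scanning fails.
def scan_with_encoding(path, encoding)
  content = File.read(path, encoding: encoding)
  content.scan(/rb_\w+/)
rescue ArgumentError => e
  "#{e.class}: #{e.message}"
end

# Reads the raw bytes first, then tags them as UTF-8 before scanning.
# Assumption: the file really is UTF-8; no detection is attempted here.
def scan_binary_then_utf8(path)
  content = File.binread(path).force_encoding(Encoding::UTF_8)
  content.scan(/rb_\w+/)
end

file = Tempfile.new(["example", ".c"])
file.close
File.binwrite(file.path, SOURCE)

# Simulates a system whose Encoding.default_external is US-ASCII.
ASCII_RESULT = scan_with_encoding(file.path, "US-ASCII")
UTF8_RESULT = scan_binary_then_utf8(file.path)

puts ASCII_RESULT.inspect
puts UTF8_RESULT.inspect
```

Passing `encoding: "US-ASCII"` explicitly stands in for a US-ASCII `Encoding.default_external`, so the sketch behaves the same regardless of the locale it runs under.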

st0012 added the bug label Mar 22, 2026

matzbot (Collaborator) commented Mar 22, 2026

🚀 Preview deployment available at: https://f84077be.rdoc-6cd.pages.dev (commit: a6ee7ae)

st0012 marked this pull request as ready for review March 22, 2026 14:38

kou (Member) left a comment

+1

st0012 merged commit 911b122 into master Mar 22, 2026
78 checks passed
st0012 deleted the fix-c-parser-encoding-external-source branch March 22, 2026 23:59