Fix encoding error when C parser reads external source files #1657
Merged
When a C file references another source file via `/* in file.c */`, the parser read that file with a bare `File.read`, which uses `Encoding.default_external`. On systems where the default external encoding is US-ASCII (e.g. Debian CI), non-ASCII bytes in the source file cause `ArgumentError: invalid byte sequence in US-ASCII` in `String#scan`.

Use `RDoc::Encoding.read_file` instead, which reads in binary mode and properly handles encoding detection and transcoding.

This was triggered by Ruby commit a2531ba293, which added UTF-8 right arrows (→) to comments in class.c; that file is referenced from object.c via `/* in class.c */`.
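The failure mode can be reproduced with plain Ruby. The following is a minimal sketch, not RDoc's actual code: the file contents, the `rb_\w+` pattern, and the helper names (`read_as_ascii`, `read_as_utf8`, `scan_error`, `demo`) are hypothetical, and passing `encoding: "US-ASCII"` stands in for a system whose `Encoding.default_external` is US-ASCII. The binary read followed by `force_encoding` assumes the source is UTF-8; according to the description above, `RDoc::Encoding.read_file` goes further by detecting the encoding and transcoding.

```ruby
require "tempfile"

# Hypothetical stand-in for class.c: a C comment containing a UTF-8 arrow.
SOURCE = "/* a \u2192 b */\nVALUE rb_example(void);\n"

# Reads the file as if Encoding.default_external were US-ASCII.
def read_as_ascii(path)
  File.read(path, encoding: "US-ASCII")
end

# Reads the raw bytes, then tags them as UTF-8 (assumed source encoding).
def read_as_utf8(path)
  File.binread(path).force_encoding(Encoding::UTF_8)
end

# Returns the error message raised by a regexp scan, or nil if none.
def scan_error(text)
  text.scan(/rb_\w+/)
  nil
rescue ArgumentError => e
  e.message
end

# Writes the sample file, then scans it both ways.
def demo
  Tempfile.create(["class", ".c"]) do |f|
    f.binmode
    f.write(SOURCE)
    f.flush
    [scan_error(read_as_ascii(f.path)), read_as_utf8(f.path).scan(/rb_\w+/)]
  end
end

ascii_error, utf8_matches = demo
puts "US-ASCII read: #{ascii_error.inspect}"
puts "binary + UTF-8 read: #{utf8_matches.inspect}"
```

The US-ASCII read yields a string whose bytes are invalid for its encoding, so the regexp scan raises; the binary read with an explicit encoding scans cleanly.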
Collaborator
🚀 Preview deployment available at: https://f84077be.rdoc-6cd.pages.dev (commit: a6ee7ae)