Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix CRLF -> LF conversion on read for rb_io_fdopen & rb_file_open #9380

Conversation

KJTsanaktsidis
Copy link
Contributor

When opening a file with File.open, and then setting the encoding with IO#set_encoding, it still correctly performs CRLF -> LF conversion on Windows when reading files with a CRLF line ending in them (in text mode).

However, the file is opened instead with either the rb_io_fdopen or rb_file_open APIs from C, the CRLF conversion is NOT set up correctly; it works if the encoding is not specified, but if IO#set_encoding is called, the conversion stops happening. This seems to be because the encflags never get ECONV_DEFAULT_NEWLINE_DECORATOR set in these codepaths.

Concretely, this means that the conversion doesn't happen in the following circumstances:

  • When loading ruby files with require (that calls rb_io_fdopen)
  • When parsing ruuby files with RubyVM::AbstractSyntaxTree (that calls rb_file_open). This then causes the ErrorHighlight tests to fail on windows if git has checked them out with CRLF line endings - the error messages it's testing wind up with literal \r\n sequences in them because the iseq text from the parser contains un-newline-converted strings.

This commit fixes the problem by copy-pasting the relevant snippet which sets this up in rb_io_extract_modeenc (for the File.open path) into the relevant codepaths for rb_io_fdopen and rb_file_open.

[Bug #20101]

@KJTsanaktsidis KJTsanaktsidis force-pushed the ktsanaktsidis/fix_mswin64_newline_encoding branch from 3c95218 to 24d4117 Compare December 28, 2023 06:03
open_with_rb_io_fdopen(VALUE self, VALUE filename, VALUE read_or_write, VALUE binary_or_text)
{
int omode = 0;
if (rb_eql(read_or_write, rb_id2sym(rb_intern("read")))) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is rb_sym_intern_ascii_cstr that returns a Symbol from char*, and a Symbol can be compared by the value itself.

Suggested change
if (rb_eql(read_or_write, rb_id2sym(rb_intern("read")))) {
if (read_or_write == rb_sym_intern_ascii_cstr("read")) {

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh hey I didn't know this, thanks.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, rb_sym_intern_ascii_cstr isn't exported so I can't use it from an extension.... I wonder if it should be though? It seems pretty useful. I'll keep rb_id2sym(rb_intern) for now I suppose.

When opening a file with `File.open`, and then setting the encoding with
`IO#set_encoding`, it still correctly performs CRLF -> LF conversion on
Windows when reading files with a CRLF line ending in them (in text
mode).

However, the file is opened instead with either the `rb_io_fdopen` or
`rb_file_open` APIs from C, the CRLF conversion is _NOT_ set up
correctly; it works if the encoding is not specified, but if
`IO#set_encoding` is called, the conversion stops happening. This seems
to be because the encflags never get ECONV_DEFAULT_NEWLINE_DECORATOR
set in these codepaths.

Concretely, this means that the conversion doesn't happen in the
following circumstances:
  * When loading ruby files with require (that calls rb_io_fdopen)
  * When parsing ruuby files with RubyVM::AbstractSyntaxTree (that calls
    rb_file_open).
This then causes the ErrorHighlight tests to fail on windows if git has
checked them out with CRLF line endings - the error messages it's
testing wind up with literal \r\n sequences in them because the iseq
text from the parser contains un-newline-converted strings.

This commit fixes the problem by copy-pasting the relevant snippet which
sets this up in `rb_io_extract_modeenc` (for the File.open path) into
the relevant codepaths for `rb_io_fdopen` and `rb_file_open`.

[Bug #20101]
@KJTsanaktsidis KJTsanaktsidis force-pushed the ktsanaktsidis/fix_mswin64_newline_encoding branch from 24d4117 to d4ea81c Compare December 28, 2023 06:48
@KJTsanaktsidis KJTsanaktsidis merged commit 31371b2 into ruby:master Jan 10, 2024
94 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
2 participants