Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix binary nokogiri gems on musl based Linux #1990

Closed
wants to merge 2 commits into from

Conversation

larskanis
Copy link
Member

What problem is this PR intended to solve?

As described in #1983 there are two issues when using the nokogiri binary gem on musl based Linux.

The first commit fixes a segfault, the second commit fixes a failing test case. The the commit description for more details.

Have you included adequate test coverage?

Not yet. Tests on Alpine were run per docker run --rm -it ruby:alpine /bin/sh.

Does this change affect the C or the Java implementations?

The C version only.

The binary nokogiri gem built with rake-compiler-dock-1.0.0 segfaults on x86_64-linux-musl.
This happens in:
  lib/nokogiri/xml/reader.rb:92:in `namespaces'
It turned out that any calls to sprintf() with %-arguments fail.

I did not dig deeper on assembly level, since this is the only use of sprintf in nokogiri.
Moreover it can be replaced by calls to ruby string functions easily.

Related to sparklemotion#1983
Musl doesn't recognize cp932 encoding.
It also fails when used on the command line with iconv.
For this test case cp932 can be replaced by Shift_JIS, which is identical for the characters in question and that is handled by musl.

Error was:

  1) Error:
Nokogiri::HTML::TestDocumentEncoding#test_encoding_without_charset:
ArgumentError: invalid byte sequence in UTF-8
    /home/lars/comcard/nokogiri/test/html/test_document_encoding.rb:26:in `test_encoding_without_charset'

Related to sparklemotion#1983
@larskanis
Copy link
Member Author

The test failure is also present, when nokogiri is native compiled on Alpine Linux. It is not specific to the binary gems.

Copy link
Contributor

@zenspider zenspider left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a drive-by review... I don't have a stake in this.

}

key = rb_str_conv_enc(key, rb_utf8_encoding(), rb_default_internal_encoding());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like how much your code deletes! But this seems like a wide departure from the original, with extra string concatenation for something with known size. EIther the string should be allocated ahead of time (which doesn't look that easy to do with cruby's API), or the concatenation should be avoided (easy). Here's the (rough + untested) cruby equivalent to the original code + your encoding differences:

if (ns->prefix) {
  key = rb_enc_sprintf(rb_utf8_encoding(), "%s:%s", XMLNS_PREFIX, ns->prefix);
} else {
  key = rb_enc_sprintf(rb_utf8_encoding(), "%s", XMLNS_PREFIX);
}
key = rb_str_conv_enc(key, rb_utf8_encoding(), rb_default_internal_encoding());

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @zenspider for reviewing! rb_enc_str_new_cstr() allocates a struct RString with an embedded string length of 23 bytes at maximum. So each string has a capacity of 23 ASCII characters from the start. Since namespace keys tend to be shorter, usually no reallocation is required. Given that sprintf is usually slower due to format string interpretation, I think the proposed implementation is best how it is.

@zenspider
Copy link
Contributor

THIS looks awesome. I would love to have nokogiri binary gems on alpine.

@flavorjones
Copy link
Member

Just a note - I want to add alpine to the CI pipeline (per note in #1845) before merging this; I'd like to go red (see the segfault reliably on alpine) first, then merge and go green.

flavorjones added a commit that referenced this pull request Mar 29, 2020
flavorjones added a commit that referenced this pull request Mar 29, 2020
also restructure pipelines to have a second gate
related to #1990 and #1845
flavorjones added a commit that referenced this pull request Mar 29, 2020
@flavorjones
Copy link
Member

I've added two new things to the CI pipelines (master and PRs):

These should go red, and I'll manually merge these two commits to make each red build go green.

@flavorjones
Copy link
Member

flavorjones commented Mar 30, 2020

First red build: https://ci.nokogiri.org/teams/nokogiri-core/pipelines/nokogiri/jobs/ruby-musl-system/builds/1

  1) Error:
Nokogiri::HTML::TestDocumentEncoding#test_encoding_without_charset:
ArgumentError: invalid byte sequence in UTF-8
    /tmp/build/4d9a0f57/nokogiri/test/html/test_document_encoding.rb:26:in `test_encoding_without_charset'

@flavorjones
Copy link
Member

flavorjones commented Mar 30, 2020

First commit merged: ff99fa5

And the red build is now green: https://ci.nokogiri.org/teams/nokogiri-core/pipelines/nokogiri/jobs/ruby-musl-system/builds/3

@flavorjones
Copy link
Member

flavorjones commented Mar 30, 2020

Second red build: https://ci.nokogiri.org/teams/nokogiri-core/pipelines/nokogiri/jobs/native-gem-test/builds/1

/usr/local/bundle/gems/nokogiri-2020.0330.1657-x86_64-linux/lib/nokogiri/xml/reader.rb:94: [BUG] Segmentation fault at 0x000000000003e3e6
ruby 2.7.0p0 (2019-12-25 revision 647ee6f091) [x86_64-linux-musl]

-- Control frame information -----------------------------------------------
c:0027 p:---- s:0144 e:000143 CFUNC  :namespaces
c:0026 p:0014 s:0140 e:000139 METHOD /usr/local/bundle/gems/nokogiri-2020.0330.1657-x86_64-linux/lib/nokogiri/xml/reader.rb:94
c:0025 p:0005 s:0134 e:000133 BLOCK  /tmp/build/e0bf947d/nokogiri/test/xml/test_reader.rb:256
c:0024 p:0010 s:0130 e:000129 METHOD /usr/local/bundle/gems/nokogiri-2020.0330.1657-x86_64-linux/lib/nokogiri/xml/reader.rb:111
c:0023 p:0028 s:0125 e:000124 METHOD /tmp/build/e0bf947d/nokogiri/test/xml/test_reader.rb:255

@flavorjones
Copy link
Member

flavorjones commented Mar 30, 2020

Second commit merged: a3d9438

And the red build is now green: https://ci.nokogiri.org/teams/nokogiri-core/pipelines/nokogiri/jobs/native-gem-test/builds/2

@flavorjones
Copy link
Member

OK - additional test coverage is all green, after manual merges. Closing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants