Workaround for JRuby bug with UTF-8 encoding on non-English Windows #3767

Merged
merged 2 commits into from Mar 31, 2016

Conversation

Projects
None yet
2 participants
@perlun
Contributor

perlun commented Mar 30, 2016

We discovered this with a customer yesterday. DNS resolving (using Resolv::DNS#getaddresses) failed. It turned out that the underlying problem was that the way win32/registry.rb retrieves error messages from Windows is flawed, under certain circumstances.

The bug only appears if:

  • JAVA_OPTS includes -Dfile.encoding=UTF-8
  • The environment you run it on is a non-English Windows installation, or more specifically: a Windows installation where the error message(s) include some non-ASCII character. The specific use case here is a Swedish Windows installation, where the error message for error number 2 (File not found) reads as "Filen du söker finns inte", or something similar. (The debugging here was complicated by the fact that I don't have a Swedish Windows myself, only the customer).

Here is a test script that reproduces the problem given that you have a Swedish Windows:

require 'win32/resolv'
puts Encoding.locale_charmap
puts Win32::Resolv.send(:get_info).inspect

Since I don't have a Swedish Windows, I managed to reproduce the same error locally by adding this code into the Error constructor:

msg = "Malm\xf6, G\xf6teborg och V\xe4xj\xf6"

That gives me the same error: ArgumentError: invalid byte sequence in UTF-8. The original bug occurs in the msg.tr call, since the string it operates on is marked as UTF-8 (because of the force_encoding call), where it is in fact Windows-1252 encoded (since we call FormatMessageA, which returns Windows-1252 encoded strings...). IMHO, this is clearly a bug.


Now, to the solution. The "correct" approach would be to not use the ANSI version of the system call at all, but instead use the "wide" version. This is the approach MRI has taken, the bug was fixed there almost 3 years ago. :) ruby/ruby@9db6beb is the specific commit, fixing the bug reported in this issue: https://bugs.ruby-lang.org/issues/8508

I tried that first, by copying over their current version of the Error class. The only problem is that it depends on win32/importer, which we don't have, which in turn depends on fiddle/import which we don't seem to have either. You get the picture.

Therefore, I tried with a simpler workaround instead: if we get an exception in this method, let's just retry by fetching the en-US error message instead. (inspired by how the code on the MRI side now looks: https://github.com/ruby/ruby/blob/trunk/ext/win32/lib/win32/registry.rb)
I have tested this with the customer; it works correctly as far as the code no longer crashes. But, it fails to retrieve the English error message (possibly because it's not available on that Windows installation). I am willing to live with that for now; if someone wants to fix that part as well, be my guest. The critical part for me and the customer is that the code must not crash on non-ASCII Windows installations.

Workaround for JRuby bug with UTF-8 encoding on non-English Windows
We discovered this with a customer yesterday. DNS resolving (using `Resolv::DNS#getaddresses`) failed. It turned out that the underlying problem was that the way `win32/registry.rb` retrieves error messages from Windows is flawed, under certain circumstances.

The bug only appears if:

- `JAVA_OPTS` includes `-Dfile.encoding=UTF-8`
- The environment you run it on is a _non-English_ Windows installation, or more specifically: a Windows installation where the error message(s) include some non-ASCII character. The specific use case here is a Swedish Windows installation, where the error message for error number 2 (File not found) reads as "Filen du söker finns inte", or something similar. (The debugging here was complicated by the fact that I don't have a Swedish Windows myself, only the customer).

Here is a test script that reproduces the problem given that you have a Swedish Windows:

```ruby

require 'win32/resolv'
puts Encoding.locale_charmap
puts Win32::Resolv.send(:get_info).inspect
```

Since I don't have a Swedish Windows, I managed to reproduce the same error locally by adding this code into the `Error` constructor:

```ruby
msg = "Malm\xf6, G\xf6teborg och V\xe4xj\xf6"
```

That gives me the same error: `ArgumentError: invalid byte sequence in UTF-8`. The original bug occurs in the `msg.tr` call, since the string it operates on is _marked_ as UTF-8 (because of the `force_encoding` call), where it is in fact Windows-1252 encoded (since we call `FormatMessageA`, which returns Windows-1252 encoded strings...). IMHO, this is clearly a bug.

---

Now, to the solution. The "correct" approach would be to not use the ANSI version of the system call _at all_, but instead use the "wide" version. This is the approach MRI has taken, the bug was fixed there almost 3 years ago. :) ruby/ruby@9db6beb is the specific commit, fixing the bug reported in this issue: https://bugs.ruby-lang.org/issues/8508

I tried that first, by copying over their current version of the `Error` class. The only problem is that it depends on `win32/importer`, which we don't have, which in turn depends on `fiddle/import` which we don't seem to have either. You get the picture.

Therefore, I tried with a simpler workaround instead: _if_ we get an exception in this method, let's just retry by fetching the `en-US` error message instead. (inspired by how the code on the MRI side now looks: https://github.com/ruby/ruby/blob/trunk/ext/win32/lib/win32/registry.rb)
I have tested this with the customer; it works correctly as far as the code no longer crashes. _But_, it fails to retrieve the English error message (possibly because it's not available on that Windows installation). I am willing to live with that for now; if someone wants to fix that part as well, be my guest. The critical part for me and the customer is that the code must not crash on non-ASCII Windows installations.
@enebo

This comment has been minimized.

Show comment
Hide comment
@enebo

enebo Mar 30, 2016

Member

@perlun that particular commit (e.g. diff) looks like it would just work wouldn't it? Or are there some companion commits which will break if we just replace the *A with *W here?

Member

enebo commented Mar 30, 2016

@perlun that particular commit (e.g. diff) looks like it would just work wouldn't it? Or are there some companion commits which will break if we just replace the *A with *W here?

@perlun

This comment has been minimized.

Show comment
Hide comment
@perlun

perlun Mar 31, 2016

Contributor

It wasn't applicable straight away, since it relied on stuff we don't use (like dl/import). I tried to use the file as-is from their revision, which didn't work.

Anyway, I did a bit of a "compromise" now, bringing in the essence of their change but making it work in our codebase. Have tested it locally and this version now works fine on my machine at least; will verify with customer as well.

Contributor

perlun commented Mar 31, 2016

It wasn't applicable straight away, since it relied on stuff we don't use (like dl/import). I tried to use the file as-is from their revision, which didn't work.

Anyway, I did a bit of a "compromise" now, bringing in the essence of their change but making it work in our codebase. Have tested it locally and this version now works fine on my machine at least; will verify with customer as well.

@perlun

This comment has been minimized.

Show comment
Hide comment
@perlun

perlun Mar 31, 2016

Contributor

Verified with customer - works even better than my previous approach, the non-ASCII error message is properly retrieved now with this change.

Contributor

perlun commented Mar 31, 2016

Verified with customer - works even better than my previous approach, the non-ASCII error message is properly retrieved now with this change.

@enebo enebo added the windows label Mar 31, 2016

@enebo enebo added this to the JRuby 1.7.25 milestone Mar 31, 2016

@enebo enebo merged commit a53970c into jruby:jruby-1_7 Mar 31, 2016

0 of 2 checks passed

continuous-integration/appveyor/pr AppVeyor build failed
Details
continuous-integration/travis-ci/pr The Travis CI build failed
Details
@enebo

This comment has been minimized.

Show comment
Hide comment
@enebo

enebo Mar 31, 2016

Member

@perlun great! I think we have not updated this impl because of the deps it uses in future revisions. Without supporting those we might need to get a little more inventive like this and probably change any other A signatures to W signatures. This gets you past your current hurdle though so let's roll with it. Thanks for the PR and followup work.

Member

enebo commented Mar 31, 2016

@perlun great! I think we have not updated this impl because of the deps it uses in future revisions. Without supporting those we might need to get a little more inventive like this and probably change any other A signatures to W signatures. This gets you past your current hurdle though so let's roll with it. Thanks for the PR and followup work.

@perlun perlun deleted the ecraft:fix-win32-registry-error-handling-bug-with-utf8-windows-other-languages branch Apr 1, 2016

@perlun

This comment has been minimized.

Show comment
Hide comment
@perlun

perlun Apr 1, 2016

Contributor

Thanks Thomas! You probably get this question all the time, but when do you anticipate 1.7.25 to be officially released? (it's no rush for me personally, just curious - the customer already received a working -SNAPSHOT.jar file)

Contributor

perlun commented Apr 1, 2016

Thanks Thomas! You probably get this question all the time, but when do you anticipate 1.7.25 to be officially released? (it's no rush for me personally, just curious - the customer already received a working -SNAPSHOT.jar file)

@enebo

This comment has been minimized.

Show comment
Hide comment
@enebo

enebo Apr 1, 2016

Member

@perlun yeah we anticipate it within the next week or so. We actually have slipped a few times but we are spending some extra time to make sure we get appveyor CI runs more green on windows. In theory, 1.7.25 and 9.1 will be the most tested releases against windows in a long time. We will not end up fixing all windows issues but hopefully will be on a much better support trajectory.

Member

enebo commented Apr 1, 2016

@perlun yeah we anticipate it within the next week or so. We actually have slipped a few times but we are spending some extra time to make sure we get appveyor CI runs more green on windows. In theory, 1.7.25 and 9.1 will be the most tested releases against windows in a long time. We will not end up fixing all windows issues but hopefully will be on a much better support trajectory.

@perlun

This comment has been minimized.

Show comment
Hide comment
@perlun

perlun Apr 15, 2016

Contributor

@enebo - long weeks over there? 😜

Contributor

perlun commented Apr 15, 2016

@enebo - long weeks over there? 😜

@perlun

This comment has been minimized.

Show comment
Hide comment
@perlun

perlun Apr 15, 2016

Contributor

(nvm - I just realized you had released it yesterday. Bummer!)

Contributor

perlun commented Apr 15, 2016

(nvm - I just realized you had released it yesterday. Bummer!)

@enebo

This comment has been minimized.

Show comment
Hide comment
@enebo

enebo Apr 15, 2016

Member

@perlun sorry. Did I miss something? I don't get the bummer comment? This was merged. And yeah things did not go as planned on release timewise :|

Member

enebo commented Apr 15, 2016

@perlun sorry. Did I miss something? I don't get the bummer comment? This was merged. And yeah things did not go as planned on release timewise :|

@perlun

This comment has been minimized.

Show comment
Hide comment
@perlun

perlun Apr 15, 2016

Contributor

I meant it like: a shame I wasn't aware the release was already made. Complaining over my own failures. 😃 You're doing a great job, thanks for all of it.

Contributor

perlun commented Apr 15, 2016

I meant it like: a shame I wasn't aware the release was already made. Complaining over my own failures. 😃 You're doing a great job, thanks for all of it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment