-
-
Notifications
You must be signed in to change notification settings - Fork 2k
Fix hang when gemspec has incompatible encoding #6279
Conversation
Thanks for opening a pull request and helping make Bundler better! Someone from the Bundler team will take a look at your pull request shortly and leave any feedback. Please make sure that your pull request has tests for any changes or added functionality. We use Travis CI to test and make sure your change works functionally and uses acceptable conventions, you can review the current progress of Travis CI in the PR status window below. If you have any questions or concerns that you wish to ask, feel free to leave a comment in this PR or join our #bundler channel on Slack. For more information about contributing to the Bundler project feel free to review our CONTRIBUTING guide |
I run the specs on my fork and things seem to be ok, except that |
By the way, I'm now seeing #6254 that seems the best way to go in terms of making this error message more clear. 👍. But as I said, things seem to just hang on master unless I apply this patch... |
You can replace |
Not sure about the reason, but binary read fixes it...
30c3692
to
be337de
Compare
Thanks for the tip, @hsbt! I found a |
@deivid-rodriguez good catch, thanks! @bundlerbot r+ |
📌 Commit be337de has been approved by |
Fix hang when gemspec has incompatible encoding ### What was the end-user problem that led to this PR? The problem was that if a gemspec had a non ascii character on it, and the external encoding is not utf-8, bundler would fail with a very cryptic error. CircleCI is very likely to reproduce this bug since its base images default to `LANG=C`. See [here](https://circleci.com/gh/deivid-rodriguez/pry-byebug/12) for an example. In bundler's master, `bundle install` seems to hang in this same situation. I have no idea why but CPU usage goes crazy (100%) when running something like `LANG=C bundle install` with such a gemspec. Note that adding a utf-8 magic encoding comment seems to fix the problem, but I think `bundler` should just work in this situation, or at least give the user a more helpful error message. And of course never hang either. ### What was your diagnosis of the problem? My diagnosis was that bundler was reading an incompatible character from the gemspec and was not able to deal with it, nor give a useful error message. ### What is your fix for the problem, implemented in this PR? My fix initial fix was to at least give a better error message, preventing the method building the error message itself from blowing up due to incompatible encodings. This is the initial patch that I wrote to improve the error message: ```diff diff --git a/lib/bundler/dsl.rb b/lib/bundler/dsl.rb index 37bfe3674..79ef563d1 100644 --- a/lib/bundler/dsl.rb +++ b/lib/bundler/dsl.rb @@ -581,7 +581,14 @@ The :#{name} git source is deprecated, and will be removed in Bundler 2.0. def parse_line_number_from_description description = self.description - if dsl_path && description =~ /((#{Regexp.quote File.expand_path(dsl_path)}|#{Regexp.quote dsl_path.to_s}):\d+)/ + return [nil, description] unless dsl_path + + quoted_expanded_dsl_path = Regexp.quote(File.expand_path(dsl_path)) + quoted_dsl_path = Regexp.quote(dsl_path.to_s) + + return [nil, description] unless Encoding.compatible?(quoted_dsl_path, description) && Encoding.compatible?(quoted_expanded_dsl_path, description) + + if description =~ /((#{quoted_expanded_dsl_path}|#{quoted_dsl_path}):\d+)/ trace_line = Regexp.last_match[1] description = description.sub(/#{Regexp.quote trace_line}:\s*/, "").sub("\n", " - ") end ``` With that patch, the error would be more clear, pointing at the exact line in the gemspec where the incompatible character was found. However, after that I considered that maybe bundler doesn't need to read the gemspec as a text file, but could read it "binarily". That seemed to fix the problem indeed and made the `bundle install` succeed in this situation. ### Why did you choose this fix out of the possible options? I chose this fix because not only fixes the error message but also seems to make `bundler` just work.
☀️ Test successful - status-travis |
Thanks for the quick review! |
This is awesome, thanks! |
Fix hang when gemspec has incompatible encoding ### What was the end-user problem that led to this PR? The problem was that if a gemspec had a non ascii character on it, and the external encoding is not utf-8, bundler would fail with a very cryptic error. CircleCI is very likely to reproduce this bug since its base images default to `LANG=C`. See [here](https://circleci.com/gh/deivid-rodriguez/pry-byebug/12) for an example. In bundler's master, `bundle install` seems to hang in this same situation. I have no idea why but CPU usage goes crazy (100%) when running something like `LANG=C bundle install` with such a gemspec. Note that adding a utf-8 magic encoding comment seems to fix the problem, but I think `bundler` should just work in this situation, or at least give the user a more helpful error message. And of course never hang either. ### What was your diagnosis of the problem? My diagnosis was that bundler was reading an incompatible character from the gemspec and was not able to deal with it, nor give a useful error message. ### What is your fix for the problem, implemented in this PR? My fix initial fix was to at least give a better error message, preventing the method building the error message itself from blowing up due to incompatible encodings. This is the initial patch that I wrote to improve the error message: ```diff diff --git a/lib/bundler/dsl.rb b/lib/bundler/dsl.rb index 37bfe3674..79ef563d1 100644 --- a/lib/bundler/dsl.rb +++ b/lib/bundler/dsl.rb @@ -581,7 +581,14 @@ The :#{name} git source is deprecated, and will be removed in Bundler 2.0. def parse_line_number_from_description description = self.description - if dsl_path && description =~ /((#{Regexp.quote File.expand_path(dsl_path)}|#{Regexp.quote dsl_path.to_s}):\d+)/ + return [nil, description] unless dsl_path + + quoted_expanded_dsl_path = Regexp.quote(File.expand_path(dsl_path)) + quoted_dsl_path = Regexp.quote(dsl_path.to_s) + + return [nil, description] unless Encoding.compatible?(quoted_dsl_path, description) && Encoding.compatible?(quoted_expanded_dsl_path, description) + + if description =~ /((#{quoted_expanded_dsl_path}|#{quoted_dsl_path}):\d+)/ trace_line = Regexp.last_match[1] description = description.sub(/#{Regexp.quote trace_line}:\s*/, "").sub("\n", " - ") end ``` With that patch, the error would be more clear, pointing at the exact line in the gemspec where the incompatible character was found. However, after that I considered that maybe bundler doesn't need to read the gemspec as a text file, but could read it "binarily". That seemed to fix the problem indeed and made the `bundle install` succeed in this situation. ### Why did you choose this fix out of the possible options? I chose this fix because not only fixes the error message but also seems to make `bundler` just work. (cherry picked from commit 83c23ec)
Fix hang when gemspec has incompatible encoding ### What was the end-user problem that led to this PR? The problem was that if a gemspec had a non ascii character on it, and the external encoding is not utf-8, bundler would fail with a very cryptic error. CircleCI is very likely to reproduce this bug since its base images default to `LANG=C`. See [here](https://circleci.com/gh/deivid-rodriguez/pry-byebug/12) for an example. In bundler's master, `bundle install` seems to hang in this same situation. I have no idea why but CPU usage goes crazy (100%) when running something like `LANG=C bundle install` with such a gemspec. Note that adding a utf-8 magic encoding comment seems to fix the problem, but I think `bundler` should just work in this situation, or at least give the user a more helpful error message. And of course never hang either. ### What was your diagnosis of the problem? My diagnosis was that bundler was reading an incompatible character from the gemspec and was not able to deal with it, nor give a useful error message. ### What is your fix for the problem, implemented in this PR? My fix initial fix was to at least give a better error message, preventing the method building the error message itself from blowing up due to incompatible encodings. This is the initial patch that I wrote to improve the error message: ```diff diff --git a/lib/bundler/dsl.rb b/lib/bundler/dsl.rb index 37bfe3674..79ef563d1 100644 --- a/lib/bundler/dsl.rb +++ b/lib/bundler/dsl.rb @@ -581,7 +581,14 @@ The :#{name} git source is deprecated, and will be removed in Bundler 2.0. def parse_line_number_from_description description = self.description - if dsl_path && description =~ /((#{Regexp.quote File.expand_path(dsl_path)}|#{Regexp.quote dsl_path.to_s}):\d+)/ + return [nil, description] unless dsl_path + + quoted_expanded_dsl_path = Regexp.quote(File.expand_path(dsl_path)) + quoted_dsl_path = Regexp.quote(dsl_path.to_s) + + return [nil, description] unless Encoding.compatible?(quoted_dsl_path, description) && Encoding.compatible?(quoted_expanded_dsl_path, description) + + if description =~ /((#{quoted_expanded_dsl_path}|#{quoted_dsl_path}):\d+)/ trace_line = Regexp.last_match[1] description = description.sub(/#{Regexp.quote trace_line}:\s*/, "").sub("\n", " - ") end ``` With that patch, the error would be more clear, pointing at the exact line in the gemspec where the incompatible character was found. However, after that I considered that maybe bundler doesn't need to read the gemspec as a text file, but could read it "binarily". That seemed to fix the problem indeed and made the `bundle install` succeed in this situation. ### Why did you choose this fix out of the possible options? I chose this fix because not only fixes the error message but also seems to make `bundler` just work. (cherry picked from commit 83c23ec)
…n, r=colby-swandale Respect encodings when reading gemspecs This PR fixes #6598. Thanks @eregon for the help and the explanation that helped me understand the issue :)! ### What was the end-user problem that led to this PR? On gems using UTF-8 symbols defined in other files in their gemspecs, `bundle install` would crash. ### What was your diagnosis of the problem? The problem was that since #6279 gemspecs are read binarily. However, files required from them as read as text. That means that the constants defined on those files and used in the gemspec are interpreted different and thus don't match. ### What is your fix for the problem, implemented in this PR? My fix is to go back to reading gemspec as text again. Explictly passing the encoding when reading them still fixes the problem that the PR introducing the regression was meant to fix. ### Why did you choose this fix out of the possible options? I chose this fix because it fixes the problem, and keeps the situation that the PR introducing the regression fixed also working.
…n, r=colby-swandale Respect encodings when reading gemspecs This PR fixes #6598. Thanks @eregon for the help and the explanation that helped me understand the issue :)! ### What was the end-user problem that led to this PR? On gems using UTF-8 symbols defined in other files in their gemspecs, `bundle install` would crash. ### What was your diagnosis of the problem? The problem was that since #6279 gemspecs are read binarily. However, files required from them as read as text. That means that the constants defined on those files and used in the gemspec are interpreted different and thus don't match. ### What is your fix for the problem, implemented in this PR? My fix is to go back to reading gemspec as text again. Explictly passing the encoding when reading them still fixes the problem that the PR introducing the regression was meant to fix. ### Why did you choose this fix out of the possible options? I chose this fix because it fixes the problem, and keeps the situation that the PR introducing the regression fixed also working. (cherry picked from commit cb18acc)
What was the end-user problem that led to this PR?
The problem was that if a gemspec had a non ascii character on it, and the external encoding is not utf-8, bundler would fail with a very cryptic error. CircleCI is very likely to reproduce this bug since its base images default to
LANG=C
. See here for an example.In bundler's master,
bundle install
seems to hang in this same situation. I have no idea why but CPU usage goes crazy (100%) when running something likeLANG=C bundle install
with such a gemspec.Note that adding a utf-8 magic encoding comment seems to fix the problem, but I think
bundler
should just work in this situation, or at least give the user a more helpful error message. And of course never hang either.What was your diagnosis of the problem?
My diagnosis was that bundler was reading an incompatible character from the gemspec and was not able to deal with it, nor give a useful error message.
What is your fix for the problem, implemented in this PR?
My fix initial fix was to at least give a better error message, preventing the method building the error message itself from blowing up due to incompatible encodings. This is the initial patch that I wrote to improve the error message:
With that patch, the error would be more clear, pointing at the exact line in the gemspec where the incompatible character was found.
However, after that I considered that maybe bundler doesn't need to read the gemspec as a text file, but could read it "binarily". That seemed to fix the problem indeed and made the
bundle install
succeed in this situation.Why did you choose this fix out of the possible options?
I chose this fix because not only fixes the error message but also seems to make
bundler
just work.