Skip to content
This repository has been archived by the owner on Apr 14, 2021. It is now read-only.

Fix hang when gemspec has incompatible encoding #6279

Merged
merged 2 commits into from
Feb 1, 2018

Conversation

deivid-rodriguez
Copy link
Member

What was the end-user problem that led to this PR?

The problem was that if a gemspec had a non ascii character on it, and the external encoding is not utf-8, bundler would fail with a very cryptic error. CircleCI is very likely to reproduce this bug since its base images default to LANG=C. See here for an example.

In bundler's master, bundle install seems to hang in this same situation. I have no idea why but CPU usage goes crazy (100%) when running something like LANG=C bundle install with such a gemspec.

Note that adding a utf-8 magic encoding comment seems to fix the problem, but I think bundler should just work in this situation, or at least give the user a more helpful error message. And of course never hang either.

What was your diagnosis of the problem?

My diagnosis was that bundler was reading an incompatible character from the gemspec and was not able to deal with it, nor give a useful error message.

What is your fix for the problem, implemented in this PR?

My fix initial fix was to at least give a better error message, preventing the method building the error message itself from blowing up due to incompatible encodings. This is the initial patch that I wrote to improve the error message:

diff --git a/lib/bundler/dsl.rb b/lib/bundler/dsl.rb
index 37bfe3674..79ef563d1 100644
--- a/lib/bundler/dsl.rb
+++ b/lib/bundler/dsl.rb
@@ -581,7 +581,14 @@ The :#{name} git source is deprecated, and will be removed in Bundler 2.0.
 
       def parse_line_number_from_description
         description = self.description
-        if dsl_path && description =~ /((#{Regexp.quote File.expand_path(dsl_path)}|#{Regexp.quote dsl_path.to_s}):\d+)/
+        return [nil, description] unless dsl_path
+
+        quoted_expanded_dsl_path = Regexp.quote(File.expand_path(dsl_path))
+        quoted_dsl_path = Regexp.quote(dsl_path.to_s)
+
+        return [nil, description] unless Encoding.compatible?(quoted_dsl_path, description) && Encoding.compatible?(quoted_expanded_dsl_path, description)
+
+        if description =~ /((#{quoted_expanded_dsl_path}|#{quoted_dsl_path}):\d+)/
           trace_line = Regexp.last_match[1]
           description = description.sub(/#{Regexp.quote trace_line}:\s*/, "").sub("\n", " - ")
         end

With that patch, the error would be more clear, pointing at the exact line in the gemspec where the incompatible character was found.

However, after that I considered that maybe bundler doesn't need to read the gemspec as a text file, but could read it "binarily". That seemed to fix the problem indeed and made the bundle install succeed in this situation.

Why did you choose this fix out of the possible options?

I chose this fix because not only fixes the error message but also seems to make bundler just work.

@ghost
Copy link

ghost commented Jan 31, 2018

Thanks for opening a pull request and helping make Bundler better! Someone from the Bundler team will take a look at your pull request shortly and leave any feedback. Please make sure that your pull request has tests for any changes or added functionality.

We use Travis CI to test and make sure your change works functionally and uses acceptable conventions, you can review the current progress of Travis CI in the PR status window below.

If you have any questions or concerns that you wish to ask, feel free to leave a comment in this PR or join our #bundler channel on Slack.

For more information about contributing to the Bundler project feel free to review our CONTRIBUTING guide

@deivid-rodriguez
Copy link
Member Author

I run the specs on my fork and things seem to be ok, except that binread is not available on 1.8.7. I was very surprised that you still run specs against 1.8.7 :O. If you're ok with the patch, let me know how you would like to deal with that.

@deivid-rodriguez
Copy link
Member Author

deivid-rodriguez commented Jan 31, 2018

By the way, I'm now seeing #6254 that seems the best way to go in terms of making this error message more clear. 👍. But as I said, things seem to just hang on master unless I apply this patch...

@hsbt
Copy link
Member

hsbt commented Jan 31, 2018

I was very surprised that you still run specs against 1.8.7

You can replace binread to File.open('/path/to/file', 'rb'){|f| f.read }

@deivid-rodriguez
Copy link
Member Author

Thanks for the tip, @hsbt! I found a read_file method in bundler.rb that does exactly what you suggest so I used it and now all specs pass!

@indirect
Copy link
Member

indirect commented Feb 1, 2018

@deivid-rodriguez good catch, thanks!

@bundlerbot r+

@bundlerbot
Copy link
Collaborator

📌 Commit be337de has been approved by indirect

@bundlerbot
Copy link
Collaborator

⌛ Testing commit be337de with merge 83c23ec...

bundlerbot added a commit that referenced this pull request Feb 1, 2018
Fix hang when gemspec has incompatible encoding

### What was the end-user problem that led to this PR?

The problem was that if a gemspec had a non ascii character on it, and the external encoding is not utf-8, bundler would fail with a very cryptic error. CircleCI is very likely to reproduce this bug since its base images default to `LANG=C`. See [here](https://circleci.com/gh/deivid-rodriguez/pry-byebug/12) for an example.

In bundler's master, `bundle install` seems to hang in this same situation. I have no idea why but CPU usage goes crazy (100%) when running something like `LANG=C bundle install` with such a gemspec.

Note that adding a utf-8 magic encoding comment seems to fix the problem, but I think `bundler` should just work in this situation, or at least give the user a more helpful error message. And of course never hang either.

### What was your diagnosis of the problem?

My diagnosis was that bundler was reading an incompatible character from the gemspec and was not able to deal with it, nor give a useful error message.

### What is your fix for the problem, implemented in this PR?

My fix initial fix was to at least give a better error message, preventing the method building the error message itself from blowing up due to incompatible encodings. This is the initial patch that I wrote to improve the error message:

```diff
diff --git a/lib/bundler/dsl.rb b/lib/bundler/dsl.rb
index 37bfe3674..79ef563d1 100644
--- a/lib/bundler/dsl.rb
+++ b/lib/bundler/dsl.rb
@@ -581,7 +581,14 @@ The :#{name} git source is deprecated, and will be removed in Bundler 2.0.

       def parse_line_number_from_description
         description = self.description
-        if dsl_path && description =~ /((#{Regexp.quote File.expand_path(dsl_path)}|#{Regexp.quote dsl_path.to_s}):\d+)/
+        return [nil, description] unless dsl_path
+
+        quoted_expanded_dsl_path = Regexp.quote(File.expand_path(dsl_path))
+        quoted_dsl_path = Regexp.quote(dsl_path.to_s)
+
+        return [nil, description] unless Encoding.compatible?(quoted_dsl_path, description) && Encoding.compatible?(quoted_expanded_dsl_path, description)
+
+        if description =~ /((#{quoted_expanded_dsl_path}|#{quoted_dsl_path}):\d+)/
           trace_line = Regexp.last_match[1]
           description = description.sub(/#{Regexp.quote trace_line}:\s*/, "").sub("\n", " - ")
         end
```

With that patch, the error would be more clear, pointing at the exact line in the gemspec where the incompatible character was found.

However, after that I considered that maybe bundler doesn't need to read the gemspec as a text file, but could read it "binarily". That seemed to fix the problem indeed and made the `bundle install` succeed in this situation.

### Why did you choose this fix out of the possible options?

I chose this fix because not only fixes the error message but also seems to make `bundler` just work.
@bundlerbot
Copy link
Collaborator

☀️ Test successful - status-travis
Approved by: indirect
Pushing 83c23ec to master...

@bundlerbot bundlerbot merged commit be337de into rubygems:master Feb 1, 2018
@colby-swandale colby-swandale added this to the 1.16.2 milestone Feb 2, 2018
@deivid-rodriguez deivid-rodriguez deleted the fix/encoding_bug branch February 2, 2018 00:24
@deivid-rodriguez
Copy link
Member Author

Thanks for the quick review!

@segiddins
Copy link
Member

This is awesome, thanks!

colby-swandale pushed a commit that referenced this pull request Apr 11, 2018
Fix hang when gemspec has incompatible encoding

### What was the end-user problem that led to this PR?

The problem was that if a gemspec had a non ascii character on it, and the external encoding is not utf-8, bundler would fail with a very cryptic error. CircleCI is very likely to reproduce this bug since its base images default to `LANG=C`. See [here](https://circleci.com/gh/deivid-rodriguez/pry-byebug/12) for an example.

In bundler's master, `bundle install` seems to hang in this same situation. I have no idea why but CPU usage goes crazy (100%) when running something like `LANG=C bundle install` with such a gemspec.

Note that adding a utf-8 magic encoding comment seems to fix the problem, but I think `bundler` should just work in this situation, or at least give the user a more helpful error message. And of course never hang either.

### What was your diagnosis of the problem?

My diagnosis was that bundler was reading an incompatible character from the gemspec and was not able to deal with it, nor give a useful error message.

### What is your fix for the problem, implemented in this PR?

My fix initial fix was to at least give a better error message, preventing the method building the error message itself from blowing up due to incompatible encodings. This is the initial patch that I wrote to improve the error message:

```diff
diff --git a/lib/bundler/dsl.rb b/lib/bundler/dsl.rb
index 37bfe3674..79ef563d1 100644
--- a/lib/bundler/dsl.rb
+++ b/lib/bundler/dsl.rb
@@ -581,7 +581,14 @@ The :#{name} git source is deprecated, and will be removed in Bundler 2.0.

       def parse_line_number_from_description
         description = self.description
-        if dsl_path && description =~ /((#{Regexp.quote File.expand_path(dsl_path)}|#{Regexp.quote dsl_path.to_s}):\d+)/
+        return [nil, description] unless dsl_path
+
+        quoted_expanded_dsl_path = Regexp.quote(File.expand_path(dsl_path))
+        quoted_dsl_path = Regexp.quote(dsl_path.to_s)
+
+        return [nil, description] unless Encoding.compatible?(quoted_dsl_path, description) && Encoding.compatible?(quoted_expanded_dsl_path, description)
+
+        if description =~ /((#{quoted_expanded_dsl_path}|#{quoted_dsl_path}):\d+)/
           trace_line = Regexp.last_match[1]
           description = description.sub(/#{Regexp.quote trace_line}:\s*/, "").sub("\n", " - ")
         end
```

With that patch, the error would be more clear, pointing at the exact line in the gemspec where the incompatible character was found.

However, after that I considered that maybe bundler doesn't need to read the gemspec as a text file, but could read it "binarily". That seemed to fix the problem indeed and made the `bundle install` succeed in this situation.

### Why did you choose this fix out of the possible options?

I chose this fix because not only fixes the error message but also seems to make `bundler` just work.

(cherry picked from commit 83c23ec)
colby-swandale pushed a commit that referenced this pull request Apr 20, 2018
Fix hang when gemspec has incompatible encoding

### What was the end-user problem that led to this PR?

The problem was that if a gemspec had a non ascii character on it, and the external encoding is not utf-8, bundler would fail with a very cryptic error. CircleCI is very likely to reproduce this bug since its base images default to `LANG=C`. See [here](https://circleci.com/gh/deivid-rodriguez/pry-byebug/12) for an example.

In bundler's master, `bundle install` seems to hang in this same situation. I have no idea why but CPU usage goes crazy (100%) when running something like `LANG=C bundle install` with such a gemspec.

Note that adding a utf-8 magic encoding comment seems to fix the problem, but I think `bundler` should just work in this situation, or at least give the user a more helpful error message. And of course never hang either.

### What was your diagnosis of the problem?

My diagnosis was that bundler was reading an incompatible character from the gemspec and was not able to deal with it, nor give a useful error message.

### What is your fix for the problem, implemented in this PR?

My fix initial fix was to at least give a better error message, preventing the method building the error message itself from blowing up due to incompatible encodings. This is the initial patch that I wrote to improve the error message:

```diff
diff --git a/lib/bundler/dsl.rb b/lib/bundler/dsl.rb
index 37bfe3674..79ef563d1 100644
--- a/lib/bundler/dsl.rb
+++ b/lib/bundler/dsl.rb
@@ -581,7 +581,14 @@ The :#{name} git source is deprecated, and will be removed in Bundler 2.0.

       def parse_line_number_from_description
         description = self.description
-        if dsl_path && description =~ /((#{Regexp.quote File.expand_path(dsl_path)}|#{Regexp.quote dsl_path.to_s}):\d+)/
+        return [nil, description] unless dsl_path
+
+        quoted_expanded_dsl_path = Regexp.quote(File.expand_path(dsl_path))
+        quoted_dsl_path = Regexp.quote(dsl_path.to_s)
+
+        return [nil, description] unless Encoding.compatible?(quoted_dsl_path, description) && Encoding.compatible?(quoted_expanded_dsl_path, description)
+
+        if description =~ /((#{quoted_expanded_dsl_path}|#{quoted_dsl_path}):\d+)/
           trace_line = Regexp.last_match[1]
           description = description.sub(/#{Regexp.quote trace_line}:\s*/, "").sub("\n", " - ")
         end
```

With that patch, the error would be more clear, pointing at the exact line in the gemspec where the incompatible character was found.

However, after that I considered that maybe bundler doesn't need to read the gemspec as a text file, but could read it "binarily". That seemed to fix the problem indeed and made the `bundle install` succeed in this situation.

### Why did you choose this fix out of the possible options?

I chose this fix because not only fixes the error message but also seems to make `bundler` just work.

(cherry picked from commit 83c23ec)
bundlerbot added a commit that referenced this pull request Jun 26, 2018
…n, r=colby-swandale

Respect encodings when reading gemspecs

This PR fixes #6598. Thanks @eregon for the help and the explanation that helped me understand the issue :)!

### What was the end-user problem that led to this PR?

On gems using UTF-8 symbols defined in other files in their gemspecs, `bundle install` would crash.

### What was your diagnosis of the problem?

The problem was that since #6279 gemspecs are read binarily. However, files required from them as read as text. That means that the constants defined on those files and used in the gemspec are interpreted different and thus don't match.

### What is your fix for the problem, implemented in this PR?

My fix is to go back to reading gemspec as text again. Explictly passing the encoding when reading them still fixes the problem that the PR introducing the regression was meant to fix.

### Why did you choose this fix out of the possible options?

I chose this fix because it fixes the problem, and keeps the situation that the PR introducing the regression fixed also working.
colby-swandale pushed a commit that referenced this pull request Jul 10, 2018
…n, r=colby-swandale

Respect encodings when reading gemspecs

This PR fixes #6598. Thanks @eregon for the help and the explanation that helped me understand the issue :)!

### What was the end-user problem that led to this PR?

On gems using UTF-8 symbols defined in other files in their gemspecs, `bundle install` would crash.

### What was your diagnosis of the problem?

The problem was that since #6279 gemspecs are read binarily. However, files required from them as read as text. That means that the constants defined on those files and used in the gemspec are interpreted different and thus don't match.

### What is your fix for the problem, implemented in this PR?

My fix is to go back to reading gemspec as text again. Explictly passing the encoding when reading them still fixes the problem that the PR introducing the regression was meant to fix.

### Why did you choose this fix out of the possible options?

I chose this fix because it fixes the problem, and keeps the situation that the PR introducing the regression fixed also working.

(cherry picked from commit cb18acc)
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants