Skip to content
This repository

Capybara rack test driver + rack master = encoding issues #243

Closed
slbug opened this Issue January 07, 2011 · 22 comments

9 participants

bUg. Jonas Nicklas jack dempsey John Firebaugh Matt Culpepper Rick DeNatale Josh Schairbaum Mischa Fierer Petter Remen
bUg.

rack escape uses '/u' flag - https://github.com/rack/rack/blob/master/lib/rack/utils.rb#L16
but capybara to_binary method forces utf8 hidden field value to '\xE2\x9C\x93'.

because of this escape method fail with 'incompatible encoding regexp match (UTF-8 regexp with ASCII-8BIT string) (Encoding::CompatibilityError)'

Jonas Nicklas
Owner

Ohh god, not the encoding issues again. Sorry, I know nothing of this stuff. Please provide a patch and I'll take a look, but I'm afraid I can't help you with this.

bUg.

i'm not sure if my patch won't break anything, but it works for me. just add one line in to_binary method.

def to_binary(object)
  return object if Rails.version.to_f >= 3.1
  #rest code
end
Jonas Nicklas
Owner

Hmm, I don't think that's the right way. We want to do something that works universally, and we definitely don't want to code anything about Rails into Capybara.

jack dempsey

Yep, on rails 3.1 here and having same issue fwiw. I'm just bundling against a locally hacked capyabara for the time being. Hope to have a sec to think more about this, but I'm in the same boat as you jonas--just don't know much about this stuff.

Jonas Nicklas
Owner

I've made some headway on this issue. I recently encountered it myself, and tried to figure it all out. Here's how it is now, as best as I can figure:

  1. Capybara gets a UTF-8 string as a param, that string contains a multibyte char (say å)
  2. Capybara forces this string into binary encoding, essentially leaving it unencoded
  3. Capybara passes this string into Rack-Test
  4. Rack-Test tries to escape the string via Rack::Utils.escape
  5. Rack::Utils tries to find all non-alphanumeric characters and URL encode them, it uses a UTF-8 regexp for this
  6. Matching the UTF-8 regexp against the binary string fails, probably because Ruby can't figure out what the hell to do with the unencoded å.

So this presents a pretty tricky problem. The idea up until now has been that, since a browser would not be sending the strings in a specific encoding, and instead it would be up to the framework to figure out how they are encoded, Capybara has stripped off all encodings. However, since rack-test is trying to escape the params it gets confused by a string that is seemingly encoded incorrectly.

I'm not sure that rack-test is doing something wrong here, though it seems that using the UTF-8 regexp is just asking for trouble honestly, what if someone sends in a string with a completely different encoding, surely that would break stuff?

Is Capybara doing the rick thing? Is rack-test? Shit, I have no idea :(

John Firebaugh
Collaborator

Actually, a browser would be sending the strings in a specific encoding -- either the page encoding or that specified by the accept-charset attribute of the form. This is important, because the percent-encoding done for application/x-www-form-urlencoded is sensitive to the encoding of the form data: for each non-alphanumeric character, take the bytes representing that character in the desired encoding, and percent encode each of them.

This implies two things:

  • Rack::Utils.escape must support strings in any encoding. (Here's how.)
  • Capybara should pass along the string in its existing encoding, rather than forcing a binary encoding.

Most of the time, folks will only be interested in testing with ASCII or UTF8 strings. But both Capybara and Rack should support other encodings as well -- that's where the above points matter.

So it looks likes there's bugs on both sides.

Jonas Nicklas
Owner

Agreed that the browser would send them in a specific encoding, but they would not be tagged as such in Ruby, until the server figures out what the proper encoding is, no? That is, the Rack server would have to work out what kind of encoding has been submitted and mark the strings sent in as encoded in that encoding.

If Capybara already marks the strings as UTF-8 encoded, that is not how a browser would behave (they know nothing about Ruby after all, and in the end only send binary data).

Or am I wrong about this? My undestanding of this whole issue is rather vague ;)

John Firebaugh
Collaborator

Here's the spec.

Think of URL-encoding as being a function of text in a particular encoding. Since it operates at the byte level, the output depends on both the text and the encoding. Ruby strings conveniently carry their encoding along with them, so the natural interface for the URL-encoding function in Ruby is to take a single string parameter. But it does need to know the encoding in order for it to do its job correctly; we can't go stripping the encoding off beforehand (which is what forcing a binary encoding is essentially doing).

It's true that the original encoding can't be reconstructed solely from the URL-encoded bytes sent over the wire. The server needs to know what the original encoding was somehow. Typically it uses the charset from the request's content-type header, or it might assume a fixed encoding in situations where that's reasonable (e.g. if it serves all the forms with accept-charset=utf-8 [modulo browser bugs that the Rails "snowman" is meant to solve]).

John Firebaugh
Collaborator

Here's my pull request for Rack: rack/rack#140

Want me to fix the Capybara side too?

Matt Culpepper
mculp commented April 10, 2011

I encountered this issue today and ended up with the same CGI.escape fix as @jfirebaugh down in Rack::Util.

Jonas Nicklas
Owner

@jfirebaugh: that makes perfect sense. I think I understand now. If you'd like to fix the Capybara side of this issue, that'd be fantastic, and maybe we can figure this out once and for all.

Jonas Nicklas
Owner

@jfirebaugh: any progress? if you don't have time to fix this, could you explain what you intended to change? Really want to get this fixed.

John Firebaugh
Collaborator

The capybara side of the fix is simply to remove Capybara::Driver::RackTest#to_binary and all its usages.

However, that depends on the bug being fixed Rack. If it isn't, you could either get an encoding warning (with Rack 1.2.2) or the incompatible encoding exception (with Rack master if you pass non-UTF8 encoded strings). I'm not sure what to do about that, nor what the timeline is for a new Rack release.

Rick DeNatale

What's the status of this. I'm trying to move an app to Rails 3.1 beta and haver run into it.

I can't tell if the current rack has a variant of @jfirebaugh's pull request. I see that the pull request was accepted but the code looks a bit different.

Rick DeNatale

To answer part of my own question.

I monkey-patched Capybara::Driver::RackTest#to_binary to always just return it's argument, but I still am getting the timeout. So, I guess not. Feels like I might have to temporarily back out of upgrading to rails 3.1 for this app.

John Firebaugh
Collaborator

Rack 1.3.0 beta includes a fix (a followup to my initial pull request). Has anyone tested capybara with it yet? (I haven't had a chance.)

Jonas Nicklas jnicklas closed this issue from a commit May 19, 2011
Jonas Nicklas Remove to_binary, closes #243 and #340
After investigating this thoroughly, I have come
to the conclusion that the need for to_binary
came from a bug in rack, which has subsequently
been fixed. If warnings show up after this change
they should disappear with the release of rack 1.3
aeb62d6
Jonas Nicklas jnicklas closed this in aeb62d6 May 19, 2011
Jonas Nicklas
Owner

After going through this once again, it seems that the reason we originally added to_binary was just a hack around the fact that Rack was doing something inherently broken. The issue seems to be fixed in Rack master and everything works fine under the Rack 1.3.0.beta.

I've removed to_binary and this should fix the issue.

So for the time being people will have to upgrade to the latest Rack on 1.9. I thought about some kind of backwards compatible fix, but I just couldn't figure out how to do it :(

Josh Schairbaum

I tried to upgrade my Rails 3.0.9 app to Rack 1.3, but both Rails and Rspec-Rails both claim to require Rack 1.2.6, although I don't see it explicity listed in rspec-rails. Has anyone else run into this?

Jonas Nicklas
Owner

@jschairb: that is the reason we did not put in a dependency on Rack 1.3. You'll probably have to upgrade to Rails 3.1 in order to dim the warnings.

Josh Schairbaum

@jnicklas thanks for the response!

Mischa Fierer

In the meantime, if you can't update to rack 3 b/c you don't want to upgrade to rails 3.1, you can put this in an initializer or in a support file:

module Rack
  module Utils
    def escape(s)
      CGI.escape(s.to_s)
    end
    def unescape(s)
      CGI.unescape(s)
    end
  end
end

<3

Ryan Bigg radar referenced this issue from a commit in reInteractive/capybara May 19, 2011
Jonas Nicklas Remove to_binary, closes #243 and #340
After investigating this thoroughly, I have come
to the conclusion that the need for to_binary
came from a bug in rack, which has subsequently
been fixed. If warnings show up after this change
they should disappear with the release of rack 1.3
6ef4691
Petter Remen

@mischa I had to add a couple of lines to your fix

module Rack
  module Utils
    def escape(s)
      CGI.escape(s.to_s)
    end
    module_function :escape

    def unescape(s)
      CGI.unescape(s)
    end
    module_function :unescape
  end
end
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Something went wrong with that request. Please try again.