warning: regexp match /.../n against to UTF-8 string #87

Closed
mhfs opened this Issue June 04, 2010 · 25 comments

10 participants

Marcelo Silveira Jonas Nicklas Rodrigo Rosenfeld Rosas Christos Trochalakis Cezary Baginski jwilsonsprings Dan Carper Lee Byrd Jeroen Houben Mischa Fierer
Marcelo Silveira
mhfs commented June 04, 2010

Hey there,

In one of my specs I'm getting several warnings because of this line:

click 'Enviar instruções de redefinição de senha'

gems/rack-1.1.0/lib/rack/utils.rb:15: warning: regexp match /.../n against to UTF-8 string
gems/rack-1.1.0/lib/rack/utils.rb:15: warning: regexp match /.../n against to UTF-8 string
gems/rack-1.1.0/lib/rack/utils.rb:15: warning: regexp match /.../n against to UTF-8 string
gems/rack-1.1.0/lib/rack/utils.rb:15: warning: regexp match /.../n against to UTF-8 string

In case it helps, here is my Gemfile content https://gist.github.com/c0e8f04433cf1affcf84

Thanks,
Marcelo

Marcelo Silveira
mhfs commented June 04, 2010

Oops ... forgot to mention I'm using ruby-1.9.2-preview3.

Jonas Nicklas
Owner

sounds like this is coming from inside rack, any evidence this is a Capybara problem?

Rodrigo Rosenfeld Rosas

I'm also using Capybara and I'm having exactly the same problem, but I can't say for sure that it is a Capybara problem. I don't know what the cause is... I'm also getting these warnings:

Non US-ASCII detected and no charset defined.
Defaulting to UTF-8, set your own if this is incorrect.

These are probably related to mail delivery...

Jonas Nicklas
Owner

God I hate Ruby 1.9 charset issues :(

Rodrigo Rosenfeld Rosas

:)

Jonas Nicklas
Owner

I find it very unlikely that this is a Capybara issue. Closing this. If someone can prove to me otherwise, please reopen!

Christos Trochalakis

Yes, Rack::Utils.escape uses an ASCII regular expression on Unicode data; notice the '/n' flag:

def escape(s)
  s.to_s.gsub(/([^ a-zA-Z0-9_.-]+)/n) {
    '%'+$1.unpack('H2'*bytesize($1)).join('%').upcase
  }.tr(' ', '+')
end

But I have no idea where this gets called :|
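
To illustrate why the '/n' flag matters, here is a minimal sketch of the same escaping logic applied to a binary copy of the input, so the byte-oriented regexp never sees a UTF-8 string. `escape_bytes` is a hypothetical name used only for this example, not part of Rack, and it calls `$1.bytesize` directly where Rack uses its own `bytesize` helper.

```ruby
# Hypothetical helper for illustration only; not Rack's implementation.
def escape_bytes(s)
  # Relabel a copy of the string as binary; the bytes are untouched.
  bytes = s.to_s.dup.force_encoding(Encoding::ASCII_8BIT)
  # Same pattern as Rack 1.1's escape: percent-encode each run of
  # bytes outside the safe set, then turn spaces into '+'.
  bytes.gsub(/([^ a-zA-Z0-9_.-]+)/n) {
    '%' + $1.unpack('H2' * $1.bytesize).join('%').upcase
  }.tr(' ', '+')
end

escaped = escape_bytes("senha \u00E7")
puts escaped  # prints "senha+%C3%A7"
```

Because the input is already labelled binary when the regexp runs, the regexp and the string agree on encoding and there is nothing for Ruby to warn about.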

Cezary Baginski
e2 commented August 16, 2010

It's because of the _snowman ...

Cezary Baginski
e2 commented August 16, 2010

Here is an extremely bad patch for capybara/driver/rack_test_driver.rb:

  def process(method, path, attributes = {})
    return if path.gsub(/^#{request_path}/, '') =~ /^#/

    if Kernel.const_defined?(:Encoding)
      att = {} 
      attributes.each_pair do |k,v|
        key = k.dup.force_encoding(Encoding::ASCII_8BIT)
        att[key] = v.dup.force_encoding(Encoding::ASCII_8BIT)
      end
      path = path.dup.force_encoding(Encoding::ASCII_8BIT)
    else
      att = attributes
    end

    send(method, path, att, env)
    follow_redirects!
  end

The same goes for the submit method.
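
For illustration, the casting step in the patch above can be pulled out into a standalone helper. This is only a sketch: `binary_params` is a hypothetical name, not a Capybara method, and it assumes a flat hash whose keys and values respond to `to_s` (nested params are not handled).

```ruby
# Hypothetical helper for illustration only; not part of Capybara.
# Returns a new hash whose keys and values are binary-labelled copies.
def binary_params(attributes)
  # Ruby 1.8 has no Encoding constant, so leave the hash alone there.
  return attributes unless defined?(Encoding)

  attributes.each_with_object({}) do |(key, value), result|
    # dup first so the caller's strings keep their original encoding.
    binary_key   = key.to_s.dup.force_encoding(Encoding::ASCII_8BIT)
    binary_value = value.to_s.dup.force_encoding(Encoding::ASCII_8BIT)
    result[binary_key] = binary_value
  end
end

original = { "q" => "senha \u00E7" }
params   = binary_params(original)
```

The original hash is left untouched, so the test code still sees UTF-8 strings while the request receives unencoded bytes.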

Jonas Nicklas
Owner

I don't know much about encodings, but that seems completely wrong to me. Unless the attributes are actually in ASCII_8BIT, then force encoding them there seems like a bad idea, and if they're not ASCII, then why cast them? Wouldn't that just lead to encoding errors? I think in effect we are removing the encoding information from the Strings for no gain. The problem lies deeper than this. I'm not sure how to fix it, but I'm pretty sure this isn't it.

Cezary Baginski
e2 commented August 17, 2010

ASCII_8BIT is an alias for binary and 'force_encoding' does exactly what you mentioned: it removes the encoding. You cannot "encode" to binary, and as for speed, this doesn't have overhead, apart from looking very ugly. The fact is that frameworks such as cucumber don't guarantee binary strings, and they don't seem to belong to rack.

Why Capybara? Because rack works on binary data and is optimized for real world applications, where parameters are binary, because you cannot effectively make assumptions about the encoding (what about UTF-16? Shift-JIS? - they will crash on the /u parameter). Capybara has a driver that passes utf-8 strings that occur only because the parameters are crafted in the cucumber tests and never cast to binary.

In short, the warning is there because making things assume UTF-8 is fundamentally wrong and suggests a deeper problem, as you have described.

Summary: In my opinion Capybara should simulate the real world by passing unencoded strings (ASCII_8BIT, as Ruby calls it), instead of getting Rack to work around acceptance tests that provide non-binary strings and to grow encoding-handling functionality as a result.

Also, note that US_ASCII is the only exception that is compatible with ASCII_8BIT ('binary' in normal people's terms), and Ruby handles the automatic conversion as a special case. That's why it works without a warning until UTF-8 characters are used (_snowman in this case).
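
A small sketch of the two points above, using only core String and Encoding methods: force_encoding relabels the string without touching its bytes, and ASCII-only strings stay compatible with binary ones while non-ASCII strings do not. The variable names are just for this example.

```ruby
utf8   = "snowman \u2603"
binary = utf8.dup.force_encoding(Encoding::ASCII_8BIT)

# The bytes are identical; only the encoding label differs.
same_bytes = binary.bytes == utf8.bytes
# Encoding::BINARY is an alias for Encoding::ASCII_8BIT.
same_label = binary.encoding == Encoding::BINARY

char_count = utf8.length    # counts characters: 9
byte_count = binary.length  # counts bytes: 11

# ASCII-only content can be mixed freely across the two encodings...
ascii_binary = "abc".dup.force_encoding(Encoding::ASCII_8BIT)
ascii_ok = !Encoding.compatible?("abc", ascii_binary).nil?
# ...but a UTF-8 string with a snowman and its binary twin cannot.
mixed_ok = !Encoding.compatible?(utf8, binary).nil?
```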

I deeply recommend the following: http://yehudakatz.com/2010/05/17/encodings-unabridged/

You can read through ruby-core (especially Yui Naruse's comments) on why ruby works this way and there is no "binary" encoding.

Jonas Nicklas
Owner

That was a case well made, I'm actually convinced. Can you provide a patch with tests? That'd be absolutely awesome! This whole encoding thing is giving me headaches (and yes, I've read through Yehuda's article, and a number of others), so I feel ill equipped to tackle this.

Cezary Baginski
e2 commented August 19, 2010

Rushed patch with tests here:

http://github.com/e2/capybara/commit/c8a55c4011c2f91943ee9aeebe3b81a4420c3ecf

I don't like the solution and you can completely redo it without crediting me at all - and I will be completely happy. Just let me know once you have patched it so I can nuke my fork...

Thanks :)

P.S. Sorry about "bonus" indenting - I just didn't want to rebase+squash AGAIN...

Jonas Nicklas
Owner

Good job e2! I actually thought that solution was pretty decent. I've merged it in, if someone wants to improve on it, they're welcome, but this looks good for now.

jwilsonsprings

Hmm, I upgraded to this (on Ruby 1.9.2, Rails 3.0.0). The warning is gone, but things are worse off. My tests now fail with:

undefined local variable or method `node' for # (NameError)
./features/step_definitions/web_steps.rb:35:in `block (2 levels) in '
./features/step_definitions/web_steps.rb:14:in `with_scope'
./features/step_definitions/web_steps.rb:34:in `/^(?:|I )follow "([^\"]*)"(?: within "([^\"]*)")?$/'

When I try to use Spork, I get this:

/home/jwilson/.rvm/rubies/ruby-1.9.2-p0/lib/ruby/1.9.1/drb/drb.rb:573:in `load': too large packet 67654656 (DRb::DRbConnError)
    from /home/jwilson/.rvm/rubies/ruby-1.9.2-p0/lib/ruby/1.9.1/drb/drb.rb:632:in `recv_reply'
    from /home/jwilson/.rvm/rubies/ruby-1.9.2-p0/lib/ruby/1.9.1/drb/drb.rb:918:in `recv_reply'
    from /home/jwilson/.rvm/rubies/ruby-1.9.2-p0/lib/ruby/1.9.1/drb/drb.rb:1197:in `send_message'
    from /home/jwilson/.rvm/rubies/ruby-1.9.2-p0/lib/ruby/1.9.1/drb/drb.rb:1088:in `block (2 levels) in method_missing'
    from /home/jwilson/.rvm/rubies/ruby-1.9.2-p0/lib/ruby/1.9.1/drb/drb.rb:1172:in `open'
    from /home/jwilson/.rvm/rubies/ruby-1.9.2-p0/lib/ruby/1.9.1/drb/drb.rb:1087:in `block in method_missing'
    from /home/jwilson/.rvm/rubies/ruby-1.9.2-p0/lib/ruby/1.9.1/drb/drb.rb:1105:in `with_friend'
Exception encountered: #
backtrace:

jwilsonsprings

To be fair, I just pulled edge from GitHub, so it's possible this is something else... Falling back to the released 0.3.9.

Jonas Nicklas
Owner

You shouldn't be getting that error on the latest master. It's a problem where cucumber monkey-patches a Capybara method which no longer exists. There's a line in env.rb along the lines of:

 require 'cucumber/capybara-javascript-emulation'

or something to that effect. Remove it and the error will go away.

jwilsonsprings

Yep, that did it! Thanks.

Dan Carper

Deleting this makes clicking normal links work, but for me at least, links with onclick handlers (generated from link_to ... :method => :post) then fail.

Any ideas?

Lee Byrd

Having the same problem. Any resolution?

Jeroen Houben

whatever happened to this issue?

Jeroen Houben

OK seems to be continued here: #243

Jonas Nicklas
Owner

tl;dr: upgrade to Rack 1.3.0

Mischa Fierer

tl;dr 2:

A lot of things don't support Rack 1.3.0 yet (formtastic, rails 3.0.9, sendgrid-rails, etc). For now, you can put this in an initializer or support file:

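require 'cgi'

module Rack
  module Utils
    # Delegate to CGI's escaping instead of Rack's /n regexp.
    def escape(s)
      CGI.escape(s.to_s)
    end
    def unescape(s)
      CGI.unescape(s)
    end
    # Rack declares these with module_function, so Rack::Utils.escape
    # keeps its old body unless the module-level copies are refreshed.
    module_function :escape, :unescape
  end
end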

This issue was closed.