Identical Regexp are not equal to each other #4259

Closed
andy-twosticks opened this Issue Nov 1, 2016 · 8 comments

Projects

None yet

4 participants

@andy-twosticks
andy-twosticks commented Nov 1, 2016 edited

Environment

%: uname -a
Linux centos7andy.jhallpr.com 3.10.0-327.18.2.el7.x86_64 #1 SMP Thu May 12 11:03:55 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

%: jruby -v
jruby 9.1.5.0 (2.3.1) 2016-09-07 036ce39 OpenJDK 64-Bit Server VM 25.91-b14 on 1.8.0_91-b14 [linux-x86_64]

Expected Behavior

"Things equal to the same thing are equal to each other" -- Euclid

Regardless of how a Regexp is created or what it contains, identical Regexp should always be equal to each other.

The behaviour below turns up in MRI 1.9.3, too, but appears to be gone by 2.0.0:

%: rvm use ruby-2.0.0 
Using /home/jonea/.rvm/gems/ruby-2.0.0-p648 

%: irb               
2.0.0-p648 :001 > %r|foo/bar| == /foo\/bar/             
 => true                                                

%: rvm use ruby-1.9.3   
Using /home/jonea/.rvm/gems/ruby-1.9.3-p551             

%: irb               
1.9.3-p551 :001 > %r|foo/bar| == /foo\/bar/             
 => false                                               

Actual Behavior

%: jirb
:001 > /foo\/bar/ == Regexp.new("foo/bar")
 => false

:002 > /foo\/bar/ == Regexp.new("foo\/bar")
=> false

:003 > /foo\/bar/ == %r|foo/bar|
=> false

:004 > %r|foo/bar| == Regexp.new("foo\/bar")
=> true

 :005 >  /foobar/ == %r|foobar|
 => true

As you can see, the problem seems to be with Regexp created using the basic slash syntax when the regexp itself also contains a slash.

@headius
Member
headius commented Nov 1, 2016

@andy-twosticks Nice find! This would point to a difference in how the parser constructs Regex versus how the constructor forms construct Regex.

@headius headius added the parser label Nov 1, 2016
@headius
Member
headius commented Nov 1, 2016

@enebo Any idea why this might be? Something odd with the escaping?

@enebo
Member
enebo commented Nov 1, 2016

@headius My only guess would be that bytelist after parsing does not remove the \ and it does it in the constructor of the regexp. That would make the bytelists different perhaps?

@headius
Member
headius commented Nov 1, 2016

It looks like the sources are difference in these two cases.

[] ~/projects/jruby $ jruby -e 'p /foo\/bar/.source'
"foo\\/bar"

[] ~/projects/jruby $ jruby -e 'p %r[foo/bar].source'
"foo/bar"
@headius
Member
headius commented Nov 1, 2016

Ok, so stepping through the parser showed me that it's constructing the // form using foo\/bar in a ByteList and the %r[] form using foo/bar. Both proceed along the same path from there, and the compiled Regexps that result are the same. Only the sources differ.

Seems like either the one needs to be escaped or the other needs to be unescaped before constructing the Regexp.

@headius
Member
headius commented Nov 1, 2016

Ruby 2.3 for comparison:

[] ~/projects/jruby $ ruby23 -e 'p %r[foo/bar].source'
"foo/bar"

[] ~/projects/jruby $ ruby23 -e 'p /foo\/bar/.source'
"foo/bar"
@headius
Member
headius commented Nov 1, 2016

A bit more comparison with MRI:

[] ~/projects/jruby $ ruby23 -rpp -rripper -e 'pp Ripper.tokenize("/foo\/bar/")'
["/", "foo", "/bar", "/"]

[] ~/projects/jruby $ ruby23 -rpp -rripper -e 'pp Ripper.tokenize("%r[foo/bar]")'
["%r[", "foo/bar", "]"]

We produce the same tokens, but when combining those string pieces we seem to be escaping the / again. Ripper isn't exactly the same as the normal parser, but it should be close.

@enebo enebo added this to the JRuby 9.1.6.0 milestone Nov 2, 2016
@enebo enebo closed this in 49bd27d Nov 2, 2016
@enebo
Member
enebo commented Nov 2, 2016

So there was an extra append '' in our StringTerm code which MRI did not have nor did our ripper version of StringTerm. So pretty easy to delete that line. Spec added in commit b0abc53.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment