New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Identical Regexp are not equal to each other #4259

Closed
andy-twosticks opened this Issue Nov 1, 2016 · 8 comments

Comments

Projects
None yet
4 participants
@andy-twosticks

andy-twosticks commented Nov 1, 2016

Environment

%: uname -a
Linux centos7andy.jhallpr.com 3.10.0-327.18.2.el7.x86_64 #1 SMP Thu May 12 11:03:55 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

%: jruby -v
jruby 9.1.5.0 (2.3.1) 2016-09-07 036ce39 OpenJDK 64-Bit Server VM 25.91-b14 on 1.8.0_91-b14 [linux-x86_64]

Expected Behavior

"Things equal to the same thing are equal to each other" -- Euclid

Regardless of how a Regexp is created or what it contains, identical Regexp should always be equal to each other.

The behaviour below turns up in MRI 1.9.3, too, but appears to be gone by 2.0.0:

%: rvm use ruby-2.0.0 
Using /home/jonea/.rvm/gems/ruby-2.0.0-p648 

%: irb               
2.0.0-p648 :001 > %r|foo/bar| == /foo\/bar/             
 => true                                                

%: rvm use ruby-1.9.3   
Using /home/jonea/.rvm/gems/ruby-1.9.3-p551             

%: irb               
1.9.3-p551 :001 > %r|foo/bar| == /foo\/bar/             
 => false                                               

Actual Behavior

%: jirb
:001 > /foo\/bar/ == Regexp.new("foo/bar")
 => false

:002 > /foo\/bar/ == Regexp.new("foo\/bar")
=> false

:003 > /foo\/bar/ == %r|foo/bar|
=> false

:004 > %r|foo/bar| == Regexp.new("foo\/bar")
=> true

 :005 >  /foobar/ == %r|foobar|
 => true

As you can see, the problem seems to be with Regexp created using the basic slash syntax when the regexp itself also contains a slash.

@headius

This comment has been minimized.

Show comment
Hide comment
@headius

headius Nov 1, 2016

Member

@andy-twosticks Nice find! This would point to a difference in how the parser constructs Regex versus how the constructor forms construct Regex.

Member

headius commented Nov 1, 2016

@andy-twosticks Nice find! This would point to a difference in how the parser constructs Regex versus how the constructor forms construct Regex.

@headius headius added the parser label Nov 1, 2016

@headius

This comment has been minimized.

Show comment
Hide comment
@headius

headius Nov 1, 2016

Member

@enebo Any idea why this might be? Something odd with the escaping?

Member

headius commented Nov 1, 2016

@enebo Any idea why this might be? Something odd with the escaping?

@enebo

This comment has been minimized.

Show comment
Hide comment
@enebo

enebo Nov 1, 2016

Member

@headius My only guess would be that bytelist after parsing does not remove the \ and it does it in the constructor of the regexp. That would make the bytelists different perhaps?

Member

enebo commented Nov 1, 2016

@headius My only guess would be that bytelist after parsing does not remove the \ and it does it in the constructor of the regexp. That would make the bytelists different perhaps?

@headius

This comment has been minimized.

Show comment
Hide comment
@headius

headius Nov 1, 2016

Member

It looks like the sources are difference in these two cases.

[] ~/projects/jruby $ jruby -e 'p /foo\/bar/.source'
"foo\\/bar"

[] ~/projects/jruby $ jruby -e 'p %r[foo/bar].source'
"foo/bar"
Member

headius commented Nov 1, 2016

It looks like the sources are difference in these two cases.

[] ~/projects/jruby $ jruby -e 'p /foo\/bar/.source'
"foo\\/bar"

[] ~/projects/jruby $ jruby -e 'p %r[foo/bar].source'
"foo/bar"
@headius

This comment has been minimized.

Show comment
Hide comment
@headius

headius Nov 1, 2016

Member

Ok, so stepping through the parser showed me that it's constructing the // form using foo\/bar in a ByteList and the %r[] form using foo/bar. Both proceed along the same path from there, and the compiled Regexps that result are the same. Only the sources differ.

Seems like either the one needs to be escaped or the other needs to be unescaped before constructing the Regexp.

Member

headius commented Nov 1, 2016

Ok, so stepping through the parser showed me that it's constructing the // form using foo\/bar in a ByteList and the %r[] form using foo/bar. Both proceed along the same path from there, and the compiled Regexps that result are the same. Only the sources differ.

Seems like either the one needs to be escaped or the other needs to be unescaped before constructing the Regexp.

@headius

This comment has been minimized.

Show comment
Hide comment
@headius

headius Nov 1, 2016

Member

Ruby 2.3 for comparison:

[] ~/projects/jruby $ ruby23 -e 'p %r[foo/bar].source'
"foo/bar"

[] ~/projects/jruby $ ruby23 -e 'p /foo\/bar/.source'
"foo/bar"
Member

headius commented Nov 1, 2016

Ruby 2.3 for comparison:

[] ~/projects/jruby $ ruby23 -e 'p %r[foo/bar].source'
"foo/bar"

[] ~/projects/jruby $ ruby23 -e 'p /foo\/bar/.source'
"foo/bar"
@headius

This comment has been minimized.

Show comment
Hide comment
@headius

headius Nov 1, 2016

Member

A bit more comparison with MRI:

[] ~/projects/jruby $ ruby23 -rpp -rripper -e 'pp Ripper.tokenize("/foo\/bar/")'
["/", "foo", "/bar", "/"]

[] ~/projects/jruby $ ruby23 -rpp -rripper -e 'pp Ripper.tokenize("%r[foo/bar]")'
["%r[", "foo/bar", "]"]

We produce the same tokens, but when combining those string pieces we seem to be escaping the / again. Ripper isn't exactly the same as the normal parser, but it should be close.

Member

headius commented Nov 1, 2016

A bit more comparison with MRI:

[] ~/projects/jruby $ ruby23 -rpp -rripper -e 'pp Ripper.tokenize("/foo\/bar/")'
["/", "foo", "/bar", "/"]

[] ~/projects/jruby $ ruby23 -rpp -rripper -e 'pp Ripper.tokenize("%r[foo/bar]")'
["%r[", "foo/bar", "]"]

We produce the same tokens, but when combining those string pieces we seem to be escaping the / again. Ripper isn't exactly the same as the normal parser, but it should be close.

@enebo enebo added this to the JRuby 9.1.6.0 milestone Nov 2, 2016

@enebo enebo closed this in 49bd27d Nov 2, 2016

@enebo

This comment has been minimized.

Show comment
Hide comment
@enebo

enebo Nov 2, 2016

Member

So there was an extra append '' in our StringTerm code which MRI did not have nor did our ripper version of StringTerm. So pretty easy to delete that line. Spec added in commit b0abc53.

Member

enebo commented Nov 2, 2016

So there was an extra append '' in our StringTerm code which MRI did not have nor did our ripper version of StringTerm. So pretty easy to delete that line. Spec added in commit b0abc53.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment