Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ruby regexp warning: character class has duplicated range #439

Closed
mkllnk opened this issue Jun 2, 2023 · 10 comments
Closed

Ruby regexp warning: character class has duplicated range #439

mkllnk opened this issue Jun 2, 2023 · 10 comments
Assignees

Comments

@mkllnk
Copy link

mkllnk commented Jun 2, 2023

Thank you for your work here. Great job. There's just one little thing that's slowing be down at the moment:

Calling RDF::URI#canonicalize triggers a Ruby warning:

gems/rdf-3.2.10/lib/rdf/model/uri.rb:1: warning: character class has duplicated range: /[^(?:(?-mix:[A-Za-z](?:[A-Za-z0-9+-.])*))]/

I could reduce it to this code which still triggers the warning:

RDF::URI.encode("http", /[^(?:(?-mix:))]/)

I get these messages a lot when running tests, for example. Difficult to see the right output.

@gkellogg
Copy link
Member

gkellogg commented Jun 2, 2023

I’ll take a look at it. That code goes back to early versions of Ruby 2, if not earlier.

@gkellogg
Copy link
Member

gkellogg commented Jun 2, 2023

There is no explicit code that uses the regular expression sequence /[^(?:(?-mix:[A-Za-z](?:[A-Za-z0-9+-.])*))]/. My guess is that it comes from the SCHEME constant:

SCHEME = Regexp.compile("[A-Za-z](?:[A-Za-z0-9+-\.])*").freeze

But, I can't reproduce this on either Ruby 3.2.0 or 2.6.10. Also, your minimal example RDF::URI.encode("http", /[^(?:(?-mix:))]/) doesn't generate any errors or warnings for me.

Can you provide some more details of your environment? Ruby version? If you download the gem, and do a bundle install, it would be worth trying again with bundle exec bin/console to see if you see the problem, and maybe report on bundle show. It could be there is something else in your environment that is changing the behavior.

I develop on Mac OS, but many other platforms are run in GH Actions.

@mkllnk
Copy link
Author

mkllnk commented Jun 5, 2023

Some environments have warnings enabled and others don't. If you run ruby -w then you see the warnings. Here's a script to reproduce:

#!/bin/env -S ruby -w

require "rdf"

RDF::URI.new("http://example.com")

That produces the following output on my machine:

rdf-3.2.9/lib/rdf/model/uri.rb:39: warning: character class has duplicated range: /[-]|[󰀀-󿿽]|[က00-ჿFD]/
rdf-3.2.9/lib/rdf/model/uri.rb:53: warning: character class has duplicated range
rdf-3.2.9/lib/rdf/model/uri.rb:73: warning: character class has duplicated range
rdf-3.2.9/lib/rdf/model/uri.rb:76: warning: character class has duplicated range

Same on Ruby 2.6.10 and ruby 3.0.3.

@gkellogg gkellogg added the bug label Jun 6, 2023
@gkellogg gkellogg self-assigned this Jun 6, 2023
@gkellogg
Copy link
Member

gkellogg commented Jun 7, 2023

Should be fixed on the develop branch now. Please check it out, and I'll push a new release containing the fixes.

Setting $VERBOSE = true in spec_helper.rb revealed a lot of warnings, some of which were masking errors, such as in the URI normalization that you were hitting.

@mkllnk
Copy link
Author

mkllnk commented Jun 7, 2023

Yes, great. The repeated output in my specs is gone. I still have a few messages during boot though:

rdf-fa66bb91ca63/lib/rdf/model/uri.rb:40: warning: character class has duplicated range: /[-]|[󰀀-󿿽]|[က00-ჿFD]/
rdf-fa66bb91ca63/lib/rdf/model/uri.rb:54: warning: character class has duplicated range
rdf-fa66bb91ca63/lib/rdf/model/uri.rb:74: warning: character class has duplicated range
rdf-fa66bb91ca63/lib/rdf/model/uri.rb:77: warning: character class has duplicated range
rdf-fa66bb91ca63/lib/rdf/model/uri.rb:122: warning: character class has duplicated range
rdf-fa66bb91ca63/lib/rdf/model/uri.rb:123: warning: character class has duplicated range
rdf-fa66bb91ca63/lib/rdf/model/uri.rb:124: warning: character class has duplicated range
rdf-fa66bb91ca63/lib/rdf/model/uri.rb:125: warning: character class has duplicated range
rdf-fa66bb91ca63/lib/rdf/model/uri.rb:126: warning: character class has duplicated range
rdf-fa66bb91ca63/lib/rdf/model/uri.rb:127: warning: character class has duplicated range

@gkellogg
Copy link
Member

gkellogg commented Jun 7, 2023

I’m not seeing this in the action run associated with this PR, or on my matchine. Can you give me something to reproduce it?

@mkllnk
Copy link
Author

mkllnk commented Jun 7, 2023

In your rdf project directory:

echo '#!/bin/env -S ruby -w
require "rdf"
RDF::URI.new("http://example.com")
' > test.rb

ruby -Ilib test.rb

@gkellogg gkellogg reopened this Jun 7, 2023
@gkellogg
Copy link
Member

gkellogg commented Jun 7, 2023

Apparently, the $VERBOSE was being set after the class was loaded, so missed in testing.

IRI Normalization makes use of a hidden feature in Ruby Regular expressions, where [^[a-z]|[0-9]] can be interpreted like [^a-z0-9] which becomes important when composing multiple complicated regular expressions. This is used in a gsub to find character sequences that need to be percent-encoded. Changing this to adhere more closely to standard interfaces will take a bit more time.

@gkellogg
Copy link
Member

gkellogg commented Jun 7, 2023

Okay, try it now.

@mkllnk
Copy link
Author

mkllnk commented Jun 8, 2023

Excellent! Solved. Thank you for such a quick response. 🙇

@mkllnk mkllnk closed this as completed Jun 8, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants