net-imap 0.4.0 Regexp Error #3287

Closed
nirvdrum opened this issue Oct 6, 2023 · 4 comments · Fixed by #3296

@nirvdrum
Collaborator

nirvdrum commented Oct 6, 2023

net-imap 0.4.0 was released on 2023-10-04 and it fails to load on TruffleRuby 23.1.0. This is particularly a problem because it's a dependency of ActionMailbox, and Rails 7.1.0 was just released. Anyone looking to start a Rails project with TruffleRuby is likely to run into this.

> ruby -v -r 'net/imap' -e 'p :hi'
truffleruby 23.1.0, like ruby 3.2.2, Oracle GraalVM Native [aarch64-darwin]
<internal:core> core/regexp.rb:129:in `union': too short multibyte code string (org.joni.exception.ValueException): /(?-mix:[\x00-\x7f])|(?-mix:[\xC2-\xDF](?-mix:[\x80-\xBF]))|(?-mix:(?-mix:\xE0[\xA0-\xBF](?-mix:[\x80-\xBF]))|(?-mix:\xED[\x80-\x9F](?-mix:[\x80-\xBF]))|(?-mix:[\xE1-\xEC][\x80-\xBF][\x80-\xBF])|(?-mix:[\xEE-\xEF][\x80-\xBF][\x80-\xBF]))|(?-mix:(?-mix:[\xF1-\xF3][\x80-\xBF][\x80-\xBF][\x80-\xBF])|(?-mix:\xF0[\x90-\xBF][\x80-\xBF][\x80-\xBF])|(?-mix:\xF4[\x80-\x8F][\x80-\xBF][\x80-\xBF]))/ (RegexpError)
	from <internal:core> core/regexp.rb:129:in `union'
	from /Users/nirvdrum/.gem/truffleruby/3.2.2/gems/net-imap-0.4.0/lib/net/imap/response_parser.rb:126:in `<module:RFC3629>'
	from /Users/nirvdrum/.gem/truffleruby/3.2.2/gems/net-imap-0.4.0/lib/net/imap/response_parser.rb:115:in `<module:Patterns>'
	from /Users/nirvdrum/.gem/truffleruby/3.2.2/gems/net-imap-0.4.0/lib/net/imap/response_parser.rb:61:in `<class:ResponseParser>'
	from /Users/nirvdrum/.gem/truffleruby/3.2.2/gems/net-imap-0.4.0/lib/net/imap/response_parser.rb:10:in `<class:IMAP>'
	from /Users/nirvdrum/.gem/truffleruby/3.2.2/gems/net-imap-0.4.0/lib/net/imap/response_parser.rb:7:in `<module:Net>'
	from /Users/nirvdrum/.gem/truffleruby/3.2.2/gems/net-imap-0.4.0/lib/net/imap/response_parser.rb:6:in `<top (required)>'
	from <internal:core> core/kernel.rb:297:in `require_relative'
	from /Users/nirvdrum/.gem/truffleruby/3.2.2/gems/net-imap-0.4.0/lib/net/imap.rb:2768:in `<top (required)>'
	from <internal:core> core/kernel.rb:234:in `gem_original_require'
	from <internal:/Users/nirvdrum/.rubies/truffleruby-23.1.0/lib/mri/rubygems/core_ext/kernel_require.rb>:159:in `require'
	from <internal:core> core/unbound_method.rb:18:in `bind_call'
	from <internal:core> core/kernel.rb:272:in `require'
<internal:core> core/kernel.rb:236:in `gem_original_require': cannot load such file -- net/imap (LoadError)
	from <internal:/Users/nirvdrum/.rubies/truffleruby-23.1.0/lib/mri/rubygems/core_ext/kernel_require.rb>:85:in `require'
	from <internal:core> core/unbound_method.rb:18:in `bind_call'
	from <internal:core> core/kernel.rb:272:in `require'

The code in question is related to the new UTF-8 handling:

UTF8_1      = /[\x00-\x7f]/n # aka ASCII 7bit
UTF8_TAIL   = /[\x80-\xBF]/n
UTF8_2      = /[\xC2-\xDF]#{UTF8_TAIL}/n
UTF8_3      = Regexp.union(/\xE0[\xA0-\xBF]#{UTF8_TAIL}/n,
                           /\xED[\x80-\x9F]#{UTF8_TAIL}/n,
                           /[\xE1-\xEC]#{    UTF8_TAIL.source * 2}/n,
                           /[\xEE-\xEF]#{    UTF8_TAIL.source * 2}/n)
UTF8_4      = Regexp.union(/[\xF1-\xF3]#{    UTF8_TAIL.source * 3}/n,
                           /\xF0[\x90-\xBF]#{UTF8_TAIL.source * 2}/n,
                           /\xF4[\x80-\x8F]#{UTF8_TAIL.source * 2}/n)
UTF8_CHAR   = Regexp.union(UTF8_1, UTF8_2, UTF8_3, UTF8_4)

Taken from: https://github.com/ruby/net-imap/blob/v0.4.0/lib/net/imap/response_parser.rb#L116-L126.

Added in net-imap PR #111. Related commit.

@nirvdrum
Collaborator Author

nirvdrum commented Oct 6, 2023

It looks like this was already fixed in Joni. At least JRuby 9.4.3.0 isn't affected. We're using Joni 2.1.44 but the latest is 2.2.1. Even if we wanted to stick with the 2.1.x line, the latest there is 2.1.48.

@nirvdrum
Collaborator Author

nirvdrum commented Oct 6, 2023

Looking at the exception message, we see the following Regexp:

/(?-mix:[\x00-\x7f])|(?-mix:[\xC2-\xDF](?-mix:[\x80-\xBF]))|(?-mix:(?-mix:\xE0[\xA0-\xBF](?-mix:[\x80-\xBF]))|(?-mix:\xED[\x80-\x9F](?-mix:[\x80-\xBF]))|(?-mix:[\xE1-\xEC][\x80-\xBF][\x80-\xBF])|(?-mix:[\xEE-\xEF][\x80-\xBF][\x80-\xBF]))|(?-mix:(?-mix:[\xF1-\xF3][\x80-\xBF][\x80-\xBF][\x80-\xBF])|(?-mix:\xF0[\x90-\xBF][\x80-\xBF][\x80-\xBF])|(?-mix:\xF4[\x80-\x8F][\x80-\xBF][\x80-\xBF]))/

It's possible the exception message is missing context, but that is indeed an invalid regex. It needs to have the /n suffix to make the encoding ASCII-8BIT. It should be:

/(?-mix:[\x00-\x7f])|(?-mix:[\xC2-\xDF](?-mix:[\x80-\xBF]))|(?-mix:(?-mix:\xE0[\xA0-\xBF](?-mix:[\x80-\xBF]))|(?-mix:\xED[\x80-\x9F](?-mix:[\x80-\xBF]))|(?-mix:[\xE1-\xEC][\x80-\xBF][\x80-\xBF])|(?-mix:[\xEE-\xEF][\x80-\xBF][\x80-\xBF]))|(?-mix:(?-mix:[\xF1-\xF3][\x80-\xBF][\x80-\xBF][\x80-\xBF])|(?-mix:\xF0[\x90-\xBF][\x80-\xBF][\x80-\xBF])|(?-mix:\xF4[\x80-\x8F][\x80-\xBF][\x80-\xBF]))/n
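To illustrate why the /n flag matters, here is a minimal sketch using a shortened stand-in for the pattern above. The `SOURCE` constant and the `compile_status` helper are hypothetical names for this example only. It assumes a UTF-8 source file and describes what I'd expect on MRI; `Regexp::NOENCODING` is the `Regexp.new` option corresponding to the /n suffix.

```ruby
# Hypothetical, shortened stand-in for the pattern in the exception message.
# Single quotes keep the backslashes literal, so the regexp engine sees the
# \xNN escapes rather than raw bytes.
SOURCE = '[\xC2-\xDF][\x80-\xBF]'

# Hypothetical helper: returns the compiled regexp's encoding, or the
# exception class if the pattern is rejected.
def compile_status(source, options = 0)
  Regexp.new(source, options).encoding
rescue RegexpError => e
  e.class
end

# Without the binary flag, the escapes are checked against the source
# string's encoding (UTF-8 here) and do not form complete characters.
WITHOUT_N = compile_status(SOURCE)

# With Regexp::NOENCODING (the /n suffix), each escape is a single byte.
WITH_N = compile_status(SOURCE, Regexp::NOENCODING)

puts "without /n: #{WITHOUT_N}"
puts "with /n:    #{WITH_N}"
```

The first call should be rejected with a RegexpError and the second should compile as a binary (ASCII-8BIT) regexp, which matches the idea that the pattern is only valid once the encoding is carried along.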

It looks like our implementation of Regexp.union is losing track of the encoding. Using a subset of the original values that won't raise an exception:

UTF8_1      = /[\x00-\x7f]/n # aka ASCII 7bit
UTF8_TAIL   = /[\x80-\xBF]/n
UTF8_2      = /[\xC2-\xDF]#{UTF8_TAIL}/n
Regexp.union(UTF8_1, UTF8_2).encoding

On MRI, we see the encoding is ASCII-8BIT. On TruffleRuby, we see the encoding is US-ASCII.
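For anyone who wants to compare implementations, a small script along these lines should make the difference visible. It reuses the three patterns from the snippet above; the `UNION` constant and the `encoding_report` helper are names made up for this sketch. Only the union result is something reported in this issue, so the per-pattern encodings are printed without any claim about what they should be.

```ruby
# Patterns copied from the reduced example above.
UTF8_1    = /[\x00-\x7f]/n # aka ASCII 7bit
UTF8_TAIL = /[\x80-\xBF]/n
UTF8_2    = /[\xC2-\xDF]#{UTF8_TAIL}/n

# Hypothetical name for the combined pattern.
UNION = Regexp.union(UTF8_1, UTF8_2)

# Hypothetical helper: collects the encoding of each input and of the union
# so the output from two Ruby implementations can be diffed.
def encoding_report
  {
    utf8_1: UTF8_1.encoding,
    utf8_2: UTF8_2.encoding,
    union:  UNION.encoding
  }
end

encoding_report.each { |name, enc| puts "#{name}: #{enc}" }

# A two-byte UTF-8 sequence (U+00E9) as raw bytes; the union should match it
# when the binary encoding is preserved.
puts UNION.match?("\xC3\xA9".b)
```

Running it with the same `ruby -v` invocation style as in the original report on both MRI and TruffleRuby shows whether the `union` line differs.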

@eregon
Member

eregon commented Nov 13, 2023

Fixed by #3296

@eregon
Member

eregon commented Dec 4, 2023

This fix will be in the 23.1.2 release, which will be available on January 23, 2024.
