
regex fails for foreign characters when offset is last character #317

dwkoogt opened this Issue · 3 comments



I have a regex
and trying to match that to
ゼクシィ茨城栃木群馬版 2012年 10月号 [雑誌] [雑誌の新着商品ブログ]
I get

When I assign the match result to a variable m and read the offsets with
offset, offset1 = m.offset(0)
I get 41 and 61.

Now, if I match again with a start offset of 61, I expect to get nil, but instead I get the original result, as if the match had started from index 0.

This only happens in JRuby; I'm using 1.7 preview 2.
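
A minimal sketch of the reproduction steps described above. The regex is the one used in the irb session later in this thread. The sample string and its `example.com` URL are hypothetical stand-ins (not the reporter's data), chosen so that the match ends on the last character of a string containing multibyte characters.

```ruby
# encoding: utf-8

# Regex taken from the irb session in this thread.
re = /((mailto:|(news|(ht|f)tp(s?)):\/\/)\S+[\d\w\/])/

# Hypothetical input: multibyte text followed by a URL that ends the string.
str = "ゼクシィ 2012年 10月号 http://example.com/a"

# First match: find the URL and read its character offsets.
m = re.match(str)
start_offset, end_offset = m.offset(0)

# Second match: start searching at the end offset of the first match.
# Nothing is left to match there, so nil is the expected result.
again = re.match(str, end_offset)

puts "matched:  #{m[0]}"
puts "offsets:  #{start_offset}, #{end_offset}"
puts "again:    #{again.inspect}"
```

On an unaffected interpreter the last line should print `again:    nil`. The behavior reported here would instead show the original match again.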


Can you re-test this on 1.7.0.RC1? We fixed several encoding-related issues.


Just tested with 1.7 RC1 and the condition still exists.


I confirmed the problem with RC1, but it appears to be fixed on the master branch.

$ jruby -S irb            
irb(main):001:0> RUBY_DESCRIPTION
=> "jruby 1.7.0.RC2 (1.9.3p203) 2012-10-16 5bc40ab on Java HotSpot(TM) 64-Bit Server VM 1.7.0_07-b10 [darwin-x86_64]"
irb(main):002:0> str = "ゼクシィ茨城栃木群馬版 2012年 10月号 [雑誌] [雑誌の新着商品ブログ]"
=> "ゼクシィ茨城栃木群馬版 2012年 10月号 [雑誌] [雑誌の新着商品ブログ]"
irb(main):003:0> re = /((mailto:|(news|(ht|f)tp(s?)):\/\/)\S+[\d\w\/])/
=> /((mailto:|(news|(ht|f)tp(s?)):\/\/)\S+[\d\w\/])/
irb(main):004:0> m = re.match str
=> #<MatchData "" 1:"" 2:"http://" 3:"http" 4:"ht" 5:"">
irb(main):005:0> m.offset 0
=> [41, 61]
irb(main):006:0> m = re.match str, 61
=> nil
@BanzaiMan closed this