regex fails for foreign characters when offset is last character #317

Closed
dwkoogt opened this Issue Sep 26, 2012 · 3 comments


dwkoogt commented Sep 26, 2012

I have a regex
/((mailto:|(news|(ht|f)tp(s?)):\/\/)\S+[\d\w\/])/
and I'm trying to match it against
ゼクシィ茨城栃木群馬版 2012年 10月号 [雑誌] [雑誌の新着商品ブログ] http://bit.ly/Pr2tyl
The match I get is
http://bit.ly/Pr2tyl

When I store the match result in a variable m and read the offsets with
offset, offset1 = m.offset(0)
I get 41 and 61.

Now, if I match again with a start offset of 61, I expect to get nil, but instead I get the original result http://bit.ly/Pr2tyl, as if the match had started from index 0.

This only happens in JRuby. I'm using 1.7 preview 2.
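
The steps above can be put into a small standalone script. This is only a sketch of the reproduction: the names `start`, `finish` and `again` are illustrative and not from the report, and it assumes the file is saved as UTF-8. Note that `MatchData#offset` returns character offsets, so 61 is also the length of the string in characters.

```ruby
# encoding: utf-8
# Reproduction sketch; `start`, `finish` and `again` are illustrative names.
str = "ゼクシィ茨城栃木群馬版 2012年 10月号 [雑誌] [雑誌の新着商品ブログ] http://bit.ly/Pr2tyl"
re  = /((mailto:|(news|(ht|f)tp(s?)):\/\/)\S+[\d\w\/])/

m = re.match(str)
start, finish = m.offset(0)   # character offsets, not byte offsets
p [start, finish]             # [41, 61], as reported

# Search again, starting where the first match ended.
# Here that position is also the end of the string.
again = re.match(str, finish)
p again                       # expected: nil
```

On an affected JRuby build the last line would print the original match instead of nil, which is the behavior described in this report.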

Owner
headius commented Sep 26, 2012

Can you re-test this on 1.7.0.RC1? We fixed several encoding-related issues.

dwkoogt commented Sep 26, 2012

Just tested with 1.7 RC1 and the condition still exists.

Owner

I confirmed the problem with RC1, but it appears to be fixed on the master branch.

$ jruby -S irb
irb(main):001:0> RUBY_DESCRIPTION
=> "jruby 1.7.0.RC2 (1.9.3p203) 2012-10-16 5bc40ab on Java HotSpot(TM) 64-Bit Server VM 1.7.0_07-b10 [darwin-x86_64]"
irb(main):002:0> str = "ゼクシィ茨城栃木群馬版 2012年 10月号 [雑誌] [雑誌の新着商品ブログ] http://bit.ly/Pr2tyl"
=> "ゼクシィ茨城栃木群馬版 2012年 10月号 [雑誌] [雑誌の新着商品ブログ] http://bit.ly/Pr2tyl"
irb(main):003:0> re = /((mailto:|(news|(ht|f)tp(s?)):\/\/)\S+[\d\w\/])/
=> /((mailto:|(news|(ht|f)tp(s?)):\/\/)\S+[\d\w\/])/
irb(main):004:0> m = re.match str
=> #<MatchData "http://bit.ly/Pr2tyl" 1:"http://bit.ly/Pr2tyl" 2:"http://" 3:"http" 4:"ht" 5:"">
irb(main):005:0> m.offset 0
=> [41, 61]
irb(main):006:0> m = re.match str, 61
=> nil
BanzaiMan closed this Oct 17, 2012