/(?i)\u0149\u0149/ =~ "\u0149\u0149" doesn't match #40

Closed
k-takata opened this Issue Jul 29, 2014 · 3 comments

Owner

k-takata commented Jul 29, 2014

How to reproduce:

$ LD_LIBRARY_PATH=.libs python

>>> from testpy import *
>>> set_encoding('UTF-8')
>>> set_output_encoding('UTF-8')
>>> x2(u"(?i)\u0149\u0149", u"\u0149\u0149", 0, 2)
FAIL: /(?i)ʼnʼn/ 'ʼnʼn'

It seems that Oniguruma 5.9.5 also has this bug.

@k-takata k-takata added the bug label Jul 29, 2014

k-takata added a commit that referenced this issue Jul 29, 2014

k-takata added a commit that referenced this issue Jul 29, 2014

Owner

k-takata commented Jul 29, 2014

\u0149 is case-folded into \u02bc + n, which is then compiled into two opcodes: exactn-ic:\xca\xbc + exact1-ic:n.
Because the single character is split across two opcodes, the pattern cannot match the character \u0149 in the subject string.
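
The expansion can be checked outside Onigmo. The sketch below uses Python's built-in `str.casefold()`, which applies Unicode full case folding. It only illustrates the folding and the UTF-8 bytes seen in the opcode dump; it is not Onigmo's own code.

```python
# Illustration only: Python's str.casefold(), not Onigmo internals.
ch = "\u0149"            # LATIN SMALL LETTER N PRECEDED BY APOSTROPHE
folded = ch.casefold()   # Unicode full case folding

# One input character becomes two code points: U+02BC followed by 'n'.
print([hex(ord(c)) for c in folded])   # ['0x2bc', '0x6e']

# In UTF-8, U+02BC is the two bytes \xca\xbc, as in 'exactn-ic:\xca\xbc'.
print(folded.encode("utf-8"))          # b'\xca\xbcn'
```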

Owner

k-takata commented Jul 29, 2014

Hmm, some tests failed.

https://travis-ci.org/k-takata/Onigmo/jobs/31143460

3115 FAIL: /(?i)ΐ/ 'ΐ'

x2("(?i)\u03b9\u0308\u0301", "\u0390", 0, 1)

3458 search fail (ISO-8859-2)
3459 search fail (ISO-8859-2)
3515 search fail (UTF-16BE)
3518 search fail (UTF-32BE)
3537 search fail (UTF-16BE)
3538 search fail (UTF-16BE)
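
The failing case 3115 also involves a multi-character folding: U+0390 folds to the three code points in the pattern. A small check, again using Python's `str.casefold()` as a stand-in for the Unicode folding tables (an assumption for illustration; it says nothing about why the Onigmo test failed):

```python
# Illustration only: Unicode full case folding via Python, not Onigmo.
subject = "\u0390"               # GREEK SMALL LETTER IOTA WITH DIALYTIKA AND TONOS
pattern = "\u03b9\u0308\u0301"   # iota + combining diaeresis + combining acute

# The single subject character folds to the three-code-point pattern text.
print(subject.casefold() == pattern)   # True
print(len(subject.casefold()))         # 3
```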

k-takata added a commit that referenced this issue Jul 29, 2014

k-takata added a commit that referenced this issue Jul 29, 2014

k-takata added a commit that referenced this issue Jul 29, 2014

k-takata added a commit that referenced this issue Jul 30, 2014

/(?i)\u0149\u0149/ =~ "\u0149\u0149" doesn't match (Issue #40)
'\u0149' is case-folded into '\u02bc' + 'n'.  These were compiled into
'exactn-ic:\xca\xbc' + 'exact1-ic:n'.  Because one character was split into
two opcodes, it could not match the character '\u0149'.
Fix: merge a series of 'exactn-ic' and 'exact1-ic' into one 'exactn-ic'.
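
The merge described in this commit message can be sketched with a toy model. Everything here is hypothetical: opcodes are represented as `(name, payload)` tuples and `merge_ic_ops` is an illustrative helper, not a function from Onigmo's `regcomp.c`.

```python
# Hypothetical model of the merge; not Onigmo's actual implementation.
# Each opcode is a (name, payload_bytes) tuple.
IC_OPS = {"exact1-ic", "exactn-ic"}

def merge_ic_ops(ops):
    """Merge runs of adjacent case-insensitive exact opcodes into one."""
    merged = []
    for name, payload in ops:
        if name in IC_OPS and merged and merged[-1][0] in IC_OPS:
            # Extend the previous opcode so the folded sequence stays together.
            merged[-1] = ("exactn-ic", merged[-1][1] + payload)
        else:
            merged.append((name, payload))
    return merged

ops = [("exactn-ic", b"\xca\xbc"), ("exact1-ic", b"n")]
print(merge_ic_ops(ops))   # [('exactn-ic', b'\xca\xbcn')]
```

With a single opcode holding the whole folded sequence, a matcher can compare it against the folded form of one subject character instead of stopping at an opcode boundary in the middle of it.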

k-takata added a commit that referenced this issue Jul 30, 2014

Owner

k-takata commented Jul 30, 2014

k-takata added a commit that referenced this issue Jul 31, 2014

k-takata added a commit that referenced this issue Jul 31, 2014

@k-takata k-takata closed this in dfc8809 Jul 31, 2014

k-takata added a commit that referenced this issue Jul 31, 2014

/(?i)\u0149\u0149/ =~ "\u0149\u0149" doesn't match (Issue #40)
'\u0149' is case-folded into '\u02bc' + 'n'.  These were compiled into
'exactn-ic:\xca\xbc' + 'exact1-ic:n'.  Because one character was split into
two opcodes, it could not match the character '\u0149'.
Fix: merge a series of 'exactn-ic' and 'exact1-ic' into one 'exactn-ic'.
(cherry picked from commit 7b61f4b)

Conflicts:

	regcomp.c

k-takata added a commit to k-takata/bregonig that referenced this issue Sep 13, 2014

Ver.3.06
* Uses Onigmo (Oniguruma-mod) 5.15.0 for bregonig.dll.
  https://github.com/k-takata/Onigmo/tree/Onigmo-5.15.0_for_bregonig
  - Supports Unicode 7.0
  - Merged Oniguruma 5.9.5
  - Fixed a crash when a large number of groups is used
    k-takata/Onigmo#24
  - Fixed /\x{1ffc}/i =~ "\x1ff3" not matching
  - Fixed /[a-c#]+\W/ =~ "def#" not matching in UTF-16/32
  - Fixed /(?i)\u0149\u0149/ =~ "\u0149\u0149" not matching
    k-takata/Onigmo#40
  - Fixed a problem when /w is used inside a character class with the /i option specified
    k-takata/Onigmo#4
  - Fixed character properties ignoring the /i option
    k-takata/Onigmo#41
  - Fixed "ab" =~ /(?!^a).*b/ not matching
    k-takata/Onigmo#44

k-takata added a commit to k-takata/bregonig that referenced this issue Sep 13, 2014

Ver.2.09
* Uses Oniguruma 5.9.5 modified version 2 for bregonig.dll V2.
  https://github.com/k-takata/Onigmo/tree/onig-5.9.5-mod2_for_bregonig-v2
  - Changed the base version from Oniguruma 5.9.4 to 5.9.5.
  - Fixed a crash when a large number of groups is used
    k-takata/Onigmo#24
  - Fixed /[a-c#]+\W/ =~ "def#" not matching in UTF-16/32
  - Fixed /(?i)\u0149\u0149/ =~ "\u0149\u0149" not matching
    k-takata/Onigmo#40
  - Fixed "ab" =~ /(?!^a).*b/ not matching
    k-takata/Onigmo#44