/(?i)\u0149\u0149/ =~ "\u0149\u0149" doesn't match #40

Closed
k-takata opened this issue Jul 29, 2014 · 3 comments

@k-takata
Owner

How to reproduce:

$ LD_LIBRARY_PATH=.libs python

>>> from testpy import *
>>> set_encoding('UTF-8')
>>> set_output_encoding('UTF-8')
>>> x2(u"(?i)\u0149\u0149", u"\u0149\u0149", 0, 2)
FAIL: /(?i)ʼnʼn/ 'ʼnʼn'

It seems that Oniguruma 5.9.5 also has this bug.

@k-takata k-takata added the bug label Jul 29, 2014
k-takata added a commit that referenced this issue Jul 29, 2014
k-takata added a commit that referenced this issue Jul 29, 2014
@k-takata
Owner Author

\u0149 is case-folded into \u02bc + n, which is then compiled into two opcodes: exactn-ic:\xca\xbc + exact1-ic:n.
Because the one character is split across two opcodes, the pattern could not match the single character \u0149.
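
The folding described above can be checked outside Onigmo. The following is a minimal sketch that uses Python's built-in `str.casefold()` (an assumption of convenience; it is not what Onigmo calls internally) to show that U+0149 expands to two code points, and that the UTF-8 bytes of the first one are the `\xca\xbc` seen in the `exactn-ic` opcode:

```python
# U+0149 (LATIN SMALL LETTER N PRECEDED BY APOSTROPHE) has a full case
# folding of two code points: U+02BC followed by "n".
folded = "\u0149".casefold()
folded_codepoints = [f"U+{ord(c):04X}" for c in folded]

# In UTF-8, U+02BC is the two bytes 0xCA 0xBC (the exactn-ic payload);
# "n" is the single byte carried by exact1-ic.
first_bytes = folded[0].encode("utf-8")
second_bytes = folded[1].encode("utf-8")

print(folded_codepoints)
print(first_bytes, second_bytes)
```

So one character in the subject string corresponds to two literal pieces in the compiled pattern, which is the split the comment refers to.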

@k-takata
Owner Author

Hmm, some tests failed.

https://travis-ci.org/k-takata/Onigmo/jobs/31143460

3115 FAIL: /(?i)ΐ/ 'ΐ'

x2("(?i)\u03b9\u0308\u0301", "\u0390", 0, 1)

3458 search fail (ISO-8859-2)
3459 search fail (ISO-8859-2)
3515 search fail (UTF-16BE)
3518 search fail (UTF-32BE)
3537 search fail (UTF-16BE)
3538 search fail (UTF-16BE)
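
For context on test 3115: its pattern and subject are related by a multi-character case folding, in the opposite direction from the original report (the pattern is already the folded form, the subject is the single precomposed character). A small sketch using Python's `str.casefold()`, which is only a stand-in for Onigmo's own folding tables:

```python
pattern_text = "\u03b9\u0308\u0301"  # iota + combining diaeresis + combining acute
subject = "\u0390"                   # GREEK SMALL LETTER IOTA WITH DIALYTIKA AND TONOS

# The single subject character folds to exactly the three-code-point pattern.
subject_folds_to_pattern = subject.casefold() == pattern_text

# Three code points in the pattern must line up with one in the subject.
lengths = (len(pattern_text), len(subject))
print(subject_folds_to_pattern, lengths)
```

This only illustrates the shape of the test case; it does not show why the change made it fail.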

k-takata added a commit that referenced this issue Jul 29, 2014
k-takata added a commit that referenced this issue Jul 29, 2014
k-takata added a commit that referenced this issue Jul 29, 2014
k-takata added a commit that referenced this issue Jul 30, 2014
'\u0149' is case-folded into '\u02bc' + 'n'.  These were compiled into
'exactn-ic:\xca\xbc' + 'exact1-ic:n'.  Because one character was split across
two opcodes, it could not match the single character '\u0149'.
Merge a series of 'exactn-ic' and 'exact1-ic' into one 'exactn-ic'.
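
The merge step in this commit message can be pictured with a toy model. The sketch below is not the code in regcomp.c: the `(name, bytes)` tuple representation, the `merge_ic_exact` helper, and the `anychar` placeholder opcode are all hypothetical and exist only to illustrate joining adjacent case-insensitive literal opcodes.

```python
from typing import List, Tuple

# Hypothetical model of a compiled pattern: (opcode name, literal bytes).
Op = Tuple[str, bytes]

IC_EXACT = {"exact1-ic", "exactn-ic"}


def merge_ic_exact(ops: List[Op]) -> List[Op]:
    """Join each run of adjacent case-insensitive literal opcodes into one."""
    merged: List[Op] = []
    for name, data in ops:
        if name in IC_EXACT and merged and merged[-1][0] in IC_EXACT:
            # Extend the previous literal instead of emitting a new opcode.
            merged[-1] = ("exactn-ic", merged[-1][1] + data)
        else:
            merged.append((name, data))
    return merged


before = [("exactn-ic", b"\xca\xbc"), ("exact1-ic", b"n")]
after = merge_ic_exact(before)
print(after)
```

In this model the folded form of '\u0149' ends up inside a single literal, so a matcher that compares folded text against one opcode sees the whole sequence at once.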
k-takata added a commit that referenced this issue Jul 30, 2014
@k-takata
Owner Author

https://travis-ci.org/k-takata/Onigmo/jobs/31236650
Rebased and force-pushed.

k-takata added a commit that referenced this issue Jul 31, 2014
k-takata added a commit that referenced this issue Jul 31, 2014
k-takata added a commit that referenced this issue Jul 31, 2014
'\u0149' is case-folded into '\u02bc' + 'n'.  These were compiled into
'exactn-ic:\xca\xbc' + 'exact1-ic:n'.  Because one character was split across
two opcodes, it could not match the single character '\u0149'.
Merge a series of 'exactn-ic' and 'exact1-ic' into one 'exactn-ic'.
(cherry picked from commit 7b61f4b)

Conflicts:

	regcomp.c
k-takata added a commit to k-takata/bregonig that referenced this issue Sep 13, 2014
* Use Onigmo (Oniguruma-mod) 5.15.0 for bregonig.dll.
  https://github.com/k-takata/Onigmo/tree/Onigmo-5.15.0_for_bregonig
  - Support Unicode 7.0
  - Merge Oniguruma 5.9.5
  - Fix a crash when a large number of groups is used
    k-takata/Onigmo#24
  - Fix /\x{1ffc}/i =~ "\x1ff3" not matching
  - Fix /[a-c#]+\W/ =~ "def#" not matching in UTF-16/32
  - Fix /(?i)\u0149\u0149/ =~ "\u0149\u0149" not matching
    k-takata/Onigmo#40
  - Fix a problem when /w is used inside a character class with the /i option
    k-takata/Onigmo#4
  - Fix character properties ignoring the /i option
    k-takata/Onigmo#41
  - Fix "ab" =~ /(?!^a).*b/ not matching
    k-takata/Onigmo#44
k-takata added a commit to k-takata/bregonig that referenced this issue Sep 13, 2014
* Use Oniguruma 5.9.5 modified version 2 for bregonig.dll V2.
  https://github.com/k-takata/Onigmo/tree/onig-5.9.5-mod2_for_bregonig-v2
  - Change the base version from Oniguruma 5.9.4 to 5.9.5.
  - Fix a crash when a large number of groups is used
    k-takata/Onigmo#24
  - Fix /[a-c#]+\W/ =~ "def#" not matching in UTF-16/32
  - Fix /(?i)\u0149\u0149/ =~ "\u0149\u0149" not matching
    k-takata/Onigmo#40
  - Fix "ab" =~ /(?!^a).*b/ not matching
    k-takata/Onigmo#44