Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Supporting non-word char in group name #91

Open
ksss opened this issue Jan 3, 2019 · 1 comment
Open

Supporting non-word char in group name #91

ksss opened this issue Jan 3, 2019 · 1 comment

Comments

@ksss
Copy link
Contributor

ksss commented Jan 3, 2019

ref #90

I want to discuss about group name.

Current

mruby-onig-regexp can use non-word char in group name when linked by oniguruma.
But it can't use when linked by onigmo.

link with oniguruma.so onigmo.so bundled onigmo
non-word OK NG NG

Problem

So, mgem can't use non-word char in group name because it depends on mruby-onig-regexp. (e.g. mruby-uri)

Background

CRuby can use non-word char in group name.

$ ruby -e 'p Regexp.new("bad(?<aa-bb>.*)").match("badboy")["aa-bb"]'
"boy"

I investigated the background.

  • oniguruma allows non-word char (like '-') in group name.
  • onigmo does not allows non-word char in group name.
  • CRuby change regexp library from oniguruma to onigmo. However, It kept backward compatibility by patching onigmo.

Discussion point

I think, we have some solutions.

  • No change. We allow incompatibility by linked library.
  • Add compile option to onigmo like ONIG_OPTION_ALLOW_NON_WORD_CHAR_IN_CAPTURE_GROUP.
  • Drop supporting linking with shared library and use bundled onigmo only.
  • Fork mruby-onig-regexp and it support only bundled onigmo only.
  • Add patch to bundled onigmo in mruby-onig-regexp like this. But this solution can't resolve when linking with existing onigmo.so
diff --git a/mrbgem.rake b/mrbgem.rake
index 0d37b12..1404353 100644
--- a/mrbgem.rake
+++ b/mrbgem.rake
@@ -66,6 +66,7 @@ MRuby::Gem::Specification.new('mruby-onig-regexp') do |spec|
           _pp 'autotools', oniguruma_dir
           run_command e, './autogen.sh' if File.exists? 'autogen.sh'
           run_command e, "./configure --disable-shared --enable-static #{host}"
+          run_command e, "patch -p1 < #{dir}/onigmo-#{version}.patch"
           run_command e, "make -j#{$rake_jobs || 1}"
         else
           run_command e, 'cmd /c "copy /Y win32 > NUL"'
diff --git a/onigmo-6.1.3.patch b/onigmo-6.1.3.patch
new file mode 100644
index 0000000..4163913
--- /dev/null
+++ b/onigmo-6.1.3.patch
@@ -0,0 +1,13 @@
+diff --git a/regparse.c b/regparse.c
+index 431aad9..54563ac 100644
+--- a/regparse.c
++++ b/regparse.c
+@@ -2509,7 +2509,7 @@ get_name_end_code_point(OnigCodePoint start)
+ # ifdef RUBY
+ #  define ONIGENC_IS_CODE_NAME(enc, c)  TRUE
+ # else
+-#  define ONIGENC_IS_CODE_NAME(enc, c)  ONIGENC_IS_CODE_WORD(enc, c)
++#  define ONIGENC_IS_CODE_NAME(enc, c)  TRUE
+ # endif
+
+ # ifdef USE_BACKREF_WITH_LEVEL

I think mruby's regexp should support non-word char in group name.
Because I think it is better that the specification of mruby as close as possible to CRuby.
But I have to fork mruby-onig-regexp to resolve the problem perfectly.

How do you think?

@take-cheeze
Copy link
Collaborator

Another option is to switch back to oniguruma since recently there is more releases than onigmo: https://github.com/kkos/oniguruma/releases

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants