joni seems to be 1.5 slower than simple JNI bindings #43

Open
denofevil opened this issue Jul 11, 2019 · 1 comment

@denofevil

Steps to reproduce

  1. onig4j-v003-src.zip
  2. Update jni/Makefile with proper JAVA_HOME and then call make
  3. Update lib location in src/onig4j/OnigRegex.java
  4. Run OnigPerformanceTest

We got the following results:
java: 4261ms
joni: 5798ms
onig: 3511ms
tm4e: 18ms

With a straightforward approach, joni is about 1.5 times slower than the oniguruma JNI bindings (5798 ms vs. 3511 ms, a ratio of roughly 1.65).

tm4e's major boost seems to come from src/org/eclipse/tm4e/core/internal/oniguruma/OnigRegExp.java:49: if a regexp is called consecutively on the same string, it just returns the latest cached match result.

@enebo
Member

enebo commented Mar 1, 2023

This is obviously ancient (sorry about that) yet still makes an interesting suggestion. The code highlighted is:

    public OnigResult Search(OnigString str, int position) {
        if (lastSearchStrUniqueId == str.uniqueId() && lastSearchPosition <= position) {
            if (lastSearchResult == null || lastSearchResult.LocationAt(0) >= position) {
                return lastSearchResult;
            }
        }

        lastSearchStrUniqueId = str.uniqueId();
        lastSearchPosition = position;
        lastSearchResult = Search(str.utf8_value(), position, str.utf8_length());
        return lastSearchResult;
    }

Looking at the benchmark, it seems to create a regexp cache, so when _findNextMatchSync runs, the previous search result for that regexp is still around; it can then notice that the result would be the same and get a cache hit. We (JRuby) cache joni regexps, but we do not cache results above joni itself.

Perhaps there is value in caching results in Joni? I think C Ruby added some result cache. I will try to see whether they have any interesting data to back up how often this happens. Again, this may be more useful above joni than in it, but I can see how having it in joni could help more projects and not force each of them to do their own caching.
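
For illustration, here is a minimal sketch of what a last-result cache layered above a regexp engine could look like. It mirrors the check in the tm4e snippet quoted above, but it is not tm4e's or joni's API: `java.util.regex` is used as a stand-in engine so the example is self-contained, and the names `CachingSearcher`, `search`, and `engineCalls` are hypothetical. It also assumes the pattern does not depend on where the search started (for example via `\G`).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical wrapper that remembers the last search result per regexp. */
final class CachingSearcher {
    private final Pattern pattern;
    private CharSequence lastInput;   // compared by identity, like a unique id
    private int lastPosition;
    private int[] lastResult;         // {start, end}, or null for "no match"
    private boolean hasCached;
    private int engineCalls;          // counts real searches, for demonstration

    CachingSearcher(String regex) {
        this.pattern = Pattern.compile(regex);
    }

    /** Returns {start, end} of the first match at or after position, or null. */
    int[] search(CharSequence input, int position) {
        // Same input, searching from at or after the previous start position:
        // the cached answer is still valid if there was no match at all, or
        // if the cached match begins at or after the requested position.
        if (hasCached && input == lastInput && lastPosition <= position) {
            if (lastResult == null || lastResult[0] >= position) {
                return lastResult;
            }
        }

        // Cache miss: run the real search and remember its result.
        engineCalls++;
        Matcher m = pattern.matcher(input);
        int[] result = m.find(position) ? new int[] {m.start(), m.end()} : null;

        lastInput = input;
        lastPosition = position;
        lastResult = result;
        hasCached = true;
        return result;
    }

    int engineCalls() {
        return engineCalls;
    }
}
```

With this shape, a benchmark that repeatedly searches the same string from the same or a later position only pays for the first search, which would be consistent with the very low tm4e number reported above.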
