Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Thoughts on synchronized access on RE2 instances #46

Closed
nicktrav opened this issue Feb 12, 2018 · 2 comments
Closed

Thoughts on synchronized access on RE2 instances #46

nicktrav opened this issue Feb 12, 2018 · 2 comments

Comments

@nicktrav
Copy link

nicktrav commented Feb 12, 2018

We've recently been doing some profiling on one of our applications that uses re2j quite heavily, and we're noticing a decent amount of thread contention coming from RE2 when doing concurrent patten matching (i.e. multiple threads using the same Pattern instance).

Both RE2#get and RE2#put synchronize access on the monitor to obtain either a cached Machine instance or instantiate a new one if there are none that already exist.

Here's a small example that when run demonstrates the blocking thread behavior:

public class ThreadContentionExample {

  private static final int THREADS = 20;
  private static final Pattern PATTERN = Pattern.compile("\\w+://.*");

  public static void main(String ...args) throws InterruptedException {
    Thread[] threads = new Thread[THREADS];
    for (int i = 0; i < THREADS; i++) {
      Thread thread = new Thread() {
        @Override public void run() {
          while (true) {
            PATTERN.matcher("https://www.google.com").matches();
          }
        }
      };
      thread.start();
      threads[i] = thread;
    }

    for (Thread thread : threads) {
      thread.join();
    }
  }
}

The thread profile (via YourKit) looks as follows:

screen shot 2018-02-12 at 11 00 38 am

I did some JMH profiling with various other concurrent data structures, but it seems like all have slightly worse off performance than the current implementation at lower thread counts (within error bounds anyway). A ConcurrentLinkedQueue or Dequeue seems to be slightly more performant at higher thread counts. However, even if they were better, many of these more "exotic" classes are not available in GWT anyway or they're at a higher JDK language level (re2j uses 1.6 features now), so I think that precludes them from use.

That said, I was wondering if there were any thoughts how this synchronization bottleneck could be removed, or at least improved a little, in the case that there are many threads accessing the same Pattern instance?

Our current approach is to just use ThreadLocal Patterns in our application code (as we have bounded thread pools). I see that the ThreadLocal approach was adopted in re2j too, but it lead to some memory leaks, so it was reverted.

This definitely isn't a dealbreaker for us, and it's not really a bug either, just food for thought.

Re2j has been great! Thanks!

@arnaudroger
Copy link
Contributor

I was looking at that. and also it would be possible to have a non-locking structure to keep track of the cache it's quite complicated and error-prone. It might be easier to allow people to get non Thread safe instance that avoid the complication of managing a pool of Machine if they are ready to have one per thread.

@charlesmunger
Copy link
Contributor

I believe #121 will fix this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants