Rollback of ThreadLocal optimization change #43

Merged · 2 commits · Dec 28, 2017

Conversation

@mikelalcon (Contributor) commented on Nov 20, 2017

This change rolls back 3def696, 7597ac7 and f6ab6e0 due to an
internal (Google) memory regression.

The cause is a memory leak: a non-static ThreadLocal whose cached value held references back to the owning object instance (Machine -> RE2 objects).

The main issue is that creating an unbounded number of new ThreadLocal instances leaks them, since they are never GCed.

The repro case is quite simple:

import com.google.re2j.Pattern;

public class Main {
  public static void main(String[] args) {
    // Every iteration compiles a fresh Pattern, and therefore a fresh RE2 instance.
    for (int i = 0; i < 10000000; i++) {
      Pattern.compile("aaaa").matches("aaaa");
    }
  }
}

What is happening here?

Each call to Pattern.compile creates a new RE2 instance, and each RE2 instance creates its own ThreadLocal.
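
A simplified sketch of the problematic shape (not the exact rolled-back code; field and constructor names here are illustrative):

class RE2 {
  // Non-static: every RE2 instance creates its own ThreadLocal, and the
  // cached value (a Machine) points back at the owning RE2.
  private final ThreadLocal<Machine> machine =
      new ThreadLocal<Machine>() {
        @Override
        protected Machine initialValue() {
          return new Machine(RE2.this);
        }
      };
}

class Machine {
  private final RE2 re2; // Machine -> RE2 -> ThreadLocal: the cycle described below
  Machine(RE2 re2) {
    this.re2 = re2;
  }
}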

When we use a ThreadLocal we access a per-thread map, ThreadLocalMap. This map contains Entries whose key is the ThreadLocal and whose value is the stored value.

The map's entry class (ThreadLocalMap.Entry) extends WeakReference, and here is the problem: the entry itself is not weak, only the key is.
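
Abridged from OpenJDK's java.lang.ThreadLocal.ThreadLocalMap (exact shape may vary by JDK version):

static class Entry extends WeakReference<ThreadLocal<?>> {
    /** The value associated with this ThreadLocal. Held by a strong reference. */
    Object value;

    Entry(ThreadLocal<?> k, Object v) {
        super(k);   // only the key (the ThreadLocal) is weakly referenced
        value = v;
    }
}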

So we are essentially creating thousands of entries in that map, using a different ThreadLocal as the key each time.

Why are they not GCed? The ThreadLocal is no longer referenced, right? Actually, it is still referenced:

Each Entry in that map (which is not weak, remember; only its key is weak) contains a value that is... the Machine. The Machine points to the RE2 instance... and the RE2 instance points to the ThreadLocal! And there we have the cycle.
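
Spelled out as a reference chain (field names illustrative, following the sketch above):

  Thread.threadLocals            (the ThreadLocalMap, strong)
    -> ThreadLocalMap.Entry      (strong; only the Entry's referent/key is weak)
      -> Entry.value             (strong) = Machine
        -> Machine.re2           (strong) = RE2
          -> RE2.machine         (strong) = the very ThreadLocal used as the Entry's key

Because the key stays strongly reachable through the value, the weak reference never clears and the entry is never expunged.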

TL;DR: Never create an unbounded number of ThreadLocals. If a ThreadLocal is not static, make sure the number of instances is bounded (IOW, not unbounded).
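
For comparison, a minimal sketch of the safe shape (class and field names are illustrative): a single static ThreadLocal shared by all instances, so the number of ThreadLocals is bounded by the number of classes rather than by the number of objects created:

class Scratch {
  // One ThreadLocal for the whole class: at most one map entry per thread.
  private static final ThreadLocal<StringBuilder> BUFFER =
      ThreadLocal.withInitial(StringBuilder::new);

  String render(Object o) {
    StringBuilder sb = BUFFER.get();
    sb.setLength(0);          // reuse the per-thread buffer
    return sb.append(o).toString();
  }
}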

But does this mean that ThreadLocals are never GCed from the ThreadLocalMap?

No

When we look up a ThreadLocal in the ThreadLocalMap (by calling threadLocal.get(), for example), if a simple hash lookup in the array doesn't find it, getEntryAfterMiss is called. That method calls expungeStaleEntry whenever Entry.get() returns null (Entry.get() is the weak reference to the key, i.e. the ThreadLocal). This code (and the other places that end up calling expungeStaleEntry) cleans up stale entries whose ThreadLocal has been GCed.
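
Abridged from OpenJDK's ThreadLocal.ThreadLocalMap (again, details vary by JDK version):

private Entry getEntryAfterMiss(ThreadLocal<?> key, int i, Entry e) {
    Entry[] tab = table;
    int len = tab.length;

    while (e != null) {
        ThreadLocal<?> k = e.get();
        if (k == key)
            return e;
        if (k == null)
            expungeStaleEntry(i);   // the key was GCed: drop the stale entry and its value
        else
            i = nextIndex(i, len);
        e = tab[i];
    }
    return null;
}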

But in this case the ThreadLocal wasn't GCed, because the value holds an indirect reference back to it (the ThreadLocal is a non-static member of RE2).

This change rolls back 3def696, 7597ac7 and f6ab6e0 due to an
internal memory regression.
copybara-github pushed a commit to google/copybara that referenced this pull request Nov 22, 2017
RE2J does internal synchronization when matching. This is because it has an
internal pool of objects that it reuses.

I attempted to fix this in RE2J, but we had to roll it back.
See details in google/re2j#43

Instead, we now create a copy of the pattern per batch job (which is
executed by a single thread), essentially removing the contention.

Because the number of pattern compilations per batch x transform is on the
order of thousands at most, this doesn't show a performance penalty
(internally tested with 1, 2 and 20 threads).

Once the performance bottleneck is removed from RE2J, we should
be able to remove the Pattern.compile(pattern.pattern()) calls
and replace with the original 'pattern' expression.
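
Roughly, the shape of the workaround (a sketch, not Copybara's actual code; class and field names are illustrative):

import com.google.re2j.Pattern;

class BatchJob implements Runnable {
  private final Pattern pattern;  // private copy, used only by this job's thread

  BatchJob(Pattern shared) {
    // Recompile from the source expression so this job never shares a Pattern
    // (and its internal pool) with other threads.
    this.pattern = Pattern.compile(shared.pattern());
  }

  @Override
  public void run() {
    // ... use this.pattern.matcher(...) here without cross-thread contention
  }
}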

Some performance numbers from the internal run:

20 threads, no optimization:

real    1m16.621s
real    1m16.700s
real    1m23.557s
real    1m20.242s
real    1m15.694s

20 threads, with optimization:

real    1m9.686s
real    1m4.141s
real    0m58.491s
real    1m4.280s
real    1m4.952s

1 thread, no optimization:

real    1m52.285s
real    1m43.664s
real    1m43.624s
real    1m44.872s
real    1m42.039s

1 thread, with optimization:

real    1m40.081s
real    1m44.596s
real    1m42.790s
real    1m43.514s
real    1m39.136s

The results show that with 1 thread (no parallelism) we don't have
a noticeable performance penalty, but when we use 20 threads we
see a noticeable performance improvement.

This performance improvement needs >10 threads to show the benefit
(since it removes contention).

PiperOrigin-RevId: 176656487
Change-Id: Iebe76a3d8f5e2efb8653538914b514e62416e61b

@alandonovan (Contributor) left a comment:

Could you elaborate on the cause of the memory regression in the change description?

@mikelalcon (Contributor, Author) replied:

Done.

@sjamesr merged commit e834873 into google:master on Dec 28, 2017