
Aggressively remove PoolThreadCache references from its finalizer object #14155

Merged — 1 commit merged into netty:4.1 on Jul 9, 2024

Conversation

@franz1981 (Contributor) commented Jul 2, 2024

Motivation:
If a cache's FastThreadLocalThread owner wins the race to remove the cache, its finalizer (kept for debugging purposes) will still retain a strong reference to it, causing a few classes to leak (and eventually, their ClassLoader). Although we cannot stop finalizers from waiting for the finalization pass, we can reduce the memory footprint of "leaked" instances before finalization happens.

Modification:
Non-debug early cache removal now clears the cache's strong reference within FreeOnFinalize, making it an empty shell, eligible for GC.

Result:
Smaller memory footprint while waiting for finalization to happen
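The pattern described above can be sketched in plain Java. This is a hypothetical model, not Netty's actual FreeOnFinalize implementation: the finalizer object holds the only strong reference to a heavy cache (modeled here as a byte array), and clearing that reference early leaves an empty shell awaiting finalization while the cache itself becomes GC-eligible immediately.

```java
import java.util.concurrent.atomic.AtomicReference;

public class FreeOnFinalizeSketch {

    // Hypothetical stand-in for FreeOnFinalize: holds the only strong
    // reference to the cache so that finalize() can free it if the owner
    // thread never does.
    static final class CacheShell {
        // In Netty this would be the PoolThreadCache; modeled as a byte[].
        private final AtomicReference<byte[]> cache =
                new AtomicReference<>(new byte[1024 * 1024]);

        // Early-removal path: the owner thread won the race, so drop the
        // strong reference instead of waiting for finalize() to run.
        byte[] clearEarly() {
            return cache.getAndSet(null);
        }

        @Override
        protected void finalize() {
            // Slow path: only frees the cache if the owner never did.
            cache.getAndSet(null);
        }
    }

    public static void main(String[] args) {
        CacheShell shell = new CacheShell();
        System.out.println("freed early: " + (shell.clearEarly() != null));
        // Second clear is a no-op: the shell is already empty.
        System.out.println("second clear: " + (shell.clearEarly() != null));
    }
}
```

The `AtomicReference.getAndSet(null)` makes the clear race-safe between the owner thread and the finalizer thread, which is the core of the fix: whoever loses the race sees null and does nothing.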


franz1981 commented Jul 2, 2024

@normanmaurer @chrisvest It would be great if we could just let FastThreadLocalThreads with cleanupFastThreadLocals == true skip finalizers entirely, trusting them, but maybe I'm oversimplifying and there's still a reason to use finalizers for those. Wdyt?
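The opt-in being proposed could look roughly like the following. This is an illustrative sketch only: the `FastThreadLocalThread` stub, `willCleanupFastThreadLocals()` accessor and `needsFinalizerGuard` helper are modeled here, not taken from Netty's internals.

```java
public class SkipFinalizerSketch {

    // Minimal stand-in for Netty's FastThreadLocalThread.
    static class FastThreadLocalThread extends Thread {
        private final boolean cleanup;

        FastThreadLocalThread(Runnable r, boolean cleanup) {
            super(r);
            this.cleanup = cleanup;
        }

        // True when the thread promises to remove its own
        // FastThreadLocals on exit.
        boolean willCleanupFastThreadLocals() {
            return cleanup;
        }
    }

    // Decide whether a finalizer guard is needed for the given thread:
    // a trusted thread cleans up deterministically on exit, so when
    // finalizers are disabled it can skip the guard object entirely.
    static boolean needsFinalizerGuard(Thread t, boolean finalizersEnabled) {
        if (!finalizersEnabled
                && t instanceof FastThreadLocalThread
                && ((FastThreadLocalThread) t).willCleanupFastThreadLocals()) {
            return false;
        }
        return true;
    }

    public static void main(String[] args) {
        Thread trusted = new FastThreadLocalThread(() -> { }, true);
        Thread plain = new Thread(() -> { });
        System.out.println("trusted needs guard: " + needsFinalizerGuard(trusted, false));
        System.out.println("plain needs guard: " + needsFinalizerGuard(plain, false));
    }
}
```

Plain threads (or trusted threads with finalizers enabled) keep the guard, so existing behavior is preserved unless the user explicitly opts in.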

@franz1981 (Contributor, Author) commented Jul 2, 2024

FYI @normanmaurer @chrisvest I've added 72c4731, which pushes the approach further by adding an opt-in to NOT use finalizers for FastThreadLocalThreads that take care of cleaning up their own FastThreadLocal instances.

@chrisvest (Contributor) left a comment

I ran the non-broken NettyLeak1 reproducer from #12749, and it looks like the issue is resolved. So… controversial suggestion: maybe we don't need to make the leak check and can always clear the FreeOnFinalize.cache field.

@chrisvest (Contributor) commented:

Here's a fixed version of NettyLeak0 from #12749 which also shows no leaks:

import io.netty.buffer.ByteBuf;
import io.netty.buffer.PoolChunkMetric;
import io.netty.buffer.PooledByteBufAllocator;

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

public class NettyLeak0 {

    public static void main(String[] args) throws Exception {
        System.setProperty("io.netty.allocator.useCacheForAllThreads", "true");
        System.setProperty("io.netty.allocator.maxOrder", "9");
        Thread.setDefaultUncaughtExceptionHandler((t, e) -> {
            e.printStackTrace();
            Runtime.getRuntime().halt(42);
        });

        int bufSize = 32 * 1024;

        AtomicInteger longLiving = new AtomicInteger();
        List<ByteBuf> longLivingSink = Collections.synchronizedList(new ArrayList<ByteBuf>());
        ArrayBlockingQueue<Thread> threads = new ArrayBlockingQueue<>(64);
        for (int i = 0; true; i++) {
            Thread thread = new Thread(() -> {
                for (int x = 0; x < 256; x++) {
                    if (x != 0) {
                        // allocate different size classes to fill more thread caches
                        for (int j = 0; j <= 10; j++) {
                            ByteBuf buf = PooledByteBufAllocator.DEFAULT.directBuffer(bufSize >> j);
                            buf.release();
                        }
                    } else {
                        // only one buf is alive after thread termination
                        ByteBuf buf = PooledByteBufAllocator.DEFAULT.directBuffer(1);
                        longLiving.incrementAndGet();
                        longLivingSink.add(buf);
                    }
                }
            });
            if (!threads.offer(thread)) {
                threads.take().join();
                threads.put(thread);
            }
            thread.start();

            if (i % 1000 == 0) {
                System.gc(); // to get a chance for io.netty.buffer.PoolThreadCache.finalize
                System.out.println(PooledByteBufAllocator.DEFAULT.metric());
                System.out.println(
                        "chunks count: " +
                                PooledByteBufAllocator.DEFAULT.metric().directArenas().stream()
                                        .flatMap(m -> m.chunkLists().stream())
                                        .mapToInt(m -> {
                                            int count = 0;
                                            for (PoolChunkMetric __ : m) {
                                                count += 1;
                                            }
                                            return count;
                                        })
                                        .sum()
                );
                int ll = longLiving.get();
                System.out.println("long living: " + ll);
                // real size of a buf will be > 1b & < 1kb, but just estimate it as 1kb
                System.out.println("long living size (MB): " + ll / 1024);
                System.out.println();
                longLivingSink.removeIf(buf -> buf.release() && decrement(longLiving));
            }
        }
    }

    private static boolean decrement(AtomicInteger longLiving) {
        longLiving.addAndGet(-1);
        return true;
    }
}

@franz1981 (Contributor, Author) commented:

@normanmaurer wdyt about the @chrisvest suggestion?

maybe we don't need to make the leak check and can always clear the FreeOnFinalize.cache field.

@normanmaurer (Member) commented:

@normanmaurer wdyt about the @chrisvest suggestion?

maybe we don't need to make the leak check and can always clear the FreeOnFinalize.cache field.

Sounds good to me

@franz1981 franz1981 marked this pull request as ready for review July 4, 2024 09:58
@franz1981 (Contributor, Author) commented:

PTAL @chrisvest @normanmaurer

Not very happy that I've added a new sys prop here, but I want to keep it safe for existing users...
Anyone who needs to keep creating/stopping event loop threads (e.g. the dev mode in Quarkus) can decide to avoid finalization and just set it to speed up dropping the previous classloader (@gsmet please try this for your issue)

@normanmaurer (Member) commented:

PTAL @chrisvest @normanmaurer

Not very happy that I've added a new sys prop here, but I want to keep it safe for existing users... Anyone who needs to keep creating/stopping event loop threads (e.g. the dev mode in Quarkus) can decide to avoid finalization and just set it to speed up dropping the previous classloader (@gsmet please try this for your issue)

That's good to me

@franz1981 (Contributor, Author) commented:

Let's see what the CI thinks @normanmaurer, and thanks again!

@gsmet commented Jul 4, 2024

Let me have a look for my use case before merging. Thanks!

@gsmet commented Jul 4, 2024

I can confirm that I don't see the finalizers trying to load classes after my CL is closed, which looks promising!

gsmet added a commit to gsmet/quarkus that referenced this pull request Jul 4, 2024
…eads

This is going to be useful once
netty/netty#14155 lands in Quarkus.
@franz1981 (Contributor, Author) commented:

So @normanmaurer, now we just need CI and @chrisvest to be happy about this

@franz1981 (Contributor, Author) commented:

Windows seems to be the only unhappy one, likely for unrelated reasons; if you're good to go, I'm done here @normanmaurer

@normanmaurer (Member) commented:

I am happy if @chrisvest is happy... @chrisvest PTAL again

@chrisvest (Contributor) left a comment

Looks good.

@franz1981 (Contributor, Author) commented:

@normanmaurer I can merge it myself, ok?
My first time merging a PR here (no jokes!)

@normanmaurer (Member) commented:

@franz1981 sure... let's go. Please also cherry-pick it to the other related branches.

@normanmaurer normanmaurer added this to the 4.1.112.Final milestone Jul 9, 2024
@franz1981 franz1981 merged commit 8039232 into netty:4.1 Jul 9, 2024
17 checks passed
franz1981 added a commit that referenced this pull request Jul 9, 2024
…ect (#14155)

normanmaurer pushed a commit to yawkat/netty that referenced this pull request Jul 17, 2024
…ect (netty#14155)
