LinkedHashMap memory leak #3260

eduveks · 2021-03-06T16:47:19Z

Hello,

I'm facing memory leak of many LinkedHashMap with GraalVM.

This behaviour is not new because in the last years I am using GraalVM always seeing memory issues, but now is more critical because we are using in situations with much more stress overall.

Anyway GraalVM is awesome, thank you a lot by all your work. 👏

Hope this report can help to improve.

Describe GraalVM and your environment :

GraalVM version or commit id if built from source: 21.0.0
CE or EE: CE
JDK version: 11
OS and OS Version: macOS Big Sur
Architecture: amd64
The output of java -Xinternalversion:

OpenJDK 64-Bit Server VM (11.0.10+8-jvmci-21.0-b06) for bsd-amd64 JRE (11.0.10+8-jvmci-21.0-b06), built on Jan 15 2021 00:28:57 by "graal1" with gcc 4.2.1 Compatible Apple LLVM 10.0.0 (clang-1000.11.45.5)

The text was updated successfully, but these errors were encountered:

eduveks · 2021-03-06T16:49:42Z

After more 20 minutes looks like this:

munishchouhan · 2021-03-10T04:22:28Z

@eduveks thanks for reporting the issue
we will check it and get back to you

chumer · 2021-03-29T11:10:44Z

@eduveks thanks for the report!

Are you reloading classes e.g. using a custom host class loader or some application server? It looks to me as if our host member/method lookup cache seems to be missed a lot.

If this is true then the current behavior might be not ideal but is expected. We currently do reference bound host methods strongly. A workaround is to recreate the Engine (or Context if you don't use an explicit Engine) to refresh that cache. Note that this also refreshes all the compiled code. If this turns out to be the case, then I could have another look on improving this, if you provide a small example of your usage pattern.

If you are not reloading classes in your host application then it would be immensely helpful if you could provide a sample application that reproduces this leak reliably or, less preferrably, a heap dump (hprof) file of the leak.

Thanks!

eduveks · 2021-03-30T16:34:16Z

Hello @chumer,

I did another round of stress test, with screenshots below and here the hprof file made a few minutes ago:

https://netuno.org/heapdump-1617120295618.hprof.zip

Yes, it is an application server, the Netuno platform, with a Jetty server inside and runs over GraalVM.

You can check out Netuno install is straightforward, only need to execute one command in the terminal.

The installation will download and install GraalVM too inside the netuno folder.

Then here an example of how to create services.

NodeJS inside GraalVM execute the javascript server-side (services), then the script code like _out.json, _db.query and much more, they will call Java methods.

Each request executes multiples scripts (JavaScript) over the same Class that initializes the HostAccess, ContextBuilder, and then the Context, but in the same request may create more than one Context. But always all Contexts initialized will be closed in the end of the request with context.leave() and context.close(true).

Multi-threads do not use the same HostAccess and ContextBuilder, each thread/request has your own org.graalvm.polyglot.* instances.

If you need more details just say.

Thank you!

chumer · 2021-04-06T13:24:53Z

I downloaded the heap dump and computed the retained size and the retained size looks like this:

So it looks like if all of the references can actually be freed.
So I don't think there is anything leaking in that heap dump.
Why you are seeing the GC pressure that you are seeing, I cannot really explain.
Did you try forcing a full GC with System.gc form time to time?

Each request executes multiples scripts (JavaScript) over the same Class that initializes the HostAccess, ContextBuilder, and then the Context, but in the same request may create more than one Context. But always all Contexts initialized will be closed in the end of the request with context.leave() and context.close(true).

Two remarks here:

Do not use context.close(true) if you don't need to. Always prefer to use context.close() unless you really want to cancel the execution.
I am seeing a lot of memory pressure due to many contexts created that do a lot of host interop. A lot of this work can be cached by using code sharing between multiple contexts[1]. Maybe you should try that? Should give you a significant improvement, and probably also get rid of this issue. (although we should get to the root cause of this)

[1] https://www.graalvm.org/reference-manual/embed-languages/#code-caching-across-multiple-contexts

eduveks · 2021-04-07T12:09:18Z

I think that I already tested the implementation with code caching across multiple contexts in the past.

But yesterday, I did some tests changing the implementation to supports the same Engine across all threads and with one Engine per thread.

The first impression in both cases was memory ate fast and very slow performance, scripts executed before in 300 milliseconds jumped to 20 seconds, CPU with high usage too.

I will try to do more tests within the next few days to try to give more details.

Yes, periodically, we force the garbage collector with:

System.gc();
System.runFinalization();

Sample of the first results with the Engine:

chumer · 2021-04-07T12:18:06Z

Any chance you can give me something runnable that reproduces this problem?

eduveks · 2021-04-07T15:53:05Z

Any chance you can give me something runnable that reproduces this problem?

Yes, I think. I don't have time today, but I will give you more news in the next few days.

thurka · 2021-04-08T07:52:08Z

I think I know what is going on here. I was investigating, why there are so many instances of java.lang.reflect.Method in the heap dump. I tracked almost all the references to various instances of java.lang.ClassValue$ClassValueMap and those instances does not have any references so it looks like they can be collected by GC, but instances of java.lang.ClassValue$ClassValueMap are references from java.lang.Class.classValueMap field so they are held by the particular class, but this information is not dumped to hprof file. :-(
The question is why java.lang.ClassValue$ClassValueMap holds so many java.lang.reflect.Method. I don't know. If you look at the source code of ClassValueMap, there is some caching, so it might be one reason. I also saw this: https://stackoverflow.com/questions/62593886/java-memory-leak-why-are-unreachable-objects-not-collected This looks very similar to our case and in addition it uses scripting API to call javascript.
I have netuno server running with some demo application and just randomly clicking pages, reloading pages etc., I am able to see steady rise of instances of Method.

thurka · 2021-04-08T07:54:32Z

This is example of references from Method to ClassValueMap.

thurka · 2021-04-08T08:07:56Z

Small heap dump for investigation: GH-3260_heapdump.zip .

thurka · 2021-04-08T09:30:39Z

Steps to reproduce:

get off Cisco VPN
set JAVA_HOME to JDK 11 (installation below needs nashorn, so it will not work on JDK 15+)
install netuno in /tmp

thurka$ cd /tmp
thurka$ mkdir -p netuno && cd netuno && curl https://www.netuno.org/netuno.jar > netuno.jar && java -jar netuno.jar install

add node to PATH

thurka$ export PATH=/tmp/netuno/graalvm/bin/:$PATH

start netuno server with the default application via ./netuno server
fill in email address
it will open web browser in Portuguese locale(???), you have to manually register the netuno server
it will take some time, since NPM Install is running for the first time
play with application
start VisualVM and open notuno process
switch to Sampler tab and select Memory sampling
sort data by Live Object
look for number of java.lang.reflect.Method instances
the number is steadily climbing while I play with netuno app in browser
you can use Show delta values button to see deltas in live object count
you can invoke GC directly from Memory sampler view to free unreferenced instances

chumer · 2021-04-08T11:09:43Z

I did not yet run your code yet, but I think I found a problematic pattern in our code:

public class Test {
	
	/*
	 * Change to false to avoid the bug.
	 */
	private static boolean CYCLIC_REFERENCE = true;

	private static class HeavyValue {
		
		final Context context;
		final byte[] bytes = new byte[1024 * 16];
		
		HeavyValue(Context context) {
			// the context assignment
			if (CYCLIC_REFERENCE) {
				this.context = context;
			} else {
				this.context = null;
			}
		}
		
	}
	
	
	private static class Context {
		
		final ClassValue<HeavyValue> values = new ClassValue<Test.HeavyValue>() {
			@Override
			protected HeavyValue computeValue(Class<?> type) {
				return new HeavyValue(Context.this);
			}
		};
		
		
	}
	
	
	static class TestClass1 {
		
		}
	static class TestClass2 {
			
		}
	static class TestClass3 {
		
	}
	static class TestClass4 {
		
	}
	static class TestClass5 {
		
	}
	static class TestClass6 {
		
	}

	private static final Class<?>[] TEST_CLASSES = new Class[] {
			TestClass1.class,
			TestClass2.class,
			TestClass3.class,
			TestClass4.class,
			TestClass5.class,
			TestClass6.class
	};
	
	
	private static int test() {
		Context context =  new Context();
		
		int hash = 31;
		for (Class<?> testClass : TEST_CLASSES) {
			hash += context.values.get(testClass).bytes.length;	
		}
		return hash;
	}
	
	
	public static void main(String[] args) {
		int hash = 31;
		for (int i = 0; i < 100000; i++) {
			hash += test();
		}
		System.out.println(hash);
		
	}
	
}

Switching that cyclic reference flag to false fixes the problem (tested on latest JDK 8 on Mac). But this should really not happen, as such cyclic references should be resolvable by the GC. I will double check and maybe file a JDK bug. We probably need to deploy a workaround for this.

I need to do some more digging to find out what is going on.

chumer · 2021-04-08T11:28:01Z

Turns out this is a known issue:
https://bugs.openjdk.java.net/browse/JDK-8169425

thurka · 2021-04-08T11:29:32Z

It looks like every page reload in the web app adds several instances of com.oracle.truffle.polyglot.HostClassCache.

chumer · 2021-04-08T13:46:47Z

I could reproduce it with a simple example now locally.
It looks like there is a simple workaround. You need to reuse the same HostAccess instance for all contexts.
I am working on a fix that avoids a the Cyclic dependency.

chumer · 2021-04-08T15:01:25Z

Fix is in the pipeline. Will backport this to 21.1 and probably also 20.3 (need to double check)

eduveks · 2021-04-09T12:53:48Z

Thank you, @thurka, for your inputs, and I'm happy you could found the issue with Netuno.

@chumer the workaround reusing the HostAccess across all contexts does not affect overall performance, but in this case, the memory leak still happens. See the print screen below:

To complement the @thurka tip to run locally, if install and start Netuno just with ./netuno server based on the default configs will run the demo app, then it is possible to execute Apache BenchMark (web stress) over one of these services:

ab -n 100 -c 5 http://127.0.0.1:9000/services/samples/javascript/print-template
ab -n 100 -c 5 http://127.0.0.1:9000/services/samples/javascript/date-format
ab -n 100 -c 5 http://127.0.0.1:9000/services/samples/javascript/query-result

The first one, print-template, seems to provoke a higher peak.

This test with Apache BenchMark and the demo application are very different from all tests I had reported before, but I think it is over the same internal issue.

My results with Apache BenchMark executed over 5 times:

chumer · 2021-04-09T13:03:10Z

@eduveks This is really weird, in my local tests using the same HostAccess instance for all contexts resolved the problem. As the cache is bound and reused for HostAccess instances. So it should be impossible to have the same method loaded twice. Are you sure you are not just building an equal HostAccess instance? To make sure, can you read the HostAccess instance from a static final? Maybe you can drop a snippet how you are doing that atm, so I can plug it into my test?

eduveks · 2021-04-09T13:26:52Z

@chumer I did another test to ensure and is the same behavior with HostAccess as static.

This is the current full code of the integration with GraalVM:

package org.netuno.psamata.script;

import org.graalvm.polyglot.*;
import org.graalvm.polyglot.Context.Builder;

import java.util.*;

public class GraalRunner {
    private static boolean graal = false;
    
    //private static Engine engine = Engine.create();
    
    private static HostAccess hostAccess = null;

    private Builder contextBuilder = null;
    
    private Context context = null;
    
    private List<Context> oldContexts = new ArrayList<>();

    static {
        try {
            Context.create().close();
            
            hostAccess = HostAccess.newBuilder()
                .allowPublicAccess(true)
                .targetTypeMapping(
                        Value.class, Object.class,
                        (v) -> v.hasArrayElements(),
                        (v) -> transformArray(v)
                ).targetTypeMapping(
                        Value.class, List.class,
                        (v) -> v.hasArrayElements(),
                        (v) -> transformArray(v)
                ).targetTypeMapping(
                        Value.class, Collection.class,
                        (v) -> v.hasArrayElements(),
                        (v) -> transformArray(v)
                ).targetTypeMapping(
                        Value.class, Iterable.class,
                        (v) -> v.hasArrayElements(),
                        (v) -> transformArray(v)
                )
                .build();
            
            graal = true;
        } catch (Throwable t) {
            graal = false;
        }
    }

    public GraalRunner(String... permittedLanguages) {
        init(null, permittedLanguages);
    }
    
    public GraalRunner(Map<String, String> options, String... permittedLanguages) {
        init(options, permittedLanguages);
    }
    
    private void init(Map<String, String> options, String... permittedLanguages) {
        //engine = Engine.create();
        
        contextBuilder = Context.newBuilder(permittedLanguages)
                .allowExperimentalOptions(true)
                .allowIO(true)
                .allowAllAccess(true)
                .allowCreateThread(true)
                .allowHostAccess(hostAccess);
        
        //https://www.graalvm.org/truffle/javadoc/org/graalvm/polyglot/ResourceLimits.html
        //contextBuilder.resourceLimits(new ResourceLimits())
        
        if (options != null) {
            contextBuilder.options(options);
        }
        
        //contextBuilder.engine(engine);

        context = contextBuilder.build();
    }
    
    public void resetContext() {
        if (context != null) {
            oldContexts.add(context);
        }
        context = contextBuilder.build();
    }
    
    public void close() {
        for (Context oldContext : oldContexts) {
            try {
                oldContext.leave();
            } catch (Exception e) { }
            oldContext.close();
        }
        oldContexts.clear();
        oldContexts = null;
        try {
            context.leave();
        } catch (Exception e) { }
        context.close();
        context = null;
        //engine.close();
    }

    public static boolean isGraal() {
        return graal;
    }

    public GraalRunner set(String language, String var, Object value) {
        context.getBindings(language).putMember(var, value);
        return this;
    }

    public Set<String> keys(String language) {
        return context.getBindings(language).getMemberKeys();
    }

    public Object get(String language, String var) {
        return toObject(context.getBindings(language).getMember(var));
    }

    public boolean getBoolean(String language, String var) {
        return context.getBindings(language).getMember(var).asBoolean();
    }

    public byte getByte(String language, String var) {
        return context.getBindings(language).getMember(var).asByte();
    }

    public double getDouble(String language, String var) {
        return context.getBindings(language).getMember(var).asDouble();
    }

    public float getFloat(String language, String var) {
        return context.getBindings(language).getMember(var).asFloat();
    }

    public int getInt(String language, String var) {
        return context.getBindings(language).getMember(var).asInt();
    }

    public long getLong(String language, String var) {
        return context.getBindings(language).getMember(var).asLong();
    }

    public int getShort(String language, String var) {
        return context.getBindings(language).getMember(var).asShort();
    }

    public String getString(String language, String var) {
        return context.getBindings(language).getMember(var).asString();
    }

    public Value eval(String language, String code) {
        try {
            context.enter();
        } catch (Exception e) { }
        try {
            return context.eval(language, code);
        } finally {
            try {
                context.leave();
            } catch (Exception e) { }
        }
    }

    public static Map transformMembers(Value v) {
        Map map = new HashMap();
        for (String key : v.getMemberKeys()) {
            Value member = v.getMember(key);
            if (member.hasArrayElements() && !member.isHostObject()) {
                map.put(key, transformArray(member));
            } else if (member.hasMembers() && !member.isHostObject()) {
                map.put(key, transformMembers(member));
            } else {
                map.put(key, toObject(member));
            }
        }
        return map;
    }

    public static List transformArray(Value v) {
        List list = new ArrayList();
        for (int i = 0; i < v.getArraySize(); ++i) {
            Value element = v.getArrayElement(i);
            if (element.hasArrayElements() && !element.isHostObject()) {
                list.add(transformArray(element));
            } else if (element.hasMembers() && !element.isHostObject()) {
                list.add(transformMembers(element));
            } else {
                list.add(toObject(element));
            }
        }
        return list;
    }

    public static Object toObject(Value v) {
        if (v.isNull()) {
            return null;
        } else if (v.isHostObject()) {
            return v.asHostObject();
        } else if (v.isProxyObject()) {
            return v.asProxyObject();
        } else if (v.isBoolean()) {
            return v.asBoolean();
        } else if (v.isNumber()) {
            if (v.fitsInByte()) {
                return v.asByte();
            }
            if (v.fitsInShort()) {
                return v.asShort();
            }
            if (v.fitsInInt()) {
                return v.asInt();
            }
            if (v.fitsInLong()) {
                return v.asLong();
            }
            if (v.fitsInFloat()) {
                return v.asFloat();
            }
            if (v.fitsInDouble()) {
                return v.asDouble();
            }
        } else if (v.isString()) {
            return v.asString();
        } else {
            Value value = v.getMetaObject();
            if (value.fitsInByte()) {
                return value.asByte();
            }
            if (value.fitsInShort()) {
                return value.asShort();
            }
            if (value.fitsInInt()) {
                return value.asInt();
            }
            if (value.fitsInLong()) {
                return value.asLong();
            }
            if (value.fitsInFloat()) {
                return value.asFloat();
            }
            if (value.fitsInDouble()) {
                return value.asDouble();
            }
            try {
                return value.asBoolean();
            } catch (Exception e) {
                try {
                    return value.asString();
                } catch (Exception ex) {
                    return null;
                }
            }
        }
        return null;
    }
}

chumer · 2021-04-12T13:56:09Z

Any reason why you are doing:

try {
            context.enter();
        } catch (Exception e) { }
        try {
            return context.eval(language, code);
        } finally {
            try {
                context.leave();
            } catch (Exception e) { }
        }

With enter/leave? Any context operation should do an implicit enter/leave. In this case doing the enter and leave explicitely is actually slower.

Missing a leave on a particular thread can also potentially cause another leak. Is there any multi-threading happening for a GraalRunner instance?

eduveks · 2021-04-12T16:53:52Z

@chumer, my apologies, I detect a maven malfunction that did not change my code version in all of my tests. 🤦‍♂️

Finally, it is correct. Your workaround sharing the HostAccess works very well!

Sorry for my mistake.

This is the result of many stress, my hardest test, rolling smooth: 🥳

Thank you very much! 🍻

eduveks · 2021-04-12T17:01:02Z

To clarify, I commented all context.enter() and context.leave(). I do not remember why but I think it was an attempt to avoid the leak.

Yes, just one instance of the GraalRunner per thread always.

And again, thank you for your support.

eduveks · 2021-04-12T18:09:59Z

With this, I remembered making another test sharing the same Engine with all threads, as @chumer said.

And the performance jumped to another level, much less memory and CPU consumption overall, even with a lot of stress is like nothing is going on.

Then the key is to share Engine and HostAccess over all threads. 🚀

Finally, the killer result:

chumer · 2021-04-14T11:47:42Z

Happy to hear it all works out now as intended! :-)

eduveks added the truffle label Mar 6, 2021

munishchouhan self-assigned this Mar 10, 2021

munishchouhan assigned chumer and unassigned munishchouhan Mar 22, 2021

eduveks closed this as completed Apr 13, 2021

eduveks mentioned this issue Jun 30, 2021

Passing JS Arrays to Java object methods seem always to be returned as TruffleMap oracle/graaljs#3

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

LinkedHashMap memory leak #3260

LinkedHashMap memory leak #3260

eduveks commented Mar 6, 2021

eduveks commented Mar 6, 2021

munishchouhan commented Mar 10, 2021

chumer commented Mar 29, 2021

eduveks commented Mar 30, 2021 •

edited

chumer commented Apr 6, 2021

eduveks commented Apr 7, 2021

chumer commented Apr 7, 2021

eduveks commented Apr 7, 2021

thurka commented Apr 8, 2021 •

edited

thurka commented Apr 8, 2021

thurka commented Apr 8, 2021

thurka commented Apr 8, 2021

chumer commented Apr 8, 2021

chumer commented Apr 8, 2021

thurka commented Apr 8, 2021

chumer commented Apr 8, 2021 •

edited

chumer commented Apr 8, 2021

eduveks commented Apr 9, 2021

chumer commented Apr 9, 2021

eduveks commented Apr 9, 2021

chumer commented Apr 12, 2021

eduveks commented Apr 12, 2021

eduveks commented Apr 12, 2021

eduveks commented Apr 12, 2021

chumer commented Apr 14, 2021

LinkedHashMap memory leak #3260

LinkedHashMap memory leak #3260

Comments

eduveks commented Mar 6, 2021

eduveks commented Mar 6, 2021

munishchouhan commented Mar 10, 2021

chumer commented Mar 29, 2021

eduveks commented Mar 30, 2021 • edited

chumer commented Apr 6, 2021

eduveks commented Apr 7, 2021

chumer commented Apr 7, 2021

eduveks commented Apr 7, 2021

thurka commented Apr 8, 2021 • edited

thurka commented Apr 8, 2021

thurka commented Apr 8, 2021

thurka commented Apr 8, 2021

chumer commented Apr 8, 2021

chumer commented Apr 8, 2021

thurka commented Apr 8, 2021

chumer commented Apr 8, 2021 • edited

chumer commented Apr 8, 2021

eduveks commented Apr 9, 2021

chumer commented Apr 9, 2021

eduveks commented Apr 9, 2021

chumer commented Apr 12, 2021

eduveks commented Apr 12, 2021

eduveks commented Apr 12, 2021

eduveks commented Apr 12, 2021

chumer commented Apr 14, 2021

eduveks commented Mar 30, 2021 •

edited

thurka commented Apr 8, 2021 •

edited

chumer commented Apr 8, 2021 •

edited