Memory leak when using Lingua in web applications #116

dnb-erik-brangs · 2021-11-30T11:50:15Z

As discussed in #110, there seems to be a memory leak when using Lingua in a Java web application. I've uploaded an example application at https://github.com/deutsche-nationalbibliothek/lingua-reproducer-memory-leak . The README contains some information about the problem. Please let me know if you need more information.

pemistahl · 2021-12-04T20:23:41Z

Thank you, Erik, for this example application. I was able to reproduce the OutOfMemoryError by redeploying the app in Jetty multiple times. I don't know yet how to solve the coroutine problem. But I found out that memory consumption is significantly reduced when the maps holding the loaded language models are cleared before the servlet is destroyed. I've written a new method unloadLanguageModels() for this purpose.

In your servlet, try the following:

```java
@Override
public void destroy() {
    super.destroy();
    detectorForAllLanguages.unloadLanguageModels();
    detectorForAllLanguages = null;
}
```

The consumed memory still increases with each redeployment, but the increase is much smaller now. Unfortunately, the number of open threads is still very high after several redeployments. I will continue to investigate how to optimize the coroutines or whether it makes sense to replace them with simple Java Futures.

According to this issue, the coroutines' default dispatcher creates a worker internally which is never shut down. This could be the source of the problem.

dnb-erik-brangs · 2021-12-06T13:37:20Z

Thanks.

The API description for the CoroutineDispatcher says that

> An arbitrary java.util.concurrent.Executor can be converted to a dispatcher with the asCoroutineDispatcher extension function.

It seems to me that it might be possible to use a custom CoroutineDispatcher backed by a custom java.util.concurrent.Executor, e.g. a ThreadPoolExecutor. The application could then shut down the executor itself on undeploy, e.g. by calling shutdown(). However, I don't know whether that would be a good idea.
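A minimal sketch of the lifecycle described above, in plain Java: the application owns a ThreadPoolExecutor and shuts it down itself on undeploy. The class name `AppExecutorHolder` and its methods are hypothetical, and the pool size of 2 and the 5-second grace period are arbitrary assumptions. Wrapping this executor with `asCoroutineDispatcher()` on the Kotlin side is not shown.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical holder for an application-owned executor (not part of Lingua).
// A Kotlin caller could wrap executor() with asCoroutineDispatcher();
// this class only shows the lifecycle the application would control.
class AppExecutorHolder {
    private final ThreadPoolExecutor executor = new ThreadPoolExecutor(
            2, 2, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<>());

    ExecutorService executor() {
        return executor;
    }

    // Intended to be called from the container's shutdown hook, e.g. Servlet.destroy().
    void shutdown() throws InterruptedException {
        executor.shutdown(); // stop accepting new tasks, let queued ones finish
        if (!executor.awaitTermination(5, TimeUnit.SECONDS)) {
            executor.shutdownNow(); // interrupt tasks that are still running
        }
    }
}
```

In a servlet, `shutdown()` would be called from `destroy()`, so that no worker threads survive the redeployment.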

pemistahl · 2021-12-07T14:00:35Z

After doing some experiments, I've found that coroutines do not bring much value to the library. I've replaced them with a fixed thread pool that executes a bunch of Callables, and it turns out that this performs just as well. Additionally, I've renamed the method unloadLanguageModels() to destroy(), which shuts down the thread pool and removes the language models from memory. This way, memory consumption no longer increases after five or six redeployments on my machine.
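To illustrate the approach described above (a fixed thread pool running one Callable per task, plus a destroy() that both shuts down the pool and drops the loaded models), here is a minimal sketch. It is not Lingua's actual implementation: `PooledDetector`, its string "models" and its placeholder scoring are hypothetical stand-ins.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical stand-in for a detector; NOT Lingua's actual code.
class PooledDetector {
    // Simulated "language models": language name -> model data
    private final Map<String, String> models = new ConcurrentHashMap<>();
    private final ExecutorService pool =
            Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());

    PooledDetector(List<String> languages) {
        for (String lang : languages) {
            models.put(lang, "model-for-" + lang);
        }
    }

    // Computes one (dummy) score per language in parallel, one Callable per language.
    Map<String, Integer> scores(String text) throws InterruptedException, ExecutionException {
        List<String> langs = new ArrayList<>(models.keySet());
        List<Callable<Integer>> tasks = new ArrayList<>();
        for (String lang : langs) {
            tasks.add(() -> text.length() + lang.length()); // placeholder scoring
        }
        // invokeAll returns futures in the same order as the task list
        List<Future<Integer>> futures = pool.invokeAll(tasks);
        Map<String, Integer> result = new HashMap<>();
        for (int i = 0; i < langs.size(); i++) {
            result.put(langs.get(i), futures.get(i).get());
        }
        return result;
    }

    // Shuts down the pool's threads and drops the loaded models.
    void destroy() {
        pool.shutdown();
        models.clear();
    }

    int loadedModelCount() {
        return models.size();
    }

    boolean isShutdown() {
        return pool.isShutdown();
    }
}
```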

Can you please try for yourself? If you can live with the current improvements, I will close this issue and prepare release 1.1.1. Thank you.

dnb-erik-brangs · 2021-12-07T15:28:11Z

Thank you. The memory leaks and thread leaks are now fixed.

However, the current implementation of thread pools is not ideal for web applications. Web applications generally profit from being able to manage the thread pools (and their life cycle) themselves. For example, Runtime.getRuntime().availableProcessors() may return "high" numbers like 24, 48 or 64. With a fixed thread pool, those threads are all kept around even if they don't have any work. It's unclear to me if that many threads are needed.

On Jakarta EE or Java EE application servers, it may also be desirable to use other ExecutorService implementations (e.g. a ManagedExecutorService configured via the server).
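One way to let the container own the threads, sketched under the assumption that a component accepts any `java.util.concurrent.ExecutorService` in its constructor. `ExternallyPooledScorer` is hypothetical and does not reflect Lingua's API. Because a Jakarta EE `ManagedExecutorService` is itself an `ExecutorService`, a server-configured instance could be passed in; in this design the component never shuts the executor down, since the caller owns its lifecycle.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

// Hypothetical component that runs its parallel work on an executor
// supplied by the caller instead of creating its own pool.
class ExternallyPooledScorer {
    private final ExecutorService executor; // owned by the caller, never shut down here

    ExternallyPooledScorer(ExecutorService executor) {
        this.executor = executor;
    }

    // Sums the lengths of the given words, one task per word.
    int totalLength(List<String> words) throws InterruptedException, ExecutionException {
        List<Callable<Integer>> tasks = new ArrayList<>();
        for (String w : words) {
            tasks.add(w::length);
        }
        int total = 0;
        for (Future<Integer> f : executor.invokeAll(tasks)) {
            total += f.get();
        }
        return total;
    }
}
```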

Should I open a new issue for that?

pemistahl · 2021-12-07T19:46:22Z

> Web applications generally profit from being able to manage the thread pools (and their life cycle) themselves.

Well, are you aware of the fact that coroutines have their own thread pool, too? Users of my library should not care about creating their own thread pool in order to use it. If you worry about too many idle threads, I can use a fixed thread pool that automatically terminates its threads after a certain amount of time. In any case, the parallel processing should remain an internal implementation detail that users do not have to care about.

I'm sure that my library can be properly used in web applications as well with decent performance. If you have a different opinion, then again please give me a concrete example where the internal thread pool is a bottleneck.

But yes, please open a new issue for that. This issue here is only about the memory leak problem which has been fixed. I want to release version 1.1.1 soon without any more additions. New features will have to wait until version 1.2.0 is released.

dnb-erik-brangs · 2021-12-08T09:43:50Z

> If you worry about too many idle threads, I can use a fixed thread pool that automatically terminates its threads after a certain amount of time.

Yes, that would be good. For example, you could use a ThreadPoolExecutor with allowCoreThreadTimeOut set to true.
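A minimal sketch of that suggestion: a fixed-size `ThreadPoolExecutor` whose core threads are allowed to time out, so an idle pool shrinks to zero threads. `IdleTimeoutPool` is a hypothetical helper name and the keep-alive value is up to the caller; note that `allowCoreThreadTimeOut(true)` throws `IllegalArgumentException` if the keep-alive time is not greater than zero.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical helper: a pool whose idle threads, including core threads,
// exit after the keep-alive time. keepAlive must be > 0.
final class IdleTimeoutPool {
    private IdleTimeoutPool() {}

    static ThreadPoolExecutor create(int threads, long keepAlive, TimeUnit unit) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                threads, threads,             // core == max, like a fixed-size pool
                keepAlive, unit,
                new LinkedBlockingQueue<>()); // unbounded queue, as in newFixedThreadPool
        pool.allowCoreThreadTimeOut(true);    // let idle core threads terminate too
        return pool;
    }
}
```

For example, `IdleTimeoutPool.create(Runtime.getRuntime().availableProcessors(), 60, TimeUnit.SECONDS)` (the 60 seconds being an arbitrary choice) still allows one thread per processor under load, but does not keep those threads alive once the pool has been idle for a minute.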

> I'm sure that my library can be properly used in web applications as well with decent performance.

I apologize. I obviously did not express myself clearly enough.

I was not talking about performance. I was thinking more about better integration with the container. I've created #119 for that.

pemistahl · 2021-12-12T10:17:41Z

> For example, you could use a ThreadPoolExecutor with allowCoreThreadTimeOut set to true.

Good idea. I've just implemented exactly that. Thanks for opening issue #119. I will deal with it when I prepare release 1.2.0. I'm closing this issue now as the problem is fixed.

pemistahl added a commit that referenced this issue Dec 4, 2021

Add method for unloading language models (#116)

0e91ee9

pemistahl added the bug Something isn't working label Dec 4, 2021

pemistahl added this to the Lingua 1.1.1 milestone Dec 4, 2021

pemistahl added a commit that referenced this issue Dec 7, 2021

Replace coroutines with fixed thread pool and futures (#116)

fd80c09

pemistahl added a commit that referenced this issue Dec 7, 2021

Replace coroutines with fixed thread pool and futures (#116)

f7c6f0e

pemistahl added a commit that referenced this issue Dec 12, 2021

Allow core thread timeout on thread pool (#116)

7ad60fd

pemistahl closed this as completed Dec 12, 2021
