Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Memory leak when using Lingua in web applications #116

Closed
dnb-erik-brangs opened this issue Nov 30, 2021 · 7 comments
Closed

Memory leak when using Lingua in web applications #116

dnb-erik-brangs opened this issue Nov 30, 2021 · 7 comments
Labels
bug Something isn't working
Milestone

Comments

@dnb-erik-brangs
Copy link

As discussed in #110 , there seems to be a memory leak when using Lingua in a Java web application. I've uploaded an example application at https://github.com/deutsche-nationalbibliothek/lingua-reproducer-memory-leak . The README contains some information about the problem. Please let me know if you need more information.

@pemistahl pemistahl added the bug Something isn't working label Dec 4, 2021
@pemistahl pemistahl added this to the Lingua 1.1.1 milestone Dec 4, 2021
@pemistahl
Copy link
Owner

pemistahl commented Dec 4, 2021

Thank you, Erik, for this example application. I was able to reproduce the OutOfMemoryError by redeploying the app in Jetty multiple times. I don't know yet how to solve the coroutine problem. But I found out that memory consumption is significantly reduced when the maps holding the loaded language models are cleared before the servlet is destroyed. I've written a new method unloadLanguageModels() for this purpose.

In your servlet, try the following:

@Override
public void destroy() {
    super.destroy();
    detectorForAllLanguages.unloadLanguageModels();
    detectorForAllLanguages = null;
}

The consumed memory still increases with each redeployment but the increase is much smaller now. The number of opened threads is still very high after several redeployments, unfortunately. I will continue to investigate how to optimize the coroutines or whether it makes sense to replace the coroutines with simple Java Futures.

According to this issue, the coroutines' default dispatcher creates a worker internally which is never shut down. This could be the source of the problem.

@dnb-erik-brangs
Copy link
Author

Thanks.

The API description for the CoroutineDispatcher says that

An arbitrary java.util.concurrent.Executor can be converted to a dispatcher with the asCoroutineDispatcher extension function.

It seems to me that it might be possible to use a custom CoroutineDispatcher which uses a custom java.util.concurrent.Executor, e.g. a ThreadPoolExecutor. The application could then manually shut down the executor on undeploy, e.h. by calling shutdown(). However, I don't know whether that would be a good idea.

@pemistahl
Copy link
Owner

After doing some experiments, I've found that coroutines do not bring much value to the library. I've replaced them with a fixed thread pool that executes a bunch of Callables. It turns out that this has the same performance. Additionally, I've renamed the method unloadLanguageModels() to destroy() which shuts down the thread pool and removes the language models from memory. This way, the memory consumption does not increase anymore after five or six redeployments on my machine.

Can you please try for yourself? If you can live with the current improvements, I will close this issue and prepare release 1.1.1. Thank you.

@dnb-erik-brangs
Copy link
Author

Thank you. The memory leaks and thread leaks are now fixed.

However, the current implementation of thread pools is not ideal for web applications. Web applications generally profit from being able to manage the thread pools (and their life cycle) themselves. For example, Runtime.getRuntime().availableProcessors() may return "high" numbers like 24, 48 or 64. With a fixed thread pool, those threads are all kept around even if they don't have any work. It's unclear to me if that many threads are needed.

On Jakarta EE or Java EE application servers, it may also be desirable to use other ExecutorService implementations (e.g. a ManagedExecutorService configured via the server).

Should I open a new issue for that?

@pemistahl
Copy link
Owner

pemistahl commented Dec 7, 2021

Web applications generally profit from being able to manage the thread pools (and their life cycle) themselves.

Well, are you aware of the fact that coroutines have their own thread pool, too? Users of my library should not care about creating their own thread pool in order to use it. If you worry about too many idle threads, I can use a fixed thread pool that automatically terminates its threads after a certain amount of time. In any case, the parallel processing should remain an internal implementation detail that users do not have to care about.

I'm sure that my library can be properly used in web applications as well with decent performance. If you have a different opinion, then again please give me a concrete example where the internal thread pool is a bottleneck.

But yes, please open a new issue for that. This issue here is only about the memory leak problem which has been fixed. I want to release version 1.1.1 soon without any more additions. New features will have to wait until version 1.2.0 is released.

@dnb-erik-brangs
Copy link
Author

If you worry about too many idle threads, I can use a fixed thread pool that automatically terminates its threads after a certain amount of time.

Yes, that would be good. For example, you could use a ThreadPoolExecutor with allowCoreThreadTimeOut set to true.

I'm sure that my library can be properly used in web applications as well with decent performance.

I apologize. I obviously did not express myself clear enough.

I was not talking about performance. I was thinking more about better integration with the container. I've created #119 for that.

@pemistahl
Copy link
Owner

For example, you could use a ThreadPoolExecutor with allowCoreThreadTimeOut set to true.

Good idea. I've just implemented exactly that. Thanks for opening issue #119. I will deal with it when I prepare release 1.2.0. I'm closing this issue now as the problem is fixed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants