There is a serious bug in Spring Web MVC, introduced in Spring 5, which causes the ExceptionHandlerMethodResolver (the class responsible for resolving @ControllerAdvice/@RestControllerAdvice exception handler methods) to lose all exception handler method associations under heavy memory/GC pressure.
We are currently using Spring 5 RC3 in an application, and after about one day of runtime the exception handler methods in our @ControllerAdvice class are no longer called. They are called correctly right after the application starts, so there is no general issue with the code setup.
The cause is the mappedMethods field in ExceptionHandlerMethodResolver, which changed in Spring 5 from a ConcurrentHashMap to a soft/weak reference map. The problem is that the garbage collector clears the soft/weak references held in this map under heavy memory load. After that, no exception handler method is ever called again, and the server responds with HTTP status code 500 because the exception is caught by the upper-most servlet handler.
We also noticed that the ExceptionHandlerMethodResolver is built and populated by the ExceptionHandlerExceptionResolver. HOWEVER, there it is held in a strongly referenced ConcurrentHashMap. As a result, the ExceptionHandlerExceptionResolver does NOT lose the ExceptionHandlerMethodResolver itself, BUT the methods in the soft/weak map inside the ExceptionHandlerMethodResolver are cleared once the GC performs a full cycle. Because the emptied resolver instance is retained, its mappings are never rebuilt.
The issue is easy to reproduce. All that is needed is a simple Web MVC project with a @ControllerAdvice annotated class containing an @ExceptionHandler(MyException.class) annotated method, plus a thread that keeps allocating memory up to the point where an OutOfMemoryError would occur and then releases that memory.
Please fix this by at least making the field ExceptionHandlerMethodResolver.mappedMethods not a soft/weak map anymore (the exceptionLookupCache field may of course be weak/soft, since it is a cache).
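To illustrate the split I am asking for, here is a minimal sketch that does not use Spring at all. The class name StrongMappingResolverSketch, its methods, and the use of Function as a stand-in for a handler method are all hypothetical and only serve the illustration. The real resolver also picks the closest matching exception type in a more elaborate way. The point is only that the registrations live in a strongly referenced map, while the derived lookup cache holds soft references and is rebuilt on demand.

```java
import java.lang.ref.SoftReference;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical, simplified stand-in for a handler resolver (not Spring code).
public class StrongMappingResolverSketch {

    // Authoritative registrations: strongly referenced, so the GC can never drop them.
    private final Map<Class<? extends Throwable>, Function<Throwable, String>> mappedMethods =
            new ConcurrentHashMap<>();

    // Derived lookups: soft values are acceptable here because a cleared
    // entry can always be recomputed from mappedMethods.
    private final Map<Class<? extends Throwable>, SoftReference<Function<Throwable, String>>> lookupCache =
            new ConcurrentHashMap<>();

    public void register(Class<? extends Throwable> type, Function<Throwable, String> handler) {
        mappedMethods.put(type, handler);
        lookupCache.clear(); // cached answers may be stale after a new registration
    }

    public Function<Throwable, String> resolve(Class<? extends Throwable> type) {
        SoftReference<Function<Throwable, String>> cached = lookupCache.get(type);
        Function<Throwable, String> handler = (cached != null) ? cached.get() : null;
        if (handler == null) {
            // Cache miss or cleared soft reference: fall back to the strong map.
            handler = findInMappings(type);
            if (handler != null) {
                lookupCache.put(type, new SoftReference<>(handler));
            }
        }
        return handler;
    }

    // Walks up the exception class hierarchy until a registered type is found.
    private Function<Throwable, String> findInMappings(Class<?> type) {
        for (Class<?> c = type; c != null && Throwable.class.isAssignableFrom(c); c = c.getSuperclass()) {
            Function<Throwable, String> handler = mappedMethods.get(c);
            if (handler != null) {
                return handler;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        StrongMappingResolverSketch resolver = new StrongMappingResolverSketch();
        resolver.register(RuntimeException.class, ex -> "handled " + ex.getMessage());
        // IllegalStateException is a RuntimeException, so the registered handler applies.
        System.out.println(resolver.resolve(IllegalStateException.class)
                .apply(new IllegalStateException("boom")));
    }
}
```

With this layout, a GC cycle that clears the cache only costs one extra lookup, and the handler registrations themselves survive.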
Please find attached a minimal Spring Boot example with a JUnit test that exactly reproduces the bug.
To reproduce the bug, I have to force a full GC cycle. I do this by allocating memory repeatedly until an OOME occurs, catching it (yes, I know you should not do this, but it is necessary for the test setup :) ), and then freeing the memory.
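For readers who do not want to open the attachment, the memory pressure part can be sketched roughly as follows. This is not the attached test. The class name SoftReferencePressureDemo, the method exhaustHeapOnce and the 8 MiB chunk size are made up for illustration, and a plain SoftReference stands in for an entry of the soft/weak map. It relies on the documented guarantee that all soft references to softly reachable objects are cleared before the JVM throws an OutOfMemoryError.

```java
import java.lang.ref.SoftReference;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class, not part of the attached project.
public class SoftReferencePressureDemo {

    // Allocates chunks until the JVM throws OutOfMemoryError, then releases them.
    public static void exhaustHeapOnce() {
        List<byte[]> hog = new ArrayList<>();
        try {
            while (true) {
                hog.add(new byte[8 * 1024 * 1024]); // 8 MiB per chunk (arbitrary choice)
            }
        } catch (OutOfMemoryError expected) {
            // Test-only trick: production code should not catch OutOfMemoryError.
            hog.clear(); // drop the chunks so the heap is usable again
        }
    }

    public static void main(String[] args) {
        // Stand-in for a handler method entry that is only softly reachable.
        SoftReference<Object> handlerRef = new SoftReference<>(new Object());
        System.out.println("cleared before pressure: " + (handlerRef.get() == null));

        exhaustHeapOnce();

        // The referent had no strong references, so it must be gone by now.
        System.out.println("cleared after pressure: " + (handlerRef.get() == null));
    }
}
```

Running this with a small heap (for example -Xmx64m) keeps the run short. In the real application the same effect hits the handler mappings, only it takes much longer to get there.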