Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DefaultMessageListenerContainer: CACHE_CONSUMER with dedicated connections for Bitronix transactions [SPR-16642] #21183

Open
spring-issuemaster opened this issue Mar 26, 2018 · 3 comments

Comments

@spring-issuemaster
Copy link
Collaborator

@spring-issuemaster spring-issuemaster commented Mar 26, 2018

Andrew Tulloch opened SPR-16642 and commented

If the error handling path in the DefaultMessageListenerContainer the JMS connection is closed and re-opened.

Combining this with CACHE_CONSUMER with bitronix results in an error from DualSessionWrapper due to the Connection being involved in multiple XA transactions during the close attempt, which bitronix rejects and DMLC logs, swallows and continues.

This results in the Connection not being closed and a new Connection obtained from the bitronix pool resulting in a gradual leak from the pool.

Utilising dedicated connections alleviates this and enables CACHE_CONSUMER to function without issue. So making this a configuration option to not use the shared connection whilst CACHE_* is enabled means we can avoid this path and only have a single thread using a dedicated connection avoiding the multiple active transaction close issue.


Affects: 5.0.4

Referenced from: pull request #1778

@spring-issuemaster

This comment has been minimized.

Copy link
Collaborator Author

@spring-issuemaster spring-issuemaster commented Mar 26, 2018

Juergen Hoeller commented

We generally expect XA transactions to be used with CACHE_NONE there, calling a dedicated XA-aware JMS connection pool instead, not locally caching those resources within DefaultMessageListenerContainer at all. Why does the local caching make much difference in addition to the Bitronix pool that you have in use already (maybe because of a lack of consumer caching within that pool)?

I wonder whether we could specifically address this: maybe identifying the rejected connection close attempt and reusing the original connection afterwards? (Isn't it odd for the Bitronix pool to reject such a close attempt to begin with?) In any case, giving up the shared connection at DMLC level means giving up a lot of the caching benefit, so I'd rather go for some alternative solution.

@spring-issuemaster

This comment has been minimized.

Copy link
Collaborator Author

@spring-issuemaster spring-issuemaster commented Mar 26, 2018

Andrew Tulloch commented

What we're seeing specifically with CACHE_NONE is significant blocking as Bitronix is synchronized in many places and it's configured to test the connections, with a few instances and of a reasonable number of threads this means most threads are just blocked due to time taken to execute the delete/create temporary queue on the broker that is the test on ActiveMQ/BTM.

So whilst we don't need caching in the DMLC for Session or Consumer, as you point out it caches those, it avoids the trip into the synchronized Bitronix pool and means we only end up going to the pool if we experience an error rather than every message as occurs with CACHE_NONE.

I'm not clear on what benefit we lose in terms of caching, this caches the connection, session and consumer as before, just it's a unique connection for that instance of the AsyncMessageListenerInvoker. There is the disadvantage that we've now have many more connections open because we're unable to use a shared connection to the broker.

I do find it odd it rejects the connection, the connection is in use by multiple transactions on separate threads (shared connection), so it can't be closed, but you'd hope it would be deferred and returned to the bitronix pool when those other transactions were completed. It however appears it is not and the DMLC has lots it's reference, so it's left open and not in the pool. Your suggestion of handling that error on close and retaining the connection could work.

We end up here with stack traces through these two lines in BTM:
https://github.com/bitronix/btm/blob/btm-2.1.3/btm/src/main/java/bitronix/tm/resource/common/TransactionContextHelper.java#L140
https://github.com/bitronix/btm/blob/btm-2.1.3/btm/src/main/java/bitronix/tm/resource/jms/DualSessionWrapper.java#L188

@spring-issuemaster

This comment has been minimized.

Copy link
Collaborator Author

@spring-issuemaster spring-issuemaster commented Apr 11, 2018

Andrew Tulloch commented

I think there's a couple of subtle details in the exception handling that may not be obvious. Initially looking at:
https://github.com/spring-projects/spring-framework/blob/master/spring-jms/src/main/java/org/springframework/jms/listener/AbstractPollingMessageListenerContainer.java#L249

It appears that the exception is re-thrown after we rollback the transaction, which would theoretically provide the same behaviour if the transaction was thrown in the commit call here:
https://github.com/spring-projects/spring-framework/blob/master/spring-jms/src/main/java/org/springframework/jms/listener/AbstractPollingMessageListenerContainer.java#L251

But when you dig down to AbstractPollingMessageListenerContainer you discover that only JMS Exceptions are re-thrown from there:
https://github.com/spring-projects/spring-framework/blob/master/spring-jms/src/main/java/org/springframework/jms/listener/AbstractPollingMessageListenerContainer.java#L330

So if for example you had an OptmisticLockingException from persisting to a database there it's not re-thrown meaning the rollback call never happens, but status was already set to rollbackonly here:
https://github.com/spring-projects/spring-framework/blob/master/spring-jms/src/main/java/org/springframework/jms/listener/AbstractPollingMessageListenerContainer.java#L325

So we actually reach the commit call to the transaction manager, but btm does seem to take this as a rollback as the status is rollback only.

We've noticed that in our consumer we were calling EntityManager#save(Object), which doesn't issue a flush, causing the flush to occur on the TransactionManager commit(status) call, this could result in the OptimisticLockingException in an unfortunate place without the check for JMSException and re-throw, so the container thinks we've had an infrastructure failure, when in fact we've just had some DB Update failure.

Changing this to EntityManager#saveAndFlush(Object) has reduced this massively as an persistence related exceptions now happen doReceiveAndExecute() and result in only JMS exceptions being rethrown and btm seeing a rollback-only status and performing a transaction rollback.

Not sure if this is quite intended to work this way, but it results in very different exception handling and outcomes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
1 participant
You can’t perform that action at this time.