
JdbcChannelMessageStore: poor performance for long queue of messages #2629

Closed
kamil-gawlik opened this issue Nov 14, 2018 · 11 comments
Labels
status: waiting-for-reporter Needs a feedback from the reporter

Comments

@kamil-gawlik

Hi,
We recently faced serious performance issues with spring-integration-jdbc:5.0.7 and PostgreSQL 9.5.13.

With a DB-backed queue containing over 1.8 million messages, the average processing speed was 2 messages per second. After a long investigation we found that the problem was caused by the message-polling logic, more precisely the part that checks the size of the queue.
For millions of records, the MessageGroupQueue.size() method issued the following SQL from AbstractChannelMessageStoreQueryProvider.getCountAllMessagesInGroupQuery():

SELECT COUNT(MESSAGE_ID) from %PREFIX%CHANNEL_MESSAGE where GROUP_KEY=? and REGION=?

which could take several seconds to complete, since PostgreSQL's MVCC design forces COUNT to scan all the visible rows (see: slow counting).

Quick fix:

Our workaround consisted of overriding the PostgresChannelMessageStoreQueryProvider.getCountAllMessagesInGroupQuery() method to use a faster count, with an approach similar to the one presented here: fast count.
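
Roughly along these lines (a minimal sketch; the class name and the dummy predicates that consume the two parameters the store binds are our own, and pg_class.reltuples is only a table-wide estimate maintained by ANALYZE/autovacuum):

import org.springframework.integration.jdbc.store.channel.PostgresChannelMessageStoreQueryProvider;

public class EstimatedCountQueryProvider extends PostgresChannelMessageStoreQueryProvider {

    @Override
    public String getCountAllMessagesInGroupQuery() {
        // reltuples ignores the GROUP_KEY/REGION filter, hence "estimated";
        // the dummy predicates just consume the store's two bound parameters
        return "SELECT reltuples::bigint FROM pg_class"
                + " WHERE relname = lower('%PREFIX%CHANNEL_MESSAGE')"
                + " AND ? IS NOT NULL AND ? IS NOT NULL";
    }
}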

Improvement suggestion

I would suggest changing following part of MessageGroupQueue.poll:

try {
  while (this.size() == 0 && timeoutInNanos > 0) {
    timeoutInNanos = this.messageStoreNotEmpty.awaitNanos(timeoutInNanos);
  }
  message = this.doPoll();
}

to use a new method, this.isEmpty(), which for PostgreSQL can run the following query:
select exists(select 1 from %PREFIX%channel_message where GROUP_KEY=? and REGION=?)
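
A hypothetical sketch of that contract on the store side (isEmpty() does not exist in the framework; the getQuery() and getKey() helpers are assumed from the JdbcChannelMessageStore internals):

// hypothetical method on JdbcChannelMessageStore, not an existing API
public boolean isEmpty(Object groupId) {
    return !this.jdbcTemplate.queryForObject(
            getQuery("SELECT EXISTS(SELECT 1 FROM %PREFIX%CHANNEL_MESSAGE"
                    + " WHERE GROUP_KEY=? AND REGION=?)"),
            Boolean.class, getKey(groupId), this.region);
}

The wait loop in MessageGroupQueue.poll() could then call this instead of comparing size() to zero.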

Our workaround does not seem to be the best solution, as it returns only an estimated value.

Regards, Kamil

PS: it may be connected to issue #2628.

@artembilan
Member

Not sure why you show MessageGroupQueue, since the story should really fall only into the store implementation, the queries and the DB configuration.

It is not fully clear how that exists query can help us with the count, but what we have realized is that we need to fix the index to:

CREATE INDEX INT_CHANNEL_MSG_DATE_IDX ON INT_CHANNEL_MESSAGE (GROUP_KEY, REGION, CREATED_DATE, MESSAGE_SEQUENCE);

I believe you can do that right now on your DB and come back to us with the feedback after that.

Also, I would like to say that the framework doesn't call MessageGroupQueue.size() explicitly.

Either you have a problem in some other place, or your problem is really about that poll query, not the count...

@artembilan artembilan added the status: waiting-for-reporter Needs a feedback from the reporter label Nov 14, 2018
@artembilan
Member

Oh! Sorry, I see the size() call from the poll(long timeout, TimeUnit unit).

So, I think your idea about an extra isEmpty() contract on the message store really makes sense and will improve performance.

@garyrussell
Contributor

Setting the consumer's receive timeout to 0 will avoid that size() call.
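
For example, a sketch assuming Java configuration (the 100 ms trigger interval is arbitrary):

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.integration.scheduling.PollerMetadata;
import org.springframework.scheduling.support.PeriodicTrigger;

@Configuration
public class PollerConfig {

    // with receiveTimeout = 0 the consumer uses the non-blocking queue.poll()
    // instead of poll(timeout, unit), so the size()-based wait loop is skipped
    @Bean(name = PollerMetadata.DEFAULT_POLLER)
    public PollerMetadata defaultPoller() {
        PollerMetadata pollerMetadata = new PollerMetadata();
        pollerMetadata.setTrigger(new PeriodicTrigger(100));
        pollerMetadata.setReceiveTimeout(0);
        return pollerMetadata;
    }
}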

@artembilan
Member

Yeah... The default one is:

private volatile long receiveTimeout = 1000;

Does it help somehow in your test with Oracle, Gary?

I can install MySQL though, to be sure that we do as much testing as possible.

@garyrussell
Contributor

Yes; I am testing with Oracle now; it improved a little so far, but not a lot.

@garyrussell
Contributor

Seems to me we can replace this

while (this.queue.size() == 0 && nanos > 0) {

with a simple poll() and a check for non-null.

@artembilan
Member

Yeah... Looks like our pollMessageFromGroup() is never blocked. It does not make sense to emulate blocking with an extra size() call.

@garyrussell
Contributor

I mean this...

long nanos = TimeUnit.MILLISECONDS.toNanos(timeout);
long deadline = System.nanoTime() + nanos;
Message<?> message = this.queue.poll();
while (message == null && nanos > 0) {
	this.queueSemaphore.tryAcquire(nanos, TimeUnit.NANOSECONDS); // NOSONAR - ok to ignore result
	message = this.queue.poll();
	if (message == null) {
		nanos = deadline - System.nanoTime();
	}
}
return message;

@artembilan
Member

Looks like you are talking about the code in QueueChannel.doReceive(); that piece with the size() is not related to our MessageGroupQueue, since that one is indeed a BlockingQueue where we call its own poll(long timeout, TimeUnit unit).

But I agree that that one has to be fixed too, as we are discussing it here.
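
Applied to MessageGroupQueue.poll(long, TimeUnit), the same poll-first approach could look roughly like this (a sketch reusing the doPoll(), messageStoreNotEmpty and timeoutInNanos names from the snippet quoted above):

// poll first; only wait on the condition when nothing was retrieved,
// so no size()/COUNT query is ever issued
Message<?> message = this.doPoll();
while (message == null && timeoutInNanos > 0) {
    timeoutInNanos = this.messageStoreNotEmpty.awaitNanos(timeoutInNanos);
    message = this.doPoll();
}
return message;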

@garyrussell
Contributor

Oh, right; yes, of course.

@artembilan
Member

artembilan commented Nov 15, 2018

See JIRA https://jira.spring.io/browse/INT-4553, where we are going to remove the usage of that size(), together with some index improvements.

garyrussell added a commit to garyrussell/spring-integration that referenced this issue Nov 16, 2018
JIRA: https://jira.spring.io/browse/INT-4553
Fixes spring-projects#2628
Fixes spring-projects#2629

- Avoid `size()` calls on the MGS, use `poll()` instead.
- Optimize the indexes for the `INT_CHANNEL_MESSAGE` table.
garyrussell added a commit to garyrussell/spring-integration that referenced this issue Nov 20, 2018
JIRA: https://jira.spring.io/browse/INT-4553
Fixes spring-projects#2628
Fixes spring-projects#2629

- Avoid `size()` calls on the MGS, use `poll()` instead.
- Optimize the indexes for the `INT_CHANNEL_MESSAGE` table.
artembilan pushed a commit that referenced this issue Nov 30, 2018
JIRA: https://jira.spring.io/browse/INT-4553
Fixes #2628
Fixes #2629

- Avoid `size()` calls on the MGS, use `poll()` instead.
- Optimize the indexes for the `INT_CHANNEL_MESSAGE` table.

Avoid size call when no timeout too.

Polishing - PR Comments

Missed a doc fix

Another missed %PREFIX%

Fix underscores

Polishing; PR comments; make MGQ extendable.

Fix version in doc.

* Polishing `@since`
* Use diamonds whenever it is possible

**Cherry-pick to 5.0.x**

# Conflicts:
#	src/reference/asciidoc/jdbc.adoc
#	src/reference/asciidoc/whats-new.adoc