CQ shared store write optimisations #8507

Merged
9 commits merged into main from lh-msg-store-bis on Jun 20, 2023

Conversation

@lhoguin (Contributor) commented Jun 9, 2023

Another round of CQ shared message store optimisations and refactors, this time around writes.

The first commit is more of a refactor than anything else, although no longer calling fsync does help in some scenarios. The second commit should provide a noticeable performance boost when there are no consumers.
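
As a rough illustration of the fsync point only (this is not the actual rabbit_msg_store write path; the module name, the `SyncOnWrite` flag, and the `delayed_write` parameters below are made up), skipping a per-write `file:sync/1` means a write only has to reach the file driver, not the disk:

```erlang
%% Hedged sketch: append a message blob to a store file, optionally forcing
%% an fsync per write. Only the fsync-vs-no-fsync shape matters here.
-module(store_write_sketch).
-export([open/1, append/3]).

open(Path) ->
    %% raw + binary keeps the write path cheap; delayed_write batches small
    %% writes in the driver before they reach the file system.
    file:open(Path, [append, raw, binary, {delayed_write, 512 * 1024, 2000}]).

append(Fd, MsgBin, SyncOnWrite) ->
    ok = file:write(Fd, MsgBin),
    case SyncOnWrite of
        true  -> ok = file:sync(Fd);  %% force the data to disk on every write
        false -> ok                   %% rely on a later, batched sync instead
    end.
```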

@mkuratczyk I have renamed the branch.


Before, confirms would only be sent periodically or when rolling over to a new file.

We know the messages are on disk or were already acked, so there is no need to do set intersections/subtractions in this scenario.
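
Not part of the PR, just a hedged sketch of what skipping that set arithmetic looks like; the module, function, and variable names below are hypothetical and only show the shape of the shortcut.

```erlang
%% Hypothetical sketch (not rabbit_msg_store code): confirming a batch of
%% message ids when every id is already known to be on disk or acked.
-module(confirm_sketch).
-export([confirm_fast/2, confirm_general/3]).

%% Fast path described in the commit note: the whole set can be confirmed
%% as-is, no set arithmetic needed.
confirm_fast(MsgIdSet, ConfirmFun) ->
    ConfirmFun(MsgIdSet).

%% General path: intersect with the on-disk set to find what can be
%% confirmed now, and subtract to find what must stay pending.
confirm_general(MsgIdSet, OnDiskSet, ConfirmFun) ->
    Confirmed = sets:intersection(MsgIdSet, OnDiskSet),
    Pending   = sets:subtract(MsgIdSet, OnDiskSet),
    ConfirmFun(Confirmed),
    Pending.
```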
Instead of having the message store send a message to the queue
with the confirms for messages ignored due to the flying
optimisation, we have the queue handle the confirms directly
when removing the messages.

This avoids sending potentially 1 Erlang message per 1 AMQP
message to the queue.
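
The sketch below is not the real message store or queue API; the module and message names are made up. It only illustrates the difference between casting one confirm per ignored message and returning the ignored ids from the remove call so the queue confirms them itself.

```erlang
%% Hypothetical sketch (made-up module and message names).
-module(flying_confirm_sketch).
-export([remove_old/2, remove_new/1]).

%% Old shape: one gen_server cast per message whose write was skipped by
%% the flying optimisation -- potentially 1 Erlang message per AMQP message.
remove_old(QueuePid, MsgIds) ->
    Ignored = flying_ignored(MsgIds),
    [gen_server:cast(QueuePid, {msg_store_confirm, Id}) || Id <- Ignored],
    ok.

%% New shape: return the ignored ids from the remove call so the calling
%% queue process confirms them directly, with no extra message passing.
remove_new(MsgIds) ->
    {ok, flying_ignored(MsgIds)}.

%% Placeholder for the flying-optimisation bookkeeping.
flying_ignored(MsgIds) -> MsgIds.
```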
Also, make use of the already-open file for multi-reads instead of opening, reading, and closing it for each message.
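
A minimal sketch of the multi-read idea, assuming messages are addressed by a list of `{Offset, Size}` locations; this is not the real message store read path.

```erlang
%% Sketch only: fetch several messages from one already-open file descriptor
%% instead of an open/read/close cycle per message.
-module(multi_read_sketch).
-export([read_many/2]).

%% Fd is a raw binary file descriptor obtained once via
%% file:open(Path, [read, raw, binary]) and kept open across reads.
%% Locations is a list of {Offset, Bytes} pairs for the wanted messages.
read_many(Fd, Locations) ->
    %% file:pread/2 performs all the positioned reads in a single call and
    %% returns the results in the same order as Locations.
    {ok, Bins} = file:pread(Fd, Locations),
    Bins.
```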
The way I initially did this, maybe_gc would be triggered based on candidates from 15s ago but run against candidates from just now. This is sub-optimal because, when messages are consumed rapidly, the just-now candidates are likely to be in a file that is about to be deleted, and we don't want to run compaction on those.

Instead, when sending the maybe_gc message we also send the candidates we had at the time. Then, 15s later, we check whether the file still exists. If it's gone, great! No compaction to do.
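
For illustration only (the real implementation lives in the message store modules and uses different names), a minimal sketch of that deferred check, assuming candidates are identified by file name:

```erlang
%% Hypothetical sketch of the deferred compaction check described above.
-module(gc_defer_sketch).
-export([schedule_maybe_gc/1, on_maybe_gc/1]).

%% Capture the candidates we have *now* and deliver them together with the
%% timer message 15 seconds later.
schedule_maybe_gc(CandidateFiles) ->
    erlang:send_after(15000, self(), {maybe_gc, CandidateFiles}).

%% When the timer fires, keep only candidate files that still exist; a file
%% deleted in the meantime needs no compaction at all.
on_maybe_gc(CandidateFiles) ->
    [File || File <- CandidateFiles, filelib:is_file(File)].
```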
@lhoguin marked this pull request as ready for review on June 20, 2023 at 12:54
@lhoguin (Contributor, Author) commented Jun 20, 2023

This is ready for review/merge.

@lhoguin requested a review from mkuratczyk on June 20, 2023 at 12:54
@michaelklishin added this to the 3.13.0 milestone on Jun 20, 2023
@mkuratczyk (Contributor) commented:
Full test results:
https://snapshots.raintank.io/dashboard/snapshot/oo6eAYwUI2exFVFxdt6mTQIsb7AT2hrt?orgId=2

The biggest difference is in confirm latency when using -c 1 (shown here for 5 kB messages, but results were similar for all tested sizes):

[Screenshot: publisher confirm latency comparison, 2023-06-20 19:58]

And when publishing to a long queue (no consumers):
[Screenshot: publish throughput to a long queue with no consumers, 2023-06-20 20:02]

(The sudden drop in throughput is when the queues reach their max-length; dropping messages is currently quite expensive.)

@mkuratczyk merged commit 985f4e8 into main on Jun 20, 2023
16 checks passed
@mkuratczyk deleted the lh-msg-store-bis branch on June 20, 2023 at 18:04
@michaelklishin (Member) commented Jun 20, 2023

So, a 40% drop in publisher confirm latency on one workload, and a 60% drop on another (both have publishers and consumers online). Not bad.
