Implicit batching of operations to improve performance #159

Open
romix opened this issue May 11, 2012 · 2 comments

@romix
commented May 11, 2012

In some cases, you do a lot of updates or reads over a short period of time.

If you know this in advance, you can batch explicitly using putAll or getAll. But this requires knowing the affected entries up front, and it affects the code of the system, which has to be adapted to use these two calls.

In many cases, you don't know in advance which concrete entries will be updated/read (e.g. these operations are done by different parts of the code based on some dynamic conditions). So, you use the usual puts and gets. As a result, a lot of put/get operations are issued over a short period of time. Processing them one by one is rather inefficient under load.

Idea:

  • Add support for configurable implicit batching of operations. A user can define the max length of the batch queue (i.e. the max number of operations in one batch) and a max-time, i.e. how long HZ waits for more operations to arrive. In addition, a max-time-in-a-batch-queue can be introduced to limit how long a given operation may wait in the queue before it is sent out for processing.
  • When operations are called, HZ adds them to a batch queue and checks whether the max queue size has been reached. If so, a batch request is sent (similar to putAll/getAll).
  • Alternatively, if no new operations were added to the queue during the max-time period, a batch request is formed and the queue is flushed.
  • If the max-time-in-a-batch-queue timeout expires for certain entries, they are batched together and sent for processing.
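
To make the three flush triggers concrete, here is a minimal sketch in plain Java. It is not Hazelcast code: the class `ImplicitPutBatcher`, its constructor parameters and the `batchSink` callback are all hypothetical names made up for illustration. It assumes the batch is handed to something like `map::putAll`, and it simplifies the last point by flushing the whole queue once its oldest entry has waited too long.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

/** Hypothetical helper (not a Hazelcast API): coalesces single puts into batches. */
final class ImplicitPutBatcher<K, V> implements AutoCloseable {
    private final int maxBatchSize;       // max number of operations in one batch
    private final long maxIdleMillis;     // "max-time": flush if nothing new arrives for this long
    private final long maxQueuedMillis;   // "max-time-in-a-batch-queue": bound on waiting time
    private final Consumer<Map<K, V>> batchSink;  // assumed bulk call, e.g. map::putAll
    private final ScheduledExecutorService timer =
            Executors.newSingleThreadScheduledExecutor();
    private final Runnable flushTask = this::flush;

    private Map<K, V> pending = new LinkedHashMap<>();
    private ScheduledFuture<?> idleFlush;
    private ScheduledFuture<?> ageFlush;

    ImplicitPutBatcher(int maxBatchSize, long maxIdleMillis, long maxQueuedMillis,
                       Consumer<Map<K, V>> batchSink) {
        this.maxBatchSize = maxBatchSize;
        this.maxIdleMillis = maxIdleMillis;
        this.maxQueuedMillis = maxQueuedMillis;
        this.batchSink = batchSink;
    }

    /** Queues one put; repeated puts to the same key keep only the latest value. */
    public synchronized void put(K key, V value) {
        if (pending.isEmpty()) {
            // First entry of a new batch: start the "oldest entry" clock.
            ageFlush = timer.schedule(flushTask, maxQueuedMillis, TimeUnit.MILLISECONDS);
        }
        pending.put(key, value);
        if (idleFlush != null) {
            idleFlush.cancel(false);      // a new operation arrived, restart the idle clock
        }
        if (pending.size() >= maxBatchSize) {
            flush();                      // trigger 1: queue is full
            return;
        }
        // trigger 2: no new operation for maxIdleMillis
        idleFlush = timer.schedule(flushTask, maxIdleMillis, TimeUnit.MILLISECONDS);
    }

    /** Sends everything queued so far as one batch (also used by triggers 2 and 3). */
    public synchronized void flush() {
        if (idleFlush != null) idleFlush.cancel(false);
        if (ageFlush != null) ageFlush.cancel(false);
        if (pending.isEmpty()) return;
        Map<K, V> batch = pending;
        pending = new LinkedHashMap<>();
        batchSink.accept(batch);          // kept under the lock for simplicity
    }

    @Override
    public void close() {
        flush();
        timer.shutdownNow();
    }
}
```

Hypothetical usage would be `new ImplicitPutBatcher<>(100, 5, 20, map::putAll)`, with callers invoking `put(key, value)` as before. Note that coalescing by key changes semantics slightly (intermediate values for the same key are never sent), which a real implementation would have to decide on.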

When it can be applied:

  • async operations
  • sync operations from different threads (assuming they are independent). Sync operations from the same thread cannot be batched because they are blocking, i.e. there is at most one pending sync operation per thread.
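
A rough sketch of how both cases could share one queue, again in plain Java with made-up names (`ImplicitGetBatcher`, `bulkLoader`; none of this is an existing Hazelcast API). Each caller gets a future for its key. An async caller keeps the future, while a sync caller blocks on it, so independent threads end up waiting on the same batch. Only the size trigger is shown here; without the time-based triggers a sync caller could block indefinitely, so a real implementation would need them as well.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

/** Hypothetical helper (not a Hazelcast API): groups single gets into one bulk read. */
final class ImplicitGetBatcher<K, V> {
    private final int maxBatchSize;
    private final Function<Set<K>, Map<K, V>> bulkLoader;  // assumed, e.g. keys -> map.getAll(keys)
    private Map<K, CompletableFuture<V>> waiting = new LinkedHashMap<>();

    ImplicitGetBatcher(int maxBatchSize, Function<Set<K>, Map<K, V>> bulkLoader) {
        this.maxBatchSize = maxBatchSize;
        this.bulkLoader = bulkLoader;
    }

    /** Async form: returns at once; the future completes when the batch is processed. */
    public CompletableFuture<V> getAsync(K key) {
        Map<K, CompletableFuture<V>> batch = null;
        CompletableFuture<V> result;
        synchronized (this) {
            // Callers asking for the same key share one future.
            result = waiting.computeIfAbsent(key, k -> new CompletableFuture<>());
            if (waiting.size() >= maxBatchSize) {
                batch = waiting;
                waiting = new LinkedHashMap<>();
            }
        }
        if (batch != null) {
            dispatch(batch);   // the thread that fills the batch performs the bulk read
        }
        return result;
    }

    /** Sync form: blocks the calling thread, so each thread has at most one pending get. */
    public V get(K key) {
        return getAsync(key).join();
    }

    /** Sends whatever is queued; time-based triggers would call this. */
    public void flush() {
        Map<K, CompletableFuture<V>> batch;
        synchronized (this) {
            batch = waiting;
            waiting = new LinkedHashMap<>();
        }
        if (!batch.isEmpty()) {
            dispatch(batch);
        }
    }

    private void dispatch(Map<K, CompletableFuture<V>> batch) {
        try {
            Map<K, V> values = bulkLoader.apply(batch.keySet());
            batch.forEach((key, future) -> future.complete(values.get(key)));
        } catch (RuntimeException e) {
            batch.values().forEach(future -> future.completeExceptionally(e));
        }
    }
}
```

This also shows why sync calls from a single thread gain nothing: that thread is parked inside `get` until its own batch is dispatched, so it can never contribute a second entry to the same batch.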

Benefits:

  • Performance benefits are expected to be similar to those obtained when putAll/getAll were implemented by means of batching. As you may remember, these two operations used to be rather slow before batching was introduced. Once it was introduced, it gave a significant performance increase. I've seen 3-4 times better throughput in many cases.
  • This does not require any changes in the user's code. Only the configuration needs to be updated to enable it.

Remark: Something similar exists for Infinispan under the name Replication Queue: https://docs.jboss.org/author/display/ISPN/Asynchronous+Options#AsynchronousOptions-ReplicationQueue

@mdogan mdogan added PENDING and removed Team: Core labels May 28, 2014

@bwzhang2011

commented Oct 28, 2014

@mdogan, how is this issue going?

@jerrinot jerrinot added this to the Backlog milestone May 12, 2015

@bwzhang2011

commented May 29, 2015

@jerrinot, any ideas on further support or a design for batch processing?

@pveentjer pveentjer changed the title from "Implciit batching of operations to improve performance" to "Implicit batching of operations to improve performance" May 30, 2015

@enesakar enesakar removed the PENDING label Nov 2, 2015
