Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

QUARKUS-3230 Write combining #428

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

franz1981
Copy link

@franz1981 franz1981 commented Aug 22, 2023

This is proposing a pattern which reactive folks are pretty used to, see https://akarnokd.blogspot.com/2015/05/operator-concurrency-primitives_11.html

What it solves?

  • concurrent writers won't compete awaiting each others to acquire the lock, but elect a "combiner" thread which drain their request
  • a single thread (light contention or fast disks) won't use the in-flight concurrent queue (which is good), but would still pay the cost of 2 more atomic ops (increment/decrement) of the shared write counter, if compared to the existing implementation

What it doesn't solve (and can make things worse, actually!):

  • a slow disk means that combiner thread can keep on accumulating writes from other threads, going OOM (a reentrant lock won't do it, unless the caller is a virtual thread: in such case it can still happen on the wait list hidden in the lock)
  • it doesn't exploit batched flushes (which can be very beneficial in case flush implies fsync - which can be deferred at the end of a batch)

The second thing could be achieved using a slightly different pattern which doesn't optimize for a single-writer thread/light concurrency as this and focus on reducing the amount of atomic operations and using null Queue::poll to mark the natural "end of a batch" state (see https://github.com/eclipse-vertx/vert.x/blob/04d4d65fb47a3db4cd8a320795a397f1ae678173/src/main/java/io/vertx/core/net/impl/pool/CombinerExecutor.java or ValueQueueDrainOptimized in the mentioned blog post for alternative patterns).

What's interesting of this PR is that the shared counters can be used to track in-flight still not completed requests and can be used to perform some form of adapative back-pressure, eg

  • if the number of in-flight requests (in bytes or number of items) exceed a threshold, the write request can bring the current thread as a parker, and park awaiting its write request to be completed
  • if it happens, the combiner will execute it and unpark it

@geoand @dmlloyd

@franz1981 franz1981 requested a review from jamezp as a code owner August 22, 2023 08:12
@franz1981 franz1981 changed the title QUARKUS-3230 Cooperative write batching QUARKUS-3230 Cooperative write Aug 22, 2023
@franz1981 franz1981 changed the title QUARKUS-3230 Cooperative write QUARKUS-3230 Write combining Aug 22, 2023
Copy link
Member

@jamezp jamezp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I need to think about this. The OOME is a concern for me when a lot of logging happens with slow I/O. This handler is used for writing to various streams.

@franz1981
Copy link
Author

franz1981 commented Aug 22, 2023

@jamezp Re the OOM I can implement different backpressure strategies. The simpler one is the more punitive, where an incoming producer is blocked if its write request make the backlog to become too big, and wait till the whole backlog (till now) is cleared up.
It won't save the day in the case of virtual threads, but the original code will fall as well...

Others could employ other mechanisms (using semaphore or other lazy counters), more fair or unfair, to block or slow down producers.
Let me know If you want me to commit a draft of one of these, but I think we need to make sure the original issue is fixed with this already in this form. The backpressure mechanism is to make this prod ready, I think

@jamezp
Copy link
Member

jamezp commented Aug 22, 2023

@franz1981 I think we need to do something and IMO blocking is okay. We are eventually bound to I/O and we can't risk losing log messages. I'm not sure what the best approach will be, but I think we need to do something. It will likely be rare that we'd see an issue, but it's definitely possible.

@franz1981
Copy link
Author

@jamezp I've implemented the simplest backpressure strategy, somehow similar to what it happen today (let's say)

eg right now everyone battle for the same lock, get enqueued as a waiter and awaken again when got its chance to enter the lock. It means that if a Thread is the very last to enter (assuming a fair enqueue, for simplicity), it will take all the previous and the current requests before being able to make it progress.

What I've implemented is exactly the same behaviour: if the number of waiters become "too much" (now hardcoded), a new incoming one would block till its request is completed.

@dmlloyd
Copy link
Member

dmlloyd commented Aug 24, 2023

How does this compare to using the async handler?

@franz1981
Copy link
Author

franz1981 commented Aug 28, 2023

@dmlloyd this one doesn't need any background thread, it perform everything in foreground, electing s different "combiner" everytime ie who execute others concurrent actions.

It's no different (especially the backpressure strategy) than a lock, but allow some degree of concurrency, which a lock doesn't allow

@jamezp
Copy link
Member

jamezp commented Sep 5, 2023

TBH this makes me a little nervous. We introduce a queue which in theory could get overloaded and cause issues. If we feel performance is an issue, the AsyncHandler could easily be used. If we want to use something like this, we should have a way to either block or discard messages like the AsyncHandler already does.

If the queue becomes overloaded we have potential OOME issues and we can't log it, because it happens in logging.

@franz1981
Copy link
Author

franz1981 commented Sep 5, 2023

I have already implemented a backpressured strategy @jamezp which allow a default max level of concurrency which behave exactly the same of a reentrant lock.

We introduce a queue which in theory could get overloaded and cause issues

The ratio behind this pr is the presence of a queue/stack already in the reentrant lock, when there are too many thread contending on it: the synchronizer behind it enqueue the current thread as a waiter in a linked stack/queue, having the same behaviour of this pr, but without any mechanism to make progress and let the blocked thread to move on (indeed the threads doesn't cooperate but compete for such lock). This pr grant a much better progress guarantees in this specific scenario.

If the queue becomes overloaded we have potential OOME issues and we can't log it

This can happen the same, but having a concurrency level of 0 (the lock ensure no concurrency to happen) it slow down so much the application that y reduce the chances to make it happen, but if the logger is called within a blocking thread pool we still risk it to grow unbounded or till max capacity, causing OOM due to native threads creation, which is not any better.

If you don't feel this to be the right direction, please raise it on the QUARKUS issue which name this pr and we can either mark it as invalid or solve it differently

@jamezp
Copy link
Member

jamezp commented Sep 5, 2023

Sorry about that @franz1981. I see the blocking/back-pressure now. Not sure how I missed that.

I'm not necessarily opposed to this. It does seem complicated for what appears to be a simple I/O write, but I understand why. I'd like to get @dmlloyd's opinion here too as he understands the Quarkus side much better than I do.

@jamezp
Copy link
Member

jamezp commented Oct 12, 2023

Just saw a question asking for updates on this in the Quarkus JIRA :)

My thought is this seems like a lot of added complication and I'm not sure what we gain. Do we have a measurement of how much this improves performance? While much more naive, the AyncHandler would likely have similar performance I'd think. In either case we don't get away from a synchronous write.

That said, I'd like @dmlloyd's opinion too.

@dmlloyd
Copy link
Member

dmlloyd commented Oct 12, 2023

What about only batching flushes? Something like this in the write path:

lock.lock();
try {
   write();
   if (autoFlush && ! lock.hasQueuedThreads()) {
       // nobody else is waiting to write, so flush
       flush();
   }
} finally {
   lock.unlock();
}

Then only the last waiter will flush, effectively batching them. Relying on hasQueuedThreads might be fragile though for various reasons; we might need some kind of "reliable" version of that (for example keeping a separate wait counter or something).

This avoids additional data structures and avoids the disadvantage that the async handler has that caller info may have to be materialized. It would work as a general performance improvement I think, just reducing the total number of flushes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants