As a follow-up to the ideas discussed in #1610: the current design was introduced in #1608 and seems to behave correctly under load, but there are two concerns:
- concurrent events leading to multiple `next(hi)` calls might leave some gaps in the sequence unutilized; this is not a problem in itself, but it could be considered a wasteful use of the limited ranges available, especially under load.
- the use of the `CombinerExecutor` guarantees safety and co-operation among the reactive threads, but in certain scenarios it might become a scalability problem.
## Current scalability limitations
To elaborate on the scalability issue, imagine you have 100+ clients ("threads" of reactive flows) and a blocksize of 50. When the hi needs to be incremented, a single thread issues a `nextval` message to the DB, and this replenishes the pool in some time T, which is the typical latency of performing a `nextval` operation (including its roundtrip to the DB), while all other threads are waiting. If the number of clients is higher than the blocksize (e.g. 100 clients and a blocksize of 50 would not be unthinkable), some of those clients will need to wait an additional T, potentially multiples of it.
If the load is high enough, some clients will need to wait many T periods of time, and there is no fairness guarantee, implying a service problem.
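To make the arithmetic concrete, here is a rough latency model under the stated assumptions (N clients all requesting an ID at once, blocksize B, a single in-flight refresh at a time, each refresh costing T): the unluckiest client waits roughly ceil(N/B) × T, whereas a direct `nextval` per client would cost a flat T each.

```python
import math

def worst_case_wait(clients: int, blocksize: int, t_ms: float) -> float:
    """Rough model: with one in-flight refresh serving `blocksize` IDs
    per roundtrip, `clients` simultaneous requests drain the pool over
    ceil(clients / blocksize) sequential refreshes of latency t_ms."""
    refreshes = math.ceil(clients / blocksize)
    return refreshes * t_ms

# 100 clients, blocksize 50, 5 ms per nextval roundtrip:
# two sequential refreshes are needed, so the last clients wait ~10 ms,
# while a direct nextval per client would cost a flat 5 ms each.
print(worst_case_wait(100, 50, 5.0))  # → 10.0
```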
## Impact of disabling the hi/lo pool
Ironically, while the hi/lo algorithm is designed to improve performance, in such scenarios it seems an application would be better off without hi/lo: if each single "thread" is simply instructed to perform a `nextval` operation directly on the DB, then each such operation has a constant latency of T. Using hi/lo allows some operations to complete faster, at the cost of others potentially having a latency of nT; having it off actually guarantees a stable latency of T for each single request, and I'd argue that this is a very desirable property in production systems, as consistent latency is a great attribute.
## Having both benefits
The conclusion is that we should re-design our implementation of the hi/lo pool in such a way that a blocksize of 50 still reduces the number of roundtrips to the DB by a factor of 50, but without the scalability drawbacks.
This is not extremely hard: what we need is that when a request to the DB to refresh the "hi" value is issued, only 50 clients are allowed to wait for it; additional clients in need of an ID will need to trigger a separate request to the DB.
This implies that multiple "hi refresh" operations may be waited on concurrently, and we won't guarantee ordering among them; that is fine as long as each returned block is distributed only among the set of clients we decided to serve with it, and no more. This does imply that generated identifiers won't be monotonic.
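The scheme described above can be sketched in a few lines of asyncio code. This is a sketch only, under assumed names (`HiLoPool`, `fetch_hi`), not the actual implementation built around `CombinerExecutor`: each in-flight "hi" refresh accepts at most `blocksize` waiters, and one more client starts another, concurrent refresh rather than queueing behind the first.

```python
import asyncio
import itertools

class HiLoPool:
    """Sketch: cap the number of clients waiting on any single
    in-flight "hi" refresh at `blocksize`; further clients trigger
    additional, concurrent refreshes instead of queueing."""

    def __init__(self, blocksize, fetch_hi):
        self.blocksize = blocksize
        self.fetch_hi = fetch_hi  # async nextval-style roundtrip to the DB
        self.spare = []           # already-fetched IDs, handed out immediately
        self.pending = None       # waiter futures attached to the current refresh
        self.lock = asyncio.Lock()

    async def next_id(self):
        async with self.lock:
            if self.spare:
                return self.spare.pop()
            fut = asyncio.get_running_loop().create_future()
            if self.pending is None or len(self.pending) >= self.blocksize:
                # the in-flight refresh (if any) is fully subscribed:
                # start a new, concurrent one for this group of waiters
                self.pending = [fut]
                asyncio.ensure_future(self._refresh(self.pending))
            else:
                self.pending.append(fut)
        return await fut

    async def _refresh(self, waiters):
        hi = await self.fetch_hi()  # one DB roundtrip serves this group only
        ids = list(range(hi * self.blocksize, (hi + 1) * self.blocksize))
        async with self.lock:
            for fut, value in zip(waiters, ids):
                fut.set_result(value)
            self.spare.extend(ids[len(waiters):])  # keep unclaimed IDs
            if self.pending is waiters:
                self.pending = None

async def demo():
    counter = itertools.count(1)

    async def fetch_hi():
        await asyncio.sleep(0.01)  # simulated nextval latency T
        return next(counter)

    pool = HiLoPool(blocksize=50, fetch_hi=fetch_hi)
    # 120 concurrent clients: three refreshes run concurrently, instead
    # of the third group of clients waiting behind the first two.
    return await asyncio.gather(*(pool.next_id() for _ in range(120)))

ids = asyncio.run(demo())
assert len(set(ids)) == 120  # all unique, though not monotonic
```

Note that blocks fetched concurrently may complete out of order, which is exactly why identifiers stop being monotonic; uniqueness is preserved because each block is handed only to the waiters attached to its refresh, plus a spare list for latecomers.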