FEAT: async post-commit #381
Asynchronous post commit
The post-commit phase of the transaction, the purpose of which is to clean up the undo logs that were used during the transaction, can take a lot of time, especially when the transaction has performed many TX_FREE operations, as the bulk of that operation is performed once the transaction finishes. The thing is though, the time-frame in which this phase is finished is completely irrelevant to the correctness of the system. Currently it is performed sequentially, after the transaction finishes, the pre-commit phase (flushing of data) completes, and the transaction is marked as committed (in the on-media layout).
The idea is to run the post-commit phase in a separate worker thread that runs in the background. The worker would perform everything that is currently performed in the post-commit phase of the transaction (clean up the vectors, perform TX_FREE and so on) and additionally would release the lane once finished.
This mechanism would be opt-in and the user would have to provide the library with a thread that will be allowed to run for the entire life of the application.
This optimization makes the transactional free operation almost free from the perspective of the calling thread, as the actual free would be offloaded to a completely separate thread.
Two new CTL entry points would be added:
The thread entry point takes a pointer to an existing thread that runs the worker function, and the queue depth defines how many transactions can wait in the worker's queue before every other calling thread would have to wait. This limits the potential issue with using too many lanes (as lanes would only be released once the worker thread finishes the task).
The worker thread would be implemented using a multiple-producer single-consumer queue (a circular buffer with the queue depth defining its size). Because the actual performance of the worker doesn't matter that much, it would spend most of its time sleeping and only periodically wake up to check if there's work to be performed.
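To make the queue idea concrete, here is a minimal sketch of a bounded multiple-producer single-consumer circular buffer. It is not the code from this PR: the names `pc_task`, `pc_queue`, `pc_queue_push`, `pc_queue_try_pop` and the fixed `QUEUE_DEPTH` are all hypothetical, and a plain mutex plus condition variable stands in for whatever synchronization the real implementation uses.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define QUEUE_DEPTH 4 /* hypothetical; stands in for the queue depth setting */

/* Hypothetical task: identifies a committed transaction's lane whose
 * post-commit work (undo log cleanup, TX_FREE) is still pending. */
struct pc_task {
	int lane_id;
};

struct pc_queue {
	struct pc_task slots[QUEUE_DEPTH]; /* circular buffer */
	size_t head;  /* index of the oldest queued task */
	size_t count; /* number of queued tasks */
	pthread_mutex_t lock;
	pthread_cond_t not_full;
};

static void
pc_queue_init(struct pc_queue *q)
{
	q->head = 0;
	q->count = 0;
	pthread_mutex_init(&q->lock, NULL);
	pthread_cond_init(&q->not_full, NULL);
}

/* Called by any committing thread; blocks while the queue is full. */
static void
pc_queue_push(struct pc_queue *q, struct pc_task t)
{
	pthread_mutex_lock(&q->lock);
	while (q->count == QUEUE_DEPTH)
		pthread_cond_wait(&q->not_full, &q->lock);
	q->slots[(q->head + q->count) % QUEUE_DEPTH] = t;
	q->count++;
	pthread_mutex_unlock(&q->lock);
}

/* Called only by the worker; returns false when there is nothing to do. */
static bool
pc_queue_try_pop(struct pc_queue *q, struct pc_task *out)
{
	bool found = false;

	pthread_mutex_lock(&q->lock);
	if (q->count > 0) {
		*out = q->slots[q->head];
		q->head = (q->head + 1) % QUEUE_DEPTH;
		q->count--;
		found = true;
		/* wake one producer that may be blocked on a full queue */
		pthread_cond_signal(&q->not_full);
	}
	pthread_mutex_unlock(&q->lock);
	return found;
}
```

In this sketch a worker would loop on `pc_queue_try_pop`, run the post-commit work and release the lane for each task it gets, and sleep for a short interval whenever the call returns false. Producers blocking in `pc_queue_push` corresponds to calling threads waiting once the queue depth is reached.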
I've implemented a simple benchmark that preallocates objects and then creates workers that free them inside of a transaction. Each worker frees
As you can see, the benefit can be quite significant when a single transaction performs a lot of work (the data shows up to 6x improvement). The biggest benefit is for workloads that perform big transactions and then perform a different task. There are also no significant adverse effects when the CPU is over-provisioned (i.e. more threads than CPU cores); the post-commit workers simply sleep most of the time (but that's assuming the queue length is smaller than the number of lanes).
For very tiny transactions, the benefit is very small because we add communication overhead to an already small amount of work.
There are also very noticeable diminishing returns when increasing the number of post-commit workers; this is because the actual worker threads eventually hit the threshold of how fast they can perform the transactions.
As for tree benchmarks, they are not a relevant workload for this feature, at least in the way the