Use Readers-Writer lock for resource group #17131
base: master
Conversation
Force-pushed from 8385813 to 62e71a6
Adding some unit tests to make sure the methods fail when the thread has not acquired a lock beforehand would be great (i.e. around internalStartNext, startInBackground, enqueueQuery, etc.).
Also, for the shadow run, what exactly did we test? The test should cover a cluster with ~2K queries on it (running + queued); we should check whether the adjusted queue size metric is still publishing data and not timing out.
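The guard the reviewer asks for can be sketched roughly as below. This is a minimal, hypothetical illustration, not Presto's actual code: the class and method names (internalStartNext and the group class itself) are placeholders, and the guard uses ReentrantReadWriteLock's isWriteLockedByCurrentThread() to fail fast when the caller did not acquire the write lock first.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class LockGuardSketch
{
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    void internalStartNext()
    {
        // Fail fast if the current thread does not hold the write lock.
        if (!lock.isWriteLockedByCurrentThread()) {
            throw new IllegalStateException("write lock not held");
        }
        // ... scheduling work would go here ...
    }

    public static void main(String[] args)
    {
        LockGuardSketch group = new LockGuardSketch();

        boolean failedWithoutLock = false;
        try {
            group.internalStartNext(); // no lock held: should throw
        }
        catch (IllegalStateException e) {
            failedWithoutLock = true;
        }

        group.lock.writeLock().lock();
        try {
            group.internalStartNext(); // lock held: succeeds
        }
        finally {
            group.lock.writeLock().unlock();
        }

        System.out.println("failedWithoutLock=" + failedWithoutLock);
    }
}
```

A unit test would then assert that calling such a method without the lock throws, and that it succeeds once the lock is acquired.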
}
else {
    id = new ResourceGroupId(name);
    root = this;
    this.lock = new ReentrantReadWriteLock();
Since the default acquisition policy is non-fair, where the order of entry to the read/write lock is unspecified, it makes sense to use the fair policy instead of the default one.
https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/locks/ReentrantReadWriteLock.html
Acquisition order
This class does not impose a reader or writer preference ordering for lock access. However, it does support an optional fairness policy.
Non-fair mode (default)
When constructed as non-fair (the default), the order of entry to the read and write lock is unspecified, subject to reentrancy constraints. A nonfair lock that is continuously contended may indefinitely postpone one or more reader or writer threads, but will normally have higher throughput than a fair lock.
Fair mode
When constructed as fair, threads contend for entry using an approximately arrival-order policy. When the currently held lock is released, either the longest-waiting single writer thread will be assigned the write lock, or if there is a group of reader threads waiting longer than all waiting writer threads, that group will be assigned the read lock.
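The change the reviewer suggests is a one-argument difference: ReentrantReadWriteLock's boolean constructor parameter selects the fairness policy. A minimal sketch:

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class FairLockSketch
{
    public static void main(String[] args)
    {
        // Default constructor: non-fair mode, unspecified entry order.
        ReentrantReadWriteLock nonFair = new ReentrantReadWriteLock();

        // Passing true requests the fair policy: threads contend for
        // entry using an approximately arrival-order policy.
        ReentrantReadWriteLock fair = new ReentrantReadWriteLock(true);

        System.out.println("nonFair.isFair()=" + nonFair.isFair());
        System.out.println("fair.isFair()=" + fair.isFair());
    }
}
```

As the Javadoc quoted above notes, the trade-off is that a fair lock normally has lower throughput than a non-fair one.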
My initial shadow run was mainly to make sure no deadlocks were introduced. I will do more testing with both fair and non-fair modes. I was a bit worried about using fair mode because of its impact on throughput, but I will do more testing to verify this.
Based on testing and also digging deeper into the code, it looks like a readers-writer lock is not going to help much here, as we take the write lock far more often than the read lock when query throughput is high. Also, based on testing, metrics (not just adjusted queue size, but all other metrics) stop being published when CPU utilization stays at ~83% for an extended period, so the issue does not appear to be lock contention. I have converted this PR to a draft for future reference, in case we want to optimize locking further (places where we feel we can downgrade from a write lock to a read lock).
There are endpoints, like /v1/queryState, that take the resource group lock while fetching pathToRoot for a given resource group. This leads to lock contention when a few thousand queries are queued on a cluster. We can use a readers-writer lock to avoid this contention.
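The idea in the description can be sketched as follows. This is a hypothetical illustration, not Presto's resource group implementation: the field and method names are placeholders. Read-only callers (such as an endpoint walking a group's path to the root) take the shared read lock so many of them can proceed concurrently, while mutations take the exclusive write lock.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class ResourceGroupSketch
{
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final List<String> path = new ArrayList<>();

    void addSegment(String segment)
    {
        lock.writeLock().lock(); // exclusive: mutates group state
        try {
            path.add(segment);
        }
        finally {
            lock.writeLock().unlock();
        }
    }

    List<String> pathToRoot()
    {
        lock.readLock().lock(); // shared: many readers may hold it at once
        try {
            return new ArrayList<>(path);
        }
        finally {
            lock.readLock().unlock();
        }
    }

    public static void main(String[] args)
    {
        ResourceGroupSketch group = new ResourceGroupSketch();
        group.addSegment("global");
        group.addSegment("adhoc");
        System.out.println(group.pathToRoot());
    }
}
```

With a plain mutex, every pathToRoot() caller would serialize behind every other caller; with a readers-writer lock, only writers exclude readers.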
Force-pushed from 62e71a6 to 7ba62a2
Test plan -