Fix resource group concurrency for multi coordinator #16956

swapsmagic · 2021-11-08T18:54:41Z

When a burst of traffic hits a coordinator, it ends up running more queries than allowed.
There are two reasons for that:

1. In a multi-coordinator setup, we were not stamping the last running query start time for non-leaf resource groups, which led
   shouldWaitForResourceManagerUpdate to always return true for non-leaf resource groups. So if the traffic comes from
   many different resource groups, the coordinator ends up running fewer queries than allowed in each resource group, but
   more than allowed at the root level.
2. The ResourceManagerResourceGroupService cache ends up holding stale resource group info, which also lets coordinators
   run more queries than allowed at the cluster level.

As part of this diff we fix 1 by stamping the last running query start time on all of the group's parent resource groups. To address 2, we make
the cache refresh rate and expiration configurable.
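
The stamping described in item 1 can be sketched roughly as follows. This is a minimal illustration, not the code from this diff: `GroupNode`, `recordQueryStart`, and the constructor arguments are hypothetical names, and only the field name `lastRunningQueryStartTime` is taken from `InternalResourceGroup`.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical, simplified stand-in for a resource group node.
// This is NOT Presto's InternalResourceGroup; it only models the timestamp.
final class GroupNode
{
    private final GroupNode parent; // null for the root group
    private final AtomicLong lastRunningQueryStartTime;

    GroupNode(GroupNode parent, long createdAtMillis)
    {
        this.parent = parent;
        this.lastRunningQueryStartTime = new AtomicLong(createdAtMillis);
    }

    // Stamp this group and every ancestor up to the root, so that
    // non-leaf groups also know when a query last started beneath them.
    void recordQueryStart(long nowMillis)
    {
        for (GroupNode group = this; group != null; group = group.parent) {
            group.lastRunningQueryStartTime.set(nowMillis);
        }
    }

    long getLastRunningQueryStartTime()
    {
        return lastRunningQueryStartTime.get();
    }
}
```

With a chain root, child, leaf, calling `recordQueryStart` on the leaf updates all three timestamps, while a sibling subtree under the root keeps its old value.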

Test plan - Unit test + regression test on verifier cluster

== RELEASE NOTES ==

General Changes
* Fix resource group concurrency for multi coordinator

ajaygeorge · 2021-11-09T02:53:21Z

...o-main/src/main/java/com/facebook/presto/execution/resourceGroups/InternalResourceGroup.java

@@ -152,7 +152,7 @@
    private final CounterStat timeBetweenStartsSec = new CounterStat();

    @GuardedBy("root")
-    private AtomicLong lastRunningQueryStartTime = new AtomicLong();
+    private AtomicLong lastRunningQueryStartTime = new AtomicLong(currentTimeMillis());


Can you help me understand why initializing with currentTimeMillis() is important?

The reason is that internal resource groups are created as and when needed, and this makes sure lastRunningQueryStartTime for a newly created resource group is not set to 0. This helps make sure we don't let the resource group run more queries than allowed when RM updates are delayed: we wait for the RM update before running more queries on the resource group.
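
To illustrate the difference the initial value makes, here is a small sketch under stated assumptions. `StartTimeTracker` and `isResourceManagerViewStale` are hypothetical names, and the comparison is an assumed simplification; the actual condition inside `InternalResourceGroup` is not shown in this thread and may differ.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for the timestamp bookkeeping of one resource group.
final class StartTimeTracker
{
    private final AtomicLong lastRunningQueryStartTime;

    StartTimeTracker(long initialMillis)
    {
        this.lastRunningQueryStartTime = new AtomicLong(initialMillis);
    }

    // Assumed check: the resource manager (RM) view counts as stale when the
    // last RM update is not newer than the last recorded query start time.
    boolean isResourceManagerViewStale(long lastResourceManagerUpdateMillis)
    {
        return lastResourceManagerUpdateMillis <= lastRunningQueryStartTime.get();
    }
}
```

Under this assumed check, a tracker initialized to 0 treats any earlier RM update as fresh, so a newly created group would not wait. A tracker initialized to the creation time treats older RM updates as stale until a newer one arrives.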

swapsmagic force-pushed the fix_resource_group_concurrency branch 2 times, most recently from 840cf32 to 9f6a51a on November 8, 2021 21:30

swapsmagic force-pushed the fix_resource_group_concurrency branch from 9f6a51a to 7cfc76c on November 8, 2021 23:09

swapsmagic requested review from tdcmeehan, vaishnavibatni, abhiseksaikia and neeradsomanchi November 8, 2021 23:27

ajaygeorge reviewed Nov 9, 2021

swapsmagic requested a review from ajaygeorge November 9, 2021 17:47

tdcmeehan approved these changes Nov 10, 2021

tdcmeehan merged commit 48fadf1 into prestodb:master Nov 10, 2021

varungajjala mentioned this pull request Nov 23, 2021: Add release notes for 0.266 #17030 (Merged, 2 tasks)

hackeryang mentioned this pull request Apr 12, 2023: [Design] Disaggregated Presto Coordinators #15453 (Open)
