Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce query limits at the resource group level #16303

Merged
merged 2 commits into from Jul 21, 2021

Conversation

kyleCampbel1
Copy link
Contributor

@kyleCampbel1 kyleCampbel1 commented Jun 20, 2021

Introduce the ability to add query limits (cpu time, execution time, and memory limit) at the resource group level. These limits can complement the SelectorResourceEstimate passed to SelectorSpec to ensure that if the query exceeds the indented query limits, we can fail the query and protect the resource group.

This is enabled by adding to the ResourceGroupSpec the field perQueryLimits, a JSON object of the optional fields executionTimeLimit, totalMemoryLimit, cpuTimeLimit which is then passed to the coordinator to implement the appropriate limits.

Test plan - Unit tests added and build success are sufficient for first commit. Second commit will have e2e by deploying fbpkg to test cluster to ensure resource group limits can be enforced by SqlQueryManager. Unit testing also on second commit.

Depends on https://github.com/facebookexternal/presto-facebook/pull/1599

== RELEASE NOTES ==

Resource Groups Changes
* Add support to specify query limits (Cpu time, total memory and execution time) at the resource group level. See :doc:`/admin/resource-groups`.

SPI Changes
* Add `ResourceGroupQueryLimits` to the SPI and the corresponding getter and setter functions in `ResourceGroups`.

@kyleCampbel1 kyleCampbel1 force-pushed the ResourceGroup_query_limits branch 3 times, most recently from 5e3db7e to 85f46c1 Compare June 21, 2021 14:05
@kyleCampbel1 kyleCampbel1 force-pushed the ResourceGroup_query_limits branch 2 times, most recently from d4b4731 to 84c028e Compare June 23, 2021 00:22
Copy link
Contributor

@mayankgarg1990 mayankgarg1990 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At a high level - 2 main comments here -

  1. Lets rename the field names to start with perQuery and end with Limit. If the variable type is DataSize/Duration - then the name can end with Limit. However, if the fields have the type long - in that case we need to put the unit at the end of the name likes Millis or Bytes so that it is clear to the reader of the code what is the unit in question for the long value.
  2. Use requireNonNull wherever possible in constructors to ensure that we are not passing around nulls

Copy link
Contributor

@mayankgarg1990 mayankgarg1990 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thing I realized after reviewing this PR is that the values that we passed to InternalResourceGroup is not really being used since we are just using the values from ResourceGroupConfigurationManager - lets see if we need those changes - we can make the changes a lot simpler

@mayankgarg1990
Copy link
Contributor

haven't reviewed the full PR - will continue the same tomorrow

@kyleCampbel1 kyleCampbel1 force-pushed the ResourceGroup_query_limits branch 2 times, most recently from 6afefbf to 1ed5b7d Compare June 24, 2021 18:01
Copy link
Contributor

@mayankgarg1990 mayankgarg1990 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review for commit " Add resource limits per query on a resource group "

Copy link
Contributor

@mayankgarg1990 mayankgarg1990 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

flushing some more comments

@kyleCampbel1 kyleCampbel1 force-pushed the ResourceGroup_query_limits branch 4 times, most recently from 7969aa4 to 9b45461 Compare June 28, 2021 16:17
Copy link
Contributor

@mayankgarg1990 mayankgarg1990 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add resource limits per query on a resource group

For this commit - mostly good - just some naming issues

You should rename the commit title to Support passing per query limits in resource group configuration

The commit description starts with saying Allows for more intelligent thread allocation for queues - this diff commit doesn't do that. Please ensure that the commit description and title matches with what the commit actually does

Copy link
Contributor

@mayankgarg1990 mayankgarg1990 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also - please make the SelectorResult/SelectionContext changes we talked about

@mayankgarg1990
Copy link
Contributor

@tdcmeehan - this diff is ready for an initial look. We are still looking at unittests and code style fixes - but otherwise - would love to get an idea on the approach and the PR should be ready for final review by tomorrow.

@kyleCampbel1 kyleCampbel1 force-pushed the ResourceGroup_query_limits branch 2 times, most recently from 0757343 to 9361c39 Compare June 29, 2021 22:56
Copy link
Contributor

@mayankgarg1990 mayankgarg1990 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Getting there - this should be last major set of review comments

@kyleCampbel1 kyleCampbel1 changed the title Resource group query limits Introduce query limits at the resource group level Jul 13, 2021
Copy link
Contributor

@mayankgarg1990 mayankgarg1990 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Commit 1

@kyleCampbel1 kyleCampbel1 force-pushed the ResourceGroup_query_limits branch 4 times, most recently from 409786c to ccb9d9a Compare July 14, 2021 02:36
Copy link
Contributor

@mayankgarg1990 mayankgarg1990 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mostly naming changes and things here and there

Copy link
Contributor

@mayankgarg1990 mayankgarg1990 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM !

@kyleCampbel1 kyleCampbel1 force-pushed the ResourceGroup_query_limits branch 4 times, most recently from ebecbcf to 27fe716 Compare July 16, 2021 18:26
@kyleCampbel1 kyleCampbel1 force-pushed the ResourceGroup_query_limits branch 2 times, most recently from 4029568 to 5ff2403 Compare July 18, 2021 17:49
Setting resource group resources on a query for cpu time, execution time,
and memory usage, adds greater granularity to resource allocation.
Enables query manager to kill a query which
exceeds its resource group allocation.
@tdcmeehan tdcmeehan merged commit 2211630 into prestodb:master Jul 21, 2021
41 checks passed
@arunthirupathi
Copy link
Contributor

testTwoQueriesAtSameTime is failing most of the times. I have several open PRs and i have seen that failing at least 3 times.

https://github.com/prestodb/presto/pull/16434/checks?check_run_id=3128611696#step:7:35072

Error: Tests run: 1874, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 1,800.32 s <<< FAILURE! - in TestSuite
Error: testTwoQueriesAtSameTime(com.facebook.presto.execution.resourceGroups.db.TestQueuesDb) Time elapsed: 60.014 s <<< FAILURE!

Often fails. This is the last change that touched this code. Could this be a regression or should the test be just disabled ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants