Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Skip history optimizer query registration if previous registration timeout #21308

Merged
merged 2 commits into from Nov 9, 2023

Conversation

feilong-liu
Copy link
Contributor

@feilong-liu feilong-liu commented Nov 3, 2023

Description

Part of #20355

In current code, HBO optimizer runs twice. If in the first run, the query registration timeout, it's very likely that the second run will timeout too. Instead of run both and timeout twice, in this PR, I add code to record the timeout of the first run, so that the second run will skip if the first run timeout.

Motivation and Context

Reduce latency for HBO.

Impact

HBO latency for queries which timeout during HBO query registration will be half as of now. For example, if timeout limit is 10 seconds, and the total latency caused by HBO optimizer which timeout will be 2*10 = 20 seconds. After this change, it will be 10 seconds.

Test Plan

Test locally end to end, optimizer latency decreased from 25.44s to 10.2s


Contributor checklist

  • Please make sure your submission complies with our development, formatting, commit message, and attribution guidelines.
  • PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
  • Documented new properties (with its default value), SQL syntax, functions, or other functionality.
  • If release notes are required, they follow the release notes guidelines.
  • Adequate tests were added if applicable.
  • CI passed.

Release Notes

Please follow release notes guidelines and fill in the release notes below.

== RELEASE NOTES ==

General Changes
* Improve history optimizer latency for queries which timeout during query registration in history optimizer

@feilong-liu feilong-liu requested a review from a team as a code owner November 3, 2023 20:19
@feilong-liu feilong-liu marked this pull request as draft November 3, 2023 20:19
@@ -142,7 +151,7 @@ private Map<PlanCanonicalizationStrategy, PlanNodeWithHash> getPlanNodeHashes(Pl

private PlanNodeStatsEstimate getStatistics(PlanNode planNode, Session session, Lookup lookup, PlanNodeStatsEstimate delegateStats)
{
if (!useHistoryBasedPlanStatisticsEnabled(session) || historyBasedStatisticsCacheManager.loadHistoryFailed(session.getQueryId())) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

loadHistoryFailed is a macro optimization for HBO latency added in #19933. For queries which do not load any history statistics, we will skip attempt to read history stats. However, by the design, all statistics are cached during query registration, and the read here will be reading from cache, hence the saving is small. On the other hand, we run HBO optimizer twice, having one run reading empty does not mean the second time will still be empty. Hence remove this macro optimization here.

@feilong-liu feilong-liu marked this pull request as ready for review November 3, 2023 20:51
This cache adds a query ID when HBO failed to fetch history stats for
this query. Later during cost based planning, it will skip reading history
stats. However, the saving is small as the reading will be from cache.
And currently we run HBO optimizer twice, getting empty stats for one run
does not necessarily mean getting empty stats for the other run. This is
an over optimization and I remove it here.
Currently we run HBO optimizer twice. When the first run timeout,
it's likely that the second run will timeout too. Here we record
the timeout, and skip registration for the next run.
Copy link
Contributor

@ajaygeorge ajaygeorge left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@feilong-liu feilong-liu merged commit dae8a28 into prestodb:master Nov 9, 2023
56 checks passed
@feilong-liu feilong-liu deleted the no_double_hbo_timeout branch November 9, 2023 23:30
@wanglinsong wanglinsong mentioned this pull request Dec 8, 2023
26 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants