Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize Metastore Code #22671

Merged
merged 1 commit into from
May 14, 2024

Conversation

konjac-h
Copy link
Contributor

@konjac-h konjac-h commented May 5, 2024

Description

Optimize the Code by reducing unnecessary for loop. Reducing time complexity from O(2n) to O(n)

Motivation and Context

See Description Above

Impact

Improve time complexity of Metastore

Test Plan

Simple Code Optimization.
Ran Verifier test to ensure nothing breaks

Contributor checklist

  • Please make sure your submission complies with our development, formatting, commit message, and attribution guidelines.
  • PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
  • Documented new properties (with its default value), SQL syntax, functions, or other functionality.
  • If release notes are required, they follow the release notes guidelines.
  • Adequate tests were added if applicable.
  • CI passed.

== NO RELEASE NOTE ==

@konjac-h konjac-h requested a review from a team as a code owner May 5, 2024 11:28
@konjac-h konjac-h requested a review from presto-oss May 5, 2024 11:28
elharo
elharo previously requested changes May 5, 2024
Copy link
Contributor

@elharo elharo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs a test

@konjac-h
Copy link
Contributor Author

konjac-h commented May 6, 2024

@elharo Hey thanks for reviewing. I totally agree that we should have a regression test. I filed a separate task, T188102626, to track it but I don't think the test will be the blocker of this PR given that this is a really straightforward optimization improving the time complexity from O(2n) to O(n)

@steveburnett
Copy link
Contributor

The checkbox for "If release notes are required, they follow the release notes guidelines." is checked but there is no release note.

Please add a release note following the release notes guidelines, even if it's only

== NO RELEASE NOTE ==

@swapsmagic
Copy link
Contributor

thanks for fixing this. It will help with the regression that we are seeing with the recent release in production in meta.

@elharo
Copy link
Contributor

elharo commented May 6, 2024

Was this an outright bug (valid query failed, incorrect results reported) or a performance problem? If the former I would expect a test that exposes the bug to be written first before a fix is attempted. Otherwise it's likely to regress again even if it's manually verified this time

@elharo elharo dismissed their stale review May 6, 2024 17:37

for now

@jainxrohit
Copy link
Contributor

@konjac-h can you please add in the PR description on what exactly was the issue?

@rschlussel
Copy link
Contributor

A few comments:

  1. it would be great to add more details to the issue description about what kind of issue we were seeing (e.g latency regression when fetching xxx from the metastore, failure when xxx, etc.), what the root cause of the problem was, and how this change addresses it.
  2. it's great that you provided queryids for Meta employees, but you should make sure to provide an example query or category of query for the general public outside of meta to be able to see what kind of queries were causing problems.
  3. If you are fixing a bug you should add a test. The fact that a fix was needed indicates that there was something not working properly in the code, and even if your fix is simple, adding a test ensures that we don't regress.

@rschlussel
Copy link
Contributor

Also, in addition to adding more information to the PR description, you should add more information to the commit message as well describing the problem that it's fixing more specifically.

@konjac-h
Copy link
Contributor Author

konjac-h commented May 7, 2024

@rschlussel Sounds good. thanks for comments/review. I will fill out the information you mentioned.

@rschlussel
Copy link
Contributor

rschlussel commented May 7, 2024

this is a really straightforward optimization improving the time complexity from O(2n) to O(n)

I'm surprised that iterating through a list 2x instead of 1x is enough to cause query failures. That seems like a very sensitive threshold, and I would expect in that case we would have already been seeing similar failures for something using 2x as many partitions. Also the error for the failing query is a thread interrupt when talking to the metastore, but it seems like the fix only touches the cache, and not even the method that was in the other stack trace (loadPartitionColumnStatistics). Are you sure this is the root cause of the issue?

@konjac-h
Copy link
Contributor Author

konjac-h commented May 7, 2024

@rschlussel These two PRs were originally suspected to be the problem so I optimized the only one that could do better. After this, we found another PR that actually contributes to the majority of the delay.

I decided to keep this PR because faster is better even though its impact is not that huge

@rschlussel
Copy link
Contributor

Got it, thanks for the explanation! So i would update the PR description and commit message, as this isn't really related to the metastore regression described above. Instead i would describe it as an opportunistic code improvement that removes an extra iteration through a list. You also shouldn't need to add a test then.

@konjac-h
Copy link
Contributor Author

konjac-h commented May 7, 2024

Sounds good. Thanks for guidance!

@konjac-h konjac-h changed the title Fix Metastore Regression Issue Optimize Metastore Code May 9, 2024
@konjac-h
Copy link
Contributor Author

konjac-h commented May 9, 2024

@rschlussel do you mind taking a look and stamp it when you get a chance? Thanks

@NikhilCollooru NikhilCollooru merged commit 87e201c into prestodb:master May 14, 2024
56 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants