Optimize Metastore Code #22671
Conversation
This needs a test
@elharo Hey, thanks for reviewing. I totally agree that we should have a regression test. I filed a separate task, T188102626, to track it, but I don't think the test should block this PR, given that this is a really straightforward optimization that replaces two passes over a list with one (roughly 2n steps down to n).
The checkbox for "If release notes are required, they follow the release notes guidelines." is checked but there is no release note. Please add a release note following the release notes guidelines, even if it's only
Thanks for fixing this. It will help with the regression that we are seeing in production at Meta with the recent release.
Was this an outright bug (valid query failed, incorrect results reported) or a performance problem? If the former, I would expect a test that exposes the bug to be written before a fix is attempted. Otherwise it is likely to regress again, even if it is manually verified this time.
@konjac-h can you please add to the PR description what exactly the issue was?
A few comments:
Also, in addition to adding more information to the PR description, you should add more information to the commit message, describing more specifically the problem that it fixes.
@rschlussel Sounds good. Thanks for the comments and review. I will fill in the information you mentioned.
I'm surprised that iterating through a list twice instead of once is enough to cause query failures. That seems like a very sensitive threshold, and in that case I would expect that we would already have been seeing similar failures for anything using twice as many partitions. Also, the error for the failing query is a thread interrupt when talking to the metastore, but the fix seems to touch only the cache, and not even the method that was in the other stack trace (loadPartitionColumnStatistics). Are you sure this is the root cause of the issue?
@rschlussel These two PRs were originally suspected to be the problem, so I optimized the only one that could be improved. After this, we found another PR that actually contributes the majority of the delay. I decided to keep this PR because faster is better, even though its impact is not that large.
Got it, thanks for the explanation! In that case I would update the PR description and commit message, as this isn't really related to the metastore regression described above. Instead I would describe it as an opportunistic code improvement that removes an extra iteration through a list. You also shouldn't need to add a test then.
Sounds good. Thanks for the guidance!
d68659e to c631318
@rschlussel would you mind taking a look and stamping it when you get a chance? Thanks.
Description
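Optimize the code by removing an unnecessary for loop, so that the list is traversed once instead of twice. This cuts the work from roughly 2n steps to n; the asymptotic complexity is O(n) in both cases.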
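For readers unfamiliar with this kind of change, here is a minimal sketch of merging two loops over the same list into one. It is not the code from this PR: the class `SinglePassSketch`, the `Split` holder, and both methods are hypothetical names made up for illustration, and the "cached vs. missing partition names" scenario is only an assumption about the general shape of such a loop.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; none of these names come from the PR or from Presto.
final class SinglePassSketch
{
    private SinglePassSketch() {}

    // Holds the two lists produced while scanning the partition names.
    static final class Split
    {
        final List<String> cached = new ArrayList<>();
        final List<String> missing = new ArrayList<>();
    }

    // Before: two separate loops over the same input, about 2n iterations.
    static Split twoPasses(List<String> names, Set<String> cache)
    {
        Split split = new Split();
        for (String name : names) {
            if (cache.contains(name)) {
                split.cached.add(name);
            }
        }
        for (String name : names) {
            if (!cache.contains(name)) {
                split.missing.add(name);
            }
        }
        return split;
    }

    // After: one loop that fills both lists, about n iterations.
    // The results are identical, including element order within each list.
    static Split onePass(List<String> names, Set<String> cache)
    {
        Split split = new Split();
        for (String name : names) {
            if (cache.contains(name)) {
                split.cached.add(name);
            }
            else {
                split.missing.add(name);
            }
        }
        return split;
    }
}
```

Both versions are linear in the number of names, so the saving is a constant factor, which matches the review discussion that this is an opportunistic cleanup rather than a fix for the regression.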
Motivation and Context
See Description Above
Impact
Slightly reduces the work done in the metastore code by removing a redundant pass over a list.
Test Plan
Simple code optimization.
Ran the Verifier test to ensure nothing breaks.
Contributor checklist
== NO RELEASE NOTE ==