Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support timely invalidation of parquet metadata cache #17500

Merged
merged 1 commit into from Apr 6, 2022

Conversation

hackeryang
Copy link
Contributor

@hackeryang hackeryang commented Mar 22, 2022

Test plan

  • Use existing unit tests, and add some test cases about invalidation of the cache.

The reason of this pr is detailed in this issue: #17472

Thank you.

== RELEASE NOTES ==

General Changes
* If hive files are changed, the modification time can be used to invalidate parquet metadata cache immediately.

@linux-foundation-easycla
Copy link

linux-foundation-easycla bot commented Mar 22, 2022

CLA Signed

The committers listed above are authorized under a signed CLA.

  • login: hackeryang / name: Yang Yicheng (f429df5826918ff19c71551f753a3b65a7b2fe57)
  • login: lliiun-z / name: Lin (4e03c03e36c39620cb67ac1e78edef16b455991c, 0bcc616eed8d45a531b0d67ad8adf7efe6ee4969)

@hackeryang
Copy link
Contributor Author

hackeryang commented Mar 23, 2022

Hi @shixuan-fan and @highker , can you please help review when you have time?

I'm sorry to include the other two commits from the latest master branch that are not mine, because some test cases failed incorrectly which can pass on my mac. When i pull the latest master branch into my local branch, the tests are all passed, a little strange ha ha.

@highker highker requested review from zhenxiao and beinan March 23, 2022 05:12
@zhenxiao
Copy link
Collaborator

thank you, @hackeryang
the commit: f429df5 looks good. Could you please elaborate why the other 2 commits are needed? I think the 3rd commit, add modification time could independently solve the problem. Or did I miss anything?

@hackeryang
Copy link
Contributor Author

thank you, @hackeryang the commit: f429df5 looks good. Could you please elaborate why the other 2 commits are needed? I think the 3rd commit, add modification time could independently solve the problem. Or did I miss anything?

Thank you for your review, @zhenxiao . Sorry the 3 commits confused you, yes the real commit is f429df5.
But the first time i commit that, some test case in github failed:
wecom-temp-7ae0d7e2f49857f1ccab889b4e727191
and some .pom file can't be fetched:
wecom-temp-39d46c12f3fa43692963a71af1c2816c
However, the test cases are successfully passed on my computer. For example on my mac:
wecom-temp-66bd57056e144a5e46b65124642d1144
and the .pom file exists in central maven repository:
wecom-temp-ecbc3bc9399099decf9de42b7d61a804
So i thought maybe i met a bug in github or something, i had to try to merge into the latest master branch, in order to invoke the github test cases again. Then, all the test cases passed successfully, that error phenomenon the first time confused me.

The other two commits is not mine, they are in the latest master branch. I just need the contents in the latest master branch to pass the tests.

Copy link
Member

@beinan beinan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @hackeryang ! Looks like some irrelevant commits are involved in this PR, could you help to clean up first? Thanks!

@beinan
Copy link
Member

beinan commented Mar 24, 2022

thank you, @hackeryang the commit: f429df5 looks good. Could you please elaborate why the other 2 commits are needed? I think the 3rd commit, add modification time could independently solve the problem. Or did I miss anything?

Thank you for your review, @zhenxiao . Sorry the 3 commits confused you, yes the real commit is f429df5. But the first time i commit that, some test case in github failed: wecom-temp-7ae0d7e2f49857f1ccab889b4e727191 and some .pom file can't be fetched: wecom-temp-39d46c12f3fa43692963a71af1c2816c However, the test cases are successfully passed on my computer. For example on my mac: wecom-temp-66bd57056e144a5e46b65124642d1144 and the .pom file exists in central maven repository: wecom-temp-ecbc3bc9399099decf9de42b7d61a804 So i thought maybe i met a bug in github or something, i had to try to merge into the latest master branch, in order to invoke the github test cases again. Then, all the test cases passed successfully, that error phenomenon the first time confused me.

The other two commits is not mine, they are in the latest master branch. I just need the contents in the latest master branch to pass the tests.

if the other two commits have been merged to master, could you try rebase master and push again? thx!

@hackeryang
Copy link
Contributor Author

Thank you @hackeryang ! Looks like some irrelevant commits are involved in this PR, could you help to clean up first? Thanks!

OK, i will try to rebase and clean the commits, thank you~

@hackeryang hackeryang force-pushed the invalidate_cache branch 2 times, most recently from 281b20e to d872c4b Compare March 24, 2022 07:08
@hackeryang hackeryang requested a review from beinan March 24, 2022 08:19
@hackeryang
Copy link
Contributor Author

hackeryang commented Mar 25, 2022

Thank you @hackeryang ! Looks like some irrelevant commits are involved in this PR, could you help to clean up first? Thanks!

OK, i will try to rebase and clean the commits, thank you~

Hi, @zhenxiao and @beinan , all of your advices above have been in the codes, please help review again when you have time~Thank you

Copy link
Member

@beinan beinan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello @hackeryang , the PR looks good to me, do you mind to add a unit test to cover this feature? thanks a lot!

ParquetFileMetadata fileMetadataCache = cache.get(
parquetDataSource.getId(),
() -> delegate.getParquetMetadata(parquetDataSource, fileSize, cacheable, modificationTime));
if (fileMetadataCache.getModificationTime() == modificationTime) {
return fileMetadataCache;
}
else {
cache.invalidate(parquetDataSource.getId());
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to be more specific of my previous comments, I think we might need add a couple of junit test cases to cover both valid and invalid cache cases. what do you think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to be more specific of my previous comments, I think we might need add a couple of junit test cases to cover both valid and invalid cache cases. what do you think?

Thanks for the advice, i have added unit tests about that.

@hackeryang hackeryang force-pushed the invalidate_cache branch 2 times, most recently from b16ee42 to e94a63e Compare March 29, 2022 06:29
@hackeryang hackeryang requested a review from beinan March 29, 2022 07:55
@hackeryang
Copy link
Contributor Author

hackeryang commented Mar 29, 2022

Hello @hackeryang , the PR looks good to me, do you mind to add a unit test to cover this feature? thanks a lot!

Hi, @zhenxiao and @beinan , i've added some tests in testCaching() of AbstractTestParquetReader, please review again, thank you~
In my debug of unit tests, the cache is successfully invalidated:
wecom-temp-dd8d5505a41b355ab0e081d690ac038e

Copy link
Member

@beinan beinan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, thx for the contribution!

@beinan beinan merged commit 83aa60c into prestodb:master Apr 6, 2022
41 checks passed
@hackeryang
Copy link
Contributor Author

lgtm, thx for the contribution!

Thank you! It's very excited to contribute for our project!

@hackeryang hackeryang deleted the invalidate_cache branch April 6, 2022 05:58
@mshang816 mshang816 mentioned this pull request May 17, 2022
14 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants