This repository has been archived by the owner on Nov 16, 2023. It is now read-only.

Pass optional readers to override and cache metadata objects #249

Merged 5 commits on Jun 30, 2022

annanay25
Contributor

There are two ways to design metadata caching:

  • Override the column chunk reader: We pass in the cached reader as the “main” reader with an override to use a raw backend reader for column chunk data. (PR Add support for caching parquet metadata #180)
  • Override the metadata reader: We pass in the backend reader as the “main” reader with an override to use a cached reader for metadata. (This PR)

Option 2 struck me (a lot) later than it should have, and this design works much better for us.

Signed-off-by: Annanay Agarwal annanay.agarwal@grafana.com

@annanay25
Contributor Author

Just checking with you: does the overall approach taken in this PR make sense?

Signed-off-by: Annanay Agarwal <annanay.agarwal@grafana.com>
Contributor

@achille-roussel achille-roussel left a comment

This is looking great, I left a few suggestions, let me know what you think!

annanay25 and others added 2 commits June 29, 2022 20:27
Signed-off-by: Annanay Agarwal <annanay.agarwal@grafana.com>
Contributor

@achille-roussel achille-roussel left a comment

Let's ship this!

@achille-roussel achille-roussel merged commit 7691e3e into segmentio:main Jun 30, 2022
@annanay25 annanay25 deleted the cache-metadata branch June 30, 2022 17:54
@achille-roussel achille-roussel added the enhancement Improve a feature that already exists label Jul 11, 2022
@fpetkovski

Can this functionality be used to pre-fetch metadata in one shot instead of doing it with smaller reads?

I am trying to optimize opening a file from object storage, and currently each page index results in its own 4KB read.
