This repository has been archived by the owner on Feb 18, 2024. It is now read-only.

Added example to read parquet in parallel with rayon #658

Merged
merged 1 commit into from
Dec 5, 2021

Conversation

jorgecarleitao
Owner

@jorgecarleitao jorgecarleitao commented Dec 4, 2021

This shaves a lot of time when reading many columns. The tradeoff, as usual, is that it loads all pages of a given row group into memory, to be parallelized via par_iter.
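The load-then-parallelize pattern described above can be sketched as follows. This is a minimal illustration, not the example added in this PR: it uses only the standard library (`std::thread::scope`) as a stand-in for Rayon, and `ChunkMeta`, `read_chunks`, `decode` and `decode_parallel` are hypothetical names. The "decoding" is a toy that reads little-endian `u32` values rather than real parquet pages.

```rust
use std::io::{self, Read, Seek, SeekFrom};
use std::thread;

/// Hypothetical stand-in for the location of one column chunk in a row group.
struct ChunkMeta {
    offset: u64,
    length: usize,
}

/// Step 1 (IO-bound, sequential): load the raw bytes of every column chunk
/// into memory. This is the memory cost mentioned in the tradeoff.
fn read_chunks<R: Read + Seek>(reader: &mut R, metas: &[ChunkMeta]) -> io::Result<Vec<Vec<u8>>> {
    let mut chunks = Vec::with_capacity(metas.len());
    for meta in metas {
        reader.seek(SeekFrom::Start(meta.offset))?;
        let mut buf = vec![0u8; meta.length];
        reader.read_exact(&mut buf)?;
        chunks.push(buf);
    }
    Ok(chunks)
}

/// Toy decoder (an assumption for illustration): little-endian u32 values.
fn decode(chunk: &[u8]) -> Vec<u32> {
    chunk
        .chunks_exact(4)
        .map(|b| u32::from_le_bytes([b[0], b[1], b[2], b[3]]))
        .collect()
}

/// Step 2 (CPU-bound, parallel): decode each in-memory chunk on its own thread.
/// With Rayon, this step would be roughly `chunks.par_iter().map(...).collect()`.
fn decode_parallel(chunks: &[Vec<u8>]) -> Vec<Vec<u32>> {
    thread::scope(|s| {
        let handles: Vec<_> = chunks
            .iter()
            .map(|chunk| s.spawn(move || decode(chunk)))
            .collect();
        handles
            .into_iter()
            .map(|h| h.join().expect("decode thread panicked"))
            .collect::<Vec<_>>()
    })
}
```

Because all IO finishes before any decoding starts, the two phases stay cleanly separated, at the price of holding every chunk of the row group in memory at once.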

There is a middle ground here where we can use Rayon's par_bridge, but at that point we are mixing IO-bound and CPU-bound tasks, which makes things more difficult for network IO / S3.
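A rough sketch of what such a middle ground could look like, under stated assumptions: instead of Rayon's `par_bridge`, it uses a bounded standard-library channel so that chunks are pulled lazily from an iterator (the IO side) while worker threads decode them (the CPU side). `decode_streaming` and the byte-summing `decode` are hypothetical and only illustrate the shape of the approach.

```rust
use std::sync::mpsc::{channel, sync_channel};
use std::sync::{Arc, Mutex};
use std::thread;

/// Toy decoder (an assumption for illustration): sums the bytes of a chunk.
fn decode(chunk: &[u8]) -> u64 {
    chunk.iter().map(|&b| b as u64).sum()
}

/// Streams chunks from a lazy, IO-like iterator to `workers` decoding threads.
/// At most `capacity` undecoded chunks are queued at a time, so memory stays
/// bounded, but IO and CPU work now interleave.
fn decode_streaming<I>(chunks: I, workers: usize, capacity: usize) -> Vec<(usize, u64)>
where
    I: Iterator<Item = Vec<u8>>,
{
    assert!(workers > 0, "at least one worker is required");
    let (chunk_tx, chunk_rx) = sync_channel::<(usize, Vec<u8>)>(capacity);
    let chunk_rx = Arc::new(Mutex::new(chunk_rx));
    let (result_tx, result_rx) = channel::<(usize, u64)>();

    thread::scope(|s| {
        for _ in 0..workers {
            let rx = Arc::clone(&chunk_rx);
            let tx = result_tx.clone();
            s.spawn(move || loop {
                // The lock is held only while receiving, not while decoding.
                let next = rx.lock().unwrap().recv();
                match next {
                    Ok((index, chunk)) => tx.send((index, decode(&chunk))).unwrap(),
                    Err(_) => break, // the producer is done and the queue is empty
                }
            });
        }
        drop(result_tx);

        // Producer: the calling thread performs the (possibly slow) reads.
        for item in chunks.enumerate() {
            chunk_tx.send(item).unwrap();
        }
        drop(chunk_tx); // signals the workers to stop
    });

    // Workers finish in arbitrary order; restore the original chunk order.
    let mut results: Vec<_> = result_rx.iter().collect();
    results.sort_by_key(|r| r.0);
    results
}
```

The producer blocks whenever the queue is full, so a slow reader (for example a network source) and the decoders throttle each other. That coupling is the difficulty the comment points at.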

@jorgecarleitao jorgecarleitao added the documentation Improvements or additions to documentation label Dec 4, 2021
@jorgecarleitao
Owner Author

cc @ritchie46 , I think this is how polars can do it. ^_^

@ritchie46
Collaborator

Awesome! Thanks for this! :)

@codecov

codecov bot commented Dec 4, 2021

Codecov Report

Merging #658 (76c96ed) into main (299818a) will increase coverage by 0.00%.
The diff coverage is n/a.


@@           Coverage Diff           @@
##             main     #658   +/-   ##
=======================================
  Coverage   69.55%   69.56%           
=======================================
  Files         299      299           
  Lines       16735    16735           
=======================================
+ Hits        11640    11641    +1     
+ Misses       5095     5094    -1     
Impacted Files Coverage Δ
src/bitmap/utils/slice_iterator.rs 92.53% <0.00%> (+1.49%) ⬆️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@jorgecarleitao jorgecarleitao merged commit de87058 into main Dec 5, 2021
@jorgecarleitao jorgecarleitao deleted the parquet_rayon branch December 5, 2021 12:22

2 participants