v2.16.0

@mjakubowski84 released this 07 Feb 21:45

This release introduces a feature that can significantly speed up reading Parquet files. Parquet storage, such as a data lake, usually consists of a huge number of files. How can we speed up reading such storage? By reading multiple files in parallel!
By default, Parquet4s reads files one by one, in sequence. Now, when using Akka, Pekko or FS2, you can choose a parallelism level and read multiple files at the same time while still controlling resource utilization. Simply use the option parallelism(n = ???) when defining your reader.
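
As a rough illustration, a reader with the parallelism option might look like the sketch below for the Pekko module. Only `parallelism(n = ...)` comes from these notes. The `User` case class, the path and the chosen level of 4 are hypothetical, and the surrounding builder calls (`ParquetStreams.fromParquet`, `as`, `read`) are assumed to match the existing Parquet4s streaming API, so please check them against the documentation for your version.

```scala
import com.github.mjakubowski84.parquet4s.{ParquetStreams, Path}
import org.apache.pekko.actor.ActorSystem
import org.apache.pekko.stream.scaladsl.Sink

// Hypothetical schema of the records stored in the Parquet files.
case class User(id: Long, name: String)

object ParallelReadExample extends App {
  implicit val system: ActorSystem = ActorSystem("parquet-read")

  ParquetStreams.fromParquet
    .as[User]
    // Read up to 4 files at the same time (assumed placement on the builder).
    .parallelism(n = 4)
    // Hypothetical directory containing many Parquet files.
    .read(Path("/data/users"))
    .runWith(Sink.foreach(println))
    .onComplete(_ => system.terminate())(system.dispatcher)
}
```

A higher level reads more files at once at the cost of more memory and open file handles, so it is worth starting with a small value and adjusting it to the resources available.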

Besides that, there were numerous minor and bugfix dependency updates, including Pekko, Cats Effect, FS2 and SLF4J.

Big thanks to @calvinlfer for his contribution.