v2.12.0

@mjakubowski84 released this 07 Aug 17:58
· 60 commits to master since this release

This release brings in many changes!

  1. Support for reading from and writing to abstract data interfaces.
    Together with @huajiang-tubi we added the ability to read from org.apache.parquet.io.InputFile and write to org.apache.parquet.io.OutputFile. Additionally, @huajiang-tubi implemented InMemoryInputFile and InMemoryOutputFile for reading and writing Parquet files from/to bytes. All modules received the new functionality. The new API functions are defined as alternatives to the existing Path-based end-steps in builders. Please note that the new API is still marked as experimental, that is, it may be subject to change in subsequent minor releases.

  2. Every module, including core, now tries to read partitions. Prior to 2.12.0, when using the core module, one had to explicitly use ParquetReader.as[MyData].partitioned... in order to scan a partitioned directory. This was by design, to avoid I/O operations in the low-level module when the user was sure they were reading a single file. However, the underlying Parquet library still attempted to scan the directory. This release changes that behaviour. In order to support InputFile, we needed to replace the existing Parquet abstraction with lower-level code. This allowed us to enrich the existing code and run partition discovery together with the existing directory scanning. As a result, the experience of reading partitions is consistent across all modules!
    Therefore, ParquetReader.as[MyData].partitioned is now marked as deprecated and has no real effect.

  3. Scala 3 is upgraded to the 3.3.0 LTS version.

  4. Various minor dependency updates.

  5. Stricter and more consistent code linting (thanks to the update to Scala 3.3.0).
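
To illustrate the first item, here is a minimal sketch of an in-memory round trip through the new InputFile/OutputFile end-steps. It is not taken from the project documentation: MyData and InMemoryRoundTrip are hypothetical names, and the exact signatures assumed here (InMemoryOutputFile(initBufferSize = ...), take(), InMemoryInputFile.fromBytes(...), and the read / writeAndClose overloads) should be checked against the parquet4s docs, especially as the API is marked experimental.

```scala
import com.github.mjakubowski84.parquet4s.{
  InMemoryInputFile,
  InMemoryOutputFile,
  ParquetReader,
  ParquetWriter
}

// Hypothetical record type, used only for this illustration.
case class MyData(id: Int, text: String)

object InMemoryRoundTrip {

  // Writes the records to an in-memory Parquet file and reads them back.
  // Assumption: the builders accept OutputFile/InputFile in place of Path.
  def roundTrip(records: Seq[MyData]): List[MyData] = {
    // Assumed factory: an output file backed by a growable byte buffer.
    val outputFile = InMemoryOutputFile(initBufferSize = 4096)
    ParquetWriter.of[MyData].writeAndClose(outputFile, records)

    // Assumed accessor: take() hands over the bytes written so far.
    val bytes: Array[Byte] = outputFile.take()

    // Assumed factory: wraps the bytes as an org.apache.parquet.io.InputFile.
    val inputFile = InMemoryInputFile.fromBytes(bytes)

    val iterable = ParquetReader.as[MyData].read(inputFile)
    try iterable.toList
    finally iterable.close() // the reader holds resources until closed
  }

  def main(args: Array[String]): Unit = {
    val records = Seq(MyData(1, "a"), MyData(2, "b"))
    roundTrip(records).foreach(println)
  }
}
```

The same end-steps should accept any other org.apache.parquet.io.InputFile or OutputFile implementation, so the in-memory classes are just the most convenient ones to try the API with.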

The introduced changes open up multiple new opportunities! Stay tuned, as more new features are coming quite soon!