v2.12.0
This release brings in many changes!
-
Support for reading from and writing to abstract data interfaces.
Together with @huajiang-tubi we added the ability to read from org.apache.parquet.io.InputFile and write to org.apache.parquet.io.OutputFile. Additionally, @huajiang-tubi implemented InMemoryInputFile and InMemoryOutputFile for reading and writing Parquet files from/to bytes. All modules received the new functionality. The new API functions are defined as alternatives to the existing builder end-steps that take a Path. Please mind that the new API is still marked as experimental, that is, it may be subject to change in subsequent minor releases.
-
Every module, including core, will try to read partitions. Prior to 2.12.0, when using the core module, one needed to explicitly use ParquetReader.as[MyData].partitioned... in order to scan a partitioned directory. It was designed this way to avoid I/O operations in the low-level module when one was sure that only a single file was being read. However, the underlying Parquet library still attempted to scan the directory. With this release, this behaviour changes. In order to support InputFile, we needed to replace the existing Parquet abstraction with lower-level code. This allowed us to enrich the existing code and execute partition discovery as part of the existing directory scanning. As a result, the experience of reading partitions is consistent across all modules!
Therefore, ParquetReader.as[MyData].partitioned is now marked as deprecated and has no real effect.
-
Scala 3 is upgraded to the 3.3.0 LTS version.
-
Various minor dependency updates.
-
Stricter and more consistent code linting (thanks to the update to Scala 3.3.0).
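As an illustration of the experimental InputFile/OutputFile support, here is a minimal sketch of an in-memory round trip with the core module. The User case class is hypothetical. The factory and accessor names used here (InMemoryOutputFile(initBufferSize = ...), take(), InMemoryInputFile.fromBytes) and the writeAndClose/read overloads accepting an OutputFile/InputFile are assumptions that should be checked against the library documentation for your version.

```scala
import com.github.mjakubowski84.parquet4s.{InMemoryInputFile, InMemoryOutputFile, ParquetReader, ParquetWriter}

// Hypothetical record type used only for illustration.
case class User(id: Long, name: String)

object InMemoryRoundTrip {

  // Writes the records to an in-memory Parquet file, then reads them back from the bytes.
  def roundTrip(users: Seq[User]): Seq[User] = {
    // Assumed factory: an in-memory buffer implementing org.apache.parquet.io.OutputFile.
    val outputFile = InMemoryOutputFile(initBufferSize = 4096)

    // Experimental end-step that takes an OutputFile instead of a Path (assumed overload).
    ParquetWriter.of[User].writeAndClose(outputFile, users)

    // Assumed accessor: returns the bytes written so far.
    val bytes: Array[Byte] = outputFile.take()

    // Assumed factory: wraps the bytes as an org.apache.parquet.io.InputFile.
    val inputFile = InMemoryInputFile.fromBytes(bytes)

    // Experimental end-step that takes an InputFile instead of a Path (assumed overload).
    val records = ParquetReader.as[User].read(inputFile)
    try records.toList
    finally records.close() // release the underlying reader
  }

  def main(args: Array[String]): Unit =
    roundTrip(Seq(User(1L, "Alice"), User(2L, "Bob"))).foreach(println)
}
```

Because the API is marked as experimental, the exact names and signatures may change in subsequent minor releases.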
The introduced changes open up multiple new opportunities! Stay tuned, as more new features are coming soon!