Skip to content

Latest commit

 

History

History
124 lines (101 loc) · 5.71 KB

CHANGELOG.md

File metadata and controls

124 lines (101 loc) · 5.71 KB

Changelog

All notable changes to this project will be documented in this file.

The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.

[v0.12.0] - 2022-08-18

  • Added support for type string and []string in bytearray store: #93
  • Added memory allocation tracking and max memory option to protect against adversarial input: #87
  • Fixed CHANGELOG for v0.11.0.

v0.11.0 - 2022-04-21

  • Cleaned up API by removing Schema* interfaces, deprecating function using dotted notation, and introducing ColumnPath type instead.
  • Improved error handling around row groups.
  • Improved write performance.
  • Fixed thrift crash bugs.

v0.10.0 - 2022-02-18

  • Updated to parquet-format 2.9.0.
  • Added support for page data statistics.
  • Improved test coverage.
  • Fixed issue with dictionary encoding where bit width was determined incorrectly.
  • Added support for writing per-page CRC32 checksums.
  • Added support for optional CRC32 checksum validation on read.
  • Fixed issue with float32/64 dictionary encoding where trying to encode NaN failed with an error.

v0.9.0 - 2022-02-10

  • Implemented lazy loading of pages.
  • Added support for writing multiple data pages per row group.

v0.8.0 - 2022-02-02

  • Set correct total size and total compressed in row group data.
  • Fixed delta bit-pack encoding of int32 and int64 and delta-length bit-packing encoding of byte_arrays when only a single value is written.
  • Fixed build issue in the floor package that somehow made it into the previous release.

v0.7.0 - 2022-02-01

  • Relax schema parser requirement that field names begin with a letter or underscore.
  • Remove unsigned support from all integer encodings.
  • Introduced explicit Clone method for parquetschema.SchemaDefinition.
  • Corrected parsing of DECIMAL as ConvertedType if scale and precision aren't available.
  • Fixed nested optional field reads.

v0.6.1 - 2021-11-19

  • perf - cache result of GetSchemaDefinition call
  • updated readme, added special mentions section

v0.6.0 - 2021-11-3

  • Added a schema generator which uses reflection to automatically generate a parquet schema from a go object.
  • Upgraded CI golang to 1.17.2
  • Added test to reproduce issue 41: #41

v0.5.0 - 2021-10-29

  • Upgraded thrift to v0.15.0
  • Fixed reading & writing of INT96 time

v0.4.0 - 2021-10-06

  • Fixed issues where fields in structs that were not defined in the schema raised errors (array, slice, time, map)
  • Added support for reflect encoding/decoding of additional types to/from parquet types
    • Go type <-> Parquet type
    • int <-> int64
    • int32 <-> int64
    • int16 <-> int64
    • int8 <-> int64
    • uint <-> int64
    • uint16 <-> int64
    • uint8 <-> int64
    • int64 <-> int32
    • uint64 <-> int32
    • uint32 <-> int32
  • Removed some inconsistent api behaviors
    • reflect marshaling/unmarshalling now ignores fields not defined in the schema (this already happens when the marshaller is used by floor.Writer and goparquet.FileReader)
    • int32PlainDecoder and int32PlainEncoder now no longer support uint values
  • Added function IsAfterUnixEpoch to allow test whether a timestamp can be written as Julian date.
  • Allowed setting selected columns after opening a parquet file through SetSelectedColumns method.
  • Fixed dictStore to correctly reset valueSize which should reduce size of written files.
  • Fixed reflection-based marshal/unmarshaling for some basic types.
  • Fixed DECIMALs computation of maximum amount of digits.
  • Improved parsing of legacy timestamps.
  • Allowed reading of file metadata before properly opening a parquet file.
  • Added method SeekToRowGroup to allow seeking to specific row groups.

v0.3.0 - 2020-12-15

  • Added examples how to use the low-level and high-level APIs.
  • Added backwards compability in floor reader and writer for lists as produced by Athena.
  • Fixed many crash issues found during fuzzing of the parquet reader.

v0.2.1 - 2020-11-04

  • Release to correct missing changelog.

v0.2.0 - 2020-11-04

  • Added csv2parquet command to convert CSV files into parquet files
  • Fixed issue in parquet-tool cat that wouldn't print any records if no limit was provided
  • Added support for backward compatible MAPs and LISTs in schema validation
  • Added ValidateStrict method to validate without regard for backwards compatibility

v0.1.1 - 2020-05-26

  • Added high-level interface to access file and column metadata

v0.1.0 - 2020-04-24

  • Initial release