Add parquet metadata subcommand and refactor parquet_tools #776
Merged
thinkingfish merged 11 commits into main on Apr 12, 2026
Add metadata, compare, and compare-schema subcommands to the parquet tool, porting core functionality from the parquet-arrow repository. Extract existing annotate logic into its own module for cleaner organization.

New subcommands:
- parquet metadata: display file-level and column-level metadata (--schema flag shows only column metadata)
- parquet compare: compare data values between two parquet files
- parquet compare-schema: compare schemas between two parquet files
The --geometry flag shows table geometry: logical shape (columns x rows) and per-row-group detail (row count and byte size). Without flags, all sections are shown. --schema and --geometry each filter to their respective section.
The --file flag shows only file-level key-value metadata, skipping geometry and column-level schema output.
- --json: output in pretty-printed JSON for programmatic consumption.
File-level values that are valid JSON are nested as objects rather
than escaped strings.
- --field=KEY: extract and print a single file-level metadata key's
raw value. Combined with --json, pretty-prints JSON values.
- Human-readable schema output now uses an aligned table with verbose
field values collapsed to {...}.
Values like systeminfo are deeply nested JSON and not useful as raw strings. --field now always attempts JSON pretty-printing, regardless of the --json flag.
Row group details are now shown in an aligned table with a total row. Byte sizes use human-readable units (KiB/MiB/GiB).
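The byte-size formatting described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the function name format_bytes, the one-decimal precision, and the plain "B" suffix below 1 KiB are all assumptions.

```rust
/// Hypothetical helper (name and precision are assumptions, not taken
/// from the PR): render a byte count with binary units.
fn format_bytes(bytes: u64) -> String {
    const KIB: f64 = 1024.0;
    const MIB: f64 = KIB * 1024.0;
    const GIB: f64 = MIB * 1024.0;

    let b = bytes as f64;
    if b >= GIB {
        format!("{:.1} GiB", b / GIB)
    } else if b >= MIB {
        format!("{:.1} MiB", b / MIB)
    } else if b >= KIB {
        format!("{:.1} KiB", b / KIB)
    } else {
        // Below 1 KiB, print the exact byte count.
        format!("{} B", bytes)
    }
}
```

Under these assumptions, a 1536-byte row group would be shown as 1.5 KiB, and the total row would apply the same formatting to the sum of the per-row-group sizes.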
Schema now uses the same ASCII table style as geometry. metric_type is
lifted into its own column for easy scanning. Remaining metadata values
are truncated at 60 characters, collapsing to {...} when exceeded.
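One possible reading of the 60-character rule is sketched below. It assumes that a value longer than the limit is replaced entirely by {...} rather than cut and suffixed; the names MAX_VALUE_WIDTH and collapse_value are hypothetical and do not come from the PR.

```rust
/// Assumed limit, taken from the commit message above.
const MAX_VALUE_WIDTH: usize = 60;

/// Hypothetical helper: return the value unchanged if it fits within
/// the limit, otherwise collapse it to the placeholder "{...}".
fn collapse_value(value: &str) -> String {
    // Count characters rather than bytes so multi-byte UTF-8 text
    // is measured by what the reader sees.
    if value.chars().count() > MAX_VALUE_WIDTH {
        "{...}".to_string()
    } else {
        value.to_string()
    }
}
```

With this reading, a value of exactly 60 characters is printed in full and a 61-character value collapses to {...}, which keeps every table cell at a bounded width.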
Schema comparison is achievable by diffing the output of 'parquet metadata --schema' or 'parquet metadata --schema --json' for two files.
Summary
- Refactor parquet_tools from a single mod.rs into a modular structure, extracting the existing annotate logic into annotate.rs
- Add read_parquet_footer() helper for reading parquet metadata and schema
- Add parquet metadata subcommand for inspecting parquet files

parquet metadata subcommand

Displays file-level metadata, table geometry, and column schema for a parquet file.
Flags (filters; default shows all sections):
- --file: file-level key-value metadata only
- --geometry: logical table shape and per-row-group detail (ASCII table with human-readable byte sizes)
- --schema: column-level metadata as a pipe-delimited table with metric_type as its own column
- --json: output in JSON format for programmatic use (combinable with the above)
- --field=KEY: extract and print a single file-level metadata key (auto-pretty-prints JSON values)

Example outputs:
Files changed
- src/parquet_tools/mod.rs: slimmed to command definitions, dispatch, and shared read_parquet_footer() helper
- src/parquet_tools/annotate.rs: extracted from mod.rs, unchanged behavior
- src/parquet_tools/metadata.rs: new subcommand