Add show meta #39

Open
gingerwizard opened this issue Apr 14, 2023 · 1 comment
Add a command that shows a Parquet file's row groups, the compression used, metadata and encodings.

e.g.

docker run -v $PWD/houseprices:/data markhneedham/pq meta /data/house_prices.parquet

File path:  /data/house_prices.parquet
Created by: parquet-cpp version 1.5.1-SNAPSHOT
Properties: (none)
Schema:
message schema {
  required int32 price (INTEGER(32,false));
  required int32 date (INTEGER(16,false));
  required binary postcode1;
  required binary postcode2;
  required int32 type (INTEGER(8,true));
  required int32 is_new (INTEGER(8,false));
  required int32 duration (INTEGER(8,true));
  required binary addr1;
  required binary addr2;
  required binary street;
  required binary locality;
  required binary town;
  required binary district;
  required binary county;
}


Row group 0:  count: 1000000  6.32 B records  start: 4  total(compressed): 6.026 MB total(uncompressed):9.089 MB
--------------------------------------------------------------------------------
           type      encodings count     avg size   nulls   min / max
price      INT32     Z _ R     1000000   1.72 B     0       "100" / "523000000"
date       INT32     Z _ R     1000000   1.77 B     0       "9131" / "19405"
postcode1  BINARY    Z _ R     1000000   0.00 B     0       "0x" / "0x42413131"
postcode2  BINARY    Z _ R     1000000   0.16 B     0       "0x" / "0x39595A"
type       INT32     Z _ R     1000000   0.19 B     0       "0" / "4"
is_new     INT32     Z _ R     1000000   0.04 B     0       "0" / "1"
duration   INT32     Z _ R     1000000   0.07 B     0       "0" / "2"
addr1      BINARY    Z _ R     1000000   1.20 B     0       "0x" / "0x5A5954454B20484F555345"
addr2      BINARY    Z _ R     1000000   0.20 B     0       "0x" / "0x5A4F4E452043"
street     BINARY    Z _ R     1000000   0.44 B     0       "0x" / "0x5A494F4E5320434C4F5345"
locality   BINARY    Z _ R     1000000   0.35 B     0       "0x" / "0x5A45414C53"
town       BINARY    Z _ R     1000000   0.08 B     0       "0x4142424F5453204C414E474..." / "0x595354524144204D4555524947"
district   BINARY    Z _ R     1000000   0.06 B     0       "0x41445552" / "0x594F524B"
county     BINARY    Z _ R     1000000   0.03 B     0       "0x41564F4E" / "0x594F524B"

gingerwizard changed the title from "Add row group option" to "Add show meta" on Apr 14, 2023

gingerwizard (author) commented:

parquet-tools inspect house_prices.parquet --detail
gives this information, but not per row group and not succinctly.
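
For illustration, here is a rough sketch of how a succinct per-row-group summary like the one above could be put together. It assumes pyarrow is used to read the footer, which this issue does not state; `ColumnSummary`, `RowGroupSummary`, `read_row_groups` and `format_row_group` are hypothetical names, not part of pq or parquet-tools, and the column layout is only loosely modelled on the sample output.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ColumnSummary:
    name: str
    physical_type: str
    compression: str
    encodings: str
    compressed_bytes: int
    nulls: Optional[int]  # None when the file carries no null count


@dataclass
class RowGroupSummary:
    index: int
    num_rows: int
    columns: List[ColumnSummary]


def read_row_groups(path):
    """Collect column-chunk metadata for every row group (assumes pyarrow)."""
    import pyarrow.parquet as pq  # lazy import: only needed for real files

    meta = pq.ParquetFile(path).metadata
    groups = []
    for i in range(meta.num_row_groups):
        rg = meta.row_group(i)
        cols = []
        for j in range(rg.num_columns):
            c = rg.column(j)
            stats = c.statistics  # None if the writer stored no statistics
            cols.append(ColumnSummary(
                name=c.path_in_schema,
                physical_type=c.physical_type,
                compression=c.compression,
                encodings=",".join(c.encodings),
                compressed_bytes=c.total_compressed_size,
                nulls=stats.null_count if stats is not None else None,
            ))
        groups.append(RowGroupSummary(i, rg.num_rows, cols))
    return groups


def format_row_group(rg):
    """Render one row group as a header line plus one line per column."""
    total = sum(c.compressed_bytes for c in rg.columns)
    per_record = total / rg.num_rows if rg.num_rows else 0.0
    lines = [
        f"Row group {rg.index}:  count: {rg.num_rows}  "
        f"{per_record:.2f} B records  total(compressed): {total} B"
    ]
    width = max([len(c.name) for c in rg.columns] + [4])
    for c in rg.columns:
        # Average compressed bytes per record for this column chunk.
        avg = c.compressed_bytes / rg.num_rows if rg.num_rows else 0.0
        nulls = "?" if c.nulls is None else str(c.nulls)
        lines.append(
            f"{c.name:<{width}}  {c.physical_type:<8}  {c.compression:<8}  "
            f"{c.encodings}  {avg:.2f} B  {nulls}"
        )
    return "\n".join(lines)
```

With pyarrow installed, something like `for rg in read_row_groups("house_prices.parquet"): print(format_row_group(rg))` would print one block per row group. Reading and formatting are kept separate so the output layout can be changed without touching the Parquet reader.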
