MINOR: [C++][Parquet][Tools] Print FIXED_LEN_BYTE_ARRAY length
In `ParquetFilePrinter`, when printing each column's physical type, also print its byte width (the type length) if the type is FIXED_LEN_BYTE_ARRAY.

Before:
```
Column 0: float16_plain (FIXED_LEN_BYTE_ARRAY / Float16)
Column 1: float16_byte_stream_split (FIXED_LEN_BYTE_ARRAY / Float16)
Column 2: float_plain (FLOAT)
Column 3: float_byte_stream_split (FLOAT)
Column 4: double_plain (DOUBLE)
Column 5: double_byte_stream_split (DOUBLE)
Column 6: int32_plain (INT32)
Column 7: int32_byte_stream_split (INT32)
Column 8: int64_plain (INT64)
Column 9: int64_byte_stream_split (INT64)
Column 10: flba5_plain (FIXED_LEN_BYTE_ARRAY)
Column 11: flba5_byte_stream_split (FIXED_LEN_BYTE_ARRAY)
Column 12: decimal_plain (FIXED_LEN_BYTE_ARRAY / Decimal(precision=7, scale=3) / DECIMAL(7,3))
Column 13: decimal_byte_stream_split (FIXED_LEN_BYTE_ARRAY / Decimal(precision=7, scale=3) / DECIMAL(7,3))
```

After:
```
Column 0: float16_plain (FIXED_LEN_BYTE_ARRAY(2) / Float16)
Column 1: float16_byte_stream_split (FIXED_LEN_BYTE_ARRAY(2) / Float16)
Column 2: float_plain (FLOAT)
Column 3: float_byte_stream_split (FLOAT)
Column 4: double_plain (DOUBLE)
Column 5: double_byte_stream_split (DOUBLE)
Column 6: int32_plain (INT32)
Column 7: int32_byte_stream_split (INT32)
Column 8: int64_plain (INT64)
Column 9: int64_byte_stream_split (INT64)
Column 10: flba5_plain (FIXED_LEN_BYTE_ARRAY(5))
Column 11: flba5_byte_stream_split (FIXED_LEN_BYTE_ARRAY(5))
Column 12: decimal_plain (FIXED_LEN_BYTE_ARRAY(4) / Decimal(precision=7, scale=3) / DECIMAL(7,3))
Column 13: decimal_byte_stream_split (FIXED_LEN_BYTE_ARRAY(4) / Decimal(precision=7, scale=3) / DECIMAL(7,3))
```
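The lengths in the "After" listing follow from the column types. Parquet's Float16 logical type always annotates a FIXED_LEN_BYTE_ARRAY of length 2. A DECIMAL stored as FIXED_LEN_BYTE_ARRAY holds its unscaled value as a signed two's-complement integer, so n bytes can represent at most floor(log10(2^(8n-1) - 1)) decimal digits. The sketch below uses a hypothetical helper, `MinDecimalByteWidth` (not an Arrow or Parquet API), limited to precision 18 or less so the arithmetic fits in `std::int64_t`. It computes the minimum width for a given precision. For precision 7 it gives 4, which matches the `FIXED_LEN_BYTE_ARRAY(4)` decimal columns above, although a writer may legally choose a wider array.

```cpp
#include <cstdint>
#include <stdexcept>

// Hypothetical helper (not an Arrow/Parquet API): the smallest byte width n
// such that an n-byte signed two's-complement integer can hold every unscaled
// value of a decimal with `precision` digits, i.e.
// 10^precision - 1 <= 2^(8n - 1) - 1.
// Limited to precision <= 18 so all arithmetic fits in std::int64_t.
int MinDecimalByteWidth(int precision) {
  if (precision < 1 || precision > 18) {
    throw std::invalid_argument("precision must be in [1, 18] in this sketch");
  }
  // Largest unscaled magnitude, e.g. 9999999 for precision 7.
  std::int64_t max_unscaled = 1;
  for (int i = 0; i < precision; ++i) {
    max_unscaled *= 10;
  }
  max_unscaled -= 1;
  // Try widths of 1 to 7 bytes: the largest positive n-byte value is 2^(8n-1) - 1.
  for (int n = 1; n < 8; ++n) {
    const std::int64_t max_signed = (std::int64_t{1} << (8 * n - 1)) - 1;
    if (max_unscaled <= max_signed) {
      return n;
    }
  }
  // 8 bytes covers every precision up to 18.
  return 8;
}
```

For example, precision 7 needs 4 bytes because 3 bytes top out at 8388607, which is less than 9999999.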
pitrou committed on Feb 19, 2024 (commit ca359ba, parent b224c58)
Showing 3 changed files with 13 additions and 1 deletion.

cpp/src/parquet/printer.cc: 2 changes (1 addition, 1 deletion)
```diff
@@ -105,7 +105,7 @@ void ParquetFilePrinter::DebugPrint(std::ostream& stream, std::list<int> selecte
   for (auto i : selected_columns) {
     const ColumnDescriptor* descr = file_metadata->schema()->Column(i);
     stream << "Column " << i << ": " << descr->path()->ToDotString() << " ("
-           << TypeToString(descr->physical_type());
+           << TypeToString(descr->physical_type(), descr->type_length());
     const auto& logical_type = descr->logical_type();
     if (!logical_type->is_none()) {
       stream << " / " << logical_type->ToString();
```

cpp/src/parquet/types.cc: 10 changes (10 additions, 0 deletions)
```diff
@@ -177,6 +177,16 @@ std::string TypeToString(Type::type t) {
   }
 }
 
+std::string TypeToString(Type::type t, int type_length) {
+  auto s = TypeToString(t);
+  if (t == Type::FIXED_LEN_BYTE_ARRAY) {
+    s += '(';
+    s += std::to_string(type_length);
+    s += ')';
+  }
+  return s;
+}
+
 std::string ConvertedTypeToString(ConvertedType::type t) {
   switch (t) {
     case ConvertedType::NONE:
```

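The new overload decorates only FIXED_LEN_BYTE_ARRAY. Every other physical type prints exactly as before, and callers of the existing one-argument `TypeToString` are unaffected. Below is a self-contained sketch of the same logic. It compiles against a hypothetical, trimmed-down `Type` enum instead of the real `parquet::Type`, so it builds without Arrow.

```cpp
#include <string>

// Hypothetical, trimmed-down stand-in for parquet::Type; the real enum has
// more physical types.
struct Type {
  enum type { INT32, INT64, FLOAT, DOUBLE, FIXED_LEN_BYTE_ARRAY };
};

// One-argument form: the physical type name only.
std::string TypeToString(Type::type t) {
  switch (t) {
    case Type::INT32: return "INT32";
    case Type::INT64: return "INT64";
    case Type::FLOAT: return "FLOAT";
    case Type::DOUBLE: return "DOUBLE";
    case Type::FIXED_LEN_BYTE_ARRAY: return "FIXED_LEN_BYTE_ARRAY";
  }
  return "UNKNOWN";
}

// Two-argument form, mirroring the overload added in this commit: append
// "(<length>)" only for FIXED_LEN_BYTE_ARRAY; the length is ignored otherwise.
std::string TypeToString(Type::type t, int type_length) {
  auto s = TypeToString(t);
  if (t == Type::FIXED_LEN_BYTE_ARRAY) {
    s += '(';
    s += std::to_string(type_length);
    s += ')';
  }
  return s;
}
```

When called as `TypeToString(descr->physical_type(), descr->type_length())`, as in the printer.cc hunk, this produces strings such as `FIXED_LEN_BYTE_ARRAY(5)` for `flba5_plain` and plain `INT32` for `int32_plain`.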
cpp/src/parquet/types.h: 2 changes (2 additions, 0 deletions)
```diff
@@ -796,6 +796,8 @@ PARQUET_EXPORT std::string ConvertedTypeToString(ConvertedType::type t);
 
 PARQUET_EXPORT std::string TypeToString(Type::type t);
 
+PARQUET_EXPORT std::string TypeToString(Type::type t, int type_length);
+
 PARQUET_EXPORT std::string FormatStatValue(Type::type parquet_type,
                                            ::std::string_view val);
 
```
