
Writing Dates and Timestamps #188

Open
nevi-me opened this issue Nov 8, 2018 · 3 comments

nevi-me commented Nov 8, 2018

I'm continuing my adventures in writing CSV to Parquet, but I'm stuck on how to write times/dates to Parquet.
Specifically, how do I declare the schema (assuming I'm using the text format, message schema {})?

I read up on the logical types and their mapping to/from data types, so I tried using i64 for my schema, but I think I'm missing something, because I don't know how to map the type to a TIMESTAMP.

I also tried Google to look for the format of the schema, but had no luck (for timestamps). Is there somewhere that documents this?

sadikovi commented Nov 9, 2018

I would use TIMESTAMP_MILLIS for now; it is just INT64 with the corresponding logical type, and probably the easiest to write.
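
For reference, a minimal sketch of what writing such a column can look like with this crate's low-level writer API. The file name, column name, and sample value are made up, and import paths and pointer types have shifted between releases, so treat this as a sketch rather than the definitive API:

use std::{fs::File, rc::Rc};

use parquet::{
    column::writer::ColumnWriter,
    file::{
        properties::WriterProperties,
        writer::{FileWriter, RowGroupWriter, SerializedFileWriter},
    },
    schema::parser::parse_message_type,
};

fn main() {
    // TIMESTAMP_MILLIS is physically an INT64 holding milliseconds
    // since the Unix epoch, interpreted as UTC.
    let message_type = "
        message schema {
            REQUIRED INT64 Timestamp (TIMESTAMP_MILLIS);
        }
    ";
    let schema = Rc::new(parse_message_type(message_type).unwrap());
    let props = Rc::new(WriterProperties::builder().build());
    let file = File::create("timestamps.parquet").unwrap();

    let mut writer = SerializedFileWriter::new(file, schema, props).unwrap();
    let mut row_group_writer = writer.next_row_group().unwrap();
    while let Some(mut col_writer) = row_group_writer.next_column().unwrap() {
        if let ColumnWriter::Int64ColumnWriter(ref mut typed) = col_writer {
            // 2018-11-09T00:00:00Z expressed as milliseconds since the epoch
            typed.write_batch(&[1_541_721_600_000], None, None).unwrap();
        }
        row_group_writer.close_column(col_writer).unwrap();
    }
    writer.close_row_group(row_group_writer).unwrap();
    writer.close().unwrap();
}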


nevi-me commented Nov 9, 2018

Thanks @sadikovi; I was confused by the UTC handling on the timestamp logical type.

Writing a timestamp now works with message schema {REQUIRED INT64 MyField (TIMESTAMP_MILLIS)}, but I'm unable to read the Parquet file back in Pandas or PySpark.
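
For context, given the schema PySpark prints below, the full table might be declared along these lines in the text format (the OPTIONAL repetition and the UTF8 annotation for the string columns are assumptions here):

message schema {
    OPTIONAL BYTE_ARRAY Id (UTF8);
    OPTIONAL BYTE_ARRAY Name (UTF8);
    OPTIONAL BOOLEAN Indicator;
    OPTIONAL INT64 Timestamp (TIMESTAMP_MILLIS);
}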

PySpark:

spark.read.parquet("file1.parquet").printSchema()
# printing the schema works and correctly shows the following,
# but .show() throws an error
root
 |-- Id: string (nullable = true)
 |-- Name: string (nullable = true)
 |-- Indicator: boolean (nullable = true)
 |-- Timestamp: timestamp (nullable = true)

# trying to show the records

Py4JJavaError: An error occurred while calling o62.showString.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 6.0 failed 1 times, most recent failure: Lost task 0.0 in stage 6.0 (TID 16, localhost, executor driver): org.apache.parquet.io.ParquetDecodingException: Dictionary encoding not supported for type: BOOLEAN 

Pandas:

pd.read_parquet("file1.parquet")

ArrowIOError: Not yet implemented: Dictionary encoding is not implemented for boolean values.
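
Both errors point at the same thing: the Indicator (boolean) column appears to have been written with dictionary encoding, which Spark's Parquet reader and Arrow do not support for BOOLEAN. Assuming the writer-properties builder in this crate, one possible workaround is to disable dictionary encoding when writing, roughly:

use std::rc::Rc;

use parquet::file::properties::WriterProperties;

fn main() {
    // Sketch: turn dictionary encoding off so the BOOLEAN column is written
    // with plain encoding, which Spark and Arrow can read back. The builder
    // also has per-column settings if only one column should change.
    let props = Rc::new(
        WriterProperties::builder()
            .set_dictionary_enabled(false)
            .build(),
    );
    // ... pass `props` to SerializedFileWriter::new as in the example above
    let _ = props;
}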
