fruits.parquet generated by test/integration.js is unreadable by Hadoop parquet-tools 1.9.0 #29

Open
drauschenbach opened this issue Dec 9, 2017 · 5 comments

drauschenbach commented Dec 9, 2017

Build parquet-mr/parquet-tools per these instructions.

Then run its cat command to dump the fruits.parquet file that is generated:

$ java -jar target/parquet-tools-1.9.0.jar cat parquetjs/fruits.parquet 

WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.hadoop.security.authentication.util.KerberosUtil (file:/Users/davidr/workspaces/parquet-mr/parquet-tools/target/parquet-tools-1.9.0.jar) to method sun.security.krb5.Config.getInstance()
WARNING: Please consider reporting this to the maintainers of org.apache.hadoop.security.authentication.util.KerberosUtil
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
Could not read footer: java.io.IOException: Could not read footer for file DeprecatedRawLocalFileStatus{path=file:/Users/davidr/workspaces/parquetjs/fruits.parquet; isDirectory=false; length=1411554; replication=1; blocksize=33554432; modification_time=1512831680000; access_time=0; owner=; group=; permission=rw-rw-rw-; isSymlink=false}

Using parquetjs v0.8.0.
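
For anyone triaging a similar "Could not read footer" failure, a quick first check is whether the file is framed the way the Parquet format requires: the 4-byte magic PAR1 at both the start and the end of the file, with a 4-byte little-endian footer length just before the trailing magic. The following is a minimal sketch using only Node's built-in fs module; the function name inspectParquetFraming is made up for illustration and is not part of parquetjs. It only checks the framing, so a file can pass here and still have footer metadata that parquet-tools cannot parse.

```javascript
const fs = require('fs');

// Hypothetical helper (not part of parquetjs): checks the outer framing of a
// Parquet file. Layout assumed from the Parquet format:
//   "PAR1" | data | footer metadata | 4-byte LE footer length | "PAR1"
function inspectParquetFraming(path) {
  const MAGIC = 'PAR1';
  const MIN_SIZE = 12; // leading magic + footer length + trailing magic
  const fd = fs.openSync(path, 'r');
  try {
    const size = fs.fstatSync(fd).size;
    if (size < MIN_SIZE) {
      // Too small to contain even an empty Parquet frame.
      return { size, headOk: false, tailOk: false, footerLength: null, footerFits: false };
    }
    const head = Buffer.alloc(4);
    const tail = Buffer.alloc(8);
    fs.readSync(fd, head, 0, 4, 0);        // first 4 bytes of the file
    fs.readSync(fd, tail, 0, 8, size - 8); // footer length + trailing magic
    const footerLength = tail.readUInt32LE(0);
    return {
      size,
      headOk: head.toString('latin1') === MAGIC,
      tailOk: tail.toString('latin1', 4, 8) === MAGIC,
      footerLength,
      // The footer has to fit between the leading magic and the length field.
      footerFits: footerLength <= size - MIN_SIZE,
    };
  } finally {
    fs.closeSync(fd);
  }
}
```

Calling inspectParquetFraming('fruits.parquet') and logging the result shows whether the magic bytes and footer length are plausible. If all three checks pass, the problem is more likely in the footer metadata or page encoding than in a truncated or mis-framed file.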

@eliezershindler

I'm getting "Error:TypeError: Cannot read property 'num_values' of null" when trying to read fruits.parquet with the module's own reader.

@eliezershindler

I can read it fine with the tool @drauschenbach used above.

sfescape commented May 1, 2018

I get the num_values error when writing fields with null values. Not writing those fields when their value is null avoided the issue.
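
A minimal sketch of that workaround, for illustration only: strip null (and undefined) properties from each row object before handing it to the writer. The helper name omitNullFields is hypothetical and not part of parquetjs, and the sketch assumes the affected columns are declared optional in the schema so that leaving them out is valid.

```javascript
// Hypothetical helper (not part of parquetjs): returns a shallow copy of `row`
// without the keys whose value is null or undefined. Falsy but meaningful
// values such as 0, '' and false are kept.
function omitNullFields(row) {
  const out = {};
  for (const [key, value] of Object.entries(row)) {
    if (value !== null && value !== undefined) {
      out[key] = value;
    }
  }
  return out;
}
```

With a parquetjs writer this would be used as something like await writer.appendRow(omitNullFields(row)). It sidesteps the error rather than fixing the underlying encoding problem.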

Contributor

ZJONSSON commented May 1, 2018

You might want to check out PR #56, which has some fixes to RLE encoding and verifies the generated files with parquet-mr.

I think you should be able to install this branch with:

npm install zjonsson/parquetjs#0c7948d4fa64acf76e481256422c6f4a6ba56815

Contributor

ZJONSSON commented May 2, 2018

Also, if you want to avoid the headache of building and configuring parquet-tools, you can add this to your .bashrc (or paste it into a console) and let Docker take care of everything.

parquet-tools() { docker run -w /home -v "${PWD}":/home nathanhowell/parquet-tools "$@"; }

You have to be in the same directory as the parquet file you want to inspect, since the current directory is mounted into the container as /home. You can then use the tools directly on any parquet file, e.g.:

parquet-tools dump fruits.parquet
