Adding repeated properties to schema results in corrupt parquet file. #67

Closed
dylandepass opened this issue Jun 1, 2018 · 3 comments

@dylandepass

Version 0.8.0

I'm having some issues with repeated fields. The resulting parquet file seems to be corrupt; reading it fails with:

org.apache.parquet.io.ParquetDecodingException: Can not read value at 0 in block -1 in file file:/PATHTOFILE/profile.parquet

Here is the code I'm testing with. The identities object is what causes the problem.


```js
const parquet = require('parquetjs');

// `person` is a single nested group; `identities` is a repeated group (a list).
let schema = new parquet.ParquetSchema({
    person: {
        repeated: false,
        fields: {
            firstName: {
                type: 'UTF8'
            },
            lastName: {
                type: 'UTF8'
            }
        }
    },
    identities: {
        repeated: true,
        fields: {
            id: {
                type: 'UTF8'
            },
            xid: {
                type: 'UTF8'
            }
        }
    }
});

async function writeToParquet(schema) {
    // create a new ParquetWriter that writes to 'profile.parquet'
    const writer = await parquet.ParquetWriter.openFile(schema, 'profile.parquet');

    // appendRow returns a promise, so wait for it before closing the writer
    await writer.appendRow({
        person: {
            firstName: "Test",
            lastName: "User"
        },
        identities: [{
            id: "ID",
            xid: "XID"
        },{
            id: "ID",
            xid: "XID"
        }]
    });

    await writer.close();
}

// surface write errors instead of leaving an unhandled rejection
writeToParquet(schema).catch(console.error);
```
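
For context on why a repeated group is special: Parquet stores each leaf column of such a group together with repetition and definition levels, which mark where one row's list ends and the next begins. The sketch below is only an illustration of that idea for the `identities.id` column of the row above. It is not parquetjs's shredder; the helper name `shredRepeatedColumn` is hypothetical, and it assumes a single top-level repeated group whose fields are required scalars.

```js
// Illustrative only: a hand-rolled shredder for ONE top-level repeated group
// with required scalar fields. This is not how parquetjs is implemented.
function shredRepeatedColumn(rows, groupName, fieldName) {
    const values = [];
    const rlevels = []; // repetition levels
    const dlevels = []; // definition levels

    for (const row of rows) {
        const items = row[groupName] || [];
        if (items.length === 0) {
            // Empty list: one placeholder entry with no value.
            rlevels.push(0);
            dlevels.push(0);
            continue;
        }
        items.forEach((item, i) => {
            values.push(item[fieldName]);
            rlevels.push(i === 0 ? 0 : 1); // 0 starts a new row, 1 continues the list
            dlevels.push(1);               // the repeated group is present
        });
    }
    return { values, rlevels, dlevels };
}

const rows = [{
    identities: [{ id: "ID", xid: "XID" }, { id: "ID", xid: "XID" }]
}];

const column = shredRepeatedColumn(rows, 'identities', 'id');
console.log(column);
// { values: [ 'ID', 'ID' ], rlevels: [ 0, 1 ], dlevels: [ 1, 1 ] }
```

A non-repeated group such as `person` needs no repetition levels, which may be why only the `identities` field triggers the problem.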
@ZJONSSON
Contributor

ZJONSSON commented Jun 1, 2018

There is a bug in the RLE encoding that has probably been fixed in #57, which is not merged yet. See the parquet-mr tests (rebased onto the fix) in #56.

You can check out the PR branch by installing the last commit in the PR:

npm install zjonsson/parquetjs#07fb2fd8fc03bf2b57243531eaf91f2d60f5e460
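
For readers unfamiliar with the term, the RLE mentioned above is run-length encoding, which Parquet applies to the repetition and definition levels of a column. The following is a minimal sketch of the run-length idea only. The function names are hypothetical, it is not the code from the PR, and Parquet's actual on-disk format is a hybrid of RLE and bit-packing with its own headers, which this does not reproduce.

```js
// Illustrative only: plain run-length encoding of an array of small integers.
// Parquet's real level encoding (RLE/bit-packing hybrid) is more involved.
function runLengthEncode(levels) {
    const runs = [];
    for (const level of levels) {
        const last = runs[runs.length - 1];
        if (last && last.value === level) {
            last.count += 1;                       // extend the current run
        } else {
            runs.push({ value: level, count: 1 }); // start a new run
        }
    }
    return runs;
}

function runLengthDecode(runs) {
    const levels = [];
    for (const { value, count } of runs) {
        for (let i = 0; i < count; i++) {
            levels.push(value);
        }
    }
    return levels;
}

// Repetition levels for two rows: a list of 4 items, then a list of 2 items.
const levels = [0, 1, 1, 1, 0, 1];
const runs = runLengthEncode(levels);
console.log(runs.length);           // 4
console.log(runLengthDecode(runs)); // [ 0, 1, 1, 1, 0, 1 ]
```

A reader recovers list boundaries from these decoded levels, so an encoder that writes a wrong run leaves the file unreadable even though the values themselves are intact.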

@ZJONSSON
Contributor

ZJONSSON commented Jun 1, 2018

See also #43

@dylandepass
Author

Appreciate the help, that fixed my issue!
