Infinite loop for Impala-generated file #37

Closed
spaztic1215 opened this issue Sep 23, 2016 · 1 comment

@spaztic1215

Hi there,

I was wondering what condition would cause an infinite loop in this while-loop block: https://github.com/jcrobak/parquet-python/blob/master/parquet/__init__.py#L354-L360

We are using the following file, which we generated from Impala: https://www.dropbox.com/s/kah986gqjt7mrnr/movies.0.parquet

At some point, while reading bytes 65278 -> 112466, the reader gets stuck in an endless loop because the values stop updating. However, we've been able to read smaller Impala-generated files, so we're not sure whether this is a limitation related to file size (the file is 100MB+, but there are only 5 columns of data).

Any insight would be hugely appreciated, thanks!

Jenny

@jcrobak
Owner

jcrobak commented Sep 25, 2016

Hi Jenny, thanks for the report. The problem seems to be that I haven't implemented support for null values (via definition_levels) for the encoding used by the rating column in that file.
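
To illustrate the idea, here is a minimal sketch of how definition levels mark nulls in a flat optional column. It is not the code in parquet-python: the function name `assemble_column` and the sample data are made up for illustration. In Parquet, a page's value count includes nulls, but only the non-null values are physically stored, so a reader that keeps decoding until it has collected that many stored values could plausibly stop making progress once the stored values run out.

```python
def assemble_column(definition_levels, non_null_values, max_definition_level=1):
    """Rebuild a flat column, placing None wherever a value is null.

    Hypothetical helper for illustration only. A definition level below
    max_definition_level means the value is null and was never stored.
    """
    values = iter(non_null_values)
    column = []
    for level in definition_levels:
        if level == max_definition_level:
            # Value is defined: consume the next stored value.
            column.append(next(values))
        else:
            # Value is null: nothing to consume from the stored values.
            column.append(None)
    return column


# Five logical values, but only three are physically stored.
definition_levels = [1, 0, 1, 1, 0]
non_null_values = [4.5, 3.0, 5.0]

column = assemble_column(definition_levels, non_null_values)
print(column)
```

The number of stored values to decode is the count of definition levels equal to the maximum (3 here), not the logical value count (5), which is the distinction a null-aware reader has to make.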

I should have a fix shortly; I'd like to add some regression tests to ensure this bug doesn't pop up again.
