Is there a way to tell Presto to skip invalid Parquet files in a query? #14652

Open
varunbpatil opened this issue Jun 15, 2020 · 1 comment

I'm using Presto 0.236 with the Hive connector.

presto:default> SELECT Time FROM mytable WHERE date_ >= 20200613 ORDER BY Time;

Query 20200615_151944_00029_cyskg, FAILED, 1 node
Splits: 118 total, 0 done (0.00%)
0:00 [0 rows, 0B] [0 rows/s, 0B/s]

Query 20200615_151944_00029_cyskg failed: hdfs://x.x.x.x:8020/abc/test.parquet is not a valid Parquet File

The error is correct: the Parquet file above, test.parquet, really is invalid.

However, I want Presto to skip such files when running the query. Is this possible?

I tried an older version, 0.181, and it does skip invalid Parquet files. I need some features from the newer Presto version, though, so I was wondering whether there is a flag for this.
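
Independent of any Presto setting, one possible way to find the offending files before querying is to check the Parquet magic bytes. The sketch below is only an illustration: it assumes the files are reachable as local paths (for example after copying them out of HDFS), and the helper names `looks_like_parquet` and `split_valid_invalid` are hypothetical, not part of Presto or any Parquet library. It relies on the fact that an unencrypted Parquet file starts and ends with the 4 bytes `PAR1`.

```python
import os

# 4-byte marker at both the start and the end of an unencrypted Parquet file.
PARQUET_MAGIC = b"PAR1"


def looks_like_parquet(path):
    """Return True if the file starts and ends with the Parquet magic bytes.

    This is only a cheap structural check. It does not parse the footer
    metadata, so a file can pass it and still be corrupt.
    """
    # Smallest possible layout: header magic (4) + footer length (4) + footer magic (4).
    if os.path.getsize(path) < 12:
        return False
    with open(path, "rb") as f:
        head = f.read(4)
        f.seek(-4, os.SEEK_END)
        tail = f.read(4)
    return head == PARQUET_MAGIC and tail == PARQUET_MAGIC


def split_valid_invalid(paths):
    """Partition paths into (valid, invalid) lists using looks_like_parquet."""
    valid, invalid = [], []
    for p in paths:
        (valid if looks_like_parquet(p) else invalid).append(p)
    return valid, invalid
```

Files that land in the `invalid` list could then be moved out of the table's directory or rewritten, so that the query no longer touches them.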

Ravion (Contributor) commented Jun 15, 2020 via email
