Occasional corruption of index #67

Open
eyala opened this issue Nov 16, 2022 · 5 comments

@eyala
Collaborator

eyala commented Nov 16, 2022

Sometimes, when indexing an additional partition on an existing index over an Avro table, some index files become corrupt.

Reading the index produces the following error:

org.apache.spark.SparkException: Job aborted due to stage failure:
Aborting TaskSet 10.0 because task 0 (partition 0)
cannot run anywhere due to node and executor blacklist.
Most recent failure:
Lost task 0.0 in stage 10.0 (TID 2167, lvshdc5dn2191.lvs.paypalinc.com, executor 177): org.apache.avro.AvroRuntimeException: java.io.IOException: Block read partially, the data may be corrupt
	at org.apache.avro.file.DataFileStream.hasNext(DataFileStream.java:210)
	at org.apache.hadoop.hive.ql.io.avro.AvroGenericRecordReader.next(AvroGenericRecordReader.java:149)
	at org.apache.hadoop.hive.ql.io.avro.AvroGenericRecordReader.next(AvroGenericRecordReader.java:52)
	at org.apache.spark.rdd.HadoopRDD$$anon$1.getNext(HadoopRDD.scala:277)
	at org.apache.spark.rdd.HadoopRDD$$anon$1.getNext(HadoopRDD.scala:214)

This happens with Dione 0.5.x.

@eyala eyala self-assigned this Nov 16, 2022
@uzadude
Collaborator

uzadude commented Nov 27, 2022

@eyala - any update here?

@eyala
Collaborator Author

eyala commented Dec 25, 2022

I can verify that this fix works for the sample data I have. I have given a snapshot version to the PayPal team that hit this bug, and if it works for them too, we can make the sync interval a parameter and include it in all three of our Spark flavors.

@shay1bz
Collaborator

shay1bz commented Dec 25, 2022

Just a small note on that: the sync interval of the b-tree file is an internal implementation detail. If it is exposed externally, it should be clear that this is just a workaround, not a meaningful user-facing configuration that the user needs to consider.

@uzadude
Collaborator

uzadude commented May 20, 2023

@eyala - can we close this issue?

@abhishekjain293

Hi @shay1bz @eyala

Thank you for the support and for taking this up.
We have conducted thorough manual testing of various scenarios, and it appears that the issue has been fixed.
However, I would like to highlight that the fix has not yet been rolled out to the production environment. I will post an update here once the production rollout happens and it becomes stable.

Thanks
Abhishek Jain
