Occasional corruption of index #67

eyala · 2022-11-16T12:27:56Z

Sometimes, when indexing an additional partition on an existing index over an Avro table, some index files become corrupt.

Reading the index produces the following error:

org.apache.spark.SparkException: Job aborted due to stage failure:
Aborting TaskSet 10.0 because task 0 (partition 0)
cannot run anywhere due to node and executor blacklist.
Most recent failure:
Lost task 0.0 in stage 10.0 (TID 2167, lvshdc5dn2191.lvs.paypalinc.com, executor 177): org.apache.avro.AvroRuntimeException: java.io.IOException: Block read partially, the data may be corrupt
	at org.apache.avro.file.DataFileStream.hasNext(DataFileStream.java:210)
	at org.apache.hadoop.hive.ql.io.avro.AvroGenericRecordReader.next(AvroGenericRecordReader.java:149)
	at org.apache.hadoop.hive.ql.io.avro.AvroGenericRecordReader.next(AvroGenericRecordReader.java:52)
	at org.apache.spark.rdd.HadoopRDD$$anon$1.getNext(HadoopRDD.scala:277)
	at org.apache.spark.rdd.HadoopRDD$$anon$1.getNext(HadoopRDD.scala:214)

This happens with Dione 0.5.x.
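
To narrow down which files are affected, something like the following may help. It is a minimal sketch, assuming the corrupt index files are standard Avro object container files (header, then blocks each terminated by a 16-byte sync marker). It only checks the block framing and does not decode records or decompress blocks. The helper names (`check_container` and friends) are hypothetical and not part of Dione or Avro.

```python
import io

MAGIC = b"Obj\x01"   # Avro object container file magic bytes
SYNC_SIZE = 16       # length of the per-file sync marker


def read_exact(stream, n):
    """Read exactly n bytes or raise EOFError if the stream ends early."""
    data = stream.read(n)
    if len(data) != n:
        raise EOFError(f"wanted {n} bytes, got {len(data)}")
    return data


def read_long(stream):
    """Decode one Avro long (zigzag-encoded variable-length integer)."""
    accum, shift = 0, 0
    while True:
        byte = read_exact(stream, 1)[0]
        accum |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return (accum >> 1) ^ -(accum & 1)
        shift += 7


def skip_metadata(stream):
    """Skip the header's metadata map (string keys, bytes values)."""
    while True:
        count = read_long(stream)
        if count == 0:
            return
        if count < 0:
            read_long(stream)  # a negative count is followed by a byte size
            count = -count
        for _ in range(count):
            read_exact(stream, read_long(stream))  # key
            read_exact(stream, read_long(stream))  # value


def check_container(data):
    """Walk every block of an Avro container file held in `data` (bytes).

    Returns (ok, complete_blocks, message).
    """
    stream = io.BytesIO(data)
    blocks = 0
    try:
        if read_exact(stream, len(MAGIC)) != MAGIC:
            return False, blocks, "bad magic bytes"
        skip_metadata(stream)
        sync = read_exact(stream, SYNC_SIZE)
        while stream.tell() < len(data):
            read_long(stream)           # number of records in this block
            size = read_long(stream)    # serialized size of the block in bytes
            if size < 0:
                return False, blocks, "negative block size"
            read_exact(stream, size)    # block payload (not decoded here)
            if read_exact(stream, SYNC_SIZE) != sync:
                return False, blocks, f"sync marker mismatch after block {blocks}"
            blocks += 1
    except EOFError as exc:
        return False, blocks, f"truncated after {blocks} complete block(s): {exc}"
    return True, blocks, "ok"
```

To use it, read a local copy of a suspect file as bytes and pass it to `check_container`; a `False` result with a truncation or sync mismatch message points at the file and block where the framing breaks.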

uzadude · 2022-11-27T13:45:03Z

@eyala - any update here?

eyala · 2022-12-25T13:09:16Z

I can verify that this fix works for the sample data I have. I have given a snapshot version to the PayPal team that hit this bug. If it works for them too, we can make the sync interval a parameter and include it in all three of our Spark flavors.

shay1bz · 2022-12-25T14:21:00Z

Just a small note on that: the sync interval of the b-tree file is an internal implementation detail. If it is exposed externally, it should be clear that this is just a workaround, and not a meaningful user-facing configuration that the user needs to consider.

uzadude · 2023-05-20T15:01:44Z

@eyala - can we close this issue?

abhishekjain293 · 2023-05-31T08:35:49Z

Hi @shay1bz @eyala

Thank you for the support and for taking this up.
We have manually tested various scenarios, and the issue appears to be fixed.
However, the fix has not yet been rolled out to the production environment. I will update here once the production rollout happens and is stable.

Thanks
Abhishek Jain

eyala self-assigned this Nov 16, 2022

uzadude mentioned this issue Dec 21, 2022: Reproducing strange bug #68 (Open)

This was referenced Dec 24, 2022: Fix sync interval #69 (Merged), Adding cache mechanism for iteration #70 (Closed)

eyala mentioned this issue Mar 16, 2023: Sync interval fix for 0.5.x branch #79 (Merged)
