Cannot load a PLINK file containing 20 million variants #5564
Danfeng saw the referenced stack trace when trying to broadcast the variants for PLINK import (see LoadPlink.scala:202). She was running an `import_plink` followed by a `count`.
The details in EsotericSoftware/kryo#382 indicate that a bad interaction between the data and a hash function can cause this integer map to exceed its size limitations at a load factor of 5%. Even a 20x increase in footprint puts us at 400 million. Each element of that array has 6 entries, so we're at 2.4 billion, which is past Integer.MAX_VALUE (about 2.1 billion). That definitely feels like the danger zone: an array size that overflows a 32-bit int goes negative, which is exactly what a NegativeArraySizeException reports. Maybe there are more variants than Danfeng expects, or maybe there's more overhead than we've accounted for.
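As a sanity check on the overflow mechanics, here is a toy sketch (plain Java, not Kryo's actual IntMap code): once a doubling backing-array capacity passes 2^30, the next doubling wraps a 32-bit int negative, and the subsequent allocation throws the exception users see.

```java
public class CapacityOverflow {
    public static void main(String[] args) {
        // Hypothetical sketch of the failure mode: a hash map that keeps
        // doubling its backing array eventually overflows a 32-bit int.
        int capacity = 1 << 30;   // 1,073,741,824 -- largest power-of-two int
        int next = capacity << 1; // wraps to Integer.MIN_VALUE
        System.out.println(next); // prints -2147483648
        try {
            int[] table = new int[next]; // negative size
        } catch (NegativeArraySizeException e) {
            System.out.println("NegativeArraySizeException");
        }
    }
}
```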
The GATK folks have been chasing down the fix. Kryo released 4.0.0, which should fix this issue. Spark upgraded to Kryo 4.0.0 on September 8th, 2018 (resolving SPARK-20389). This change made it into Spark 2.4.0, but it was not backported to earlier versions of Spark.
GATK references a temporary fix via JVM options, which apparently force the JVM to use an alternative hash function with better behavior in this specific case.
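The exact flag isn't quoted here; based on the description it is presumably HotSpot's `-XX:hashCode=0` (an assumption on my part, not confirmed from the GATK thread), which in a Spark job would need to reach both the driver and the executors:

```shell
# Hypothetical spark-submit invocation; the -XX:hashCode=0 value is
# assumed from the description of the workaround, not quoted from GATK.
spark-submit \
  --conf spark.driver.extraJavaOptions=-XX:hashCode=0 \
  --conf spark.executor.extraJavaOptions=-XX:hashCode=0 \
  ...
```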
A generally interesting blog post on Java's hashCode, which I haven't fully read, claims that the JVM previously defaulted to a PRNG draw for an object's hash code; in JDK 8 it uses some function of the current thread state instead. The old strategy appears to be preserved as JVM hashCode parameter value 0 and is less likely to trigger the bad behavior in Kryo.
Another suggested Kryo option is to disable reference tracking. This would cause duplicate objects in the object graph to be serialized twice:
```java
Kryo kryo = new Kryo();
kryo.setReferences(false);
```
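To illustrate what disabling reference tracking trades away, here is a toy sketch of the bookkeeping in plain Java (not Kryo itself): with tracking, a shared object is written in full once and later occurrences become back-references keyed by identity; without it, every occurrence is written in full.

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;

// Toy model of reference tracking during serialization. This is not
// Kryo's implementation, just the idea behind setReferences(false).
public class RefTrackingSketch {
    static int countFullWrites(List<Object> graph, boolean trackReferences) {
        // Identity map mirrors how a serializer would remember objects
        // already written, so duplicates become cheap back-references.
        IdentityHashMap<Object, Integer> seen = new IdentityHashMap<>();
        int fullWrites = 0;
        for (Object o : graph) {
            if (trackReferences && seen.containsKey(o)) {
                continue; // emit a back-reference instead of the object
            }
            seen.put(o, seen.size());
            fullWrites++; // serialize the object in full
        }
        return fullWrites;
    }

    public static void main(String[] args) {
        Object shared = new Object();
        List<Object> graph = new ArrayList<>();
        graph.add(shared);
        graph.add(shared); // same object appears twice
        graph.add(new Object());
        System.out.println(countFullWrites(graph, true));  // prints 2
        System.out.println(countFullWrites(graph, false)); // prints 3
    }
}
```

In Spark this knob is exposed as the `spark.kryo.referenceTracking` configuration property, so the trade-off above applies to any job that flips it off: less map bookkeeping, but shared objects serialized repeatedly.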
I added a discuss post for our users: https://discuss.hail.is/t/i-get-a-negativearraysizeexception-when-loading-a-plink-file/899