I believe we are encountering this known Kryo limitation: EsotericSoftware/kryo#497, EsotericSoftware/kryo#382 (also see related GATK issue: broadinstitute/gatk#1524)
Danfeng saw the referenced stack trace when trying to broadcast the variants for Plink (see LoadPlink.scala:202). She was running an import_plink followed by a count.
The details in EsotericSoftware/kryo#382 indicate that a bad interaction between the data and a hash function can cause this integer map to exceed its size limitations at a load factor of 5%. Even a 20x increase in footprint puts us at 400 million. Each element of that array has 6 entries, so we're at 1.2 billion, which is within a small factor of Java's ~2^31 array-length ceiling. That definitely feels like the danger zone. Maybe there are more variants than Danfeng expects, or maybe there's more overhead than we've accounted for.
The GATK folks have been chasing down the fix. Kryo released 4.0.0, which should fix this issue. Spark upgraded to Kryo 4.0.0 on September 8th of 2018 (resolving SPARK-20389). This change made it into Spark 2.4.0, but it was not backported to other versions of Spark.
GATK references a temporary fix via JVM options, which apparently forces the JVM to use an alternative hash function with better behavior in this specific case:
A generally interesting blog post on Java's hashCode, which I haven't fully read, claims that the JVM previously defaulted to a PRNG draw for an object's hash code; in JDK 8 it uses some function of the current thread state. It appears this old strategy is preserved as JVM hashCode parameter value 0 and is less likely to trigger the bad behavior in Kryo. This -XX:hashCode option is undocumented [1], [2] 🤷‍♀️.
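For a Spark job, a minimal sketch of how one might try this flag (the config keys are standard Spark settings, but whether this actually avoids the Kryo failure here is unverified):

import org.apache.spark.SparkConf;

// Sketch only: value 0 selects the older PRNG-style identity hash described above.
// Driver options set programmatically only take effect if the driver JVM hasn't
// started yet; with spark-submit these would normally be passed as --conf flags.
SparkConf conf = new SparkConf()
    .set("spark.executor.extraJavaOptions", "-XX:hashCode=0")
    .set("spark.driver.extraJavaOptions", "-XX:hashCode=0");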
Another suggested Kryo option is to disable reference tracking. This would cause duplicate objects in the object graph to be serialized twice:

Kryo kryo = new Kryo();
kryo.setReferences(false);
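In a Spark application the same knob is exposed as a configuration property rather than a hand-built Kryo instance; a sketch of the equivalent setting (assuming Kryo serialization is enabled for the job) would be:

import org.apache.spark.SparkConf;

// Sketch: spark.kryo.referenceTracking defaults to true. Disabling it skips Kryo's
// object-reference map (the structure hitting the limit above) at the cost of
// serializing repeated objects more than once, and it cannot handle cyclic graphs.
SparkConf conf = new SparkConf()
    .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    .set("spark.kryo.referenceTracking", "false");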