java.lang.ClassCastException on df.count() #34

nelson2005 · 2018-06-25T16:41:34Z

I loaded a sas dataset into a dataframe without incident, and can print the schema. However, when I do anything with it (like count() or save()) I get a cast exception (trimmed stack below, let me know if more is helpful). It's a small dataset, 1.6GB. I did notice that the latest maven spark-sas7bdat doesn't pull in the latest parso (2.0.9). The only bug apparently fixed in parso 2.0.9 seems not to be related to what I'm seeing

Command in spark shell:
spark.read.format("com.github.saurfang.sas.spark").load("hdfs://nameservice1/users/bob/test.sas7bdat").count()

This is on Cloudera spark 2.2
libraryDependencies += "saurfang" % "spark-sas7bdat" % "2.0.0-s_2.11"

Caused by: java.lang.ClassCastException: cannot assign instance of scala.collection.immutable.List$SerializationProxy to field org.apache.spark.rdd.RDD.org$apache$spark$rdd$RDD$$dependencies_ of type scala.collection.Seq in instance of org.apache.spark.rdd.MapPartitionsRDD
at java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2233)
at java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1405)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2284)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2202)
......

nelson2005 · 2018-06-26T17:03:22Z

Okay, after some research I found the issue. I'm running spark2-shell on a YARN cluster, the sas7bdat jar needs to added to the spark-shell classpath and the executor classpath, and the saurfang/parso jars need to be uploaded to the executors as part of the job. In the end my fat-jar assembly command was

spark2-shell -cp /tmp/tools_scala_shell.jar --conf spark.executor.extraClassPath=./tools_scala_shell.jar --files /tmp/tools_scala_shell.jar

Thanks for a great library!

nelson2005 closed this as completed Jun 26, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

java.lang.ClassCastException on df.count() #34

java.lang.ClassCastException on df.count() #34

nelson2005 commented Jun 25, 2018

nelson2005 commented Jun 26, 2018

java.lang.ClassCastException on df.count() #34

java.lang.ClassCastException on df.count() #34

Comments

nelson2005 commented Jun 25, 2018

nelson2005 commented Jun 26, 2018