I have packaged the Chapter 6 code and included the jar in spark-shell.
When I try to execute the code below without @transient:
@transient val conf = new Configuration()
conf.set(XmlInputFormat.START_TAG_KEY, "&lt;page&gt;")
conf.set(XmlInputFormat.END_TAG_KEY, "&lt;/page&gt;")
val kvs = sc.newAPIHadoopFile(path, classOf[XmlInputFormat], classOf[LongWritable],
classOf[Text], conf)
val rawXmls = kvs.map(p => p._2.toString)
I get: Caused by: java.io.NotSerializableException: org.apache.hadoop.conf.Configuration.
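The failure above can be reproduced outside Spark: in spark-shell the REPL wraps top-level vals in a line object that gets serialized with the task closure, and Hadoop's Configuration does not implement java.io.Serializable. Marking the field @transient excludes it from serialization. A minimal plain-Java illustration of that mechanism (the class names FakeConfiguration and Closure are hypothetical stand-ins, not Spark internals):

```java
import java.io.*;

public class TransientDemo {
    // Stand-in for org.apache.hadoop.conf.Configuration: a class that
    // does NOT implement java.io.Serializable.
    static class FakeConfiguration { }

    // Mimics the REPL line object that captures your vals: serializing it
    // drags every non-transient field along with it.
    static class Closure implements Serializable {
        FakeConfiguration conf = new FakeConfiguration();                     // breaks serialization
        transient FakeConfiguration transientConf = new FakeConfiguration(); // skipped entirely
    }

    static boolean serializes(Serializable obj) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(obj);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Closure c = new Closure();
        System.out.println(serializes(c)); // false: the plain field fails
        c.conf = null;                     // only the transient field remains populated
        System.out.println(serializes(c)); // true: transient fields are not written
    }
}
```

This is why @transient lets the snippet proceed: the Configuration is only needed on the driver when newAPIHadoopFile is called, so excluding it from the serialized closure is safe.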
With @transient in place I can proceed further, but after the transformation below
val plainText = rawXmls.flatMap(wikiXmlToPlainText)
I ran plainText.count, and it gives me the error below:
java.lang.NoClassDefFoundError: com/google/common/base/Charsets
at com.cloudera.datascience.common.XmlInputFormat$XmlRecordReader.&lt;init&gt;(XmlInputFormat.java:79)
at com.cloudera.datascience.common.XmlInputFormat.createRecordReader(XmlInputFormat.java:55)
at org.apache.spark.rdd.NewHadoopRDD$$anon$1.&lt;init&gt;(NewHadoopRDD.scala:133)
at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:107)
at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:69)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
Am I missing something here?
I am using Spark 1.2 and Hadoop 2.5.2.
@svishnu88 No, the Guava issue should be solvable by just not using it directly in the code. Let's see if that's possible. The @transient may be needed to work around how the closure cleaner works. @jwills, have you seen this?
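A minimal sketch of what "not using it directly" could look like, assuming the Guava reference in XmlInputFormat is the Charsets.UTF_8 constant (the method name tagBytes and the surrounding context are hypothetical; only the one-line substitution is the point). Since Java 7, java.nio.charset.StandardCharsets ships the same constants in the JDK, so the executor classpath no longer needs the Guava jar:

```java
import java.nio.charset.StandardCharsets;

public class CharsetSwapDemo {
    // Before (pulls in Guava, which triggers NoClassDefFoundError when the
    // Guava jar isn't shipped to the executors):
    //   import com.google.common.base.Charsets;
    //   byte[] tag = startTag.getBytes(Charsets.UTF_8);
    //
    // After (JDK-only, available since Java 7):
    static byte[] tagBytes(String startTag) {
        return startTag.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] tag = tagBytes("<page>");
        System.out.println(tag.length); // 6 ASCII bytes for "<page>"
    }
}
```

Both constants denote the same charset, so the byte output is identical; the change only removes the external dependency.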
OK, the Guava thing shouldn't be an issue. I'm still not sure what to make of the Configuration issue. I wonder if we need to include @transient in the text as a precaution? It still feels like the wrong way to address this.