java.lang.NoClassDefFoundError: scala/collection/GenTraversableOnce$class #862

Closed
mikals0ft opened this issue Oct 7, 2016 · 8 comments

@mikals0ft
Hi,
I am using version 5.0 of this library to create DataFrames from my Elasticsearch cluster. I can create the DataFrame, but as soon as I run an action on it such as take or count, I get the following error:

java.lang.NoClassDefFoundError: scala/collection/GenTraversableOnce$class
at org.elasticsearch.spark.rdd.AbstractEsRDDIterator.<init>(AbstractEsRDDIterator.scala:10)
at org.elasticsearch.spark.sql.ScalaEsRowRDDIterator.<init>(ScalaEsRowRDD.scala:31)
at org.elasticsearch.spark.sql.ScalaEsRowRDD.compute(ScalaEsRowRDD.scala:27)
at org.elasticsearch.spark.sql.ScalaEsRowRDD.compute(ScalaEsRowRDD.scala:20)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:283)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:283)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:283)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:283)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
at org.apache.spark.scheduler.Task.run(Task.scala:85)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassNotFoundException: scala.collection.GenTraversableOnce$class
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 21 more

I'm guessing this is because the latest version of the library doesn't support Scala 2.11. Can somebody please confirm that this is true? Thanks.

Note: I am using Spark 2.0.0
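
A minimal sketch of the pattern that triggers it (the index name and connection settings below are placeholders, not my real ones):

import org.apache.spark.sql.SparkSession

// Placeholder settings; the real cluster address and index are hypothetical.
val spark = SparkSession.builder()
  .appName("es-read-repro")
  .config("es.nodes", "localhost:9200")
  .getOrCreate()

// Creating the DataFrame succeeds: the schema is resolved on the driver.
val df = spark.read
  .format("org.elasticsearch.spark.sql")
  .load("myindex/mytype")

// Any action that runs tasks on the executors then fails with the
// NoClassDefFoundError above when the connector jar was built against a
// different Scala major version than the one Spark runs on.
df.count()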

@akoira

akoira commented Oct 13, 2016

I had a similar issue on Spark 2.0.1 and elasticsearch-hadoop 2.4.0. I replaced the file SPARK_HOME/jars/elasticsearch-hadoop-2.4.0.jar with elasticsearch-spark_2.11-2.4.0.jar, which I got from https://mvnrepository.com/artifact/org.elasticsearch/elasticsearch-spark_2.11
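
If you manage dependencies with sbt instead of swapping jars in SPARK_HOME/jars, the same coordinates would look like this (a sketch, using the artifact from the link above):

libraryDependencies += "org.elasticsearch" % "elasticsearch-spark_2.11" % "2.4.0"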

@funkomatic

Just a comment to say that this issue occurs with 5.0.0-rc1 but not with 5.0.0-beta1, so there might be a regression between the two releases. On my side I had to roll back to beta1 to make reading work with Spark 2.0.0. I sincerely hope the next release will fix this.

@jbaiera
Member

jbaiera commented Oct 25, 2016

This exception almost always indicates that the ES-Hadoop jar expects an incompatible version of Scala. Scala 2.11 is not backwards compatible with 2.10, so we release two versions, one for each version of Scala. The Spark artifact names (e.g. elasticsearch-spark-20_2.11-5.0.0-rc1.jar) encode the version of Spark the artifact supports (20 means 2.0+), the version of Scala it supports (2.11 means Scala 2.11.x), and the version of ES-Hadoop (5.0.0-rc1).
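
In sbt form, with the naming scheme spelled out (same coordinates as the example name, shown as a sketch):

// Naming scheme: elasticsearch-spark-<spark>_<scala>-<es-hadoop>.jar
//   <spark>     = 20        (Spark 2.0+)
//   <scala>     = 2.11      (Scala 2.11.x)
//   <es-hadoop> = 5.0.0-rc1
libraryDependencies += "org.elasticsearch" % "elasticsearch-spark-20_2.11" % "5.0.0-rc1"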

@funkomatic

Hi, thanks for the info. I'll try this library instead of the hadoop rc1.

@qindeqiang

@michaelqiu94 My ES cluster is 5.5.1. I downloaded the newest jar to resolve the issue. Download URL: https://mvnrepository.com/artifact/org.elasticsearch/elasticsearch-spark-20_2.11/5.5.1

@ebuildy
Contributor

ebuildy commented Feb 27, 2018

To see the Spark Scala version, run spark-shell (since spark-submit --version doesn't display it).
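
You can also ask the running shell directly; this is a standard-library call, nothing connector-specific:

// In spark-shell, prints e.g. "version 2.11.8" for a Scala 2.11 build of Spark.
scala.util.Properties.versionString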

@ghost

ghost commented May 31, 2018

I got this exception when I used elasticsearch-hadoop:2.4.5 against spark-core:2.2.0.

Since Spark 2.0 uses Scala 2.11 by default, and we would have to rebuild Spark to use Scala 2.10, I resorted to using Spark 1.x, which uses Scala 2.10 by default. That fixed the error.

@jbaiera
Member

jbaiera commented Jun 1, 2018

@karthikeyanpa90 You can use Scala 2.10 with Spark 2.X and up. You just need to use the compatibility jar that we release instead of the regular version: https://www.elastic.co/guide/en/elasticsearch/hadoop/current/install.html#_minimalistic_binaries
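
For example, the Scala 2.10 compatibility artifact in sbt form (the version below is only an illustration; pick the release matching your cluster, per the install docs linked above):

// Version number is illustrative, not a recommendation.
libraryDependencies += "org.elasticsearch" % "elasticsearch-spark-20_2.10" % "6.2.4"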
