Description
I've created a Spark job that reads from Elasticsearch and then writes back to it. Locally it runs fine, but when I submit it to a cluster I can't get it to work. I've tried several things, described below.
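For context, the job boils down to something like the sketch below; the class name, the index/type names and the ES host are placeholders, and the real job does more work in between:

    import java.util.Map;
    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaPairRDD;
    import org.apache.spark.api.java.JavaSparkContext;
    import org.elasticsearch.spark.rdd.api.java.JavaEsSpark;

    public class EsRoundTrip {
        public static void main(String[] args) {
            // es.nodes points the connector at the ES cluster (placeholder host)
            SparkConf conf = new SparkConf()
                .setAppName("es-round-trip")
                .set("es.nodes", "es-host:9200");
            JavaSparkContext sc = new JavaSparkContext(conf);

            // read documents (id -> source map) from a source index/type
            JavaPairRDD<String, Map<String, Object>> docs =
                JavaEsSpark.esRDD(sc, "source/doc");

            // ... transformations elided ...

            // write the document maps back to a target index/type
            JavaEsSpark.saveToEs(docs.values(), "target/doc");
            sc.stop();
        }
    }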
The Spark cluster is a 3-node test setup running spark-1.5.1-bin-without-hadoop.
I included the following packages in my uberjar:
- org.elasticsearch:elasticsearch-spark_2.10:2.2.0-m1 (also tried 2.1.1)
- org.elasticsearch:elasticsearch-hadoop-mr:2.2.0-m1
The job then fails with
java.lang.ClassNotFoundException: org.elasticsearch.hadoop.rest.EsHadoopNoNodesLeftException
The ClassNotFoundException surprised me, and it also hides the actual reason for the failure, so I tried to resolve the ClassNotFoundException first.
I downloaded the jars to all nodes, added them to the Spark classpath, restarted Spark and re-submitted the job.
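Concretely, that means something along these lines in conf/spark-defaults.conf on every node (same paths as in the --jars attempt further down):

    spark.driver.extraClassPath   /home/hadoop/extra/elasticsearch-spark_2.10-2.2.0-m1.jar:/home/hadoop/extra/elasticsearch-hadoop-mr-2.2.0-m1.jar
    spark.executor.extraClassPath /home/hadoop/extra/elasticsearch-spark_2.10-2.2.0-m1.jar:/home/hadoop/extra/elasticsearch-hadoop-mr-2.2.0-m1.jar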
It then tells me
15/10/30 10:57:42 WARN scheduler.TaskSetManager:
Lost task 0.0 in stage 5.0 (TID 11, 10.20.0.97):
java.lang.Error: Multiple ES-Hadoop versions detected in the classpath; please use only one
....
Makes sense, since the ES-Hadoop classes are now on the classpath more than once. So I built an uberjar without the two packages listed above and re-submitted. This time the job fails with:
Exception in thread "main" java.lang.ExceptionInInitializerError
....
Caused by: java.lang.ClassNotFoundException: org.elasticsearch.spark.rdd.api.java.JavaEsSpark
This also happens when I don't put the packages on the Spark classpath but instead provide them when submitting the job (full command sketched after this list):
- submitted with --packages org.elasticsearch:elasticsearch-hadoop-mr:2.2.0-m1,org.elasticsearch:elasticsearch-spark_2.10:2.2.0-m1
- submitted with --jars /home/hadoop/extra/elasticsearch-spark_2.10-2.2.0-m1.jar,/home/hadoop/extra/elasticsearch-hadoop-mr-2.2.0-m1.jar
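Put together, the submit command looks roughly like this (master URL, class name and uberjar name are placeholders):

    spark-submit \
      --class EsRoundTrip \
      --master spark://master:7077 \
      --jars /home/hadoop/extra/elasticsearch-spark_2.10-2.2.0-m1.jar,/home/hadoop/extra/elasticsearch-hadoop-mr-2.2.0-m1.jar \
      my-uberjar.jar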
So now I'm a bit confused about the right way to go about this. Do you have any tips on how to run this job?