SimpleHttpConnectionManager problem with elasticsearch-hadoop-2.1.2.jar on Spark #618

Closed
vichargrave opened this Issue Nov 28, 2015 · 3 comments

vichargrave commented Nov 28, 2015

I'm seeing an issue when using PySpark with Elasticsearch 1.7.0 and elasticsearch-hadoop-2.1.2.jar on Spark 1.5.1 (all running on my Mac, OS X Yosemite). I run the simple program shown below (from the article at http://qbox.io/blog/elasticsearch-in-apache-spark-python). After the print(es_rdd.first()) statement is executed, pyspark just hangs:

Using Python version 2.7.10 (default, Oct 19 2015 18:31:17)
SparkContext available as sc, HiveContext available as sqlContext.

>>> es_rdd = sc.newAPIHadoopRDD(
...     inputFormatClass="org.elasticsearch.hadoop.mr.EsInputFormat",
...     keyClass="org.apache.hadoop.io.NullWritable",
...     valueClass="org.elasticsearch.hadoop.mr.LinkedMapWritable",
...     conf={ "es.resource" : "titanic/passenger" })
15/11/27 18:16:41 WARN EsInputFormat: Cannot determine task id...
>>> print(es_rdd.first())
15/11/27 18:16:50 WARN EsInputFormat: Cannot determine task id...
15/11/27 18:16:51 WARN SimpleHttpConnectionManager: SimpleHttpConnectionManager being used incorrectly.  Be sure that HttpMethod.releaseConnection() is always called and that only one thread and/or method is using this connection manager at a time.

When I stop Elasticsearch I get the following output:

15/11/27 18:33:32 ERROR NetworkClient: Node [10.0.0.2:9200] failed (The server 10.0.0.2 failed to respond with a valid HTTP response); no other nodes left - aborting...
15/11/27 18:33:32 WARN NewHadoopRDD: Exception in RecordReader.close()
org.elasticsearch.hadoop.rest.EsHadoopNoNodesLeftException: Connection error (check network and/or proxy settings)- all nodes failed; tried [[10.0.0.2:9200]]
    at org.elasticsearch.hadoop.rest.NetworkClient.execute(NetworkClient.java:142)
    at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:329)
    at org.elasticsearch.hadoop.rest.RestClient.executeNotFoundAllowed(RestClient.java:337)
    at org.elasticsearch.hadoop.rest.RestClient.deleteScroll(RestClient.java:403)
    at org.elasticsearch.hadoop.rest.ScrollQuery.close(ScrollQuery.java:70)
    at org.elasticsearch.hadoop.mr.EsInputFormat$ShardRecordReader.close(EsInputFormat.java:262)
    at org.apache.spark.rdd.NewHadoopRDD$$anon$1.org$apache$spark$rdd$NewHadoopRDD$$anon$$close(NewHadoopRDD.scala:190)
    at org.apache.spark.rdd.NewHadoopRDD$$anon$1$$anonfun$3.apply(NewHadoopRDD.scala:156)
    at org.apache.spark.rdd.NewHadoopRDD$$anon$1$$anonfun$3.apply(NewHadoopRDD.scala:156)
    at org.apache.spark.TaskContextImpl$$anon$1.onTaskCompletion(TaskContextImpl.scala:60)
    at org.apache.spark.TaskContextImpl$$anonfun$markTaskCompleted$1.apply(TaskContextImpl.scala:79)
    at org.apache.spark.TaskContextImpl$$anonfun$markTaskCompleted$1.apply(TaskContextImpl.scala:77)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
    at org.apache.spark.TaskContextImpl.markTaskCompleted(TaskContextImpl.scala:77)
    at org.apache.spark.scheduler.Task.run(Task.scala:90)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
(u'892', {u'fare': u'7.8292', u'name': u'Kelly, Mr. James', u'embarked': u'Q', u'age': u'34.5', u'parch': u'0', u'pclass': u'3', u'sex': u'male', u'ticket': u'330911', u'passengerid': u'892', u'sibsp': u'0', u'cabin': None})

Note that 10.0.0.2 is the IP address of my Mac. Either way, I eventually get the expected output (the last line above) after a series of error messages. When I use elasticsearch-hadoop-2.1.0.jar instead of 2.1.2, I do not see this problem and the program runs without error.

Is this an incompatibility between elasticsearch-hadoop-2.1.2.jar, Elasticsearch 1.7.0, and Spark 1.5.1?
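The reproduction above passes only `es.resource`, so the node the connector contacts is left to its defaults (`es.nodes`, default `localhost`, and `es.port`, default `9200`). A minimal sketch, assuming a hypothetical helper named `es_read_conf`, that spells those settings out so the target endpoint is explicit; the returned dict is what would be passed as `conf=` to `sc.newAPIHadoopRDD`:

```python
def es_read_conf(resource, nodes="localhost", port=9200):
    """Build the `conf` dict for EsInputFormat (hypothetical helper).

    es.resource, es.nodes and es.port are elasticsearch-hadoop settings;
    es.nodes/es.port must point at the Elasticsearch HTTP (REST) endpoint.
    """
    return {
        "es.resource": resource,  # "index/type" to read, e.g. "titanic/passenger"
        "es.nodes": nodes,        # comma-separated list of Elasticsearch hosts
        "es.port": str(port),     # passed as a string, like every Hadoop conf value
    }


# Usage inside the pyspark shell (sc is the SparkContext it provides):
# es_rdd = sc.newAPIHadoopRDD(
#     inputFormatClass="org.elasticsearch.hadoop.mr.EsInputFormat",
#     keyClass="org.apache.hadoop.io.NullWritable",
#     valueClass="org.elasticsearch.hadoop.mr.LinkedMapWritable",
#     conf=es_read_conf("titanic/passenger", nodes="10.0.0.2"))
```

Keeping the dict construction in plain Python makes it easy to print or log the exact settings handed to the connector before the Spark job starts.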

larghir commented Dec 22, 2015

I am getting a similar error:
ERROR NetworkClient: Node [The server <...> failed to respond with a valid HTTP response] failed (<...>:9300); no other nodes left - aborting...
Exception in thread "main" org.elasticsearch.hadoop.rest.EsHadoopNoNodesLeftException: Connection error (check network and/or proxy settings)- all nodes failed; tried [[<...>:9300]]

<...> stands for the actual IP

I am using Spark 1.5.1, Elasticsearch 1.7.1 and elasticsearch-spark_2.11 version 2.1.1

The port is open; I am able to connect with a different client.
Any hints appreciated.
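One thing worth ruling out in this log: elasticsearch-hadoop talks to Elasticsearch over its REST (HTTP) interface, whose default port is 9200, while 9300 is by default the transport port used for node-to-node traffic and Java transport clients. A different client that connects fine on 9300 may be speaking the transport protocol, which would be consistent with "failed to respond with a valid HTTP response". A minimal standard-library sketch, using a hypothetical helper `speaks_http`, for checking whether a host:port answers plain HTTP before pointing `es.nodes`/`es.port` at it:

```python
import http.client
import urllib.error
import urllib.request


def speaks_http(host, port, timeout=5.0):
    """Return True if host:port answers a plain HTTP GET with an HTTP response.

    Hypothetical helper: es-hadoop uses Elasticsearch's REST (HTTP) interface,
    so whatever es.nodes/es.port point at should pass this check.
    """
    url = "http://%s:%d/" % (host, port)
    # Bypass any proxy from the environment so the node itself is probed.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout) as resp:
            resp.read()
        return True
    except urllib.error.HTTPError:
        # An HTTP error status (404, 501, ...) is still a valid HTTP response.
        return True
    except (http.client.HTTPException, OSError):
        # Refused or timed-out connection (URLError is an OSError), or a reply
        # that is not valid HTTP, e.g. a non-HTTP protocol on that port.
        return False
```

With a default Elasticsearch setup one would expect `speaks_http(host, 9200)` to succeed and `speaks_http(host, 9300)` to fail; the probe ignores proxies on purpose, so if it succeeds but the connector still fails, proxy settings are the next thing to check.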

Member

costin commented Jan 8, 2016

I'm pretty sure you bumped into the same issue as described in #591. This has been fixed in master and will be included in the upcoming 2.2 rc1. Can you please check it out once it is released and, if it's not working, reopen the issue?

Thanks,

vichargrave commented Jan 8, 2016

OK, will do. Thanks.
