
how to use elasticsearch-spark to connect to elasticsearch server behind a proxy server #643

Closed
andrimirandi opened this issue Jan 5, 2016 · 4 comments


@andrimirandi

commented Jan 5, 2016

I have used elasticsearch-spark to connect to an Elasticsearch server without any problem, but I ran into trouble connecting to an Elasticsearch server behind a proxy server.

My server configuration is as follows:

  • I have a single public IP address, which I put on my routing server (let's say it's x.x.x.x)
  • I have many nodes; one of them is Elasticsearch, which uses a local IP (192.168.0.*)

When I try to connect to x.x.x.x with elasticsearch-spark using this code:

val elasticIndex = "index/type"
val url = "x.x.x.x:9200"
val reader = sqlContext.read.
     format("org.elasticsearch.spark.sql").
     option("es.net.http.auth.user","username").
     option("es.net.http.auth.pass","password").
     option("es.nodes",url)

println(s"Loading: ${url} ...")
val df = reader.load(elasticIndex)
df.printSchema()
df.count()

the printSchema works fine, but the count returns a connection timeout error.

When I take a look at the logs, I find that the local address 192.168.0.* is still being used:

16/01/05 00:38:33 DEBUG ScalaEsRowRDD: Discovered Elasticsearch version [1.4.4]
16/01/05 00:38:33 DEBUG HttpMethodBase: Resorting to protocol version default close connection policy
16/01/05 00:38:33 DEBUG HttpMethodBase: Should NOT close connection, using HTTP/1.1
16/01/05 00:38:33 DEBUG HttpConnection: Releasing connection back to connection manager.
16/01/05 00:38:33 DEBUG DefaultHttpParams: Set parameter http.method.retry-handler = org.elasticsearch.hadoop.rest.commonshttp.CommonsHttpTransport$1@408711b5
16/01/05 00:38:33 DEBUG DefaultHttpParams: Set parameter http.connection-manager.timeout = 60000
16/01/05 00:38:33 DEBUG DefaultHttpParams: Set parameter http.socket.timeout = 60000
16/01/05 00:38:33 INFO CommonsHttpTransport: Using detected HTTP Auth credentials...
16/01/05 00:38:33 DEBUG DefaultHttpParams: Set parameter http.authentication.preemptive = true
16/01/05 00:38:33 DEBUG DefaultHttpParams: Set parameter http.tcp.nodelay = true
16/01/05 00:38:33 DEBUG HttpMethodDirector: Preemptively sending default basic credentials
16/01/05 00:38:33 DEBUG HttpMethodDirector: Authenticating with BASIC <any realm>@43.231.128.57:9200
16/01/05 00:38:33 DEBUG HttpMethodParams: Credential charset not configured, using HTTP element charset
16/01/05 00:38:33 DEBUG HttpConnection: Open connection to x.x.x.x:9200
16/01/05 00:38:33 DEBUG HttpMethodBase: Adding Host request header
16/01/05 00:38:33 DEBUG ScalaEsRowRDD: Nodes discovery enabled - found [192.168.0.5:9200]
16/01/05 00:38:33 DEBUG HttpMethodBase: Resorting to protocol version default close connection policy
16/01/05 00:38:33 DEBUG HttpMethodBase: Should NOT close connection, using HTTP/1.1
16/01/05 00:38:33 DEBUG HttpConnection: Releasing connection back to connection manager.

Am I missing something? Please help.

Thanks

@costin

Member

commented Jan 5, 2016

Handling your deployment is explained in the docs.

Next time, for questions please use the forum instead of the bug tracker. Thanks!
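For readers who land here with the same setup: the elasticsearch-hadoop configuration docs cover this scenario. By default the connector discovers the cluster's nodes and then talks to their published (internal) addresses, which is exactly the 192.168.0.5:9200 visible in the log above. When only a gateway address is reachable, discovery has to be disabled via es.nodes.wan.only. A minimal sketch against the original code (option values are placeholders; check that your elasticsearch-hadoop version supports this setting):

```scala
// Sketch: es.nodes.wan.only restricts the connector to the addresses declared
// in es.nodes and disables node discovery, so the internal 192.168.0.* addresses
// are never contacted (see the es-hadoop configuration docs).
val reader = sqlContext.read.
  format("org.elasticsearch.spark.sql").
  option("es.net.http.auth.user", "username").
  option("es.net.http.auth.pass", "password").
  option("es.nodes", "x.x.x.x:9200").
  option("es.nodes.wan.only", "true")
```

Note that with WAN-only mode all traffic goes through the single declared address, so parallelism and data locality are reduced; it is meant for proxied/cloud deployments, not as a general default.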

@costin costin closed this Jan 5, 2016

@costin costin added the :Rest label Jan 5, 2016

@andrimirandi

Author

commented Jan 7, 2016

Thanks

@yogeshdarji99


commented Nov 18, 2016

Hello, I ran the same code as yours in my Docker container:

Code:

import org.apache.spark.SparkContext 
import org.apache.spark.SparkContext._
import org.elasticsearch.spark._
import org.elasticsearch.spark.sql._

val elasticIndex = "twitter"
val url = "172.17.X.X:9200"
val reader = sqlContext.read.format("org.elasticsearch.spark.sql").option("es.nodes",url)

println(s"Loading: ${url} ...")
val df = reader.load(elasticIndex)
df.printSchema()
df.count()

I get the error below while executing val df = reader.load(elasticIndex):

scala> val df = reader.load(elasticIndex)
java.lang.ClassNotFoundException: Failed to find data source: org.elasticsearch.spark.sql. Please find packages at http://spark-packages.org
  at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.lookupDataSource(ResolvedDataSource.scala:77)
  at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:102)
  at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:119)
  at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:109)
  ... 58 elided
Caused by: java.lang.ClassNotFoundException: org.elasticsearch.spark.sql.DefaultSource
  at scala.reflect.internal.util.AbstractFileClassLoader.findClass(AbstractFileClassLoader.scala:62)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
  at org.apache.spark.sql.execution.datasources.ResolvedDataSource$$anonfun$4$$anonfun$apply$1.apply(ResolvedDataSource.scala:62)
  at org.apache.spark.sql.execution.datasources.ResolvedDataSource$$anonfun$4$$anonfun$apply$1.apply(ResolvedDataSource.scala:62)
  at scala.util.Try$.apply(Try.scala:192)
  at org.apache.spark.sql.execution.datasources.ResolvedDataSource$$anonfun$4.apply(ResolvedDataSource.scala:62)
  at org.apache.spark.sql.execution.datasources.ResolvedDataSource$$anonfun$4.apply(ResolvedDataSource.scala:62)
  at scala.util.Try.orElse(Try.scala:84)
  at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.lookupDataSource(ResolvedDataSource.scala:62)
  ... 61 more

Am I missing anything? Any help is appreciated!
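This ClassNotFoundException for org.elasticsearch.spark.sql.DefaultSource is a different problem from the one in this issue: it means the elasticsearch-spark connector jar is not on the Spark classpath at all. One way to fix it, assuming an sbt build (the artifact name and version below are illustrative; pick the ones matching your Scala, Spark, and Elasticsearch versions):

```scala
// build.sbt (sketch): pull in the elasticsearch-spark connector that provides
// the org.elasticsearch.spark.sql data source. Adjust the Scala suffix and
// version to match your setup.
libraryDependencies += "org.elasticsearch" % "elasticsearch-spark_2.10" % "2.2.0"
```

With spark-shell or spark-submit, the equivalent is passing the jar via --jars or the Maven coordinate via --packages.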

@ebuildy

Contributor

commented Apr 7, 2017

Indeed, this is a very good question!

It took me a while to figure out: if you build an uber jar (with minimizeJar set to true), this class will not be included, hence the error, even if you follow the deployment recommendations.
