Not able to process into Elasticsearch Found #622

Closed · douwevdijk opened this issue Dec 14, 2015 · 12 comments · 3 participants

douwevdijk commented Dec 14, 2015

Hi,

We just migrated to a cloud solution using Found. However, when trying to process data into the cluster we are constantly getting the same error:

java.lang.StringIndexOutOfBoundsException: String index out of range: -1

We did not have this issue on our local machine (using the exact same statement), so maybe this has something to do with Found?

More details:

Script language: Pig
Script statement: STORE bla INTO 'blabla' USING org.elasticsearch.hadoop.pig.EsStorage('es.nodes=http://xxx.eu-west-1.aws.found.io:9200');

I also tried without the port number, with no luck.

ebuildy (Contributor) commented Dec 15, 2015

I got a similar error:

java.lang.StringIndexOutOfBoundsException: String index out of range: -1
    at java.lang.String.substring(String.java:1967)
    at org.elasticsearch.hadoop.rest.RestClient.discoverNodes(RestClient.java:110)
    at org.elasticsearch.hadoop.rest.InitializationUtils.discoverNodesIfNeeded(InitializationUtils.java:58)
    at org.elasticsearch.hadoop.rest.RestService.findPartitions(RestService.java:227)
    at org.elasticsearch.hadoop.mr.EsInputFormat.getSplits(EsInputFormat.java:457)
    at org.elasticsearch.hadoop.mr.EsInputFormat.getSplits(EsInputFormat.java:438)
    at org.apache.spark.rdd.NewHadoopRDD.getPartitions(NewHadoopRDD.scala:115)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)

Using ES / Spark / Python, running a simple:

rdd = sc.newAPIHadoopRDD("org.elasticsearch.hadoop.mr.EsInputFormat", "org.apache.hadoop.io.NullWritable", "org.elasticsearch.hadoop.mr.LinkedMapWritable", conf={"es.resource": "events/events"})

It looks like an issue with the Elasticsearch server version?

ebuildy (Contributor) commented Dec 15, 2015

Are you using the v2.1.0?

Here https://github.com/elastic/elasticsearch-hadoop/blob/v2.1.0/mr/src/main/java/org/elasticsearch/hadoop/rest/RestClient.java#L110 this cannot work, since the ES server returns:

http_address: "172.17.0.2:9200"
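The failure mode can be sketched as follows. This is a minimal Python reconstruction of what the v2.1.0 address parsing at RestClient.java#L110 appears to do; the function name and exact logic are assumptions for illustration, not the library's actual code:

```python
def parse_publish_address(http_address):
    # Assumed older-format parsing: ES 1.x nodes reported http_address as
    # "inet[/172.17.0.2:9200]", so the client took the text between '/' and ']'.
    start = http_address.find("/") + 1
    end = http_address.find("]")
    if end - start < 0:
        # Mirrors Java's String.substring(start, end) when end < start, which
        # throws StringIndexOutOfBoundsException with the negative length.
        raise IndexError("String index out of range: %d" % (end - start))
    return http_address[start:end]

# ES 1.x-style address parses fine:
print(parse_publish_address("inet[/172.17.0.2:9200]"))  # 172.17.0.2:9200

# ES 2.x-style bare address has no ']', so the end index is -1:
try:
    parse_publish_address("172.17.0.2:9200")
except IndexError as err:
    print(err)  # String index out of range: -1
```

This would explain why the same exception message, "String index out of range: -1", shows up whenever an ES-Hadoop 2.1.x client discovers nodes on an Elasticsearch 2.x cluster.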
costin (Member) commented Dec 15, 2015

Guys, what version of ES-Hadoop are you using? 2.1.2 is not compatible with Elasticsearch 2.0; only ES-Hadoop 2.2 (currently out at beta1) is.

ebuildy (Contributor) commented Dec 15, 2015

Yes, got it! That was the reason.

On Dec 15, 2015, 7:48 PM, "Costin Leau" notifications@github.com wrote:

Guys, what version of ES-Hadoop are you using? 2.1.2 is not compatible
with Elasticsearch 2.0 - only ES-Hadoop 2.2 (currently out at beta1).

douwevdijk commented Dec 15, 2015

I tried it, still no luck. Does anyone know which version Found is using?

costin (Member) commented Dec 16, 2015

@douwevdijk Can you be a bit more specific about what does not work?
Also, have you read the section on cloud/restricted environments?

Without enough information, it's hard to offer any solution...

douwevdijk commented Dec 16, 2015

Hi Costin,

Yes, I have read it. I tried processing with only the most minimal security settings, and I checked whether any IP was blocked.

But I am constantly getting the same error and cannot explain it: java.lang.StringIndexOutOfBoundsException: String index out of range: -1

Theoretically this should mean there is something wrong with my code, but I also tried processing just one field. See my code below:

register /usr/lib/pig/lib/elasticsearch-hadoop-pig-2.2.0-beta1.jar;
register /usr/lib/pig/lib/piggybank.jar;

d = LOAD 'leads.csv' USING org.apache.pig.piggybank.storage.CSVExcelStorage(',', 'NO_MULTILINE') AS (lead_id:chararray, cmp:chararray, status:chararray);

STORE d INTO 'chin/leads' USING org.elasticsearch.hadoop.pig.EsStorage('es.nodes=http://XXX.eu-west-1.aws.found.io:9200', 'es.mapping.pig.tuple.use.field.names=true');

FYI, I tried without setting the port to 9200, since that is supposed to be the default, but I got a node-not-available error.

costin (Member) commented Dec 16, 2015

Please turn on logging and post the logs as a gist. The error indicates version 2.2.0 is not being used but rather an older version (potentially 2.1.0) that is probably still on the classpath.

douwevdijk commented Dec 16, 2015

Hi Costin,

See here: https://gist.github.com/douwevdijk/1a76b4e061d65e6e9b14

I tried to enable logging, but I am not sure this gist includes the lines you need.

costin (Member) commented Dec 16, 2015

Right, there's no logging, only the exception. By the way, you should see the ES-Hadoop version in use at job start-up.

One more thing, can you try the latest dev snapshot?

douwevdijk commented Dec 16, 2015

Hi,

Indeed, Pig was using the 2.1 jar file, so I have successfully imported the latest jar (again). But now I have to deal with a connection timeout (I didn't have that with the previous jar)...

douwevdijk commented Dec 16, 2015

OK, it works now, after enabling es.nodes.wan.only. Thanks for the support.
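For reference, a minimal sketch of the configuration that resolves this, shown here as a conf dict for the Spark example from earlier in the thread (the host name is a placeholder): es.nodes.wan.only tells the connector to skip node discovery and talk only to the declared nodes, which is what cloud setups like Found need, since the cluster's internal node addresses are not reachable from outside.

```python
# ES-Hadoop connection settings for a cloud-hosted cluster.
# "es.nodes.wan.only" disables node discovery so all traffic goes through
# the address in "es.nodes" -- the setting that closed this issue.
es_conf = {
    "es.resource": "events/events",
    "es.nodes": "xxx.eu-west-1.aws.found.io",   # placeholder host
    "es.port": "9200",
    "es.nodes.wan.only": "true",
}

# Passed to Spark exactly as in ebuildy's example:
# rdd = sc.newAPIHadoopRDD(
#     "org.elasticsearch.hadoop.mr.EsInputFormat",
#     "org.apache.hadoop.io.NullWritable",
#     "org.elasticsearch.hadoop.mr.LinkedMapWritable",
#     conf=es_conf)
print(sorted(es_conf))
```

The same setting can be passed inline to Pig's EsStorage as 'es.nodes.wan.only=true'.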
