Unable to write elastic search data #891

Closed
jayesh4192 opened this issue Nov 10, 2016 · 4 comments

Comments

jayesh4192 commented Nov 10, 2016

Issue description

I am unable to write data with EsStorage when using a dynamic (multi-index) write.
Indexing works when the target resource is static.
With es.resource=index_name/some_type, which is static, the write succeeds.
When I change it to es.resource.write=index_name/{some_field_name}, it fails with "Cannot detect ES version", which is also a misleading message.

Description

Steps to reproduce

Code:
register /tmp/elasticsearch-hadoop-pig-5.0.0.jar ;

define EsStorage org.elasticsearch.hadoop.pig.EsStorage('es.resource={name}/some', 'es.nodes=some server', 'es.port=4080', 'es.http.timeout=2m', 'es_net_proxy_http_host=some_proxy_host', 'es_net_proxy_http_port=4080', 'es.http.retries=2', 'es_batch_write_retry_wait=100s', 'es_batch_write_refresh=false');

layout = load '$LAYOUT_TABLE' using PigStorage(',') as (layout_id, name);
layout = foreach layout generate
REPLACE(layout_id,'"','') as layout_id,
REPLACE(name,'"','') as name;
layout = foreach layout generate CurrentTime() as timestamp, (long)layout_id as layout, (chararray)name as name;
STORE layout into '{name}/some' USING EsStorage();
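
A note on the settings above, offered as something to rule out rather than a confirmed cause: several keys in the DEFINE are written with underscores (es_net_proxy_http_host, es_net_proxy_http_port, es_batch_write_retry_wait, es_batch_write_refresh), whereas the es-hadoop configuration reference spells these settings with dots. Keys the connector does not recognise are presumably not applied, so the proxy may never be used. A sketch of the same script with dotted keys, keeping the placeholder host values from the original:

```pig
REGISTER /tmp/elasticsearch-hadoop-pig-5.0.0.jar;

-- Same storage definition as in the report, with dotted setting names.
-- 'some server' and 'some_proxy_host' are the placeholders from the original
-- script, not real hosts.
DEFINE EsStorage org.elasticsearch.hadoop.pig.EsStorage(
    'es.resource.write={name}/some',
    'es.nodes=some server',
    'es.port=4080',
    'es.http.timeout=2m',
    'es.http.retries=2',
    'es.net.proxy.http.host=some_proxy_host',
    'es.net.proxy.http.port=4080',
    'es.batch.write.retry.wait=100s',
    'es.batch.write.refresh=false');

-- Load and clean-up steps are unchanged from the original script.
layout = LOAD '$LAYOUT_TABLE' USING PigStorage(',') AS (layout_id, name);
layout = FOREACH layout GENERATE
    REPLACE(layout_id, '"', '') AS layout_id,
    REPLACE(name, '"', '') AS name;
layout = FOREACH layout GENERATE
    CurrentTime() AS timestamp, (long)layout_id AS layout, (chararray)name AS name;

-- Each tuple is routed to the index named by its 'name' field.
STORE layout INTO '{name}/some' USING EsStorage();
```

Whether this changes the outcome depends on whether the cluster is only reachable through the proxy, which the report does not state.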

Test/code snippet

Stack trace:
16/11/10 20:41:25 INFO tez.TezJob: DAG Status: status=FAILED, progress=TotalTasks: 1 Succeeded: 0 Running: 0 Failed: 1 Killed: 0 FailedTaskAttempts: 4, diagnostics=Vertex failed, vertexName=scope-23, vertexId=vertex_1478766437060_121497_1_00, diagnostics=[Task failed, taskId=task_1478766437060_121497_1_00_000000, diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running task:org.elasticsearch.hadoop.EsHadoopIllegalArgumentException: Cannot detect ES version - typically this happens if the network/Elasticsearch cluster is not accessible or when targeting a WAN/Cloud instance without the proper setting 'es.nodes.wan.only'
at org.elasticsearch.hadoop.rest.InitializationUtils.discoverEsVersion(InitializationUtils.java:247)
at org.elasticsearch.hadoop.rest.RestService.createWriter(RestService.java:545)
at org.elasticsearch.hadoop.mr.EsOutputFormat$EsRecordWriter.init(EsOutputFormat.java:173)
at org.elasticsearch.hadoop.mr.EsOutputFormat$EsRecordWriter.write(EsOutputFormat.java:149)
at org.elasticsearch.hadoop.pig.EsStorage.putNext(EsStorage.java:189)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:136)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:95)
at org.apache.tez.mapreduce.output.MROutput$1.write(MROutput.java:503)
at org.apache.pig.backend.hadoop.executionengine.tez.plan.operator.POStoreTez.getNextTuple(POStoreTez.java:129)
at org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor.runPipeline(PigProcessor.java:378)
at org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor.run(PigProcessor.java:243)

Version Info

OS: :
JVM :
Hadoop/Spark:
ES-Hadoop : 5.0
ES :

jbaiera (Member) commented Nov 10, 2016

Could you increase the logging level to TRACE and add the logs here? Thanks.

jbaiera added the :Pig label Nov 10, 2016
jayesh4192 (Author) commented

For some reason I was unable to get the logs when running on the grid, so I ran the script locally and hit a different issue. Here are the logs I get when automatic index creation is turned off: https://gist.github.com/jayesh4192/fbbd3e7a5280dd12569831993483ba56

jayesh4192 (Author) commented

My use case is to store data in multiple indices based on a filtering criterion. For example, if I have a table that contains dep_name and employee records, I need to store the employee data per department: computer_science/students, electrical/students.
Is it possible to create multiple indices on the fly? Here is my script:

register /tmp/elasticsearch-hadoop-pig-5.0.0.jar ;

define EsStorage org.elasticsearch.hadoop.pig.EsStorage('es.resource.write={name}/publisher', 'es.nodes=host', 'es.port=4080', 'es.http.timeout=2m', 'es_net_proxy_http_host=httpproxy', 'es_net_proxy_http_port=4080', 'es.http.retries=2', 'es.index.auto.create=true');

layout = load '$LAYOUT_TABLE' using PigStorage(',') as (layout_id, name);
layout = foreach layout generate
REPLACE(layout_id,'"','') as layout_id,
REPLACE(name,'"','') as name;
layout = foreach layout generate CurrentTime() as timestamp, (long)layout_id as layout, (chararray)name as name;
STORE layout into '{name}/publisher' USING EsStorage();

Here is the gist with the error: https://gist.github.com/jayesh4192/011e9c8683224e0944fd103125602df7
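
For reference, the per-department use case described above could be sketched roughly as follows. This is an illustration under assumptions, not a tested script: the `$EMPLOYEE_TABLE` parameter and the `employee_id` / `employee_name` columns are hypothetical, and `host` and port 4080 are the placeholders used elsewhere in this thread. The `LOWER` step is there because Elasticsearch index names must be lowercase.

```pig
REGISTER /tmp/elasticsearch-hadoop-pig-5.0.0.jar;

-- Hypothetical input: one employee per line as "dep_name,employee_id,employee_name".
employees = LOAD '$EMPLOYEE_TABLE' USING PigStorage(',')
    AS (dep_name:chararray, employee_id:long, employee_name:chararray);

-- Normalise the routing field, since index names must be lowercase.
employees = FOREACH employees GENERATE
    LOWER(dep_name) AS dep_name, employee_id, employee_name;

-- '{dep_name}' is resolved per tuple, so a record with dep_name 'electrical'
-- is written to electrical/students.
STORE employees INTO '{dep_name}/students'
    USING org.elasticsearch.hadoop.pig.EsStorage(
        'es.nodes=host',
        'es.port=4080',
        'es.index.auto.create=true');
```

Because the target index is only known once a tuple is read, the indices either have to exist already or the cluster has to allow them to be created when the first document arrives.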

jbaiera (Member) commented Mar 28, 2017

@jayesh4192 My guess in this case is that automatic index creation (creating an index when a document is first indexed into it) is disabled in Elasticsearch. When using a pattern for your write resource, you will need to set action.auto_create_index to true on every node in your cluster.
