can't append a value to array in elasticsearch from hive #2078

Open
ealio opened this issue Apr 15, 2023 · 1 comment
Comments

@ealio

ealio commented Apr 15, 2023

What kind of issue is this?

  • [x] Bug report. If you’ve found a bug, please provide a code snippet or test to reproduce it below.
    The easier it is to track down the bug, the faster it is solved.

Issue description

Description
I want to sync data from Hive to Elasticsearch using elasticsearch-hadoop. One field is of array type (defined as keyword in the mapping), and I want to append the new value to that array whenever I run an INSERT statement in Hive. But it always fails with the error below.
Ended Job = job_local1121272017_0005 with errors
Error during job, obtaining debugging information...
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
MapReduce Jobs Launched:
Stage-Stage-2: HDFS Read: 0 HDFS Write: 0 FAIL

Steps to reproduce

1. Create an index in Elasticsearch with the mapping defined as shown below.
curl -X PUT -H "Content-Type:application/json" -d '{"mappings":{"employee":{"dynamic":false,"properties" : {"empname" : {"type" : "keyword"},"id" : {"type" : "long"},"targetid":{"type":"keyword"}}}}}' "http://localhost:9200/vidaa"

It's created successfully as shown below.

es@ecs-18775:~$ curl "http://localhost:9200/vidaa/employee/_mapping?pretty"
{
  "vidaa" : {
    "mappings" : {
      "employee" : {
        "dynamic" : "false",
        "properties" : {
          "empname" : {
            "type" : "keyword"
          },
          "id" : {
            "type" : "long"
          },
          "targetid" : {
            "type" : "keyword"
          }
        }
      }
    }
  }
}

2. Create an external table in Hive, with an update script defined to append data on update (a guarded variant of this script is sketched after step 4).
CREATE EXTERNAL TABLE ext_employee (
  id BIGINT,
  empName STRING,
  targetid ARRAY<STRING>)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES (
  'es.resource' = 'vidaa/employee',
  'es.mapping.id' = 'id',
  'es.write.operation' = 'upsert',
  'es.update.script.params' = 'a_data:targetid',
  'es.update.script.inline' = "ctx._source.targetid.add(params.a_data)",
  'es.nodes' = 'localhost',
  'es.port' = '9200',
  'es.nodes.wan.only' = 'true');

3. Then I attempted to insert values into the ext_employee table, expecting the data to be synced to the Elasticsearch index. The SQL statements are:
insert into ext_employee values (8, 'Vicky', array('co2'));
insert into ext_employee values (9, 'Kevin', array('co3'));

4. I want to store 'co2' and 'co3' in an array in Elasticsearch, like "targetid": ["co2", "co3"]. But I always get the error below.
Ended Job = job_local1121272017_0005 with errors
Error during job, obtaining debugging information...
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
MapReduce Jobs Launched:
Stage-Stage-2: HDFS Read: 0 HDFS Write: 0 FAIL
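One possible refinement of the update script from step 2, offered only as a sketch and not as a confirmed fix for the failure above: ctx._source.targetid.add(params.a_data) throws if an existing document has no targetid field, so a null guard is a common defensive pattern for this kind of append script. It can be applied to the existing table without recreating it (property names and params wiring unchanged from step 2):

ALTER TABLE ext_employee SET TBLPROPERTIES (
  -- create the array if it is missing on the existing document, otherwise append to it
  'es.update.script.inline' = 'if (ctx._source.targetid == null) { ctx._source.targetid = [params.a_data] } else { ctx._source.targetid.add(params.a_data) }'
);

After re-running the INSERT statements, the resulting document can be checked with curl "http://localhost:9200/vidaa/employee/8?pretty" to see whether targetid contains the expected values.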

Version Info

OS: Ubuntu 18
JVM: 1.8
Hadoop/Spark: Hive 3.1.3
ES-Hadoop: elasticsearch-hadoop-8.7.0.jar
ES: 6.1.4


@jbaiera
Member

jbaiera commented Jun 29, 2023

Sorry for the late response here - Are you able to obtain more information from the failed tasks?
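Since the job id in the output (job_local1121272017_0005) indicates Hive executed the job in local mode, the task-level stack trace normally ends up in Hive's own log rather than in YARN. A sketch of how to surface it, assuming the default Hive CLI and the default local log location (/tmp/<user>/hive.log), which may differ on this setup:

# re-run the failing statement with task logging sent to the console
hive --hiveconf hive.root.logger=INFO,console -e "insert into ext_employee values (9, 'Kevin', array('co3'));"

# or search the local Hive log for the failed job id
grep -A 40 'job_local1121272017_0005' /tmp/$USER/hive.log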
