
Insert into elastic search from a partitioned table throws error #724

Closed
smang1 opened this issue Mar 25, 2016 · 7 comments

Comments

@smang1

commented Mar 25, 2016

Hello,

I have noticed that selecting data from a partitioned Hive table and inserting it into Elasticsearch does not work: the MapReduce job ends with the following error.


URL:
  http://0.0.0.0:8088/taskdetails.jsp?jobid=job_1458893148211_0019&tipid=task_1458893148211_0019_m_000000
-----
Diagnostic Messages for this Task:
Error: java.io.IOException: java.lang.reflect.InvocationTargetException
        at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderCreationException(HiveIOExceptionHandlerChain.java:97)
        at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderCreationException(HiveIOExceptionHandlerUtil.java:57)
        at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.initNextRecordReader(HadoopShimsSecure.java:265)
        at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.next(HadoopShimsSecure.java:139)
        at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:199)
        at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:185)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:52)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
        at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.initNextRecordReader(HadoopShimsSecure.java:251)
        ... 11 more
Caused by: java.lang.IndexOutOfBoundsException: Index: 2, Size: 2
        at java.util.ArrayList.rangeCheck(ArrayList.java:635)
        at java.util.ArrayList.get(ArrayList.java:411)
        at org.apache.hadoop.hive.ql.io.parquet.read.DataWritableReadSupport.getProjectedGroupFields(DataWritableReadSupport.java:110)
        at org.apache.hadoop.hive.ql.io.parquet.read.DataWritableReadSupport.getSchemaByName(DataWritableReadSupport.java:155)
        at org.apache.hadoop.hive.ql.io.parquet.read.DataWritableReadSupport.init(DataWritableReadSupport.java:221)
        at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.getSplit(ParquetRecordReaderWrapper.java:256)
        at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:95)
        at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:81)
        at org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat.getRecordReader(MapredParquetInputFormat.java:72)
        at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.<init>(CombineHiveRecordReader.java:66)
        ... 16 more


FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
MapReduce Jobs Launched:
Stage-Stage-0: Map: 1   HDFS Read: 0 HDFS Write: 0 FAIL
Total MapReduce CPU Time Spent: 0 msec
WARN: The method class org.apache.commons.logging.impl.SLF4JLogFactory#release() was invoked.
WARN: Please see http://www.slf4j.org/codes.html#release for an explanation.

I have tested similar scenarios using different source tables (stored as Parquet, and stored as Parquet with Snappy compression) and they work fine. But when I use a partitioned Hive table as the source table, the job fails with the above error.

I used the Cloudera 5.5 VM for Hadoop, Elasticsearch 2.2.1, and elasticsearch-hadoop-2.2.0-rc1.jar for my tests.

I attach a zip file with two HQL scripts and the ES-Hadoop jar for reproducing this issue.

Hive-ES.zip
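
Since the scripts are attached rather than inlined, here is only a minimal sketch of the scenario; the table names, columns, index name, and ES settings below are placeholders, not the actual ones from the scripts.

    -- Partitioned source table stored as Parquet.
    CREATE TABLE logs_parquet (id STRING, message STRING)
    PARTITIONED BY (dt STRING)
    STORED AS PARQUET;

    -- External table mapped to an Elasticsearch index through ES-Hadoop.
    ADD JAR elasticsearch-hadoop-2.2.0-rc1.jar;
    CREATE EXTERNAL TABLE logs_es (id STRING, message STRING, dt STRING)
    STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
    TBLPROPERTIES ('es.resource' = 'logs/log', 'es.nodes' = 'localhost:9200');

    -- Selecting from the partitioned Parquet table into the ES-backed table is
    -- the step that fails with the IndexOutOfBoundsException shown above.
    INSERT OVERWRITE TABLE logs_es
    SELECT id, message, dt FROM logs_parquet;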

Thanks and Regards
Sa'M

@costin


commented Apr 5, 2016

I don't see anything related to ES in the above stack trace; in fact, it looks like reading the partitioned table from the Parquet files is what fails. That is, Hive fails while reading the data, before it is ever handed to ES-Hadoop.

@costin


commented Apr 8, 2016

The issue appears to be invalid and, without any further update, I am closing it.

Cheers,

@olivier-ds


commented Jan 4, 2017

smang1,

I am facing exactly the same issue. It is indeed difficult to blame ES-Hadoop, since the problem occurs only with partitioned tables in Parquet format (there is no issue with flat tables in Parquet format or with partitioned tables in Avro format). It is very likely that the issue is in the Parquet SerDe and has nothing to do with ES-Hadoop.

@lvguanming


commented Apr 25, 2017

My solution is to create a temp table; maybe the issue is in Parquet.

  1. Create a temp table by selecting the data from the Parquet-format table:

     CREATE TABLE temp.order_index_2016 ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' STORED AS RCFile AS
     SELECT id, userId, substr(createTime,0,19) AS createTime FROM ods.b_order WHERE time >= '2016-01-01' AND time < '2017-01-01';

  2. Use the temp table as the source table to load into ES (the Hive table was already mapped to ES through ES-Hadoop):

     INSERT INTO temp.order_index_es SELECT id, userId, createTime FROM temp.order_index_2016;

Both steps pass in my tests, and the exception no longer occurs.

@kiddingbaby


commented Jul 24, 2017

I hit the same problem with es-hadoop 5.2.2 and Hive 1.2.1, but when I change the SerDe from Parquet to ORC it works.
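
A rough sketch of that kind of workaround, staging the data through an ORC table; the table and column names are the same hypothetical placeholders used in the sketch above, not the ones from my setup.

    -- Stage the partitioned Parquet data into an ORC table, then load ES from it.
    CREATE TABLE logs_orc STORED AS ORC AS
    SELECT id, message, dt FROM logs_parquet;

    INSERT OVERWRITE TABLE logs_es
    SELECT id, message, dt FROM logs_orc;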

@WhisperLoli


commented May 5, 2019

@kiddingbaby When I use ORC, something else goes wrong: only one partition's data ends up in Elasticsearch and the other partitions are missing.

@WhisperLoli


commented May 7, 2019

@kiddingbaby
I forget exactly which step was wrong, but it's OK now. I created a temporary table stored as ORC, then imported into ES from that ORC table. Thank you for your help.
