Hadoop-Spark2Elasticsearch data ingestion problem: Elasticsearch index docs count is greater than Hive table rows count #628
Hi all. I am trying to store a Hive table into an Elasticsearch 1.7 index following this approach: https://www.elastic.co/guide/en/elasticsearch/hadoop/current/hive.html.
I see some job failures due to this exception:
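My guess is that YARN's job re-submission mechanism causes Hadoop to re-send records that were already indexed, which creates duplicate docs in ES and explains why the index doc count is greater than the Hive table row count. Does this explanation make sense? Any suggestions about how to fix it?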
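To make the suspected mechanism concrete, here is a toy Python model of it. This is only a sketch under my assumptions, not es-hadoop code: `ToyIndex`, `run_task`, and `ingest_with_retry` are hypothetical names, and the "index" is just a dict. It shows that when a failed task is re-run, writes without a document id add new docs each time, while writes keyed on a row id overwrite the earlier copy. If the guess is right, I assume mapping a Hive column to the document id (es-hadoop has an `es.mapping.id` setting) would make retries harmless, but I have not verified that on my cluster.

```python
import uuid


class ToyIndex:
    """In-memory stand-in for an ES index: maps document id -> document."""

    def __init__(self):
        self.docs = {}

    def index(self, doc, doc_id=None):
        # No id supplied: generate a fresh one, so every call adds a new doc.
        if doc_id is None:
            doc_id = uuid.uuid4().hex
        # Same id supplied again: the earlier copy is overwritten.
        self.docs[doc_id] = doc


def run_task(index, rows, id_field=None, fail_after=None):
    """Write rows to the index, optionally failing partway like a lost task attempt."""
    for n, row in enumerate(rows):
        if fail_after is not None and n == fail_after:
            raise RuntimeError("simulated task failure")
        index.index(row, doc_id=row[id_field] if id_field else None)


def ingest_with_retry(rows, id_field=None):
    """First attempt fails after two rows; the retry re-sends every row."""
    index = ToyIndex()
    try:
        run_task(index, rows, id_field, fail_after=2)
    except RuntimeError:
        run_task(index, rows, id_field)
    return len(index.docs)


rows = [{"id": str(i), "value": i * 10} for i in range(4)]
print(ingest_with_retry(rows))        # 6 docs for 4 rows: two rows were indexed twice
print(ingest_with_retry(rows, "id"))  # 4 docs: the retry overwrote the first two
```

In this model the extra docs equal the number of rows written before the failed attempt died, which matches the kind of mismatch I am seeing.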
Thanks in advance for your help.