
Replace/Ignore DOT character in field names before inserting into Elastic Search #758

Closed
@gowthamsadasivam


What kind of issue is this?

  • Feature Request

Feature description

Elasticsearch 2.x stopped supporting the dot ('.') character in field names. There have been various discussions about supporting it again and handling the side effects of this change:

elastic/elasticsearch#17759

elastic/elasticsearch#15714

elastic/elasticsearch#15951

Reproduce the issue

Example Document:

app { "adv.id": "efT3Fg5JnvJVs57IOnc" }

^ Here the field name "adv.id" contains the dot ('.') character. Inserting this document through a Hive insert with the ES-Hadoop connector results in the following error:

Caused by: org.elasticsearch.hadoop.rest.EsHadoopInvalidRequest: Found unrecoverable error [127.0.0.1:9200] returned Bad Request(400) - Field name [adv.id] cannot contain '.'; Bailing out..

^ The entire Hive job then fails because of this error.

Meanwhile, Logstash supports this via de_dot, and a Ruby block can also be used to replace the dot character with something else before writing to Elasticsearch. I couldn't find a similar feature for achieving the same with the ES-Hadoop connector.

It would be great if there were a feature or configuration option to replace the dot character in field names with some other character, or to simply skip any document whose field names contain a dot, before writing to Elasticsearch.
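To make the request concrete, here is a rough sketch of the behaviour I have in mind, written as a standalone pre-processing step in Python. This is not an ES-Hadoop feature or API; the function names `de_dot_keys` and `has_dotted_key` and the default `_` replacement are just assumptions for illustration.

```python
from typing import Any


def de_dot_keys(value: Any, replacement: str = "_") -> Any:
    """Return a copy of `value` with '.' replaced in every dict key (hypothetical helper)."""
    if isinstance(value, dict):
        return {
            str(key).replace(".", replacement): de_dot_keys(child, replacement)
            for key, child in value.items()
        }
    if isinstance(value, list):
        return [de_dot_keys(item, replacement) for item in value]
    # Scalars (strings, numbers, None) are returned unchanged.
    return value


def has_dotted_key(value: Any) -> bool:
    """Return True if any dict key at any nesting depth contains '.' (hypothetical helper)."""
    if isinstance(value, dict):
        return any(
            "." in str(key) or has_dotted_key(child)
            for key, child in value.items()
        )
    if isinstance(value, list):
        return any(has_dotted_key(item) for item in value)
    return False


# The example document from this issue.
doc = {"adv.id": "efT3Fg5JnvJVs57IOnc"}

# Option 1: rename the offending field before indexing.
cleaned = de_dot_keys(doc)

# Option 2: skip documents that contain a dotted field name.
docs_to_index = [d for d in [doc, {"adv_id": "x"}] if not has_dotted_key(d)]
```

One caveat with the rename option: if a document already has both "adv.id" and "adv_id", the two keys collide after replacement and one value is lost, so the replacement character would need to be configurable.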
