Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add OpenSearch Sink option to include Event tags in the document #2800

Closed
wants to merge 2 commits into from

Conversation

kkondaka
Copy link
Collaborator

@kkondaka kkondaka commented Jun 1, 2023

Description

Add OpenSearch Sink option to include Event tags in the document.
This change allows writing events tags to the event document in the opensearch. A new opensearch config option tags_key_name is added. When this option is present, the tags_key_name value is used as name of the key to store the tags in the document.

Example config

- opensearch:                                                                                                                                          
       hosts: ["https://opensearch-node1:9200"]                                                                                                            
       tags_key_name: "tags"                                                                                                                                                                                                                                                                
       index: test_index

Example output

     {
        "_index": "new_http_logs_may_31",
        "_id": "nugsdYgBRCBJrQoAlWPJ",
        "_score": 1,
        "_source": {
          "message": "wordone wordtwo wordthree",
          "tags": [
            "tag1",
            "tag2"
          ]
        }
      }

Resolves #629

Issues Resolved

Resolves #629

Check List

  • [X ] New functionality includes testing.
  • [X ] New functionality has been documented.
    • New functionality has javadoc added
  • [X ] Commits are signed with a real name per the DCO

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Krishna Kondaka <krishkdk@amazon.com>
Signed-off-by: Krishna Kondaka <krishkdk@amazon.com>
@@ -165,6 +165,8 @@ If a single record turns out to be larger than the set bulk size, it will be sen

- `document_root_key`: The key in the event that will be used as the root in the document. The default is the root of the event. If the key does not exist the entire event is written as the document. If the value at the `document_root_key` is a basic type (ie String, int, etc), the document will have a structure of `{"data": <value of the document_root_key>}`. For example, If we have the following sample event:

- `tags_key_name` (optional): The key name to be used to include event tags in the document. By default this is not set. When this option is not used, tags are not included in the document. If the option is used but the event does not have any tags, it stored as empty list in the document.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should stick with my proposal here (#629 (comment)). All of the sinks need tag support, and we can provide a common config that all sinks can add.

Adding just the tags_key_name here is not ideal for 2 reasons.

  1. It doesn't consolidate the tagging parameter names (we will go back to adding this key name manually to all the sink plugins, and they could stop being consistent).
  2. It doesn't scale very well if we want to add features/options for how the sink handles tags. I think the tagging config in the link above is very direct and easy to understand.

I think we at least need to consolidate whatever configuration / parameter we choose so that if it changes, we don't have to change code in every sink plugin. We could do this a number of ways I think. One option is to create an annotation in data-prepper-api (something like @DataPrepperSinkTagsKey that will set the @JsonProperty("tags_key_name")

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We did something similar with routes which made the routes available as a global property.

Here is some of the work for it.

@JsonInclude(JsonInclude.Include.NON_EMPTY)
@JsonProperty("routes")
private final List<String> routes;

I think that we could have Data Prepper core read this flag and then update the underlying Event for the sink. Then sinks don't have to have any tagging-specific code. Thoughts?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @dlvenable and @graytaylor0. It makes sense to make this a sink level JsonProperty. I will make it similar to routes sink option.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dlvenable reading the flag can be done in the Data Prepper core but the flag needs to be set in each Sink separately because not all sinks will have this option set. Secondly, we cannot update underlying event (even the current approach I have here is incorrect) because modifying an event would automatically modify the event FOR all sinks because we do not make a copy today when there are non-pipeline-connector sinks.

Only way to do it without copying is to do event.toJsonStringWithTags() instead of event.toJsonString() in DocumentBuilder.build in OpenSearchSink (and equivalent code in other sinks)

@kkondaka kkondaka closed this Jun 23, 2023
@kkondaka kkondaka deleted the os-tag branch July 13, 2023 04:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Tagging Events in Data Prepper
3 participants