
Support an S3 DLQ in OpenSearch #2298

Closed
dlvenable opened this issue Feb 21, 2023 · 4 comments · Fixed by #2451
@dlvenable
Member

Background

The current DLQ in the OpenSearch sink only writes to local files. However, sometimes pipeline authors want these DLQ files on Amazon S3.

Additionally, the current DLQ format does not embed useful information about the pipeline, so a pipeline author must include the pipeline name in the DLQ file name to distinguish between multiple sinks and pipelines.

Solution

Create an S3 DLQ option in the OpenSearch sink.

Configurations

The DLQ should allow pipeline authors to configure:

  • The bucket name (required)
  • The key prefix (optional; defaults to no prefix and writes to the root of the bucket)

It should use the existing aws sts_role_arn (or aws_sts_role_arn) configuration to access the bucket.

Example:

sink:
- opensearch:
    hosts: [...]
    aws:
      sts_role_arn: arn:...
    s3_dlq:
      bucket_name: my-bucket
      key_prefix: path/to/my/dlq/

Compression

This should use compression for all files. Perhaps in the future we could add an option to disable compression if desired.

Format

This should use the same format as the current DLQ: newline-delimited JSON (JSON-ND), where each JSON object has the following properties:

  • Document field - the full document
  • failure field - the error from OpenSearch

Additionally it should add the following (these can be added to the current DLQ as well):

  • indexName - the target index name. With the new dynamic index name support, this may differ from event to event within a given sink.
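As a sketch of one line of such a JSON-ND file, built here in Python for illustration (the document body, error message, and index name are all hypothetical values, not output from Data Prepper):

```python
import json

# One DLQ record per line (JSON-ND). The "Document" and "failure"
# fields match the current DLQ format; "indexName" is the proposed
# addition. All values below are purely illustrative.
record = {
    "Document": {"traceId": "abc123", "spanId": "def456"},
    "failure": "mapper_parsing_exception: failed to parse field [spanId]",
    "indexName": "my-index-4xx",
}
line = json.dumps(record)
```
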

Additional Metadata

This should store additional metadata which is relevant for all events. This could be expressed in the S3 object key itself so that it doesn't have to be repeated.

  • Pipeline name
  • The DLQ version format. Start at "1"

The key can embed this information:

dlq-v${version}-${pipelineName}-${PLUGIN_ID}-${timestampIso8601}-${uniqueId}.jsonnd.gz

The ${PLUGIN_ID} is currently static, so it will always be opensearch. By including it now, the format will extend naturally when Data Prepper supports #1025.

A hypothetical full path might be:

path/to/my/dlq/dlq-v1-raw-trace-pipeline-opensearch-20230221T10:11:12Z-a258d8eb-b264-41c6-871a-b53793eaf743.jsonnd.gz
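The key construction above can be sketched as follows (a hypothetical helper, not the actual Data Prepper implementation; the function name and parameters are assumptions for illustration):

```python
from datetime import datetime, timezone
from uuid import uuid4

def build_dlq_key(key_prefix: str, pipeline_name: str,
                  plugin_id: str = "opensearch", version: int = 1) -> str:
    """Build an S3 object key of the proposed form:
    dlq-v${version}-${pipelineName}-${PLUGIN_ID}-${timestampIso8601}-${uniqueId}.jsonnd.gz
    """
    # Timestamp format follows the hypothetical full path in the proposal.
    timestamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H:%M:%SZ")
    unique_id = uuid4()
    return f"{key_prefix}dlq-v{version}-{pipeline_name}-{plugin_id}-{timestamp}-{unique_id}.jsonnd.gz"

key = build_dlq_key("path/to/my/dlq/", "raw-trace-pipeline")
```
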

Alternative - Metadata in JSON

The DLQ can include the following metadata in each JSON object:

  • Pipeline name (e.g. pipelineName: "raw-trace-pipeline")
  • A DLQ version format ("version" : "1")

Batching

The DLQ should accumulate documents in a local file and send them after reaching a threshold. The primary threshold is time: after a period of time, the file will be written to S3 no matter what. Secondarily, it can have a size threshold in bytes. Once that threshold is reached, it will write to S3 even if the time threshold has not been met. This is similar behavior to that proposed in #1048.
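The dual-threshold behavior can be sketched like this (class and parameter names are hypothetical, and threshold defaults are placeholders; a real implementation would also flush on a background timer rather than only checking age on each add):

```python
import time

class DlqBatcher:
    """Accumulate DLQ records; flush when either the time limit or the
    byte-size limit is reached. A sketch of the proposed batching, not
    the actual Data Prepper implementation."""

    def __init__(self, flush_fn, max_age_seconds=60.0, max_bytes=5 * 1024 * 1024):
        self.flush_fn = flush_fn          # called with buffered bytes, e.g. an S3 upload
        self.max_age_seconds = max_age_seconds
        self.max_bytes = max_bytes
        self.buffer = bytearray()
        self.first_write_time = None

    def add(self, record_bytes: bytes) -> None:
        if self.first_write_time is None:
            self.first_write_time = time.monotonic()
        self.buffer += record_bytes + b"\n"  # JSON-ND: one record per line
        self._maybe_flush()

    def _maybe_flush(self) -> None:
        aged_out = (time.monotonic() - self.first_write_time) >= self.max_age_seconds
        too_big = len(self.buffer) >= self.max_bytes
        if aged_out or too_big:
            self.flush_fn(bytes(self.buffer))
            self.buffer = bytearray()
            self.first_write_time = None

# Tiny size threshold so a single record triggers a flush:
flushed = []
batcher = DlqBatcher(flushed.append, max_age_seconds=60.0, max_bytes=10)
batcher.add(b"record-one")
```
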

Questions

  • Is there a standard extension for JSON-ND? I have .jsonnd above, but I'm not sure I've really seen this.
  • Should we rename the Document field to document? This is more consistent with other JSON. The downside is it would be different from the current DLQ format.

Alternatives

Generic DLQ

It could be useful to have a generic DLQ concept. However, the sink data may vary so it needs some discussion on the format and approach. Having a DLQ for the OpenSearch sink would cover a lot of ground and help users out quickly.

Related Issues

This DLQ is somewhat like #1048, except that it applies to dead-letter output rather than a regular sink.

@dlvenable added labels: enhancement (New feature or request), plugin - sink (A plugin to write data to a destination) on Feb 21, 2023
@kkondaka
Collaborator

Looks good to me. We should move the index_name under some higher-level key like attributes so that the format can be made generic for a future generic DLQ.

@sharraj

sharraj commented Feb 21, 2023

We should give the option to add <index_name> to the objects as well.

@sharraj

sharraj commented Feb 21, 2023

Also, we should add detailed error numbers to the metadata. This will give hints to the user about why this data landed in the DLQ.

@dlvenable
Member Author

We should move the index_name to some high level key like attributes so that the format can be made generic for future generic DLQ.

I'd like to clarify what you mean. I think you are suggesting that the index name go under a new attributes property at the top level? Perhaps like the following:

{"document" : "...failed document...", "failure" : "...message...", "attributes" : {"indexName" : "my-index-4xx"}}
