Support an S3 DLQ in OpenSearch #2298
Comments
Looks good to me.
We should also give an option to add the <index_name> to the objects.
Also, we should add detailed error numbers to the metadata. This will give users hints as to why this data landed in the DLQ.
I'd like to clarify what you mean. I think you are suggesting that the index name go under a new attributes property at the top level? Perhaps like the following:
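A hypothetical shape for that suggestion (the property names and values here are illustrative assumptions, not a confirmed format) might be:

```json
{
  "Document": { "traceId": "abc123" },
  "failure": "mapper_parsing_exception",
  "attributes": {
    "indexName": "otel-v1-apm-span-000001"
  }
}
```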
Background
The current DLQ in the OpenSearch sink only writes to local files. However, pipeline authors sometimes want these DLQ files in Amazon S3.
Additionally, the current DLQ format does not embed useful information about the pipeline. So a pipeline author must include the pipeline name in the DLQ file name to distinguish between multiple sinks and pipelines.
Solution
Create an S3 DLQ option in the OpenSearch sink.
Configurations
The DLQ should allow pipeline authors to configure the S3 destination. It should use the existing aws: sts_role_arn (or aws_sts_role_arn) to access the bucket.
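A hypothetical configuration (the option names under dlq are illustrative assumptions, not a finalized schema; only sts_role_arn reuse is from the proposal) might look like:

```yaml
sink:
  - opensearch:
      hosts: ["https://opensearch:9200"]
      aws:
        sts_role_arn: "arn:aws:iam::123456789012:role/dlq-writer"
      dlq:
        s3:
          bucket: "my-dlq-bucket"
          key_prefix: "dlq/"
          region: "us-east-1"
```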
Compression
This should use compression for all files. Perhaps in the future we could add an option to disable compression if desired.
Format
This should use the same format as the current DLQ, namely JSON-ND (newline-delimited JSON), where each JSON object has the following properties:
- Document field - the full document
- failure field - the error from OpenSearch
Additionally, it should add the following (this can be added to the current DLQ as well):
- indexName - the target index name. With the new dynamic index name support, this might be different for any given sink.
Additional Metadata
This should store additional metadata which is relevant to all events. This could be expressed in the S3 object key itself so that it does not have to be repeated in every record. The key can embed information such as the pipeline name, the ${PLUGIN_ID}, and a format version (e.g. "1"). The ${PLUGIN_ID} is currently static, so it will always be opensearch. By using it now, the key format will extend naturally when Data Prepper supports #1025.
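As an illustration only (the exact key template from the proposal was not preserved; aside from ${PLUGIN_ID} being opensearch and the "raw-trace-pipeline" name, every segment here is an assumption), the key could embed this metadata, with a hypothetical full path shown below the template:

```
${key_prefix}${PLUGIN_ID}/${PIPELINE_NAME}/${TIMESTAMP}-${UUID}.jsonnd.gz

dlq/opensearch/raw-trace-pipeline/2022-08-01T12-00-00Z-1a2b3c4d.jsonnd.gz
```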
Alternative - Metadata in JSON
The DLQ can include the following metadata in each JSON object:
- pipelineName (e.g. "raw-trace-pipeline")
- version (e.g. "1")
Batching
The DLQ should build the document in a local file and send it after reaching a threshold. The primary threshold is time: after a period of time, the file will be written to S3 no matter what. Secondarily, it can have a size threshold in bytes; once that threshold is reached, it will write to S3 even if the time threshold has not been met. This is similar to the behavior proposed in #1048.
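A minimal sketch of this dual-threshold batching logic (class, method, and parameter names are illustrative assumptions, not Data Prepper's actual implementation, which is in Java):

```python
import time


class DlqBatcher:
    """Buffers DLQ records and signals a flush when either a time
    threshold or a size threshold (in bytes) is reached.
    Hypothetical sketch; not Data Prepper's actual API."""

    def __init__(self, flush_interval_secs=60, max_bytes=1_000_000,
                 clock=time.monotonic):
        self.flush_interval_secs = flush_interval_secs
        self.max_bytes = max_bytes
        self.clock = clock
        self.buffer = []
        self.buffered_bytes = 0
        self.last_flush = clock()

    def add(self, record: str) -> bool:
        """Buffer one JSON-ND line; return True if a flush is now due."""
        self.buffer.append(record)
        self.buffered_bytes += len(record.encode("utf-8"))
        return self.should_flush()

    def should_flush(self) -> bool:
        # Time threshold is primary: flush after the interval no matter what.
        time_due = (self.clock() - self.last_flush) >= self.flush_interval_secs
        # Size threshold is secondary: flush early if the buffer is large.
        size_due = self.buffered_bytes >= self.max_bytes
        return time_due or size_due

    def drain(self) -> str:
        """Return buffered records as one JSON-ND payload and reset state."""
        payload = "\n".join(self.buffer)
        self.buffer.clear()
        self.buffered_bytes = 0
        self.last_flush = self.clock()
        return payload
```

A caller would write the drained payload to S3 (compressed) whenever add or a periodic timer check reports that a flush is due.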
Questions
- Should DLQ files use the .jsonnd extension? It is used above, but I'm not sure I've really seen this extension elsewhere.
- Should we rename the Document field to document? This is more consistent with other JSON. The downside is it would be different from the current DLQ format.
Alternatives
Generic DLQ
It could be useful to have a generic DLQ concept. However, the data varies by sink, so the format and approach need some discussion. Having a DLQ for the OpenSearch sink alone would cover a lot of ground and help users out quickly.
Related Issues
This DLQ is somewhat like #1048, except it is specific to the DLQ.