Support S3 as a Sink #1048

Closed
dlvenable opened this issue Feb 16, 2022 · 1 comment
Labels
plugin - sink A plugin to write data to a destination.

Comments

@dlvenable
Member

dlvenable commented Feb 16, 2022

Some pipeline authors would like to save Events to S3.

Some teams using OpenSearch for observability want to store all of their events (logs, traces, and metrics) in S3. This can be a more cost-effective storage solution for lower-priority data. To fully support this use case, Data Prepper would need to write the objects to S3 in a form that can be replayed later. A pipeline author could then run a pipeline that reads the data back from S3 using the existing S3 source and sends it to OpenSearch.

Data Prepper should have a Sink which saves Events to S3 as objects. Each object will likely contain multiple events.

The S3 sink should support the following:

  • Configurations for the bucket name, key path, and key pattern. The key pattern should support timestamps such as logs-${YYYY.mm}.
  • The key pattern should support the time at which the object was written, using a format similar to the OpenSearch sink's index pattern.
  • The sink will collect events (ideally in a local file, to tolerate faults) before sending them to S3 as a single large object.
  • Configurable thresholds that determine when an S3 object is written. These can be any of 1) the number of events, 2) the number of bytes, or 3) how long events are collected before the object is written.
  • The ability to encode events using sink-side codecs, similar to the concept in Support generic parsers/codecs #1532.
  • The pipeline author can configure the output codec they wish to use (e.g., newline, JSON, CSV). This will be a configuration option on the S3 sink in the pipeline.
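The threshold bullet leaves open how the three limits combine. Below is a minimal Java sketch, assuming that reaching any one limit triggers a write and that an empty collection is never flushed on time alone. The class `ThresholdCheck` and its methods are hypothetical names for illustration, not Data Prepper's API.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical sketch: decides when collected events should be written to S3 as one object.
// Assumption: reaching ANY one of the three configured limits triggers a write.
class ThresholdCheck {
    private final int maxEvents;              // 1) how many events
    private final long maxBytes;              // 2) how many bytes
    private final Duration maxCollectionTime; // 3) how long events are collected

    private int eventCount = 0;
    private long byteCount = 0;
    private Instant collectionStart;

    ThresholdCheck(int maxEvents, long maxBytes, Duration maxCollectionTime, Instant start) {
        this.maxEvents = maxEvents;
        this.maxBytes = maxBytes;
        this.maxCollectionTime = maxCollectionTime;
        this.collectionStart = start;
    }

    // Records one event whose encoded (codec output) size is encodedBytes.
    void add(long encodedBytes) {
        eventCount++;
        byteCount += encodedBytes;
    }

    // True when at least one limit has been reached; an empty collection is never flushed.
    boolean shouldFlush(Instant now) {
        if (eventCount == 0) {
            return false;
        }
        return eventCount >= maxEvents
                || byteCount >= maxBytes
                || Duration.between(collectionStart, now).compareTo(maxCollectionTime) >= 0;
    }

    // Starts a new collection window after the S3 object has been written.
    void reset(Instant now) {
        eventCount = 0;
        byteCount = 0;
        collectionStart = now;
    }

    public static void main(String[] args) {
        Instant t0 = Instant.parse("2023-01-01T00:00:00Z");
        ThresholdCheck check = new ThresholdCheck(3, 1024, Duration.ofSeconds(60), t0);
        check.add(100);
        check.add(100);
        System.out.println(check.shouldFlush(t0.plusSeconds(10))); // false: 2 events, 200 bytes, 10 s
        check.add(100);
        System.out.println(check.shouldFlush(t0.plusSeconds(10))); // true: event-count limit reached
    }
}
```

In this sketch the time window is measured from the last reset. Measuring it from the arrival of the first event in a window would be an equally reasonable choice.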
dlvenable created this issue from a note in Data Prepper Project Roadmap (2.1 - Additional Plugins - Aug 2022) Feb 16, 2022
dlvenable added the plugin - sink and untriaged labels Feb 16, 2022
dlvenable changed the title from "Support S3 as a Source" to "Support S3 as a Sink" Feb 16, 2022
dlvenable moved this from 2.1 - Additional Plugins (Sep 2022) to 2.2 (Oct 2022) in Data Prepper Project Roadmap May 18, 2022
deepaksahu562 pushed a commit to deepaksahu562/data-prepper that referenced this issue Jan 25, 2023
deepaksahu562 pushed a commit to deepaksahu562/data-prepper that referenced this issue Feb 28, 2023
deepaksahu562 pushed a commit to deepaksahu562/data-prepper that referenced this issue Feb 28, 2023
Description

Created "s3-sink" plugin. GitHub issue: https://github.com/opensearch-project/data-prepper/issues/1048

Added Functionality

  • Configurations for the bucket name, key path, and key pattern.
  • The key pattern supports timestamps such as logs-${YYYY.mm}-${uniqueId}.
  • Collects events from the buffer and stores them in RAM or a local file before writing to the S3 bucket once a threshold limit is reached.

Check List

  [ ] New functionality s3-sink plugin.
  [ ] New functionality has been documented.
  [ ] New functionality has javadoc added.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check https://github.com/opensearch-project/data-prepper/blob/main/CONTRIBUTING.md
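The key pattern logs-${YYYY.mm}-${uniqueId} could be expanded roughly as follows. This is a hypothetical sketch, not the plugin's code: it assumes that the text inside ${...} is a java.time.format.DateTimeFormatter pattern applied to the write time (UTC in the example), and that ${uniqueId} is replaced by a caller-supplied ID such as a random UUID. Because DateTimeFormatter letters are case-sensitive (YYYY is week-based year and mm is minute of hour), the example uses yyyy.MM for year and month. The class name KeyPatternExpander is illustrative.

```java
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.util.UUID;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: expands placeholders in an S3 key pattern.
class KeyPatternExpander {
    // Matches placeholders such as ${yyyy.MM} or ${uniqueId}.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{([^}]+)\\}");

    static String expand(String keyPattern, ZonedDateTime writeTime, String uniqueId) {
        Matcher m = PLACEHOLDER.matcher(keyPattern);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String token = m.group(1);
            // Assumption: every placeholder other than uniqueId is a DateTimeFormatter pattern.
            String value = token.equals("uniqueId")
                    ? uniqueId
                    : DateTimeFormatter.ofPattern(token).format(writeTime);
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        ZonedDateTime writeTime = ZonedDateTime.of(2023, 2, 28, 13, 45, 0, 0, ZoneOffset.UTC);
        System.out.println(expand("logs-${yyyy.MM}-${uniqueId}", writeTime, "abc123"));
        // prints logs-2023.02-abc123

        // With the current time and a random ID, each object gets a distinct key.
        System.out.println(expand("logs-${yyyy.MM}-${uniqueId}",
                ZonedDateTime.now(ZoneOffset.UTC), UUID.randomUUID().toString()));
    }
}
```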
deepaksahu562 pushed a commit to deepaksahu562/data-prepper that referenced this issue Feb 28, 2023
This was referenced Feb 28, 2023
deepaksahu562 pushed a commit to deepaksahu562/data-prepper that referenced this issue Feb 28, 2023
deepaksahu562 added a commit to deepaksahu562/data-prepper that referenced this issue Feb 28, 2023
…1048

Signed-off-by: Deepak Sahu <deepak.sahu562@gmail.com>
deepaksahu562 added a commit to deepaksahu562/data-prepper that referenced this issue Mar 1, 2023
deepaksahu562 added a commit to deepaksahu562/data-prepper that referenced this issue Mar 2, 2023
Signed-off-by: Deepak Sahu <deepak.sahu562@gmail.com>
deepaksahu562 added a commit to deepaksahu562/data-prepper that referenced this issue Mar 3, 2023
deepaksahu562 added a commit to deepaksahu562/data-prepper that referenced this issue Apr 19, 2023
Signed-off-by: Deepak Sahu <deepak.sahu562@gmail.com>
dlvenable pushed a commit that referenced this issue Apr 28, 2023
Initial commit for the S3 Sink #1048

Signed-off-by: Deepak Sahu <deepak.sahu562@gmail.com>

dlvenable added this to the v2.3 milestone May 5, 2023
dlvenable self-assigned this May 5, 2023
deepaksahu562 added a commit to deepaksahu562/data-prepper that referenced this issue May 12, 2023
Signed-off-by: Deepak Sahu <deepak.sahu562@gmail.com>
deepaksahu562 added a commit to deepaksahu562/data-prepper that referenced this issue May 15, 2023
Signed-off-by: Deepak Sahu <deepak.sahu562@gmail.com>
deepaksahu562 added a commit to deepaksahu562/data-prepper that referenced this issue May 16, 2023
Signed-off-by: Deepak Sahu <deepak.sahu562@gmail.com>
Projects
Archived in project
Development

No branches or pull requests

1 participant