
Github-issue#1048 : s3-sink with in_memory buffer implementation. #2623

Merged
merged 26 commits into main on May 12, 2023

Conversation

deepaksahu562
Contributor

@deepaksahu562 deepaksahu562 commented May 2, 2023

Description

  • Implementation of s3-sink in-memory buffer functionality.

Issues Resolved

GitHub-issue #1048

Check List

  • New functionality includes testing.
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed with a real name per the DCO

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Deepak Sahu <deepak.sahu562@gmail.com>
Comment on lines 53 to 55
numEvents = s3SinkConfig.getThresholdOptions().getEventCount();
byteCapacity = s3SinkConfig.getThresholdOptions().getMaximumSize();
duration = s3SinkConfig.getThresholdOptions().getEventCollectTimeOut().getSeconds();
Member

Please update these variable names for better readability, maybe maxEvents, maxBytes, maxCollectionDuration.

Contributor Author

Updated.

Comment on lines 79 to 83
if (isUploadedToS3) {
LOG.info("Snapshot uploaded successfully");
} else {
LOG.info("Snapshot upload failed");
}
Member

Are we dropping the records that fail to upload to S3?
I think metrics are more helpful here than logs.

Contributor

We need to have retry or send the data to DLQ.

Contributor Author

Yes, records will be dropped if the upload fails even after reaching the maximum number of retries.
As suggested, metrics have been added.
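
For illustration, a minimal sketch of what success/failure counters could look like, assuming Data Prepper's PluginMetrics wrapper around Micrometer; the counter names, constructor shape, and helper method below are illustrative rather than the exact code in this PR:

```java
import io.micrometer.core.instrument.Counter;
import org.opensearch.dataprepper.metrics.PluginMetrics;

public class S3SinkService {
    // Counter names are illustrative; the actual metric names may differ.
    static final String SNAPSHOT_SUCCESS = "s3SinkObjectsSucceeded";
    static final String SNAPSHOT_FAILED = "s3SinkObjectsFailed";

    private final Counter snapshotSuccessCounter;
    private final Counter snapshotFailedCounter;

    public S3SinkService(final PluginMetrics pluginMetrics) {
        snapshotSuccessCounter = pluginMetrics.counter(SNAPSHOT_SUCCESS);
        snapshotFailedCounter = pluginMetrics.counter(SNAPSHOT_FAILED);
    }

    void recordUploadResult(final boolean isUploadedToS3) {
        // Increment a metric instead of only logging the outcome.
        if (isUploadedToS3) {
            snapshotSuccessCounter.increment();
        } else {
            snapshotFailedCounter.increment();
        }
    }
}
```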

Contributor Author

> We need to have retry or send the data to DLQ.

Added retry functionality.
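
As a rough sketch, assuming a hypothetical maxRetries setting and the flushToS3 signature shown in the quoted code, the bounded retry could look like:

```java
// Bounded retry around the S3 flush; maxRetries is a hypothetical configuration value.
boolean isUploadedToS3 = false;
int retriesRemaining = maxRetries;
do {
    isUploadedToS3 = currentBuffer.flushToS3(s3Client, bucket, generateKey());
    if (!isUploadedToS3) {
        retriesRemaining--;
        LOG.warn("Failed to upload snapshot to S3, {} retries remaining", retriesRemaining);
    }
} while (!isUploadedToS3 && retriesRemaining > 0);

if (!isUploadedToS3) {
    snapshotFailedCounter.increment();   // record the drop instead of only logging it (counter name illustrative)
}
```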

private final StopWatch watch;

InMemoryBuffer() {
byteArrayOutputStream = new ByteArrayOutputStream();
Member

Can we reset the stream instead of creating a new stream object every time?

Contributor Author

Modified the code to reset the stream instead of creating a new stream object every time.
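
A minimal sketch of that change, assuming the buffer keeps a single reusable stream (the field layout here is illustrative):

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;

class InMemoryBuffer {
    // One reusable stream shared across buffer cycles instead of a new allocation per buffer.
    private static final ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();

    InMemoryBuffer() {
        byteArrayOutputStream.reset();   // drop previously buffered bytes but keep the backing array
    }

    void writeEvent(final byte[] bytes) throws IOException {
        byteArrayOutputStream.write(bytes);
    }
}
```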

@deepaksahu562 deepaksahu562 changed the title S3 sink in memory Github-issue#1048 : s3-sink with in memory buffer implementation. May 2, 2023
@deepaksahu562 deepaksahu562 changed the title Github-issue#1048 : s3-sink with in memory buffer implementation. Github-issue#1048 : s3-sink with in_memory buffer implementation. May 3, 2023
Signed-off-by: Deepak Sahu <deepak.sahu562@gmail.com>
Signed-off-by: Deepak Sahu <deepak.sahu562@gmail.com>
when(codec.parse(any())).thenReturn("{\"message\":\"31824252-adba-4c47-a2ac-05d16c5b8140\"}");
S3SinkService s3SinkService = new S3SinkService(s3SinkConfig, bufferFactory, codec);
assertNotNull(s3SinkService);
s3SinkService.output(generateRandomStringEventRecord());
Collaborator

You have to assert some metrics here.

Contributor Author

Added metrics.
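
For example, a hedged sketch of such an assertion with Mockito, assuming the test stubs PluginMetrics to return a mocked Counter (metric name and wiring are illustrative, not the exact test in this PR):

```java
import static org.mockito.Mockito.atLeastOnce;
import static org.mockito.Mockito.mock;
import static org.mockito.Mockito.verify;
import static org.mockito.Mockito.when;

import io.micrometer.core.instrument.Counter;

// Arrange: assumes pluginMetrics is supplied to the service (wiring omitted/illustrative).
final Counter snapshotSuccessCounter = mock(Counter.class);
when(pluginMetrics.counter("s3SinkObjectsSucceeded")).thenReturn(snapshotSuccessCounter);

// Act
final S3SinkService s3SinkService = new S3SinkService(s3SinkConfig, bufferFactory, codec);
s3SinkService.output(generateRandomStringEventRecord());

// Assert: the success metric was incremented at least once.
verify(snapshotSuccessCounter, atLeastOnce()).increment();
```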

S3SinkService s3SinkService = new S3SinkService(s3SinkConfig, bufferFactory, codec);
assertNotNull(s3SinkService);
s3SinkService.output(generateRandomStringEventRecord());
assertThat(s3SinkService, instanceOf(S3SinkService.class));
Collaborator

Please move this one line up. There is no point in asserting this after calling output().

Contributor Author

Resolved.
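
The reordered test then reads (sketch based on the quoted lines):

```java
final S3SinkService s3SinkService = new S3SinkService(s3SinkConfig, bufferFactory, codec);
assertNotNull(s3SinkService);
assertThat(s3SinkService, instanceOf(S3SinkService.class));   // assert the type before exercising output()
s3SinkService.output(generateRandomStringEventRecord());
```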

reentrantLock.lock();
final String bucket = s3SinkConfig.getBucketOptions().getBucketName();
if (currentBuffer == null) {
currentBuffer = bufferFactory.getBuffer();
Collaborator

Why initialize the currentBuffer here?

Contributor Author

This is the approach suggested by David.
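
For context, a sketch of the lazily initialized buffer inside the locked section, based on the quoted code (the try/finally around the lock is an assumption):

```java
reentrantLock.lock();
try {
    if (currentBuffer == null) {
        // Create the buffer only when the first event arrives, rather than eagerly at startup.
        currentBuffer = bufferFactory.getBuffer();
    }
    // ... write the encoded event into currentBuffer ...
} finally {
    reentrantLock.unlock();
}
```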

" Event_count = {} Records & Event_collection_duration = {} Sec",
maxBytes.getBytes(), currentBuffer.getEventCount(), currentBuffer.getDuration());
boolean isUploadedToS3 = currentBuffer.flushToS3(s3Client, bucket, generateKey());
if (isUploadedToS3) {
Collaborator

We should retry a few times before failing, and log the failure reason clearly.

Contributor Author

Resolved.

final String encodedEvent;
encodedEvent = codec.parse(event);
final byte[] encodedBytes = encodedEvent.getBytes();
if (willExceedThreshold()) {
Collaborator

What if this condition never becomes true? Will the data never be uploaded to S3?

Contributor Author

event_count and event_collect_timeout are mandatory attributes, so the condition will eventually evaluate to true and trigger a flush.
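
To make the flush condition concrete, a sketch of a combined threshold check, using the accessor names that appear in the quoted code (getSize(), the ByteCount import, and the exact signature are assumptions):

```java
import org.opensearch.dataprepper.model.types.ByteCount;
import org.opensearch.dataprepper.plugins.sink.accumulator.Buffer;

public class ThresholdCheck {
    // Flush when any configured threshold is reached: event count, accumulated bytes, or elapsed time.
    public static boolean checkThresholdExceed(final Buffer currentBuffer, final int maxEvents,
                                               final ByteCount maxBytes, final long maxCollectionDuration) {
        return currentBuffer.getEventCount() >= maxEvents
                || currentBuffer.getSize() >= maxBytes.getBytes()
                || currentBuffer.getDuration() >= maxCollectionDuration;
    }
}
```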

@deepaksahu562
Contributor Author

@dlvenable, thanks for your review suggestions.
I have addressed the change requests; please review once again.

@deepaksahu562
Contributor Author

The build is failing with the following:

Error: [checkstyle] [ERROR] /home/runner/work/data-prepper/data-prepper/data-prepper-plugins/s3-sink/src/main/java/org/opensearch/dataprepper/plugins/sink/S3Sink.java:18:59: Using the '.*' form of import should be avoided - org.opensearch.dataprepper.plugins.sink.accumulator.*. [AvoidStarImport]

You can check your build locally to see any other errors:

./gradlew -p data-prepper-plugins/s3-sink clean check

Resolved.
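
For reference, the AvoidStarImport fix is to replace the wildcard import with explicit class imports; a sketch is below (the concrete class names under the accumulator package are illustrative):

```java
// Before (fails checkstyle's AvoidStarImport rule):
// import org.opensearch.dataprepper.plugins.sink.accumulator.*;

// After: import only the classes that are actually used (names illustrative).
import org.opensearch.dataprepper.plugins.sink.accumulator.Buffer;
import org.opensearch.dataprepper.plugins.sink.accumulator.BufferFactory;
import org.opensearch.dataprepper.plugins.sink.accumulator.InMemoryBufferFactory;
```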

encodedEvent = codec.parse(event);
final byte[] encodedBytes = encodedEvent.getBytes();

if (ThresholdCheck.checkThresholdExceed(currentBuffer, maxEvents, maxBytes, maxCollectionDuration)) {
Collaborator

If maxEvents is 10 and only one event is received, and no other event is sent for one hour, I don't see how that one event is flushed to S3.

Contributor

This is handled by the event collection duration, i.e. event_collect_timeout. When the duration is met, the buffer will be flushed to S3.

Collaborator

As mentioned in the email, this is not handled today. We need a way to handle this in all sinks. We will tackle this in a separate PR (either sink-specific or at the Data Prepper core level).

ashoktelukuntla previously approved these changes May 8, 2023
Contributor

@ashoktelukuntla ashoktelukuntla left a comment

Looks good

kkondaka previously approved these changes May 9, 2023
Collaborator

@kkondaka kkondaka left a comment

Looks good.

Member

@dlvenable dlvenable left a comment

Overall this is good. But, the build is currently failing:

> Task :data-prepper-plugins:s3-sink:checkstyleTest
FAILURE: Build failed with an exception.
> Task :data-prepper-plugins:s3-sink:spotlessMarkdownCheck FAILED

* What went wrong:
Execution failed for task ':data-prepper-plugins:s3-sink:spotlessMarkdownCheck'.
> The following files had format violations:
      README.md
          @@ -63,4 +63,4 @@
           This·plugin·is·compatible·with·Java·8.·See
           
           -·[CONTRIBUTING](https://github.com/opensearch-project/data-prepper/blob/main/CONTRIBUTING.md)
          --·[monitoring](https://github.com/opensearch-project/data-prepper/blob/main/docs/monitoring.md)
          +-·[monitoring](https://github.com/opensearch-project/data-prepper/blob/main/docs/monitoring.md)
  Run './gradlew :data-prepper-plugins:s3-sink:spotlessApply' to fix these violations.

You can run the checkstyle locally to verify:

./gradlew -p data-prepper-plugins/s3-sink check

@deepaksahu562
Contributor Author

Overall this is good. But, the build is currently failing:

> Task :data-prepper-plugins:s3-sink:checkstyleTest
FAILURE: Build failed with an exception.
> Task :data-prepper-plugins:s3-sink:spotlessMarkdownCheck FAILED

* What went wrong:
Execution failed for task ':data-prepper-plugins:s3-sink:spotlessMarkdownCheck'.
> The following files had format violations:
      README.md
          @@ -63,4 +63,4 @@
           This·plugin·is·compatible·with·Java·8.·See
           
           -·[CONTRIBUTING](https://github.com/opensearch-project/data-prepper/blob/main/CONTRIBUTING.md)
          --·[monitoring](https://github.com/opensearch-project/data-prepper/blob/main/docs/monitoring.md)
          +-·[monitoring](https://github.com/opensearch-project/data-prepper/blob/main/docs/monitoring.md)
  Run './gradlew :data-prepper-plugins:s3-sink:spotlessApply' to fix these violations.

You can run the checkstyle locally to verify:

./gradlew -p data-prepper-plugins/s3-sink check

Addressed.

@dlvenable
Member

I checked out your work, and the suggestion seemed to get me past the error.

./gradlew :data-prepper-plugins:s3-sink:spotlessApply

However, I do see failing unit tests.

Signed-off-by: Deepak Sahu <deepak.sahu562@gmail.com>
@codecov

codecov bot commented May 12, 2023

Codecov Report

Merging #2623 (55816be) into main (961d492) will increase coverage by 0.01%.
The diff coverage is 100.00%.

@@             Coverage Diff              @@
##               main    #2623      +/-   ##
============================================
+ Coverage     93.52%   93.54%   +0.01%     
- Complexity     2238     2250      +12     
============================================
  Files           261      262       +1     
  Lines          6275     6291      +16     
  Branches        519      520       +1     
============================================
+ Hits           5869     5885      +16     
  Misses          268      268              
  Partials        138      138              
Impacted Files | Coverage Δ
.../dataprepper/model/event/DefaultEventMetadata.java | 91.30% <100.00%> (+0.82%) ⬆️
...aprepper/expression/HasTagsExpressionFunction.java | 100.00% <100.00%> (ø)

@deepaksahu562
Contributor Author

I checked out your work, and the suggestion seemed to get me past the error.

./gradlew :data-prepper-plugins:s3-sink:spotlessApply

However, I do see failing unit tests.

@dlvenable / @ashoktelukuntla Thanks for your suggestions. The build issue is resolved and all checks have passed. Could you please review again?

Collaborator

@kkondaka kkondaka left a comment

Looks good.

@kkondaka kkondaka merged commit 95c319d into opensearch-project:main May 12, 2023
48 checks passed
rajeshLovesToCode pushed a commit to rajeshLovesToCode/data-prepper that referenced this pull request May 16, 2023
…ensearch-project#2623)

* Github-issue#1048 : s3-sink with in-memory buffer implementation.

Signed-off-by: Deepak Sahu <deepak.sahu562@gmail.com>

* Github-issue#1048 : s3-sink with in-memory buffer implementation.

Signed-off-by: Deepak Sahu <deepak.sahu562@gmail.com>

* Github-issue#1048 : s3-sink with in-memory buffer implementation.

Signed-off-by: Deepak Sahu <deepak.sahu562@gmail.com>

* Github-issue#1048 : s3-sink - added JUnit test classes.

Signed-off-by: Deepak Sahu <deepak.sahu562@gmail.com>

* Github-issue#1048 : s3-sink - incorporated review comment.

Signed-off-by: Deepak Sahu <deepak.sahu562@gmail.com>

* Github-issue#1048 : s3-sink - incorporated review comment.

Signed-off-by: Deepak Sahu <deepak.sahu562@gmail.com>

* Github-issue#1048 : s3-sink - local-file buffer implementation.

Signed-off-by: Deepak Sahu <deepak.sahu562@gmail.com>

* Github-issue#1048 : s3-sink - in-memory buffer implementation.

Signed-off-by: Deepak Sahu <deepak.sahu562@gmail.com>

* Github-issue#1048 : resolved -  checkstyle error.

Signed-off-by: Deepak Sahu <deepak.sahu562@gmail.com>

* Github-issue#1048 : incorporated review comment.

Signed-off-by: Deepak Sahu <deepak.sahu562@gmail.com>

* Github-issue#1048 : incorporated review comment.

Signed-off-by: Deepak Sahu <deepak.sahu562@gmail.com>

* GitHub-issue#1048 : Incorporated review comments.

Signed-off-by: Deepak Sahu <deepak.sahu562@gmail.com>

* GitHub-issue#1048 : Incorporated review comments.

Signed-off-by: Deepak Sahu <deepak.sahu562@gmail.com>

* GitHub-issue#1048 : Incorporated review comments.

Signed-off-by: Deepak Sahu <deepak.sahu562@gmail.com>

* GitHub-issue#1048 : Resolved javadoc issues.

Signed-off-by: Deepak Sahu <deepak.sahu562@gmail.com>

---------

Signed-off-by: Deepak Sahu <deepak.sahu562@gmail.com>
Signed-off-by: rajeshLovesToCode <rajesh.dharamdasani3021@gmail.com>
rajeshLovesToCode added a commit to rajeshLovesToCode/data-prepper that referenced this pull request May 18, 2023
rajeshLovesToCode added a commit to rajeshLovesToCode/data-prepper that referenced this pull request May 18, 2023
@dlvenable dlvenable mentioned this pull request Jun 5, 2023