Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Elasticsearch client implementation with pit and no context search #2910

Merged
merged 3 commits into from
Jun 21, 2023

Conversation

graytaylor0
Copy link
Member

@graytaylor0 graytaylor0 commented Jun 20, 2023

Description

This change adds support for Elasticsearch clusters by creating an Elasticsearch client when the opensearch client is unable to lookup cluster info.

This change also implements processing indices with point in time or no search context.

During testing, I found that es clusters with build-flavor of oss do not support point in time, while a build-flavor of default does. So the current requirements to use point in time is that the es cluster has to be greater than 7.10.0, and must have build-flavor that is not oss.

This was tested on 2 different es clusters with version 7.10.2, one with oss and one with default build-flavor.

The code is very similar to that for opensearch, just with different classes from the elasticsearch client.

Issues Resolved

Related to #1985

Check List

  • New functionality includes testing.
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed with a real name per the DCO

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

…se pit or scroll, allow override with search_context_type parameter

Signed-off-by: Taylor Gray <tylgry@amazon.com>
…csearchAccessor

Signed-off-by: Taylor Gray <tylgry@amazon.com>
.collect(Collectors.toList());
}

private boolean shouldIndexBeProcessed(final IndicesRecord openSearchIndicesRecord, final co.elastic.clients.elasticsearch.cat.indices.IndicesRecord elasticSearchIndicesRecord) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can just take in the index name here to simplify the logic.

The calls above to this method would then be filter(record -> shouldIndexBeProcessed(record.index()))

import java.util.stream.Collectors;

import static org.opensearch.dataprepper.plugins.source.opensearch.worker.client.model.MetadataKeyAttributes.DOCUMENT_ID_METADATA_ATTRIBUTE_NAME;
import static org.opensearch.dataprepper.plugins.source.opensearch.worker.client.model.MetadataKeyAttributes.INDEX_METADATA_ATTRIBUTE_NAME;

public class ElasticsearchAccessor implements SearchAccessor, ClusterClientFactory {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there's room to consolidate common logic between this class and OpenSearchAccessor. We can handle that down the line though

}
}

private ElasticsearchTransport createElasticSearchTransport(final org.elasticsearch.client.RestClient restClient) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The client construction logic here and below belongs in another class

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will make a helper factory class and move the functions for client creation to that class for opensearch and elasticsearch

Signed-off-by: Taylor Gray <tylgry@amazon.com>
@graytaylor0 graytaylor0 merged commit d4ad1b0 into opensearch-project:main Jun 21, 2023
23 of 24 checks passed
MaGonzalMayedo pushed a commit to MaGonzalMayedo/data-prepper that referenced this pull request Jul 3, 2023
…pensearch-project#2910)

Create Elasticsearch client, implement search and pit apis for ElasticsearchAccessor

Signed-off-by: Taylor Gray <tylgry@amazon.com>
Signed-off-by: Marcos Gonzalez Mayedo <alemayed@amazon.com>
MaGonzalMayedo pushed a commit to MaGonzalMayedo/data-prepper that referenced this pull request Jul 3, 2023
…pensearch-project#2910)

Create Elasticsearch client, implement search and pit apis for ElasticsearchAccessor

Signed-off-by: Taylor Gray <tylgry@amazon.com>
Signed-off-by: Marcos Gonzalez Mayedo <alemayed@amazon.com>
kkondaka pushed a commit that referenced this pull request Jul 5, 2023
* Elasticsearch client implementation with pit and no context search (#2910)

Create Elasticsearch client, implement search and pit apis for ElasticsearchAccessor

Signed-off-by: Taylor Gray <tylgry@amazon.com>
Signed-off-by: Marcos Gonzalez Mayedo <alemayed@amazon.com>

* GitHub-Issue#2778: Refactoring config files for CloudWatchLogs Sink (#4)

Added Config Files for CloudWatchLogs Sink.

Signed-off-by: Marcos Gonzalez Mayedo <alemayed@amazon.com>

* Added fixes from comments to code (including pathing and nomenclature syntax)

Signed-off-by: Marcos Gonzalez Mayedo <alemayed@amazon.com>

* Refactoring config (#5)

Added default params for back_off and log_send_interval alongside test cases for ThresholdConfig.

Signed-off-by: Marcos Gonzalez Mayedo <alemayed@amazon.com>

* Fixed deleted AwsConfig file

Signed-off-by: Marcos Gonzalez Mayedo <alemayed@amazon.com>

* Removed the s3 dependency from build.gradle, replaced the AwsAuth.. with AwsConfig.

Signed-off-by: Marcos Gonzalez Mayedo <alemayed@amazon.com>

* Added modifiable back_off_timer, added threshold test for back_off_timer and params to AwsConfig

Signed-off-by: Marcos Gonzalez Mayedo <alemayed@amazon.com>

* Added fixes to gradle file, added tests to AwsConfig, and used Reflective mapping to tests CwlSink

Signed-off-by: Marcos Gonzalez Mayedo <alemayed@amazon.com>

* Added default value test to ThresholdConfig and renamed getter for maxRequestSize

Signed-off-by: Marcos Gonzalez Mayedo <alemayed@amazon.com>

* Removed unnecessary imports

Signed-off-by: Marcos Gonzalez Mayedo <alemayed@amazon.com>

* Added cloudwatch-logs to settings.gradle

Signed-off-by: Marcos Gonzalez Mayedo <alemayed@amazon.com>

* Added a quick fix to the back_off_time range

Signed-off-by: Marcos Gonzalez Mayedo <alemayed@amazon.com>

---------

Signed-off-by: Taylor Gray <tylgry@amazon.com>
Signed-off-by: Marcos Gonzalez Mayedo <alemayed@amazon.com>
Signed-off-by: Marcos Gonzalez Mayedo <95880281+MaGonzalMayedo@users.noreply.github.com>
Co-authored-by: Taylor Gray <tylgry@amazon.com>
Co-authored-by: Marcos <alemayed@amazon.com>
MaGonzalMayedo pushed a commit to MaGonzalMayedo/data-prepper that referenced this pull request Jul 25, 2023
…pensearch-project#2910)

Create Elasticsearch client, implement search and pit apis for ElasticsearchAccessor

Signed-off-by: Taylor Gray <tylgry@amazon.com>
Signed-off-by: Marcos Gonzalez Mayedo <alemayed@amazon.com>
MaGonzalMayedo added a commit to MaGonzalMayedo/data-prepper that referenced this pull request Jul 25, 2023
…-project#2922)

* Elasticsearch client implementation with pit and no context search (opensearch-project#2910)

Create Elasticsearch client, implement search and pit apis for ElasticsearchAccessor

Signed-off-by: Taylor Gray <tylgry@amazon.com>
Signed-off-by: Marcos Gonzalez Mayedo <alemayed@amazon.com>

* GitHub-Issue#2778: Refactoring config files for CloudWatchLogs Sink (#4)

Added Config Files for CloudWatchLogs Sink.

Signed-off-by: Marcos Gonzalez Mayedo <alemayed@amazon.com>

* Added fixes from comments to code (including pathing and nomenclature syntax)

Signed-off-by: Marcos Gonzalez Mayedo <alemayed@amazon.com>

* Refactoring config (#5)

Added default params for back_off and log_send_interval alongside test cases for ThresholdConfig.

Signed-off-by: Marcos Gonzalez Mayedo <alemayed@amazon.com>

* Fixed deleted AwsConfig file

Signed-off-by: Marcos Gonzalez Mayedo <alemayed@amazon.com>

* Removed the s3 dependency from build.gradle, replaced the AwsAuth.. with AwsConfig.

Signed-off-by: Marcos Gonzalez Mayedo <alemayed@amazon.com>

* Added modifiable back_off_timer, added threshold test for back_off_timer and params to AwsConfig

Signed-off-by: Marcos Gonzalez Mayedo <alemayed@amazon.com>

* Added fixes to gradle file, added tests to AwsConfig, and used Reflective mapping to tests CwlSink

Signed-off-by: Marcos Gonzalez Mayedo <alemayed@amazon.com>

* Added default value test to ThresholdConfig and renamed getter for maxRequestSize

Signed-off-by: Marcos Gonzalez Mayedo <alemayed@amazon.com>

* Removed unnecessary imports

Signed-off-by: Marcos Gonzalez Mayedo <alemayed@amazon.com>

* Added cloudwatch-logs to settings.gradle

Signed-off-by: Marcos Gonzalez Mayedo <alemayed@amazon.com>

* Added a quick fix to the back_off_time range

Signed-off-by: Marcos Gonzalez Mayedo <alemayed@amazon.com>

---------

Signed-off-by: Taylor Gray <tylgry@amazon.com>
Signed-off-by: Marcos Gonzalez Mayedo <alemayed@amazon.com>
Signed-off-by: Marcos Gonzalez Mayedo <95880281+MaGonzalMayedo@users.noreply.github.com>
Co-authored-by: Taylor Gray <tylgry@amazon.com>
Co-authored-by: Marcos <alemayed@amazon.com>
Signed-off-by: Marcos Gonzalez Mayedo <alemayed@amazon.com>
dlvenable pushed a commit that referenced this pull request Jul 31, 2023
…ionException (#3023)

* Elasticsearch client implementation with pit and no context search (#2910)

Create Elasticsearch client, implement search and pit apis for ElasticsearchAccessor

Signed-off-by: Taylor Gray <tylgry@amazon.com>
Signed-off-by: Marcos Gonzalez Mayedo <alemayed@amazon.com>

* GitHub-Issue#2778: Refactoring config files for CloudWatchLogs Sink (#4)

Added Config Files for CloudWatchLogs Sink.

Signed-off-by: Marcos Gonzalez Mayedo <alemayed@amazon.com>

* Added fixes from comments to code (including pathing and nomenclature syntax)

Signed-off-by: Marcos Gonzalez Mayedo <alemayed@amazon.com>

* Refactoring config (#5)

Added default params for back_off and log_send_interval alongside test cases for ThresholdConfig.

Signed-off-by: Marcos Gonzalez Mayedo <alemayed@amazon.com>

* Fixed deleted AwsConfig file

Signed-off-by: Marcos Gonzalez Mayedo <alemayed@amazon.com>

* Removed the s3 dependency from build.gradle, replaced the AwsAuth.. with AwsConfig.

Signed-off-by: Marcos Gonzalez Mayedo <alemayed@amazon.com>

* Added modifiable back_off_timer, added threshold test for back_off_timer and params to AwsConfig

Signed-off-by: Marcos Gonzalez Mayedo <alemayed@amazon.com>

* Added fixes to gradle file, added tests to AwsConfig, and used Reflective mapping to tests CwlSink

Signed-off-by: Marcos Gonzalez Mayedo <alemayed@amazon.com>

* Added default value test to ThresholdConfig and renamed getter for maxRequestSize

Signed-off-by: Marcos Gonzalez Mayedo <alemayed@amazon.com>

* Removed unnecessary imports

Signed-off-by: Marcos Gonzalez Mayedo <alemayed@amazon.com>

* Added cloudwatch-logs to settings.gradle

Signed-off-by: Marcos Gonzalez Mayedo <alemayed@amazon.com>

* Added a quick fix to the back_off_time range

Signed-off-by: Marcos Gonzalez Mayedo <alemayed@amazon.com>

* Added Buffer classes, ClientFactory similar to S3, and ThresholdCheck

Signed-off-by: Marcos Gonzalez Mayedo <alemayed@amazon.com>

* Removed unnecessary default method from ClientFactory

Signed-off-by: Marcos Gonzalez Mayedo <alemayed@amazon.com>

* Added comments in Buffer Interface, change some default values to suit the plugin use case more

Signed-off-by: Marcos Gonzalez Mayedo <alemayed@amazon.com>

* Removed unused imports

Signed-off-by: Marcos Gonzalez Mayedo <alemayed@amazon.com>

* Changed the unused imports, made parameters final in the ThresholdCheck

Signed-off-by: Marcos Gonzalez Mayedo <alemayed@amazon.com>

* Made changes to the tests and the method signatures in ThresholdCheck, made fixes to gradle file to include catalog

Signed-off-by: Marcos Gonzalez Mayedo <alemayed@amazon.com>

* Removed unused methods/comments

Signed-off-by: Marcos Gonzalez Mayedo <alemayed@amazon.com>

* Added CloudWatchLogsService, CloudWatchLogsServiceTest and RetransmissionLimitException

Signed-off-by: Marcos Gonzalez Mayedo <alemayed@amazon.com>

* Fixed retransmission logging fixed value

Signed-off-by: Marcos Gonzalez Mayedo <alemayed@amazon.com>

* Fixed unused imports

Signed-off-by: Marcos Gonzalez Mayedo <alemayed@amazon.com>

* Fixed making ThresholdCheck public

Signed-off-by: Marcos Gonzalez Mayedo <alemayed@amazon.com>

* Added fixes to ThresholdCheck and CloudWatchLogsService to decouple methods

Signed-off-by: Marcos Gonzalez Mayedo <alemayed@amazon.com>

* Fixed syntax start import in CloudWatchLogsServiceTest

Signed-off-by: Marcos Gonzalez Mayedo <alemayed@amazon.com>

* Extracted LogPusher and SinkStopWatch classes for code cleanup. Addded fixes to variables and retry logic for InterruptExceptions

Signed-off-by: Marcos Gonzalez Mayedo <alemayed@amazon.com>

* Changed method uses in CloudWatchLogsService and removed logging the batch size in LogPusher

Signed-off-by: Marcos Gonzalez Mayedo <alemayed@amazon.com>

* Added Multithreaded CloudWatchLogsDispatcher for handling various async calls to perform PLE's

and added tests

Signed-off-by: Marcos Gonzalez Mayedo <alemayed@amazon.com>

* Added fixesto test and defaulted the parameters in the config to CloudWatchLogs limits, customer can change this in config file

Signed-off-by: Marcos Gonzalez Mayedo <alemayed@amazon.com>

* Added exponential backofftime

Signed-off-by: Marcos Gonzalez Mayedo <alemayed@amazon.com>

* Fixed unused imports

Signed-off-by: Marcos Gonzalez Mayedo <alemayed@amazon.com>

* Fixed up deepcopy of arraylist for service workers in CloudWatchLogsService, and fixed Log calling methods

Signed-off-by: Marcos Gonzalez Mayedo <alemayed@amazon.com>

* Added CloudWatchLogsDispatcher builder pattern, fixed tests for Service and Dispatcher and modified backOffTimeBase

Signed-off-by: Marcos Gonzalez Mayedo <alemayed@amazon.com>

* Removed unused imports

Signed-off-by:Marcos Gonzalez Mayedo <alemayed@amazon.com>
Signed-off-by: Marcos Gonzalez Mayedo <alemayed@amazon.com>

* Added resetBuffer method, removed unnecessary RetransmissionException, and added logString pass in parameter for staging log events.

Signed-off-by: Marcos Gonzalez Mayedo <alemayed@amazon.com>

* Started making changes to the tests to implement the new class structure (performance enhancement)

Signed-off-by: Marcos Gonzalez Mayedo <alemayed@amazon.com>

* Refactored the CloudWatchLogsDispatcher into two classes with the addition of Uploader, introduced simple multithread tests for CloudWatchLogsService

Signed-off-by: Marcos Gonzalez Mayedo <alemayed@amazon.com>

* Fixed issues with locking in try block and added final multithreaded tests to the CloudWatchLogsService class

Signed-off-by: Marcos Gonzalez Mayedo <alemayed@amazon.com>

* Added CloudWatchLogsMetricsTest, changed upper back off time bound and scale, and refactoring changes for better code syntax (renaming, refactoring methods for conciseness, etc...)

Signed-off-by: Marcos Gonzalez Mayedo <alemayed@amazon.com>

* Added changes to javadoc

Signed-off-by: Marcos Gonzalez Mayedo <alemayed@amazon.com>

* Update data-prepper-plugins/cloudwatch-logs/src/main/java/org/opensearch/dataprepper/plugins/sink/client/CloudWatchLogsDispatcher.java

Co-authored-by: Mark Kuhn <kuhnmar@amazon.com>
Signed-off-by: Marcos Gonzalez Mayedo <95880281+MaGonzalMayedo@users.noreply.github.com>

* Fixed comment on CloudWatchLogsDispatcher

Signed-off-by: Marcos Gonzalez Mayedo <alemayed@amazon.com>

---------

Signed-off-by: Taylor Gray <tylgry@amazon.com>
Signed-off-by: Marcos Gonzalez Mayedo <alemayed@amazon.com>
Signed-off-by: Marcos Gonzalez Mayedo <95880281+MaGonzalMayedo@users.noreply.github.com>
Co-authored-by: Taylor Gray <tylgry@amazon.com>
Co-authored-by: Marcos <alemayed@amazon.com>
Co-authored-by: Mark Kuhn <kuhnmar@amazon.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants