S3 source - allow cross region access #4470

Open
brianmaresca opened this issue Apr 26, 2024 · 2 comments
Labels: enhancement (New feature or request)

Comments


brianmaresca commented Apr 26, 2024

Is your feature request related to a problem? Please describe.
Currently, I do not see any way for a single pipeline to consume an S3 source (with SQS notifications) whose buckets are in different regions. It would be nice to have this ability.

Example scenario:

  • two S3 buckets, one in us-west-2 and one in us-east-1
  • each bucket has event notifications configured with an SNS topic in its own region
  • a single SQS queue in us-east-1 that is subscribed to both topics above (one topic in us-east-1, the other in us-west-2)
  • configuration YAML:
       source:
         s3:
           notification_type: "sqs"
           codec:
             newline:
           sqs:
             queue_url: "https://sqs.us-east-1.amazonaws.com/123456789012/asdf"
           bucket_owners:
             my-bucket-in-us-west-2: 210987654321
             my-bucket-in-us-east-1: 123456789012
           aws:
             sts_role_arn: "arn:aws:iam::123456789012:role/asdf"
             region: us-east-1

With the above configuration, everything goes smoothly for us-east-1. However, the pipeline fails to get objects from the us-west-2 bucket because the S3 client is configured for us-east-1. The (not very informative) error log is:

    [s3-source-sqs-1] ERROR org.opensearch.dataprepper.plugins.source.s3.SqsWorker - Error processing from S3: null (Service: S3, Status Code: 400, Request ID: xxxx, Extended Request ID: xxxx)

Describe the solution you'd like
Enable cross-region access on the S3 client (or add an option to enable it) so that it can download objects from buckets in regions other than the one defined in the YAML config. See https://docs.aws.amazon.com/sdk-for-java/latest/developer-guide/s3-cross-region.html.

Potential solution: add .crossRegionAccessEnabled(true) to createS3Client in S3ClientBuilderFactory:

    public S3Client createS3Client() {
        LOG.info("Creating S3 client");
        return S3Client.builder()
                .crossRegionAccessEnabled(true)
                .region(s3SourceConfig.getAwsAuthenticationOptions().getAwsRegion())
                .credentialsProvider(credentialsProvider)
                .overrideConfiguration(ClientOverrideConfiguration.builder()
                        .retryPolicy(retryPolicy -> retryPolicy.numRetries(5).build())
                        .build())
                .build();
    }

Describe alternatives you've considered (Optional)
Using a separate pipeline and SQS queue for each bucket that is in a different region. But this feels silly: it means an extra SQS queue, an extra pipeline, and duplicated configuration.

@dlvenable dlvenable added enhancement New feature or request and removed untriaged labels Apr 30, 2024
@dlvenable
Member

@brianmaresca, thank you for creating this detailed issue. It seems you are familiar with the solution. Would you be interested in contributing a PR for it?

@dlvenable dlvenable self-assigned this Sep 6, 2024
@dlvenable
Member

I'm interested in solving this by using the region information from the SQS queue. This can allow us to avoid two calls to the S3 API to load the data.
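
One possible reading of this suggestion: S3 event notifications delivered through the queue carry an awsRegion field in each record, so the bucket's region can be known before any S3 call is made. Below is a minimal sketch in plain Java. It assumes the message body is the raw S3 event JSON (not wrapped in an SNS envelope) and uses a regular expression as a stand-in for a real JSON parser. EventRegionExtractor and extractRegion are hypothetical names, not part of Data Prepper.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of Data Prepper.
final class EventRegionExtractor {
    // Matches a JSON member such as "awsRegion": "us-west-2".
    private static final Pattern AWS_REGION =
            Pattern.compile("\"awsRegion\"\\s*:\\s*\"([a-z0-9-]+)\"");

    private EventRegionExtractor() {
    }

    /**
     * Returns the awsRegion value of the first record in the event JSON,
     * or an empty Optional if the field is absent.
     */
    static Optional<String> extractRegion(final String s3EventJson) {
        if (s3EventJson == null) {
            return Optional.empty();
        }
        final Matcher matcher = AWS_REGION.matcher(s3EventJson);
        return matcher.find() ? Optional.of(matcher.group(1)) : Optional.empty();
    }
}
```

A caller could fall back to the configured region when the field is missing, for example extractRegion(body).orElse(configuredRegion), where configuredRegion is also an assumed name.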

Additionally, we can perform STS authentication for the desired region.

Currently, we use the region defined in the region property, e.g.:

source:
  s3:
    aws:
      sts_role_arn: arn:aws:iam::123456789012:role/MyRole
      region: us-east-1

We can use the STS region for the target S3 bucket instead.
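
Choosing a client per target region could be sketched as a small cache that builds one client per region on first use. This is only an illustration under assumptions: RegionalClientCache is a hypothetical name, and the factory function stands in for whatever would build a region-specific S3 client (and its STS credentials) in the actual plugin.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical helper for illustration only; not part of Data Prepper.
final class RegionalClientCache<C> {
    private final Map<String, C> clientsByRegion = new ConcurrentHashMap<>();
    private final Function<String, C> clientFactory;
    private final String defaultRegion;

    RegionalClientCache(final String defaultRegion, final Function<String, C> clientFactory) {
        this.defaultRegion = defaultRegion;
        this.clientFactory = clientFactory;
    }

    /**
     * Returns the client for the given region, creating and caching it on first use.
     * A null or empty region falls back to the configured default region.
     */
    C clientFor(final String region) {
        final String effectiveRegion =
                (region == null || region.isEmpty()) ? defaultRegion : region;
        return clientsByRegion.computeIfAbsent(effectiveRegion, clientFactory);
    }

    /** Number of distinct regions for which a client has been created. */
    int size() {
        return clientsByRegion.size();
    }
}
```

The design choice here is to pay the client construction cost once per region rather than once per message; the default region keeps today's behavior when no region is known.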
