Support Full load and CDC from AWS DocumentDB #4534

Closed
dinujoh opened this issue May 14, 2024 · 0 comments
Labels
enhancement New feature or request

Is your feature request related to a problem? Please describe.
I would like to use Data Prepper to perform a full load (export) from AWS DocumentDB and to ingest its change data capture (CDC) events.

Describe the solution you'd like
Add a DocumentDB source that performs a full scan of an AWS DocumentDB collection and exports the entire collection to an OpenSearch sink. The source will also read the DocumentDB change stream and ingest any change data capture events into the OpenSearch sink. For the full load, the source will implement a partition supplier that splits the collection into multiple query partitions and scans them in parallel.
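The partitioned full load described above can be sketched roughly as follows. This is a minimal illustration, not Data Prepper's implementation: the function names `build_partitions`, `scan_partition`, and `full_load` are hypothetical, the collection is modeled as an in-memory list of documents, and partitioning by contiguous `_id` ranges is an assumption about how a partition supplier might divide the work.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Dict, List, Optional, Tuple

# A query partition: inclusive lower bound, exclusive upper bound (None = open-ended).
Partition = Tuple[int, Optional[int]]


def build_partitions(sorted_ids: List[int], partition_size: int) -> List[Partition]:
    """Split sorted _id values into contiguous [lower, upper) ranges (hypothetical supplier)."""
    if partition_size <= 0:
        raise ValueError("partition_size must be positive")
    bounds = sorted_ids[::partition_size]  # every partition_size-th _id starts a partition
    partitions: List[Partition] = []
    for i, lower in enumerate(bounds):
        upper = bounds[i + 1] if i + 1 < len(bounds) else None
        partitions.append((lower, upper))
    return partitions


def scan_partition(docs: List[Dict[str, Any]], partition: Partition) -> List[Dict[str, Any]]:
    """Stand-in for a range query such as {_id: {$gte: lower, $lt: upper}}."""
    lower, upper = partition
    return [d for d in docs if d["_id"] >= lower and (upper is None or d["_id"] < upper)]


def full_load(docs: List[Dict[str, Any]], partition_size: int, workers: int = 4) -> List[Dict[str, Any]]:
    """Scan all partitions in parallel and return every document exactly once."""
    ids = sorted(d["_id"] for d in docs)
    partitions = build_partitions(ids, partition_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        chunks = list(pool.map(lambda p: scan_partition(docs, p), partitions))
    return [d for chunk in chunks for d in chunk]
```

Because the ranges are half-open and contiguous, each document falls into exactly one partition, so parallel workers never export the same document twice.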

Describe alternatives you've considered (Optional)
Support Kafka Connect with the Debezium MongoDB connector plugin.

Additional context
Sample DocumentDB source configuration:

documentdb-pipeline:
  source:
    documentdb:
      acknowledgments: true
      host: "<<docdb-2024-01-03-20-31-17.cluster-abcdef.us-east-1.docdb.amazonaws.com>>"
      port: 27017
      authentication:
        username: ${{aws_secrets:secret:username}}
        password: ${{aws_secrets:secret:password}}
      aws:
        sts_role_arn: "<<arn:aws:iam::123456789012:role/Example-Role>>"
      # If id_key is specified, a new key with that name (here docdb_id) is created and holds the value of _id
      # id_key: "docdb_id"
      s3_bucket: "<<bucket-name>>"
      s3_region: "<<bucket-region>>"
      # Optional s3_prefix under which OpenSearch Ingestion writes temporary data
      s3_prefix: "<<path_prefix>>"
      collections:
        # collection format: <databaseName>.<collectionName>
        - collection: "<<dbname.collection1>>"
          export: true
          stream: true
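
As a small illustration of the `<databaseName>.<collectionName>` format used in the `collections` entries, the sketch below splits such a value into its two parts. The helper name `split_namespace` is hypothetical, and whether the proposed source parses the value this way is an assumption. Splitting on the first dot is chosen because MongoDB-compatible database names cannot contain a dot, while collection names can.

```python
from typing import Tuple


def split_namespace(namespace: str) -> Tuple[str, str]:
    """Split '<databaseName>.<collectionName>' on the first dot (hypothetical helper)."""
    database, sep, collection = namespace.partition(".")
    if not sep or not database or not collection:
        raise ValueError(
            f"expected '<databaseName>.<collectionName>', got {namespace!r}"
        )
    return database, collection
```

For example, `split_namespace("dbname.collection1")` yields the database name `dbname` and the collection name `collection1`.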