Support Full load and CDC from AWS DocumentDB #4534

dinujoh · 2024-05-14T16:24:14Z

Is your feature request related to a problem? Please describe.
I would like to use Data Prepper to perform a full load (export) from AWS DocumentDB and to ingest change data capture (CDC) events from it.

Describe the solution you'd like
Add a DocumentDB source that performs a full scan of an AWS DocumentDB collection and exports the entire collection to the OpenSearch sink. The source also reads the DocumentDB change stream and ingests any change data capture events into the OpenSearch sink. For the full load, the source implements a partition supplier that splits the collection into multiple query partitions and scans them in parallel.
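To illustrate the partition-supplier idea, here is a minimal sketch in Python. It is not Data Prepper code: `QueryPartition`, `supply_partitions`, `scan_partition`, and `full_load` are hypothetical names, the collection is modeled as an in-memory list of documents with integer `_id` values, and range partitioning on `_id` is an assumption about how the split could be done.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence


@dataclass(frozen=True)
class QueryPartition:
    """Half-open _id range [lower, upper); None means unbounded on that side."""
    lower: Optional[int]
    upper: Optional[int]

    def contains(self, doc_id: int) -> bool:
        above = self.lower is None or doc_id >= self.lower
        below = self.upper is None or doc_id < self.upper
        return above and below


def supply_partitions(sorted_ids: Sequence[int], partition_size: int) -> List[QueryPartition]:
    """Split sorted _id values into contiguous, non-overlapping range partitions."""
    if partition_size <= 0:
        raise ValueError("partition_size must be positive")
    # Every partition_size-th id becomes a boundary between two partitions.
    boundaries = list(sorted_ids[partition_size::partition_size])
    edges = [None, *boundaries, None]
    return [QueryPartition(lo, hi) for lo, hi in zip(edges, edges[1:])]


def scan_partition(docs: Sequence[Dict], partition: QueryPartition) -> List[Dict]:
    """Stand-in for a range query against the collection (hypothetical)."""
    return [d for d in docs if partition.contains(d["_id"])]


def full_load(docs: Sequence[Dict], partition_size: int, workers: int = 4) -> List[Dict]:
    """Scan all partitions in parallel and collect the exported documents."""
    ids = sorted(d["_id"] for d in docs)
    partitions = supply_partitions(ids, partition_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(lambda p: scan_partition(docs, p), partitions)
    return [doc for chunk in results for doc in chunk]
```

Because the ranges are half-open and the outer edges are unbounded, every document falls into exactly one partition, so parallel workers never export the same document twice.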

Describe alternatives you've considered (Optional)
Use Kafka Connect with the Debezium MongoDB connector plugin.

Additional context
Sample DocumentDB source configuration:

documentdb-pipeline:
  source:
    documentdb:
      acknowledgments: true
      host: "<<docdb-2024-01-03-20-31-17.cluster-abcdef.us-east-1.docdb.amazonaws.com>>"
      port: 27017
      authentication:
        username: ${{aws_secrets:secret:username}}
        password: ${{aws_secrets:secret:password}}
      aws:
        sts_role_arn: "<<arn:aws:iam::123456789012:role/Example-Role>>"
      # If id_key is specified, a new key with that name (here, docdb_id) is created holding the value of _id
      # id_key: "docdb_id"
      s3_bucket: "<<bucket-name>>"
      s3_region: "<<bucket-region>>"
      # Optional s3_prefix under which OpenSearch Ingestion writes temporary data
      s3_prefix: "<<path_prefix>>"
      collections:
        # collection format: <databaseName>.<collectionName>
        - collection: "<<dbname.collection1>>"
          export: true
          stream: true
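
The `collection` value above combines the database and collection names in one string. As a rough sketch of how such a value could be split, the Python below separates on the first dot only, since MongoDB-compatible database names cannot contain a dot while collection names can. `CollectionName` and `parse_collection` are hypothetical helpers for illustration, not part of Data Prepper.

```python
from typing import NamedTuple


class CollectionName(NamedTuple):
    database: str
    collection: str


def parse_collection(value: str) -> CollectionName:
    """Parse '<databaseName>.<collectionName>' into its two parts."""
    # Split on the first dot: anything after it belongs to the collection name.
    database, sep, collection = value.partition(".")
    if not sep or not database or not collection:
        raise ValueError(
            f"expected '<databaseName>.<collectionName>', got {value!r}"
        )
    return CollectionName(database, collection)
```

For example, `parse_collection("dbname.collection1")` yields the database `dbname` and the collection `collection1`, while a value without a dot is rejected.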

dinujoh added the untriaged label May 14, 2024

dinujoh added this to the v2.8 milestone May 14, 2024

dinujoh closed this as completed May 14, 2024

dinujoh added enhancement New feature or request and removed untriaged labels May 14, 2024

dlvenable mentioned this issue May 16, 2024

Release Notes for version 2.8 #4538

Merged

3 tasks
