The goal of this project is to transfer DynamoDB records to an SQS queue using the AWS SDK for JavaScript in Node.js.
To use the script, the AWS CLI must be configured with the following attributes for authentication, using the `aws configure` command:
- Region
- Access Key
- Secret Key
For a more in-depth guide, follow the instructions here.
Also, the program assumes the table will not be updated during the process. Modifying any record in the DynamoDB table while a scan is in flight could lead to incorrect scans (broken segments or incorrectly read data).
```
node index.js <path/to/input>.json
```
The program takes the file path of a JSON file defining four parameters:
| Parameter | Type | Description |
|---|---|---|
| `table` | String | DynamoDB table name |
| `segments` | Number | Number of segments to divide the table into |
| `queue` | String | SQS queue name |
| `dlArn` | String | Dead-letter queue ARN |
For an example input, look at `example-input.json`.
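For reference, an input file with all four parameters might look like this (the values below are placeholders, not the contents of the actual `example-input.json`):

```json
{
  "table": "my-table",
  "segments": 4,
  "queue": "my-queue",
  "dlArn": "arn:aws:sqs:us-east-1:123456789012:my-dead-letter-queue"
}
```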
One of the key design considerations was credential validation, especially since the script uses Worker Threads. To manage this, the script begins with an `AWS.config.getCredentials` call whose callback prints the thread's segment number, guaranteeing that each thread is validated.
Scanning a table, especially one with millions of records, can take a tremendous amount of time. As a solution, the SDK supports parallel scanning.
In the script, the total number of segments is taken from the `segments` argument. The main thread then creates a new Worker for each segment, using the segment number as a unique ID. Each thread scans its assigned segment of the table while completely isolated from the other segments.
The main thread is the entry point of the script (determined by `isMainThread`), and does three tasks:
- Read the input file from the command line
- Start Worker Threads based on arguments
- Log the beginning and end of the process
The Worker threads are wrapped in an async IIFE and receive arguments through `workerData`. The three main components of the Worker threads are DynamoDB, SQS, and the scanning/messaging loop.
DynamoDB is a highly efficient NoSQL database service running on AWS. In this case, records are scanned from a specified table in DynamoDB via the `table` and `segment` arguments.
SQS is a message queue service provided by AWS for high-throughput applications. In this case, records from DynamoDB are individually sent to a specified queue or the corresponding dead-letter queue via the `queue` and `dlArn` arguments.
`sendMessageBatch` was not used since it requires that each message have an `Id`. Assigning an `Id` to each message would require creating a new object for every record, taking up both unnecessary time and space (especially with high-volume scans).
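The per-record send can be sketched as building a plain `sendMessage` parameter object for each scanned item. `toMessageParams` is an illustrative helper name, not the script's actual code:

```javascript
// Hypothetical helper: turn one scanned DynamoDB item into sqs.sendMessage
// parameters. Unlike sendMessageBatch entries, no extra Id field is needed,
// so the record can be serialized directly.
function toMessageParams(item, queueUrl) {
  return {
    QueueUrl: queueUrl,
    MessageBody: JSON.stringify(item),
  };
}
```

With the AWS SDK v2, such a params object would be passed to `sqs.sendMessage(params).promise()`.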
The loop holds all of the business logic for the script and has this basic workflow:

1. `ddbParams` holds the parameters for scanning the DynamoDB table
2. Scan the table to get back `Items` and `LastEvaluatedKey`
3. Update the `ExclusiveStartKey` parameter in `ddbParams` to `LastEvaluatedKey`
4. For each item in `Items`, send the record as a message to the `queue` URL
5. If the `ExclusiveStartKey` is not null, then go to step 1
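The steps above can be sketched as an async loop. This is a simplified illustration (error handling omitted): `scanPage` and `sendRecord` are hypothetical stand-ins for the real `ddb.scan(params).promise()` and SQS send calls.

```javascript
// Sketch of the scan/message loop. scanPage(params) is assumed to resolve to
// { Items, LastEvaluatedKey }, mirroring the shape of a DynamoDB scan response.
async function scanAndSend(ddbParams, scanPage, sendRecord) {
  let sent = 0;
  do {
    const data = await scanPage(ddbParams);
    // Resume the next scan where this one stopped
    ddbParams.ExclusiveStartKey = data.LastEvaluatedKey;
    for (const item of data.Items) {
      await sendRecord(item); // one sendMessage per record
      sent++;
    }
  } while (ddbParams.ExclusiveStartKey); // missing key => table exhausted
  return sent;
}
```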
The condition of the loop depends on `ExclusiveStartKey`: if `ExclusiveStartKey` is empty, there are no more records to scan from the table.
Also, if a scan fails, `ExclusiveStartKey` stays the same, so the next iteration retries the scan with the same `ExclusiveStartKey` from the last iteration.
The project spanned a lot of concepts, particularly since this was my first time using AWS. Here are some of my takeaways:

- Keeping both main and worker-thread code in one file is easier than spreading code across different files, especially for simple scripts like this one
- Taking inputs as JSON is far easier than managing command-line arguments
- Using an async IIFE replaces top-level async/await
- Many AWS functions can be "promisified" using `.promise()`, making them compatible with async/await
- Coming up with a solution first makes it easier to decide which AWS services to use, #workingBackwards
The script is a lightweight proof of concept inspired by my internship at Amazon, but it was NOT written during company time and is NOT copied from any Amazon intellectual property.