There is no single place to find all the information about the lifecycle of a compaction job (i.e. when it was created, when it was pulled off the queue, how long it took to run, whether it succeeded, etc.). This information is scattered across various logs in CloudWatch. We should record it in a DynamoDB table.
Suggested design:
Have one DynamoDB table that records lifecycle information for all compaction jobs across all Sleeper tables (i.e. not one DynamoDB table per Sleeper table).
Key design: hash key of compaction job id, sort key of timestamp of update.
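A minimal in-memory sketch of that key design (class and method names here are illustrative, not Sleeper's actual API): updates are grouped by the job id (the hash key) and ordered by update timestamp (the sort key), so fetching one job's history is a single keyed lookup.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Illustrative stand-in for the proposed DynamoDB table:
// hash key = compaction job id, sort key = timestamp of the update.
public class JobStatusStore {
    // jobId -> (update time in millis -> update description)
    private final Map<String, NavigableMap<Long, String>> updates = new HashMap<>();

    public void recordUpdate(String jobId, long timeMillis, String update) {
        updates.computeIfAbsent(jobId, id -> new TreeMap<>()).put(timeMillis, update);
    }

    // All updates for one job, in timestamp order (what a DynamoDB Query
    // on the hash key would return).
    public NavigableMap<Long, String> updatesForJob(String jobId) {
        return updates.getOrDefault(jobId, new TreeMap<>());
    }
}
```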
Record the following stages of the lifecycle of a compaction job:
Job creation
Job pulled off the queue
Job finish time
Job finish status - success or failure; total number of records read and number written (these two are not necessarily the same, as an iterator may filter out records); the rate at which records were written.
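The finish-status fields above can be sketched as a simple value class (field and class names are illustrative, not Sleeper's actual types). Note that records written may be fewer than records read, since the iterator applied during compaction may filter records out.

```java
// Sketch of the fields a job-finished update might carry.
public class JobFinishedStatus {
    public final boolean success;
    public final long recordsRead;
    public final long recordsWritten; // may be < recordsRead if an iterator filters records
    public final double runtimeSeconds;

    public JobFinishedStatus(boolean success, long recordsRead,
                             long recordsWritten, double runtimeSeconds) {
        this.success = success;
        this.recordsRead = recordsRead;
        this.recordsWritten = recordsWritten;
        this.runtimeSeconds = runtimeSeconds;
    }

    // The rate at which records were written, in records per second.
    public double recordsWrittenPerSecond() {
        return recordsWritten / runtimeSeconds;
    }
}
```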
Note that it is possible for a compaction job to be pulled off the queue twice, as SQS does not guarantee that a message will only be delivered once. Perhaps each compaction task should therefore get a unique id, so that updates from different tasks can be separated.
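That disambiguation could look like the following sketch (hypothetical names): each update carries the id of the compaction task that produced it, so a reader can tell whether a job's updates came from one run or from two deliveries of the same message.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Sketch: tagging each update with a unique task id separates the updates
// written by different deliveries of the same compaction job message.
public class JobUpdate {
    public final String jobId;
    public final String taskId; // unique per compaction task run
    public final long timeMillis;
    public final String status;

    public JobUpdate(String jobId, String taskId, long timeMillis, String status) {
        this.jobId = jobId;
        this.taskId = taskId;
        this.timeMillis = timeMillis;
        this.status = status;
    }

    // Distinct task ids that reported updates for a job - more than one
    // means the job message was delivered (and run) more than once.
    public static Set<String> taskIdsForJob(List<JobUpdate> updates, String jobId) {
        Set<String> taskIds = new TreeSet<>();
        for (JobUpdate u : updates) {
            if (u.jobId.equals(jobId)) {
                taskIds.add(u.taskId);
            }
        }
        return taskIds;
    }
}
```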
We could also record the lifecycle of compaction ECS tasks - creation time, total run time, total number of records processed, etc.
We will need a Java class that can report the status of a particular compaction job (by querying DynamoDB for the relevant information), and a script in scripts/utility to make that class easy to use.
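A sketch of what that reporting class might do (names are illustrative): given the updates for one job, keyed and ordered by update timestamp as a DynamoDB query on the job id hash key would return them, produce a human-readable summary for the utility script to print.

```java
import java.util.Map;

// Sketch of the reporting class: formats one job's updates, in timestamp
// order, as a summary the scripts/utility wrapper could print.
public class CompactionJobStatusReporter {
    public static String report(String jobId, Map<Long, String> updatesByTime) {
        StringBuilder sb = new StringBuilder("Compaction job " + jobId + ":\n");
        for (Map.Entry<Long, String> entry : updatesByTime.entrySet()) {
            sb.append("  ").append(entry.getKey())
              .append(" - ").append(entry.getValue()).append("\n");
        }
        return sb.toString();
    }
}
```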
We will also need to update the documentation to explain how to use that script.
Related issues:
#158 Create DynamoDB table for compaction job events
#160 Record compaction job created event in DynamoDB
#161 Record compaction job processing events in DynamoDB
#162 Client to report compaction job status
#159 Create DynamoDB table for compaction task events
#163 Record compaction task events in DynamoDB