Overview

This repository contains Terraform resource definitions to re-create our key findings for writing structured logs from SSM run commands and ingesting them into HoneyComb.

Our first finding was a performance delay when when we enabled CloudWatchOutput on SSM Run Commands. The second finding was a performance issue with importing AWSPowerShell modules that was masked by our default import strategy.

If you're going to run any of the experiments described in this repository, you will need both Terraform v12 and the AWS CLI installed.

AWS Infrastructure

There are four Terraform modules to make it easy to create the infrastructure for these experiments. Broadly, we'll create the following infrastructure:

CloudWatch and Lambda infrastructure to send structured logs to HoneyComb
An EC2 instance to run our experiments against
SSM documents to demonstrate provisioning orchestration
SSM documents to demonstrate AWSPowerShell module performance issues

After creating the infrastructure, you should have outputs available for the document_name, instance_id, and cloudwatch_log_group_name. You will use these values if you perform any of the necessary experiments yourself.

Performance Delay due to enabling CloudWatchOutput for SSM run commands

In this example, we execute an SSM document, our orchestrator, that invokes two additional SSM documents. Each of these documents provides minimal configuration necessary to import the bootstrapped function for structured logging.

To demonstrate this for yourself, run the following command after you create the infrastructure defined in the previous section:

aws ssm send-command --document-name $document_name --instance-ids $instanceId \
    --cloud-watch-output-config "{\"CloudWatchOutputEnabled\":$cw_enabled,\"CloudWatchLogGroupName\":\"$cw_log_group_name\"}" \
    --region us-east-1 --query Command.CommandId --output text

From HoneyComb we can see a noticeable gap in time between each of the provisioning steps that cannot be accounted for by the execution time of our SSM document.

In the AWS Console, we don't see a delay between each document, but we do see that each document takes significantly longer than we expect.

From your terminal, set cw_enabled=false and re-run the same command. If we review the execution output again from the AWS Console, we can observe a more instantaneous flow from one document to the next. Unfortunately, we cannot see this same data in HoneyComb since we don't write any structured logs to CloudWatch 😢.

Overall, executing the orchestration document with CloudWatch output enabled completes in ~30 seconds while executing the same document without CloudWatch output enabled completes in 4 seconds. The latter performance benchmark is consistent with processing logic defined in each document.

This performance penalty can be ameliorated by a combination of two changes:

Modify the finishItem function in the bootstrap script to write to a standard file location on the file system

Note: depending on your version of PowerShell you may need to add the following lines to your function:

# PowerShell 5.1 does not support a UTF8 encoding without the byte marker. The consequence of this is
# that when the logs appear in CloudWatch and are processed by the subscription filter (a Lambda), it fails 
# to correctly convert the string to an object
if (-not (Test-Path $traceReplayFileName)) {
  $Utf8NoBomEncoding = New-Object System.Text.UTF8Encoding $False
  [System.IO.File]::WriteAllLines($traceReplayFileName, ($compressedJson | Out-String), $Utf8NoBomEncoding)
} else {
  $compressedJson | Out-File  $traceReplayFileName -Append -Encoding UTF8
}

Install the CloudWatch agent on the instance and define the logs section to monitor the standard file location used above

Performance Delay due to `AWSPowerShell` modules

Even after implementing the two fixes from the previous section, there were still mysterious gaps between each span in our trace. We eventually identified that the mysterious gaps were caused by our AWSPowerShell import strategy. PowerShell has three different strategies for importing modules into the session, each with slightly different performance characteristics. I'll call these strategies Lazy, Explicit, and Requires.

The Lazy strategy performs module import the first time a function from the module is invoked. Execute the commands below to generate the performance timings for this strategy (note: the sleep 15s command ensures that the command has time to execute before we get the invocation details). The module import plus function execution time should take approximately 12 seconds.

commandId=$(aws ssm send-command --document-name "demo-lazy-import" --instance-ids $instanceId --region us-east-1 --query Command.CommandId --output text)
sleep 15s
aws ssm get-command-invocation --command-id $commandId  --instance-id $instanceId \
  --query '{ExecutionTime:ExecutionElapsedTime,Output:StandardOutputContent}' --output table

With the Explicit strategy we execute the PowerShell Import-Module command directly before invoking the module function.

commandId=$(aws ssm send-command --document-name "demo-explicit-import" --instance-ids $instanceId --region us-east-1 --query Command.CommandId --output text)
sleep 15s
aws ssm get-command-invocation --command-id $commandId  --instance-id $instanceId \
  --query '{ExecutionTime:ExecutionElapsedTime,Output:StandardOutputContent}' --output table

Now we see module import takes more than 12 seconds to complete and function execution completes in less than 200 milliseconds. In total, these two command timings are consistent with the timings from the Lazy import strategy.

Finally, we can observe the unique commands timings for the Requires strategy by executing the following set of commands:

commandId=$(aws ssm send-command --document-name "demo-require-import" --instance-ids $instanceId --region us-east-1 --query Command.CommandId --output text)
sleep 15s
aws ssm get-command-invocation --command-id $commandId  --instance-id $instanceId \
  --query '{ExecutionTime:ExecutionElapsedTime,Output:StandardOutputContent}' --output table

Curiously, although the document execution time is more than 12 seconds, the function execution time is less than 200 milliseconds. As we've seen from previous timings, that execution time is consistent with timings when the AWSPowerShell module has already been imported into the session. What happened to the remaining execution time?

The PowerShell documentation for the Require Module strategy clarifies that "All #Requires statements are always applied globally, and must be met, before the script can execute (emphasis mine). This is true even when the #Requires statement exists in a logically unreachable part of our code, as is the case in our sample code:

Regardless of where our provisioning timing shows the import penalty for the AWSPowershell module, it's important to note that because each SSM document executes in its own PowerShell document, we incur that penalty each time we import the module.

Conclusion

Writing structured logs to CloudWatch and then ingesting them into HoneyComb made it significantly easier to spot performance bottlenecks that we could not have otherwise found.

If you created any of the infrastructure to run the experiments, make sure to tear the infrastructure down again (unless you like paying for things you don't use—seems like there's a metaphor for unstructured logs somewhere in there).

Name		Name	Last commit message	Last commit date
Latest commit History 1 Commit
images		images
terraform		terraform
.gitignore		.gitignore
readme.md		readme.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Overview

AWS Infrastructure

Performance Delay due to enabling CloudWatchOutput for SSM run commands

Performance Delay due to `AWSPowerShell` modules

Conclusion

About

Releases

Packages

Languages

ericksoen/aws-systems-manager-observability-demo

Folders and files

Latest commit

History

Repository files navigation

Overview

AWS Infrastructure

Performance Delay due to enabling CloudWatchOutput for SSM run commands

Performance Delay due to AWSPowerShell modules

Conclusion

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Performance Delay due to `AWSPowerShell` modules

Packages