# Blog Part 2: Optimizing Large Uploads to Amazon S3 using High-Level CLI Commands
## Introduction

Many people prefer using the AWS command line interface (CLI) tool for its simplicity and versatility. CLI tools offer a way to quickly start transferring files into Amazon S3, Amazon's cloud-based object storage service, across a wide range of operating systems with minimal platform restrictions. Coupled with the high-bandwidth internet connections now available at many office and home locations, the CLI can be an attractive choice even for large (multi-terabyte) data transfers.

This article is part of a three-part series on S3 upload best practices:

- If you are currently evaluating the best data transfer method to Amazon S3, please take a look at this blog article comparing three of the most popular choices for large data transfers.
- If you are looking to optimize the performance of your large file upload to S3 using the AWS CLI, please read on.
- If you are having trouble with timeouts, network consistency, or incomplete uploads, use the low-level aws s3api commands, which give you more control over retrying individual parts of an inconsistent multipart upload, as described in this blog article.

## Getting Started with the AWS CLI

In this article, we will discuss best practices for optimizing large data transfers using the AWS CLI.
Use this link to get set up with the AWS CLI, and please see this link to authenticate with IAM user credentials using the CLI command:

In [None]:
!aws configure

AWS Access Key ID [****************NO7X]: 

## Initial File Integrity Checksum
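Before you upload the file(s), calculate the file's SHA-256 checksum using a utility like openssl or shasum with one of the following commands. The first prints the digest in hexadecimal; the second converts it to base64, the encoding S3 uses when it reports checksums.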

In [None]:
!openssl sha256 path/large_file.txt

or

In [None]:
!shasum -a 256 path/large_file.txt | cut -f1 -d' ' | xxd -r -p | base64

We will compare the initial checksum with the uploaded file's checksum within S3 to validate the file integrity.
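
If you would rather stay inside the notebook than shell out, the same digest can be computed with Python's standard library. This is a minimal sketch: the function name `sha256_digests` and the 1 MiB block size are illustrative choices, not part of the AWS tooling.

```python
import base64
import hashlib


def sha256_digests(path, block_size=1024 * 1024):
    """Return (hex, base64) SHA-256 digests of the file at `path`."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size blocks so a multi-gigabyte file never sits in memory.
        for block in iter(lambda: handle.read(block_size), b""):
            digest.update(block)
    raw = digest.digest()
    # Hex matches `openssl sha256`; base64 matches the shasum | xxd | base64 pipeline.
    return raw.hex(), base64.b64encode(raw).decode("ascii")
```

Calling `sha256_digests("path/large_file.txt")` returns both encodings, so you can keep whichever form you plan to compare against later.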

## Test Transfer
Copy a single file to S3 using a simple, high-level aws s3 transfer command, which handles multipart uploads for you:

In [None]:
!aws s3 cp large_file.txt s3://mybucket/

This example uses the command aws s3 cp, which automatically performs a multipart upload when the object is large. You can also use other aws s3 commands that upload objects into an S3 bucket, such as aws s3 sync or aws s3 mv.

Note: The AWS CLI will calculate and auto-populate the Content-MD5 header for both standard and multipart uploads. If the checksum that S3 calculates does not match the Content-MD5 provided, S3 will not store the object and will instead return an error message to the AWS CLI. In this blog article, we will also compare the file integrity manually for demonstration purposes. For more information please see this link.
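
To make the note above concrete, a Content-MD5 header value is the base64 encoding of the raw 16-byte MD5 digest of the request body. The sketch below only illustrates that encoding. The helper name `content_md5` is hypothetical, and the CLI computes this value for you.

```python
import base64
import hashlib


def content_md5(body: bytes) -> str:
    """Return the Content-MD5 header value for a request body."""
    # The header carries the raw 16-byte digest in base64, not the hex string.
    return base64.b64encode(hashlib.md5(body).digest()).decode("ascii")
```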

When you run the above command you will see performance metrics while the data transfer is underway:
(Fill in with screenshot or copy paste from terminal).

## Optimization 1: AWS S3 CLI Configuration
The following default config values apply to aws s3 transfer commands:

In [None]:
s3 =
  max_concurrent_requests = 10
  max_queue_size = 1000
  multipart_threshold = 8MB
  multipart_chunksize = 8MB
  max_bandwidth = None
  use_accelerate_endpoint = false
  addressing_style = auto

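Depending on your data and local network, you can adjust these values with the aws configure set command to maximize transfer speed. The commands below raise several of them so that we can compare performance. The last command also turns on payload_signing_enabled, which adds SHA-256 validation of each request payload at some CPU cost.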

In [None]:
!aws configure set default.s3.max_concurrent_requests 20
!aws configure set default.s3.max_queue_size 10000
!aws configure set default.s3.multipart_threshold 64MB
!aws configure set default.s3.multipart_chunksize 16MB
!aws configure set default.s3.max_bandwidth 100GB/s
!aws configure set default.s3.payload_signing_enabled True

You can run cat ~/.aws/config to verify the modified configuration settings:

In [None]:
!cat ~/.aws/config

(Fill in with screenshot or copy paste from terminal).

Feel free to further customize the above variables to maximize transfer speed for your unique environment. For more information, see the aws s3 transfer documentation.
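
When choosing multipart_chunksize for very large files, it helps to check how many parts a file will be split into, because S3 allows at most 10,000 parts per multipart upload and requires every part except the last to be at least 5 MiB. The helpers below are a rough sketch for that arithmetic. Their names are illustrative, and they do not model any adjustment the CLI itself may make.

```python
MIB = 1024 * 1024
MAX_PARTS = 10_000           # S3 limit on parts in one multipart upload
MIN_PART_SIZE = 5 * MIB      # S3 minimum size for every part except the last


def ceil_div(numerator, denominator):
    """Integer ceiling division, avoiding float rounding on huge sizes."""
    return -(-numerator // denominator)


def part_count(file_size, chunk_size):
    """Number of parts a file of `file_size` bytes splits into."""
    return max(1, ceil_div(file_size, chunk_size))


def smallest_valid_chunk_size(file_size):
    """Smallest whole-MiB chunk size that keeps the upload within MAX_PARTS."""
    needed_bytes = ceil_div(file_size, MAX_PARTS)
    needed_mib = ceil_div(needed_bytes, MIB) * MIB
    return max(MIN_PART_SIZE, needed_mib)
```

For example, a 1 TiB file at the 16MB chunk size set above would need 65,536 parts, so a larger chunk size is required. `smallest_valid_chunk_size(1024**4)` suggests 105 MiB.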

If we rerun the earlier aws s3 cp command:

In [None]:
!aws s3 cp large_file.txt s3://mybucket/

You should see that the transfer performance has improved. This is a good start, but we can optimize further.

(Fill in with screenshot or copy paste from terminal).

## Optimization 2: S3 Acceleration
S3 Transfer Acceleration (S3TA) reduces the variability in Internet routing, congestion and speeds that can affect transfers, and logically shortens the distance to S3 for remote applications. S3TA improves transfer performance by routing traffic through Amazon CloudFront’s globally distributed Edge Locations and over AWS backbone networks, and by using network protocol optimizations.

To get a good idea of how much network performance can be optimized between your geographic location and various AWS regions, please use this speed comparison tool.

Once you have decided which region will host the S3 bucket (remember that Amazon S3 has a global namespace, but each bucket is hosted in a specific region), configure the bucket for S3 Transfer Acceleration.

### Enabling Transfer Acceleration on a bucket
The following sample command sets Status=Enabled to enable Transfer Acceleration on a bucket:

In [None]:
!aws s3api put-bucket-accelerate-configuration --bucket bucketname --accelerate-configuration Status=Enabled

### Using Transfer Acceleration
To direct all future aws s3 and aws s3api requests to the accelerate endpoint, set the default configuration variable with:

In [None]:
!aws configure set default.s3.use_accelerate_endpoint true
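
For reference, the accelerate endpoint is a per-bucket hostname of the form bucketname.s3-accelerate.amazonaws.com (or bucketname.s3-accelerate.dualstack.amazonaws.com for dual-stack), and Transfer Acceleration only works with DNS-compliant bucket names that contain no periods. The helper below is a small illustrative sketch of that naming rule. The function name is hypothetical, and the CLI builds this hostname for you once the setting above is enabled.

```python
def accelerate_endpoint(bucket, dualstack=False):
    """Return the Transfer Acceleration endpoint URL for `bucket`."""
    if "." in bucket:
        # Buckets with periods in their names cannot use Transfer Acceleration.
        raise ValueError(f"bucket name {bucket!r} contains a period")
    suffix = (
        "s3-accelerate.dualstack.amazonaws.com"
        if dualstack
        else "s3-accelerate.amazonaws.com"
    )
    return f"https://{bucket}.{suffix}"
```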

### Initiate Full Transfer
Now that we have optimized the S3 data transfer through both the CLI configuration and S3 Transfer Acceleration, we can initiate a full sync of our entire local directory into an Amazon S3 bucket using the command:

In [None]:
!aws s3 sync . s3://mybucket/

With the sync command, a local file requires uploading if its size differs from the size of the S3 object, if its last modified time is newer than that of the S3 object, or if it does not exist under the specified bucket and prefix. In this example, the user syncs the local current directory to the bucket mybucket.
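
The three conditions above can be written as a small decision function. This is only a sketch of the rule as described in this article, not the CLI's actual implementation. `FileInfo` and `needs_upload` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FileInfo:
    size: int      # size in bytes
    mtime: float   # last modified time as a Unix timestamp


def needs_upload(local: FileInfo, remote: Optional[FileInfo]) -> bool:
    """Decide whether a local file should be uploaded during a sync."""
    if remote is None:
        # No object exists under the bucket and prefix yet.
        return True
    if local.size != remote.size:
        return True
    # Same size: upload only when the local copy is newer.
    return local.mtime > remote.mtime
```

One consequence of this rule is that a file whose contents changed without changing its size or making it newer than the S3 object would be skipped.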

## Optimization 3: Parallelism
If some of your internet bandwidth is still unused even with the above optimizations, your workload may be bottlenecked at the hardware or application level of your source computer. In that case, you can introduce parallelism: consider spreading your data across multiple drives and running multiple instances (or terminal windows) of the CLI tool at the same time.
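
One way to divide the work is to group top-level directories so that each CLI instance receives a similar number of bytes, then run one set of aws s3 sync commands per terminal window. The sketch below uses a simple greedy balancing heuristic. The function names, the example directory names, and the choice to mirror each directory under the same prefix are all assumptions for illustration.

```python
import shlex


def partition_by_size(dir_sizes, workers):
    """Greedily assign directories to `workers` groups of similar total size."""
    groups = [[] for _ in range(workers)]
    totals = [0] * workers
    # Place the largest directories first, each into the lightest group so far.
    for name, size in sorted(dir_sizes.items(), key=lambda item: item[1], reverse=True):
        lightest = totals.index(min(totals))
        groups[lightest].append(name)
        totals[lightest] += size
    return groups


def sync_commands(group, bucket):
    """Build one `aws s3 sync` command per directory in a group."""
    return [
        f"aws s3 sync {shlex.quote(name)} {shlex.quote(f's3://{bucket}/{name}')}"
        for name in group
    ]


# Hypothetical directory sizes in GB, split across two terminal windows.
groups = partition_by_size({"video": 50, "logs": 30, "images": 20, "docs": 10}, 2)
for number, group in enumerate(groups, start=1):
    print(f"# window {number}")
    print("\n".join(sync_commands(group, "mybucket")))
```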

(More guidance on separating out directories) (especially for low tech)

## Final File Integrity Checksum in S3
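Now that the file(s) have been uploaded to S3, you can validate file integrity using the ETag that S3 reports for each object. Note that the ETag is not a SHA-256 digest. For an object uploaded in a single request (and not encrypted with SSE-KMS or SSE-C), it is the hex MD5 digest of the data. For a multipart upload, it is the MD5 digest of the concatenated per-part MD5 digests, followed by a dash and the number of parts. It therefore cannot be matched character for character against the initial sha256 checksum value. You can retrieve the ETag using the aws s3api head-object command.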

In [None]:
!aws s3api head-object --bucket mybucket --key large_file.txt

Now compare the ETag value returned from this command with an MD5-based value computed the same way from your local file, rather than directly with the SHA-256 checksum calculated earlier.
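
A local value to compare against the ETag can be computed as follows. This is a sketch under several assumptions: the object is not encrypted with SSE-KMS or SSE-C, the upload used exactly the multipart_threshold and multipart_chunksize passed in (the defaults here mirror the CLI defaults listed earlier), and files at or above the threshold were uploaded as multipart. The function name `expected_etag` is illustrative.

```python
import hashlib
import os


def expected_etag(path, threshold=8 * 1024 * 1024, chunk_size=8 * 1024 * 1024):
    """Compute the ETag S3 is expected to report for an uploaded file."""
    if os.path.getsize(path) < threshold:
        # Single-request upload: the ETag is the plain MD5 hex digest.
        digest = hashlib.md5()
        with open(path, "rb") as handle:
            for block in iter(lambda: handle.read(1024 * 1024), b""):
                digest.update(block)
        return digest.hexdigest()

    # Multipart upload: MD5 each part, then MD5 the concatenated raw digests.
    part_digests = []
    with open(path, "rb") as handle:
        for part in iter(lambda: handle.read(chunk_size), b""):
            part_digests.append(hashlib.md5(part).digest())
    combined = hashlib.md5(b"".join(part_digests)).hexdigest()
    return f"{combined}-{len(part_digests)}"
```

The ETag in the head-object response is wrapped in double quotes, so strip them before comparing. If you changed the chunk size as in Optimization 1, pass the same values here, for example `expected_etag("large_file.txt", threshold=64 * 1024 * 1024, chunk_size=16 * 1024 * 1024)`.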

## Conclusion
Transferring large amounts of data to Amazon S3 can be optimized in several ways. We discussed CLI configuration, S3 Transfer Acceleration, and parallelism as ways to improve your data transfer performance. The key is to identify the bottlenecks in your transfer and apply the right optimization techniques accordingly. Always remember to check file integrity after the upload. Stay tuned for the third part of this series for a deep dive into troubleshooting S3 uploads.

## Additional Resources
AWS CLI documentation
AWS S3 Transfer Acceleration
Best Practices for Migrating Your Data to Amazon S3
Amazon S3 Data Transfer Acceleration Speed Comparison