S3 Append provides an AppendObjectAsync extension method for the .NET AWS SDK S3 client that can append data to existing objects hosted in S3.
Since AWS S3 is not a block storage system, the content of a persisted object cannot be altered in place. One would have to download, update, and upload (overwrite) the object again, but that approach suffers from multiple problems (dependency on network throughput, high memory requirements, etc.) that make it practically unusable. For this reason, the S3 Append implementation relies on AWS S3 multipart copy internally, so the data does not have to be downloaded first:
- Get to-be-updated object's metadata to determine its size
- Create multipart upload through which data is to be copied/uploaded
- Copy existing data on the server side
- Upload new data from client side
- Complete multipart upload, overwriting the original object
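The steps above can be sketched with plain AWS SDK for .NET calls. This is a minimal illustration under stated assumptions, not the actual S3 Append implementation: the names AppendSketch, AppendAsync, and CopyRanges are hypothetical, the existing object is assumed to be at least 5 MiB (S3 rejects non-final multipart parts smaller than that), and the existing data is split into evenly sized ranges so that no copy part ends up undersized.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Threading.Tasks;
using Amazon.S3;
using Amazon.S3.Model;

public static class AppendSketch
{
    private const long MinPartBytes = 5L * 1024 * 1024;         // S3 minimum for non-final parts
    private const long MaxCopyBytes = 5L * 1024 * 1024 * 1024;  // UploadPartCopy limit per request

    // Splits [0, size) into evenly sized, inclusive byte ranges of at most maxBytes each.
    public static IEnumerable<(long First, long Last)> CopyRanges(long size, long maxBytes)
    {
        if (size <= 0) yield break;
        long count = (size + maxBytes - 1) / maxBytes;   // number of ranges needed
        long step = (size + count - 1) / count;          // even range length
        for (long first = 0; first < size; first += step)
            yield return (first, Math.Min(first + step, size) - 1);
    }

    // Hypothetical helper: appends UTF-8 text to an existing object of at least 5 MiB.
    public static async Task AppendAsync(IAmazonS3 client, string bucket, string key, string text)
    {
        // 1. Get the object's metadata to determine its size.
        var metadata = await client.GetObjectMetadataAsync(
            new GetObjectMetadataRequest { BucketName = bucket, Key = key });
        long size = metadata.ContentLength;
        if (size < MinPartBytes)
            throw new NotSupportedException("This sketch only handles objects of 5 MiB or more.");

        // 2. Create the multipart upload that targets the same key.
        var upload = await client.InitiateMultipartUploadAsync(
            new InitiateMultipartUploadRequest { BucketName = bucket, Key = key });
        try
        {
            var parts = new List<PartETag>();
            int partNumber = 1;

            // 3. Copy the existing data on the server side, one byte range per part.
            foreach (var (first, last) in CopyRanges(size, MaxCopyBytes))
            {
                var copied = await client.CopyPartAsync(new CopyPartRequest
                {
                    SourceBucket = bucket, SourceKey = key,
                    DestinationBucket = bucket, DestinationKey = key,
                    UploadId = upload.UploadId, PartNumber = partNumber,
                    FirstByte = first, LastByte = last
                });
                parts.Add(new PartETag(partNumber++, copied.ETag));
            }

            // 4. Upload the new data from the client as the final part.
            using var body = new MemoryStream(Encoding.UTF8.GetBytes(text));
            var uploaded = await client.UploadPartAsync(new UploadPartRequest
            {
                BucketName = bucket, Key = key, UploadId = upload.UploadId,
                PartNumber = partNumber, InputStream = body, PartSize = body.Length
            });
            parts.Add(new PartETag(partNumber, uploaded.ETag));

            // 5. Complete the upload, which replaces the original object.
            await client.CompleteMultipartUploadAsync(new CompleteMultipartUploadRequest
            {
                BucketName = bucket, Key = key, UploadId = upload.UploadId, PartETags = parts
            });
        }
        catch
        {
            // Discard the already stored parts so they do not keep accruing storage costs.
            await client.AbortMultipartUploadAsync(new AbortMultipartUploadRequest
            {
                BucketName = bucket, Key = key, UploadId = upload.UploadId
            });
            throw;
        }
    }
}
```

The copy parts are issued sequentially here to keep the sketch short; they are independent of each other, so they could also be started concurrently.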
Thanks to a high degree of parallelism and almost unbounded network bandwidth, AWS S3 copy operations are considerably faster than the naive download-update-upload approach. Moreover, internal AWS data transfers are free of charge, making the proposed solution a no-brainer in situations where the client logic resides outside of the AWS cloud. Still, cost is the key aspect to consider, as at least five AWS S3 requests must be issued for each AppendObjectAsync method call (see Implementation for details).
Note that a single UploadPartCopy operation can copy at most 5 GiB of data. Consequently, appending to a 5 TB object would result in at least 1004 requests issued by the S3 Append logic.
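The request count can be worked out as follows. This is a sketch that assumes one request for each of the four non-copy steps (metadata lookup, multipart upload creation, upload of the new data, completion); S is the size of the existing object and P is the copy part size.

```latex
N(S, P) = \left\lceil \frac{S}{P} \right\rceil + 4

% Round figures, treating 5 TB / 5 GB as 1000 copy parts:
N = 1000 + 4 = 1004

% Binary units throughout (S = 5\,\mathrm{TiB},\; P = 5\,\mathrm{GiB}):
N = \left\lceil \frac{5 \cdot 2^{40}}{5 \cdot 2^{30}} \right\rceil + 4 = 1024 + 4 = 1028
```

The figure of 1004 appears to correspond to the first case; with binary units on both sides the count is slightly higher.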
When imported into scope, AppendObjectAsync can be used in a straightforward fashion. Consider an S3 bucket 109a6d191b67 hosting an object fa5ec9042bc3 with the plain text content Hello. The following code would, when executed,
    using System.Threading.Tasks;
    using Amazon.S3;
    using S3Append.Extensions;
    using S3Append.Models;

    public static class Program
    {
        public static async Task Main(string[] args)
        {
            // Credentials and region come from the default AWS configuration.
            var client = new AmazonS3Client();

            // Describe which object to extend and the data to append to it.
            var request = new AppendObjectRequest
            {
                BucketName = "109a6d191b67",
                Key = "fa5ec9042bc3",
                ContentBody = " world!"
            };

            await client.AppendObjectAsync(request);
        }
    }
result in the same fa5ec9042bc3 object containing the proverbial Hello world! text.
Server-side copy will always have superior performance to client-mediated copy. Nonetheless, even AWS's network is subject to the laws of physics, so at some point, for objects that are gigabytes in size, the copy operation could become unsatisfactorily slow. The problem can be partially mitigated by decreasing the copy part size (and thus increasing copy parallelism), which is 5 GiB by default:
    var request = new AppendObjectRequest
    {
        BucketName = "109a6d191b67",
        Key = "fa5ec9042bc3",
        ContentBody = " world!",
        PartMaxBytes = 128L * 1024 * 1024 // 2^27 bytes, i.e. 128 MiB
    };
Keep in mind, however, that the smaller the copy part size, the more requests are generated and, consequently, the higher the cost of the respective append operation.
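To get a feel for this trade-off, the request count can be estimated before choosing a part size. The helper below is a hypothetical illustration (RequestEstimate and Requests are not part of S3 Append); it assumes one copy request per started part plus four fixed requests for the metadata lookup, multipart creation, upload of the new data, and completion.

```csharp
using System;

public static class RequestEstimate
{
    // Assumed fixed overhead: metadata, create, upload of new data, complete.
    private const long FixedRequests = 4;

    // Estimates the S3 requests needed to append to an object of objectBytes
    // when the existing data is copied in parts of at most partMaxBytes.
    public static long Requests(long objectBytes, long partMaxBytes)
    {
        long copyRequests = (objectBytes + partMaxBytes - 1) / partMaxBytes; // ceiling division
        return copyRequests + FixedRequests;
    }

    public static void Main()
    {
        const long MiB = 1024L * 1024;
        const long GiB = 1024L * MiB;
        long objectBytes = 10 * GiB;

        Console.WriteLine(Requests(objectBytes, 5 * GiB));   // prints 6
        Console.WriteLine(Requests(objectBytes, 128 * MiB)); // prints 84
    }
}
```

Under these assumptions, moving from the default 5 GiB to 128 MiB parts on a 10 GiB object raises the estimate from 6 to 84 requests per append.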