
Large files will always be re-uploaded #59

Open
FraserThompson opened this issue Apr 26, 2019 · 2 comments · May be fixed by #460
Labels: bug (Something isn't working), priority: medium
Comments

@FraserThompson
Contributor

Files larger than a certain number of megabytes are uploaded via multipart upload by default, for network resiliency, and for files uploaded that way the ETag is not the MD5 of the file. This means the current method of comparing the ETag in S3 to the MD5 of the local file is not a reliable way to verify whether the file has changed.

One potential solution could be to just compare filenames, since as far as I can tell Gatsby already outputs files with the MD5 sum in the filename? Another could be to add a custom Tag to each uploaded file containing the MD5 sum and compare that rather than the ETag.

@YoshiWalsh
Collaborator

Good find! It's possible to determine the ETag for a multipart upload. But since this is undocumented by Amazon, we really shouldn't rely on it. I like your idea of adding a custom tag with our own hash. Maybe we can move to a better hashing algorithm like HMAC-SHA1 at the same time.

If we do this, it will require the GetObjectTagging and PutObjectTagging permissions. #39 will need to include these.

@FraserThompson
Contributor Author

I forgot to update this, but disabling multipart upload works as a workaround. S3.ManagedUpload accepts a partSize parameter, which dictates the size of each part in a multipart upload (as documented here). By default it's 5 MB, which is also the minimum S3 allows. I'm setting it to a size larger than any file in my site, so multipart upload is never used and the ETag is always the MD5.

I'm not sure if this is a good production solution (presumably there's some advantage to multipart uploading?) but it works.
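The workaround could be sketched like this, assuming a hypothetical choosePartSize helper (not part of the plugin) that picks a partSize larger than every file in the build:

```javascript
// S3 enforces a 5 MB minimum part size, which is also the
// aws-sdk v2 default for S3.ManagedUpload.
const MIN_PART_SIZE = 5 * 1024 * 1024;

// Hypothetical helper: return a partSize larger than the largest file,
// so ManagedUpload never splits an upload into parts and the ETag
// stays the plain MD5 of the object.
function choosePartSize(fileSizes) {
  const largest = Math.max(0, ...fileSizes);
  return Math.max(MIN_PART_SIZE, largest + 1);
}

// Usage sketch (aws-sdk v2), not executed here:
// new AWS.S3.ManagedUpload({
//   partSize: choosePartSize(sizes),
//   params: { Bucket, Key, Body },
// });
```

The trade-off is the one raised above: multipart upload exists for resiliency and parallelism on large files, both of which this forgoes.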
