DM-42568: Handle zero-byte files in S3 presigned URLs #78

Merged
merged 1 commit into from Jan 19, 2024

dhirving
Contributor

@dhirving dhirving commented Jan 19, 2024

Fix an issue with presigned S3 HTTP URLs that point to zero-byte files: HttpResourcePath.exists() would always return False, and HttpResourcePath.size() would raise an exception.

We emulate HEAD using a 1-byte ranged GET request when checking S3 presigned HTTP URLs. For a zero-byte file the server returns status 416 instead of 206, because the requested range is longer than the file.
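
For illustration, here is a minimal sketch of how the two status codes could be mapped to a size. It is not the code in this PR: `PROBE_HEADERS`, `size_from_probe`, the exact `Range` value, and the plain-dict header lookup are all assumptions made for the example.

```python
from __future__ import annotations

import re

# Hypothetical helper for illustration only; not the lsst.resources code.
# Matches a satisfied byte range such as "bytes 0-0/1234".
_CONTENT_RANGE = re.compile(r"bytes\s+(\d+)-(\d+)/(\d+|\*)")

# Assumed form of the 1-byte probe: ask for the first byte only.
PROBE_HEADERS = {"Range": "bytes=0-0"}


def size_from_probe(status: int, headers: dict[str, str]) -> int | None:
    """Return the size implied by a 1-byte ranged GET, or None if not found.

    `headers` is a plain dict with exact-case keys, which is a
    simplification; real HTTP header lookups are case-insensitive.
    """
    if status == 206:
        # Partial Content: the total size follows the slash in Content-Range.
        match = _CONTENT_RANGE.fullmatch(headers.get("Content-Range", "").strip())
        if match is None or match.group(3) == "*":
            raise ValueError("206 response without a usable Content-Range")
        return int(match.group(3))
    if status == 416:
        # Range Not Satisfiable: even a 1-byte range is too long, so the
        # file exists and is assumed to be empty.
        return 0
    if status == 404:
        return None
    raise ValueError(f"unexpected status {status}")
```

With this sketch, `size_from_probe(416, {})` yields 0 rather than an error, which is the behavior the fix is after for zero-byte files.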

Checklist

  • ran Jenkins
  • added a release note for user-visible changes to doc/changes

codecov bot commented Jan 19, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (e6e6904) 86.45% compared to head (fd452ff) 86.52%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main      #78      +/-   ##
==========================================
+ Coverage   86.45%   86.52%   +0.07%     
==========================================
  Files          27       27              
  Lines        4193     4201       +8     
  Branches      848      850       +2     
==========================================
+ Hits         3625     3635      +10     
+ Misses        447      445       -2     
  Partials      121      121              


# byte file since we asked for 1 byte Range which is longer
# than the file.
#
# Servers are supposed to include a Content-Range header in
Member

I will say in passing that we have in the past managed to get Google to fix their S3 server implementation.

What happens if you have a GCS signed URL? Same problem or does it really matter whether the URI was s3 or gs originally?

Contributor Author

Not sure -- I don't have a convenient way to generate signed GCS URLs.

The 416 thing isn't Google-specific and is standards-compliant behavior, btw, so this change isn't only necessary for Google compatibility. I just failed to consider what would happen with 0-byte files when I did the initial implementation.

The part that is non-compliant is that it doesn't also return a Content-Range including the size (specified as a SHOULD in the HTTP RFC). Unless something else really weird and non-compliant is happening, that size will always be 0, so I don't think it's especially important.
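
To make the Content-Range point concrete, here is a rough sketch of reading the size from a 416 response when the header is present and falling back to 0 when it is not. The function name `size_from_416` is hypothetical and this is not the code in the PR; the `bytes */<length>` form is the unsatisfied-range syntax from the HTTP RFC.

```python
from __future__ import annotations

import re

# Unsatisfied-range form of Content-Range, e.g. "bytes */0".
_UNSATISFIED_RANGE = re.compile(r"bytes\s+\*/(\d+)")


def size_from_416(content_range: str | None) -> int:
    """Return the size reported by a 416 response (hypothetical helper).

    If the server omitted Content-Range, or sent something unparseable,
    assume the file is empty, since a 1-byte range can only be
    unsatisfiable for a zero-length file.
    """
    if content_range is not None:
        match = _UNSATISFIED_RANGE.fullmatch(content_range.strip())
        if match is not None:
            return int(match.group(1))
    return 0
```

Either way the answer for a 1-byte probe comes out as 0, which is why the missing header doesn't matter much in practice.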

(If the server doesn't want to or can't deal with the Range at all it's supposed to return 200 with the full content instead, which will result in correct behavior since we go down the branch that looks at Content-Length. Slightly rude to the server but probably mostly theoretical -- implementations of S3 will almost certainly support Range requests.)
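
A sketch of that 200 fallback, assuming the headers arrive as a plain mapping; `size_from_full_response` is a hypothetical name and not the actual branch in the library:

```python
from __future__ import annotations

from collections.abc import Mapping


def size_from_full_response(status: int, headers: Mapping[str, str]) -> int:
    """Return the size from a 200 response that ignored the Range header.

    Hypothetical helper for illustration. Header names are compared
    case-insensitively because HTTP field names are not case-sensitive.
    """
    if status != 200:
        raise ValueError(f"expected status 200, got {status}")
    lowered = {name.lower(): value for name, value in headers.items()}
    if "content-length" not in lowered:
        raise ValueError("200 response without Content-Length")
    return int(lowered["content-length"])
```

A zero-byte file served this way reports `Content-Length: 0`, so the size comes out right without any special case.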

It looks like the HttpResourceHandle also assumes it's at EOF and bails early without trying to look at Content-Range if it gets a 416, so I don't think it's a problem.

@dhirving dhirving merged commit 4a90131 into main Jan 19, 2024
17 checks passed
@dhirving dhirving deleted the tickets/DM-42568 branch January 19, 2024 20:58