Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DM-42522: Implement exists() and size() for S3 presigned URLs #77

Merged
merged 3 commits into from Jan 17, 2024

Conversation

dhirving
Copy link
Contributor

Checklist

  • ran Jenkins
  • added a release note for user-visible changes to doc/changes

Presigned S3 URLs are signed for a single method only, so you can't call HEAD on a URL signed for GET.  Instead, we now emulate HEAD using a GET with a 0-byte Range header.
Like exists() in the previous commit, size() was relying on a HEAD call that isn't valid for presigned S3 URLs.
Copy link

codecov bot commented Jan 17, 2024

Codecov Report

Attention: 7 lines in your changes are missing coverage. Please review.

Comparison is base (a0c84ae) 86.20% compared to head (06c7646) 86.45%.

Files Patch % Lines
python/lsst/resources/http.py 69.56% 3 Missing and 4 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main      #77      +/-   ##
==========================================
+ Coverage   86.20%   86.45%   +0.24%     
==========================================
  Files          27       27              
  Lines        4126     4193      +67     
  Branches      842      848       +6     
==========================================
+ Hits         3557     3625      +68     
+ Misses        449      447       -2     
- Partials      120      121       +1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@dhirving
Copy link
Contributor Author

There are a few lines with missing coverage which were already missing coverage in the existing code. I manually verified that the uncovered lines still work.

It turns out that HttpResourceHandle was also doing some parsing of the content-range header, so pull out a function that will work for both this and HttpResourcePath.size().
if total != "*" and end_pos >= int(total) - 1:
self._eof = True
else:
self._log.warning("Requested byte range from server but instead got: %s", content_range)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a slight behavior change here. Previously if the units were anything but bytes it would only log a warning. Now it throws an exception. This should never happen in real life, and if it does the returned data is almost certainly wrong so we want to bail anyway.

Copy link
Member

@timj timj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great.

prefix = r"^\s*bytes\s+"

# Content-Range: <unit> <range-start>-<range-end>/<size>
if (case1 := re.match(prefix + r"(\d+)-(\d+)/(\d+)", header)) is not None:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hopefully three separate regexes won't turn into a hotspot.

@dhirving dhirving merged commit e6e6904 into main Jan 17, 2024
17 checks passed
@dhirving dhirving deleted the tickets/DM-42522 branch January 17, 2024 22:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants