
Empty parameters file in S3 raises an InvalidRange exception #603

Closed
ypsah opened this issue May 26, 2021 · 2 comments · Fixed by #614
Labels
bug needs:testing Needs testing to reproduce

Comments


ypsah commented May 26, 2021

Hi,

When passing S3 URIs to papermill, the S3 handler always requests the byte range bytes=0-. If the targeted object is empty, though, the S3 client raises an exception.

At first I thought it might be a bug in the S3 client, but it turns out it simply implements RFC 2616 §14.35.1 to the letter:

If a syntactically valid byte-range-set includes at least one byte-range-spec whose first-byte-pos is less than the current length of the entity-body, or at least one suffix-byte-range-spec with a non-zero suffix-length, then the byte-range-set is satisfiable. Otherwise, the byte-range-set is unsatisfiable.

For an empty file, 0 (first-byte-pos) is equal to the length of the object, so the range is "unsatisfiable", and botocore correctly handles it:

If the byte-range-set is unsatisfiable, the server SHOULD return a response with a status of 416 (Requested range not satisfiable).
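To spell out the rule the quote describes, here is a minimal sketch (illustrative only, not part of papermill or botocore) of the satisfiability check for a `bytes=N-` range:

```python
def range_satisfiable(first_byte_pos: int, object_length: int) -> bool:
    """RFC 2616 §14.35.1: a byte-range-spec "bytes=<first>-" is satisfiable
    only if first-byte-pos is less than the current entity-body length."""
    return first_byte_pos < object_length


# "bytes=0-" against a non-empty object is fine: 0 < length.
# "bytes=0-" against an empty object: 0 < 0 is False, so the range is
# unsatisfiable and the server answers 416 (surfaced as InvalidRange).
```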

A possible fix would be to handle empty objects in a special way.

The affected code:

papermill/papermill/s3.py

Lines 292 to 307 in a2c6a0b

size = 0
bytes_read = 0
err = None
undecoded = ''
if key:
    # try to read the file multiple times
    for i in range(100):
        obj = self.s3.Object(key.bucket.name, key.name)
        buffersize = buffersize if buffersize is not None else 2 ** 20
        if not size:
            size = obj.content_length
        elif size != obj.content_length:
            raise AwsError('key size unexpectedly changed while reading')
        r = obj.get(Range="bytes={}-".format(bytes_read))

My guess is that this patch should be enough to make the exception go away while keeping everything else working:

diff --git a/papermill/s3.py b/papermill/s3.py
index 7ce3f20..6311e7e 100644
--- a/papermill/s3.py
+++ b/papermill/s3.py
@@ -304,6 +304,9 @@ class S3(object):
                 elif size != obj.content_length:
                     raise AwsError('key size unexpectedly changed while reading')

+                if size == 0:
+                    return
+
                 r = obj.get(Range="bytes={}-".format(bytes_read))

                 try:

Cheers!

@willingc willingc added bug needs:testing Needs testing to reproduce labels Jun 6, 2021
willingc (Member) commented Jun 6, 2021

Thanks @ypsah

@MSeal Thoughts on the suggested fix?

MSeal (Member) commented Jun 6, 2021

Hmm, it's been a long time since I read that code -- it's a direct port of some heavily used Netflix-internal S3 code. I think a break statement there, rather than a return, makes sense though. Overall, if you're trying to read an empty file it should/will error later on, since it won't be a valid ipynb file, but I believe it will give a better message in that case. I'll make a quick patch for this.
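A self-contained sketch of the ranged-read loop with the break-on-empty guard suggested above (FakeObject is a hypothetical in-memory stand-in for a boto3 `s3.Object`; names and shapes here are illustrative, not papermill's actual API):

```python
class InvalidRange(Exception):
    """Mimics the client error S3 raises on a 416 response."""


class FakeObject:
    """In-memory stand-in for a boto3 s3.Object (assumption, for the demo)."""

    def __init__(self, body: bytes):
        self.body = body
        self.content_length = len(body)

    def get(self, Range: str) -> dict:
        # Parse "bytes=N-" and mimic S3's 416 on an unsatisfiable range.
        start = int(Range.split("=")[1].rstrip("-"))
        if start >= self.content_length:
            raise InvalidRange("416 Requested Range Not Satisfiable")
        return {"Body": self.body[start:]}


def read_all(obj) -> bytes:
    """Read an object in ranged chunks, breaking early when it is empty."""
    chunks = []
    bytes_read = 0
    size = obj.content_length
    while True:
        if size == 0:
            break  # empty object: "bytes=0-" would raise InvalidRange
        r = obj.get(Range="bytes={}-".format(bytes_read))
        data = r["Body"]
        chunks.append(data)
        bytes_read += len(data)
        if bytes_read >= size:
            break
    return b"".join(chunks)
```

With the guard, an empty object simply yields zero bytes instead of an exception, and later notebook parsing can report the real problem (not a valid ipynb file).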
