
Empty parameters file in S3 raises an InvalidRange exception #603

Closed
ypsah opened this issue May 26, 2021 · 2 comments · Fixed by #614
Labels
bug needs:testing Needs testing to reproduce

Comments


ypsah commented May 26, 2021

Hi,

When passing S3 URIs to papermill, the S3 handler always requests the byte range bytes=0-. If the targeted object is empty, though, the S3 client raises an exception.

At first I thought it might be a bug in the S3 client, but it turns out it simply implements RFC 2616 §14.35.1 to the letter:

If a syntactically valid byte-range-set includes at least one byte-range-spec whose first-byte-pos is less than the current length of the entity-body, or at least one suffix-byte-range-spec with a non-zero suffix-length, then the byte-range-set is satisfiable. Otherwise, the byte-range-set is unsatisfiable.

For an empty file, 0 (first-byte-pos) is equal to the length of the object, so the range is "unsatisfiable", and botocore correctly handles it:

If the byte-range-set is unsatisfiable, the server SHOULD return a response with a status of 416 (Requested range not satisfiable).
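To spell out the rule the quote describes, here is a minimal sketch (illustrative only, not part of papermill or botocore) of the satisfiability check for a `bytes=N-` range:

```python
def range_satisfiable(first_byte_pos: int, object_length: int) -> bool:
    """RFC 2616 §14.35.1: a byte-range-spec "bytes=<first>-" is satisfiable
    only if first-byte-pos is less than the current entity-body length."""
    return first_byte_pos < object_length


# "bytes=0-" against a non-empty object is fine: 0 < length.
# "bytes=0-" against an empty object: 0 < 0 is False, so the range is
# unsatisfiable and the server answers 416 (surfaced as InvalidRange).
```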

A possible fix would be to handle empty objects in a special way.

The affected code:

papermill/papermill/s3.py

Lines 292 to 307 in a2c6a0b

size = 0
bytes_read = 0
err = None
undecoded = ''
if key:
    # try to read the file multiple times
    for i in range(100):
        obj = self.s3.Object(key.bucket.name, key.name)
        buffersize = buffersize if buffersize is not None else 2 ** 20
        if not size:
            size = obj.content_length
        elif size != obj.content_length:
            raise AwsError('key size unexpectedly changed while reading')
        r = obj.get(Range="bytes={}-".format(bytes_read))

My guess is that this patch should be enough to make the exception go away while keeping everything else working:

diff --git a/papermill/s3.py b/papermill/s3.py
index 7ce3f20..6311e7e 100644
--- a/papermill/s3.py
+++ b/papermill/s3.py
@@ -304,6 +304,9 @@ class S3(object):
                 elif size != obj.content_length:
                     raise AwsError('key size unexpectedly changed while reading')

+                if size == 0:
+                    return
+
                 r = obj.get(Range="bytes={}-".format(bytes_read))

                 try:

Cheers!

@willingc willingc added bug needs:testing Needs testing to reproduce labels Jun 6, 2021
willingc (Member) commented Jun 6, 2021

Thanks @ypsah

@MSeal Thoughts on the suggested fix?

MSeal (Member) commented Jun 6, 2021

Hmm, it's been a long time since I read that code -- it's a direct port of some heavily used Netflix-internal S3 code. I think a break statement there, rather than a return, makes sense though. Overall, if you're trying to read an empty file it should/will error later on, since it won't be a valid ipynb file, but I believe it will give a better message in that case. I'll make a quick patch for this.
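A self-contained sketch of the ranged-read loop with the break-on-empty guard suggested above (FakeObject is a hypothetical in-memory stand-in for a boto3 `s3.Object`; names and shapes here are illustrative, not papermill's actual API):

```python
class InvalidRange(Exception):
    """Mimics the client error S3 raises on a 416 response."""


class FakeObject:
    """In-memory stand-in for a boto3 s3.Object (assumption, for the demo)."""

    def __init__(self, body: bytes):
        self.body = body
        self.content_length = len(body)

    def get(self, Range: str) -> dict:
        # Parse "bytes=N-" and mimic S3's 416 on an unsatisfiable range.
        start = int(Range.split("=")[1].rstrip("-"))
        if start >= self.content_length:
            raise InvalidRange("416 Requested Range Not Satisfiable")
        return {"Body": self.body[start:]}


def read_all(obj) -> bytes:
    """Read an object in ranged chunks, breaking early when it is empty."""
    chunks = []
    bytes_read = 0
    size = obj.content_length
    while True:
        if size == 0:
            break  # empty object: "bytes=0-" would raise InvalidRange
        r = obj.get(Range="bytes={}-".format(bytes_read))
        data = r["Body"]
        chunks.append(data)
        bytes_read += len(data)
        if bytes_read >= size:
            break
    return b"".join(chunks)
```

With the guard, an empty object simply yields zero bytes instead of an exception, and later notebook parsing can report the real problem (not a valid ipynb file).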
