fix object mgr _fetch_chunker() to use chunk_size appropriately #449

aztlan2k · 2014-08-19T12:45:45Z

for your consideration...

The GET request was using chunk_size incorrectly, which corrupted
the data of fetched objects:
- each request ended up retrieving chunk_size+1 bytes (0 -> chunk_size, inclusive)
- the next request would start at chunk_size, thus overlapping
  the previous chunk by 1 byte (0-4096, 4096-8192, etc.)
  [the last byte (4096) of the previous read was re-read as the first byte
  of the next fetch]
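
Because both ends of an HTTP `Range: bytes=start-end` header are inclusive, a chunk of `chunk_size` bytes has to end at `start + chunk_size - 1`. The snippet below is a minimal sketch of that arithmetic, not the actual patch in this pull request; `byte_ranges` and `range_header` are hypothetical helper names and are not part of pyrax.

```python
def byte_ranges(obj_size, chunk_size):
    """Yield (start, end) pairs covering obj_size bytes without overlap.

    Both start and end are inclusive, matching HTTP Range semantics.
    Hypothetical helper for illustration; not pyrax code.
    """
    start = 0
    while start < obj_size:
        # The last byte of this chunk: one less than the next chunk's start,
        # clamped so the final chunk does not run past the object.
        end = min(start + chunk_size, obj_size) - 1
        yield start, end
        start = end + 1


def range_header(start, end):
    """Build the request header for one inclusive byte range."""
    return {"Range": "bytes=%d-%d" % (start, end)}


if __name__ == "__main__":
    # Same sizes as in the trace: a 102400-byte object, 4096-byte chunks.
    for start, end in list(byte_ranges(102400, 4096))[:3]:
        print(range_header(start, end))
```

With these sizes the first requests become `bytes=0-4095`, `bytes=4096-8191`, and so on, so each response carries exactly 4096 bytes and no byte is fetched twice.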

Here's my trace of a fetch_object(container, obj, chunk_size=4096) call (within _fetch_chunker()):

pyrax: chunk_size = 4096
pyrax: obj_size = 102400
pyrax: headers = {'Range': 'bytes=0-4096'}
CF: len(chunk) = 4097
CF: chunk[0] = 05 ; chunk[-1] = 68
pyrax: headers = {'Range': 'bytes=4096-8192'}
CF: len(chunk) = 4097
CF: chunk[0] = 68 ; chunk[-1] = 76
pyrax: headers = {'Range': 'bytes=8192-12288'}
...

Notice that even though we asked for chunks of 4096 bytes, we're actually retrieving 4097 bytes, and each subsequent read starts at the last byte of the previous read. This corrupts the data by duplicating bytes. Below is a hexdump of the two files I used to compare. On the left is the original. On the right is the copy that was uploaded and then downloaded again using fetch_object():

  0000fe0 1d 8d 59 6a 3e 39 b0 7f d0 c9 b2 72 3c 23 51 70               |0000fe0 1d 8d 59 6a 3e 39 b0 7f d0 c9 b2 72 3c 23 51 70
  0000ff0 25 08 fc c9 33 4f 64 fb 0b cb 2b 00 ec 52 58 fa               |0000ff0 25 08 fc c9 33 4f 64 fb 0b cb 2b 00 ec 52 58 fa
> 0001000 68 a5 7c 45 a7 a9 a2 ed 94 b5 25 78 da f8 f4 e2               |0001000 68 68 a5 7c 45 a7 a9 a2 ed 94 b5 25 78 da f8 f4
  0001010 85 fd d8 b1 40 f4 34 04 8d 23 f6 fa 57 8e 7b cb               |0001010 e2 85 fd d8 b1 40 f4 34 04 8d 23 f6 fa 57 8e 7b
  0001020 ed 2b db ac 25 4d 54 53 0b 1d 6f 6b 24 ee 1d cf               |0001020 cb ed 2b db ac 25 4d 54 53 0b 1d 6f 6b 24 ee 1d

Note that on byte 0x1000 (byte 4096) we see the byte value 68 repeated and then everything shifts over by a single byte.
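
The same shift can be reproduced without a network connection. The following is a rough simulation under stated assumptions: `fetch_range` stands in for the server's inclusive Range handling, `fetch_overlapping` mimics the request pattern seen in the trace, and the input is synthetic data rather than the file from the hexdump. All three function names are made up for this illustration.

```python
def fetch_range(data, start, end):
    """Mimic an HTTP Range request: both start and end are inclusive."""
    return data[start:end + 1]


def fetch_overlapping(data, chunk_size):
    """Reassemble data using the buggy pattern bytes=0-4096, 4096-8192, ..."""
    out = bytearray()
    pos = 0
    while pos < len(data):
        # The end offset is one byte too far, so the chunk has chunk_size+1 bytes.
        out += fetch_range(data, pos, pos + chunk_size)
        # The next request starts at the byte that was just read last.
        pos += chunk_size
    return bytes(out)


def first_difference(a, b):
    """Return the first offset at which a and b differ, or None."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    return None


if __name__ == "__main__":
    original = bytes(range(256)) * 400  # 102400 bytes of synthetic data
    copy = fetch_overlapping(original, 4096)
    print(len(original), len(copy))
    print(hex(first_difference(original, copy)))
```

In this simulation the copy grows by one byte per chunk boundary, and the byte at offset 0x1000 appears twice, so the first mismatch against the original shows up at offset 0x1001, the same pattern as in the hexdump.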

coveralls · 2014-08-19T12:48:47Z

Coverage remained the same when pulling 0bf284c on ustudio:fix-fetch-chunk-size-use into ec839d1 on rackspace:working.

EdLeafe · 2014-08-19T13:09:25Z

Nice catch - thanks!

aztlan2k · 2014-08-19T13:12:38Z

Thanx Ed.

aztlan2k · 2014-08-20T19:19:01Z

@EdLeafe Hi Ed,

Considering this was a data corruption bug... can you give me any idea when you think this will actually make it out in an official release?

We're (uStudio) trying to determine how to move forward with regards to this bug and knowing that could potentially influence what stop-gap measure we decide to take until it's out in the open.

Thanx.

EdLeafe · 2014-08-20T21:07:10Z

Well, I'm still managing pyrax releases for the next couple of days, so I'll try to get this out tonight.

aztlan2k · 2014-08-20T21:27:33Z

Thanx Ed. That's good to hear.

Are you no longer going to be managing the pyrax releases? You've been there since the beginning, haven't you? ... well, hope it's on to exciting new areas. 👍

EdLeafe · 2014-08-20T21:39:46Z

Yeah, this is my last week at Rackspace – I'm moving to IBM to hack on OpenStack. @briancurtin will be managing pyrax going forward.

EdLeafe · 2014-08-20T23:04:11Z

OK, I just released 1.9.2, which contains your fix. Thanks again for your contribution!

aztlan2k · 2014-08-21T05:01:06Z

Thanx again Ed. Not only for this but for all your previous contributions as well.

And best of luck on your next endeavor!

EdLeafe added a commit that referenced this pull request Aug 19, 2014

Merge pull request #449 from ustudio/fix-fetch-chunk-size-use

898e105

fix object mgr _fetch_chunker() to use chunk_size appropriately

EdLeafe merged commit 898e105 into pycontribs:working Aug 19, 2014

aztlan2k deleted the fix-fetch-chunk-size-use branch August 22, 2014 09:24
