
@aztlan2k
Contributor

for your consideration...

  • the GET request was using chunk_size incorrectly and was therefore
    causing data corruption in fetched objects.
    • each request ended up retrieving chunk_size+1 bytes, because the end
      offset of an HTTP Range header is inclusive (bytes=0-4096 is 4097 bytes)
    • the next request would start at chunk_size, thus overlapping
      the previous chunk by 1 byte (0-4096, 4096-8192, etc.)
      [the last byte (4096) of the previous read was re-read as the first
      byte of the next fetch]
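A minimal sketch of the range arithmetic, assuming a hypothetical helper `range_headers(obj_size, chunk_size)` that yields one Range header per chunk. This is not pyrax's actual `_fetch_chunker()` code; it only illustrates the off-by-one. Because the end offset is inclusive, a chunk starting at `start` must end at `start + chunk_size - 1`, and the last chunk is clamped to `obj_size - 1`. The hypothetical `buggy_range_headers()` reproduces the pattern visible in the trace below.

```python
def range_headers(obj_size, chunk_size):
    """Yield one Range header per chunk (hypothetical helper, not pyrax's API).

    HTTP byte ranges are inclusive on both ends, so a chunk of
    chunk_size bytes starting at `start` ends at start + chunk_size - 1.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    start = 0
    while start < obj_size:
        # Clamp the final chunk so we never ask for bytes past the end.
        end = min(start + chunk_size, obj_size) - 1
        yield {"Range": "bytes=%d-%d" % (start, end)}
        # The next chunk begins right after this one's last byte: no overlap.
        start = end + 1


def buggy_range_headers(obj_size, chunk_size):
    """Pre-fix pattern from the trace: end = start + chunk_size (one byte too many)."""
    start = 0
    while start < obj_size:
        yield {"Range": "bytes=%d-%d" % (start, start + chunk_size)}
        start += chunk_size


fixed = list(range_headers(102400, 4096))
print(fixed[:2])   # [{'Range': 'bytes=0-4095'}, {'Range': 'bytes=4096-8191'}]
print(fixed[-1])   # {'Range': 'bytes=98304-102399'}
print(list(buggy_range_headers(102400, 4096))[:2])
# [{'Range': 'bytes=0-4096'}, {'Range': 'bytes=4096-8192'}]
```

With the inclusive end accounted for, consecutive ranges tile the object exactly, so no byte is requested twice.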

Here's my trace of a fetch_object(container, obj, chunk_size=4096) call (within _fetch_chunker()):

pyrax: chunk_size = 4096
pyrax: obj_size = 102400
pyrax: headers = {'Range': 'bytes=0-4096'}
CF: len(chunk) = 4097
CF: chunk[0] = 05 ; chunk[-1] = 68
pyrax: headers = {'Range': 'bytes=4096-8192'}
CF: len(chunk) = 4097
CF: chunk[0] = 68 ; chunk[-1] = 76
pyrax: headers = {'Range': 'bytes=8192-12288'}
...

Notice that even though we asked for chunks of 4096 bytes, we're actually retrieving 4097. And each subsequent read starts at the last byte of the previous read, so bytes get duplicated and the data is corrupted. Below is a hexdump of the two files I used to compare: on the left is the original, and on the right is the copy that was uploaded and then downloaded again using fetch_object():

  0000fe0 1d 8d 59 6a 3e 39 b0 7f d0 c9 b2 72 3c 23 51 70               |0000fe0 1d 8d 59 6a 3e 39 b0 7f d0 c9 b2 72 3c 23 51 70
  0000ff0 25 08 fc c9 33 4f 64 fb 0b cb 2b 00 ec 52 58 fa               |0000ff0 25 08 fc c9 33 4f 64 fb 0b cb 2b 00 ec 52 58 fa
> 0001000 68 a5 7c 45 a7 a9 a2 ed 94 b5 25 78 da f8 f4 e2               |0001000 68 68 a5 7c 45 a7 a9 a2 ed 94 b5 25 78 da f8 f4
  0001010 85 fd d8 b1 40 f4 34 04 8d 23 f6 fa 57 8e 7b cb               |0001010 e2 85 fd d8 b1 40 f4 34 04 8d 23 f6 fa 57 8e 7b
  0001020 ed 2b db ac 25 4d 54 53 0b 1d 6f 6b 24 ee 1d cf               |0001020 cb ed 2b db ac 25 4d 54 53 0b 1d 6f 6b 24 ee 1d

Note that at offset 0x1000 (byte 4096) the byte value 68 is repeated, and everything after it shifts over by a single byte.
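The one-byte shift in the hexdump can be reproduced offline. This sketch assumes a hypothetical `fetch_range(data, start, end)` standing in for a ranged GET against an in-memory object (no network or pyrax calls), reassembles the object with the buggy and the fixed range arithmetic, and compares each result with the original.

```python
import os


def fetch_range(data, start, end):
    """Hypothetical stand-in for a ranged GET: returns bytes start..end inclusive,
    truncated at the end of the object as an HTTP server would."""
    return data[start:end + 1]


def reassemble(data, chunk_size, buggy):
    """Download `data` chunk by chunk and join the pieces."""
    pieces = []
    start = 0
    while start < len(data):
        if buggy:
            # Pre-fix pattern: end = start + chunk_size, then advance by chunk_size,
            # so every chunk after the first re-reads the previous chunk's last byte.
            pieces.append(fetch_range(data, start, start + chunk_size))
            start += chunk_size
        else:
            # Fixed pattern: inclusive end, next chunk starts right after it.
            end = min(start + chunk_size, len(data)) - 1
            pieces.append(fetch_range(data, start, end))
            start = end + 1
    return b"".join(pieces)


original = os.urandom(102400)
bad = reassemble(original, 4096, buggy=True)
good = reassemble(original, 4096, buggy=False)

print(good == original)                          # True
print(len(bad) - len(original))                  # 24: one extra byte per chunk boundary
print(bad[4096] == bad[4097] == original[4096])  # True: byte 0x1000 appears twice
```

The buggy copy is 24 bytes longer than the 102400-byte original (25 chunks, 24 boundaries), and everything from offset 0x1001 onward is shifted, matching the right-hand side of the hexdump.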

@coveralls

Coverage remained the same when pulling 0bf284c on ustudio:fix-fetch-chunk-size-use into ec839d1 on rackspace:working.

EdLeafe added a commit that referenced this pull request Aug 19, 2014
fix object mgr _fetch_chunker() to use chunk_size appropriately
@EdLeafe EdLeafe merged commit 898e105 into pycontribs:working Aug 19, 2014
@EdLeafe
Contributor

EdLeafe commented Aug 19, 2014

Nice catch - thanks!

@aztlan2k
Contributor Author

Thanx Ed.

@aztlan2k
Contributor Author

@EdLeafe Hi Ed,

Considering this was a data corruption bug... can you give me any idea when you think this will actually make it out in an official release?

We're (uStudio) trying to determine how to move forward with regard to this bug, and knowing that could influence which stop-gap measure we take until the fix is released.

Thanx.

@EdLeafe
Contributor

EdLeafe commented Aug 20, 2014

Well, I'm still managing pyrax releases for the next couple of days, so I'll try to get this out tonight.

@aztlan2k
Contributor Author

Thanx Ed. That's good to hear.

Are you no longer going to be managing the pyrax releases? You've been there since the beginning, haven't you? ... well, hope it's on to exciting new areas. 👍

@EdLeafe
Contributor

EdLeafe commented Aug 20, 2014

Yeah, this is my last week at Rackspace – I'm moving to IBM to hack on OpenStack. @briancurtin will be managing pyrax going forward.

@EdLeafe
Contributor

EdLeafe commented Aug 20, 2014

OK, I just released 1.9.2, which contains your fix. Thanks again for your contribution!

@aztlan2k
Contributor Author

Thanx again Ed. Not only for this but for all your previous contributions as well.

And best of luck on your next endeavor!

@aztlan2k aztlan2k deleted the fix-fetch-chunk-size-use branch August 22, 2014 09:24