New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reading corruption in Pure Python Unpacker implementation #124
Comments
Note that this affects both 0.4.4 and 0.4.5, I have not tried older versions. |
Which one of the pure and python unpacker gives the correct answer, if one of them does give the correct answer? |
@rbu Thank you for reporting. My email address is |
@methane If it turns out to be a problem with the pure python implementation, than I'm eager to help debug the problem. |
I have emailed them to the two of you. I'm neither an expert on the format nor the Unpacker API's intended behavior, but it looks like the the Python implementation behaves incorrectly. |
When there is not enough data to decode the next object in a sequence of objects, msgpack will raise def _fb_rollback(self):
self._fb_buf_i = 0
self._fb_buf_o = 0 This was ok because after each successful object unpacked, the used space in the buffer is thrown away. This is done by I think the following patch should fix the problem. I will add a testcase for this problem and double check whether my analysis is correct. diff --git a/msgpack/fallback.py b/msgpack/fallback.py
index d1f39d1..0e025c8 100644
--- a/msgpack/fallback.py
+++ b/msgpack/fallback.py
@@ -283,7 +283,7 @@ class Unpacker(object):
def _fb_rollback(self):
self._fb_buf_i = 0
- self._fb_buf_o = 0
+ self._fb_buf_o = self._fb_sloppiness
def _fb_get_extradata(self):
bufs = self._fb_buffers[self._fb_buf_i:] |
@bwesterb, I can confirm that with this patch applied, my reproducers return the expected state in both Pure-Python and C++. It also fixes the Attic crash that caused me to look into this in the first place. Thank you for looking into this and finding a fix so fast. |
When using Unpacker as an iterator, after each yield, the internal buffer (_fb_buffer) was compacted by reallocation (done by _fb_consume). When dealing with a lot of small objects, this is very ineffecient. Thus in commit 7eb371f the pure python fallback only reallocated the complete buffer when the iteration stops. When halfway there happens to be data missing in the buffer, we rollback the buffer to the state before this failed call, and raise an OutOfData. This rollback, done by _fb_rollback, did not consider the possibility that the buffer was *not* reallocated. This commit corrects that.
Rollback to correct position in the case of OutOfData. Fixes #124
6 weeks have gone by and still 0.4.5 on pypi. People using https://attic-backup.org/ heavily use msgpack to create backups... |
To add new problems to what I already said in last comment: Working around this corruption issue by just using msgpack-python 0.4.2 (until a fixed version is released) causes other problems if you want to write safe code (and have to deal with untrusted input data) as the fix for issue #97 is only available in more recent (and thus broken) releases. :( |
After spending several hours debugging an archive corruption in Attic (jborg/attic#185), I came to find:
The Pure Python and C-Optimized implementation of msgpack.Unpacker behave differently when you feed data several times. I have a small (100k) and a large (1.7M) reproducer. Unfortunately, both may contain private data. I can send them to a developer (collaborator) if you keep the files on your machine and do not share it publicly.
To better illustrate what we are dealing with. The small reproducer looks like this:
In Pure-Python, this yields:
In C, this yields:
The large reproducer has this structure:
In Pure-Python, this yields:
In C, this yields:
Note that in the large reproducer, not only does it repeat, in the last iteration it starts returning corrupt values.
The text was updated successfully, but these errors were encountered: