Disallow invalid pointers in arrays and tuples #226
Conversation
@@ -131,6 +132,13 @@ def __call__(self, stream: ContextFramesBytesIO) -> Any:


class HeadTailDecoder(BaseDecoder):
    """
💯
@to_tuple  # type: ignore[misc]  # untyped decorator
def decode(self, stream: ContextFramesBytesIO) -> Generator[Any, None, None]:
    self.validate_pointers(stream)
I could use more context here. Could this be called in the loop below, and maybe allow removal of the inner decoder loops inside validate_pointers? I'm also curious whether the validation is necessary before decoding. Could validation just be part of the decode in HeadTailDecoder?
There is no way to know how long the head section of a dynamic tuple will be until you have stepped through each decoder: if the decoder is for a dynamic type, it will be 32 bytes every time (because it's a pointer), but if it's for a non-dynamic array, there will be a single decoder for multiple chunks of 32 bytes.
I think it would be possible to take the logic from validate_pointers and put it in decode, to eliminate the second loop through the decoders (where it actually checks the pointer values against the end_of_offsets). I like the current clarity and separation of concerns, but I can try it if you like.
The validation needs to be in the tuple and array decoders, because only they have the context for how long they are. A HeadTailDecoder only has the info for a single dynamic value.
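The head-section point above can be sketched as a toy calculation (a hypothetical helper, not the actual eth-abi code): each dynamic component contributes a single 32-byte pointer slot, while a static fixed-size array is laid out inline, one 32-byte word per element, so the head length is only known after stepping through every component decoder.

```python
# Hypothetical sketch, not the eth-abi implementation: compute the
# head-section length of a tuple by stepping through its components.
def head_section_length(components):
    """components: list of (is_dynamic, element_count) pairs."""
    total = 0
    for is_dynamic, element_count in components:
        if is_dynamic:
            total += 32  # a single 32-byte pointer slot, regardless of value size
        else:
            total += 32 * element_count  # static data is laid out inline
    return total

# e.g. a (bool, bytes, uint256[3]) tuple -> 32 + 32 + 96 = 160 bytes of head
print(head_section_length([(False, 1), (True, 1), (False, 3)]))  # 160
```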
I see now what the difference means. Assuming there may never be more than a few decoders at a time, I don't have any concerns.
end_of_offsets = current_location + 32 * len_of_head
total_stream_length = len(stream.getbuffer())
for decoder in self.decoders:
    if isinstance(decoder, HeadTailDecoder):
Nit: It would be nice to share this logic across decoders. Maybe this could become a utility function that takes the stream and an array_size, which could be called from here with array_size=1.
Nit heard and politely declined. There is enough required difference in how tuples and arrays are checked that any extracted logic would have a lot of if tuple / elif array branching. And I don't foresee any future data structures being created that would make use of such shared base methods, so I accept code that is ~repeated twice.
Looks good to me! Nice work tracking it down! 🐞 I like the comments you made in the decoder too. Very helpful.
What was wrong?
Incorrect pointer values can cause problems. If a pointer value is too small, i.e. it points to an area of the payload that is still within the pointers section, the encoding is malformed. In certain situations, ~infinite loops can occur.
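As a toy illustration of the malformed case (hypothetical numbers and helper, not the library's actual check): for a dynamic tuple whose head holds two 32-byte offsets, any offset smaller than 64 points back into the head section itself, and any offset past the payload end is equally invalid.

```python
# Toy illustration of the malformed-pointer cases described above.
HEAD_SIZE = 2 * 32  # e.g. a (bytes, bytes) tuple head: two 32-byte offset slots

def is_valid_offset(offset, total_length):
    # A pointer must land at or past the head/tail boundary and
    # must not point beyond the end of the payload.
    return HEAD_SIZE <= offset <= total_length

payload_length = 192
print(is_valid_offset(64, payload_length))   # True: first byte of the tail
print(is_valid_offset(0, payload_length))    # False: points back into the head
print(is_valid_offset(500, payload_length))  # False: past the end of the payload
```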
How was it fixed?
When decoding pointers, determine the location in the stream that divides pointers and values and make sure all pointers point past that location. Also check for pointers that point beyond the end of the payload.
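The fix described above might be sketched roughly like this (hypothetical names and simplified logic, not the actual PR code): compute the boundary between the pointers section and the values section, then reject any pointer that lands before that boundary or beyond the end of the stream.

```python
import io

# Simplified sketch of the validation pass; offsets are assumed to be
# relative to the start of this tuple's encoding in the stream.
def validate_pointers(stream: io.BytesIO, num_pointers: int) -> None:
    start = stream.tell()
    end_of_offsets = start + 32 * num_pointers  # head/tail boundary
    total_length = len(stream.getbuffer())
    for _ in range(num_pointers):
        offset = int.from_bytes(stream.read(32), "big")
        # Reject pointers into the head section or past the payload end.
        if not end_of_offsets <= start + offset <= total_length:
            raise ValueError(f"Invalid pointer: {offset}")
    stream.seek(start)  # rewind so decoding proceeds from the head

# Well-formed two-pointer head: offsets 64 and 128 both land in the tail.
good = io.BytesIO((64).to_bytes(32, "big") + (128).to_bytes(32, "big") + b"\x00" * 128)
validate_pointers(good, 2)  # passes silently
```

A payload whose first offset were 0 would point back into the head section and raise ValueError up front, instead of sending the decoder into a loop.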
Added some code comments to make it easier to remember how HeadTailDecoder works.
Added pytest-timeout to dependencies; if the new tests are run without the added offset checking, they'll spin for a long time before failing.
Todo:
Clean up commit history
Clear any breakpoints
Clean up testing
Add or update documentation related to these changes
Add entry to the release notes
Cute Animal Picture