DynamoDB streaming binary content #6371
This PR addresses #6364, where the user ran into JSON serialization errors while streaming DynamoDB updates to Kinesis. Introduced a specific encoder and added a test.
Thanks for tackling this, @giograno! Added a small question/comment regarding binary data with non-printable characters.
tests/integration/test_dynamodb.py
Outdated
stream_name = get_kinesis_stream_name(table_name)
wait_for_stream_ready(stream_name)
response = dynamodb_client.put_item(
    TableName=table_name, Item={"id": {"S": "id1"}, "data": {"B": b"binary_data"}}
Curious what would happen if we add non-printable binary data here, e.g., "data": {"B": b"\x90"}. Added a small parity test in this PR which illustrates that this works (i.e., is a valid request) against real AWS.
Could we try adding this binary input data to this test here as well? This would be to ensure that the downstream logic does not fail when running to_str(..) on non-printable bytes. 👍
Good point. to_str() was indeed failing for non-printable binaries. Added errors="replace" when calling the decode method.
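For illustration, the failure and the fix described in this comment can be reproduced with plain Python, without any LocalStack code. This is a minimal sketch that assumes the decode uses UTF-8; the variable names are made up for the example.

```python
# The non-printable byte suggested in the review comment above.
raw = b"\x90"

# A strict decode rejects the byte: 0x90 is not a valid UTF-8 start byte.
try:
    raw.decode("utf-8")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# With errors="replace", the invalid byte becomes U+FFFD instead of raising.
replaced = raw.decode("utf-8", errors="replace")

print(strict_failed, repr(replaced))
```

Note that errors="replace" is lossy: the original byte cannot be recovered from the replacement character, so it fits logging and display better than round-tripping the payload.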
Co-authored-by: Waldemar Hummer <waldemar.hummer@gmail.com>
Also added @alexrashed as reviewer since this relates to ASF.
LGTM! Also, this shouldn't affect any ASF serializer or parser (since it basically only affects client calls, and the parsers and serializers traverse the hierarchy and pre-process individual fields depending on their type).
This PR addresses #6364, where the user reported error messages in the logs while trying to stream the DDB changes to Kinesis. This was due to errors in serializing bytes content.
Introduced a new encoder for binary content (less broad than the CustomEncoder we already have) and introduced/refactored some tests.
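As a rough sketch of what a narrowly scoped encoder for binary content could look like: the class name BytesEncoder and the choice of base64 are assumptions made for this example, not necessarily what the PR implements.

```python
import base64
import json


class BytesEncoder(json.JSONEncoder):
    """Hypothetical encoder: handles only bytes-like values, nothing else."""

    def default(self, o):
        if isinstance(o, (bytes, bytearray)):
            # Assumption: represent binary content as base64 text.
            return base64.b64encode(o).decode("ascii")
        # Anything else still raises TypeError, as in the stock encoder.
        return super().default(o)


item = {"id": {"S": "id1"}, "data": {"B": b"\x90"}}

# The stock encoder cannot serialize bytes values.
try:
    json.dumps(item)
    plain_failed = False
except TypeError:
    plain_failed = True

encoded = json.dumps(item, cls=BytesEncoder)
decoded_bytes = base64.b64decode(json.loads(encoded)["data"]["B"])
```

Keeping the encoder limited to bytes means unrelated unserializable types still surface as errors instead of being silently stringified.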