UnicodeDecodeError: 'utf-8' codec can't decode byte 0xaa in position 196607: invalid start byte #17

Closed
DanielEbert opened this issue Feb 17, 2022 · 9 comments

Comments

@DanielEbert

Hello,

I want to remotify a function 'my_function' and return a very large object from my_function.
When I do this, I get the following output:

2022-02-17 12:04:47,842 [ERROR bridge.py:522 run()] 'utf-8' codec can't decode byte 0xaa in position 196607: invalid start byte
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/jfx_bridge/bridge.py", line 504, in run
    msg_dict = json.loads(data.decode("utf-8"))
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xaa in position 196607: invalid start byte

In the bridge.py file in function read_size_and_data_from_socket, I think the size of 'data' is stored in 32 bits. Maybe there is a problem if 'my_function' returns an object that is larger than 2**32 bytes (~4 Gigabytes)?

Let me know if you need more info.

@justfoxing
Owner

Yeah, there's a 2^32 limit on how much can be sent in a message - I didn't think people would be shipping more than 4 GB (and if you are coming close to this limit, you may want to look for alternatives to doing that - ghidra_bridge is definitely not going to perform well under those conditions, in either memory or network speed).

But! I don't think that's the problem here. The error message shows trouble decoding the received JSON, with the decode failure happening at position 196607 - nowhere near a 2^32 limit (but still a way big message - at least 191 KB :o ). Additionally, packing the message on the server would have failed - struct.pack("!I", 2**32) detects the overflow and throws an exception - so it would never even have sent the message out.
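
For reference, the overflow behaviour mentioned above can be checked in isolation. This is a minimal sketch using only the standard library; it only demonstrates how struct handles an unsigned 32-bit field and is not taken from the bridge code itself.

```python
import struct

# Largest value that fits in an unsigned 32-bit, network-order size header.
max_size = 2**32 - 1
header = struct.pack("!I", max_size)

# One past the limit: struct raises instead of silently wrapping around.
try:
    struct.pack("!I", 2**32)
    overflowed = False
except struct.error as exc:
    overflowed = True
    print("struct.error:", exc)
```

So a message of 2**32 bytes or more would fail at packing time on the sender, rather than producing a corrupt stream on the receiver.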

Tracking down invalid unicode always sucks - it'd help to be able to see the code for your "my_function".

@maxeisele

Indeed, that is a lot of data that gets serialized to JSON, and it is definitely not the most efficient way. What's actually being passed are edges from the control flow graph of the program. However, the error only occurs once in a while, and it is always byte 0xaa, at different positions. If I find a way to reproduce it deterministically, I will let you know.

What's your recommendation on passing larger data back to python? Maybe a pipe or so?

@justfoxing
Owner

justfoxing commented Feb 18, 2022 via email

@justfoxing
Owner

Additionally, if you want, you could try patching the jfx_bridge/bridge.py on the receiving side to log the message when it hits the decode issue. This could be helpful in tracking down the source of the issue even if you can't reproduce it deterministically.

This would mean replacing the line msg_dict = json.loads(data.decode("utf-8")) in BridgeReceiverThread.run() with something like the following:

                try:
                    msg_dict = json.loads(data.decode("utf-8"))
                except Exception:
                    with open("bad_message.bin", "wb") as output:
                        output.write(data)
                    raise
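
Once a bad message has been dumped this way, the offending bytes can be located without reading the whole file by eye. The helper below is a hypothetical sketch (describe_decode_failure is not part of jfx_bridge); it relies only on the start attribute that Python sets on UnicodeDecodeError. The sample bytes are made up for illustration.

```python
def describe_decode_failure(data: bytes, context: int = 8):
    """Return (offset, nearby bytes) for the first invalid UTF-8 byte, else None."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        lo = max(0, exc.start - context)
        hi = min(len(data), exc.start + context)
        return exc.start, data[lo:hi]
    return None


# Made-up sample: JSON text with a 4-byte binary header dropped into the middle.
sample = b'{"a": 1' + b"\x00\x00\x00\xaa" + b', "b": 2}'
result = describe_decode_failure(sample)
print(result)
```

For a real dump, the same function could be called on the contents of bad_message.bin, read with open("bad_message.bin", "rb").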

@maxeisele

I have dumped the message to a file, as you suggested. It really does contain bytes that are not valid UTF-8. I have attached a shortened version for you.
shorted_bad_message.txt

@justfoxing
Owner

Hah! Just enough for me to see what's going on. Looks like halfway through a message being received, a second message is jumping in. The unicode error is being caused by the \x00\x00\x00\xaa bytes of the second message size, and there's 0xaa bytes of second message JSON before the initial message resumes. I'm going to have to go hunt through the network dispatching code to see why that can happen...

@justfoxing justfoxing transferred this issue from justfoxing/ghidra_bridge Feb 19, 2022
@justfoxing
Owner

Transferred this issue over to jfx-bridge (the underlying comms beneath ghidra_bridge), because I'm pretty sure that's where the problem is.

Here's a braindump of what I think has happened - you can skip down to the bottom for how to upgrade and hopefully fix the issue if you want, this is mostly for historical record.

The problem probably lies in the potential for messages being sent across the bridge to become interleaved. It wouldn't happen often, because most socket.send() calls will drop into native code and dispatch the message in one hit, but for very large messages, there's the potential for it to only send part of the message before it returns to python and loops around to send the rest. If there's another thread waiting with a message when that happens, and python decides to swap threads, the first message will be incomplete when the second message (including its size header) gets put on the wire. Eventually, when control returns to the first thread, it'll finish sending its message, but the damage is already done.

On the receiving end, it'll see the first message's size header and try to read that many bytes - which will include reading the second message's size header and data, and lose some of the end of the first message. When this gets fed into a unicode decode it'll probably fail with invalid bytes when it hits the binary size header - even if it didn't somehow, the JSON structure would almost certainly be broken, so the json.loads() would fail in the next step.
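
The receive-side failure described above can be simulated on plain bytes, without any sockets. This is a sketch under assumptions: the framing (a 4-byte "!I" size header followed by UTF-8 JSON) follows the description in this thread, the frame and read_frame helpers are hypothetical names, and the payloads are invented. The second payload is padded to exactly 0xaa bytes to mirror the size header seen in the bad message.

```python
import json
import struct


def frame(payload: bytes) -> bytes:
    # 4-byte big-endian size header followed by the payload.
    return struct.pack("!I", len(payload)) + payload


def read_frame(stream: bytes, offset: int = 0):
    # Read the size header, then take that many bytes as the message body.
    (size,) = struct.unpack_from("!I", stream, offset)
    start = offset + 4
    return stream[start:start + size], start + size


first = json.dumps({"edges": list(range(200))}).encode("utf-8")
second = json.dumps({"pad": "x" * 159}).encode("utf-8")  # 0xaa bytes long

# Simulate the sender being interrupted after 100 bytes of the first frame.
first_frame = frame(first)
cut = 100
wire = first_frame[:cut] + frame(second) + first_frame[cut:]

data, _ = read_frame(wire)
try:
    json.loads(data.decode("utf-8"))
    failure = None
except (UnicodeDecodeError, json.JSONDecodeError) as exc:
    failure = exc
    print(type(exc).__name__, exc)
```

In this setup the decode fails on the 0xaa byte of the second message's size header, which matches the "invalid start byte" error in the original report. With a second message shorter than 128 bytes the header would decode cleanly and the failure would surface in json.loads() instead.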

I've addressed this by gating all the places where data gets written to the socket through a lock. However, I haven't been able to build a testcase that actually replicates the problem, so it's all a guess as to whether this actually fixes your issue. If you did end up with code that reliably replicated the problem, that'd be nice to have so I could try turning it into a testcase to avoid regressions.
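
As an illustration of the locking approach, here is a minimal sketch. It is not the actual jfx-bridge patch: send_lock, send_frame, recv_exact and recv_frame are hypothetical names, and a local socket.socketpair() stands in for the bridge connection. Several threads send large frames at once, and holding the lock for the whole sendall() keeps each header and payload contiguous on the wire.

```python
import socket
import struct
import threading

send_lock = threading.Lock()  # one lock shared by every writer on this socket


def send_frame(sock: socket.socket, payload: bytes) -> None:
    data = struct.pack("!I", len(payload)) + payload
    # Hold the lock for the whole write so no other frame can slip in between.
    with send_lock:
        sock.sendall(data)


def recv_exact(sock: socket.socket, n: int) -> bytes:
    chunks = []
    while n:
        chunk = sock.recv(min(n, 65536))
        if not chunk:
            raise ConnectionError("socket closed mid-frame")
        chunks.append(chunk)
        n -= len(chunk)
    return b"".join(chunks)


def recv_frame(sock: socket.socket) -> bytes:
    (size,) = struct.unpack("!I", recv_exact(sock, 4))
    return recv_exact(sock, size)


left, right = socket.socketpair()
payloads = [bytes([65 + i]) * 500_000 for i in range(4)]
threads = [threading.Thread(target=send_frame, args=(left, p)) for p in payloads]
for t in threads:
    t.start()

# Read in the main thread while the senders are still running.
received = [recv_frame(right) for _ in payloads]
for t in threads:
    t.join()
left.close()
right.close()
```

The frames may arrive in any order, but each one should come out intact. Removing the lock does not guarantee a failure on every run, which fits the intermittent behaviour reported in this thread.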

TL;DR - I've released version 0.9.1 of jfx-bridge with a fix that I think might sort the problem. Upgrade with pip install ghidra_bridge --upgrade --force-reinstall to get the latest jfx_bridge component, then re-install the server scripts with python -m ghidra_bridge.install_server <script location>. Note that you'll need to make sure you restart the ghidra_bridge server after re-installing the server scripts, since this bug is most likely triggering on the ghidra-side (look for INFO:jfx_bridge.bridge:serving! (jfx_bridge v0.9.1 in the ghidra console to make sure it's upgraded correctly).
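
On the client side, the installed version can also be checked from Python. This is a small sketch that assumes Python 3.8+ and that the distribution is registered under the name jfx_bridge; installed_version is a hypothetical helper. It does not replace checking the ghidra console for the server side.

```python
from importlib import metadata


def installed_version(dist_name: str):
    """Return the installed version string for a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# Assumed distribution name; prints None if it is not installed under this name.
print("jfx_bridge:", installed_version("jfx_bridge"))
```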

Please let me know if you think it's solved the issue, or if it keeps occurring.

@maxeisele

Wow, that was fast. For now, the error has not occurred again, so I guess it is fixed. Thanks a lot!

@justfoxing
Owner

Sweet! I'll close this now, but if it does reoccur, feel free to reopen.
