UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc2 in position 0: #154
Comments
This error means that the data you received on the connection was not a valid UTF-8 byte string. Are you sure the server is sending UTF-8 and not some other character encoding? You can disable this conversion and just operate on raw bytes by setting the encoding to None when you open the SSH session, or you can switch to some other encoding if you know what the server is sending. I can probably make this a bit cleaner by not raising a nested exception here, but the problem appears to be in the data and not in AsyncSSH. |
It's Juniper router. I am pretty sure it's UTF-8. But let me double check. |
It works most of the time; only some routers cause this issue. |
Whether you get the error or not would depend on the specific text being output, so that might explain why you don't always see it. The byte 0xc2 would be a legal (and fairly common) UTF-8 start byte, but what'll matter is whether the byte that follows it is something in the 0x80-0xbf range or not. Can you enable logging and set the debug level to 3 and capture the specific bytes in the MSG_CHANNEL_DATA message coming from the router? To enable the logging, you'd need to add something like:

    logging.basicConfig()
    asyncssh.set_log_level('DEBUG')
    asyncssh.set_debug_level(3)

The output would then look something like:
|
|
The full MSG_CHANNEL_DATA message is not shown here, but the portion of it you included appears to be raw binary data, not text. If you are fetching binary data over SSH, you definitely need to set the 'encoding' parameter to None and have your application code expect data of type bytes rather than str. |
The whole message is huge, but let me set the encoding to None. Thank you for your help. |
Sorry for the inconvenience, but I don't see anything special being requested. It's just opening the session and fetching the data. No binary mode, etc.
|
Yes, and that's the problem. If you want to receive binary data, you have to explicitly set encoding to None, as it defaults to UTF-8. This would be done in the call to open_session() above. For instance:

    self._stdin, self._stdout, self._stderr = \
        await self._conn.open_session(term_type='dumb', term_size=(200, 24), encoding=None)

|
Yes, I've tried it previously. I got
at the same location. |
Right - as I mentioned above, you need to change the code which handles the data to be expecting a type of bytes rather than str, as you are not dealing with Unicode text. If you are really receiving binary data from the router, there's no way you can convert that data to a Unicode string. |
encoding='latin-1' worked for now; let's see what happens next. Thank you for your help. I really appreciate it. |
It looks like Latin-1 will allow all byte values from 0x00 to 0xff to be translated as-is to Unicode code points of U+0000 to U+00FF, so you might get away with this. It's not really the right solution, though, and I'm guessing it will be quite a bit less efficient than operating directly on bytes objects. |
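As a minimal sketch of the Latin-1 behavior described above, using only Python's built-in codecs (nothing here goes through AsyncSSH), the following checks that every byte value decodes to the code point with the same number and that the mapping is reversible:

```python
# Every possible byte value, 0x00 through 0xff.
all_bytes = bytes(range(256))

# Latin-1 never raises on decode: byte N becomes code point U+00NN.
text = all_bytes.decode('latin-1')
code_points = [ord(ch) for ch in text]

# The mapping is reversible, so the original bytes can be recovered.
round_tripped = text.encode('latin-1')

print(code_points == list(range(256)), round_tripped == all_bytes)
```

This only shows that no information is lost; it says nothing about whether the resulting string is the text the router meant to send.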
But if I use bytes to parse and save the output, I have to decode at some point for the user to view it, right? What would I do then? |
If you can guide me on how to solve this issue, since you have some idea about it, I can try. According to the Juniper docs, Junos OS escapes and encodes these characters using the equivalent UTF-8 decimal character reference. |
The message you posted here appears to contain binary data, at least at the end. I didn't see any text in that message that you'd be able to output to the user. However, I'm guessing that the output you get back will be a mixture of text and binary data, depending on what commands you run. You may need to parse the output to split apart the text from the binary data, and then you'd be able to do something like data.decode('utf-8') on the portions of the text output that you want to display. Also, if the text portions of the output are encoded as UTF-8, setting the encoding to 'Latin-1' will prevent AsyncSSH from raising a UnicodeDecodeError, but it won't actually return the right Unicode data, so any non-ASCII characters in the text output won't be displayed correctly to the user. That would be the other reason you'll want to figure out how to split up the text and binary data and then manually decode only the text portion of the output using UTF-8. I didn't see any sign of escaping or other forms of encoding of the binary data in the portion of the message you included. If you were running commands that were the kind of thing a user would run by hand using SSH, I would have expected some kind of conversion to ASCII on the binary data and you wouldn't run into this issue, but I didn't see that here. |
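A rough sketch of both points above, assuming the session was opened with encoding=None so the application receives bytes. The helper name split_text_and_binary and the idea of splitting on newlines are hypothetical, chosen only for illustration; real router output may need a different way to separate text from binary data. The sample bytes are made up.

```python
def split_text_and_binary(raw: bytes):
    """Hypothetical helper: decode each line as UTF-8 where possible,
    and keep lines that are not valid UTF-8 as raw bytes."""
    text_lines, binary_lines = [], []
    for line in raw.split(b'\n'):
        try:
            text_lines.append(line.decode('utf-8'))
        except UnicodeDecodeError:
            binary_lines.append(line)
    return text_lines, binary_lines


# Made-up sample: one UTF-8 text line followed by a line of binary data.
raw = 'description café'.encode('utf-8') + b'\n' + b'\xff\xfe\x00\x01'
text_lines, binary_lines = split_text_and_binary(raw)

# Decoding UTF-8 text as Latin-1 does not fail, but it yields the wrong
# characters: the two bytes of 'é' become two separate code points.
mojibake = 'café'.encode('utf-8').decode('latin-1')

print(text_lines, binary_lines, mojibake)
```

Only the entries in text_lines would be shown to the user; the binary entries stay as bytes for whatever processing they need.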
Actually, I was able to pinpoint the problem, but it does not make sense to me. Can you please see if you have some idea? Here is the output that should be returned: set interfaces xe-9/1/3 apply-groups Unused_Port Here is the output that is returned using Latin-1 encoding: set interfaces xe-9/1/3 apply-groups Unused_Port This looks normal, so why is it causing a problem? This is how I came to know the location: `set interfaces xe-9/1/3 apply-groups Unused_Port Task exception was never retrieved ` |
I don’t know what’s happening on the router to cause this, but the byte sequence 0xc2 0x2d is definitely not valid UTF-8, so the error being returned is correct. It’s possible that the hyphen in your message got mangled while passing through e-mail and it was actually a 0xad originally instead of 0x2d, in which case 0xc2 0xad would translate to Unicode U+00AD, which is legal UTF-8 for a “soft hyphen”. However, if the router had actually sent 0xc2 0xad, you wouldn’t have gotten this error. In your output, you also show an extra “te” there, and I don’t know why that would be added. Do you have the raw hex in the debug output of the MSG_CHANNEL_DATA which contains this response to confirm what bytes are actually being sent? |
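The difference between the two byte sequences discussed above can be checked with Python's built-in UTF-8 codec alone. This is only a sketch of the decoding rule, not a reproduction of what the router sent:

```python
# 0xc2 followed by a continuation byte (0x80-0xbf) is valid UTF-8.
valid = b'\xc2\xad'
soft_hyphen = valid.decode('utf-8')  # U+00AD, the soft hyphen

# 0xc2 followed by 0x2d (an ASCII hyphen) is not valid UTF-8.
invalid = b'\xc2\x2d'
error_reason = None
error_position = None
try:
    invalid.decode('utf-8')
except UnicodeDecodeError as exc:
    error_reason = exc.reason    # why decoding failed
    error_position = exc.start   # index of the offending byte

print(repr(soft_hyphen), error_reason, error_position)
```

The reason and position reported for the second sequence match the message in the original traceback.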
Is it possible to add errors='ignore', or in general {encoding="UTF-8", errors="ignore"}? I need to ignore these encoding issues: I've talked to a network engineer, and he said the problem above is in a description, so probably someone just copy-pasted a description containing Unicode characters. |
Adding the ability to specify the Unicode error handler to use seems like a good improvement, and it should be straightforward to add. I'll try to have something in the 'develop' branch shortly, and reply here when it's ready to test. Note that setting errors='ignore' here will mean invalid Unicode output will be discarded. That should be fine in some cases, but if you were trying to do something like copy configuration from one router to another, you'd be better off with encoding=None and handling everything as bytes, so there's no loss of information. |
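To illustrate the trade-off described above, here is a small sketch using Python's standard error handlers directly on a bytes object. The sample bytes are made up for illustration and are not the actual router output:

```python
# Made-up sample containing a stray 0xc2 that is not followed by a
# valid continuation byte.
raw = b'apply-groups Unused\xc2-Port'

# 'ignore' silently drops the invalid byte.
ignored = raw.decode('utf-8', errors='ignore')

# 'replace' substitutes U+FFFD so the damage stays visible.
replaced = raw.decode('utf-8', errors='replace')

# Either way the original bytes cannot be rebuilt from the string.
lossy = ignored.encode('utf-8') != raw

print(ignored, replaced, lossy)
```

If the output has to be written back to a device unchanged, keeping it as bytes avoids this loss.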
Ok - support for an "errors" argument is now ready to test in the "develop" branch (see commit 39ab119). Methods such as SSHClientConnection's create_session(), create_connection(), create_unix_connection(), create_server(), and create_unix_server() now support this, along with SSHServerConnection's create_connection() and create_unix_connection(). The equivalent functions which return stream or process objects also now support this. Callers to create_server_channel(), create_tcp_channel(), and create_unix_channel() can also pass in an "errors" argument when customizing other channel parameters. Support for controlling Unicode error handling is also available via the "session_errors" argument in the top-level AsyncSSH create_server() call (to be used along with "session_encoding"), and whatever is set there will apply to newly created server sessions on that server. When working with SSH process objects, whatever Unicode error handler is set is also automatically used as the error handler for any I/O redirection which is performed on that process. Finally, the get_comment() and set_comment() functions that operate on private/public keys and certificates have been updated to accept an "errors" argument as well. |
awesome thank you. |
This feature is now released in AsyncSSH 1.13.3. |
`future: <Task finished coro=<OutputFetcher.task() done, defined at /home/waqas/PycharmProjects/automation_manager/automation_manager/general_automation/commander/output_fetcher.py:49> exception=DisconnectError('Disconnect Error: Unicode decode error',)>
Traceback (most recent call last):
File "/home/waqas/.virtualenv/automation_manager/lib/python3.6/site-packages/asyncssh/channel.py", line 296, in _deliver_data
data = encdata.decode(self._encoding)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc2 in position 0: invalid continuation byte
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/waqas/PycharmProjects/automation_manager/automation_manager/general_automation/commander/output_fetcher.py", line 65, in task
await out_file.write(await router.send_command(command))
File "/home/waqas/PycharmProjects/automation_manager/automation_manager/general_automation/netdev/vendors/base.py", line 208, in send_command
output = await self._read_until_prompt_or_pattern(pattern, re_flags)
File "/home/waqas/PycharmProjects/automation_manager/automation_manager/general_automation/netdev/vendors/base.py", line 264, in _read_until_prompt_or_pattern
output += await asyncio.wait_for(fut, self._timeout)
File "/usr/lib/python3.6/asyncio/tasks.py", line 339, in wait_for
return (yield from fut)
File "/usr/lib/python3.6/asyncio/coroutines.py", line 215, in coro
res = yield from res
File "/home/waqas/.virtualenv/automation_manager/lib/python3.6/site-packages/asyncssh/stream.py", line 444, in read
raise recv_buf.pop(0)
File "/home/waqas/.virtualenv/automation_manager/lib/python3.6/site-packages/asyncssh/connection.py", line 504, in data_received
while self._inpbuf and self._recv_handler():
File "/home/waqas/.virtualenv/automation_manager/lib/python3.6/site-packages/asyncssh/connection.py", line 724, in _recv_packet
processed = handler.process_packet(pkttype, seq, packet)
File "/home/waqas/.virtualenv/automation_manager/lib/python3.6/site-packages/asyncssh/packet.py", line 207, in process_packet
self._packet_handlers[pkttype](self, pkttype, pktid, packet)
File "/home/waqas/.virtualenv/automation_manager/lib/python3.6/site-packages/asyncssh/channel.py", line 521, in _process_data
self._accept_data(data)
File "/home/waqas/.virtualenv/automation_manager/lib/python3.6/site-packages/asyncssh/channel.py", line 351, in _accept_data
self._deliver_data(data, datatype)
File "/home/waqas/.virtualenv/automation_manager/lib/python3.6/site-packages/asyncssh/channel.py", line 308, in _deliver_data
'Unicode decode error')
asyncssh.misc.DisconnectError: Disconnect Error: Unicode decode error
`