Bugfix: an exec request with a non-UTF-8 command fails with a UnicodeDecodeError #502
Affected paramiko versions: all.
This pull request fixes a minor bug in the processing of the channel request "exec" on the server side.
If the client sends a command string that is not a valid UTF-8 byte sequence, paramiko raises a UnicodeDecodeError.
Here is an example:
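A minimal standard-library sketch of the failure, not paramiko's actual code. The command value is made up for illustration; the point is only that decoding such bytes as UTF-8 raises:

```python
# Hypothetical command bytes as a client might send them.
# 0xff never occurs in well-formed UTF-8, so this cannot be decoded.
command = b"echo \xff"

error = None
try:
    # Roughly what a text-oriented reader does with the received bytes.
    command.decode("utf-8")
except UnicodeDecodeError as exc:
    error = exc

print(type(error).__name__)
```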
The pull request changes the request handling code to read a byte string instead of a unicode string and changes one test case to use a non-UTF-8 byte string.
According to RFC 4254 sec 6.5 the "command" string of an "exec" channel request is a byte string. Previously paramiko assumed "command" to be UTF-8 encoded. Invalid UTF-8 sequences caused a UnicodeDecodeError. This commit changes a test case to use a non-UTF-8 string and fixes the bug.
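To illustrate the idea of reading "command" as bytes, here is a rough sketch that builds and parses an "exec" channel request payload using only `struct`. It follows the field layout in RFC 4254 sec 6.5 but is not paramiko's implementation; `build_exec_request` and `parse_exec_request` are hypothetical helper names.

```python
import struct

SSH_MSG_CHANNEL_REQUEST = 98  # message number from RFC 4254


def _string(data: bytes) -> bytes:
    """Encode an RFC 4251 string: uint32 length followed by the raw bytes."""
    return struct.pack(">I", len(data)) + data


def build_exec_request(channel: int, command: bytes, want_reply: bool = True) -> bytes:
    """Build the payload of an "exec" channel request (RFC 4254 sec 6.5)."""
    return (
        struct.pack(">BI", SSH_MSG_CHANNEL_REQUEST, channel)
        + _string(b"exec")
        + struct.pack(">?", want_reply)
        + _string(command)
    )


def parse_exec_request(payload: bytes):
    """Return (channel, want_reply, command); command stays as bytes."""
    msg_type, channel = struct.unpack_from(">BI", payload, 0)
    if msg_type != SSH_MSG_CHANNEL_REQUEST:
        raise ValueError("not a channel request")
    offset = 5
    (n,) = struct.unpack_from(">I", payload, offset)
    offset += 4
    kind = payload[offset:offset + n]  # request type, compared as bytes
    offset += n
    if kind != b"exec":
        raise ValueError("not an exec request")
    (want_reply,) = struct.unpack_from(">?", payload, offset)
    offset += 1
    (n,) = struct.unpack_from(">I", payload, offset)
    offset += 4
    command = payload[offset:offset + n]  # no decoding step, so nothing can fail here
    return channel, want_reply, command


# Latin-1 encoded 'é' (0xe9) followed by '.', which is invalid as UTF-8.
payload = build_exec_request(0, b"cat caf\xe9.txt")
channel, want_reply, command = parse_exec_request(payload)
```

Because the command is never decoded during parsing, the handler receives exactly the bytes the client sent and can decide for itself how to interpret them.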
Thanks for this, adding to release milestone.
This looked similar to another bugfix recently but I dug and that must have been in another spot (there are many like it, affected by the Py3k support merge). The breaking change is here: akruis@0e4ce37#diff-4dd2c4f2bbcd8b6b482b9e72e5589a42R1037 - going by your analysis hopefully it was just a mistaken choice of which interpretation method to use.
I appreciate the thought re: compatibility; given that the above change only occurred in 1.13 (which is semi-recent-ish) I am not super worried about somebody creating code since 1.13 that is sensitive to this change. Having it documented in this PR as a note is most likely good enough.
Unfortunately it could be a design problem. RFC 4251 sec 4.5 and 5 are not completely clear. RFC 4251 sec 5: "Strings are allowed to contain arbitrary binary data, including null characters and 8-bit characters." and "Strings are also used to store text. In that case, US-ASCII is used for internal names, and ISO-10646 UTF-8 for text that might be displayed to the user."
Therefore any solid implementation must be able to handle arbitrary binary data, if the other side of an ssh-connection sends such data. (And it is probably fairly simple to send invalid UTF-8 data. An incorrect LC_CTYPE value and a username containing non-ASCII might be enough.)
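As a small illustration of how easily this can happen, assume a client whose locale encodes text as Latin-1 while the server expects UTF-8. The username below is made up:

```python
# Hypothetical username containing a non-ASCII character ('ö', U+00F6).
username = "j\u00f6rg"

# What a client using a Latin-1 locale would put on the wire.
sent = username.encode("latin-1")

# A server that assumes UTF-8 cannot decode these bytes.
try:
    sent.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False

print(sent, valid_utf8)
```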
Probably it would be a good idea to rethink the usage of Message.get_text(). It is used in many places and I'm quite sure that some cases (e.g. username in auth_handler, filename in sftp_client, path in sftp_server) are problematic. Personally I propose to remove Message.get_text() entirely and to process RFC4251 "strings" as Python bytes instead of Python strings as far as possible.
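One possible shape for such a bytes-first approach, sketched with the standard library only: keep the value as bytes internally and convert to text only at the boundary where it is shown or logged. The helper names `for_display` and `to_text_lossless` are hypothetical and not part of paramiko.

```python
def for_display(raw: bytes) -> str:
    """Decode for logs or UI; undecodable bytes become U+FFFD."""
    return raw.decode("utf-8", errors="replace")


def to_text_lossless(raw: bytes) -> str:
    """Decode so that the original bytes can be recovered later."""
    return raw.decode("utf-8", errors="surrogateescape")


# Hypothetical path with a Latin-1 encoded 'ö' (0xf6), invalid as UTF-8.
raw_path = b"/home/j\xf6rg/notes.txt"

shown = for_display(raw_path)
round_trip = to_text_lossless(raw_path).encode("utf-8", errors="surrogateescape")
```

With `errors="replace"` the result is safe to print but lossy, while `surrogateescape` keeps the text form reversible, so the exact bytes can still be passed on to the file system or another process.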
Re: compatibility concerns & other notes, I think given this only affects Paramiko-as-server (which is the minor use case) and we're not sure just how many users would be impacted by the specifics, it's probably simplest to merge this as-is and then look more closely at e.g. @akruis' inline suggestion if we get additional bug reports.