Bugfix: an exec request with a non UTF-8 command fails with an UnicodeDecodeError #502

Closed
wants to merge 3 commits into
from

Projects

None yet

3 participants

@akruis
Contributor
akruis commented Mar 20, 2015

Affected paramiko versions: all.

This pull request fixes a minor bug in the processing of the channel request "exec" on the
server side of a ssh session.

Problem

If the client sends a command sting, that is not a valid UTF-8 byte sequence, paramiko raises an UnicodeDecodeError.

Here is an example:

paramiko.transport - ERROR - Unknown exception: 'utf8' codec can't decode byte 0xe4 in position 889: invalid continuation byte
paramiko.transport - ERROR - Traceback (most recent call last):
paramiko.transport - ERROR - File "F:\fg2\eclipsews\fg2py\src3rdParty\paramiko\transport.py", line 1619, in run
paramiko.transport - ERROR - self._channel_handler_table[ptype](chan, m)
paramiko.transport - ERROR - File "F:\fg2\eclipsews\fg2py\src3rdParty\paramiko\channel.py", line 978, in _handle_request
paramiko.transport - ERROR - cmd = m.get_text()
paramiko.transport - ERROR - File "F:\fg2\eclipsews\fg2py\src3rdParty\paramiko\message.py", line 199, in get_text
paramiko.transport - ERROR - return u(self.get_bytes(self.get_size()))
paramiko.transport - ERROR - File "F:\fg2\eclipsews\fg2py\src3rdParty\paramiko\py3compat.py", line 51, in u
paramiko.transport - ERROR - return s.decode(encoding)
paramiko.transport - ERROR - File "F:\fg2\eclipsews\fg2py\arch\win32\libexec\lib\encodings\utf_8.py", line 16, in decode
paramiko.transport - ERROR - return codecs.utf_8_decode(input, errors, True)
paramiko.transport - ERROR - UnicodeDecodeError: 'utf8' codec can't decode byte 0xe4 in position 889: invalid continuation byte

According to RFC 4254 sec 6.5 the "command" string of an "exec" channel
request is a byte-string, but paramiko uses the Message.get_text() method to read the command string. get_text() requires an UTF-8 encoded byte string.

Fix

The pull request changes the request handling code to read a byte string instead of a unicode string and changes one test case to use a non UTF-8 byte string.

@akruis akruis According to RFC 4254 sec 6.5 the "command" string of an "exec" channel
request is a byte-string. Previously paramiko assumed "command" to be
UTF-8 encoded. Invalid UTF-8 sequences caused an UnicodeDecodeError.
This commit changes a test case to uses a non UTF-8 string and fixes the
bug.
838e02a
@coveralls

Coverage Status

Coverage decreased (-0.9%) to 76.92% when pulling 838e02a on akruis:1.13-execfix into fac3477 on paramiko:1.13.

@coveralls

Coverage Status

Coverage decreased (-0.92%) to 76.91% when pulling 063c394 on akruis:1.13-execfix into fac3477 on paramiko:1.13.

@akruis akruis Commit 838e02a changed the type of the exec command string on python3
from unicode to bytes. This commit adapts the test suite accordingly.
94c2018
@coveralls

Coverage Status

Coverage increased (+0.08%) to 77.9% when pulling 94c2018 on akruis:1.13-execfix into fac3477 on paramiko:1.13.

@akruis akruis commented on the diff Mar 20, 2015
paramiko/channel.py
@@ -989,7 +989,7 @@ def _handle_request(self, m):
else:
ok = server.check_channel_env_request(self, name, value)
elif key == 'exec':
- cmd = m.get_text()
+ cmd = m.get_string()
if server is None:
ok = False
else:
@akruis
akruis Mar 20, 2015 Contributor

type(cmd) changes from unicode to bytes. If strict compatibility with existing code is a concern, inserting the following code could be a solution:

    from paramiko.py3compat import u   # this line goes to top of the file
    try:
        cmd = u(cmd)
    except UnicodeDecodeError:
        pass

This way the behaviour does not change, if the ssh-client sends valid UTF-8. If cmd is not UTF-8, it will be passed on as bytes. A server, that wants command to be bytes, could use the following ServerInterface.check_channel_exec_request implementation:

    def check_channel_exec_request(self, channel, command):
        if isinstance(command, unicode):
            command = command.encode('utf-8')
        ...
@bitprophet bitprophet added the Bug label Mar 20, 2015
@bitprophet bitprophet added this to the 1.13.4 / 1.14.3 / 1.15.3 milestone Mar 20, 2015
@bitprophet
Member

Thanks for this, adding to release milestone.

This looked similar to another bugfix recently but I dug and that must have been in another spot (there are many like it, affected by the Py3k support merge). The breaking change is here: akruis@0e4ce37#diff-4dd2c4f2bbcd8b6b482b9e72e5589a42R1037 - going by your analysis hopefully it was just a mistaken choice of which interpretation method to use.

I appreciate the thought re: compatibility; given that the above change only occurred in 1.13 (which is semi-recent-ish) I am not super worried about somebody creating code since 1.13 that is sensitive to this change. Having it documented in this PR as a note, is good enough most likely.

@akruis
Contributor
akruis commented Mar 23, 2015

hopefully it was just a mistaken choice of which interpretation method to use.

Unfortunately it could be a design problem. RFC 4251 sec 4.5 and 5 are not completely clear. RFC 4251 sec 5: "Strings are allowed to contain arbitrary binary data, including null characters and 8-bit characters." and "Strings are also used to store text. In that case, US-ASCII is used for internal names, and ISO-10646 UTF-8 for text that might be displayed to the user."

Therefore any solid implementation must be able to handle arbitrary binary data, if the other side of an ssh-connection sends such data. (And it is probably fairly simple to send invalid UTF-8 data. An incorrect LC_CTYPE value and a username containing non-ASCII might be enough.)

Probably it would be a good idea to rethink the usage of Message.get_text(). It is used in many places and I'm quite sure, that some cases (i.e. username in auth_handler, filename in sftp_client, path in sftp_server) are problematic. Personally I propose to remove Message.get_text() entirely and to process RFC4251 "strings" as Python bytes instead of Python strings as far as possible.

@bitprophet
Member

Bumping this to 1.16, it has large enough implications that I don't want to drop it in a bugfix release if possible.

@bitprophet
Member

Re: compatibility concerns & other notes, I think given this only affects Paramiko-as-server (which is the minor use case) and we're not sure just how many users would be impacted by the specifics, it's probably simplest to merge this as-is and then look more closely at e.g. @akruis' inline suggestion if we get additional bug reports.

@bitprophet bitprophet added a commit that referenced this pull request Nov 4, 2015
@bitprophet bitprophet Reword changelog re #502 & add attribution 4565fb5
@bitprophet
Member

Merged to master for the 1.16 release (I still feel this belonged in a feature release and not bugfix, though thanks for targeting the bugfix branch by default :))

@bitprophet bitprophet closed this Nov 4, 2015
@mattclay mattclay referenced this pull request in ansible/ansible Mar 19, 2016
Merged

Add missing to_bytes for cmd. #15049

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment