New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bugfix: an exec request with a non UTF-8 command fails with an UnicodeDecodeError #502

Closed
wants to merge 3 commits into
base: 1.13
from

Conversation

Projects
None yet
3 participants
@akruis
Contributor

akruis commented Mar 20, 2015

Affected paramiko versions: all.

This pull request fixes a minor bug in the processing of the channel request "exec" on the
server side of a ssh session.

Problem

If the client sends a command sting, that is not a valid UTF-8 byte sequence, paramiko raises an UnicodeDecodeError.

Here is an example:

paramiko.transport - ERROR - Unknown exception: 'utf8' codec can't decode byte 0xe4 in position 889: invalid continuation byte
paramiko.transport - ERROR - Traceback (most recent call last):
paramiko.transport - ERROR - File "F:\fg2\eclipsews\fg2py\src3rdParty\paramiko\transport.py", line 1619, in run
paramiko.transport - ERROR - self._channel_handler_table[ptype](chan, m)
paramiko.transport - ERROR - File "F:\fg2\eclipsews\fg2py\src3rdParty\paramiko\channel.py", line 978, in _handle_request
paramiko.transport - ERROR - cmd = m.get_text()
paramiko.transport - ERROR - File "F:\fg2\eclipsews\fg2py\src3rdParty\paramiko\message.py", line 199, in get_text
paramiko.transport - ERROR - return u(self.get_bytes(self.get_size()))
paramiko.transport - ERROR - File "F:\fg2\eclipsews\fg2py\src3rdParty\paramiko\py3compat.py", line 51, in u
paramiko.transport - ERROR - return s.decode(encoding)
paramiko.transport - ERROR - File "F:\fg2\eclipsews\fg2py\arch\win32\libexec\lib\encodings\utf_8.py", line 16, in decode
paramiko.transport - ERROR - return codecs.utf_8_decode(input, errors, True)
paramiko.transport - ERROR - UnicodeDecodeError: 'utf8' codec can't decode byte 0xe4 in position 889: invalid continuation byte

According to RFC 4254 sec 6.5 the "command" string of an "exec" channel
request is a byte-string, but paramiko uses the Message.get_text() method to read the command string. get_text() requires an UTF-8 encoded byte string.

Fix

The pull request changes the request handling code to read a byte string instead of a unicode string and changes one test case to use a non UTF-8 byte string.

According to RFC 4254 sec 6.5 the "command" string of an "exec" channel
request is a byte-string. Previously paramiko assumed "command" to be
UTF-8 encoded. Invalid UTF-8 sequences caused an UnicodeDecodeError.
This commit changes a test case to uses a non UTF-8 string and fixes the
bug.
@coveralls

This comment has been minimized.

coveralls commented Mar 20, 2015

Coverage Status

Coverage decreased (-0.9%) to 76.92% when pulling 838e02a on akruis:1.13-execfix into fac3477 on paramiko:1.13.

@coveralls

This comment has been minimized.

coveralls commented Mar 20, 2015

Coverage Status

Coverage decreased (-0.92%) to 76.91% when pulling 063c394 on akruis:1.13-execfix into fac3477 on paramiko:1.13.

Commit 838e02a changed the type of the exec command string on python3
from unicode to bytes. This commit adapts the test suite accordingly.
@coveralls

This comment has been minimized.

coveralls commented Mar 20, 2015

Coverage Status

Coverage increased (+0.08%) to 77.9% when pulling 94c2018 on akruis:1.13-execfix into fac3477 on paramiko:1.13.

@@ -989,7 +989,7 @@ def _handle_request(self, m):
else:
ok = server.check_channel_env_request(self, name, value)
elif key == 'exec':
cmd = m.get_text()
cmd = m.get_string()
if server is None:
ok = False
else:

This comment has been minimized.

@akruis

akruis Mar 20, 2015

Contributor

type(cmd) changes from unicode to bytes. If strict compatibility with existing code is a concern, inserting the following code could be a solution:

    from paramiko.py3compat import u   # this line goes to top of the file
    try:
        cmd = u(cmd)
    except UnicodeDecodeError:
        pass

This way the behaviour does not change, if the ssh-client sends valid UTF-8. If cmd is not UTF-8, it will be passed on as bytes. A server, that wants command to be bytes, could use the following ServerInterface.check_channel_exec_request implementation:

    def check_channel_exec_request(self, channel, command):
        if isinstance(command, unicode):
            command = command.encode('utf-8')
        ...

@bitprophet bitprophet added the Bug label Mar 20, 2015

@bitprophet bitprophet added this to the 1.13.4 / 1.14.3 / 1.15.3 milestone Mar 20, 2015

@bitprophet

This comment has been minimized.

Member

bitprophet commented Mar 20, 2015

Thanks for this, adding to release milestone.

This looked similar to another bugfix recently but I dug and that must have been in another spot (there are many like it, affected by the Py3k support merge). The breaking change is here: akruis@0e4ce37#diff-4dd2c4f2bbcd8b6b482b9e72e5589a42R1037 - going by your analysis hopefully it was just a mistaken choice of which interpretation method to use.

I appreciate the thought re: compatibility; given that the above change only occurred in 1.13 (which is semi-recent-ish) I am not super worried about somebody creating code since 1.13 that is sensitive to this change. Having it documented in this PR as a note, is good enough most likely.

@akruis

This comment has been minimized.

Contributor

akruis commented Mar 23, 2015

hopefully it was just a mistaken choice of which interpretation method to use.

Unfortunately it could be a design problem. RFC 4251 sec 4.5 and 5 are not completely clear. RFC 4251 sec 5: "Strings are allowed to contain arbitrary binary data, including null characters and 8-bit characters." and "Strings are also used to store text. In that case, US-ASCII is used for internal names, and ISO-10646 UTF-8 for text that might be displayed to the user."

Therefore any solid implementation must be able to handle arbitrary binary data, if the other side of an ssh-connection sends such data. (And it is probably fairly simple to send invalid UTF-8 data. An incorrect LC_CTYPE value and a username containing non-ASCII might be enough.)

Probably it would be a good idea to rethink the usage of Message.get_text(). It is used in many places and I'm quite sure, that some cases (i.e. username in auth_handler, filename in sftp_client, path in sftp_server) are problematic. Personally I propose to remove Message.get_text() entirely and to process RFC4251 "strings" as Python bytes instead of Python strings as far as possible.

@bitprophet bitprophet modified the milestones: 2.0, 1.13.4 / 1.14.3 / 1.15.3, 1.16 Sep 30, 2015

@bitprophet

This comment has been minimized.

Member

bitprophet commented Sep 30, 2015

Bumping this to 1.16, it has large enough implications that I don't want to drop it in a bugfix release if possible.

@bitprophet

This comment has been minimized.

Member

bitprophet commented Nov 3, 2015

Re: compatibility concerns & other notes, I think given this only affects Paramiko-as-server (which is the minor use case) and we're not sure just how many users would be impacted by the specifics, it's probably simplest to merge this as-is and then look more closely at e.g. @akruis' inline suggestion if we get additional bug reports.

bitprophet added a commit that referenced this pull request Nov 4, 2015

@bitprophet

This comment has been minimized.

Member

bitprophet commented Nov 4, 2015

Merged to master for the 1.16 release (I still feel this belonged in a feature release and not bugfix, though thanks for targeting the bugfix branch by default :))

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment