Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bpo-31178: Avoid concatenating bytes with str in subprocess error #3066

Merged
merged 9 commits into from
Sep 6, 2017

Conversation

ammaraskar
Copy link
Member

@ammaraskar ammaraskar commented Aug 10, 2017

This particular fix makes most sense in my opinion because a decode is only required if we're dealing with the subprocess output. If we're falling back to a default error because that fails there's no need to generate an error message in bytes only to have it be turned it into a str later.

https://bugs.python.org/issue31178

except ValueError:
exception_name = b'SubprocessError'
hex_errno = b'0'
err_msg = (b'Bad exception data from child: ' +
err_msg = ('Bad exception data from child: ' +
repr(errpipe_data))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest to add an "else:" block here and move "err_msg = err_msg.decode(errors="surrogatepass")" there.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally I find this more readable, instead of a confusing else block to the try/except I would much rather that the only time a decode happens is when we're dealing with the bytes. In order to behavior you asked for above I think changing this to

err_msg = ('Bad exception data from child: ' +
            errpipe_data.decode(errors="surrogatepass"))

would be far simpler

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not able readability but about correctness.

Copy link
Member Author

@ammaraskar ammaraskar Aug 10, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry if I'm missing something but I don't see how this is functionally any different from having an else block.

With this code the sequence looks like:

  1. Try to get the err_msg str by splitting and then decoding the errpipe_data
    • If that fails, err_msg str is errpipe_data decoded and wrapped with 'Bad exception data...'

with the else clause it'll be:

  1. Try to get the err_msg bytes by splitting errpipe_data
    • If that fails, get the err_msg str by wrapping errpipe_data with 'Bad exception data...'
  2. Decode err_msg into a str

@@ -1307,15 +1307,15 @@ def _execute_child(self, args, executable, preexec_fn, close_fds,
try:
exception_name, hex_errno, err_msg = (
errpipe_data.split(b':', 2))
err_msg = err_msg.decode(errors="surrogatepass")
except ValueError:
exception_name = b'SubprocessError'
hex_errno = b'0'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please build err_msg using errpipe_data.decode(errors="surrogatepass") to prevent b'...' from repr(bytes). Example:

err_msg = errpipe_data.decode(errors="surrogatepass")
err_msg = 'Bad exception data from child: %s' % err_msg

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't that change the current behavior though, highly unlikely that someone is relying on the message looking like
Bad exception data from child: bytearray(b'OSError:asdf')

but this would change it to:
Bad exception data from child: OSError: asdf

If you think that's fine then I don't mind implementing it that way.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I expect more something like (with quotes):

Bad exception data from child: 'OSError: asdf'

instead of

Bad exception data from child: b'OSError: asdf'

Wouldn't that change the current behavior

Yes it does, but this error is very unlikely and if you read this code, you are already in a bad shape, so it doesn't matter :-D

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense, I've changed it to print out the error itself instead of the repr of the bytearray representing it. I've opted not to put it in an else block because that hurts readability in my opinion.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if errpipe_data is not valid UTF-8?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is safest to assume that anything coming over the errpipe is merely bytes that we cannot interpret in a meaningful manner. This is an error path, and a rare one at that as all POSIX platforms will be using _posixsubprocess which - being posix async signal safe - is unable to safely do any formatting or processing of error messages. I would not decode anything. just leave it as bytes with a repr of the errpipe_data in it.

The details about the error still get across, it doesn't matter if there is an extra b'' in there or not. feel free to call

err_msg = 'Bad exception data from child: ' + repr(bytes(errpipe_data))

It is a very bad thing to have an exception path that was going to give a useful error message instead fail with a unicode encode/decode error. I would not call decode at all in tha handler. The above line is the safe path. the call to bytes() gets rid of bytearrary() showing up in the repr. It leaves an ugly "b''" in there - but so what?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I concur with @gpshead.

Copy link
Member

@serhiy-storchaka serhiy-storchaka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needed tests.

@@ -1307,15 +1307,15 @@ def _execute_child(self, args, executable, preexec_fn, close_fds,
try:
exception_name, hex_errno, err_msg = (
errpipe_data.split(b':', 2))
err_msg = err_msg.decode(errors="surrogatepass")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why use errors="surrogatepass"?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because that's how data in the format "exception_name:hex_errno:errmsg" is written: 4d07804#diff-cc136486b4a8e112e64b57436a0619ebR1207

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is however a completely valid concern in the except case. If the exception does not fit the standard colon delimited format then we can't make any assumptions about its encoding. Which is probably why it was using repr(errpipe_data) before.

@@ -1307,15 +1307,15 @@ def _execute_child(self, args, executable, preexec_fn, close_fds,
try:
exception_name, hex_errno, err_msg = (
errpipe_data.split(b':', 2))
err_msg = err_msg.decode(errors="surrogatepass")
except ValueError:
exception_name = b'SubprocessError'
hex_errno = b'0'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if errpipe_data is not valid UTF-8?

@@ -1307,15 +1307,15 @@ def _execute_child(self, args, executable, preexec_fn, close_fds,
try:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To simplify the code, I suggest to decode pipe data before the try:

errpipe_data = errpipe_data.decode(errors="surrogatepass")

Since both code paths decode anyway. It would allow to work on Unicode rather than bytes, which is more convenient.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But decoding can fail itself.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But decoding can fail itself.

I don't see how .decode(errors="surrogatepass") can fail. MemoryError, maybe? If you are out of memory, everything will fail anyway :-)

See the current code, we already decode, I only suggest to move the code.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For example:

b"\xff".decode(errors="surrogatepass")

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@serhiy-storchaka, @ammaraskar: Oh sorry, I was thinking at surrogateescape, whereas it's surrogatepeass. Please replace .decode('surrogatepass') with .decode('surrogateescape'), so decoding cannot fail.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you read my comment here? #3066 (comment)

In the case where _posixsubprocess writes to the errpipe, there should be nothing in there that can't be decoded. And if there is junk in there that wasn't written by _posixsubproccess, the decode will fail and it'll print out the appropriate message saying Bad exception data from child: b'UNDECODABLE:MESS:HERE'

err_msg = (b'Bad exception data from child: ' +
repr(errpipe_data))
err_msg = errpipe_data.decode(errors="surrogatepass")
err_msg = "Bad exception data from child: '%s'" % err_msg
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please keep repr() here: replace '%s' with %r.

@bedevere-bot
Copy link

A Python core developer, haypo, has requested some changes be
made to your pull request before we can consider merging it. If you
could please address their requests along with any other requests in
other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment
on this pull request containing the phrase I didn't expect the Spanish Inquisition!
I will then notify haypo along with any other core developers
who have left a review that you're ready for them to take another look
at this pull request.

@@ -1307,15 +1307,15 @@ def _execute_child(self, args, executable, preexec_fn, close_fds,
try:
exception_name, hex_errno, err_msg = (
errpipe_data.split(b':', 2))
err_msg = err_msg.decode(errors="surrogatepass")
except ValueError:
exception_name = b'SubprocessError'
hex_errno = b'0'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I concur with @gpshead.

@@ -1307,15 +1307,15 @@ def _execute_child(self, args, executable, preexec_fn, close_fds,
try:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But decoding can fail itself.

@bedevere-bot
Copy link

A Python core developer, serhiy-storchaka, has requested some changes be
made to your pull request before we can consider merging it. If you
could please address their requests along with any other requests in
other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment
on this pull request containing the phrase I didn't expect the Spanish Inquisition!
I will then notify serhiy-storchaka along with any other core developers
who have left a review that you're ready for them to take another look
at this pull request.

@ammaraskar
Copy link
Member Author

After taking a closer look, I've found that the pure python implementation that encoded the error message with surrogatepass above has been removed: 59fd1bf

Yet the error handling code wasn't updated. Looking at the current code which writes to the errpipe (windows doesn't support this): https://github.com/python/cpython/blob/master/Modules/_posixsubprocess.c#L522

It looks like all the strings are purely in C and consequently just ASCII, so using a decode() should be safe since UTF-8 should interop with the ascii there and potentially allow for more complicated messages in the future.

I've also added some tests on serhiy's suggestion since this problem was here for a while but never surfaced because like gpshead mentioned this is a rare error path.

I didn't expect the Spanish Inquisition!

@bedevere-bot
Copy link

Nobody expects the Spanish Inquisition!

@serhiy-storchaka, @Haypo: please review the changes made to this pull request.

@@ -1307,15 +1307,15 @@ def _execute_child(self, args, executable, preexec_fn, close_fds,
try:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But decoding can fail itself.

I don't see how .decode(errors="surrogatepass") can fail. MemoryError, maybe? If you are out of memory, everything will fail anyway :-)

See the current code, we already decode, I only suggest to move the code.

# The encoding here should match the encoding
# written in by the subprocess implementations
# like _posixsubprocess
err_msg = err_msg.decode()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please keep .decode(errors="surrogatepass"). The modified code is called when something goes wrong. I would prefer to avoid a decoding error if possible. It can happen that something writes junk into the pipe.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But why you expect that this junk is UTF-8 with allowed surrogates?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I'm not sure where you're expecting this junk from. If the split managed to succeed then the data is likely written in by _posixsubprocess here https://github.com/python/cpython/blob/master/Modules/_posixsubprocess.c#L522
If it is then there wouldn't be any extra junk and so the decode should never fail unless the junk somehow manages to contain 2 colon characters by accident.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unless the junk somehow manages to contain 2 colon characters by accident

I've added UnicodeError to the except block there, so in case this actually happens it'll fall through and simply put the repr of the bytes in the error message.

err_msg = (b'Bad exception data from child: ' +
repr(errpipe_data))
err_msg = "Bad exception data from child: {!s}".format(
bytes(errpipe_data))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I asked you twice to keep the repr(), you didn't do it. Why? Can you at least explain?

I don't understand why you cast a bytes string to bytes: bytes(errpipe_data).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I asked you twice to keep the repr(), you didn't do it. Why? Can you at least explain?

@ammaraskar, without the repr() the code emits a bytes warning when run Python with the -b option.

I don't understand why you cast a bytes string to bytes: bytes(errpipe_data).

This is explained in the comment by @gpshead.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry that was supposed to be "!r" not "!s"

def test_exception_errpipe_bad_data(self, fork_exec, destructor):
"""Test error passing done through errpipe_write where its not
in the expected format"""
error_data = b"\xFF\x00\xDE\xAD"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest to use a simpler string which doesn't contain ":". For example: "bad data".

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason I chose this is because it is almost certainly not valid in any text encoding. So if anyone accidentally attempts to decode data in the bad path, this will start failing. Why do you suggest using a simple ASCII string?

@bedevere-bot
Copy link

A Python core developer, haypo, has requested some changes be
made to your pull request before we can consider merging it. If you
could please address their requests along with any other requests in
other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment
on this pull request containing the phrase I didn't expect the Spanish Inquisition!
I will then notify haypo along with any other core developers
who have left a review that you're ready for them to take another look
at this pull request.

@ammaraskar
Copy link
Member Author

I didn't expect the Spanish Inquisition!

(also mentioning @gpshead since the bot doesn't)

@bedevere-bot
Copy link

Nobody expects the Spanish Inquisition!

@serhiy-storchaka, @Haypo: please review the changes made to this pull request.

# written in by the subprocess implementations
# like _posixsubprocess
err_msg = err_msg.decode()
except (ValueError, UnicodeError):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

except ValueError: is enough. UnicodeError is a subclass of ValueError.

exception_name = b'SubprocessError'
hex_errno = b'0'
err_msg = (b'Bad exception data from child: ' +
repr(errpipe_data))
err_msg = "Bad exception data from child: {!r}".format(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could use f-string here and fit the expression in one line.

In any case please use single quotes. They are default quotes for string. I suppose you used double quotes because the string contained single quotes around %s. But now they are gone.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think its possible to fit in one line with the conversion to bytes since then it turns into

    err_msg = f'Bad exception data from child: {bytes(errpipe_data)!r}'

which is also too long.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed.

fork_exec.side_effect = proper_error

with self.assertRaises(IsADirectoryError):
subprocess.Popen(["cmd"])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is "cmd"? If this is purposed to be a name of non-existing command, be aware, that cmd.exe exists on Windows (and may be available when run tests in Posix subsystem on Windows).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah it's supposed to be a non existentant command. I'll replace it with something to better fit that intent

@ammaraskar
Copy link
Member Author

@Haypo ping

"""Test error passing done through errpipe_write in the good case"""
def proper_error(*args):
errpipe_write = args[13]
# 15 is the unix error code for EISDIR: 'is a directory'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use errno.EISDIR instead of hard coding a magic value.

# the destructor. An alternative would be to set _child_created to
# False before the destructor is called but there is no easy way
# to do that
@mock.patch.object(subprocess.Popen, "__del__")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can never control when a __del__ destructor is going to be called. So this mock may well have been undone before it does get called by the gc. Never mock out anything on a class that is called outside of the test's control flow such as __del__.

If you want to replace __del__ behavior something different due to the way a test works, one way to do that is using a subclass that overrides it and does not call the parent class's method.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, I didn't think about the fact that this would tie the test to an implementation detail. The subclassing solution sounds good.

@bedevere-bot
Copy link

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I didn't expect the Spanish Inquisition!. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

And if you don't make the requested changes, you will be put in the comfy chair!

@ammaraskar
Copy link
Member Author

I didn't expect the Spanish Inquisition!

@bedevere-bot
Copy link

Nobody expects the Spanish Inquisition!

@gpshead, @serhiy-storchaka, @Haypo: please review the changes made to this pull request.

os.write(errpipe_write, b"OSError:15:")
# Write the hex for the error code EISDIR: 'is a directory'
err_code = '{:x}'.format(errno.EISDIR).encode()
os.write(errpipe_write, b"OSError:" + err_code + b":")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could write it as b"OSError:%x:" % errno.EISDIR.

@gpshead
Copy link
Member

gpshead commented Aug 29, 2017

This change looks good, it just needs a NEWS entry (add a file to Misc/NEWS.d/next/Library per the README.rst instructions in there).

@ammaraskar
Copy link
Member Author

@gpshead added news entry

@gpshead gpshead merged commit 3fc499b into python:master Sep 6, 2017
@miss-islington
Copy link
Contributor

🐍🍒⛏🤖 Thanks @ammaraskar for the PR, and @gpshead for merging it 🌮🎉.I'm working now to backport this PR to: 3.6.

gpshead pushed a commit to gpshead/cpython that referenced this pull request Sep 6, 2017
…or (pythonGH-3066)

Avoid concatenating bytes with str in the typically rare subprocess error path (exec failed). Includes a mock based unittest to exercise the codepath.
(cherry picked from commit 3fc499b)
gpshead added a commit that referenced this pull request Sep 6, 2017
…or (GH-3066) (#3388)

Avoid concatenating bytes with str in the typically rare subprocess error path (exec failed). Includes a mock based unittest to exercise the codepath.
(cherry picked from commit 3fc499b)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants