bpo-31178: Avoid concatenating bytes with str in subprocess error #3066

ammaraskar · 2017-08-10T19:17:40Z

This particular fix makes the most sense in my opinion because a decode is only required if we're dealing with the subprocess output. If we're falling back to a default error because that fails, there's no need to generate an error message in bytes only to have it turned into a str later.
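
For context, here is a minimal sketch of the failure being fixed. It assumes `errpipe_data` is a bytearray read from the error pipe; the sample value and the two helper names are made up for illustration. Concatenating a bytes literal with the str returned by `repr()` raises `TypeError`, which hides the intended message.

```python
# Hypothetical sample of malformed data read from the error pipe.
errpipe_data = bytearray(b'junk')


def old_message(data):
    # Mirrors the pre-fix expression: bytes + str is not allowed.
    return b'Bad exception data from child: ' + repr(data)


def new_message(data):
    # A str literal keeps both operands the same type.
    return 'Bad exception data from child: ' + repr(data)


try:
    old_message(errpipe_data)
except TypeError as exc:
    print('old expression fails:', exc)

print(new_message(errpipe_data))
```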

https://bugs.python.org/issue31178

vstinner · 2017-08-10T21:29:46Z

Lib/subprocess.py

                except ValueError:
                    exception_name = b'SubprocessError'
                    hex_errno = b'0'
-                    err_msg = (b'Bad exception data from child: ' +
+                    err_msg = ('Bad exception data from child: ' +
                               repr(errpipe_data))


I suggest adding an "else:" block here and moving "err_msg = err_msg.decode(errors="surrogatepass")" there.

Personally I find this more readable: instead of a confusing else block on the try/except, I would much rather that the only time a decode happens is when we're dealing with the bytes. To get the behavior you asked for above, I think changing this to

err_msg = ('Bad exception data from child: ' + errpipe_data.decode(errors="surrogatepass"))

would be far simpler

It's not about readability but about correctness.

Sorry if I'm missing something but I don't see how this is functionally any different from having an else block.

With this code the sequence looks like:

Try to get the err_msg str by splitting and then decoding the errpipe_data

If that fails, err_msg str is errpipe_data decoded and wrapped with 'Bad exception data...'

with the else clause it'll be:

Try to get the err_msg bytes by splitting errpipe_data

If that fails, get the err_msg str by wrapping errpipe_data with 'Bad exception data...'

Decode err_msg into a str
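
The two sequences above can be sketched side by side. This is only an illustration under assumptions: the function names are hypothetical and the fallback message uses the `repr()` form from the diff at this point in the review. In this sketch the variants agree on well-formed data and on data without enough colons. They differ only when the split succeeds but the decode fails, because `UnicodeDecodeError` is a subclass of `ValueError` and is therefore caught by the inline variant.

```python
def parse_inline(errpipe_data):
    """Decode inside the try block."""
    try:
        exception_name, hex_errno, err_msg = errpipe_data.split(b':', 2)
        err_msg = err_msg.decode(errors="surrogatepass")
    except ValueError:
        exception_name = b'SubprocessError'
        hex_errno = b'0'
        err_msg = 'Bad exception data from child: ' + repr(errpipe_data)
    return exception_name, hex_errno, err_msg


def parse_with_else(errpipe_data):
    """Decode in an else block, outside the scope of the except clause."""
    try:
        exception_name, hex_errno, err_msg = errpipe_data.split(b':', 2)
    except ValueError:
        exception_name = b'SubprocessError'
        hex_errno = b'0'
        err_msg = 'Bad exception data from child: ' + repr(errpipe_data)
    else:
        err_msg = err_msg.decode(errors="surrogatepass")
    return exception_name, hex_errno, err_msg


for sample in (b'OSError:15:msg', b'junk'):
    print(parse_inline(sample) == parse_with_else(sample))
```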

vstinner · 2017-08-10T21:30:47Z

Lib/subprocess.py

@@ -1307,15 +1307,15 @@ def _execute_child(self, args, executable, preexec_fn, close_fds,
                try:
                    exception_name, hex_errno, err_msg = (
                            errpipe_data.split(b':', 2))
+                    err_msg = err_msg.decode(errors="surrogatepass")
                except ValueError:
                    exception_name = b'SubprocessError'
                    hex_errno = b'0'


Please build err_msg using errpipe_data.decode(errors="surrogatepass") to prevent b'...' from repr(bytes). Example:

err_msg = errpipe_data.decode(errors="surrogatepass")
err_msg = 'Bad exception data from child: %s' % err_msg

Wouldn't that change the current behavior, though? It's highly unlikely that someone is relying on the message looking like
Bad exception data from child: bytearray(b'OSError:asdf')

but this would change it to:
Bad exception data from child: OSError: asdf

If you think that's fine then I don't mind implementing it that way.

I expected something more like (with quotes):

Bad exception data from child: 'OSError: asdf'

instead of

Bad exception data from child: b'OSError: asdf'

Wouldn't that change the current behavior

Yes it does, but this error is very unlikely and if you read this code, you are already in a bad shape, so it doesn't matter :-D

That makes sense, I've changed it to print out the error itself instead of the repr of the bytearray representing it. I've opted not to put it in an else block because that hurts readability in my opinion.

What if errpipe_data is not valid UTF-8?

It is safest to assume that anything coming over the errpipe is merely bytes that we cannot interpret in a meaningful manner. This is an error path, and a rare one at that, as all POSIX platforms will be using _posixsubprocess which, being POSIX async-signal-safe, is unable to safely do any formatting or processing of error messages. I would not decode anything. Just leave it as bytes with a repr of the errpipe_data in it.

The details about the error still get across; it doesn't matter if there is an extra b'' in there or not. Feel free to call

err_msg = 'Bad exception data from child: ' + repr(bytes(errpipe_data))

It is a very bad thing to have an exception path that was going to give a useful error message fail with a Unicode encode/decode error instead. I would not call decode at all in that handler. The above line is the safe path. The call to bytes() gets rid of bytearray() showing up in the repr. It leaves an ugly "b''" in there, but so what?
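
A small sketch of the difference the `bytes()` call makes, assuming `errpipe_data` arrives as a bytearray (the sample bytes are made up). No decoding happens, so arbitrary junk cannot raise a Unicode error here.

```python
# Hypothetical undecodable junk read from the error pipe.
errpipe_data = bytearray(b'\xff\x00\xde\xad')

with_bytearray = 'Bad exception data from child: ' + repr(errpipe_data)
with_bytes = 'Bad exception data from child: ' + repr(bytes(errpipe_data))

print(with_bytearray)  # the repr mentions bytearray(...)
print(with_bytes)      # the repr is a plain b'...' literal
```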

I concur with @gpshead.

serhiy-storchaka

Needs tests.

serhiy-storchaka · 2017-08-11T08:36:09Z

Lib/subprocess.py

@@ -1307,15 +1307,15 @@ def _execute_child(self, args, executable, preexec_fn, close_fds,
                try:
                    exception_name, hex_errno, err_msg = (
                            errpipe_data.split(b':', 2))
+                    err_msg = err_msg.decode(errors="surrogatepass")


Why use errors="surrogatepass"?

Because that's how data in the format "exception_name:hex_errno:errmsg" is written: 4d07804#diff-cc136486b4a8e112e64b57436a0619ebR1207

This is however a completely valid concern in the except case. If the exception does not fit the standard colon delimited format then we can't make any assumptions about its encoding. Which is probably why it was using repr(errpipe_data) before.

serhiy-storchaka · 2017-08-11T08:37:45Z

Lib/subprocess.py

@@ -1307,15 +1307,15 @@ def _execute_child(self, args, executable, preexec_fn, close_fds,
                try:
                    exception_name, hex_errno, err_msg = (
                            errpipe_data.split(b':', 2))
+                    err_msg = err_msg.decode(errors="surrogatepass")
                except ValueError:
                    exception_name = b'SubprocessError'
                    hex_errno = b'0'


What if errpipe_data is not valid UTF-8?

vstinner · 2017-08-16T09:32:13Z

Lib/subprocess.py

@@ -1307,15 +1307,15 @@ def _execute_child(self, args, executable, preexec_fn, close_fds,
                try:


To simplify the code, I suggest decoding the pipe data before the try:

errpipe_data = errpipe_data.decode(errors="surrogatepass")

Both code paths decode anyway, so this would let the code work on Unicode rather than bytes, which is more convenient.

But decoding can fail itself.

But decoding can fail itself.

I don't see how .decode(errors="surrogatepass") can fail. MemoryError, maybe? If you are out of memory, everything will fail anyway :-)

See the current code, we already decode, I only suggest to move the code.

For example:

b"\xff".decode(errors="surrogatepass")

@serhiy-storchaka, @ammaraskar: Oh sorry, I was thinking of surrogateescape, whereas it's surrogatepass. Please replace .decode(errors="surrogatepass") with .decode(errors="surrogateescape"), so decoding cannot fail.
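
The difference between the two error handlers can be checked with the byte from the example above. This is a minimal sketch using the default UTF-8 codec; the variable names are illustrative only.

```python
junk = b'\xff'

# surrogatepass only lets encoded surrogate code points through;
# an invalid start byte such as 0xFF still fails to decode.
try:
    junk.decode(errors='surrogatepass')
    surrogatepass_failed = False
except UnicodeDecodeError:
    surrogatepass_failed = True

# surrogateescape maps each undecodable byte to a lone surrogate,
# so decoding succeeds and the original bytes can be recovered.
escaped = junk.decode(errors='surrogateescape')
round_tripped = escaped.encode(errors='surrogateescape')

print(surrogatepass_failed, ascii(escaped), round_tripped)
```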

Have you read my comment here? #3066 (comment)

In the case where _posixsubprocess writes to the errpipe, there should be nothing in there that can't be decoded. And if there is junk in there that wasn't written by _posixsubprocess, the decode will fail and it'll print out the appropriate message saying Bad exception data from child: b'UNDECODABLE:MESS:HERE'

vstinner · 2017-08-16T09:32:33Z

Lib/subprocess.py

-                    err_msg = (b'Bad exception data from child: ' +
-                               repr(errpipe_data))
+                    err_msg = errpipe_data.decode(errors="surrogatepass")
+                    err_msg = "Bad exception data from child: '%s'" % err_msg


Please keep repr() here: replace '%s' with %r.

bedevere-bot · 2017-08-16T09:32:40Z

A Python core developer, haypo, has requested some changes be
made to your pull request before we can consider merging it. If you
could please address their requests along with any other requests in
other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment
on this pull request containing the phrase I didn't expect the Spanish Inquisition!
I will then notify haypo along with any other core developers
who have left a review that you're ready for them to take another look
at this pull request.

serhiy-storchaka · 2017-08-16T10:01:49Z

Lib/subprocess.py

@@ -1307,15 +1307,15 @@ def _execute_child(self, args, executable, preexec_fn, close_fds,
                try:
                    exception_name, hex_errno, err_msg = (
                            errpipe_data.split(b':', 2))
+                    err_msg = err_msg.decode(errors="surrogatepass")
                except ValueError:
                    exception_name = b'SubprocessError'
                    hex_errno = b'0'


I concur with @gpshead.

serhiy-storchaka · 2017-08-16T10:07:48Z

Lib/subprocess.py

@@ -1307,15 +1307,15 @@ def _execute_child(self, args, executable, preexec_fn, close_fds,
                try:


But decoding can fail itself.

bedevere-bot · 2017-08-16T10:09:30Z

A Python core developer, serhiy-storchaka, has requested some changes be
made to your pull request before we can consider merging it. If you
could please address their requests along with any other requests in
other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment
on this pull request containing the phrase I didn't expect the Spanish Inquisition!
I will then notify serhiy-storchaka along with any other core developers
who have left a review that you're ready for them to take another look
at this pull request.

ammaraskar · 2017-08-17T05:30:03Z

After taking a closer look, I've found that the pure-Python implementation that encoded the error message with surrogatepass above has been removed: 59fd1bf

Yet the error handling code wasn't updated. Looking at the current code that writes to the errpipe (Windows doesn't support this): https://github.com/python/cpython/blob/master/Modules/_posixsubprocess.c#L522

It looks like all the strings are plain C string literals and consequently just ASCII, so using decode() should be safe: UTF-8 is a superset of ASCII, and this potentially allows for more complicated messages in the future.

I've also added some tests at Serhiy's suggestion, since this problem was here for a while but never surfaced because, as gpshead mentioned, this is a rare error path.

I didn't expect the Spanish Inquisition!

bedevere-bot · 2017-08-17T05:30:05Z

Nobody expects the Spanish Inquisition!

@serhiy-storchaka, @Haypo: please review the changes made to this pull request.

vstinner · 2017-08-17T09:03:28Z

Lib/subprocess.py

@@ -1307,15 +1307,15 @@ def _execute_child(self, args, executable, preexec_fn, close_fds,
                try:


But decoding can fail itself.

I don't see how .decode(errors="surrogatepass") can fail. MemoryError, maybe? If you are out of memory, everything will fail anyway :-)

See the current code, we already decode, I only suggest to move the code.

vstinner · 2017-08-17T09:05:23Z

Lib/subprocess.py

+                    # The encoding here should match the encoding
+                    # written in by the subprocess implementations
+                    # like _posixsubprocess
+                    err_msg = err_msg.decode()


Please keep .decode(errors="surrogatepass"). The modified code is called when something goes wrong. I would prefer to avoid a decoding error if possible. It can happen that something writes junk into the pipe.

But why do you expect that this junk is UTF-8 with allowed surrogates?

Yeah, I'm not sure where you're expecting this junk from. If the split managed to succeed, then the data was likely written by _posixsubprocess here: https://github.com/python/cpython/blob/master/Modules/_posixsubprocess.c#L522
If it was, then there wouldn't be any extra junk, so the decode should never fail unless the junk somehow manages to contain 2 colon characters by accident.

unless the junk somehow manages to contain 2 colon characters by accident

I've added UnicodeError to the except block there, so in case this actually happens it'll fall through and simply put the repr of the bytes in the error message.
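
A sketch of the fallback described here, assuming the parsing logic has roughly the shape shown in the diffs of this review. The function name `parse_errpipe` and the sample junk are hypothetical. Junk that happens to contain two colons splits cleanly, fails to decode, and still ends up in the repr-based message.

```python
def parse_errpipe(errpipe_data):
    """Split and decode error-pipe data, falling back to a repr on failure."""
    try:
        exception_name, hex_errno, err_msg = errpipe_data.split(b':', 2)
        err_msg = err_msg.decode()
    except (ValueError, UnicodeError):
        exception_name = b'SubprocessError'
        hex_errno = b'0'
        err_msg = 'Bad exception data from child: {!r}'.format(
            bytes(errpipe_data))
    return exception_name, hex_errno, err_msg


# Junk with two colons: the split succeeds, the decode does not.
junk = bytearray(b'\xff:\x00:\xfe')
print(parse_errpipe(junk))
```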

vstinner · 2017-08-17T09:06:04Z

Lib/subprocess.py

-                    err_msg = (b'Bad exception data from child: ' +
-                               repr(errpipe_data))
+                    err_msg = "Bad exception data from child: {!s}".format(
+                                  bytes(errpipe_data))


I asked you twice to keep the repr(), you didn't do it. Why? Can you at least explain?

I don't understand why you cast a bytes string to bytes: bytes(errpipe_data).

I asked you twice to keep the repr(), you didn't do it. Why? Can you at least explain?

@ammaraskar, without the repr() the code emits a bytes warning when Python is run with the -b option.

I don't understand why you cast a bytes string to bytes: bytes(errpipe_data).

This is explained in the comment by @gpshead.

Sorry, that was supposed to be "!r", not "!s".
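
The warning mentioned above can be observed from a child interpreter. This sketch uses `-bb`, which turns the bytes warning into an error, so the difference shows up in the exit code. The helper name `run_with_bb` is made up for illustration.

```python
import subprocess
import sys


def run_with_bb(snippet):
    """Run snippet in a child interpreter with -bb and return its exit code."""
    result = subprocess.run([sys.executable, '-bb', '-c', snippet],
                            stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    return result.returncode


# "!s" calls str() on the bytes object, which triggers the bytes warning.
str_conversion = run_with_bb("'{!s}'.format(b'data')")
# "!r" calls repr(), which is always allowed.
repr_conversion = run_with_bb("'{!r}'.format(b'data')")

print(str_conversion, repr_conversion)
```

Without `-b` both conversions produce the same text for a bytes object, which is why the problem is easy to miss.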

vstinner · 2017-08-17T09:07:49Z

Lib/test/test_subprocess.py

+    def test_exception_errpipe_bad_data(self, fork_exec, destructor):
+        """Test error passing done through errpipe_write where its not
+        in the expected format"""
+        error_data = b"\xFF\x00\xDE\xAD"


I suggest using a simpler string which doesn't contain ":". For example: "bad data".

The reason I chose this is because it is almost certainly not valid in any text encoding. So if anyone accidentally attempts to decode data in the bad path, this will start failing. Why do you suggest using a simple ASCII string?

bedevere-bot · 2017-08-17T09:08:55Z

A Python core developer, haypo, has requested some changes be
made to your pull request before we can consider merging it. If you
could please address their requests along with any other requests in
other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment
on this pull request containing the phrase I didn't expect the Spanish Inquisition!
I will then notify haypo along with any other core developers
who have left a review that you're ready for them to take another look
at this pull request.

ammaraskar · 2017-08-17T11:27:55Z

I didn't expect the Spanish Inquisition!

(also mentioning @gpshead since the bot doesn't)

bedevere-bot · 2017-08-17T11:27:57Z

Nobody expects the Spanish Inquisition!

@serhiy-storchaka, @Haypo: please review the changes made to this pull request.

serhiy-storchaka · 2017-08-17T11:56:04Z

Lib/subprocess.py

+                    # written in by the subprocess implementations
+                    # like _posixsubprocess
+                    err_msg = err_msg.decode()
+                except (ValueError, UnicodeError):


except ValueError: is enough. UnicodeError is a subclass of ValueError.
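
The class relationship is easy to confirm. In this short sketch (the helper name `classify` is hypothetical), a decode failure is caught by a plain `except ValueError`.

```python
def classify(data):
    """Return the decoded text, or the name of the exception that was caught."""
    try:
        return data.decode()
    except ValueError as exc:
        # UnicodeDecodeError -> UnicodeError -> ValueError
        return type(exc).__name__


print(issubclass(UnicodeError, ValueError))
print(classify(b'ok'), classify(b'\xff'))
```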

serhiy-storchaka · 2017-08-17T12:04:33Z

Lib/subprocess.py

                    exception_name = b'SubprocessError'
                    hex_errno = b'0'
-                    err_msg = (b'Bad exception data from child: ' +
-                               repr(errpipe_data))
+                    err_msg = "Bad exception data from child: {!r}".format(


You could use f-string here and fit the expression in one line.

In any case, please use single quotes. They are the default quotes for strings. I suppose you used double quotes because the string contained single quotes around %s, but those are gone now.

I don't think it's possible to fit it on one line with the conversion to bytes, since it then turns into

err_msg = f'Bad exception data from child: {bytes(errpipe_data)!r}'

which is also too long.
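
A quick check of both points, assuming the 20-space indentation shown in the diff and the 79-column limit from PEP 8. The sample data is made up. The two spellings build the same message, and the one-line f-string statement exceeds the limit at that indentation.

```python
errpipe_data = bytearray(b'junk')  # hypothetical sample data

with_format = 'Bad exception data from child: {!r}'.format(
    bytes(errpipe_data))
with_fstring = f'Bad exception data from child: {bytes(errpipe_data)!r}'

# The one-line f-string statement as source text, at 20 spaces of indentation.
source_line = (' ' * 20 +
               "err_msg = f'Bad exception data from child: "
               "{bytes(errpipe_data)!r}'")

print(with_format == with_fstring, len(source_line))
```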

serhiy-storchaka · 2017-08-17T12:10:35Z

Lib/test/test_subprocess.py

+        fork_exec.side_effect = proper_error
+
+        with self.assertRaises(IsADirectoryError):
+            subprocess.Popen(["cmd"])


What is "cmd"? If this is supposed to be the name of a non-existing command, be aware that cmd.exe exists on Windows (and may be available when running tests in a POSIX subsystem on Windows).

Yeah, it's supposed to be a nonexistent command. I'll replace it with something that better fits that intent.

ammaraskar · 2017-08-19T15:49:21Z

@Haypo ping

gpshead · 2017-08-20T06:04:48Z

Lib/test/test_subprocess.py

+        """Test error passing done through errpipe_write in the good case"""
+        def proper_error(*args):
+            errpipe_write = args[13]
+            # 15 is the unix error code for EISDIR: 'is a directory'


use errno.EISDIR instead of hard coding a magic value.

gpshead · 2017-08-20T06:04:52Z

Lib/test/test_subprocess.py

+    # the destructor. An alternative would be to set _child_created to
+    # False before the destructor is called but there is no easy way
+    # to do that
+    @mock.patch.object(subprocess.Popen, "__del__")


You can never control when a __del__ destructor is going to be called. So this mock may well have been undone before it does get called by the gc. Never mock out anything on a class that is called outside of the test's control flow such as __del__.

If you want to replace the __del__ behavior with something different because of the way a test works, one way to do that is to use a subclass that overrides it and does not call the parent class's method.

Good point, I didn't think about the fact that this would tie the test to an implementation detail. The subclassing solution sounds good.
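
The subclassing approach can be sketched with a stand-in class. `Resource` and `ResourceNoDestructor` are hypothetical names used only for illustration; a test for this PR would presumably subclass `subprocess.Popen` in the same way. The override stays in effect whenever the garbage collector runs, unlike a mock that is undone when the test returns.

```python
import gc

cleanup_calls = []


class Resource:
    """Stand-in for a class whose destructor has side effects."""

    def __del__(self):
        cleanup_calls.append(type(self).__name__)


class ResourceNoDestructor(Resource):
    """Test-only subclass: overrides __del__ without calling the parent."""

    def __del__(self):
        pass


obj = ResourceNoDestructor()
del obj
gc.collect()

print(cleanup_calls)  # the parent's destructor was never invoked
```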

bedevere-bot · 2017-08-20T06:09:34Z

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I didn't expect the Spanish Inquisition!. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

And if you don't make the requested changes, you will be put in the comfy chair!

ammaraskar · 2017-08-20T13:12:01Z

I didn't expect the Spanish Inquisition!

bedevere-bot · 2017-08-20T13:12:03Z

Nobody expects the Spanish Inquisition!

@gpshead, @serhiy-storchaka, @Haypo: please review the changes made to this pull request.

serhiy-storchaka · 2017-08-28T08:20:11Z

Lib/test/test_subprocess.py

-            os.write(errpipe_write, b"OSError:15:")
+            # Write the hex for the error code EISDIR: 'is a directory'
+            err_code = '{:x}'.format(errno.EISDIR).encode()
+            os.write(errpipe_write, b"OSError:" + err_code + b":")


You could write it as b"OSError:%x:" % errno.EISDIR.
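
Both spellings produce the same bytes, and the hex field parses back to the original errno value. A minimal sketch; the variable names are illustrative only.

```python
import errno

# Three-step version from the diff above.
verbose = b"OSError:" + '{:x}'.format(errno.EISDIR).encode() + b":"
# bytes %-formatting does the same in one expression.
compact = b"OSError:%x:" % errno.EISDIR

# The reading side splits on ':' and parses the errno as hexadecimal.
exception_name, hex_errno, err_msg = compact.split(b':', 2)
decoded_errno = int(hex_errno, 16)

print(verbose == compact, decoded_errno == errno.EISDIR)
```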

gpshead · 2017-08-29T13:23:05Z

This change looks good, it just needs a NEWS entry (add a file to Misc/NEWS.d/next/Library per the README.rst instructions in there).

ammaraskar · 2017-09-05T18:56:37Z

@gpshead added news entry

issues resolved

miss-islington · 2017-09-06T06:41:33Z

🐍🍒⛏🤖 Thanks @ammaraskar for the PR, and @gpshead for merging it 🌮🎉. I'm working now to backport this PR to: 3.6.

…or (pythonGH-3066) Avoid concatenating bytes with str in the typically rare subprocess error path (exec failed). Includes a mock based unittest to exercise the codepath. (cherry picked from commit 3fc499b)

…or (GH-3066) (#3388) Avoid concatenating bytes with str in the typically rare subprocess error path (exec failed). Includes a mock based unittest to exercise the codepath. (cherry picked from commit 3fc499b)

ammaraskar requested a review from gpshead as a code owner August 10, 2017 19:17

the-knights-who-say-ni added the CLA signed label Aug 10, 2017

vstinner reviewed Aug 10, 2017

bpo-31178: Avoid concatenating bytes with str in subprocess error

8f8b86a

ammaraskar force-pushed the subprocess-error branch from 6c0717b to 8f8b86a Compare August 10, 2017 21:56

serhiy-storchaka requested changes Aug 11, 2017

vstinner requested changes Aug 16, 2017

bedevere-bot added the awaiting changes label Aug 16, 2017

serhiy-storchaka requested changes Aug 16, 2017

ammaraskar added 2 commits August 16, 2017 09:36

Do not attempt to decode in exception path

c277e29

Add tests and remove surrogatepass

df3e1d3

bedevere-bot added awaiting change review and removed awaiting changes labels Aug 17, 2017

vstinner previously requested changes Aug 17, 2017

bedevere-bot added awaiting changes and removed awaiting change review labels Aug 17, 2017

Use repr instead of str in bad path

7410ae2

bedevere-bot added awaiting change review and removed awaiting changes labels Aug 17, 2017

Add UnicodeError to except in case the junk contains 2 colons

f007582

serhiy-storchaka reviewed Aug 17, 2017

Changes for Serhiy's review

f061ff1

serhiy-storchaka approved these changes Aug 17, 2017

gpshead requested changes Aug 20, 2017

bedevere-bot added awaiting changes and removed awaiting merge labels Aug 20, 2017

ammaraskar added 2 commits August 20, 2017 08:53

Avoid harcoding EISDIR errno

7a36648

Mock __del__ with a subclass instead of mocking the method

2707615

bedevere-bot added awaiting change review and removed awaiting changes labels Aug 20, 2017

ammaraskar mentioned this pull request Aug 20, 2017

Don't re-ping reviewers who have already approved. python/bedevere#54

Closed

serhiy-storchaka reviewed Aug 28, 2017

Add news entry

8634c2b

gpshead added the needs backport to 3.6 label Sep 6, 2017

gpshead approved these changes Sep 6, 2017

bedevere-bot added awaiting merge and removed awaiting change review labels Sep 6, 2017

gpshead merged commit 3fc499b into python:master Sep 6, 2017

gpshead mentioned this pull request Sep 6, 2017

[3.6] bpo-31178: Avoid concatenating bytes with str in subprocess error #3388

Merged

Mariatta mentioned this pull request Sep 6, 2017

Notify when backport PR could not be created automatically python/miss-islington#8

Closed

gpshead removed the needs backport to 3.6 label Sep 6, 2017

Mariatta removed the awaiting merge label Oct 8, 2017
