New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix encoding of workunits under pantsd #6505

Merged
merged 2 commits into from Sep 17, 2018

Conversation

Projects
None yet
3 participants
@stuhood
Copy link
Member

stuhood commented Sep 14, 2018

Problem

On python2 under pantsd, workunits containing unicode are currently not encoded to bytes because the synthetic stdout that is created in that environment does not have "std" in its name.

Solution

Ensure that all output files in use with PlainTextReporter accept writes of raw bytes, and then universally encode non-binary_type values. Mark an integration test @ensure_daemon to cover this.

@stuhood stuhood added bug needs-cherrypick and removed bug labels Sep 14, 2018

@stuhood stuhood added this to the 1.10.x milestone Sep 14, 2018

@Eric-Arellano
Copy link
Contributor

Eric-Arellano left a comment

Are you sure this works with Python 3? I'm concerned that in Py3 sys.stdout expects unicode, not bytes.

Iirc, self.settings.outfile can be sys.stdout, right? Maybe I'm wrong there though, and it's only ever set in reporting.py like you updated.

We don't have integration tests using Py3 yet on CI, so I think we'll have to test Python 3 locally for the moment.

# TODO(python3port): Figure out if there's a better way to do this, like opening `sys.stderr` in different mode.
# Part of https://github.com/pantsbuild/pants/issues/6071.
if PY2 and 'std' in str(self.settings.outfile):
if not isinstance(s, binary_type):

This comment has been minimized.

@Eric-Arellano

Eric-Arellano Sep 14, 2018

Contributor

s = ensure_binary(s) preferred

@stuhood

This comment has been minimized.

Copy link
Member

stuhood commented Sep 17, 2018

Are you sure this works with Python 3? I'm concerned that in Py3 sys.stdout expects unicode, not bytes.

I am not... the integration test covers usage under python2, but although the contrib shards themselves run under 2 and 3, the integration tests there are going to continue to fork a copy of pants that uses python2.

But I currently need to fix a production bug to unblock 1.10.x, and so I can't account for cases that we don't have tested. I expect that making this case work for python3 will just involve passing sys.stdout.buffer as outfile/errfile.

@Eric-Arellano
Copy link
Contributor

Eric-Arellano left a comment

Okay, fair enough.

Would you be comfortable adding a brief comment suggesting to use sys.stdout.buffer if Py3 is not working and linking to this PR? Might help future debugging for if this does break Py3.

Downside of that though, don't want the comment lingering in code base if we find out it already works in Py3.

@ity

ity approved these changes Sep 17, 2018

Copy link
Member

ity left a comment

lgtm - thanks for adding a test!

@stuhood

This comment has been minimized.

Copy link
Member

stuhood commented Sep 17, 2018

Would you be comfortable adding a brief comment suggesting to use sys.stdout.buffer if Py3 is not working and linking to this PR? Might help future debugging for if this does break Py3.

I think that because of the explicit check for the ability to write bytes to the given file, someone adapting this code to Python 3 will quickly figure out the right thing to do.

@stuhood stuhood merged commit 5a9b6b9 into pantsbuild:master Sep 17, 2018

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details

@stuhood stuhood deleted the twitter:stuhood/properly-encode-emitted-workunits branch Sep 17, 2018

ity added a commit that referenced this pull request Sep 17, 2018

Fix encoding of workunits under pantsd (#6505)
### Problem

On python2 under `pantsd`, workunits containing unicode are currently not encoded to bytes because the synthetic stdout that is created in that environment does not have "std" in its name. 

### Solution

Ensure that all output files in use with `PlainTextReporter` accept writes of raw bytes, and then universally encode non-`binary_type` values. Mark an integration test `@ensure_daemon` to cover this.

@ity ity removed the needs-cherrypick label Sep 26, 2018

Eric-Arellano added a commit that referenced this pull request Dec 20, 2018

Fix plaintext_recorder sometimes being passed TextIO (#6963)
We decided in #6505 that `plaintext_reporter.py` should use bytes, rather than unicode.

The issue is that we were passing it text IO in two instances, rather than bytes IO, which led to a failure when trying to run Pants with Python 3.

This unifies all instances of `PlainTextReporter` to be passed instances of `BytesIO`.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment