New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Port majority of pants/util #6073

Merged
merged 7 commits into from Jul 9, 2018

Conversation

Projects
None yet
3 participants
@Eric-Arellano
Copy link
Contributor

Eric-Arellano commented Jul 6, 2018

Ports everything util/ except for objects.py and meta.py (see #6072), process_handler.py, and tarutil.py.

Part of #6062

@stuhood

stuhood approved these changes Jul 6, 2018

Copy link
Member

stuhood left a comment

Thanks!

@@ -14,10 +14,11 @@
import time
import uuid
import zipfile
from builtins import object

This comment has been minimized.

@Eric-Arellano

Eric-Arellano Jul 9, 2018

Contributor

Temporarily removing the override of str() because it breaks a bunch of tests. I'll figure out how to fix this in a separate dedicated PR

This comment has been minimized.

@Eric-Arellano

Eric-Arellano Jul 9, 2018

Contributor

Actually for now I'm going keep the original semantics. I don't understand the ramifications to changing this.

We might run into issues when running Python3 tests the first time - if so, we'll know we have to change the semantics from what we have now.

Below change (line 159) makes it so the semantics is the same in Py2 and Py3.

@@ -6,6 +6,7 @@
unicode_literals, with_statement)

import threading
from builtins import object, str

This comment has been minimized.

@Eric-Arellano

Eric-Arellano Jul 9, 2018

Contributor

@benjyw can you look over this file please?

This override means that in line 39, during call to _RWBuf.write(), it will call _RWBuf.do_write() with unicode, rather than bytes, which might be a problem. However, do_write() simply raises NotImplementedError, so I don't know if this actually matters?

This comment has been minimized.

@benjyw

benjyw Jul 9, 2018

Contributor

If I understand correctly, from builtins import str is a no-op in py3 but renames py2's unicode type to str in py2? I think this definitely will have consequences. A low-level IO method should always operate on raw bytes (str in py2, bytes in py3), never on encoded strings.

However this file only explicitly uses str in one line, and that's hacky:

self.do_write(str(s))

If s is already binary data, no problem, but if it's text, then in python2 that str(s) would attempt to encode it using the default encoding, which is not guaranteed to work.

So really write() should check if the argument is bytes, and throw an exception of not. That way we'll find out eagerly if any call sites do the wrong thing.

Does that make sense?

This comment has been minimized.

@benjyw

benjyw Jul 9, 2018

Contributor

I imagine something like from builtins import bytes and then in write() check for isinstance(s, bytes).

This comment has been minimized.

@Eric-Arellano

Eric-Arellano Jul 9, 2018

Contributor

Yes, exactly. from builtins import str modifies Python2's str() function so that it gets a unicode, rather than bytes, representation.

Agreed with that change. I'll break it out into a separate PR since it's a substantial semantic change + we have no explicit tests for this file.

This comment has been minimized.

@benjyw

benjyw Jul 9, 2018

Contributor

Actually what I'm proposing involves no semantic change. This commit, as stands, would be a major semantic change, as it would cast arbitrary bytes into text, which might explode. So you can't merge this as-is.

@@ -34,8 +35,10 @@ def read_from(self, pos, size=-1):
return self._io.read() if size == -1 else self._io.read(size)

def write(self, s):
if not isinstance(s, bytes):

This comment has been minimized.

@Eric-Arellano

Eric-Arellano Jul 9, 2018

Contributor

@benjyw follow up from prior discussion. Good point about prior version changing semantics.

This does change semantics that now ValueError will be raised whereas earlier we would try to coerce into bytes. If it's safe to make this change, I agree with you this is better

This comment has been minimized.

@benjyw

benjyw Jul 9, 2018

Contributor

True, but that coercion might have failed, so it's best to at least fail predictably and eagerly.

stuhood added a commit that referenced this pull request Jul 9, 2018

Port util/process_handler.py and util/tarutil.py (#6082)
Followup from #6073 port of `util`. These required manual changes. `util` is all ported after this.
@benjyw
Copy link
Contributor

benjyw left a comment

rwbuf.py LGTM mod my comment about making the error message more informative.

@@ -34,8 +35,10 @@ def read_from(self, pos, size=-1):
return self._io.read() if size == -1 else self._io.read(size)

def write(self, s):
if not isinstance(s, bytes):
raise ValueError('Argument not in bytes: {0}'.format(s))

This comment has been minimized.

@benjyw

benjyw Jul 9, 2018

Contributor

Might be more useful to say what type the argument is, e.g., 'Expected bytes, not {}, for argument {}.'.format(type(s), s)

Side note - we typically don't enumerate the format args, unless we're repeating them (in which case we prefer to give them names instead of ordinals).

This comment has been minimized.

@Eric-Arellano

Eric-Arellano Jul 9, 2018

Contributor

Sounds good on both. Can't wait for us to be able to use f-strings!

Thanks for the review

@stuhood stuhood merged commit 1a85b59 into pantsbuild:master Jul 9, 2018

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details

@Eric-Arellano Eric-Arellano deleted the Eric-Arellano:port-util branch Jul 10, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment