New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Port util metaprogramming files to Python3 #6072

Merged
merged 3 commits into from Jul 6, 2018

Conversation

Projects
None yet
4 participants
@Eric-Arellano
Copy link
Contributor

Eric-Arellano commented Jul 6, 2018

Part of #6062

Problem

ABCMeta expects unicode on Python3, but bytes on Python2.

Solution

We use feature detection to try with idiomatic Py3 and fall back on Py2. This is preferred over version detection (i.e. checking which Python interpreter is used).

Note that the source of this file, twitter.commons.lang, did not have to do this because they don't use from __future__ import unicode_literals.

Eric-Arellano added some commits Jul 6, 2018

@Eric-Arellano Eric-Arellano changed the title Port util/meta.py to Python3 Port util metaprogramming files to Python3 Jul 6, 2018

@@ -50,7 +51,6 @@ def datatype(field_decls, superclass_name=None, **kwargs):

if not superclass_name:
superclass_name = '_anonymous_namedtuple_subclass'
superclass_name = str(superclass_name)

namedtuple_cls = namedtuple(superclass_name, field_names, **kwargs)

This comment has been minimized.

@Eric-Arellano

Eric-Arellano Jul 6, 2018

Contributor

Flagging this line, because it will always be passed unicode no matter the Python version. This seems to be okay per the tests, though.

This comment has been minimized.

@cosmicexplorer

cosmicexplorer Jul 7, 2018

Contributor

Thanks for fixing this -- when I wrote that I didn't understand the implications of unicode_literals and was having a lot of issues. There are multiple places I've used datatype with a field that should have been string-typed, and failed to add a type check because I wasn't sure how to handle it -- should have bubbled up the concern then. Will try to keep an eye on this in the future, but any further strangeness you may find with datatypes especially around string handling could be accidental, so please feel free to make these sorts of changes in the future. Thanks!

This comment has been minimized.

@Eric-Arellano

Eric-Arellano Jul 7, 2018

Contributor

It’s tricky code! I was thrown off by ABCMeta requiring bytes in python 2.

I’ll continue to break out any changes we need to make due to meta in separate PRs.

"Type names and field names can only contain alphanumeric characters "
"and underscores: \"<type 'str'>\"")
self.assertEqual(str(cm.exception), expected_msg)
def compare_string_rep(string_type):

This comment has been minimized.

@Eric-Arellano

Eric-Arellano Jul 6, 2018

Contributor

Including from builtins import str at the top of this file changes the behavior of these tests under TypedDatatypeTest. Instead of returning str, datatype returns newstr, which is future's backport of Python3's str class.

As I understand, this change is introduced only because changes made to this test file, not to the underlying objects.datatype() function. When you remove from builtins import str from this file, the old version of the tests pass. That's a good thing - we aren't breaking the interface for a bunch of things using objects.datatype().

This comment has been minimized.

@cosmicexplorer

cosmicexplorer Jul 7, 2018

Contributor

I think your understanding here is correct, in case you were wondering.

@Eric-Arellano

This comment has been minimized.

Copy link
Contributor

Eric-Arellano commented Jul 6, 2018

"""error: in constructor of type AnotherTypedDatatype: type check error:
field 'elements' was invalid: value 'should be list' (with type 'str') must satisfy this type constraint: Exactly(list).""")
self.assertEqual(str(cm.exception), expected_msg)
def compare_string_rep(string_type):

This comment has been minimized.

@stuhood

stuhood Jul 6, 2018

Member

Perhaps rather than making all of these conditional, you could match against regexes instead? Alternatively, maybe have your helper function do something like:

self.assertIn(str(cm).exception, [expected_message_with_str, expected_message_with_newstr])

?

This comment has been minimized.

@Eric-Arellano

Eric-Arellano Jul 6, 2018

Contributor

I originally started that way but changed to these conditionals to align code more with the feature checking idiom. It makes it more explicit this is for the sake of compatibility, and allows us to remove python 2 more easily once we stop supporting it.

Alas it’s not the prettiest code..

This comment has been minimized.

@stuhood

stuhood Jul 6, 2018

Member

Mm, yea... that's reasonable I guess.

@stuhood stuhood requested a review from cosmicexplorer Jul 6, 2018

@mateor

mateor approved these changes Jul 6, 2018

@stuhood stuhood merged commit 2a14a72 into pantsbuild:master Jul 6, 2018

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details

@Eric-Arellano Eric-Arellano deleted the Eric-Arellano:port-util-meta branch Jul 6, 2018

stuhood added a commit that referenced this pull request Jul 9, 2018

Port majority of pants/util (#6073)
Ports everything `util/` except for `objects.py` and `meta.py` (see #6072), `process_handler.py`, and `tarutil.py`.

Part of #6062

stuhood added a commit that referenced this pull request Jul 15, 2018

Fix test_objects handling of dataclass() py2-py3 compatibility (#6098)
Followup from #6072, which was an incorrect representation of the changes made to metaprogramming with Python 3 port. Part of #6062.

## Problem
`object.py`'s `datatype()` function expects exactly one type. This poses an issue when we add `from builtins import str` to files - in Py2, the type will become `newstr`, whereas in Py3 the type will become `str`. `type(newstr) != type(str)`, even though the behavior is similar, so `datatype()` will raise an exception.

We ran into this issue originally with `test_objects.py`. The original solution only fixed the tests, it didn't actually fix the underlying problem, which I realized when `engine/fs.py` starting breaking with PRs that touched it like the [`backend/jvm` port](#6092).

## Solution
The solution has two parts:
1) Wherever we define a class using the metaclass of `datatype`, set the parameter to `future.utils.text_type`, rather than `str`. This means in Py2 the type will be `unicode` and in Py3 the type will be `str`. Both have the same unicode semantics, which is a good thing.

```python
from future.utils import text_type

class Foo(datatype([('val', text_type)])): pass
```

2) When instantiating a class that uses `dataclass` as its metaclass, wrap the parameter calls with `text_type(x)`. This will call `unicode(x)` in Py2 and `str()` in Py3, resulting in the correct type for the class definition from the first step. 

```python
from future.utils import text_type

example = Foo(text_type('🐍'))
```

## What this impacts
Anywhere we create a class with `datatype()` that sets a parameter to `str`, we'll have to make this change to both the class definition and all of its instantiations. 

The biggest example of this is in `engine/fs.py`
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment