Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Speed improvements for typing #439

Merged
merged 14 commits into from
Jul 17, 2017
Merged

Conversation

ilevkivskyi
Copy link
Member

This PR contains two performance improvements based on profiling and inspired by #432:

  • Inline _gorg and _geqv
  • Remove (IMO unnecessary) "duplicates" from MRO.

This gives roughly 5% speed-up for mypy mypy. @JukkaL I will be grateful if you could do more benchmarking with some real-life large codebases.

@ilevkivskyi
Copy link
Member Author

This now only touches Python 3 version. If Jukka's results will be promising, then I will also backport this to Python 2.

@JelleZijlstra
Copy link
Member

I'm also seeing a few % speedup running mypy on my codebase.

Copy link
Member

@gvanrossum gvanrossum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes sense as a speedup as long as you can convince me the semantics are unchanged.

"""
assert isinstance(a, GenericMeta) and isinstance(b, GenericMeta)
# Reduce each to its origin.
return _gorg(a) is _gorg(b)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So is _gorb(b) entirely redundant here? I see that you've inlined this as a._gorg is b everywhere.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably this is just a coincidence, maybe it was different in older versions, but now b (and its ._gorg) is known "statically" in all places where _geqv is used.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aha!

src/typing.py Outdated
new_bases = []
for base in bases:
if isinstance(base, GenericMeta):
bextra = getattr(base, '__extra__', None)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As long as we're micro-optimizing, this call is wasted when extra is falsey or origin is truthy (if I understand the condition below correctly).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I reshuffled code here, so that we quickly escape if origin is truthy and then if extra is falsey.

src/typing.py Outdated
if isinstance(base, GenericMeta):
bextra = getattr(base, '__extra__', None)
if (extra and bextra and not origin and bextra not in bases and
type(bextra) is abc.ABCMeta):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure how this is equivalent to the original code (I would have expected something simpler, matching _gorg(b) if isinstance(b, GenericMeta) else b.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The point is this is not totally equivalent. This makes more "aggressive" erasure for classes with __extra__, this shortens MROs for special typing types. Compare:
Before:

>>> pprint(List[int].__mro__)
(typing.List[int],
 typing.List,
 <class 'list'>,
 typing.MutableSequence,
 <class 'collections.abc.MutableSequence'>,
 typing.Sequence,
 <class 'collections.abc.Sequence'>,
 typing.Reversible,
 <class 'collections.abc.Reversible'>,
 typing.Collection,
 <class 'collections.abc.Collection'>,
 <class 'collections.abc.Sized'>,
 typing.Iterable,
 <class 'collections.abc.Iterable'>,
 typing.Container,
 <class 'collections.abc.Container'>,
 typing.Generic,
 <class 'object'>)

and after:

(typing.List[int],
 typing.List,
 <class 'list'>,
 <class 'collections.abc.MutableSequence'>,
 <class 'collections.abc.Sequence'>,
 <class 'collections.abc.Reversible'>,
 <class 'collections.abc.Collection'>,
 <class 'collections.abc.Sized'>,
 <class 'collections.abc.Iterable'>,
 <class 'collections.abc.Container'>,
 <class 'object'>)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am actually a bit worried maybe this is too aggressive, since sometimes we even lose Generic, for example:

>>> pprint(MutableMapping.__mro__)
(typing.MutableMapping,
 <class 'collections.abc.MutableMapping'>,
 <class 'collections.abc.Mapping'>,
 <class 'collections.abc.Collection'>,
 <class 'collections.abc.Sized'>,
 <class 'collections.abc.Iterable'>,
 <class 'collections.abc.Container'>,
 <class 'object'>)

Although all tests pass, and I tried to play with this and it works as expected (it is still an instance of GenericMeta). Probably, I will add Generic at the end of bases in such cases (just for consistency).

@gvanrossum
Copy link
Member

PS. I'm not sure how we would test this speedup with real code. Did you ask for us to have this patched into the Python version we use to run mypy and then run mypy over our large codebase? That makes sense except the time there is pretty variable (since it runs so long that all sorts of other activities on the same machine affect the timing).

@ilevkivskyi ilevkivskyi changed the title Speed improvements for typing [WIP] Speed improvements for typing Jun 12, 2017
@ilevkivskyi
Copy link
Member Author

I am marking this as WIP until I figure out what is going on with Python 2.

@gvanrossum

PS. I'm not sure how we would test this speedup with real code. Did you ask for us to have this patched into the Python version we use to run mypy and then run mypy over our large codebase? That makes sense except the time there is pretty variable (since it runs so long that all sorts of other activities on the same machine affect the timing).

Yes, even if it is not super-stable it would be great to have few more data points.

@ilevkivskyi ilevkivskyi changed the title [WIP] Speed improvements for typing Speed improvements for typing Jun 12, 2017
@ilevkivskyi
Copy link
Member Author

@gvanrossum

I decided to keep only obvious optimizations, and leave more aggressive things (that might change semantics) out for now. I think you should be happy with this now.

I will probably make another PR soon dealing with (excessively) long MROs.


# remove bare Generic from bases if there are other generic bases
if any(isinstance(b, GenericMeta) and b is not Generic for b in bases):
bases = tuple(b for b in bases if b is not Generic)
namespace.update({'__origin__': origin, '__extra__': extra})
self = super(GenericMeta, cls).__new__(cls, name, bases, namespace)
super(GenericMeta, self).__setattr__('_gorg', self if not origin
else origin._gorg)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whitespace nit: I don't love how this line is broken up; either split before the comma or indent the else to align with the if.

"""
assert isinstance(a, GenericMeta) and isinstance(b, GenericMeta)
# Reduce each to its origin.
return _gorg(a) is _gorg(b)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aha!

@@ -1568,7 +1546,7 @@ def no_type_check(arg):
if isinstance(arg, type):
arg_attrs = arg.__dict__.copy()
for attr, val in arg.__dict__.items():
if val in arg.__bases__:
if val in arg.__bases__ + (arg,):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this added (arg,) needed? And why not in the PY2 version? (If there's a good reason please add a comment.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it is better to have them in both versions, @no_type_chek can be also used in Python 2, fixed this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh wait, it's needed because arg._gorg == arg. Fine.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh wait, it's needed because arg._gorg == arg

Exactly. Sorry, I have answered only half of your question.

@@ -1568,7 +1546,7 @@ def no_type_check(arg):
if isinstance(arg, type):
arg_attrs = arg.__dict__.copy()
for attr, val in arg.__dict__.items():
if val in arg.__bases__:
if val in arg.__bases__ + (arg,):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh wait, it's needed because arg._gorg == arg. Fine.

@gvanrossum
Copy link
Member

I'm torn about whether to merge this. Maybe we should wait until 3.6.2 has actually been released? The RC is supposed to be cut today, I expect the final release in a few weeks (see PEP 494).

@ilevkivskyi
Copy link
Member Author

Maybe we should wait until 3.6.2 has actually been released?

This makes sense. We can keep the master "clean" (i.e. identical with current CPython state) for any emergency fixes (just in case). Then I can merge this right after the release of 3.6.2.final.

@ilevkivskyi
Copy link
Member Author

OK, Python 3.6.2 was released earlier today, so that I am merging this now.

@ilevkivskyi ilevkivskyi merged commit 2b6932a into python:master Jul 17, 2017
@ilevkivskyi ilevkivskyi deleted the speed-up-typing branch July 17, 2017 13:58
Michael0x2a added a commit to Michael0x2a/typing that referenced this pull request Jul 17, 2017
Incorporates change made by python#439,
which was recently merged.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants