New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rework (and heavily optimize!) mypy.ini per-module configuration #4894

Merged
merged 5 commits into from Apr 13, 2018

Conversation

Projects
None yet
3 participants
@msullivan
Collaborator

msullivan commented Apr 12, 2018

Currently, we compute the options for each module by scanning through
the full list of per-module configuration options and seeing if the
pattern matches against the module name.
This is incredibly slow for large configuration files. On the internal S repo,
mypy spends 23 seconds (!!) just computing per-module options.

To fix this, we instead precompute an Options object for each config
section, and in clone_for_module do a search for the most specific
configured Options (so for foo.bar, we try foo.bar, foo.bar.,
foo.
, in that order).

This cuts down the processing time from 23s to about 80ms.

The catch is that this is actually a backwards-incompatible semantics
change, in two ways:

  • Patterns can be of the form foo.bar and foo.bar.*, but can no longer be
    general purpose globs. We produce an error message to detect this misusage.
  • Patterns are now always applied based on specificity and not on the order they
    appear in the file. This means that some poorly-formed configuration files that
    contained options that were always overridden may have their meaning changed.
    This is probably not too common, so we don't do anything to deal with it.
    We could emit a warning when specificity-order and file-order disagree, but
    it seems wrong to lock people into the old behavior, and it seems like a silly
    thing to make configurable.

I had a really pretty trie-based solution that I wrote before realizing I could do it this way and it would probably be good enough. The trie-based solution is faster, but not enough faster to justify the extra complication :(

Rework (and heavily optimize!) mypy.ini per-module configuration
Currently, we compute the options for each module by scanning through
the full list of per-module configuration options and seeing if the
pattern matches against the module name.
This is incredibly slow for large configuration files. On the internal S repo,
mypy spends 23 seconds (!!) just computing per-module options.

To fix this, we instead precompute an Options object for each config
section, and in `clone_for_module` do a search for the most specific
configured Options (so for foo.bar, we try foo.bar, foo.bar.*,
foo.*, in that order).

This cuts down the processing time from 23s to about 80ms.

The catch is that this is actually a backwards-incompatible semantics
change, in two ways:
 * Patterns can be of the form `foo.bar` and `foo.bar.*`, but can no longer be
   general purpose globs. We produce an error message to detect this misusage.
 * Patterns are now always applied based on specificity and not on the order they
   appear in the file. This means that some poorly-formed configuration files that
   contained options that were always overridden may have their meaning changed.
   This is probably not too common, so we don't do anything to deal with it.
   We could emit a warning when specificity-order and file-order disagree, but
   it seems wrong to lock people into the old behavior, and it seems like a silly
   thing to make configurable.

@msullivan msullivan requested review from JukkaL, gvanrossum and ilevkivskyi Apr 12, 2018

msullivan added some commits Apr 12, 2018

@JukkaL

JukkaL approved these changes Apr 13, 2018

Looks good! This fixed the performance regression in the S repo for me.

Left a few minor comments. Also, the typeshed update seems unrelated.

<https://docs.python.org/3.6/library/fnmatch.html>`_
separated by commas. These sections specify additional flags that
only apply to *modules* whose name matches at least one of the patterns.
present, where ``PATTERN1``, ``PATTERN2`` etc. are comma-separated

This comment has been minimized.

@JukkaL

JukkaL Apr 13, 2018

Collaborator

Grammar nit: You are missing some commas around 'etc.'. Maybe: PATTERN2, etc., are ....

xx.py:1: error: Function is missing a type annotation
mypy.ini: [mypy-*x*]: Invalid pattern. Patterns must be 'module_name' or 'module_name.*'
mypy.ini: [mypy-*y*]: Invalid pattern. Patterns must be 'module_name' or 'module_name.*'
== Return code: 0

This comment has been minimized.

@JukkaL

JukkaL Apr 13, 2018

Collaborator

Should the return code be non-zero?

This comment has been minimized.

@msullivan

msullivan Apr 13, 2018

Collaborator

Yes, but we don't currently error for most mypy.ini so I've stayed consistent for now.

[file spam/eggs.py]
[out]
Warning: unused section(s) in mypy.ini: [mypy-bar], [mypy-baz.*], [mypy-emarg.*], [mypy-emarg.hatch]
== Return code: 0

This comment has been minimized.

@JukkaL

JukkaL Apr 13, 2018

Collaborator

Again the 0 exit status is unexpected (but this was like this before this PR).

@msullivan

This comment has been minimized.

Collaborator

msullivan commented Apr 13, 2018

Typeshed update was a misfire, will fix

msullivan added some commits Apr 13, 2018

@msullivan msullivan merged commit 0d61fd0 into master Apr 13, 2018

2 checks passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details
continuous-integration/travis-ci/push The Travis CI build passed
Details

@msullivan msullivan deleted the options-optimiz2 branch Apr 13, 2018

@felixc

This comment has been minimized.

felixc commented May 3, 2018

Hi; just to clarify the change that's needed to existing config files: Does this mean that if you have a directory (say, "repositories"), which contains a bunch of subdirs, (call them "a" through "e"), all of which contain a "migrations" directory that should be ignored, we need to go from:

[mypy-*migrations*]
ignore_errors = True

to

[mypy-repositories.a.migrations.*,repositories.b.migrations.*,repositories.c.migrations.*,repositories.d.migrations.*,repositories.e.migrations.*]
ignore_errors = True

Or is there a more concise option, or one that would be robust to the hypothetical future addition of "f/migrations", "g/migrations", etc? Thanks!

@msullivan

This comment has been minimized.

Collaborator

msullivan commented May 3, 2018

Yeah, that is correct.

enricoghdn added a commit to enricoghdn/pyarmasync that referenced this pull request Sep 25, 2018

Updated mypy configuration for versions >=0.600
With version 0.600 the way mypy uses per-module configuration has changed (see python/mypy#4894) and the glob pattern 'test_*' could not be used to exclude tests. The solution was to add a '__init__.py' file to identify 'tests' as a proper package and exclude it completely from mypy checks.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment