Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add dependencies on conftest and init files via inference #10441

Merged

Conversation

stuhood
Copy link
Sponsor Member

@stuhood stuhood commented Jul 24, 2020

Problem

When attempting to release dev5 internally, many conftest.py files were being included without their dependencies by the pytest_runner conftest.py injection that was added in #10378.

Because that injection was not actually declaring dependencies on those files, it 1) wasn't locating owning targets, or 2) triggering dep inference on those files. This meant that even though we could successfully infer the deps of the conftest.py files, we weren't... just including them verbatim.

Solution

Move to inferring dependencies on conftest.py and __init__.py files, which will cause the dependencies of those files to be included, and cause them to be reported in our graph introspection goals.

Each thing that we can infer is provided by a separate inference @rule in @union InferDependenciesRequest, so this change also enables multiple inference rules per type of Sources.

Result

conftest.py files are included along with their dependencies, fixing all known test failures in dev5.

@stuhood
Copy link
Sponsor Member Author

stuhood commented Jul 24, 2020

The commits are useful to review independently, but this is currently a draft for two reasons:

  1. The third commit adds owning targets for __init__.py files that weren't owned (which would otherwise fail due to being partially included in mypy).
    • We should think about whether we want to just delete all of our __init__.py files instead, since they are all empty.
    • Also notable: internally, this change only required 3 BUILD file edits... conftest.py and __init__.py are overwhelmingly already owned by targets.
  2. We probably want options to toggle each inference implementation independently: it's possible that we want __init__.py and conftest.py inference each on by default, but not "python inference" in general.

@benjyw
Copy link
Sponsor Contributor

benjyw commented Jul 24, 2020

I tend to agree that we should nuke our own __init__.py files. But this exposes an issue: in practice existing python_library() targets will glob over __init__.py. If people aren't using dep inference they must now manually add a dep to that target, and they may end up dragging in a TON of stuff they don't need, and possibly cause cycles, even if the __init__.py is empty. With dep inference and file-level deps on I guess this isn't a problem. This seems like an unavoidable consequence of the fact that __init__.py has two roles : "Just a regular Python source file" and "thing that indicates that this dir represents a package, and is executed when you import that package". So really the healthy way to model code in a dir is to always have a separate target for __init__.py.

@benjyw
Copy link
Sponsor Contributor

benjyw commented Jul 24, 2020

Re separate toggles - possibly, yes, otherwise we're requiring people to write manual deps on parent __init__.py, which is unintuitive.

@stuhood
Copy link
Sponsor Member Author

stuhood commented Jul 24, 2020

If people aren't using dep inference they must now manually add a dep to that target, and they may end up dragging in a TON of stuff they don't need, and possibly cause cycles, even if the init.py is empty.

Well, see my second point above: I think that this inference rule should be on by default (since we effectively already turned it on by default by landing the "inject" stuff). But also, because this a file-level dependency, 1) it won't drag in any more files than you need, 2) it won't cause cycles because of #10409.

This seems like an unavoidable consequence of the fact that init.py has two roles : "Just a regular Python source file" and "thing that indicates that this dir represents a package, and is executed when you import that package".

Or not have them at all. In our case, there is zero need for them post PEP 420 (afaict: will confirm before landing). If that's true, we can delete all of them in a followup.

EDIT: It looks like __init__.py files are still somewhat necessary for mypy, even with namespace_packages turned on? Will want to revisit whether they can be removed in the future, but doesn't feel like a blocker for now.

Copy link
Contributor

@Eric-Arellano Eric-Arellano left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks.

How would you feel about keeping the prior behavior for __init__.py and pytest_runner.py in addition to your change? The way AncestorFiles is implemented, if you have dep inference, we won't end up trying to re-find those missing files because they are already in prepared_sources.snapshot. So, dep inference will behave the same as you're intending.

For people not using dep inference, they'll get the injection. This avoids the problems Benjy is concerned about, like having to create lots of new targets and having new cycles.

@stuhood
Copy link
Sponsor Member Author

stuhood commented Jul 24, 2020

How would you feel about keeping the prior behavior for init.py and pytest_runner.py in addition to your change? The way AncestorFiles is implemented, if you have dep inference, we won't end up trying to re-find those missing files because they are already in prepared_sources.snapshot. So, dep inference will behave the same as you're intending.

That wouldn't fix the bug I don't think: see the Problem description. Basically, injecting the conftest.py files where we were was too late to have their dependencies inferred.

I think that __init__.py files that had imports would be fine for the reason you mention (they would be injected before inference), but it doesn't feel worthwhile to have file injection as a special case rather than using the framework that we already have for adding dependencies. And under the hood it means that we have a bunch of secretly multiple-owned files, which is confusing/magical.

The thing that would feel cleanest to me would be to have inference of both conftest.py and __init__.py on by default, but to continue to have the existing python inference off by default. See item 2 from #10441 (comment) ... if that seems reasonable to everyone.

@benjyw
Copy link
Sponsor Contributor

benjyw commented Jul 24, 2020

How would you feel about keeping the prior behavior for __init__.py and pytest_runner.py in addition to your change? The way AncestorFiles is implemented, if you have dep inference, we won't end up trying to re-find those missing files because they are already in prepared_sources.snapshot. So, dep inference will behave the same as you're intending.

I don't think this works - if the __init__.py isn't empty and does imports, we have to follow its imports, so the prior behavior was never correct.

@benjyw
Copy link
Sponsor Contributor

benjyw commented Jul 24, 2020

The thing that would feel cleanest to me would be to have inference of both conftest.py and __init__.py on by default, but to continue to have the existing python inference off by default. See item 2 from #10441 (comment) ... if that seems reasonable to everyone.

This seems reasonable to me.

@benjyw
Copy link
Sponsor Contributor

benjyw commented Jul 24, 2020

Not so sure about there being zero need for __init__.py in pep420. An empty __init__.py is still required in order to specify that a directory is a regular package. A missing __init__.py says that a dir is a namespace package, which may not be what you want.

For example, various parts of Django choke on namespace packages in your code.

@stuhood stuhood marked this pull request as ready for review July 24, 2020 07:08
@stuhood
Copy link
Sponsor Member Author

stuhood commented Jul 24, 2020

This is reviewable. The commits are still useful to look at independently.

Copy link
Contributor

@Eric-Arellano Eric-Arellano left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. Good idea to have 3 new options.

class PythonInference(Subsystem):
"""Options controlling which dependencies will be inferred for Python targets."""

options_scope = "python-infer"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think generally we have subsystems be nouns. I'd recommend python-inference.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea to have a dedicated subsystem, btw.

Copy link
Sponsor Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I started with that, but the option names were so much nicer with the verb... --python-infer-imports, --no-python-infer-imports, etc. Should I change the option names as well?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True, but this gets to Benjy's point that 90% of options are only ever set in pants.toml. I strongly suspect this will only be set in pants.toml, outside of a codebase admin experimenting with things.

So, it would read:

[python-inference]
imports = true
inits = true
conftests = true

Copy link
Sponsor Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I kinda like [python-infer] even in config TBH

Copy link
Sponsor Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Leaning toward leaving it then.

src/python/pants/engine/internals/graph.py Outdated Show resolved Hide resolved
src/python/pants/engine/internals/graph_test.py Outdated Show resolved Hide resolved
@stuhood stuhood force-pushed the stuhood/conftest-and-init-via-inference branch from 0e4b382 to 55b9a54 Compare July 24, 2020 19:15
Copy link
Contributor

@Eric-Arellano Eric-Arellano left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you. I have no idea what happened to isort for you?

src/python/pants/backend/python/rules/pex.py Outdated Show resolved Hide resolved
src/python/pants/backend/python/rules/pex_from_targets.py Outdated Show resolved Hide resolved
src/python/pants/backend/python/rules/pex_test.py Outdated Show resolved Hide resolved
src/python/pants/backend/python/rules/util.py Outdated Show resolved Hide resolved
src/python/pants/backend/python/rules/run_setup_py_test.py Outdated Show resolved Hide resolved
src/python/pants/backend/python/rules/util_test.py Outdated Show resolved Hide resolved
src/python/pants/engine/internals/graph_test.py Outdated Show resolved Hide resolved
src/python/pants/engine/internals/graph_test.py Outdated Show resolved Hide resolved
Copy link
Contributor

@Eric-Arellano Eric-Arellano left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work!

@@ -740,8 +763,8 @@ async def resolve_dependencies(
)

inference_request_types = union_membership.get(InferDependenciesRequest)
inferred = InferredDependencies()
if global_options.options.dependency_inference and inference_request_types:
inferred: Tuple[InferredDependencies, ...] = (InferredDependencies(),)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit. It's fine to fix this in a followup, I know we're trying to land this for the release.

Suggested change
inferred: Tuple[InferredDependencies, ...] = (InferredDependencies(),)
inferred = ()

Comment on lines +10 to 14
python_library(
name='lib',
sources = ['__init__.py'],
tags = {"nolint"},
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ohhh I see the bug now! So, the __init__ inference rule found that there is indeed an owner of this file, but that owner isn't an actual Python target. It would be more robust if we instead had the inference rule filter out all the resulting owners to check that .has_field(PythonSources) is true.

Without this target definition added, we would have wanted the __init__.py inference rule to error that there is no Python owner of this file.

--

It's fine to fix this in a followup, imo, so long as we do fix it.

Copy link
Sponsor Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Eric-Arellano
Copy link
Contributor

Hm, do note that this may result in far more dependencies being pulled in than previously if you are not using --python-infer-imports. Even though this will create a file level dep on the __init__.py file, that generated subtarget will copy over its parent's entire dependencies field. While we will tolerate any possible cycles, our invalidation will be much worse.

I'm not really sure what the remedy would be here. I'm glad that there is an escape hatch with turning off this feature.

@coveralls
Copy link

coveralls commented Jul 24, 2020

Coverage Status

Coverage remained the same at 0.0% when pulling 2a409be on stuhood:stuhood/conftest-and-init-via-inference into 864e77c on pantsbuild:master.

@benjyw
Copy link
Sponsor Contributor

benjyw commented Jul 24, 2020

Hm, do note that this may result in far more dependencies being pulled in than previously if you are not using --python-infer-imports. Even though this will create a file level dep on the __init__.py file, that generated subtarget will copy over its parent's entire dependencies field. While we will tolerate any possible cycles, our invalidation will be much worse.

I'm not really sure what the remedy would be here. I'm glad that there is an escape hatch with turning off this feature.

The only solution that is robust in this case is to have a separate target for __init__.py...

@stuhood stuhood force-pushed the stuhood/conftest-and-init-via-inference branch from 7437cce to d12ffd7 Compare July 27, 2020 04:28
@stuhood
Copy link
Sponsor Member Author

stuhood commented Jul 27, 2020

Hm, do note that this may result in far more dependencies being pulled in than previously if you are not using --python-infer-imports. Even though this will create a file level dep on the __init__.py file, that generated subtarget will copy over its parent's entire dependencies field. While we will tolerate any possible cycles, our invalidation will be much worse.
I'm not really sure what the remedy would be here. I'm glad that there is an escape hatch with turning off this feature.

The only solution that is robust in this case is to have a separate target for __init__.py...

Keep in mind too that under inference, the dep lists of most targets in BUILD files should be nearly empty: in that case, the effect of this change would be getting only the __init__.py file, and any inferred imports it contained.

@Eric-Arellano
Copy link
Contributor

Yes, agreed. I am describing the case where you are not using import inference. Mostly V1 users upgrading to V2.

@stuhood stuhood force-pushed the stuhood/conftest-and-init-via-inference branch from d12ffd7 to fccb6b3 Compare July 27, 2020 15:11
[ci skip-rust-tests]
# Rust tests will be skipped. Delete if not intended.
[ci skip-rust-tests]
# Rust tests will be skipped. Delete if not intended.
[ci skip-rust-tests]
# Rust tests will be skipped. Delete if not intended.
[ci skip-rust-tests]
@stuhood stuhood force-pushed the stuhood/conftest-and-init-via-inference branch from fccb6b3 to 2a409be Compare July 27, 2020 16:18
@@ -4,9 +4,6 @@
from dataclasses import dataclass
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The docstring is out of date in this file. It should remove all mention of __init__.py.

@stuhood stuhood merged commit 38bc143 into pantsbuild:master Jul 27, 2020
@stuhood stuhood deleted the stuhood/conftest-and-init-via-inference branch July 27, 2020 17:18
stuhood added a commit that referenced this pull request Jul 28, 2020
)

### Problem

#10441 caused a performance regression from about 10s to run `./pants dependencies --transitive ::` with inference enabled, to about 22s.

### Solution

Make independent owners requests per file we'd like to find owners, which allows the lookups for each file to be memoized independently.

### Result

`./pants dependencies --transitive ::` takes 15s. Although we are doing more work than before (before #10441, conftest discovery would only happen at `test` time), this is not ideal, and we should do further optimization before launch. But there are a few variables that will impact this soon that make it not the best time to optimize: 1) an intrinisic `PathGlobs->Paths` operation, 2) possibly enabling inference by default, allowing all of the strategies to use the module_mapper.

[ci skip-rust-tests]
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants