Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Target generators should not depend on generated targets? #13118

Closed
Eric-Arellano opened this issue Oct 5, 2021 · 13 comments · Fixed by #14727
Closed

Target generators should not depend on generated targets? #13118

Eric-Arellano opened this issue Oct 5, 2021 · 13 comments · Fixed by #14727

Comments

@Eric-Arellano
Copy link
Contributor

Eric-Arellano commented Oct 5, 2021

(Relates to #13086, but this is a more focused question.)

Currently, a target generator depends on all of its generated targets:

# If it's a target generator, inject dependencies on all of its generated targets.
generated_addresses: tuple[Address, ...] = ()
if target_types_to_generate_requests.is_generator(tgt) and not tgt.address.is_generated_target:
generate_request = target_types_to_generate_requests[type(tgt)]
generated_targets = await Get(
GeneratedTargets, GenerateTargetsRequest, generate_request(tgt)
)
generated_addresses = tuple(generated_targets.keys())

This was added in #10511 so that target generators can be used as an alias to their generated targets, that you can use the generator as a CLI spec and in dependencies field as shorthand for everything generated.

But if you disregard that alias use case, it does not make sense for a target generator to depend on a generated target. The role should be reversed, that a generated target depends on its generator.

Concrete example: in Go, a go_mod target generates _go_external_package and (soon) _go_internal_package targets. When you change the file go.mod or go.sum, it would make sense for ./pants --changed-since=HEAD --changed-dependees=transitive to include all the generated targets - but it won't, because generated targets do not depend on their target generator.

It would be problematic if the generator and the generated cyclically depended on each other. Beyond this introducing cycles, it would break TransitiveTargets such that a generated target would depend on all its sibling generated targets.

From first principles, conceptually, why would a target generator depend on what it generates? That doesn't make intuitive sense to me.

Rather, I believe we modeled it this way to facilitate the alias mechanism of #13086. That suggests it's a leaky implementation detail of the alias mechanism, and we should consider how better to model this.

@Eric-Arellano
Copy link
Contributor Author

Eric-Arellano commented Oct 5, 2021

I have no idea yet how to fix this with backwards compatibility etc. My focus for now is only determining how we would have done this if we could start over. cc @benjyw @stuhood

@benjyw
Copy link
Sponsor Contributor

benjyw commented Oct 5, 2021

But if you disregard that alias use case - the alias use case doesn't actually require that those deps exist, that was (as far as I can tell) just how we chose to implement it. But we can implement it another way, without the deps.

@Eric-Arellano
Copy link
Contributor Author

The role should be reversed, that a generated target depends on its generator....When you change the file go.mod or go.sum, it would make sense for ./pants --changed-since=HEAD --changed-dependees=transitive to include all the generated targets - but it won't, because generated targets do not depend on their target generator.

One nuance with the idea that generated targets should depend on their target generator: that will require adding something like #12953 so that the sources of a target generator are not always resolved by default when using Get(SourceFiles, SourceFilesRequest) on the transitive closure of the generated target. That's specifically necessary for file-based target generation, where python_sources generates python_source targets as a subset of its sources field.

If we don't fix that, then we'd pull in all the sibling files even if they're not needed. Example: thanks to this proposal, the python_source target project:lib#f.py depends on the python_sources target generator project:lib, which has sources=["*.py"]. project/f.py only imports project/dep.py, not other files like project/unrelated.py. Currently, if project:lib#f.py depended on project:lib, then the transitive closure would pull in all of project/*.py because that is the sources field of project:lib.

That is, we do want to express the dependency relationship between the generator and generated. But we don't want it to be a traditional dependency where we resolve the sources field.

@benjyw
Copy link
Sponsor Contributor

benjyw commented Oct 5, 2021

Arguably a generator target should not have a Sources field but something else, like GeneratorSources that are orthogonal and therefore invisible to Get(SourceFiles, SourceFilesRequest).

@Eric-Arellano
Copy link
Contributor Author

Precisely, that's what #12953 is about. I think I agree. Thanks!

@stuhood
Copy link
Sponsor Member

stuhood commented Oct 5, 2021

Arguably a generator target should not have a Sources field but something else, like GeneratorSources that are orthogonal and therefore invisible to Get(SourceFiles, SourceFilesRequest).

If this were the case, it would never make sense for the generated targets to depend on the generator, right? There would be no point.

Given that, I'm not sure that inverting dependencies from "generator -> generated" to "generated -> generator" makes sense.

If there is an alternative implementation of aliases that allows you to depend on all of the files (generated) in a target (generator) without the target depending on them, then maybe it would make sense. But that doesn't seem to be where #13086 is trending.

@benjyw
Copy link
Sponsor Contributor

benjyw commented Oct 6, 2021

Well, it can also still have sources (e.g., requirements.txt is a regular source of a python_requirements), so the right invalidation happens, but it doesn't share sources with the generated targets.

@Eric-Arellano
Copy link
Contributor Author

Well, it can also still have sources (e.g., requirements.txt is a regular source of a python_requirements), so the right invalidation happens, but it doesn't share sources with the generated targets.

Exactly. I think it depends on how the target generator works. For go_mod, it should have normal Sources to claim go.mod and go.sum. It should also probably have a field like internal_pkg_sources=["**/*.go"] which is GeneratingSources and does not claim ownership, only is used to know what to generate. Like ./pants tailor's notion of "triggering sources".

For python_sources, it does not have normal Sources because it doesn't own any of them. It only has generating sources.

For python_requirements, it has normal Sources for the requirements.txt because it owns that. But it does not have generating sources.

Maybe there's a better word than "generating sources", like "triggering sources".

@kaos
Copy link
Member

kaos commented Oct 6, 2021

Or adopted_sources? (As they'd be orphan without the generated target "adopting" them)

@Eric-Arellano
Copy link
Contributor Author

Ah ha! A user just pointed out that python_requirements and poetry_requirements macros are already operating with the semantics I propose: generated targets depend on their generator, rather than the other way around:

$ pants  --changed-since="origin/master" list                               
3rdparty:requirements.txt
build-support/deploy/base.py:lib
build-support/deploy/deploy.py:lib

$ pants  --changed-since="origin/master" --changed-dependees=transitive list
3rdparty:Cerberus
3rdparty:SQLAlchemy
3rdparty:alembic
... <everything> ..

That's happening because it's the old context-aware object factory API, and we used different semantics in 2.0 when adding "file targets" (now called generated targets).

I continue to think that a generated target depending on its generator is conceptually sound. Re the alias mechanism, I have an idea for how we can refactor the implementation so that we decouple the alias from how we set dependencies.

@Eric-Arellano
Copy link
Contributor Author

This is majorly confusing me with trying to achieve the CLI semantics at https://docs.google.com/document/d/1WWQM-X6kHoSCKwItqf61NiKFWNSlpnTC5QNu3ul9RDk/edit#.

Specifically, it confuses me because of this:

# TODO: If you use `Targets` here, then we replace the direct dep on the `go_mod` with all
# of its generated targets...Figure this out.
all_deps = await Get(UnexpandedTargets, DependenciesRequest(target[Dependencies]))
direct_dependencies = await MultiGet(
Get(BuildGoPackageRequest, BuildGoPackageTargetRequest(tgt.address))
for tgt in all_deps
if (
tgt.has_field(GoFirstPartyPackageSourcesField)
or tgt.has_field(GoThirdPartyModulePathField)
)
)

We are using dependency inference so that each generated Go package target depends automatically on its owning go_mod target. I think that makes a ton of sense: if you change go.mod, then --changed-dependees=direct should probably show all the generated targets!

But, at the same time that we add the dependency on the go_mod, our graph.py code adds a dependency on the go_mod to each of the generated targets. So, when resolving the direct dependencies of the go_mod and where you "expand aliases", i.e. replace target generators with all generated targets, then you pull in deps on everything generated by the go_mod, which is breaking and results in building the entire project all at once (causing compilation cycles).

We need to figure that dependency relationship out, I believe it is blocking fixing the CLI specs. Either we stop adding a dep from generated targets on their go_mod, or we stop having the go_mod depend on its generated targets. It seems obvious to me what conceptually makes the most sense.

@stuhood
Copy link
Sponsor Member

stuhood commented Oct 14, 2021

We need to figure that dependency relationship out, I believe it is blocking fixing the CLI specs. Either we stop adding a dep from generated targets on their go_mod, or we stop having the go_mod depend on its generated targets. It seems obvious to me what conceptually makes the most sense.

While I agree that this sounds like it needs to change, I really don't see how it blocks Specs... it seems independent. Specs is about which "roots" are matched, and should not ever effect what actually gets compiled.

So yea, let's fix this. But Specs themselves cannot cause a cycle.

Eric-Arellano added a commit that referenced this issue Jan 5, 2022
…f `sources: list[str]` (#14082)

To minimize behavior changes while implementing #14075, we want to make sure that `--changed-since=HEAD --changed-dependees=direct` will still claim that all generated `python_requirement` targets are impacted if the `requirements.txt` changed.

This would happen naturally if a generated target depended on its target generator: #13118. But that will require a lot more design work and is a big change we're not ready to make.

So we can work around this by still having `_python_requirements_file` even with target generators. But before doing that, we should change to `source: str` to actually represent the expected API of the target.

Because this is a private API, we make a breaking change.

[ci skip-rust]
[ci skip-build-wheels]
@Eric-Arellano
Copy link
Contributor Author

Eric-Arellano commented Mar 7, 2022

This is messing up codegen for Go. We currently have go_third_party_package directly depend on its go_mod, so that ./pants --changed-since=HEAD --changed-dependees=direct shows third-party packages as having changed if go.mod and go.sum have changed. But that means that we cannot use Targets:

# TODO: If you use `Targets` here, then we replace the direct dep on the `go_mod` with all
# of its generated targets...Figure this out.
all_deps = await Get(UnexpandedTargets, DependenciesRequest(target[Dependencies]))
maybe_direct_dependencies = await MultiGet(
Get(FallibleBuildGoPackageRequest, BuildGoPackageTargetRequest(tgt.address))
for tgt in all_deps
if (
tgt.has_field(GoPackageSourcesField)
or tgt.has_field(GoThirdPartyPackageDependenciesField)
or bool(maybe_get_codegen_request_type(tgt, union_membership))
)
)

Which means that if you depend on protobuf_sources (plural), we do not replace that with protobuf_source like we should be.

I actually no longer think my original proposal is relevant here....even if we fix dependencies of a target generator to not include generated targets, we've decided that we do want to keep the alias mechanism of target generators. Users have also confirmed they like the feature.

Instead, I think the only solution is to do what we do what do already do for python_requirements and poetry_requirements.

# This allows us to work around https://github.com/pantsbuild/pants/issues/13118. Because a
# generated target does not depend on its target generator, `--changed-since --changed-dependees`
# would not mark the generated targets as changing when the `requirements.txt` changes, even though
# it may be impacted. Fixing that will be A Thing and requires design work, so instead we can
# depend on this private target type that owns `requirements.txt` to get `--changed-since` working.
class PythonRequirementsFileTarget(Target):
alias = "_python_requirements_file"
core_fields = (*COMMON_TARGET_FIELDS, PythonRequirementsFileSourcesField)
help = "A private helper target type used by `python_requirement` target generators."

We would rename this to something like _target_generator_sources, then switch Go to use it. This has all the semantics we want! Only it's weird to have this private helper target.

cc @stuhood @tdyas

Eric-Arellano added a commit that referenced this issue Mar 8, 2022
…er` (#14727)

Closes #13118 and #12953. 

Really that first ticket was about getting `--changed-since` to work properly for generated targets that depend on the contents of a source file like `go.mod` or `requirements.txt`. (Note that this is different than `python_sources` -> `python_source` etc, which don't actually consume the contents of their `sources` during generation.)

I originally hypothesized that to do that we would need to stop having target generators depend on generated targets. But that was a red herring.

Instead, the simplest solution is to lean into the fix we've used for Python since #9307: generate a distinct target type to "own" the generating sources. Then add a dependency on that target to all the other generated targets. This gives us the exact semantics we want! It's not the most elegant solution—e.g. boilerplate for rule authors—but it solves a real problem well.

[ci skip-rust]
[ci skip-build-wheels]
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants