New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add targets to re-wrap source files in different SourcesField
types.
#17877
Add targets to re-wrap source files in different SourcesField
types.
#17877
Conversation
|
I don't feel comfortable with this without @stuhood 's approval. I see the pieces and it could work in certain situations for sure, but this has me uneasy... Specifically:
That's not to say I don't love the idea, and the possibilities. But I worry about how much we're signing ourselves up for and wan to make sure it's well thought out. |
If I got my way, we'd support every source type that we can package or compile
The implications for formatting/linting etc are the same as for any other form of non-exporting codegen (since that's what this is). There's nothing fundamentally new here. |
Ahh yes very good point. |
import logging | ||
from dataclasses import dataclass | ||
from typing import Iterable, Union | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From a UX perspective, it feels like this should draw inspiration from http_source
instead: https://www.pantsbuild.org/v2.15/docs/reference-resource#codesourcecode ... such that the syntax was more like:
experimental_shell_command(
name='cmd',
..,
)
python_sources(
sources=output_sources(":cmd"),
)
I know that @thejcannon also took advantage of codegen for the purposes of http_source
, but if we need to deepen the support for alternative sources fields types (because due to its implementation, http_source
is currently limited to resource
and file
) that seems like it would still be worth doing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yeah, see I knew roping Stu in was the right call. I like this syntax as well (bikeshed on the name).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I've been mulling this over for a while, and having looked at http_source
, I can see how it works, and could apply here. I'm happy to implement it if the discussion lands on implementing it in this way.
I've re-written this comment a few times, to make sure that my understanding of how things work lines up with what's actually going onside Pants. Some of the objections I initially had turned out to not be entirely correct.
I think there's some complication however, that justifies the conceptual separation of our sources
target generators and a corresponding multiple source wrapper targets. Count me as a medium no against adding the new value type.
Most x_sources
file types (which produce x_source
targets) are TargetFilesGenerators
, which currently produce GeneratedTargets
representing one target per file matched by the generator. Eventually, the MultipleSourcesField
on the target is used in the GeneratedTargets
-producing rule to resolve a SourcesPaths
, which does not consider codegen.
It would be possible to amend that -> GeneratedTargets
rule to act as a no-op if the MultipleSourcesField
's value were an output_sources
. I presume it would be possible to implement a GeneratedSources
rule for each of the MultipleSourcesField
s that we end up wanting to support. Rules would need to be registered per x_sources
target type, since codegen requires a single rule per (InputSourceField, OutputSourceField)
pair. This can't be as neat a solution as the TargetFilesGenerator
approach is for generating targets for actual files.
The conceptual gap here is the x_sources
Target would be a Target Generator when operating on source files, but a plain old Target
that resolves to several (codegenned) source files when using an output_sources
value. This makes it difficult to reason about the behaviour of this target type, and similarly difficult to document: This would significantly complicate the documentation for each of these x_sources
targets. It would need to explain that the target behaves significantly differently depending on whether you're using it for real files or to wrap generated files from elsewhere.
It also looks as though the behaviour of moved_fields
and copied_fields
would also be somewhat difficult to reason about -- they'd be neither moved, nor copied if the target is used directly? It seems like parametrize
also behaves weirdly here.
So I think an implementation using the existing target generators with a new output_sources
value type would (at the moment) comfortably land in the "difficult to explain" bucket. We are doing something significantly different to what the target types currently do.
On the other hand, the current implementation produces a bunch of target types (not ideal), but they're easier to reason about, and they're all grouped in help under experimental_wrap_as
, which should make them easy-ish to spot in the help.
This is experimental
, we can change the API later
Currently all of the wrapper types will be marked experimental
, which would allow us time to deal with some of the above items where possible. With that in mind, we could land this as currently proposed, and take the time to solve the above technical blockers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Specifically I would assume this acts more like a single python_source.
Perhaps we can encode that requirement? 1 source and one target.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@thejcannon I don't think that's a good restriction to make here -- you could have an esc
that generates a pile of files for use as resources
, and it may not be possible to enumerate them all. Alternatively, you could have a process that generates a number of source files (not unreasonable in JVM land), and you need to include all of them. If you had 1 source per target, you'd end up with a lot of individual target definitions, and you'd need to manually map the dependencies between them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for confirming my technical understanding of GenerateSourcesRequest
and TargetFilesGenerator
.
Per my previous comment here, I think that needing to know what files will be generated ahead of time is of limited utility and creates a maintenance burden.
Currently, an x_sources
target generator needs to be written once, and will keep up to date as more x
files are added to a directory. Using a TargetFilesGenerator
that subsets the outputs from another target would mean the BUILD
file would need to be updated whenever the contents of the target would change. Admittedly, this would guarantee the best dependency/cache performance, however it would come at the expense of a maintenance burden of the build file, and the build errors that arise due to files not being captured from the outputs would be hard to debug.
On the other hand, the implementation in the PR can glob the outputs of another target, so for cases where all of the outputs are needed, the BUILD
file only needs to be written once, and remains straightforward. If you would benefit from more fine-grained targets, then you can can add new targets for each subset of files. The user has the option of choosing easy maintenance or fine-grained caching.
In my view, dependency inference is a nice-to-have for this use case, but not essential: in the experimental_shell_command
/experimental_run_in_sandbox
use cases, users have to specify dependencies manually.
There's also a firm red flag arising from enabling dependency inference for esc
use cases: Java's dependency inference code in particular would need to run the underlying esc
process to fulfil the SourceFilesRequest
s in its first-party symbol mapping. That would mean every esc
that fulfils a _sources
target would need to run before dependency inference can be performed, regardless of whether that dependency is necessary to fulfil the goal.
My preference here is for this idea to be wrapped up in an experimental
API with minimal surface area, so that we can get a better idea about how this is being used, and then proceed with a deeper change in machinery once we've got a more solid understanding of how it'll be used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently, an
x_sources
target generator needs to be written once, and will keep up to date as morex
files are added to a directory. Using aTargetFilesGenerator
that subsets the outputs from another target would mean theBUILD
file would need to be updated whenever the contents of the target would change. Admittedly, this would guarantee the best dependency/cache performance, however it would come at the expense of a maintenance burden of the build file, and the build errors that arise due to files not being captured from the outputs would be hard to debug.
I am not proposing that this integration would require explicitly listing the files to generate targets for, as I don't think that that buys you a significant difference in performance. Basically: if you required that all files be explicitly listed, ./pants list
would be able to skip generating code to generate targets, but ./pants dependencies
would need to generate code in order to compute dependencies for inference... and making list
faster isn't worth the API difference probably.
In my view, dependency inference is a nice-to-have for this use case, but not essential: in the
experimental_shell_command
/experimental_run_in_sandbox
use cases, users have to specify dependencies manually.
To be clear: this is about both the dependencies of the wrapped_as_
target, and dependencies on the wrapped_as_
target.
I might be fine with landing the wrappers as experimental, as long as we agree that they are even less stable than experimental_shell_command
itself, and should probably stabilize at a different time.
But I suspect that not having a stable API for this use case is a useful reminder that codegen itself still has some rough edges which might end up impacting the implementation (and maybe API) of experimental_shell_command
in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but
./pants dependencies
would need to generate code in order to compute dependencies for inference
Yup, that's the red flag I had above, you would need to run the processes in order to compute dependencies, and that would have an impact on performance: all source-providing targets would need to be evaluated before the dependency inference would take place. This could have a significant impact on performance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I might be fine with landing the wrappers as experimental, as long as we agree that they are even less stable than
experimental_shell_command itself
, and should probably stabilize at a different time.
Yes, I agree entirely.
To be clear, I think there's value in supporting alternative values for SourceField
s, but at the moment, but it's probably a bigger design task with riskier scope than we can solve in a PR discussion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but
./pants dependencies
would need to generate code in order to compute dependencies for inferenceYup, that's the red flag I had above, you would need to run the processes in order to compute dependencies, and that would have an impact on performance: all source-providing targets would need to be evaluated before the dependency inference would take place. This could have a significant impact on performance.
Only those actually used in this position: not all codegen targets. But yea.
I could imagine this usecase as differentiated from how we handle "native" codegen: native codegen has a higher implementation difficulty bar, because you need to implement both dependency inference rules and the codegen itself.
To be clear, I think there's value in supporting alternative values for
SourceField
s, but at the moment, but it's probably a bigger design task with riskier scope than we can solve in a PR discussion.
I disagree that a PR is necessarily the wrong place to have this discussion. To reduce the need for this type of discussion on PRs it can be useful to ensure that design/UX are discussed before a PR is started... but in many cases, some of that ends up happening during code review.
Part of my difficulty wrapping my head around this might be squelched with a simple real-world use-case. Can you help me understand the problem we're trying to solve and how were falling short today? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unless @thejcannon has further comments, I'm fine with this, as long as it is marked:
- as even more experimental than
experimental_shell_command
(which we expect to stabilize sooner) - as not supporting dependency inference
import logging | ||
from dataclasses import dataclass | ||
from typing import Iterable, Union | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but
./pants dependencies
would need to generate code in order to compute dependencies for inferenceYup, that's the red flag I had above, you would need to run the processes in order to compute dependencies, and that would have an impact on performance: all source-providing targets would need to be evaluated before the dependency inference would take place. This could have a significant impact on performance.
Only those actually used in this position: not all codegen targets. But yea.
I could imagine this usecase as differentiated from how we handle "native" codegen: native codegen has a higher implementation difficulty bar, because you need to implement both dependency inference rules and the codegen itself.
To be clear, I think there's value in supporting alternative values for
SourceField
s, but at the moment, but it's probably a bigger design task with riskier scope than we can solve in a PR discussion.
I disagree that a PR is necessarily the wrong place to have this discussion. To reduce the need for this type of discussion on PRs it can be useful to ensure that design/UX are discussed before a PR is started... but in many cases, some of that ends up happening during code review.
Yeah, nothing blocking, just trying to navigate this feature. I should say too, the idea is really exciting and excellent. That's part of why I'm so active on this PR. Your changes in this space have really exploded what's possible in an easy way. I just want to make sure the UX is as gooey and flexible as possible. |
Thanks all! I appreciate that this is a reasonably contentious decision, so thanks for helping flesh out the alternatives. Before I merge:
After I merge:
|
798064d
to
0ef05d3
Compare
@thejcannon @stuhood see 0ef05d3 for the commit that addresses Stu's remaining concerns |
See #17926 for the discussion of creating new value types for |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aww shucks this was still pending publishing 😓
@@ -32,10 +33,19 @@ | |||
tests_analysis, | |||
third_party_pkg, | |||
) | |||
from pants.core.util_rules.wrap_source import wrap_source_rule_and_target | |||
|
|||
wrap_golang = wrap_source_rule_and_target(GoPackageSourcesField, "kotlin_sources") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This says "go" but the string says "kotlin" 👀
Pull request will be incoming shortly!
…On Fri, Jan 6, 2023 at 6:56 AM Joshua Cannon ***@***.***> wrote:
***@***.**** commented on this pull request.
Aww shucks this was still pending publishing 😓
------------------------------
In src/python/pants/backend/experimental/go/register.py
<#17877 (comment)>:
> @@ -32,10 +33,19 @@
tests_analysis,
third_party_pkg,
)
+from pants.core.util_rules.wrap_source import wrap_source_rule_and_target
+
+wrap_golang = wrap_source_rule_and_target(GoPackageSourcesField, "kotlin_sources")
This says "go" but the string says "kotlin" 👀
—
Reply to this email directly, view it on GitHub
<#17877 (review)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAEX5XDB5N3MOXR73WUCUYLWRAXCJANCNFSM6AAAAAATIBHOQM>
.
You are receiving this because you modified the open/close state.Message
ID: ***@***.***>
|
pantsbuild#17877) This is primarily intended to be used with `experimental_shell_command`/`experimental_run_in_sandbox`, but will also work with _any_ target that produces a `SourcesField`/`FileSourceField` that is not applicable for use by dependent rules -- e.g. if your `experimental_run_in_sandbox` produces Python code that you want included as source in a Python distribution. The implementation is hacky at the moment, but it mostly solves a problem. Fixes pantsbuild#15493.
This is primarily intended to be used with
experimental_shell_command
/experimental_run_in_sandbox
, but will also work with any target that produces aSourcesField
/FileSourceField
that is not applicable for use by dependent rules -- e.g. if yourexperimental_run_in_sandbox
produces Python code that you want included as source in a Python distribution.The implementation is hacky at the moment, but it mostly solves a problem.
Fixes #15493.