Add targets to re-wrap source files in different `SourcesField` types. #17877

chrisjrn · 2022-12-23T20:27:37Z

This is primarily intended to be used with experimental_shell_command/experimental_run_in_sandbox, but will also work with any target that produces a SourcesField/FileSourceField that is not applicable for use by dependent rules -- e.g. if your experimental_run_in_sandbox produces Python code that you want included as source in a Python distribution.

The implementation is hacky at the moment, but it mostly solves a problem.

Fixes #15493.

chrisjrn · 2022-12-23T20:28:48Z

~~(Still needs tests, but want to air the idea before committing further :))~~

src/python/pants/core/util_rules/wrap_source.py

thejcannon · 2022-12-27T20:11:22Z

I don't feel comfortable with this without @stuhood 's approval.

I see the pieces and it could work in certain situations for sure, but this has me uneasy... Specifically:

- What target source fields would we do this for? All of them? A subset (which subset)?
- What does this mean for formatters/fixers/linters? Theoretically their FieldSets only require a PythonSourceField.

That's not to say I don't love the idea, and the possibilities. But I worry about how much we're signing ourselves up for and want to make sure it's well thought out.

src/python/pants/core/util_rules/wrap_source_intergration_test.py

chrisjrn · 2022-12-27T20:14:30Z

@thejcannon

> What target source fields would we do this for? All of them? A subset (which subset)?

If I got my way, we'd support every source type that we can package or compile

> What does this mean for formatters/fixers/linters? Theoretically their FieldSets only require a PythonSourceField.

The implications for formatting/linting etc are the same as for any other form of non-exporting codegen (since that's what this is). There's nothing fundamentally new here.

thejcannon · 2022-12-27T20:22:34Z

> The implications for formatting/linting etc are the same as for any other form of non-exporting codegen (since that's what this is). There's nothing fundamentally new here.

Ahh yes very good point.

stuhood · 2023-01-03T19:53:14Z

src/python/pants/core/util_rules/wrap_source.py

+import logging
+from dataclasses import dataclass
+from typing import Iterable, Union
+


From a UX perspective, it feels like this should draw inspiration from http_source instead: https://www.pantsbuild.org/v2.15/docs/reference-resource#codesourcecode ... such that the syntax was more like:

    experimental_shell_command(
        name='cmd',
        ..,
    )
    python_sources(
        sources=output_sources(":cmd"),
    )

I know that @thejcannon also took advantage of codegen for the purposes of http_source, but if we need to deepen the support for alternative sources field types (because, due to its implementation, http_source is currently limited to resource and file), that seems like it would still be worth doing.

Ah yeah, see I knew roping Stu in was the right call. I like this syntax as well (bikeshed on the name).

So I've been mulling this over for a while, and having looked at http_source, I can see how it works, and could apply here. I'm happy to implement it if the discussion lands on implementing it in this way.

I've re-written this comment a few times, to make sure that my understanding of how things work lines up with what's actually going on inside Pants. Some of the objections I initially had turned out to not be entirely correct.

I think there's some complication, however, that justifies the conceptual separation of our sources target generators from a corresponding set of multiple-source wrapper targets. Count me as a medium no against adding the new value type.

Most x_sources file types (which produce x_source targets) are TargetFilesGenerators, which currently produce GeneratedTargets representing one target per file matched by the generator. Eventually, the MultipleSourcesField on the target is used in the GeneratedTargets-producing rule to resolve a SourcesPaths, which does not consider codegen.

It would be possible to amend that -> GeneratedTargets rule to act as a no-op if the MultipleSourcesField's value were an output_sources. I presume it would be possible to implement a GeneratedSources rule for each of the MultipleSourcesFields that we end up wanting to support. Rules would need to be registered per x_sources target type, since codegen requires a single rule per (InputSourceField, OutputSourceField) pair. This can't be as neat a solution as the TargetFilesGenerator approach is for generating targets for actual files.
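As a rough illustration of the "one rule per (InputSourceField, OutputSourceField) pair" constraint mentioned above, here is a toy model in plain Python. It is not Pants code: `ShellOutputField`, `PythonSourcesField`, `ResourcesField`, `register` and `generate` are all hypothetical names, and the real rule graph works differently. It only shows why each new sources type to be supported would need its own registration.

```python
from typing import Callable, Dict, List, Tuple


# Hypothetical stand-ins for source field types; these are NOT Pants classes.
class ShellOutputField: ...
class PythonSourcesField: ...
class ResourcesField: ...


Generator = Callable[[List[str]], List[str]]

# One generator per (input type, output type) pair.
REGISTRY: Dict[Tuple[type, type], Generator] = {}


def register(input_type: type, output_type: type) -> Callable[[Generator], Generator]:
    """Register a generator for exactly one (input, output) pair."""
    def decorator(fn: Generator) -> Generator:
        key = (input_type, output_type)
        if key in REGISTRY:
            raise ValueError(f"duplicate rule for {key}")
        REGISTRY[key] = fn
        return fn
    return decorator


@register(ShellOutputField, PythonSourcesField)
def shell_to_python(paths: List[str]) -> List[str]:
    # Only Python files make sense as Python sources in this toy model.
    return [p for p in paths if p.endswith(".py")]


@register(ShellOutputField, ResourcesField)
def shell_to_resources(paths: List[str]) -> List[str]:
    # Any file can be treated as a resource.
    return list(paths)


def generate(input_type: type, output_type: type, paths: List[str]) -> List[str]:
    """Look up the rule for the pair; an unregistered pair is an error."""
    try:
        rule = REGISTRY[(input_type, output_type)]
    except KeyError:
        raise LookupError(f"no rule registered for {(input_type, output_type)}")
    return rule(paths)
```

In this sketch, supporting a third output type means writing and registering a third function, which is the per-type overhead being described.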

The conceptual gap here is that the x_sources Target would be a Target Generator when operating on source files, but a plain old Target that resolves to several (codegenned) source files when using an output_sources value. This makes it difficult to reason about the behaviour of this target type, and similarly difficult to document: the documentation for each of these x_sources targets would need to explain that the target behaves significantly differently depending on whether you're using it for real files or to wrap generated files from elsewhere.

It also looks as though the behaviour of moved_fields and copied_fields would be somewhat difficult to reason about -- they'd be neither moved nor copied if the target is used directly? It seems like parametrize also behaves weirdly here.

So I think an implementation using the existing target generators with a new output_sources value type would (at the moment) comfortably land in the "difficult to explain" bucket. We are doing something significantly different to what the target types currently do.

On the other hand, the current implementation produces a bunch of target types (not ideal), but they're easier to reason about, and they're all grouped in help under experimental_wrap_as, which should make them easy-ish to spot in the help.

This is experimental, we can change the API later

Currently all of the wrapper types will be marked experimental, which would allow us time to deal with some of the above items where possible. With that in mind, we could land this as currently proposed, and take the time to solve the above technical blockers.

Specifically I would assume this acts more like a single python_source.
Perhaps we can encode that requirement? 1 source and one target.

@thejcannon I don't think that's a good restriction to make here -- you could have an esc that generates a pile of files for use as resources, and it may not be possible to enumerate them all. Alternatively, you could have a process that generates a number of source files (not unreasonable in JVM land), and you need to include all of them. If you had 1 source per target, you'd end up with a lot of individual target definitions, and you'd need to manually map the dependencies between them.

Thanks for confirming my technical understanding of GenerateSourcesRequest and TargetFilesGenerator.

Per my previous comment here, I think that needing to know what files will be generated ahead of time is of limited utility and creates a maintenance burden.

Currently, an x_sources target generator needs to be written once, and will keep up to date as more x files are added to a directory. Using a TargetFilesGenerator that subsets the outputs from another target would mean the BUILD file would need to be updated whenever the contents of the target would change. Admittedly, this would guarantee the best dependency/cache performance, however it would come at the expense of a maintenance burden of the build file, and the build errors that arise due to files not being captured from the outputs would be hard to debug.

On the other hand, the implementation in the PR can glob the outputs of another target, so for cases where all of the outputs are needed, the BUILD file only needs to be written once, and remains straightforward. If you would benefit from more fine-grained targets, then you can add new targets for each subset of files. The user has the option of choosing easy maintenance or fine-grained caching.
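To make the trade-off concrete, here is a minimal sketch of "take everything, or take a globbed subset" in plain Python. It is an illustration only: `wrap_outputs` is a hypothetical helper, and it uses the standard library's `fnmatch` rather than whatever glob matching Pants applies to the wrapped target's outputs.

```python
from fnmatch import fnmatch
from typing import Iterable, List, Optional


def wrap_outputs(produced: Iterable[str], patterns: Optional[List[str]] = None) -> List[str]:
    """Return the files to expose from another target's outputs.

    With no patterns, every produced file is exposed (write once, low maintenance).
    With patterns, only matching files are exposed (finer-grained targets).
    """
    files = sorted(produced)
    if not patterns:
        return files
    return [f for f in files if any(fnmatch(f, p) for p in patterns)]


# Hypothetical outputs of some upstream process.
produced = ["gen/a.py", "gen/b.py", "gen/data.json"]

everything = wrap_outputs(produced)
python_only = wrap_outputs(produced, ["gen/*.py"])
```

Here `everything` keeps tracking whatever the upstream process emits, while `python_only` has to be kept in sync with the patterns by hand.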

In my view, dependency inference is a nice-to-have for this use case, but not essential: in the experimental_shell_command/experimental_run_in_sandbox use cases, users have to specify dependencies manually.

There's also a firm red flag arising from enabling dependency inference for esc use cases: Java's dependency inference code in particular would need to run the underlying esc process to fulfil the SourceFilesRequests in its first-party symbol mapping. That would mean every esc that fulfils a _sources target would need to run before dependency inference can be performed, regardless of whether that dependency is necessary to fulfil the goal.
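The concern can be sketched with a toy model (plain Python, not Pants internals; `Target`, `make_process`, `sources_of` and `first_party_symbol_map` are hypothetical names). If building a first-party symbol map needs the sources of every candidate target, then every process-backed target in that position gets run, even when the goal only needs one of them.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Target:
    """Hypothetical target: either on-disk files or a process that generates files."""
    name: str
    files: Optional[List[str]] = None
    generate: Optional[Callable[[], List[str]]] = None


def make_process(name: str, outputs: List[str], log: List[str]) -> Callable[[], List[str]]:
    """Build a fake process that records each time it is run."""
    def run() -> List[str]:
        log.append(name)
        return outputs
    return run


def sources_of(target: Target) -> List[str]:
    # Process-backed targets must be executed to learn their sources.
    if target.generate is not None:
        return target.generate()
    return target.files or []


def first_party_symbol_map(targets: List[Target]) -> Dict[str, str]:
    # Maps each source path to its owning target; needs sources for ALL targets.
    return {path: t.name for t in targets for path in sources_of(t)}


process_log: List[str] = []
targets = [
    Target("lib", files=["lib/Foo.java"]),
    Target("gen_a", generate=make_process("gen_a", ["gen/A.java"], process_log)),
    Target("gen_b", generate=make_process("gen_b", ["gen/B.java"], process_log)),
]

# Even if the goal only depends on gen_a, building the map runs gen_b as well.
symbol_map = first_party_symbol_map(targets)
```

After this runs, `process_log` records both processes, which is the up-front cost being flagged.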

My preference here is for this idea to be wrapped up in an experimental API with minimal surface area, so that we can get a better idea about how this is being used, and then proceed with a deeper change in machinery once we've got a more solid understanding of how it'll be used.

> Currently, an x_sources target generator needs to be written once, and will keep up to date as more x files are added to a directory. Using a TargetFilesGenerator that subsets the outputs from another target would mean the BUILD file would need to be updated whenever the contents of the target would change. Admittedly, this would guarantee the best dependency/cache performance, however it would come at the expense of a maintenance burden of the build file, and the build errors that arise due to files not being captured from the outputs would be hard to debug.

I am not proposing that this integration would require explicitly listing the files to generate targets for, as I don't think that that buys you a significant difference in performance. Basically: if you required that all files be explicitly listed, ./pants list would be able to skip generating code to generate targets, but ./pants dependencies would need to generate code in order to compute dependencies for inference... and making list faster isn't worth the API difference probably.

> In my view, dependency inference is a nice-to-have for this use case, but not essential: in the experimental_shell_command/experimental_run_in_sandbox use cases, users have to specify dependencies manually.

To be clear: this is about both the dependencies of the wrapped_as_ target, and dependencies on the wrapped_as_ target.

I might be fine with landing the wrappers as experimental, as long as we agree that they are even less stable than experimental_shell_command itself, and should probably stabilize at a different time.

But I suspect that not having a stable API for this use case is a useful reminder that codegen itself still has some rough edges which might end up impacting the implementation (and maybe API) of experimental_shell_command in the future.

> but ./pants dependencies would need to generate code in order to compute dependencies for inference

Yup, that's the red flag I had above, you would need to run the processes in order to compute dependencies, and that would have an impact on performance: all source-providing targets would need to be evaluated before the dependency inference would take place. This could have a significant impact on performance.

> I might be fine with landing the wrappers as experimental, as long as we agree that they are even less stable than experimental_shell_command itself, and should probably stabilize at a different time.

Yes, I agree entirely.

To be clear, I think there's value in supporting alternative values for SourceFields, but at the moment it's probably a bigger design task with riskier scope than we can solve in a PR discussion.

> > but ./pants dependencies would need to generate code in order to compute dependencies for inference
>
> Yup, that's the red flag I had above, you would need to run the processes in order to compute dependencies, and that would have an impact on performance: all source-providing targets would need to be evaluated before the dependency inference would take place. This could have a significant impact on performance.

Only those actually used in this position: not all codegen targets. But yea.

I could imagine this usecase as differentiated from how we handle "native" codegen: native codegen has a higher implementation difficulty bar, because you need to implement both dependency inference rules and the codegen itself.

> To be clear, I think there's value in supporting alternative values for SourceFields, but at the moment it's probably a bigger design task with riskier scope than we can solve in a PR discussion.

I disagree that a PR is necessarily the wrong place to have this discussion. To reduce the need for this type of discussion on PRs it can be useful to ensure that design/UX are discussed before a PR is started... but in many cases, some of that ends up happening during code review.

thejcannon · 2023-01-03T23:36:32Z

Part of my difficulty wrapping my head around this might be squelched with a simple real-world use-case. Can you help me understand the problem we're trying to solve and how we're falling short today?

stuhood

Unless @thejcannon has further comments, I'm fine with this, as long as it is marked:

- as even more experimental than experimental_shell_command (which we expect to stabilize sooner)
- as not supporting dependency inference

stuhood · 2023-01-04T22:47:58Z


thejcannon · 2023-01-04T23:16:34Z

Yeah, nothing blocking, just trying to navigate this feature.

I should say too, the idea is really exciting and excellent. That's part of why I'm so active on this PR. Your changes in this space have really exploded what's possible in an easy way. I just want to make sure the UX is as gooey and flexible as possible.

chrisjrn · 2023-01-05T16:35:46Z

Thanks all! I appreciate that this is a reasonably contentious decision, so thanks for helping flesh out the alternatives.

Before I merge:

- This will stabilise separately to experimental_shell_command and friends
- Documentation will be added, indicating that these must be specified manually as dependencies
- Wrappers will be added for remaining first-party sources types*

After I merge:

Create ticket, linking back to this PR, discussing the proposal for new SourceField values

Noting that this will specifically not work for codegen source types, which is a great argument in favour of adding these additional SourceField values at a later date.

chrisjrn · 2023-01-05T18:03:38Z

@thejcannon @stuhood see 0ef05d3 for the commit that addresses Stu's remaining concerns

chrisjrn · 2023-01-05T18:50:18Z

See #17926 for the discussion of creating new value types for source(s) fields.

thejcannon

Aww shucks this was still pending publishing 😓

thejcannon · 2023-01-05T19:02:33Z

src/python/pants/backend/experimental/go/register.py

@@ -32,10 +33,19 @@
    tests_analysis,
    third_party_pkg,
 )
+from pants.core.util_rules.wrap_source import wrap_source_rule_and_target
+
+wrap_golang = wrap_source_rule_and_target(GoPackageSourcesField, "kotlin_sources")


This says "go" but the string says "kotlin" 👀

chrisjrn · 2023-01-06T16:07:38Z

Pull request will be incoming shortly!


Christopher Neugebauer added 6 commits December 23, 2022 11:33

Add reexport.py and a single implementation for Python

b015262

Make experimental_export_python more generic

3ad3483

Add rule to reexport ResourcesSourceField

a34085c

Add help text; rename reexport to wrap_as

01ee230

Remove spurious comments

f1961ae

revert shell_command.py

c45f68f

chrisjrn marked this pull request as ready for review December 23, 2022 20:27

chrisjrn added the category:new feature label Dec 23, 2022

Christopher Neugebauer added 2 commits December 23, 2022 13:29

use DigestSubset

4b99fc5

Add integration test for experimental_wrap_as_python_sources

1277839

chrisjrn requested review from benjyw, stuhood, Eric-Arellano and thejcannon December 27, 2022 19:54

fix help

58a899a

thejcannon reviewed Dec 27, 2022

chrisjrn requested a review from thejcannon December 29, 2022 17:29

stuhood reviewed Jan 3, 2023

stuhood approved these changes Jan 4, 2023

Christopher Neugebauer added 2 commits January 5, 2023 09:32

Merge branch 'main' into chrisjrn/generic-file-exports-2

98ac8c4

Add remaining wrappers and address dependency inference/experimental …

0ef05d3

…status in documentation.

chrisjrn force-pushed the chrisjrn/generic-file-exports-2 branch from 798064d to 0ef05d3 January 5, 2023 18:02

chrisjrn mentioned this pull request Jan 5, 2023

Allow source(s) targets to specify sources from places other than the filesystem #17926

Open

chrisjrn merged commit 2c65434 into pantsbuild:main Jan 5, 2023

thejcannon reviewed Jan 6, 2023

