Figure out codegen's `runtime_dependencies` and multiple resolves #14484

Eric-Arellano · 2022-02-15T12:25:49Z

Codegen like [python-protobuf] uses the --runtime-dependencies option and "depndency injection" to dynamically add a dependency on runtime dependencies like protobuf. Those are necessary for the generated code to work properly.

However, those python_requirement targets must belong to a single "resolve".

Semantically, we need to support having multiple different versions of the same runtime deps, one set per resolve. That's key for the feature, that we allow you to have conflicting versions of the same dep.

Logistically, we need to figure this out because you often won't be able to use --runtime-dependencies if you are using multiple resolves. It would mean that only code using the same resolve as the runtime deps could use codegen. Otherwise, you'd have to disable the option and manually add the runtime deps.

--

A very simple fix is to replace runtime_dependencies with runtime_dependencies_per_resolve.

[python-protobuf]
runtime_dependencies_per_resolve = { python-default = ["//:reqs#protobuf"] }

The downside is this makes the default case of only one resolve more complex.

The text was updated successfully, but these errors were encountered:

tdyas · 2022-02-15T12:53:28Z

Maybe something akin to what we do for scala-library would work? To inject the scala-library, the Scala backend searches the resolve for the jvm_artifact that provides scala-library. Imagine if we could assign or define some sort of "ID" for runtime dependencies, then Pants can search the resolve for the language-specific target in that resolve that matches the ID.

Further thought on whether this would be language-specific or could be made generic.

On the other hand, target addresses are also generic enough, around known by users, and good enough for a v1 of this. Making it easier to specify runtime deps can be future work.

tdyas · 2022-02-15T12:53:59Z

Also, I assume all of the other codegen impls to come (e.g., Java/Scala) will have this issue.

Eric-Arellano · 2022-02-15T13:26:04Z

Maybe something akin to what we do for scala-library would work? To inject the scala-library, the Scala backend searches the resolve for the jvm_artifact that provides scala-library.

Great suggestion!! We already know what resolve each python_requirement comes from, too. As discussed on the Scala ticket, it's a bummer this requires consulting AllTargets, but we already are doing that for dep inference. And you can disable this feature if you don't like it.

Check out #14486 for how we now have to associate resolves to the FirstPartyPythonMappingImpl used for Python dep inference on Protobuf and Thrift. The naive thing to do would to claim that codegen can create files for every resolve, so {"a": {"foo_pb2": ...}, "b": {"foo_pb2": ...}}. The fancier thing would be to only add to resolves that have the runtime libraries defined. Not clear to me whether that's good magic or not.

--

Oh huh, that's tricky though. We right now are storing the dependency on the protobuf_source target, not the python_source target.

To know which resolve to use, we need to know which specific python_source target depends on the protobuf_source. The dep will need to move to being on the python_source.

Eric-Arellano · 2022-02-22T23:07:39Z

This is an issue for my original proposal of runtime_dependencies_per_resolve too:

Oh huh, that's tricky though. We right now are storing the dependency on the protobuf_source target, not the python_source target.

To know which resolve to use, we need to know which specific python_source target depends on the protobuf_source. The dep will need to move to being on the python_source.

Any thoughts @stuhood ?

stuhood · 2022-02-23T00:41:13Z

I don't think that I understand the question. But it makes sense to me for the dependency on the runtime library to be injected on the protobuf_source target, and for a Python resolve to be added as a plugin field to protobuf_source. See #14258 (comment) re: plugin fields.

Eric-Arellano · 2022-03-02T22:51:24Z

Indeed: working on Go via #14258 is illuminating here. For the generated Go code to compile, we need to have the Go Protobuf runtime library built first. Note that as a compiled language, we need to first compile that generated Go code before normal first-party code can consume the generated code; so it's too late to grab the Protobuf dependency by relying on a go_package dependee, we need it earlier. Compare this to Python where we only need to make sure the final transitive closure includes a runtime library.

That makes sense conceptually to me. It also avoids the major problem I tried to mention above where we would need to make our dependency inference code much fancier: if we were to move the runtime_dependency from protobuf_source into python_source, then we would now need our Python dependency inference code to realize it's depending on protobuf_source vs apache_source & then have logic to determine which python_requirement to use based on the resolve of the python_source. It would be possible I think, but a complex leaky abstraction.

--

My only hesitation here is the interaction of parametrization with multiple languages. Consider this:

protobuf_source(
   name="users"
   source="users.proto",
   python_resolve=parametrize("py-a", "py-b")
   jvm_resolve="java-default",
)

This will create 2 protobuf_source targets, both with the same resolve for jvm_resolve. For JVM purposes, the two targets are practically identical, and notably this will result in dependency inference ambiguity.

One solution is Tom's suggestion to give up on protobuf_source in favor of protobuf_python_source, protobuf_jvm_source, etc: #14258 (comment). That would work around this, but I agree with Stu that it's not desirable #14258 (comment). A major goal of codegen in v2 is to have a single target that describes the protocol file, and achieving that goal motivated a lot of the Target API. It would be a UX regression to require multiple targets.

Any suggestions @stuhood? I don't think configurations would solve this either iiuc.

The best I can think of is somehow updating our dep inference code to recognize that users.proto:users@python_resolve=a is functionally equivalent to users.proto:users@python_resolve=b for JVM, and arbitrarily choose one...

stuhood · 2022-03-03T00:07:11Z

Any suggestions @stuhood? I don't think configurations would solve this either iiuc.

Yea, interesting. Configurations could address it, but only if you required some other marker field to be present, to explicitly request codegen for a particular instance. For example, as a strawname: generate_java=True (or requiring the resolve to be set explicitly). Then you could use configuration to disable some parametrizations of the target:

configuration(name="py-a", python_resolve="py-a")
configuration(name="py-b", python_resolve="py-b")
configuration(name="java-default", java_resolve="java-default", generate_java=True)

protobuf_source(
   name="users"
   source="users.proto",
   configuration=parametrize(":py-a", ":py-b", ":java-default"),
)

Don't love that though. It's important that the resolve field defaults, but it makes this case tricky because there is no explicit marker.

tdyas · 2022-03-03T00:07:33Z

One solution is Tom's suggestion to give up on protobuf_source in favor of protobuf_python_source, protobuf_jvm_source, etc: #14258 (comment). That would work around this, but I agree with Stu that it's not desirable #14258 (comment).

The developer experience need not have the separate target types exposed to the user. Would it be beneficial to "split off" language-specific targets from the protobuf_source target using target generation?

ADDED: protobuf_source would ask the language backends to generate their applicable targets using the fields on the protobuf_source (using a union or whatever).

stuhood · 2022-03-03T00:09:09Z

For example, as a strawname: generate_java=True (or requiring the resolve to be set explicitly).

Oh: An explicit generators string-list value field would work here, and could be used with configuration to disable in some cases.

Eric-Arellano · 2022-03-03T02:23:30Z

The developer experience need not have the separate target types exposed to the user. Would it be beneficial to "split off" language-specific targets from the protobuf_source target using target generation?

Possibly, but some weird edges.

We'd need to define multiple new "atom" targets like {protobuf,thrift}_{go,python,java,scala}_source, which is generally undesirable to have an explosion of targets.
What does protobuf_source do now? If it generates those N language atom-targets, why have that and protobuf_sources? Why not have protobuf_python_sources plural?
Address scheme: would it be foo/f.proto#python vs foo/f.proto#golang? We could do that, but makes an already complex address even more complex.

--

Oh: An explicit generators string-list value field would work here, and could be used with configuration to disable in some cases.

This?

configuration(name="py-a", python_resolve="py-a", generators=["python"])
configuration(name="py-b", python_resolve="py-b", generators=["python"])
configuration(name="java-default", java_resolve="java-default", generators=["java"])

protobuf_source(
   name="users"
   source="users.proto",
   configuration=parametrize(":py-a", ":py-b", ":java-default"),
)

I'm okay with that! Not the most obvious, but this is also a pretty niche feature most won't use, and it's feasible to document under "Advanced Uses" section. I really like that it's consistent with generic Pants mechanisms.

I think the default for generators=None => all generators. And then you can use generators=[] if you want to disable entirely.

generators field also solves #14041.

stuhood · 2022-03-03T19:03:13Z

I think the default for generators=None => all generators. And then you can use generators=[] if you want to disable entirely.

Yea, exactly.

@tdyas

…discovering the dependency (#14691) @tdyas had the idea that rather than you having to tell Pants where to load `protobuf`, `grpcio`, and `thrift`, we can simply discover it! We do this for Scala already and it works well. There's minimal performance hit because we already will have created a third-party module mapping. Two motivations: 1. Ergonomics. 2. Unblock codegen supporting multiple resolves: #14484. Otherwise we would need to add `[python-protobuf].runtime_dependencies_per_resolve`, which violates our goal for resolves to not make things more complex for the average user who only wants one resolve. We deprecate Protobuf, but completely update Thrift since it hasn't reached stable yet. [ci skip-rust]

@tdyas

…covering the dependency (pantsbuild#14691) @tdyas had the idea that rather than you having to tell Pants where to load `protobuf`, `grpcio`, and `thrift`, we can simply discover it! We do this for Scala already and it works well. There's minimal performance hit because we already will have created a third-party module mapping. Two motivations: 1. Ergonomics. 2. Unblock codegen supporting multiple resolves: pantsbuild#14484. Otherwise we would need to add `[python-protobuf].runtime_dependencies_per_resolve`, which violates our goal for resolves to not make things more complex for the average user who only wants one resolve. Note that this was already deprecated in 2.10, so we can outright remove here. [ci skip-rust] # Building wheels and fs_util will be skipped. Delete if not intended. [ci skip-build-wheels]

…o support multiple resolves with codegen (#14693) Closes #14484. ## Problem Generated code needs associated runtime libraries, e.g. `protobuf`. Those must come from a `python_requirement` which by definition have a single `resolve`. Before this PR, it was impossible to have >1 `python_requirement` target used for each runtime library and every `protobuf_source` target would depend on the same `python_requirement`. Practically, this means codegen could only work with code belonging to a single resolve. ## Solution Add a `python_resolve` field to codegen targets so that they can indicate which `python_requirement` they should be using. Our new dependency inference from #14691 wires this up properly. If you want the same generated code to work with multiple resolves, you will need to use `parametrize`. That gets tricky when you consider a codegen target generating for multiple languages, as explained at the bottom of #14484, but there is a decent workaround via configurations. That will be tackled in a followup. ### FYI: lazy validation of runtime library #14691 asserts that there is exactly 1 runtime library in the project; now we assert that for each resolve. Key nuance: this check is lazy. If you have 2 resolves, but your codegen targets only use 1 of the 2, then we will never end up checking that there is a runtime library defined for the 2nd resolve. That is important so that, for example, if you have a `pants-plugins` resolve we don't force you to add `protobuf` unnecessarily to it. [ci skip-rust] [ci skip-build-wheels]

@tdyas

…covering the dependency (#14695) @tdyas had the idea that rather than you having to tell Pants where to load `protobuf`, `grpcio`, and `thrift`, we can simply discover it! We do this for Scala already and it works well. There's minimal performance hit because we already will have created a third-party module mapping. Two motivations: 1. Ergonomics. 2. Unblock codegen supporting multiple resolves: #14484. Otherwise we would need to add `[python-protobuf].runtime_dependencies_per_resolve`, which violates our goal for resolves to not make things more complex for the average user who only wants one resolve. Note that this was already deprecated in 2.10, so we can outright remove here. [ci skip-rust]

…o support multiple resolves with codegen (pantsbuild#14693) Closes pantsbuild#14484. ## Problem Generated code needs associated runtime libraries, e.g. `protobuf`. Those must come from a `python_requirement` which by definition have a single `resolve`. Before this PR, it was impossible to have >1 `python_requirement` target used for each runtime library and every `protobuf_source` target would depend on the same `python_requirement`. Practically, this means codegen could only work with code belonging to a single resolve. ## Solution Add a `python_resolve` field to codegen targets so that they can indicate which `python_requirement` they should be using. Our new dependency inference from pantsbuild#14691 wires this up properly. If you want the same generated code to work with multiple resolves, you will need to use `parametrize`. That gets tricky when you consider a codegen target generating for multiple languages, as explained at the bottom of pantsbuild#14484, but there is a decent workaround via configurations. That will be tackled in a followup. ### FYI: lazy validation of runtime library pantsbuild#14691 asserts that there is exactly 1 runtime library in the project; now we assert that for each resolve. Key nuance: this check is lazy. If you have 2 resolves, but your codegen targets only use 1 of the 2, then we will never end up checking that there is a runtime library defined for the 2nd resolve. That is important so that, for example, if you have a `pants-plugins` resolve we don't force you to add `protobuf` unnecessarily to it. [ci skip-rust] [ci skip-build-wheels]

…o support multiple resolves with codegen (Cherry-pick of #14693) (#14698) Closes #14484. ## Problem Generated code needs associated runtime libraries, e.g. `protobuf`. Those must come from a `python_requirement` which by definition have a single `resolve`. Before this PR, it was impossible to have >1 `python_requirement` target used for each runtime library and every `protobuf_source` target would depend on the same `python_requirement`. Practically, this means codegen could only work with code belonging to a single resolve. ## Solution Add a `python_resolve` field to codegen targets so that they can indicate which `python_requirement` they should be using. Our new dependency inference from #14691 wires this up properly. If you want the same generated code to work with multiple resolves, you will need to use `parametrize`. That gets tricky when you consider a codegen target generating for multiple languages, as explained at the bottom of #14484, but there is a decent workaround via configurations. That will be tackled in a followup. ### FYI: lazy validation of runtime library #14691 asserts that there is exactly 1 runtime library in the project; now we assert that for each resolve. Key nuance: this check is lazy. If you have 2 resolves, but your codegen targets only use 1 of the 2, then we will never end up checking that there is a runtime library defined for the 2nd resolve. That is important so that, for example, if you have a `pants-plugins` resolve we don't force you to add `protobuf` unnecessarily to it. [ci skip-rust] [ci skip-build-wheels]

Eric-Arellano added gedanken bug and removed gedanken bug labels Feb 15, 2022

Eric-Arellano mentioned this issue Mar 3, 2022

Deprecate [python-protobuf].runtime_dependencies in favor of Pants discovering the dependency #14691

Merged

This was referenced Mar 3, 2022

Add python_resolve field to protobuf_source and thrift_source to support multiple resolves with codegen #14693

Merged

Implement support for codegen with multiple languages x multiple resolves #14694

Open

Eric-Arellano mentioned this issue Mar 3, 2022

Remove [python-protobuf].runtime_dependencies in favor of Pants discovering the dependency #14695

Merged

Eric-Arellano closed this as completed in #14695 Mar 4, 2022

Eric-Arellano mentioned this issue Mar 4, 2022

Add python_resolve field to protobuf_source and thrift_source to support multiple resolves with codegen (Cherry-pick of #14693) #14698

Merged

Eric-Arellano mentioned this issue Aug 8, 2022

go: analyze imports paths by module to enable multiple go_mod targets #16386

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Figure out codegen's `runtime_dependencies` and multiple resolves #14484

Figure out codegen's `runtime_dependencies` and multiple resolves #14484

Eric-Arellano commented Feb 15, 2022 •

edited

tdyas commented Feb 15, 2022

tdyas commented Feb 15, 2022

Eric-Arellano commented Feb 15, 2022 •

edited

Eric-Arellano commented Feb 22, 2022

stuhood commented Feb 23, 2022 •

edited

Eric-Arellano commented Mar 2, 2022

stuhood commented Mar 3, 2022 •

edited

tdyas commented Mar 3, 2022 •

edited

stuhood commented Mar 3, 2022

Eric-Arellano commented Mar 3, 2022 •

edited

stuhood commented Mar 3, 2022

Figure out codegen's runtime_dependencies and multiple resolves #14484

Figure out codegen's runtime_dependencies and multiple resolves #14484

Comments

Eric-Arellano commented Feb 15, 2022 • edited

tdyas commented Feb 15, 2022

tdyas commented Feb 15, 2022

Eric-Arellano commented Feb 15, 2022 • edited

Eric-Arellano commented Feb 22, 2022

stuhood commented Feb 23, 2022 • edited

Eric-Arellano commented Mar 2, 2022

stuhood commented Mar 3, 2022 • edited

tdyas commented Mar 3, 2022 • edited

stuhood commented Mar 3, 2022

Eric-Arellano commented Mar 3, 2022 • edited

stuhood commented Mar 3, 2022

Figure out codegen's `runtime_dependencies` and multiple resolves #14484

Figure out codegen's `runtime_dependencies` and multiple resolves #14484

Eric-Arellano commented Feb 15, 2022 •

edited

Eric-Arellano commented Feb 15, 2022 •

edited

stuhood commented Feb 23, 2022 •

edited

stuhood commented Mar 3, 2022 •

edited

tdyas commented Mar 3, 2022 •

edited

Eric-Arellano commented Mar 3, 2022 •

edited