Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Figure out codegen's runtime_dependencies and multiple resolves #14484

Closed
Eric-Arellano opened this issue Feb 15, 2022 · 11 comments · Fixed by #14695 or #14698
Closed

Figure out codegen's runtime_dependencies and multiple resolves #14484

Eric-Arellano opened this issue Feb 15, 2022 · 11 comments · Fixed by #14695 or #14698

Comments

@Eric-Arellano
Copy link
Contributor

Eric-Arellano commented Feb 15, 2022

Codegen like [python-protobuf] uses the --runtime-dependencies option and "depndency injection" to dynamically add a dependency on runtime dependencies like protobuf. Those are necessary for the generated code to work properly.

However, those python_requirement targets must belong to a single "resolve".

Semantically, we need to support having multiple different versions of the same runtime deps, one set per resolve. That's key for the feature, that we allow you to have conflicting versions of the same dep.

Logistically, we need to figure this out because you often won't be able to use --runtime-dependencies if you are using multiple resolves. It would mean that only code using the same resolve as the runtime deps could use codegen. Otherwise, you'd have to disable the option and manually add the runtime deps.

--

A very simple fix is to replace runtime_dependencies with runtime_dependencies_per_resolve.

[python-protobuf]
runtime_dependencies_per_resolve = { python-default = ["//:reqs#protobuf"] }

The downside is this makes the default case of only one resolve more complex.

@tdyas
Copy link
Contributor

tdyas commented Feb 15, 2022

Maybe something akin to what we do for scala-library would work? To inject the scala-library, the Scala backend searches the resolve for the jvm_artifact that provides scala-library. Imagine if we could assign or define some sort of "ID" for runtime dependencies, then Pants can search the resolve for the language-specific target in that resolve that matches the ID.

Further thought on whether this would be language-specific or could be made generic.

On the other hand, target addresses are also generic enough, around known by users, and good enough for a v1 of this. Making it easier to specify runtime deps can be future work.

@tdyas
Copy link
Contributor

tdyas commented Feb 15, 2022

Also, I assume all of the other codegen impls to come (e.g., Java/Scala) will have this issue.

@Eric-Arellano
Copy link
Contributor Author

Eric-Arellano commented Feb 15, 2022

Maybe something akin to what we do for scala-library would work? To inject the scala-library, the Scala backend searches the resolve for the jvm_artifact that provides scala-library.

Great suggestion!! We already know what resolve each python_requirement comes from, too. As discussed on the Scala ticket, it's a bummer this requires consulting AllTargets, but we already are doing that for dep inference. And you can disable this feature if you don't like it.

Check out #14486 for how we now have to associate resolves to the FirstPartyPythonMappingImpl used for Python dep inference on Protobuf and Thrift. The naive thing to do would to claim that codegen can create files for every resolve, so {"a": {"foo_pb2": ...}, "b": {"foo_pb2": ...}}. The fancier thing would be to only add to resolves that have the runtime libraries defined. Not clear to me whether that's good magic or not.

--

Oh huh, that's tricky though. We right now are storing the dependency on the protobuf_source target, not the python_source target.

To know which resolve to use, we need to know which specific python_source target depends on the protobuf_source. The dep will need to move to being on the python_source.

@Eric-Arellano
Copy link
Contributor Author

This is an issue for my original proposal of runtime_dependencies_per_resolve too:

Oh huh, that's tricky though. We right now are storing the dependency on the protobuf_source target, not the python_source target.

To know which resolve to use, we need to know which specific python_source target depends on the protobuf_source. The dep will need to move to being on the python_source.

Any thoughts @stuhood ?

@stuhood
Copy link
Sponsor Member

stuhood commented Feb 23, 2022

I don't think that I understand the question. But it makes sense to me for the dependency on the runtime library to be injected on the protobuf_source target, and for a Python resolve to be added as a plugin field to protobuf_source. See #14258 (comment) re: plugin fields.

@Eric-Arellano
Copy link
Contributor Author

Indeed: working on Go via #14258 is illuminating here. For the generated Go code to compile, we need to have the Go Protobuf runtime library built first. Note that as a compiled language, we need to first compile that generated Go code before normal first-party code can consume the generated code; so it's too late to grab the Protobuf dependency by relying on a go_package dependee, we need it earlier. Compare this to Python where we only need to make sure the final transitive closure includes a runtime library.

That makes sense conceptually to me. It also avoids the major problem I tried to mention above where we would need to make our dependency inference code much fancier: if we were to move the runtime_dependency from protobuf_source into python_source, then we would now need our Python dependency inference code to realize it's depending on protobuf_source vs apache_source & then have logic to determine which python_requirement to use based on the resolve of the python_source. It would be possible I think, but a complex leaky abstraction.

--

My only hesitation here is the interaction of parametrization with multiple languages. Consider this:

protobuf_source(
   name="users"
   source="users.proto",
   python_resolve=parametrize("py-a", "py-b")
   jvm_resolve="java-default",
)

This will create 2 protobuf_source targets, both with the same resolve for jvm_resolve. For JVM purposes, the two targets are practically identical, and notably this will result in dependency inference ambiguity.

One solution is Tom's suggestion to give up on protobuf_source in favor of protobuf_python_source, protobuf_jvm_source, etc: #14258 (comment). That would work around this, but I agree with Stu that it's not desirable #14258 (comment). A major goal of codegen in v2 is to have a single target that describes the protocol file, and achieving that goal motivated a lot of the Target API. It would be a UX regression to require multiple targets.

Any suggestions @stuhood? I don't think configurations would solve this either iiuc.

The best I can think of is somehow updating our dep inference code to recognize that users.proto:users@python_resolve=a is functionally equivalent to users.proto:users@python_resolve=b for JVM, and arbitrarily choose one...

@stuhood
Copy link
Sponsor Member

stuhood commented Mar 3, 2022

Any suggestions @stuhood? I don't think configurations would solve this either iiuc.

Yea, interesting. Configurations could address it, but only if you required some other marker field to be present, to explicitly request codegen for a particular instance. For example, as a strawname: generate_java=True (or requiring the resolve to be set explicitly). Then you could use configuration to disable some parametrizations of the target:

configuration(name="py-a", python_resolve="py-a")
configuration(name="py-b", python_resolve="py-b")
configuration(name="java-default", java_resolve="java-default", generate_java=True)

protobuf_source(
   name="users"
   source="users.proto",
   configuration=parametrize(":py-a", ":py-b", ":java-default"),
)

Don't love that though. It's important that the resolve field defaults, but it makes this case tricky because there is no explicit marker.

@tdyas
Copy link
Contributor

tdyas commented Mar 3, 2022

One solution is Tom's suggestion to give up on protobuf_source in favor of protobuf_python_source, protobuf_jvm_source, etc: #14258 (comment). That would work around this, but I agree with Stu that it's not desirable #14258 (comment).

The developer experience need not have the separate target types exposed to the user. Would it be beneficial to "split off" language-specific targets from the protobuf_source target using target generation?

ADDED: protobuf_source would ask the language backends to generate their applicable targets using the fields on the protobuf_source (using a union or whatever).

@stuhood
Copy link
Sponsor Member

stuhood commented Mar 3, 2022

For example, as a strawname: generate_java=True (or requiring the resolve to be set explicitly).

Oh: An explicit generators string-list value field would work here, and could be used with configuration to disable in some cases.

@Eric-Arellano
Copy link
Contributor Author

Eric-Arellano commented Mar 3, 2022

The developer experience need not have the separate target types exposed to the user. Would it be beneficial to "split off" language-specific targets from the protobuf_source target using target generation?

Possibly, but some weird edges.

  • We'd need to define multiple new "atom" targets like {protobuf,thrift}_{go,python,java,scala}_source, which is generally undesirable to have an explosion of targets.
  • What does protobuf_source do now? If it generates those N language atom-targets, why have that and protobuf_sources? Why not have protobuf_python_sources plural?
  • Address scheme: would it be foo/f.proto#python vs foo/f.proto#golang? We could do that, but makes an already complex address even more complex.

--

Oh: An explicit generators string-list value field would work here, and could be used with configuration to disable in some cases.

This?

configuration(name="py-a", python_resolve="py-a", generators=["python"])
configuration(name="py-b", python_resolve="py-b", generators=["python"])
configuration(name="java-default", java_resolve="java-default", generators=["java"])

protobuf_source(
   name="users"
   source="users.proto",
   configuration=parametrize(":py-a", ":py-b", ":java-default"),
)

I'm okay with that! Not the most obvious, but this is also a pretty niche feature most won't use, and it's feasible to document under "Advanced Uses" section. I really like that it's consistent with generic Pants mechanisms.

I think the default for generators=None => all generators. And then you can use generators=[] if you want to disable entirely.

generators field also solves #14041.

@stuhood
Copy link
Sponsor Member

stuhood commented Mar 3, 2022

I think the default for generators=None => all generators. And then you can use generators=[] if you want to disable entirely.

Yea, exactly.

Eric-Arellano added a commit that referenced this issue Mar 3, 2022
…discovering the dependency (#14691)

@tdyas had the idea that rather than you having to tell Pants where to load `protobuf`, `grpcio`, and `thrift`, we can simply discover it! We do this for Scala already and it works well. 

There's minimal performance hit because we already will have created a third-party module mapping.

Two motivations:

1. Ergonomics.
2. Unblock codegen supporting multiple resolves: #14484. Otherwise we would need to add `[python-protobuf].runtime_dependencies_per_resolve`, which violates our goal for resolves to not make things more complex for the average user who only wants one resolve.

We deprecate Protobuf, but completely update Thrift since it hasn't reached stable yet.

[ci skip-rust]
Eric-Arellano added a commit to Eric-Arellano/pants that referenced this issue Mar 3, 2022
…covering the dependency (pantsbuild#14691)

@tdyas had the idea that rather than you having to tell Pants where to load `protobuf`, `grpcio`, and `thrift`, we can simply discover it! We do this for Scala already and it works well.

There's minimal performance hit because we already will have created a third-party module mapping.

Two motivations:

1. Ergonomics.
2. Unblock codegen supporting multiple resolves: pantsbuild#14484. Otherwise we would need to add `[python-protobuf].runtime_dependencies_per_resolve`, which violates our goal for resolves to not make things more complex for the average user who only wants one resolve.

Note that this was already deprecated in 2.10, so we can outright remove here.

[ci skip-rust]

# Building wheels and fs_util will be skipped. Delete if not intended.
[ci skip-build-wheels]
Eric-Arellano added a commit that referenced this issue Mar 3, 2022
…o support multiple resolves with codegen (#14693)

Closes #14484.

## Problem

Generated code needs associated runtime libraries, e.g. `protobuf`. Those must come from a `python_requirement` which by definition have a single `resolve`.

Before this PR, it was impossible to have >1 `python_requirement` target used for each runtime library and every `protobuf_source` target would depend on the same `python_requirement`. Practically, this means codegen could only work with code belonging to a single resolve.

## Solution

Add a `python_resolve` field to codegen targets so that they can indicate which `python_requirement` they should be using. Our new dependency inference from #14691 wires this up properly.

If you want the same generated code to work with multiple resolves, you will need to use `parametrize`. That gets tricky when you consider a codegen target generating for multiple languages, as explained at the bottom of #14484, but there is a decent workaround via configurations. That will be tackled in a followup.

### FYI: lazy validation of runtime library

#14691 asserts that there is exactly 1 runtime library in the project; now we assert that for each resolve. 

Key nuance: this check is lazy. If you have 2 resolves, but your codegen targets only use 1 of the 2, then we will never end up checking that there is a runtime library defined for the 2nd resolve. That is important so that, for example, if you have a `pants-plugins` resolve we don't force you to add `protobuf` unnecessarily to it.

[ci skip-rust]
[ci skip-build-wheels]
Eric-Arellano added a commit that referenced this issue Mar 4, 2022
…covering the dependency (#14695)

@tdyas had the idea that rather than you having to tell Pants where to load `protobuf`, `grpcio`, and `thrift`, we can simply discover it! We do this for Scala already and it works well.

There's minimal performance hit because we already will have created a third-party module mapping.

Two motivations:

1. Ergonomics.
2. Unblock codegen supporting multiple resolves: #14484. Otherwise we would need to add `[python-protobuf].runtime_dependencies_per_resolve`, which violates our goal for resolves to not make things more complex for the average user who only wants one resolve.

Note that this was already deprecated in 2.10, so we can outright remove here.

[ci skip-rust]
Eric-Arellano added a commit to Eric-Arellano/pants that referenced this issue Mar 4, 2022
…o support multiple resolves with codegen (pantsbuild#14693)

Closes pantsbuild#14484.

## Problem

Generated code needs associated runtime libraries, e.g. `protobuf`. Those must come from a `python_requirement` which by definition have a single `resolve`.

Before this PR, it was impossible to have >1 `python_requirement` target used for each runtime library and every `protobuf_source` target would depend on the same `python_requirement`. Practically, this means codegen could only work with code belonging to a single resolve.

## Solution

Add a `python_resolve` field to codegen targets so that they can indicate which `python_requirement` they should be using. Our new dependency inference from pantsbuild#14691 wires this up properly.

If you want the same generated code to work with multiple resolves, you will need to use `parametrize`. That gets tricky when you consider a codegen target generating for multiple languages, as explained at the bottom of pantsbuild#14484, but there is a decent workaround via configurations. That will be tackled in a followup.

### FYI: lazy validation of runtime library

pantsbuild#14691 asserts that there is exactly 1 runtime library in the project; now we assert that for each resolve. 

Key nuance: this check is lazy. If you have 2 resolves, but your codegen targets only use 1 of the 2, then we will never end up checking that there is a runtime library defined for the 2nd resolve. That is important so that, for example, if you have a `pants-plugins` resolve we don't force you to add `protobuf` unnecessarily to it.

[ci skip-rust]
[ci skip-build-wheels]
Eric-Arellano added a commit that referenced this issue Mar 4, 2022
…o support multiple resolves with codegen (Cherry-pick of #14693) (#14698)

Closes #14484.

## Problem

Generated code needs associated runtime libraries, e.g. `protobuf`. Those must come from a `python_requirement` which by definition have a single `resolve`.

Before this PR, it was impossible to have >1 `python_requirement` target used for each runtime library and every `protobuf_source` target would depend on the same `python_requirement`. Practically, this means codegen could only work with code belonging to a single resolve.

## Solution

Add a `python_resolve` field to codegen targets so that they can indicate which `python_requirement` they should be using. Our new dependency inference from #14691 wires this up properly.

If you want the same generated code to work with multiple resolves, you will need to use `parametrize`. That gets tricky when you consider a codegen target generating for multiple languages, as explained at the bottom of #14484, but there is a decent workaround via configurations. That will be tackled in a followup.

### FYI: lazy validation of runtime library

#14691 asserts that there is exactly 1 runtime library in the project; now we assert that for each resolve. 

Key nuance: this check is lazy. If you have 2 resolves, but your codegen targets only use 1 of the 2, then we will never end up checking that there is a runtime library defined for the 2nd resolve. That is important so that, for example, if you have a `pants-plugins` resolve we don't force you to add `protobuf` unnecessarily to it.

[ci skip-rust]
[ci skip-build-wheels]
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment