Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

__defaults__(all=...) is a future-compatibility hazard for adding new field #20925

Open
huonw opened this issue May 14, 2024 · 4 comments
Open
Labels

Comments

@huonw
Copy link
Contributor

huonw commented May 14, 2024

Describe the bug

The all functionality for __default__ is somewhat of a future-compatibility hazard:

  1. In a code base using Pants 2.34, there's a target t1 with a field called f (this could be in any backend, even a private in-repo one), and a second built-in target called t2 without that field.
  2. Someone has an existing BUILD file like:
    __defaults__(all=dict(f="abc"))
    t1(name="one") # after applying __defaults__: t1(name="one", f="abc")
    t2(name="two") # after applying __defaults__: t2(name="two")
  3. In Pants 2.35, we now add a new field to t2 also called f.
  4. When the code above upgrades to Pants 2.35, the t2 target will now become t2(name="two", f="abc") after applying __defaults__, and this could either:
    1. silently change behaviour
    2. fail with confusing error messages (if abc doesn't make sense for t2's f field)

With Pants' current behaviour, if we wanted to be strict about not breaking existing code, this means we could never add new fields, because someone could have an existing field of the same name, and use it in __defaults__(all=...). Not ever adding new fields is clearly untenable, so either we have to:

  1. allow breaking existing user code (but we could do things like improve diagnostics, or have a convention of calling it out in release notes or something)
  2. change how all works (e.g. deprecate and remove it(!), have some notion of "all fields as of a particular release", so people need to bump that )

I think we should make an explicit policy decision (even if it's just "users just have to tolerate any breakage due to all").


For instance, in #20894 (comment), a user had:

__defaults__(
    extend=True,
    all=dict(
        # This will set PYTHON_IMAGE build arg in all docker images by default.
        extra_build_args=["PYTHON_IMAGE=python:3.11"],
        # This will tell pants that all python sources need to use python 3.11 by default.
        interpreter_constraints=["==3.11.*"],
    ),
)

In 2.20 and earlier (before #20737), the only built-in target with the extra_build_args field was docker_image, and so this all was applying that docker-formatted build arg to all.

In 2.21 (which includes #20737), pex_binary also has this field. It has completely different meaning (a list of CLI args to pass to a pex invocation). Thus, that __default__ configuration starts setting pex_binary(..., extra_build_args=["PYTHON_IMAGE=python:3.11"]) and understandably an invocation like pex ... PYTHON_IMAGE=python:3.11 doesn't work.

The fix the user identified is to move the field from all to docker_image:

__defaults__(
    {
        (docker_image,): dict(
            # This will set PYTHON_IMAGE build arg in all docker images by default.
            extra_build_args=["PYTHON_IMAGE=python:3.11"],
        )
    },
    all=dict(
        # This will tell pants that all python sources need to use python 3.11 by default.
        interpreter_constraints=["==3.11.*"],
    ),
    extend=True,
)

Pants version
All, although specific example is in 2.21

OS
All

Additional info
N/A

@kaos
Copy link
Member

kaos commented May 15, 2024

This is a good find, and thanks for the write up, Huon.

I think all serves a good purpose (consider the case of optionally enable/disable backends, you couldn't have defaults in BUILD files for any targets that are disabled, making that much more difficult without all)

With that, I fully agree better diagnostics is a must, to help identify what failed (in this example, providing an invalid value for extra_build_args for the pex_binary target, which was not obvious from the error message as it were.)

Another observation is, that we may want to consider doing less up-front validation of the value during target creation, and post-pone it to when it's actually being used. In this case, the extra_build_args field could simply accept that the structure is good (a list of string values), and not attempt to parse the string values until being consumed. With this, the issue surface becomes less of an issue, as long as fields with the same name share the same structure they're good.

The downside being that pants peek wouldn't trigger all the proper validations that we might want to get.. which could result in configurations that seems to be accepted and valid, only to fail during some other operation.

Which leads me to conclude that we may want to introduce an intermediate step for creating field values, where the first step is merely structural, and would be used for __defaults__ and the current behavior kept for when loading a target with all associated field values.

--
Also, having a note about all being potentially hazardous could be a good idea.

--
My argument above about not being able to provide defaults for targets from a temporarily disabled backend could also potentially be alleviated by supporting using string value for the targets (this may actually already work, I haven't checked):

__defaults__({("python_source", "pex_binary"): dict(tags=["foo"]})

@huonw
Copy link
Contributor Author

huonw commented May 15, 2024

I think all serves a good purpose (consider the case of optionally enable/disable backends, you couldn't have defaults in BUILD files for any targets that are disabled, making that much more difficult without all)

Ah, yes. Strings seems reasonable to solve this use-case (although I note there's the risk that someone makes a typo like "python_suorce" and accidentally has the __default__ never apply 🤔 ).

Another observation is, that we may want to consider doing less up-front validation of the value during target creation, and post-pone it to when it's actually being used. In this case, the extra_build_args field could simply accept that the structure is good (a list of string values), and not attempt to parse the string values until being consumed. With this, the issue surface becomes less of an issue, as long as fields with the same name share the same structure they're good.

I think the error observed #20894 (comment) was from pex, because it was being passed an incorrect argument.

That is, Pants already didn't do much validation, and so the argument was passed through to the underlying pex ... invocation. pex understandably couldn't make sense of the PYTHON_IMAGE=python:3.11 argument and so threw an error.

Also, having a note about all being potentially hazardous could be a good idea.

👍

@kaos
Copy link
Member

kaos commented May 16, 2024

I think all serves a good purpose (consider the case of optionally enable/disable backends, you couldn't have defaults in BUILD files for any targets that are disabled, making that much more difficult without all)

Ah, yes. Strings seems reasonable to solve this use-case (although I note there's the risk that someone makes a typo like "python_suorce" and accidentally has the __default__ never apply 🤔 ).

👍🏽

Another observation is, that we may want to consider doing less up-front validation of the value during target creation, and post-pone it to when it's actually being used. In this case, the extra_build_args field could simply accept that the structure is good (a list of string values), and not attempt to parse the string values until being consumed. With this, the issue surface becomes less of an issue, as long as fields with the same name share the same structure they're good.

I think the error observed #20894 (comment) was from pex, because it was being passed an incorrect argument.

That is, Pants already didn't do much validation, and so the argument was passed through to the underlying pex ... invocation. pex understandably couldn't make sense of the PYTHON_IMAGE=python:3.11 argument and so threw an error.

Right. I misread the comment thinking the error message came up-front, not when running a goal that invoked pex. So for that case, having extra_build_args as a default for all is prob. a bad idea.. given build args are likely not compatible across multiple targets.

With that line of thought, one approach could be to have a flag on the field types to indicate if they're unsuitable for all and could warn/err if used as such. That too, isn't perfect however, as interpreter_constraints could be seen as OK, and is as long as it's applied to a tree of only one language, but if applied at the top level and there are multiple languages in use, and both of these have the notion of interpreter constraints, that would become incompatible.

All of the above, to open up the idea of providing defaults per field type, rather than per target field. So instead of all, it could be something like:

__defaults__({python.interpreter_constraints: "==3.10.*"})
# This would be equivalent to
__defaults__(all={"interpreter_constraints": "==3.10.*"})
# except, only apply to interpreter_constraints for targets using the Field from the Python backend.

I actually kind of like this, and we could deprecate all. Another example, for completeness.

# Example to set default tags. Being a default field type present on practically all targets.
__defaults__({core.tags: ["tag1", "tag2"]})

@huonw
Copy link
Contributor Author

huonw commented May 17, 2024

All of the above, to open up the idea of providing defaults per field type, rather than per target field. So instead of all, it could be something like:

Ah, yeah, great idea, so scope/group/namespace the fields somehow, and then defaulting within a namespace much reduces the risk here! Nice.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants