Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

It should be possible to consume multiple default_outputs in an attrs.list(attrs.source()) #561

Open
zjturner opened this issue Feb 6, 2024 · 5 comments

Comments

@zjturner
Copy link
Contributor

zjturner commented Feb 6, 2024

I have a genrule that outputs multiple files by returning a DefaultInfo() whose default_outputs is set to a list with more than 1 item, it would be really nice if I could have these consumed by another rule as follows:

foo(
    srcs = [":other_rule"]   # srcs is an `attrs.list(attrs.source())`

If you look at this documentation, it even explicitly states that it should work, because of this line:

genrule(name = "gen_stuff", ...., default_outs = ["foo.cpp"])

# ":gen_stuff" pulls the default_outputs for //subdir:gen_stuff
foo_binary_wrapper(name = "foo", srcs = glob(["*.cpp"]) + [":gen_stuff"])

I understand that it might be difficult to get the type checking to pass here, since technically the target label :gen_stuff is one thing, and expanding it would really expand to a list, at which point you have a list of lists. But I think if you introduced a global function called default_outputs, this would be very elegant. Then you'd write my original example as:

foo(
    srcs = default_outputs(":other_rule")

Just like glob returns a list, this too could return a list.

What are the chances of seeing something like this implemented?

@JakobDegen
Copy link
Contributor

Let me quote some of my comment here:

The possibility of having multiple outputs was added to buck2 because we thought it was generally a good idea. However, based on something I heard from cjhopman , we never really used it internally because it turned out that too much stuff broke if --show-output returned more than one output. Instead, we've mostly gone with subtargets to make any additional outputs available.

The result of that is that the "multiple outputs" support is only partially complete, and I don't think we ever fully figured out what we want from it. If you want to stay on the well-trodden path, not using it would probably be your best bet. On the other hand, the fact that this isn't really figured out yet also means that we'd be happy to take PRs that build this support out and clarify how things should work

It's possible that someone on the team has stronger opinions on this then I do, but at least for me, this case is an example of it not being entirely clear what we should do.

@zjturner
Copy link
Contributor Author

zjturner commented Feb 8, 2024

Thanks for the context. If you were to add a function like default_outputs() as I proposed in the first post, that would leave all existing behavior unchanged, while also being flexible enough to enable a bunch of new usage simplification scenarios. Right now I have to do something like this

        srcs=[":{}[out_files][generated_srcs][{}]".format(internal_name, src) for src in generated_srcs],

But with the suggested change, I could change bit to this:

        srcs=default_outputs(":" + internal_name),

which is a huge improvement not just in that line of code, but in the implementation of the rule itself, since I wouldn't even have to create any subtargets. Even beyond this use case though, just being able to access the default_outputs of a target via a global function like this could be useful for other scenarios I'm not even thinking of. And it works just as well for single output providers too. You could even deprecate usages like srcs = [":foo"] and say that the correct way going forward is srcs = default_outputs(":foo") since that works everywhere and is composable (e.g. srcs = glob("*.cpp") + default_outputs(":foo"))

@JakobDegen
Copy link
Contributor

but in the implementation of the rule itself, since I wouldn't even have to create any subtargets. Even beyond this use case though, just being able to access the default_outputs of a target via a global function like this could be useful for other scenarios I'm not even thinking of

So one of the difficulties with this kind of default_outputs thing is that you can't actually know what the outputs of analysis are at loading time (or even how many they will be) - that's just an ordering problem, loading happens strictly before analysis. That means that you couldn't actually return a list from that function, you'd have to return some kind of a proxy object. The ergonomics of that quickly get pretty grim, so that's probably not a great direction to go in

The best thing that I can think of right now is that maybe we should just try and make multiple default outputs on a single item in an attr.list(attr.source()) magically work (and error when it's not in a list I suppose?). That's really not ideal, but I can't think of anything better

@zjturner
Copy link
Contributor Author

zjturner commented Feb 9, 2024

I trust you when you say the ergonomics of the proxy object would be grim, but it helps me understand what's going on under the hood, so apologies if I keep digging :)

Are you talking about ergonomics from the user side, from the buck2 core runtime side, from the rule implementation side, or something else? I'm imaging that if you did do the proxy object, then all of the complexity of reaching into this proxy object could be silently handled by attrs.list(). For example, maybe attrs.list() is taught to handle a proxy object of AnalysisList (a list whose length isn't known until analysis time), and which resolves to a collection of Artifact objects. So on the rule side, it doesn't have to be updated and ctx.attrs.srcs just magically contains all the right stuff.

Does this not work?

@JakobDegen
Copy link
Contributor

I'm more worried about the macro side, not the rule side. You still need to make this compose with other things that people do in macros. For example, someone is going to write srcs = default_outputs(":foo") + ["x.txt", "y.txt"]. Meaning all the standard list operations have to keep working. But of course they have to return proxy objects too. Now some macro that does len(srcs) which previously worked will break in weird and surprising ways. (also, maybe a macro wants to inspect the insides of that default_outputs? Might have to support that)

Something like this exists already for another function widely used in macros: select, which also supports + for combining, map, and a bunch of other stuff. The amount of complexity that handling selects introduces on both macro authors and on the core implementation is extremely high. My instinct is that this use case doesn't meet the bar for introducing a second copy of that

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants