Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Discussion] Support for .note.package linker scripts #505

Closed
sicherha opened this issue May 9, 2022 · 29 comments
Closed

[Discussion] Support for .note.package linker scripts #505

sicherha opened this issue May 9, 2022 · 29 comments

Comments

@sicherha
Copy link
Contributor

sicherha commented May 9, 2022

People at Red Hat / Fedora are interested in experimenting with mold as part of the toolchain for building RPM packages. There is one major obstacle, though: the .note.package annotation mechanism that will be introduced in Fedora 36.

In a nutshell, .note.package is a new section that gets added to every binary at link time. The section contains metadata about the program (e.g. package name, version, distribution) encoded as a small JSON document. This metadata can then be read out by various tools, for example by coredumpctl to display useful information about crashed programs to the user.

This is all done by generating a small linker script (say, notes.ld) and then passing -Wl,-dT notes.ld to the compiler driver. Here's an example of what such a script looks like:

SECTIONS
{
    .note.package (READONLY) : ALIGN(4) {
        LONG(0x0004)                                /* Length of Owner including NUL */
        LONG(0x0032)                                /* Length of Value including NUL */
        LONG(0xcafe1a7e)                            /* Note ID */
        BYTE(0x46) BYTE(0x44) BYTE(0x4f) BYTE(0x00) /* Owner: 'FDO\x00' */
        BYTE(0x7b) BYTE(0x22) BYTE(0x74) BYTE(0x79) /* Value: '{"type":"package","os":"fedora","osVersion":"35"}\x00\x00\x00' */
        BYTE(0x70) BYTE(0x65) BYTE(0x22) BYTE(0x3a)
        BYTE(0x22) BYTE(0x70) BYTE(0x61) BYTE(0x63)
        BYTE(0x6b) BYTE(0x61) BYTE(0x67) BYTE(0x65)
        BYTE(0x22) BYTE(0x2c) BYTE(0x22) BYTE(0x6f)
        BYTE(0x73) BYTE(0x22) BYTE(0x3a) BYTE(0x22)
        BYTE(0x66) BYTE(0x65) BYTE(0x64) BYTE(0x6f)
        BYTE(0x72) BYTE(0x61) BYTE(0x22) BYTE(0x2c)
        BYTE(0x22) BYTE(0x6f) BYTE(0x73) BYTE(0x56)
        BYTE(0x65) BYTE(0x72) BYTE(0x73) BYTE(0x69)
        BYTE(0x6f) BYTE(0x6e) BYTE(0x22) BYTE(0x3a)
        BYTE(0x22) BYTE(0x33) BYTE(0x35) BYTE(0x22)
        BYTE(0x7d) BYTE(0x00) BYTE(0x00) BYTE(0x00)
    }
}
INSERT AFTER .note.gnu.build-id;
/* HINT: add -Wl,-dT,/path/to/this/file to $LDFLAGS */

To be able to use mold as a drop-in replacement for the GNU linker, the following two missing features would have to be implemented:

  1. The -dT, --default-script command-line option:

    This option is similar to the --script option except that processing of the script is delayed until after the rest of the command line has been processed. This allows options placed after the --default-script option on the command line to affect the behaviour of the linker script, which can be important when the linker command line cannot be directly controlled by the user. (eg because the command line is being constructed by another tool, such as gcc).

    Since -T, --script is already there, I reckon this wouldn't be too much effort.

  2. Support for non-trivial linker scripts (such as the one above) containing SECTIONS, constant values, INSERT AFTER et cetera. I'm aware that you have previously taken the deliberate stance that mold will support only the absolute minimum set of features necessary for linking against /usr/lib64/libc.so. Still, I would like to put this topic up for discussion as the feature set used by these particular scripts still seems relatively small.

Is this something you consider worth pursuing?

@rui314
Copy link
Owner

rui314 commented May 10, 2022

Given that linker script, you know the exact contents of the .note.package section, and you want to embed that section into a linker's output. In that situation, linker script is overkill. Instead of linker script, you can create an object file containing a desired .note.package section and pass that file to the linker instead of the linker script.

@rui314
Copy link
Owner

rui314 commented May 10, 2022

Apparently I'm not the first person to think about passing an object file instead of a linker script. I found this discussion: https://reviews.llvm.org/D118490

It's not clear to me how and why it is difficult to pass an object file to the linker to include a ".note.package" section, though. If you can pass a -dT to the linker, can't you pass an object file instead?

But since it's a very obvious idea, I guess I may be missing something.

From the mold's point of view, supporting a linker script feature is still overkill. Especially, the notion of "default linker script" is GNU linker's implementation specific. The GNU linker as a whole is implemented as an interpreter of the linker script, and -dT is an option to swap the internal linker script with a given script. Since mold isn't built on top of linker script interpreter, -dT doesn't fit well to mold. Also, the linker script looks just too hacky if it is intended to be used not only by the Fedora packaging system but by various parties for various systems.

Given this situation, I'd suggest we implement a new linker command line option, --package-info=<arbitrary string>. The option works as follows: if you pass that option to the linker, it creates a .note.package section and copy a given string into the section.

The new option should be much more straightforward than implementing it with some linker script hacking. Since this feature is intended to be used cross-distribution, and if Fedora and Debian wanted to use it for all packages, adding a native support to the linker can easily be justified IMO. And honestly it's not a big feature anyway. We can implement it pretty easily.

How does this sound to you?

@rui314
Copy link
Owner

rui314 commented May 10, 2022

Here is an experimental implementation of the --package-info option: 3a88295

Note that it's not submitted to the main branch, it's in package-info branch.

@sicherha
Copy link
Contributor Author

sicherha commented May 10, 2022

Wow, the --package-info implementation looks amazingly straightforward. Unfortunately, Fedora needs to consider the constraint that a solution should work out-of-the-box with all linkers - so I'm not sure this is the best way forward.

After reading through your deliberations and the lld discussion thread you linked above, I agree with you that a linker script is probably not the right thing to do here. I've toyed around with the assembler a bit and come up with this:

.p2align 2
.section .note.package, "a"
  .long 0x0004
  .long .Lend - .Lbegin
  .long 0xcafe1a7e
  .string "FDO"
.Lbegin:
  .string "{\"type\":\"package\",\"os\":\"fedora\",\"osVersion\":\"35\"}"
.Lend:
.p2align 2

When dumping this into a file named notes.s and adding that file to the LDFLAGS - instead of -Wl,-dT notes.ld - the resulting note section in the output binary looks identical at first glance. I may be missing something, though - this is just a superficial first look, and I wasn't involved in the discussions that led the Fedora team to pursue the linker-script approach.

Maybe @bluca or @keszybz, who participated in the lld discussion on the .note.package topic, can shed some light on this?

@rui314
Copy link
Owner

rui314 commented May 10, 2022

A problem with an assembly language is that it's not cross platform even if your assembly contains only data directives and not code. I believe it's safer to create an empty object file and then copy a text to a .note.package using objcopy command if you take that route.

@bluca
Copy link

bluca commented May 10, 2022

Maybe @bluca or @keszybz, who participated in the lld discussion on the .note.package topic, can shed some light on this?

The plan is to propose adding a command line like --build-id to BFD once we demonstrate that the feature is stable and useful - adding new fixed command line options to tools like BFD is expensive since once they are there, they are there forever, you can't just remove them as it would break compatibility. But you can't really prove it's useful until after it's shipped, so bit of a chicken-and-egg problem. The upcoming Fedora 36 will ship with this, and should be good enough to prove it's a nice idea to add the specific command line, just like --build-id.

@rui314
Copy link
Owner

rui314 commented May 10, 2022

Then, maybe it's time to propose a new option to GNU perhaps with my implementation as FYI?

@bluca
Copy link

bluca commented May 10, 2022

F36 hasn't even shipped yet, so a bit too soon :-) But certainly in the coming months. Note that the linked patch doesn't do any parsing, so it needs some rework - the spec clearly says the payload is json, so the input should be validated for correctness

@X547
Copy link

X547 commented May 10, 2022

I see no problem with using object file (*.o) with .note.package section. It can be added to toolchain stdlibs like crti.o, crtn.o.

Haiku already does it. Every Haiku executable or shared library should export symbols _gSharedObjectHaikuABI and _gSharedObjectHaikuVersion to detect ABI and API used by executable. For example this is needed to choose between legacy GCC2 C++ ABI and modern Itanium C++ ABI. Symbols are defined in object file init_term_dyn.o that is part of toolchain stdlibs. All packages are built with that symbols without problems.

Update: .note.package section contents can be different for each executable so some kind of dynamic generation is needed.

@rui314
Copy link
Owner

rui314 commented May 10, 2022

@bluca Input data validation is intentionally omitted in this patch and left it as a user's responsibility. Even if we do some validation, there's still lots of wrong ways to use the --package-info option. For example, --package-info='{}' specifies a valid JSON, but I guess most .note.package processors don't expect such JSON. There are many more edge cases, and I think it is better for the middleman (the linker) to handle .note.package data as an opaque string rather than trying to be too clever.

@bluca
Copy link

bluca commented May 10, 2022

Yes, processors can handle nul json, that's not a problem (if you are aware of one that cannot, please let me know and I'll file a bug). The point is not to cover edge cases, the point is to ensure minimal validation as provided by a common library of choice. IE, the format needs to be validated, not the semantics of the content.

@rui314
Copy link
Owner

rui314 commented May 10, 2022

There are still too many edge cases to test even if we just verify an input as a JSON string.

  • An input string must be a valid UTF-8 string
  • A string literal cannot contain invalid \u characters (e.g. \uffff is not a valid Unicode codepoint I believe)
  • A Unicode codepoint that's not in the basic plane must be represented as two surrogate pairs in the \u format, and the validator must ensure that they collectively refer a valid Unicode codepoint
  • A number can be encoded in the E notation

and so on and so on. And as I wrote, that format-level data validation doesn't prevent users from making a semantic error such as passing an empty JSON object {} as a parameter to --package-info. If a "validation" means printing out a warning message if the first character is not {, then I'm fine with that, but implementing a complete JSON validator in the linker is too much.

@bluca
Copy link

bluca commented May 10, 2022

There's no need to reimplement anything, there are existing libraries for this. The value proposition of a specific option that follows the open specification is to provide minimal validation of input, to verify adherence to the specifications. If it just shoves in anything passed through, it's really not adding much value. Anyway, on your tool you are free to do as you please of course, this is simply the approach I'll take when working on BSD, to provide something more useful.

@rui314
Copy link
Owner

rui314 commented May 10, 2022

We don't want to add a dependency to a new library too casually. I believe that's also a stance of the GNU guys too. I think none of mold, lld or the GNU linkers wants to add a new dependency to libjson or something just to verify an input string.

@llunak
Copy link
Contributor

llunak commented May 10, 2022

Can't you simply use objcopy --add-section to add whatever section you want? You're presumably already going to post-process binaries to e.g. split out debuginfo, so it's not going to cost anything, and then you don't need to care about linkers, LDFLAGS or anything like that.

And I don't see much point in linker trying to do the validation. If neither knows nor cares what the correct content is supposed to be, and your packaging system is going to be the one generating it, so it may just as well make sure it's correct. If somebody else is going to add the section, the linker is not going to stop them from messing it up. You also didn't seem to care about any validation with the linker script approach.

@rui314
Copy link
Owner

rui314 commented May 10, 2022

@llunak I don't think you can use objcopy --add-sections to add .note.package because .note.package is at the beginning of the memory-mapped image of a process. You need to know the .note.package's size beforehand to assign addresses to following segments such as .text. So you can't insert a .note.package in the middle of existing, already-relocated segments.

@sicherha
Copy link
Contributor Author

A problem with an assembly language is that it's not cross platform even if your assembly contains only data directives and not code. I believe it's safer to create an empty object file and then copy a text to a .note.package using objcopy command if you take that route.

This may be true in general, but I wonder whether it would really be relevant for our concrete limited use case. I'm not an expert on all the nitty gritty details of the GNU assembler, but from my semi-limited experience I would assume the following:

  • .section, .p2align, labels + label arithmetic should work the same for all relevant target architectures.
  • .string should work the same everywhere, as long as we only deal with ASCII (which .note.package hopefully restricts itself to, right?).
  • (unsure; gut feeling) .long and .int both generally correspond to the target architecture's int C type, i.e. to a 32-bit integer for all relevant targets.
  • .long/.int is represented in a target-specific byte order, which is good - we want the native byte order.

Please correct me if I'm getting something wrong or missing something.

The point I'm trying to make is this: if there is a realistic chance to get the functionality we need without having to introduce changes to four linkers (bfd, gold, lld, mold) in a coordinated manner, that's probably an option worth looking into.

@bluca
Copy link

bluca commented May 10, 2022

The point I'm trying to make is this: if there is a realistic chance to get the functionality we need without having to introduce changes to four linkers (bfd, gold, lld, mold) in a coordinated manner, that's probably an option worth looking into.

There is no chance, none of that works in reality, outside of 'hello world' tests that don't mean much, it's all been tried on real systems. What we have right now is what can actually work across ~30,000 packages using a thousand different build systems on 22 different binary architectures. Adding a new specific command line option, just like for --build-id, is the right thing for long-term purposes.

@Alcaro
Copy link

Alcaro commented May 10, 2022

  • .section, .p2align, labels + label arithmetic should work the same for all relevant target architectures.

.section works differently on non-ELF targets, but non-ELF targets are of no interest to Fedora. p2align and labels do, to my knowledge, work the same on every target.

  • .string should work the same everywhere, as long as we only deal with ASCII (which .note.package hopefully restricts itself to, right?).

Like .section, .string has failure cases; in this case, EBCDIC platforms. But, like .section, I don't think Fedora supports any non-ASCII-superset charsets. And if you do, https://sourceware.org/binutils/docs/as/ lists a .ascii command.

  • (unsure; gut feeling) .long and .int both generally correspond to the target architecture's int C type, i.e. to a 32-bit integer for all relevant targets.
  • .long/.int is represented in a target-specific byte order, which is good - we want the native byte order.

Let's check the docs.

7.49 .int expressions
Expect zero or more expressions, of any section, separated by commas. For each expression, emit a number that, at run time, is the value of that expression. The byte order and bit size of the number depends on what kind of target the assembly is for.
7.62 .long expressions
.long is the same as ‘.int’. See .int.

They're probably something other than 32 bits if C int isn't 32 bits. But, again, I don't think that's relevant to anything Fedora supports. The 'at run time' segment probably means 'can be subject to relocations', but subtracting pointers in the same section and translation unit will give a proper constant on every plausible platform.

Byte order is indeed target specific.

There is no chance, none of that works in reality, outside of 'hello world' tests that don't mean much

sicherha's code is hello world sized, and as such, has a quite reasonable probability of success.

If you're aware of any counterexample, do tell.

@rui314
Copy link
Owner

rui314 commented May 11, 2022

(unsure; gut feeling) .long and .int both generally correspond to the target architecture's int C type, i.e. to a 32-bit integer for all relevant targets.

There's also .4byte directive which is guaranteed to be 4 bytes long. https://sourceware.org/binutils/docs/as/4byte.html

@rui314
Copy link
Owner

rui314 commented May 11, 2022

Adding a new specific command line option, just like for --build-id, is the right thing for long-term purposes.

I agree with this.

@MaskRay
Copy link

MaskRay commented May 24, 2022

I agree with the comment #505 (comment) and question the necessity of a new linker option. See also https://sourceware.org/pipermail/binutils/2022-May/120985.html

A problem with an assembly language is that it's not cross platform even if your assembly contains only data directives and not code. I believe it's safer to create an empty object file and then copy a text to a .note.package using objcopy command if you take that route.

.section .package.meta,"a",%note, .balign 4, and .long are portable assemblyin GNU as. To work with more obscure targets (but not needed for Debian or Fedora since they don't have such ports), add a tab before .section.

In binutils-gdb, gas/read.c:potable lists portable directives.

See also https://src.fedoraproject.org/rpms/package-notes/pull-request/2 , there is a claim that adding a linker option makes more packages build than using a synthesized relocatable object file.
This has been repeated many times but I have never seen evidence.
Note: if some packages don't respect CFLAGS/LDFLAGS or use it incorrectly, from distribution maintenance's perspective, it would be better fixing these packages so that future tuning of system CFLAGS/LDFLAGS will be smoother.

@rui314
Copy link
Owner

rui314 commented May 25, 2022

Even though I don't think --package-metadata is absolutely necessary, I find it convenient. With that option, we don't think about assembler directives that work on all platforms and instead just tell the linker to insert a given string to a specific section. And that option wouldn't do any harm; it wouldn't slows down the linker nor complicate its internal structure. So, adding this option doesn't seems like a bad idea.

@bluca
Copy link

bluca commented May 25, 2022

Even though I don't think --package-metadata is absolutely necessary, I find it convenient. With that option, we don't think about assembler directives that work on all platforms and instead just tell the linker to insert a given string to a specific section. And that option wouldn't do any harm; it wouldn't slows down the linker nor complicate its internal structure. So, adding this option doesn't seems like a bad idea.

Indeed - you had an implementation for mold right? Could you please open a PR? Happy to review it - thanks!

@rui314
Copy link
Owner

rui314 commented May 25, 2022

@bluca We do have 3a88295, but I'd like to wait for GNU ld to settle down on the option name, etc.

@bluca
Copy link

bluca commented May 25, 2022

@bluca We do have 3a88295, but I'd like to wait for GNU ld to settle down on the option name, etc.

Got it, will ping here once that goes through - note I picked --package-metadata as the name of the spec is Package Metadata, so that it's more search engine-friendly

@X547
Copy link

X547 commented May 25, 2022

Is .note.package used only by RedHat-like distros?

@bluca
Copy link

bluca commented May 25, 2022

No, it's a cross-distro specification, currently also used in CBL-Mariner. We plan to push for adoption in other distributions too in the near future.

@bluca
Copy link

bluca commented May 26, 2022

@rui314 this is now merged in ld: https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=9e2bb0cb5e74aed4158f08495534922d7108f928
I'll also work to add it to gold in the next few days

@rui314 rui314 closed this as completed in e9f6715 May 26, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

7 participants