#[used] attribute #39987

Merged
merged 10 commits into from Apr 7, 2017

japaric
Member

japaric commented Feb 20, 2017

(For an explanation of what this feature does, read the commit message)

I'd like to propose landing this as an experimental feature (experimental as in:
no clear stabilization path -- like asm!, #[linkage]) as it's low
maintenance (I think) and relevant to the "Usage in resource-constrained
environments" exploration area.

The main use case I see is running code before main. This could be used, for
instance, to cheaply initialize an allocator before main, where the alternative
is to use lazy_static to initialize the allocator on its first use, which is
more expensive (atomics) and doesn't work on ARM Cortex-M0 microcontrollers (no
AtomicUsize on that platform).
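For comparison, here is a minimal sketch of the lazy, initialize-on-first-use alternative. It uses `std::sync::OnceLock` (stable since Rust 1.70) in place of `lazy_static` itself. `BumpState`, `heap_state` and the 4096-byte size are hypothetical stand-ins for allocator setup and are not part of this PR. The point to notice is that every access performs an atomic check, even though the initializer runs only once.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

/// Hypothetical allocator state, standing in for a real allocator.
struct BumpState {
    heap_size: usize,
}

/// Counts how many times the initializer actually ran.
static INIT_RUNS: AtomicUsize = AtomicUsize::new(0);

/// Initialized lazily on first use, in the style of `lazy_static`.
static HEAP: OnceLock<BumpState> = OnceLock::new();

fn heap_state() -> &'static BumpState {
    // Every call atomically checks whether HEAP is already set;
    // only the first call runs the closure.
    HEAP.get_or_init(|| {
        INIT_RUNS.fetch_add(1, Ordering::SeqCst);
        BumpState { heap_size: 4096 }
    })
}
```

Running the initializer before `main` instead, as proposed here, removes that per-access check.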

Here's a std example of that:

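``` rust
// #[used] was feature-gated (nightly only) when this PR landed.
#![feature(used)]

unsafe extern "C" fn before_main_1() {
    println!("Hello");
}

unsafe extern "C" fn before_main_2() {
    println!("World");
}

// On ELF targets the C runtime calls every function pointer in .init_array
// before main; #[used] stops rustc from discarding this unreferenced static.
#[link_section = ".init_array"]
#[used]
static INIT_ARRAY: [unsafe extern "C" fn(); 2] = [before_main_1, before_main_2];

fn main() {
    println!("Goodbye");
}
```

```
$ rustc -C lto -C opt-level=3 before_main.rs
$ ./before_main
Hello
World
Goodbye
```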

In general, this pattern could be used to let dependencies run code before
main (which sounds like it could go very wrong in some cases). There are
probably other use cases; I hope that the people I have cc-ed can comment on
those.

Note that I'm personally unsure if the above pattern is something we want to
promote / allow and that's why I'm proposing this feature as experimental. If
this leads to more footguns than benefits then we can just axe the feature.

cc @nikomatsakis ^ I know you have some thoughts on having a process for
experimental features though I'm fine with writing an RFC before landing this.

  • dead_code lint will have to be updated to special case #[used] symbols.

  • Should we extend #[used] to work on non-generic functions?

cc rust-lang/rfcs#1002
cc rust-lang/rfcs#1459
cc @dpc @JinShil

@rust-highfive
Collaborator

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @arielb1 (or someone else) soon.

If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. Due to the way GitHub handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes.

Please see the contribution instructions for more information.

@japaric
Member Author

japaric commented Feb 20, 2017

cc @frehberg who also wanted this

@durka
Contributor

durka commented Feb 20, 2017

Can you explain that code example some more? You didn't actually put #[used] in it anywhere, and those functions seem to be running even though they were never called. What dark magic are you proposing to add here?

@TimNN
Contributor

TimNN commented Feb 20, 2017

@japaric: I'm a bit confused after reading the PR description:

  • What does this PR actually do? After looking at the diff, this PR adds the #[used] attribute which marks symbols as llvm.used?

  • (what @durka said (Especially the dark magic bit!))

  • Should we extend #[used] to work on non-generic functions?

    Was this meant to be ... on generic functions?? (Otherwise I'm a bit confused). (Or is the attribute applied to statics and not functions?)

@durka
Contributor

durka commented Feb 20, 2017

The answer to my question is that #[link_section = ".init_array"] (OSX equivalent #[link_section = "__DATA,__mod_init_func"]) should be applied to the static in the example code. Then the claimed output is achieved. Optimizations throw away the static; presumably #[used] (on the static) would preserve it.
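Here is a minimal sketch of the same technique on a current stable toolchain, assuming Rust 1.82 or later. On that toolchain `#[used]` no longer needs a feature gate, and `link_section` is written as the unsafe attribute `#[unsafe(link_section = ...)]`, which is the spelling the 2024 edition requires. The sketch uses `.init_array` on Linux and the `__DATA,__mod_init_func` section named above on macOS. Other targets are deliberately left out. `RAN_BEFORE_MAIN`, `mark_ran_before_main`, `CTOR` and `ran_before_main` are illustrative names. The constructor stores to an atomic instead of printing, so that it depends on as little runtime setup as possible.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

// Set by the constructor below, read later from ordinary code.
static RAN_BEFORE_MAIN: AtomicBool = AtomicBool::new(false);

// A safe `extern "C" fn`, so the C runtime can call it through a plain
// function pointer. It only touches an atomic.
extern "C" fn mark_ran_before_main() {
    RAN_BEFORE_MAIN.store(true, Ordering::SeqCst);
}

// ELF (Linux): the C runtime calls each pointer stored in .init_array.
#[cfg(target_os = "linux")]
#[used]
#[unsafe(link_section = ".init_array")]
static CTOR: extern "C" fn() = mark_ran_before_main;

// Mach-O (macOS): the equivalent constructor section.
#[cfg(target_os = "macos")]
#[used]
#[unsafe(link_section = "__DATA,__mod_init_func")]
static CTOR: extern "C" fn() = mark_ran_before_main;

/// Reports whether the constructor ran during process startup.
fn ran_before_main() -> bool {
    RAN_BEFORE_MAIN.load(Ordering::SeqCst)
}
```

Without `#[used]`, nothing references `CTOR`, so the compiler is free to drop it. The section placement alone would then have no effect.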

@durka
Contributor

durka commented Feb 20, 2017

This seems to be an implementation of a closed RFC, which is weird. Shouldn't we go through the regular procedure for adding features?

@parched
Contributor

parched commented Feb 20, 2017

@japaric
Member Author

japaric commented Feb 20, 2017

@durka Sorry about the confusion, added the missing attributes to the example. (I opened the pull request from the command line and the interface to write the PR message strips lines that start with # -- I forgot about that detail)

@TimNN It used the mechanism described here. The commit message has another example.

@nagisa
Member

nagisa commented Feb 20, 2017

> In general, this pattern could be used to let dependencies run code before
> main (which sounds like it could go very wrong in some cases). There are
> probably other use cases; I hope that the people I have cc-ed can comment on
> those.
>
> Note that I'm personally unsure if the above pattern is something we want to
> promote / allow and that's why I'm proposing this feature as experimental. If
> this leads to more footguns than benefits then we can just axe the feature.

Life before main is very consciously not supported by Rust. Being able to put arbitrary stuff into arbitrary sections for sure seems useful to me, though in general I feel like it should be more prominently unsafe.

@dpc
Contributor

dpc commented Feb 20, 2017

I'm happy that someone made an effort to come up with an implementation. I understand the reluctance to include it, and maybe it should go through a formal RFC, but I hope it will at least get a green light to pursue it.

The lack of a mechanism like this makes some embedded (and not only embedded) things impossible in Rust, as the compiler keeps removing stuff we want to retain for various reasons. There's a reason why both GCC and LLVM have a method to retain unused symbols.

> The main use case I see is running code before main

Some people need it to preserve build IDs or other artefacts that are not directly referenced from the code. I needed it to implement a self-test for an embedded OS, where test functions marked with a macro are gathered in one section and called sequentially on "boot" (so not before main, but in a similar fashion). The same mechanism is used by the Linux kernel (and other kernels) to implement the init system and module initialization.

@dpc
Contributor

dpc commented Feb 20, 2017

> Being able to put arbitrary stuff into arbitrary sections for sure seems useful to me, though in general I feel like it should be more prominently unsafe.

It's not unsafe according to Rust's definition. And just to be clear: link_section is already implemented in Rust (and quite necessary for embedded software), but it is severely crippled by the fact that one can't make sure that what has been put there will actually be there. :)

@nagisa
Member

nagisa commented Feb 20, 2017

> It's not unsafe according to Rust's definition.

Oh, of course it is unsafe – it allows putting arbitrary bytes into executable sections. This includes calling unsafe functions without explicit unsafe annotation (as in example above), dereferencing raw pointers without explicit unsafe annotation, or pretty much anything else, really.

I feel it's more of an oversight that link_section is stable in its current form than anything else really. But that's not really related to the PR at hand anyway.

@dpc
Contributor

dpc commented Feb 20, 2017

> Oh, of course it is unsafe – it allows putting arbitrary bytes into executable sections.

I guess technically it is unsafe to allow putting arbitrary things into the initialization section (.init_array), because the platform calls them sequentially on start. Putting arbitrary data into other sections, like the executable code section (.text), wouldn't be a problem. You can put a bunch of random data between every function in the executable and it just won't ever be called or referenced. To touch it you would have to use some unsafe methods.

However, this PR is about adding #[used], as #[link_section = "..."] is already implemented.

@durka
Contributor

durka commented Feb 21, 2017

@japaric from your commit message:

> Note that the linker knows nothing about #[used] and will drop LIVE
> because no other object references to it.

Now I'm confused again. What is the point if this still doesn't defeat the linker?

@dpc
Contributor

dpc commented Feb 21, 2017

Interesting. I'm confused as well. From http://llvm.org/docs/LangRef.html#the-llvm-used-global-variable:

"a symbol appears in the @llvm.used list, then the compiler, assembler, and linker are required to treat the symbol as if there is a reference to the symbol that it cannot see(...)"

@durka
Contributor

durka commented Feb 21, 2017

Putting test::black_box(&INIT_ARRAY); at the beginning of main is another way to preserve the symbol, but I know that's implemented with ASM so I don't know if that covers the embedded use case.
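`test::black_box` was unstable at the time. On current stable Rust (1.66 and later), `std::hint::black_box` plays the same role. Below is a minimal sketch of the suggestion, using a hypothetical `INIT_TABLE` and helper `keep_init_table`. The call has to sit in code that the final binary actually runs, such as the top crate's `main`, and that requirement is exactly the limitation japaric raises in the reply below.

```rust
use std::hint::black_box;

/// Illustrative stand-in for INIT_ARRAY: a static that nothing reads directly.
static INIT_TABLE: [u32; 2] = [1, 2];

/// Meant to be called at the start of `main`. Passing the reference through
/// `black_box` stops the optimizer from proving the static is unused, so it
/// stays in the binary. `black_box` is documented as best-effort only.
fn keep_init_table() -> usize {
    black_box(&INIT_TABLE).len()
}
```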

@japaric
Member Author

japaric commented Feb 21, 2017

@durka

> What is the point if this still doesn't defeat the linker?

The point is not to "defeat" the linker but to collaborate with it. That's why you put the #[used] static in a section that the linker will preserve (i.e. KEEP in the linker script). Without #[used] the statics won't even make it to the linker.

> Putting test::black_box(&INIT_ARRAY); at the beginning of main

The example in the description is contrived. The proper use case is that you have multiple crates and you want some symbols to make it to the final binary (you mark those as #[used]) but you want those symbols, which would be part of dependencies/libraries, to not be part of the public API of those dependencies/libraries. Thus you can't use the black_box trick in the top (binary) crate.

@dpc

> "a symbol appears in the @llvm.used list, then the compiler, assembler, and linker are required to treat the symbol as if there is a reference to the symbol that it cannot see(...)"

I can't speak for LLVM's documentation but the current implementation is the same as clang's. clang's __attribute__((used))-ed items are not treated specially by the linker and will be removed if unused / not referenced. For example:

``` c
__attribute__((used))
static const int USED;

static const int UNUSED;

int main() {}
```

```
# NOTE(--gc-sections) matches rustc default
$ clang -Wl,--gc-sections foo.c

$ nm -C a.out | grep USED
# (no output: USED was removed at link time)

# USED does make it to the object file (which is the point of this feature)
$ clang -O3 -c foo.c

$ nm -C foo.o | grep USED
0000000000000000 r USED
```

@codyps
Contributor

codyps commented Feb 21, 2017

@dpc are you thinking of gcc's externally_visible, perhaps? avr-llvm/clang#5 https://bugs.llvm.org//show_bug.cgi?id=16683 gcc docs .

Seems it might not be necessary due to the lack of more aggressive LTO in LLVM vs gcc? Or maybe it's just clang?

@dpc
Contributor

dpc commented Feb 21, 2017

> I can't speak for LLVM's documentation but the current implementation is the same as clang's. clang's __attribute__((used))-ed items are not treated specially by the linker and will be removed if unused / not referenced.

Oh. The last time I used this gather-things-in-a-section mechanism was with C compiled by GCC, and I don't remember the details now. Maybe it worked because we were not using LTO, so the linker was not that aggressive, or because we marked it in the linker script. With Rust I never got this to work, as I was blocked on not being able to emit symbols marked with llvm.used. I think it's OK to assume that #[used] by itself does not solve all the pieces of the puzzle, and might be only one of the necessary steps.

@japaric Is KEEP in linker + #[used] enough?

@cbiffle
Contributor

cbiffle commented Feb 21, 2017 via email

@nikomatsakis
Contributor

@japaric I have a question about your initial example. Is this .init_array section something known to the embedded linker or runtime or something? Or just a handy hook that some other bit of user code will invoke before calling main()?

Regarding unstable features, I have had thoughts about having such a process, but I never got as far as writing it down. I think the general idea was going to be that you write an RFC defining your goals and motivation. If we agree that these cannot be achieved out of tree, we would allow you to experiment in tree, but a proper (and full) RFC is required for the final result of that experimentation. i.e., you don't go directly from "experiment" to "stable feature", but rather you go from "experimental" to "unstable, on path to stabilization" and then "stable". In a sense this is always the path, but typically this "experimental" phase takes place out of tree (e.g., in a library).

@dpc
Contributor

dpc commented Feb 21, 2017

@nikomatsakis .init_array is a standard mechanism in ELF files, I believe. Its entries are traversed and executed one by one, by the (dynamic) linker on dynamic linking or at executable start. It is platform and executable format specific, I'd say. Note: I am not 100% sure about the details, so I reserve the right to be wrong. :)

@nikomatsakis
Contributor

@dpc ok =) I thought it must be something like that.

@codyps
Contributor

codyps commented Feb 21, 2017

.init_array: The typical name for the section storing DT_INIT_ARRAY. Part of SysV ABI, and this piece is very widely used even in cases without dynamic linking.

Just an array of function pointers. Only as arch-specific as function pointers are.
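To make the "just an array of function pointers" point concrete, here is a small, portable simulation. It is not the real loader. `FAKE_INIT_ARRAY`, `run_init_array` and the `CALLS` log are hypothetical names, and the loop stands in for what the C runtime does with the real section at startup.

```rust
use std::sync::Mutex;

/// Records the order in which the "constructors" ran.
static CALLS: Mutex<Vec<&'static str>> = Mutex::new(Vec::new());

extern "C" fn init_first() {
    CALLS.lock().unwrap().push("first");
}

extern "C" fn init_second() {
    CALLS.lock().unwrap().push("second");
}

/// Stand-in for the contents of an .init_array section: nothing but
/// function pointers, laid out one after another.
static FAKE_INIT_ARRAY: [extern "C" fn(); 2] = [init_first, init_second];

/// Simplified version of the startup step: call each entry in order.
/// (A real loader locates the array via DT_INIT_ARRAY / DT_INIT_ARRAYSZ.)
fn run_init_array(entries: &[extern "C" fn()]) {
    for entry in entries {
        entry();
    }
}
```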

@durka
Contributor

durka commented Feb 21, 2017 via email

as it's specific to ELF and won't pass on macOS / Windows
@japaric
Member Author

japaric commented Apr 6, 2017

Thanks for the info, @eddyb.

I'm just going to ignore the test because I don't have the hardware to test the other methods for macOS / Windows.

@bors r=arielb1

@bors
Contributor

bors commented Apr 6, 2017

📌 Commit f4f79c3 has been approved by arielb1

@bors
Contributor

bors commented Apr 7, 2017

⌛ Testing commit f4f79c3 with merge cc966cf...

frewsxcv added a commit to frewsxcv/rust that referenced this pull request Apr 7, 2017
@frewsxcv
Member

frewsxcv commented Apr 7, 2017

Prioritizing the rollup that includes these changes

@bors retry

@bors
Contributor

bors commented Apr 7, 2017

⌛ Testing commit f4f79c3 with merge ccdf891...

@bors
Contributor

bors commented Apr 7, 2017

💔 Test failed - status-travis

@durka
Contributor

durka commented Apr 7, 2017

```
[01:21:10] ---- [run-make] run-make/used stdout ----
[01:21:10] 	
[01:21:10] error: make failed
[01:21:10] status: exit code: 2
[01:21:10] command: "make"
[01:21:10] stdout:
[01:21:10] ------------------------------------------
[01:21:10] DYLD_LIBRARY_PATH="/Users/travis/build/rust-lang/rust/build/x86_64-apple-darwin/test/run-make/used.stage2-x86_64-apple-darwin:/Users/travis/build/rust-lang/rust/build/x86_64-apple-darwin/stage2/lib:" '/Users/travis/build/rust-lang/rust/build/x86_64-apple-darwin/stage2/bin/rustc' --out-dir /Users/travis/build/rust-lang/rust/build/x86_64-apple-darwin/test/run-make/used.stage2-x86_64-apple-darwin -L /Users/travis/build/rust-lang/rust/build/x86_64-apple-darwin/test/run-make/used.stage2-x86_64-apple-darwin  -C opt-level=3 --emit=obj used.rs
[01:21:10] nm -C /Users/travis/build/rust-lang/rust/build/x86_64-apple-darwin/test/run-make/used.stage2-x86_64-apple-darwin/used.o | grep FOO
[01:21:10] 
[01:21:10] ------------------------------------------
[01:21:10] stderr:
[01:21:10] ------------------------------------------
[01:21:10] warning: static item is never used: `FOO`
[01:21:10]   --> used.rs:15:1
[01:21:10]    |
[01:21:10] 15 | static FOO: u32 = 0;
[01:21:10]    | ^^^^^^^^^^^^^^^^^^^^
[01:21:10]    |
[01:21:10]    = note: #[warn(dead_code)] on by default
[01:21:10] 
[01:21:10] warning: static item is never used: `BAR`
[01:21:10]   --> used.rs:17:1
[01:21:10]    |
[01:21:10] 17 | static BAR: u32 = 0;
[01:21:10]    | ^^^^^^^^^^^^^^^^^^^^
[01:21:10]    |
[01:21:10]    = note: #[warn(dead_code)] on by default
[01:21:10] 
[01:21:10] nm: Unknown command line argument '-C'.  Try: '/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/nm -help'
[01:21:10] nm: Did you mean '-A'?
[01:21:10] make[1]: *** [all] Error 1
```

OSX nm doesn't seem to support demangling; apparently you need to use c++filt directly.

the nm in our macOS bots doesn't support that flag and it's not really required
@japaric
Member Author

japaric commented Apr 7, 2017

Thanks for pointing out the exact error, @durka. I was having trouble spotting it.

@bors r=arielb1

@bors
Contributor

bors commented Apr 7, 2017

📌 Commit 98037ca has been approved by arielb1

@bors
Contributor

bors commented Apr 7, 2017

⌛ Testing commit 98037ca with merge b9c5197...

bors added a commit that referenced this pull request Apr 7, 2017
#[used] attribute

@bors
Contributor

bors commented Apr 7, 2017

☀️ Test successful - status-appveyor, status-travis
Approved by: arielb1
Pushing b9c5197 to master...

@Centril
Contributor

Centril commented Feb 23, 2018

Triaging rust-lang/rfcs#1002 - @japaric, what's the status of this wrt. stabilization and an RFC?

Labels
relnotes: Marks issues that should be documented in the release notes of the next release.
T-compiler: Relevant to the compiler team, which will review and decide on the PR/issue.