Tracking issue for the start feature #29633

Open
aturon opened this issue Nov 5, 2015 · 45 comments
Labels
B-unstable Feature: Implemented in the nightly compiler and unstable. C-tracking-issue Category: A tracking issue for an RFC or an unstable feature. I-lang-nominated The issue / PR has been nominated for discussion during a lang team meeting. S-tracking-design-concerns Status: There are blocking ❌ design concerns. T-lang Relevant to the language team, which will review and decide on the PR/issue.

Comments

@aturon
Member

aturon commented Nov 5, 2015

Tracking issue for #[start], which indicates that a function should be used as the entry point, overriding the "start" language item. In general this forgoes a bit of runtime setup that's normally run before and after main.

Open questions

@aturon aturon added T-lang Relevant to the language team, which will review and decide on the PR/issue. B-unstable Feature: Implemented in the nightly compiler and unstable. labels Nov 5, 2015
@alexcrichton
Member

I believe the semantics of this today are that the compiler will generate a function with the symbol main which calls the #[start] function, if present, in an executable. This skips the #[lang = "start"] implementation, if any, in an upstream library (the standard library provides this to set up the first catch_panic among a few other minor things).

The signature for this function is also fn(isize, *const *const u8) -> isize, where the isize may no longer be "the most correct". Additionally, the *const *const u8 may not be the most appropriate option for Windows (although it works). I'm not 100% sure what "the best" signature on Windows is.

@retep998
Member

retep998 commented Nov 9, 2015

On Windows the executable entry point does not take any arguments. Currently we let the CRT act as the executable entry point. The CRT calls our Rust entry wrapper, which invokes the start function: either #[lang = "start"], which in turn invokes the user's main function through the pointer provided to it, or a user-provided #[start] function. Which executable entry point the linker uses depends on /SUBSYSTEM and on which main function it can find (https://msdn.microsoft.com/en-us/library/f9t8842e.aspx). All information that the CRT provides to the main function can be obtained through alternative means. Note that if we eventually provide an option for /SUBSYSTEM:Windows, that main function takes a very different set of (useless) arguments than the traditional main (https://msdn.microsoft.com/en-us/library/windows/desktop/ms633559%28v=vs.85%29.aspx).

@SimonSapin
Contributor

html5ever uses this in an ugly hack that overrides the main function used by cargo test in order to generate thousands of tests dynamically. (Tests with the same code but parameterized on (input, expected result).)

This would be better solved by some way to override the test harness used by cargo test (Is there an issue/RFC for that already?)

@alexcrichton
Member

@SimonSapin the use case for that with Cargo should in theory be harness = false, but I'm curious how that interacts with #[start]?

@SimonSapin
Contributor

The #[start] trick doesn’t work anymore, but it looked like this: servo/html5ever@df8e749

Is there a tracking issue for harness = false?

@alexcrichton
Member

Oh that's already implemented today, if a test target is listed as harness = false then Cargo just won't pass --test when compiling it and expects it to be a binary.
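To make the harness = false route concrete: with that setting on a test target, the file is built as an ordinary binary, so its main can generate cases at run time. A minimal sketch follows; `run_cases`, the case table, and the function under test are hypothetical stand-ins, not html5ever's code.

```rust
/// Runs every `(input, expected)` pair through `f` and returns the number of
/// failures. All names here are hypothetical, for illustration only.
pub fn run_cases(cases: &[(&str, &str)], f: impl Fn(&str) -> String) -> usize {
    let mut failures = 0;
    for (input, expected) in cases {
        let actual = f(input);
        if actual == *expected {
            println!("ok   {input:?}");
        } else {
            println!("FAIL {input:?}: expected {expected:?}, got {actual:?}");
            failures += 1;
        }
    }
    failures
}
```

A main in such a target would call `run_cases` and exit with a non-zero status when the count is not 0, since the process exit code is what reports the test result once the built-in harness is out of the picture.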

(this may be a bit off-topic from #[start] though so feel free to ping me on IRC)

@mahkoh
Contributor

mahkoh commented Jan 7, 2016

The current signature for the lang item is

fn lang_start(main: *const u8, argc: isize, argv: *const *const u8) -> isize {

which is called by a generated main function. Instead, the signatures of both should be arbitrary and the symbols translate to main directly. This allows the main function to be platform dependent. A pointer to the user's main function can be obtained via an intrinsic.

@retep998
Member

retep998 commented Jan 7, 2016

Note that on windows it really shouldn't always be main. If the user sets the subsystem to windows instead of console, then the CRT expects to find WinMain which results in a linker error because it wasn't defined.

@steveklabnik
Member

#20064 suggests that the signature here is wrong, we should consider this before making this feature stable.

@comex
Contributor

comex commented Jun 7, 2016

Just to clarify, it's not just a question of what signature lang_start should have; rustc currently generates C entry points (main) that only work "by accident" (because on 32-bit platforms isize = c_int and on 64-bit the calling conventions happen to work out). On my Mac, I get:

define i64 @main(i64, i8**) unnamed_addr {

Of course this would be an easy fix.
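To illustrate the mismatch: C's main receives argc as an int, which is `c_int` in Rust, not `isize`. Below is a minimal sketch of reading arguments with the C-correct types. `collect_args` is a hypothetical helper, not something rustc or std provides.

```rust
use std::ffi::{c_char, c_int, CStr};

/// Copies C-style arguments into owned Rust strings.
///
/// # Safety
/// `argv` must point to at least `argc` valid, NUL-terminated C strings.
pub unsafe fn collect_args(argc: c_int, argv: *const *const c_char) -> Vec<String> {
    // A negative count would be a caller bug; treat it as "no arguments".
    let count = argc.max(0) as usize;
    (0..count)
        .map(|i| {
            // SAFETY: the caller guarantees `argv[i]` is a valid C string
            // for every `i < argc`.
            let arg = unsafe { CStr::from_ptr(*argv.add(i)) };
            arg.to_string_lossy().into_owned()
        })
        .collect()
}
```

On the common 64-bit targets `c_int` is 4 bytes and `isize` is 8, which is why a generated `i64 @main(i64, i8**)` only lines up with the C caller by accident.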

@retep998
Member

retep998 commented Jun 7, 2016

Just don't stabilize this until consideration is taken for subsystems, which change the entry point completely from main to WinMain and are really important for Rust to support if it wants to be used in the Windows world (several people have spoken to me and said this is one of the issues getting in the way of them using Rust in production on Windows)

@pravic
Contributor

pravic commented Jun 7, 2016

The entry point name is actually irrelevant for Windows apps. The ability to specify the subsystem is one of the important things for creating an application, because most of them are GUI apps with the "windows" subsystem.

@Mark-Simulacrum Mark-Simulacrum added the C-tracking-issue Category: A tracking issue for an RFC or an unstable feature. label Jul 22, 2017
bors added a commit that referenced this issue Oct 1, 2017
Fix native main() signature on 64bit

Hello,

in LLVM-IR produced by rustc on x86_64-linux-gnu, the native main() function had incorrect types for the function result and argc parameter: i64, while it should be i32 (really c_int). See also #20064, #29633.

So I've attempted a fix here. I tested it by checking the LLVM IR produced with --target x86_64-unknown-linux-gnu and i686-unknown-linux-gnu. Also I tried running the tests (`./x.py test`), however I'm getting two failures with and without the patch, which I'm guessing is unrelated.
@AndrewGaspar
Contributor

Pinging this thread since it seems inactive. I wanted to express interest in this feature being stabilized. I was really excited when I discovered that you could replace the entry point in Rust, and then real bummed when I found out that you could only do it on nightly.

@glandium
Contributor

glandium commented Mar 2, 2018

It's also necessary to write #![no_std] programs.

@glandium
Contributor

glandium commented Mar 2, 2018

I would expect #[start] to pass the /ENTRY argument to link.exe, but it apparently doesn't. Although it's worth noting that the function signature for an entry point on Windows is different anyways, so one would need a different #[start] function there.

@clarfonthey
Contributor

clarfonthey commented Mar 5, 2018

> It's also necessary to write #![no_std] programs.

Speaking of which… is there any reason for this? We could make a version of Termination for libcore and remove all of the system-specific stuff from the shim. We'd also want some form of env::Args for libcore, but I'd argue that this is reasonable, especially if it's very bare-bones.

Personally, instead of offering #[start], I think that it makes more sense to be able to prune down the existing start shim by removing some of its guarantees. For example, aborting on panics, disallowing multithreading, and disabling stack protection would be enough.

@retep998
Member

retep998 commented Mar 5, 2018

@glandium The binary entry point is completely different from #[start]. Unless you don't want to link to the CRT at all, you'll probably want the binary entry point to remain the mainCRTStartup provided by the CRT, otherwise the C RunTime won't be initialized!

@clarcharr Adding some form of env::Args to libcore is a bad idea, as on Windows it requires calling system functions and heap allocating which is firmly in the realm of libstd. On the other hand, because it is only a system function, Windows users don't need to ask libcore/libstd for the args and can just go through winapi themselves. Unix users would still need to get argc/argv somehow...

@Amanieu
Member

Amanieu commented Mar 5, 2018

I've just been using #![no_main] in my no_std programs. This completely eliminates the default entry point logic.

I just define my main function with #[no_mangle] and have it called by my initialization code.

@clarfonthey
Contributor

@retep998 Oh, I didn't know that. I think that in that case, it makes sense to simply have a MainArgs opaque struct which encapsulates argc and argv or nothing if they're not available.

That was initially the idea but I didn't realise how windows did things.
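A rough sketch of what such an opaque type could look like, using only items from `core::ffi`. Everything here is hypothetical: `MainArgs`, its methods, and their names are assumptions made for illustration, not an existing or proposed API.

```rust
use core::ffi::{c_char, c_int, CStr};

/// Hypothetical opaque wrapper: holds `argc`/`argv` where the platform passes
/// them to the entry point, and nothing where it does not.
pub struct MainArgs {
    raw: Option<(usize, *const *const c_char)>,
}

impl MainArgs {
    /// For platforms whose entry point receives no arguments.
    pub fn unavailable() -> Self {
        MainArgs { raw: None }
    }

    /// # Safety
    /// `argv` must point to `argc` valid, NUL-terminated strings that outlive
    /// the returned value.
    pub unsafe fn from_raw(argc: c_int, argv: *const *const c_char) -> Self {
        MainArgs { raw: Some((argc.max(0) as usize, argv)) }
    }

    /// Number of arguments, or `None` if the platform did not supply any.
    pub fn len(&self) -> Option<usize> {
        self.raw.map(|(argc, _)| argc)
    }

    /// The `index`-th argument, if it exists.
    pub fn get(&self, index: usize) -> Option<&CStr> {
        let (argc, argv) = self.raw?;
        if index >= argc {
            return None;
        }
        // SAFETY: `from_raw`'s contract guarantees `argv[index]` is valid.
        Some(unsafe { CStr::from_ptr(*argv.add(index)) })
    }
}
```

Because the type only borrows what the platform handed over, it needs no allocation. The Windows case described above would simply construct it with `unavailable()`.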

@glandium
Contributor

glandium commented Mar 15, 2018

> I've just been using #![no_main] in my no_std programs. This completely eliminates the default entry point logic.

Indeed, it surprisingly works on linux, mac and even windows, with a #[no_mangle], pub extern "C" fn main(...) -> isize. (edit: maybe actually i32?)

@japaric
Member

japaric commented Mar 15, 2018

#[no_mangle] is not type safe so I don't consider it a proper user facing way to set the entry point. It also doesn't require unsafe which makes it not obvious that it's extremely dangerous to get the type signature wrong. This is "safe" and segfaults:

#![no_main]

#[no_mangle]
pub fn main(args: Vec<String>) {
    for arg in args {
        println!("{}", arg);
    }
}

@ketsuban
Contributor

The current signature for #[start] functions isn't great in embedded contexts either. There's nothing to pass arguments to a Game Boy Advance game or read a return value—the only way execution is going to end is if the player switches the game off.

@nacaclanga

One way to implement a platform-specific design is to have a #[bin_entry = "main"] attribute that must be attached to exactly one item in the dependency tree for a crate to be compilable as a bin (unless the no_main feature is also provided). The function signature can be arbitrary; the user is responsible for giving the right signature. When the crate is compiled as a library, the attribute is ignored. When the crate is compiled as a binary, it acts as a #[link_name = "main"] attribute. The Rust main function is provided as an intrinsic so it can be called from the bin_entry function. A default implementation for this is given in std for common targets.

@bitwalker

bitwalker commented Oct 26, 2021

I think this issue is the best place to discuss something I haven't really seen brought up anywhere, in fact @nacaclanga is the first person I've seen mention the idea of an intrinsic to initialize the runtime from a custom main function, which would certainly have solved an issue I've encountered.

My use case is in an ahead-of-time compiler, written in Rust, that compiles Erlang source code to a native executable (or if desired, to a static/dynamic lib), and links against a runtime, also written in Rust, that provides the "real" entry point of the executable. The compiler is based on LLVM, and its linker is largely based on the Rust linker. The issue I ran into relatively far into the implementation, was the question of how to link my Rust runtime to the executable generated by the compiler while ensuring that the Rust standard library runtime gets properly initialized. Put another way, I want the executable to start as if it was compiled by rustc with my runtime crate as the entry point. I found a way to make it work, but dear lord is it disgusting:

NOTE: This is currently being compiled against a slightly older version of Rust, and I imagine there has been enough movement upstream that this no longer compiles as-is on the most recent nightly without a few changes, but the point is just to convey what I've had to do to handle this. I'm just getting back into it recently after having been away for most of this year.

//! This code is from the core runtime crate, it is linked in to compiled executables to act as the main entry point
#![feature(main)]

mod atoms;
mod symbols;

extern "Rust" {
    /// We support linking against different high-level runtime implementations, they are required to
    /// export an `rt_entry` function.
    #[link_name = "rt_entry"]
    fn rt_entry() -> i32;

    /// The symbol `__rt_lang_start_internal` is generated by our compiler by first discovering the
    /// name of the real Rust lang_start_internal symbol of the Rust toolchain it is built against, and then generating
    /// code to act as a shim for calling that symbol, but exported with a consistent name
    #[link_name = "__rt_lang_start_internal"]
    fn lang_start(main: &dyn Fn() -> i32, argc: isize, argv: *const *const i8) -> isize;
}

#[no_mangle]
pub extern "C" fn main(argc: i32, argv: *const *const std::os::raw::c_char) -> i32 {
    unsafe { lang_start(&move || main_internal(), argc as isize, argv) as i32 }
}

/// The primary entry point for our runtime
///
/// This function is responsible for setting up any core functionality required
/// by the higher-level runtime, e.g. initializing the atom table. Once initialized,
/// this function invokes the platform-specific entry point which handles starting
/// up the scheduler and other high-level runtime functionality.
#[main]
pub fn main_internal() -> i32 {
    use crate::atoms::*;
    use crate::symbols::*;

    // Initialize atom table
    if unsafe { InitializeAtomTable(ATOM_TABLE, NUM_ATOMS) } == false {
        return 102;
    }

    // Initialize the dispatch table
    if unsafe { InitializeDispatchTable(SYMBOL_TABLE, NUM_SYMBOLS) } == false {
        return 103;
    }

    // Invoke platform-specific entry point
    unsafe { rt_entry() }
}

This honestly isn't even the worst part, that award probably goes to what I do to extract the real lang_start_internal symbol name, or maybe the code generation to allow calling that function as __rt_lang_start_internal. It would be awesome if instead of having to do any of that, I could just call the lang start function directly from my own main.

I'm a bit surprised this isn't a more commonly requested use case, particularly for ahead-of-time compiler projects. I wouldn't be surprised if there are other situations in which you want to compile a Rust crate as a library that can be linked together with other code to form an executable while retaining the ability to properly initialize the standard library. I'm entirely unclear on how this can work at all today with Rust crates that are linked into C/C++ programs, perhaps it doesn't, or maybe the functionality used by those programs doesn't depend on standard library features that require initialization (is that even possible?). In my case, I rely on things like the ability to ask for args passed to the executable, the panic handler, the main thread setup + stack guard, etc., for my runtime.

Right now (AIUI) the situation is basically all or nothing: you either compile your binary with rustc (obviously not an option for my project), or you completely opt out of using the standard library, i.e. compiling with #![no_std] and providing your own main, just like I've seen documented in the various embedded projects that do similar things. Since one has full control over startup in the latter case, I see no reason why I can't link in the standard library and invoke the lang start function directly just like rustc does (and which I'm effectively doing via the above gross hack), as long as the invariants are documented and upheld by callers. I'm absolutely fine with that always being feature-gated, or an unsafe intrinsic; I just want a less perverse way of doing something that, at least to me, seems pretty reasonable.

I hope this is the right place for bringing this up, and that having an example of a concrete use case is helpful in further discussions of this type. If you have noticed something that I seem to be missing, definitely let me know, if I can do what I want to do within the bounds of currently supported features, I'm all ears. For now, I'm just crossing my fingers that my dirty hack continues to work until a more permanent solution becomes available.

@joshtriplett joshtriplett added the S-tracking-design-concerns Status: There are blocking ❌ design concerns. label Dec 8, 2021
@joshtriplett
Member

We'd like to stabilize this, but we need to evaluate the function signature and what commitments we make for which parts of the standard library work.

@Ayush1325
Contributor

I would also like to have a way to define a custom function that replaces the crt0 and basically allows calling the lang_start after the platform stuff is done.

I am currently trying to port the std to UEFI targets. It expects a function with the following signature:

#[export_name = "efi_main"]
pub extern "C" fn main(_h: efi::Handle, st: *mut efi::SystemTable) -> efi::Status {
    ...
}

I would like the users to be able to just use the normal main(). But since even lang_start has a predefined type signature, I need to run a function even before that to get the SystemTable and setup some stuff. At the same time, I would prefer that people using std at least do not have to do all this themselves.

So I do hope that we can get a standardised way to achieve this, since most of the firmware development probably doesn't care about the arguments passed.

@RalfJung
Member

The following interesting testcase came up in #97049 related to this feature:

#![feature(start)]

#[start]
fn start(_: isize, _: *const *const u8) -> isize { panic!(); }

I think this program has UB, since there is no Rust runtime here to catch the panic, meaning we are unwinding past the top of the stack. So this should be noted somewhere with the documentation of this feature.

@Amanieu
Member

Amanieu commented Jun 19, 2022

This can easily be solved by making the #[start] function nounwind like extern "C".

@RalfJung
Member

And then the automatic abort-on-unwind logic will kick in? Yeah, that makes sense -- but would also need documentation.
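For comparison, here is how a panic can be stopped at the entry boundary when std is available. Note that this sketch uses `std::panic::catch_unwind` rather than the nounwind/abort approach proposed above, and `guarded_entry` is a hypothetical helper, not part of any real start shim.

```rust
use std::panic::{self, UnwindSafe};

/// Runs `body` and turns a panic into an exit code instead of letting it
/// unwind out of the entry point. `guarded_entry` is a hypothetical helper.
pub fn guarded_entry(body: impl FnOnce() -> i32 + UnwindSafe) -> i32 {
    match panic::catch_unwind(body) {
        Ok(code) => code,
        // 101 mirrors the status std uses for a panicking `main`; a
        // no-runtime entry point might call `std::process::abort()` instead.
        Err(_) => 101,
    }
}
```

Either way, the point is that something has to sit between the panicking code and the top of the stack, and whichever behavior is chosen needs to be documented.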

bors bot added a commit to intellij-rust/intellij-rust that referenced this issue Feb 3, 2023
10066: ANN: Add support for E0131, E0197, E0203 r=vlad20012 a=kuksag

changelog:

* Add support for E0131
Error code reference: https://doc.rust-lang.org/error_codes/E0131.html
There's a feature that might be connected to this error code: rust-lang/rust#29633

* Add support for E0197
Error code reference: https://doc.rust-lang.org/error_codes/E0197.html

* Add support for E0203
Error code reference: https://doc.rust-lang.org/error_codes/E0203.html
Compiler implementation: https://github.com/rust-lang/rust/blob/master/compiler/rustc_hir_analysis/src/astconv/mod.rs#L877


Co-authored-by: kuksag <georgiy.kuksa@gmail.com>
@mikeleany
Contributor

mikeleany commented Apr 4, 2023

From the original post:

> Tracking issue for #[start], which indicates that a function should be used as the entry point, overriding the "start" language item. In general this forgoes a bit of runtime setup that's normally run before and after main.

Has anyone actually used this successfully for the purpose stated above? From my testing (using bare-metal targets), it doesn't seem to define the entry point at all. It silences the following error: "error: requires start lang_item", but doesn't define an entry point.

What advantage does this (as currently implemented) even provide over using #![no_main] and defining the entry point with #[no_mangle] or #[export_name = "_start"]?

@bjorn3
Member

bjorn3 commented Apr 4, 2023

> From my testing (using bare-metal targets), it doesn't seem to define the entry point at all.

It defines the main function. The CRT defines _start and is expected to call main after libc has been initialized. On bare metal targets there is no libc, so directly defining _start makes sense unless you need an assembly trampoline to set up, e.g., the stack pointer. And even then you can use #![no_main] + #[no_mangle] to define the function that the trampoline will call.

@mikeleany
Contributor

> From my testing (using bare-metal targets), it doesn't seem to define the entry point at all.

> It defines the main function. The CRT defines _start and is expected to call main after libc has been initialized. On bare metal targets there is no libc, so directly defining _start makes sense unless you need an assembly trampoline to set up, e.g., the stack pointer. And even then you can use #![no_main] + #[no_mangle] to define the function that the trampoline will call.

So, if I understand you correctly, you're saying that the description of this feature is wrong — that it was never intended to override the entry point at all, but to override the main function instead, on the assumption that you are linking to the C runtime (or something else that defines the entry point and calls a C-like main function).

Also, if it's meant to be called from the CRT, shouldn't it require an extern "C" function instead of requiring the Rust ABI as it currently does? I guess changing that would also solve the unwinding issue mentioned by @RalfJung.

@bjorn3
Member

bjorn3 commented Apr 5, 2023

> that it was never intended to override the entry point at all, but to override the main function instead, on the assumption that you are linking to the C runtime (or something else that defines the entry point and calls a C-like main function).

main is the user facing entry point in C. _start is an implementation detail on ELF systems and doesn't exist on Windows at all.

> Also, if it's meant to be called from the CRT, shouldn't it require an extern "C" function instead of requiring the Rust ABI as it currently does?

Rustc actually won't rename the function annotated with #[start] to main. Instead it generates a main function with extern "C" which calls this function. Also, on some targets like UEFI, it generates a differently named function with a different calling convention, as appropriate when the entry point on that target has a different name and/or calling convention.
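The shape of that generated wrapper can be sketched in plain Rust. This is only an illustration: the real wrapper is emitted by the compiler under the platform's entry name, while `c_entry` and `user_start` are hypothetical names chosen so the sketch does not clash with a normal main.

```rust
use std::ffi::{c_char, c_int};

/// Stand-in for a user's `#[start]`-style function (Rust ABI).
fn user_start(argc: isize, _argv: *const *const u8) -> isize {
    argc + 40
}

/// A C-ABI function that forwards to the Rust-ABI one, converting between the
/// C types the caller uses and the `isize`/`u8` types `user_start` expects.
pub extern "C" fn c_entry(argc: c_int, argv: *const *const c_char) -> c_int {
    user_start(argc as isize, argv as *const *const u8) as c_int
}
```

This is why the annotated function itself can keep the Rust ABI: only the wrapper has to match what the C runtime calls.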

@mikeleany
Contributor

> main is the user facing entry point in C.

That still makes wording like "indicates a function should be used as the entry point" ambiguous at best, and very confusing (as can be seen in previous discussions in this tracking issue). My suggestion, now that I understand what this feature is really intended for, is simply that the description of this feature needs to be clarified.

> _start is an implementation detail on ELF systems and doesn't exist on Windows at all.

As far as I understand, Windows still has an entry point equivalent to _start, but just names it differently. In fact, even when using ELF object files, the entry point doesn't have to be called _start, though that's the common default. But yes, the naming of such startup routines is just an implementation detail.

> Also, if it's meant to be called from the CRT, shouldn't it require an extern "C" function instead of requiring the Rust ABI as it currently does?

> Rustc actually won't rename the function annotated with #[start] to main. Instead it generates a main function with extern "C" which calls this function. Also, on some targets like UEFI, it generates a differently named function with a different calling convention, as appropriate when the entry point on that target has a different name and/or calling convention.

Ah, I see.

@Nilstrieb
Member

Nilstrieb commented May 1, 2024

I think this issue should be closed and #[start] should be deleted. It's nothing but an accidentally leaked implementation detail that's a not very useful mix between "portable" entrypoint logic and bad abstraction.

I think the way the stable user-facing entrypoint should work (and works today on stable) is pretty simple:

  • std-using cross-platform programs should use fn main(). the compiler, together with std, will then ensure that code ends up at main (by having a platform-specific entrypoint that gets directed through lang_start in std to main - but that's just an implementation detail)
  • no_std platform-specific programs should use #![no_main] and define their own platform-specific entrypoint symbol with #[no_mangle], like main, _start, WinMain or my_embedded_platform_wants_to_start_here. most of them only support a single platform anyways, and need cfg for the different platform's ways of passing arguments or other things anyways

#[start] is in a super weird position of being neither of those two. It tries to pretend that it's cross-platform, but its signature is a total lie. Those arguments are just stubbed out to zero on Windows, for example. It also only handles the platform-specific entrypoints for a few platforms that are supported by std, like Windows or Unix-likes. my_embedded_platform_wants_to_start_here can't use it, and neither could a libc-less Linux program.
So we have an attribute that only works in some cases anyways, that has a signature that's a total lie (and a signature that, as I might want to add, has changed recently, and that I definitely would not be comfortable giving any stability guarantees on), and where there's a pretty easy way to get things working without it in the first place.

@RalfJung
Member

RalfJung commented May 1, 2024

Miri currently relies on #[start] to support running no-std binaries, but that could fairly easily be switched to a different scheme like the one described here. (I don't think I want to implement support for all the platform-specific start functions in Miri...)

@scottmcm
Member

scottmcm commented May 1, 2024

Lang-nominated for the proposal in #29633 (comment)

@scottmcm scottmcm added the I-lang-nominated The issue / PR has been nominated for discussion during a lang team meeting. label May 1, 2024
@bitwalker

@Nilstrieb how would you propose handling projects that need to initialize the Rust runtime, as I described in #29633 (comment) and as @Ayush1325 subsequently raised? As far as I can tell, there is still no solution for this, and because much of what is handled by lang_start is also private implementation detail, there is no effective way to support use of the Rust standard library without it.

I'm a bit surprised that there has been zero discussion around use cases like this, as rare as they may be, it is still something that someone providing their own crt0/start function must do in order to play well with Rust code that builds against libstd. I'm not suggesting that start needs to exist, but I would like to have some "official" guidance on how the Rust runtime can be initialized when providing your own start function. The gross hack I described in my original comment is perhaps the only option, but it really seems like this is something that would be straightforward to support properly, but I'm sure there are aspects of this that I'm not considering.

Anyhow, wanted to bring this up one last time before this thread effectively dies.

@Lokathor
Contributor

Lokathor commented May 1, 2024

That is probably best handled with a new RFC cycle to hammer out the details.

@Nilstrieb
Member

Nilstrieb commented May 2, 2024

@bitwalker if you're an AOT compiler you can just hook C main by overriding it with your own and then calling the C main defined by Rust (with some trickery). No language feature required.
I don't think removing this feature warrants an RFC, it has never had one to be added either.
