From 23ffba214c7681f142eadc8d3454472f71a83474 Mon Sep 17 00:00:00 2001 From: kennytm Date: Tue, 1 Aug 2017 00:00:54 +0800 Subject: [PATCH 01/12] RFC: Semantic inlining (another try to the unwrap/expect line info problem) --- text/0000-inline-semantic.md | 993 +++++++++++++++++++++++++++++++++++ 1 file changed, 993 insertions(+) create mode 100644 text/0000-inline-semantic.md diff --git a/text/0000-inline-semantic.md b/text/0000-inline-semantic.md new file mode 100644 index 00000000000..d39d1d65c73 --- /dev/null +++ b/text/0000-inline-semantic.md @@ -0,0 +1,993 @@ +- Feature Name: `inline_semantic`, `caller_location` +- Start Date: 2017-07-31 +- RFC PR: (leave this empty) +- Rust Issue: (leave this empty) + +---- + +# Summary +[summary]: #summary + +Enable accurate caller location reporting during panic in `{Option, Result}::{unwrap, expect}` with +the following changes: + +1. Support the `#[inline(semantic)]` function attribute, which guarantees a function is inlined + before reaching LLVM. +2. Adds lang-item consts which retrieves the caller's source location. + +Example: + +```rust +#![feature(inline_semantic, caller_location)] +use core::caller; + +#[inline(semantic)] +fn unwrap(self) -> T { + panic!("{}:{}:{}: oh no", caller::FILE, caller::LINE, caller::COLUMN); +} + +let n: Option = None; +let m = n.unwrap(); +``` + + + +- [Summary](#summary) +- [Motivation](#motivation) +- [Guide-level explanation](#guide-level-explanation) + - [Let's reimplement `unwrap()`](#lets-reimplement-unwrap) + - [Semantic-inlining](#semantic-inlining) + - [Caller location](#caller-location) + - [Why do we use semantic-inlining](#why-do-we-use-semantic-inlining) +- [Reference-level explanation](#reference-level-explanation) + - [Survey of panicking standard functions](#survey-of-panicking-standard-functions) + - [Semantic-inlining MIR pass](#semantic-inlining-mir-pass) + - [Caller location lang-item](#caller-location-lang-item) + - [Source-location-forwarding methods](#source-location-forwarding-methods) + - [Runtime-free backtrace for `?` operator](#runtime-free-backtrace-for--operator) +- [Drawbacks](#drawbacks) + - [Code bloat](#code-bloat) + - [Does not support dynamic call](#does-not-support-dynamic-call) + - [Narrow solution scope](#narrow-solution-scope) +- [Rationale and alternatives](#rationale-and-alternatives) + - [Rationale](#rationale) + - [Alternatives](#alternatives) + - [🚲 Name of everything 🚲](#🚲-name-of-everything-🚲) + - [Avoid introducing new public items](#avoid-introducing-new-public-items) + - [Inline MIR](#inline-mir) + - [Default function arguments](#default-function-arguments) + - [Non-viable alternatives](#non-viable-alternatives) + - [Macros](#macros) + - [Backtrace](#backtrace) + - [`SourceContext` generic parameter](#sourcecontext-generic-parameter) +- [Unresolved questions](#unresolved-questions) + + + +# Motivation +[motivation]: #motivation + +It is well-known that the error message reported by `unwrap()` is useless. + +```text +thread 'main' panicked at 'called `Option::unwrap()` on a `None` value', /checkout/src/libcore/option.rs:335 +note: Run with `RUST_BACKTRACE=1` for a backtrace. +``` + +There have been numerous discussions ([a], [b], [c]) that wants `unwrap()` and friends to provide +the better information to locate the panic. Previously, [RFC 1669] attempted to address this by +introducing the `unwrap!(x)` macro to the standard library, but it was closed since the `x.unwrap()` +convention is too entrenched. + +This RFC tries to introduce line numbers into `unwrap()` without requiring users to adapt a new +idiom, i.e. the user should be able to see the precise location without changing any of the source +code. + +# Guide-level explanation +[guide-level-explanation]: #guide-level-explanation + +## Let's reimplement `unwrap()` + +`unwrap()` and `expect()` are two methods on `Option` and `Result` that are commonly used when you +are *absolutely sure* they only contains a successful value and you want to extract it. + +```rust +// 1.rs +use std::env::args; +fn main() { + println!("args[1] = {}", args().nth(1).unwrap()); + println!("args[2] = {}", args().nth(2).unwrap()); + println!("args[3] = {}", args().nth(3).unwrap()); +} +``` + +If the assumption is wrong, they will panic and tell you that an error is unexpected. + +```text +$ ./1 +thread 'main' panicked at 'called `Option::unwrap()` on a `None` value', 1.rs:4:29 +note: Run with `RUST_BACKTRACE=1` for a backtrace. + +$ ./1 arg1 +args[1] = arg1 +thread 'main' panicked at 'called `Option::unwrap()` on a `None` value', 1.rs:5:29 +note: Run with `RUST_BACKTRACE=1` for a backtrace. + +$ ./1 arg1 arg2 +args[1] = arg1 +args[2] = arg2 +thread 'main' panicked at 'called `Option::unwrap()` on a `None` value', 1.rs:6:29 +note: Run with `RUST_BACKTRACE=1` for a backtrace. + +$ ./1 arg1 arg2 arg3 +args[1] = arg1 +args[2] = arg2 +args[3] = arg3 +``` + +Let's say you are unhappy with these built-in functions, e.g. you want to provide an alternative +error message: + +```rust +// 2.rs +use std::env::args; +pub fn my_unwrap(input: Option) -> T { + match input { + Some(t) => t, + None => panic!("nothing to see here, move along"), + } +} +fn main() { + println!("args[1] = {}", my_unwrap(args().nth(1))); + println!("args[2] = {}", my_unwrap(args().nth(2))); + println!("args[3] = {}", my_unwrap(args().nth(3))); +} +``` + +This trivial implementation, however, will only report the panic happens inside `my_unwrap`. This is +pretty useless, since it is the caller of `my_unwrap` that have made the wrong assumption! + +```text +$ ./2 +thread 'main' panicked at 'nothing to see here, move along', 2.rs:5:16 +note: Run with `RUST_BACKTRACE=1` for a backtrace. + +$ ./2 arg1 +args[1] = arg1 +thread 'main' panicked at 'nothing to see here, move along', 2.rs:5:16 +note: Run with `RUST_BACKTRACE=1` for a backtrace. + +$ ./2 arg1 arg2 +args[1] = arg1 +args[2] = arg2 +thread 'main' panicked at 'nothing to see here, move along', 2.rs:5:16 +note: Run with `RUST_BACKTRACE=1` for a backtrace. + +$ ./2 arg1 arg2 arg3 +args[1] = arg1 +args[2] = arg2 +args[3] = arg3 +``` + +The trivial solution would be requiring user to provide `file!()`, `line!()` and `column!()`. A +slightly more ergonomic solution would be changing `my_unwrap` to a macro, thus these constants can +be automatically provided. + +```rust +macro_rules! my_unwrap { + ($input:expr) => { + match $input { + Some(t) => t, + None => panic!("nothing to see at {}:{}:{}, move along", file!(), line!(), column!()), + } + } +} +println!("args[1] = {}", my_unwrap!(args().nth(1))); +// ^ tell user to add an `!`. +... +``` + +But what if you have already published the `my_unwrap` crate that has thousands of users, and you +want to maintain API stability? Before Rust 1.XX, the builtin `unwrap()` has the same problem! + +## Semantic-inlining + +The reason a `my_unwrap!` macro works is because it copy-and-paste the entire content of its macro +definition everytime it is used. + +```rust +println!("args[1] = {}", my_unwrap!(args().nth(1))); +println!("args[2] = {}", my_unwrap!(args().nth(2))); +... + +// is equivalent to: + +println!("args[1] = {}", match args().nth(1) { + Some(t) => t, + None => panic!("nothing to see at {}:{}:{}, move along", file!(), line!(), column!()), +}); +println!("args[2] = {}", match args().nth(2) { + Some(t) => t, + None => panic!("nothing to see at {}:{}:{}, move along", file!(), line!(), column!()), +}); +... +``` + +What if we allow *normal functions* be able to copy-and-paste its content as well? This is called +**[inlining]**, which Rust and many other languages have supported this as an optimization step. If +the function is inlined, the compiler can also tell the inlined function the location it is inlined, +and thus able to report this to the user. + +Rust allows developers to use the `#[inline]` attribute to *hint* the optimizer that a function +should be inlined. However, if we want the precise caller location, a hint is not enough, it needs +to be a requirement. Therefore, the `#[inline(semantic)]` attribute is introduced. + +```rust +#![feature(inline_semantic)] + +#[inline(semantic)] // <-- new +pub fn my_unwrap(input: Option) -> T { + match input { + Some(t) => t, + None => panic!("nothing to see here, move along"), + } +} + +println!("args[1] = {}", my_unwrap(args().nth(1))); +println!("args[2] = {}", my_unwrap(args().nth(2))); + +// almost equivalent to: + +println!("args[1] = {}", match args().nth(1) { + Some(t) => t, + None => panic!("nothing to see here, move along"), +}); +println!("args[2] = {}", match args().nth(2) { + Some(t) => t, + None => panic!("nothing to see here, move along"), +}); +``` + +If you try this code above, you will find that the panic still occurs inside `my_unwrap`, not at the +caller's position. `#[inline(semantic)]` is still only a very-insistent inliner, so its behavior is +still like a normal function — the syntactic location of the `panic!` is still inside the `my_wrap`. +The Rust language provides a different way to obtain the caller location after the function is +inlined. + +## Caller location + +The core crate provides three magic constants `core::caller::{FILE, LINE, COLUMN}` which resolves to +the caller's location one the function is inlined. + +```rust +#![feature(inline_semantic, caller_location)] + +extern crate core; + +#[inline_semantic] +pub fn get_caller_loc() -> (&'static str, u32) { + (core::caller::FILE, core::caller::LINE) +} + +assert_eq!(get_caller_loc(), (file!(), line!())); +assert_eq!(get_caller_loc(), (file!(), line!())); +assert_eq!(get_caller_loc(), (file!(), line!())); +``` + +There is also a `caller_location!()` macro to return all three information as a single tuple. + +```rust +// 3.rs +#![feature(inline_semantic)] +use std::env::args; +#[inline(semantic)] // <-- +pub fn my_unwrap(input: Option) -> T { + match input { + Some(t) => t, + None => panic!("nothing to see at {:?}, move along", caller_location!()), // <-- + } +} +fn main() { + println!("args[1] = {}", my_unwrap(args().nth(1))); + println!("args[2] = {}", my_unwrap(args().nth(2))); + println!("args[3] = {}", my_unwrap(args().nth(3))); +} +``` + +Now we do get the caller locations, but the location in `my_unwrap` is also shown, which looks very +strange. + +```text +$ ./3 +thread 'main' panicked at 'nothing to see at ("3.rs", 12, 29), move along', 3.rs:8:16 +note: Run with `RUST_BACKTRACE=1` for a backtrace. + +$ ./3 arg1 +args[1] = arg1 +thread 'main' panicked at 'nothing to see at ("3.rs", 13, 29), move along', 3.rs:8:16 +note: Run with `RUST_BACKTRACE=1` for a backtrace. + +$ ./3 arg1 arg2 +args[1] = arg1 +args[2] = arg2 +thread 'main' panicked at 'nothing to see at ("3.rs", 14, 29), move along', 3.rs:8:16 +note: Run with `RUST_BACKTRACE=1` for a backtrace. + +$ ./3 arg1 arg2 arg3 +args[1] = arg1 +args[2] = arg2 +args[3] = arg3 +``` + +There is a more specialized macro, `panic_at_source_location!`, which allows you to customize where +the panic occurs. Now we have truly reproduced how the built-in `unwrap()` is implemented. + +```rust +// 4.rs +#![feature(inline_semantic)] +use std::env::args; +#[inline(semantic)] +pub fn my_unwrap(input: Option) -> T { + match input { + Some(t) => t, + None => panic_at_source_location!(caller_location!(), "nothing to see here, move along"), // <-- + } +} +fn main() { + println!("args[1] = {}", my_unwrap(args().nth(1))); + println!("args[2] = {}", my_unwrap(args().nth(2))); + println!("args[3] = {}", my_unwrap(args().nth(3))); +} +``` + +```text +$ ./4 +thread 'main' panicked at 'nothing to see here, move along', 4.rs:12:29 +note: Run with `RUST_BACKTRACE=1` for a backtrace. + +$ ./4 arg1 +args[1] = arg1 +thread 'main' panicked at 'nothing to see here, move along', 4.rs:13:29 +note: Run with `RUST_BACKTRACE=1` for a backtrace. + +$ ./4 arg1 arg2 +args[1] = arg1 +args[2] = arg2 +thread 'main' panicked at 'nothing to see here, move along', 4.rs:14:29 +note: Run with `RUST_BACKTRACE=1` for a backtrace. + +$ ./4 arg1 arg2 arg3 +args[1] = arg1 +args[2] = arg2 +args[3] = arg3 +``` + +## Why do we use semantic-inlining + +If you are learning Rust alongside other languages, you may wonder why Rust obtain the caller +information in such a strange way. There are two restrictions that forces us to adapt this solution: + +1. Programmatic access to the stack backtrace is often used in interpreted or runtime-heavy + languages like Python and Java. However, the stack backtrace is not suitable for systems + languages like Rust, because optimization often collapses multiple levels of function calls, and + in some embedded system the backtrace may even be unavailable. + +2. Rust does not (yet) support default function arguments or function overloading, because it badly + interferes with type inference. Therefore, solutions that uses default function arguments are + also ruled out. Default-function-arguments-based solutions are often used in languages that does + not perform inference higher than statement level, e.g. Swift and C#. + +# Reference-level explanation +[reference-level-explanation]: #reference-level-explanation + +## Survey of panicking standard functions + +Many standard functions may panic. These are divided into three categories, on whether they should +receive caller information despite the inlining cost associated with it. + +(The list of functions is not exhaustive. Only those with a "Panics" section in the documentation +are included.) + +1. **Must have.** These functions are designed to generate a panic, or used so often that indicating + a panic happening from them often gives no useful information. + + | Function | Panic condition | + |:---------|:----------------| + | `Option::expect` | self is None | + | `Option::unwrap` | self is None | + | `Result::expect_err` | self is Ok | + | `Result::expect` | self is Err | + | `Result::unwrap_err` | self is Ok | + | `Result::unwrap` | self is Err | + | `[T]::index_mut` | range out of bounds | + | `[T]::index` | range out of bounds | + | `BTreeMap::index` | key not found | + | `HashMap::index` | key not found | + | `str::index_mut` | range out of bounds or off char boundary | + | `str::index` | range out of bounds or off char boundary | + | `VecDeque::index_mut` | index out of bounds | + | `VecDeque::index` | index out of bounds | + +2. **Nice to have.** These functions are not commonly used, or the panicking condition is pretty + rare. Often the panic information contains enough clue to fix the error without backtrace. + Inlining them would bloat the binary size without much benefit. + +
List of category 2 functions + + | Function | Panic condition | + |:---------|:----------------| + | `std::env::args` | non UTF-8 values | + | `std::env::set_var` | invalid key or value | + | `std::env::vars` | non UTF-8 values | + | `std::thread::spawn` | OS failed to create the thread | + | `[T]::clone_from_slice` | slice lengths differ | + | `[T]::copy_from_slice` | slice lengths differ | + | `[T]::rotate` | index out of bounds | + | `[T]::split_at_mut` | index out of bounds | + | `[T]::swap` | index out of bounds + | `BinaryHeap::reserve_exact` | capacity overflow | + | `BinaryHeap::reserve` | capacity overflow | + | `Duration::new` | arithmetic overflow | + | `HashMap::reserve` | capacity overflow | + | `HashSet::reserve` | capacity overflow | + | `i32::overflowing_div` | zero divisor | + | `i32::overflowing_rem` | zero divisor | + | `i32::wrapping_div` | zero divisor | + | `i32::wrapping_rem` | zero divisor | + | `Instance::duration_since` | time travel | + | `Instance::elapsed` | time travel | + | `Iterator::count` | extremely long iterator | + | `Iterator::enumerate` | extremely long iterator | + | `Iterator::position` | extremely long iterator | + | `Iterator::product` | arithmetic overflow in debug build | + | `Iterator::sum` | arithmetic overflow in debug build | + | `LinkedList::split_off` | index out of bounds | + | `LocalKey::with` | TLS has been destroyed | + | `RawVec::double_in_place` | capacity overflow | + | `RawVec::double` | capacity overflow | + | `RawVec::reserve_exact` | capacity overflow | + | `RawVec::reserve_in_place` | capacity overflow | + | `RawVec::reserve` | capacity overflow | + | `RawVec::shrink_to_fit` | given amount is larger than current capacity | + | `RawVec::with_capacity` | capacity overflow | + | `RefCell::borrow_mut` | a borrow or mutable borrow is active | + | `RefCell::borrow` | a mutable borrow is active | + | `str::split_at_mut` | range out of bounds or off char boundary | + | `str::split_at` | range out of bounds or off char boundary | + | `String::drain` | range out of bounds or off char boundary | + | `String::insert_str` | index out of bounds or off char boundary | + | `String::insert` | index out of bounds or off char boundary | + | `String::remove` | index out of bounds or off char boundary | + | `String::reserve_exact` | capacity overflow | + | `String::reserve` | capacity overflow | + | `String::splice` | range out of bounds or off char boundary | + | `String::split_off` | index out of bounds or off char boundary | + | `String::truncate` | off char boundary | + | `Vec::append` | capacity overflow | + | `Vec::drain` | range out of bounds | + | `Vec::insert` | index out of bounds | + | `Vec::push` | capacity overflow | + | `Vec::remove` | index out of bounds | + | `Vec::reserve_exact` | capacity overflow | + | `Vec::reserve` | capacity overflow | + | `Vec::splice` | range out of bounds | + | `Vec::split_off` | index out of bounds | + | `Vec::swap_remove` | index out of bounds | + | `VecDeque::append` | capacity overflow | + | `VecDeque::drain` | range out of bounds | + | `VecDeque::insert` | index out of bounds | + | `VecDeque::reserve_exact` | capacity overflow | + | `VecDeque::reserve` | capacity overflow | + | `VecDeque::split_off` | index out of bounds | + | `VecDeque::swap` | index out of bounds | + | `VecDeque::with_capacity` | capacity overflow | + +
+ +3. **Not needed.** Panics from these indicate silly programmer error and the panic itself have + enough clue to let programmers figure out where did the error comes from. + +
List of category 3 functions + + | Function | Panic condition | + |:---------|:----------------| + | `std::atomic::fence` | using invalid atomic ordering | + | `std::char::from_digit` | radix is outside `2 ..= 36` | + | `std::env::remove_var` | invalid key | + | `std::format!` | the `fmt` method returns Err | + | `std::panicking::set_hook` | called in panicking thread | + | `std::panicking::take_hook` | called in panicking thread | + | `[T]::chunks_mut` | chunk size == 0 | + | `[T]::chunks` | chunk size == 0 | + | `[T]::windows` | window size == 0 | + | `AtomicUsize::compare_exchange_weak` | using invalid atomic ordering | + | `AtomicUsize::compare_exchange` | using invalid atomic ordering | + | `AtomicUsize::load` | using invalid atomic ordering | + | `AtomicUsize::store` | using invalid atomic ordering | + | `BorrowRef::clone` | borrow counter overflows, see [issue 33880] | + | `BTreeMap::range_mut` | end of range before start of range | + | `BTreeMap::range` | end of range before start of range | + | `char::encode_utf16` | dst buffer smaller than `[u16; 2]` | + | `char::encode_utf8` | dst buffer smaller than `[u8; 4]` | + | `char::is_digit` | radix is outside `2 ..= 36` | + | `char::to_digit` | radix is outside `2 ..= 36` | + | `compiler_fence` | using invalid atomic ordering | + | `Condvar::wait` | waiting on multiple different mutexes | + | `Display::to_string` | the `fmt` method returns Err | + | `ExactSizeIterator::len` | size_hint implemented incorrectly | + | `i32::from_str_radix` | radix is outside `2 ..= 36` | + | `Iterator::step_by` | step == 0 | + +
+ +This RFC only advocates adding the `#[inline(semantic)]` attribute to the `unwrap` and `expect` +functions. The `index` and `index_mut` functions should also have it if possible, but currently +blocked by lack of post-monomorphized MIR pass. + +## Semantic-inlining MIR pass + +A new inline level `#[inline(semantic)]` is introduced. This is like `#[inline(always)]`, but is +guaranteed that inlining happens before LLVM kicks in. With this guarantee, Rust can maintain a +deterministic set of situation where an `#[inline(semantic)]` function can know the information of +its caller or compile-time call stack. + +`#[inline(semantic)]` functions must not be recursive, as direct inlining is guaranteed as much as +possible, i.e. the following would be a compile-time error: + +```rust +#[inline(semantic)] +fn factorial(x: u64) -> u64 { + if x <= 1 { + 1 + } else { + x * factorial(x - 1) //~ ERROR: `#[inline(semantic)]` function cannot call itself. + } +} +``` + +Currently semantic-inlining is performed as a MIR pass. Since MIR pass are run *before* +monomorphization, `#[inline(semantic)]` currently **cannot** be used on trait items: + +```rust +trait Trait { + fn unwrap(&self); +} +impl Trait for u64 { + #[inline(semantic)] //~ ERROR: `#[inline(semantic)]` is not supported for trait items yet. + fn unwrap(&self) {} +} +``` + +To support trait items, the semantic-inlining pass must be run as post-monomorphized MIR pass (which +does not exist yet), or a custom LLVM inlining pass which can extract the caller's source location. +This prevents the `Index` trait from having `#[inline(semantic)]` yet. + +We cannot hack the impl resolution method into pre-monomorphization MIR pass because of deeply +nested functions like + +```rust +f1::(); + +fn f1() { f2::(); } +fn f2() { f3::(); } +fn f3() { f4::(); } +... +fn f100() { + T::unwrap(); // No one will know T is u32 before monomophization. +} +``` + +`rustc` already has a MIR inlining pass, which is disabled by default. The semantic-inlining pass +should be run before or concurrently with the MIR inlining pass. The semantic-inlining pass is +currently as a separate pass from the normal MIR inlining pass, so that they can become +post-monomorphized MIR passes independently. If the semantic-inlining pass is run after the normal +MIR inlining pass, the normal MIR inliner must treat `#[inline(semantic)]` as `#[inline(never)]`. + +`#[inline(semantic)]` functions may stay uninlined. This happens when one takes the address of such +functions. This must be allowed due to backward compatibility. (If post-monomorphized MIR pass +exists, methods via trait objects would be another case of calling `#[inline(semantic)]` functions +without it being inline.) + +```rust +let f: fn(Option) -> u32 = Option::unwrap; +let g: fn(Option) -> u32 = Option::unwrap; +assert!(f == g); // This must remain `true`. +f(None); +g(None); // The effect of these two calls must be the same. +``` + +## Caller location lang-item + +Once a function is semantically-inlined, the content of the function will know the span about the +original call site, which can be used to derive the filename, line, column, mod-path, etc. We would +define the following lang items to retrieve these information: + +```rust +pub mod caller { + #[lang = "caller_file"] + pub const FILE: &str = ""; + #[lang = "caller_line"] + pub const LINE: u32 = 0; + #[lang = "caller_file"] + pub const COLUMN: u32 = 0; +} +``` + +After the semantic-inlining pass, these constants will be replaced by the corresponding call site +information: + +```rust +#[inline(semantic)] +fn line() -> u32 { core::caller::LINE } +assert_eq!(3, line()); +assert_eq!(4, line()); +assert_eq!(5, line()); +``` + +These constants only make sense when used directly inside an `#[inline(semantic)]` function. This +excludes closures and const/static items inside the function, since they are represented as separate +MIR items. + +```rust +use core::caller::LINE; + +const L: u32 = LINE; //~ ERROR: Cannot read caller location outside of `#[inline(semantic)]` function + +#[inline(always)] +fn not_inline_semantic() -> u32 { + LINE //~ ERROR: Cannot read caller location outside of `#[inline(semantic)]` function +} + +#[inline(semantic)] +fn inline_semantic() -> u32 { + const L: u32 = LINE; //~ ERROR: Cannot read caller location outside of `#[inline(semantic)]` function + let closure = || LINE; //~ ERROR: Cannot read caller location outside of `#[inline(semantic)]` function + LINE // (only this one is ok.) +} +``` + +Const-folding pass before the semantic-inliner must recognize these special constants and *not* fold +them. + +These values are provided as `const` lang items to make MIR transformation as simple as possible. +They could be intrinsic functions, but it would mean changing the `Call` terminator in MIR into an +`Assign` statement followed by a `Goto`, which is a bit complex. + +## Source-location-forwarding methods + +`unwrap()`/`expect()` will become `#[inline(semantic)]`. However if they are called deep inside from +the standard library, the source location would still be useless. Therefore, this RFC also propose +the following new functions to allow caller information be forwarded: + +```rust +pub macro caller_location() { + &($crate::caller::FILE, $crate::caller::LINE, $crate::caller::COLUMN) +} + +pub macro panic_at_source_location { + ($location:expr) => { ... }, + ($location:expr, $fmt:expr) => { ... }, + ($location:expr, $fmt:expr, $($args:tt)*) => { ... }, +} + +impl Option { + pub fn unwrap_at_source_location(self, location: &(&'static str, u32, u32)) -> T; + pub fn expect_at_source_location(self, location: &(&'static str, u32, u32), msg: &str) -> T; +} + +impl Result { + pub fn unwrap_at_source_location(self, location: &(&'static str, u32, u32)) -> T; + pub fn expect_at_source_location(self, location: &(&'static str, u32, u32), msg: &str) -> T; + pub fn unwrap_err_at_source_location(self, location: &(&'static str, u32, u32)) -> E; + pub fn expect_err_at_source_location(self, location: &(&'static str, u32, u32), msg: &str) -> E; +} +``` + +Example: + +```rust +impl<'a, K, Q, V> Index<&'a Q> for BTreeMap +where + K: Ord + Borrow, + Q: Ord + ?Sized, +{ + type Output = V; + + #[inline(semantic)] + fn index(&self, key: &Q) -> &V { + self.get(key).expect_at_source_location(caller_location!(), "no entry found for key") + } +} +``` + +Long names are given since they are considered advanced functions and should not be used normally. + +## Runtime-free backtrace for `?` operator + +The standard `Try` implementations could participate in specialization, so external crates like +`error_chain` could provide a specialized impl that prepends the caller location into the callstack +everytime `?` is used. + +```rust +// libcore: +impl Try for Result { + type Ok = T; + type Error = E; + + fn into_result(self) -> Self { self } + fn from_ok(v: Self::Ok) -> Self { Ok(v) } + default fn from_error(e: Self::Error) -> Self { Err(e) } +// ^~~~~~~ +} + +// my_crate: +impl Try for Result { + #[inline(semantic)] + fn from_error(mut e: Self::Error) -> Self { + e.call_stack.push(*caller_location!()); + Err(e) + } +} +``` + +# Drawbacks +[drawbacks]: #drawbacks + +## Code bloat + +Previously, all call to `unwrap()` and `expect()` will refer to the same location. Therefore, the panicking branch will +only need to reuse a pointer to a single global tuple. + +After this RFC is implemented, the panicking branch will need to allocate space to store the varying caller location, +so the number of instructions per `unwrap()`/`expect()` will increase. + +Also the optimizer will lose the opportunity to consolidate all jumps to the panicking branch, e.g. with the code +`a.unwrap() + b.unwrap()`, before this RFC LLVM will optimize it to something like + +```rust +if (a.tag != SOME || b.tag != SOME) { + panic(&("called `Option::unwrap()` on a `None` value", "src/libcore/option.rs", 335, 20)); +} +a.value_of_some + b.value_of_some +``` + +After this RFC, LLVM can only lower this to + +```rust +if (a.tag != SOME) { + panic(&("called `Option::unwrap()` on a `None` value", "1.rs", 1, 1)); +} +if (b.tag != SOME) { + panic(&("called `Option::unwrap()` on a `None` value", "1.rs", 1, 14)); +} +a.value_of_some + b.value_of_some +``` + +## Does not support dynamic call + +`#[inline(semantic)]` relies on compile-time inlining to resolve the caller information. This makes it impossible to +work with dynamic function calls (i.e. via a function pointer or trait object v-table). + +## Narrow solution scope + +`#[inline(semantic)]` is only useful in solving the "get caller location" problem. Introducing an entirely new feature +just for this problem seems wasteful. + +[Default function arguments](#default-function-arguments) is another possible solution for this problem but with much +wider application. + +# Rationale and alternatives +[rationale-and-alternatives]: #rationale-and-alternatives + +## Rationale + +This RFC tries to abide by the following restrictions: + +1. **Precise caller location**. Standard library functions which commonly panics will report the + source location at where the user call them. The source location should never point inside the + standard library. Examples of these functions include `Option::unwrap` and `HashMap::index`. + +2. **Source compatibility**. User should never need to modify existing source code to benefit from + the improved precision. + +3. **Debug-info independence**. The precise caller location can still be reported even after + stripping of debug information, which is very common on released software. + +4. **Interface independence**. The implementation of a trait should be able to decide whether to + accept a the caller information. It shouldn't require the trait itself to enforce it, i.e. it + should not affect the signature of the function. This is an extension of rule 2, since the + `Index` trait is involved in `HashMap::index`. The stability of `Index` must be upheld, e.g. it + should remain object-safe, and existing implementions should not be forced to accept the caller + location. + +Restriction 4 "interface independence" is currently not implemented due to lack of +post-monomorphized MIR pass, but `#[inline(semantic)]` itself as a language feature follows this +restriction. + +## Alternatives + +### 🚲 Name of everything 🚲 + +* Is `#[inline(semantic)]` an accurate description? `#[inline(mir)]`? `#[inline(force)]`? + `#[inline(with_caller_location)]`? +* Use a different attribute, instead of piggybacking on `#[inline]`? +* `***_at_source_location` is too long? +* Should we move `std::panic::Location` into `core`, and not use a 3-tuple to represent the + location? + +### Avoid introducing new public items + +* We could change the meaning of `file!()`, `line!()` and `column!()` so they are only converted to + real constants after semantic-inlining (a MIR pass) instead of early during macro expansion (an + AST pass). Inside `#[inline(semantic)]` functions, these macros behave as this RFC's + `core::caller::{FILE, LINE, COLUMN}`. This way, we don't need to introduce + `panic_at_source_location!()`. The drawback is losing explicit control of whether we want to + report the actual location or the caller's location. + +* We could make `#[inline(always)]` mean the same as `#[inline(semantic)]` to avoid introducing a + new inline level. + +### Inline MIR + +Introduced as [alternative to RFC 1669][inline_mir], instead of the `caller::{FILE, LINE, COLUMN}` +lang-items, we could provide a full-fledged inline MIR macro `mir!` similar to the inline assembler: + +```rust +#[inline(semantic)] +fn unwrap(self) -> T { + let file: &'static str; + let line: u32; + let column: u32; + unsafe { + mir! { + StorageLive(file); + file = const $CallerFile; + StorageLive(line); + line = const $CallerLine; + StorageLive(column); + column = const $CallerColumn; + goto -> 'c; + } + } + 'c: { + panic!("{}:{}:{}: oh no", file, line, column); + } +} +``` + +The problem of `mir!` in this context is trying to kill a fly with a sledgehammer. `mir!` is a very +generic mechanism, which requiring stabilizing the MIR syntax and considering the interaction with +the surrounding code. Besides, `#[inline(semantic)]` itself still exists and the magic constants +`$CallerFile` etc are still magic. + +### Default function arguments + +Assume this is solved by implementing [RFC issue 323]. + +```rust +fn unwrap(file: &'static str = file!(), line: u32 = line!(), column: u32 = column!()) -> T { + panic!("{}:{}:{}: oh no", file, line, column); +} +``` + +Default arguments was a serious contender to the better-caller-location problem as this is usually +how other languages solve it. + +| Language | Syntax | +|:---------|:-------| +| [Swift] | `func unwrap(file: String = #file, line: Int = #line) -> T` | +| [D] | `T unwrap(string file = __FILE__, size_t line = __LINE__)` | +| [C#] 5+ | `T Unwrap([CallerFilePath] string file = "", [CallerLineNumber] int line = 0)` | +| [Haskell] with GHC | `unwrap :: (?callstack :: CallStack) => Maybe t -> t` | +| [C++] with GCC 4.8+ | `T unwrap(const char* file = __builtin_FILE(), int line = __builtin_LINE())` | + +A naive solution will violate restriction 4 "interface independence": adding the `file, line, column` +arguments to `index()` will change its signature. This can be resolved if this is taken into +account. + +```rust +impl<'a, K, Q, V> Index<&'a Q> for BTreeMap +where + K: Ord + Borrow, + Q: Ord + ?Sized, +{ + type Output = V; + + // This should satisfy the trait even if the trait specifies + // `fn index(&self, idx: Idx) -> &Self::Output` + #[inline] + fn index(&self, key: &Q, file: &'static str = file!(), line: u32 = line!(), column: u32 = column!()) -> &V { + self.get(key).expect("no entry found for key", file, line, column) + } +} +``` + +This can be resolved if the future default argument proposal takes this into account. But again, +this feature itself is going to be large and controversial. + +## Non-viable alternatives + +Many alternatives have been proposed before but failed to satisfy the restrictions laid out in the +[Rationale](#rationale) subsection, thus should *not* be considered viable alternatives within this +RFC. + +### Macros + +The `unwrap!()` macro introduced in [RFC 1669] allows the user to write `unwrap!(x)` instead of +`x.unwrap()`. + +Similar solution is introducing a `loc!()` macro that expands to +`concat!(file!(), ":", line!(), ":", column!())`, so user writes `x.expect(loc!())` instead of +`x.unwrap()`. + +There is even the [`better_unwrap` crate](https://github.com/abonander/better_unwraps) that +automatically rewrites all `unwrap()` and `expect()` inside a module to provide the caller location +through a procedural attribute. + +All of these are non-viable, since they require the user to actively change their source code, thus +violating restriction 2 "source compatibility", ~~unless we are willing to drop the `!` from +macros~~. + +(Also, all pre-typeck rewrites are prone to false-positive affecting unrelated types that have an +`unwrap()` method. Post-typeck rewrites are no different from this RFC.) + +### Backtrace + +When given debug information (DWARF section on Linux, `*.pdb` file on Windows, `*.dSYM` folder on +macOS), the program is able to obtain the source code location for each address. This solution is +often used in runtime-heavy languages like Python, Java and [Go]. + +For Rust, however: + +* The debug information is usually not provided in release mode. +* Normal inlining will make the caller information inaccurate. +* Issues [24346] (*Backtrace does not include file and line number on non-Linux platforms*) and + [42295] (*Slow backtrace on panic*) and are still not entirely fixed. +* Backtrace may be disabled in resource-constrained environment. + +These drawbacks are the main reason why restriction 3 "debug-info independence" is added to the +motivation. + +### `SourceContext` generic parameter + +Introduced as an [alternative in RFC 1669][source_context], inspired by GHC's implicit parameter: + +```rust +fn unwrap(self) -> T { + panic!("{}: oh no", C::default()); +} +``` + +The `CallerSourceContext` lang item will instruct the compiler to create a new type implementing +`SourceContext` whenever `unwrap()` is instantiated. + +Unfortunately this violates restriction 4 "interface independence". This solution cannot apply to +`HashMap::index` as this will require a change of the method signature of `index()` which has been +stabilized. Methods applying this solution will also lose object-safety. + +# Unresolved questions +[unresolved]: #unresolved-questions + +* Semantic-inlining should run after monomorphization, not before. + +[RFC 1669]: https://github.com/rust-lang/rfcs/pull/1669 +[24346]: https://github.com/rust-lang/rust/issues/24346 +[42295]: https://github.com/rust-lang/rust/issues/42295 +[issue 33880]: https://github.com/rust-lang/rust/issues/33880 +[RFC issue 1744]: https://github.com/rust-lang/rfcs/issues/1744 +[RFC issue 323]: https://github.com/rust-lang/rfcs/issues/323 + +[a]: https://internals.rust-lang.org/t/rfrfc-better-option-result-error-messages/2904 +[b]: https://internals.rust-lang.org/t/line-info-for-unwrap-expect/3753 +[c]: https://internals.rust-lang.org/t/better-panic-location-reporting-for-unwrap-and-friends/5042 + +[source_context]: https://github.com/rust-lang/rfcs/pull/1669#issuecomment-231896669 +[inline_mir]: https://github.com/rust-lang/rfcs/pull/1669#issuecomment-231031865 +[Swift]: https://developer.apple.com/swift/blog/?id=15 +[D]: https://dlang.org/spec/traits.html#specialkeywords +[C#]: https://docs.microsoft.com/en-us/dotnet/csharp/programming-guide/concepts/caller-information +[Haskell]: https://ghc.haskell.org/trac/ghc/wiki/ExplicitCallStack/ImplicitLocations +[Go]: https://golang.org/pkg/runtime/#Caller +[C++]: https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html#index-_005f_005fbuiltin_005fLINE + +[inlining]: https://en.wikipedia.org/wiki/Inline_expansion \ No newline at end of file From c12648928c0a1457c41d6a47a80dcac20d24762b Mon Sep 17 00:00:00 2001 From: kennytm Date: Wed, 2 Aug 2017 18:37:11 +0800 Subject: [PATCH 02/12] Addressed some minor changes. --- text/0000-inline-semantic.md | 13 +++++++++---- 1 file changed, 9 insertions(+), 4 deletions(-) diff --git a/text/0000-inline-semantic.md b/text/0000-inline-semantic.md index d39d1d65c73..f4d1c1b4250 100644 --- a/text/0000-inline-semantic.md +++ b/text/0000-inline-semantic.md @@ -30,7 +30,7 @@ let n: Option = None; let m = n.unwrap(); ``` - + - [Summary](#summary) - [Motivation](#motivation) @@ -52,7 +52,7 @@ let m = n.unwrap(); - [Rationale and alternatives](#rationale-and-alternatives) - [Rationale](#rationale) - [Alternatives](#alternatives) - - [🚲 Name of everything 🚲](#🚲-name-of-everything-🚲) + - [🚲 Name of everything 🚲](#-name-of-everything-) - [Avoid introducing new public items](#avoid-introducing-new-public-items) - [Inline MIR](#inline-mir) - [Default function arguments](#default-function-arguments) @@ -364,7 +364,7 @@ args[3] = arg3 ## Why do we use semantic-inlining -If you are learning Rust alongside other languages, you may wonder why Rust obtain the caller +If you are learning Rust alongside other languages, you may wonder why Rust obtains the caller information in such a strange way. There are two restrictions that forces us to adapt this solution: 1. Programmatic access to the stack backtrace is often used in interpreted or runtime-heavy @@ -813,7 +813,8 @@ restriction. * Use a different attribute, instead of piggybacking on `#[inline]`? * `***_at_source_location` is too long? * Should we move `std::panic::Location` into `core`, and not use a 3-tuple to represent the - location? + location? Note that this is also advocated in [RFC 2070]. +* Use intrinsics or static instead of consts for `core::caller::{FILE, LINE, COLUMN}`? ### Avoid introducing new public items @@ -824,6 +825,9 @@ restriction. `panic_at_source_location!()`. The drawback is losing explicit control of whether we want to report the actual location or the caller's location. +* Same as above, but `file!()` etc are converted to a special literal kind, a kind that only the + compiler can create. + * We could make `#[inline(always)]` mean the same as `#[inline(semantic)]` to avoid introducing a new inline level. @@ -976,6 +980,7 @@ stabilized. Methods applying this solution will also lose object-safety. [issue 33880]: https://github.com/rust-lang/rust/issues/33880 [RFC issue 1744]: https://github.com/rust-lang/rfcs/issues/1744 [RFC issue 323]: https://github.com/rust-lang/rfcs/issues/323 +[RFC 2070]: https://github.com/rust-lang/rfcs/pull/2070 [a]: https://internals.rust-lang.org/t/rfrfc-better-option-result-error-messages/2904 [b]: https://internals.rust-lang.org/t/line-info-for-unwrap-expect/3753 From a4828f62f7651d85a535b9c0b03c7ecf473d9590 Mon Sep 17 00:00:00 2001 From: kennytm Date: Tue, 8 Aug 2017 08:42:31 +0800 Subject: [PATCH 03/12] Changed to how the attribute is transformed. --- text/0000-inline-semantic.md | 595 +++++++++++++++++------------------ 1 file changed, 293 insertions(+), 302 deletions(-) diff --git a/text/0000-inline-semantic.md b/text/0000-inline-semantic.md index f4d1c1b4250..59cf2f74655 100644 --- a/text/0000-inline-semantic.md +++ b/text/0000-inline-semantic.md @@ -1,4 +1,4 @@ -- Feature Name: `inline_semantic`, `caller_location` +- Feature Name: `implicit_caller_location` - Start Date: 2017-07-31 - RFC PR: (leave this empty) - Rust Issue: (leave this empty) @@ -11,19 +11,20 @@ Enable accurate caller location reporting during panic in `{Option, Result}::{unwrap, expect}` with the following changes: -1. Support the `#[inline(semantic)]` function attribute, which guarantees a function is inlined - before reaching LLVM. -2. Adds lang-item consts which retrieves the caller's source location. +1. Support the `#[implicit_caller_location]` function attribute, which guarantees a function has + access to the caller information. +2. Adds an intrinsic function `caller_location()` (safe wrapper: `Location::caller()`) to retrieve + the caller's source location. Example: ```rust -#![feature(inline_semantic, caller_location)] -use core::caller; +#![feature(implicit_caller_location)] +use std::panic::Location; -#[inline(semantic)] +#[implicit_caller_location] fn unwrap(self) -> T { - panic!("{}:{}:{}: oh no", caller::FILE, caller::LINE, caller::COLUMN); + panic!("{}: oh no", Location::caller()); } let n: Option = None; @@ -36,26 +37,29 @@ let m = n.unwrap(); - [Motivation](#motivation) - [Guide-level explanation](#guide-level-explanation) - [Let's reimplement `unwrap()`](#lets-reimplement-unwrap) - - [Semantic-inlining](#semantic-inlining) - - [Caller location](#caller-location) - - [Why do we use semantic-inlining](#why-do-we-use-semantic-inlining) + - [Adding caller location implicitly](#adding-caller-location-implicitly) + - [Location type](#location-type) + - [Why do we use implicit caller location](#why-do-we-use-implicit-caller-location) - [Reference-level explanation](#reference-level-explanation) - [Survey of panicking standard functions](#survey-of-panicking-standard-functions) - - [Semantic-inlining MIR pass](#semantic-inlining-mir-pass) - - [Caller location lang-item](#caller-location-lang-item) - - [Source-location-forwarding methods](#source-location-forwarding-methods) + - [Procedural attribute macro](#procedural-attribute-macro) + - [Redirection (MIR inlining)](#redirection-mir-inlining) + - [Standard libraries](#standard-libraries) - [Runtime-free backtrace for `?` operator](#runtime-free-backtrace-for--operator) + - [Location detail control](#location-detail-control) - [Drawbacks](#drawbacks) - [Code bloat](#code-bloat) - - [Does not support dynamic call](#does-not-support-dynamic-call) - [Narrow solution scope](#narrow-solution-scope) + - [Confusing scoping rule](#confusing-scoping-rule) - [Rationale and alternatives](#rationale-and-alternatives) - [Rationale](#rationale) - [Alternatives](#alternatives) - [🚲 Name of everything 🚲](#-name-of-everything-) - - [Avoid introducing new public items](#avoid-introducing-new-public-items) + - [Using an ABI instead of an attribute](#using-an-abi-instead-of-an-attribute) + - [Repurposing `file!()`, `line!()`, `column!()`](#repurposing-file-line-column) - [Inline MIR](#inline-mir) - [Default function arguments](#default-function-arguments) + - [Semantic inlining](#semantic-inlining) - [Non-viable alternatives](#non-viable-alternatives) - [Macros](#macros) - [Backtrace](#backtrace) @@ -174,12 +178,16 @@ slightly more ergonomic solution would be changing `my_unwrap` to a macro, thus be automatically provided. ```rust +pub fn my_unwrap_at_source_location(input: Option, file: &str, line: u32, column: u32) -> T { + match input { + Some(t) => t, + None => panic!("nothing to see at {}:{}:{}, move along", file, line, column), + } +} + macro_rules! my_unwrap { ($input:expr) => { - match $input { - Some(t) => t, - None => panic!("nothing to see at {}:{}:{}, move along", file!(), line!(), column!()), - } + my_unwrap_at_source_location($input, file!(), line!(), column!()) } } println!("args[1] = {}", my_unwrap!(args().nth(1))); @@ -190,7 +198,7 @@ println!("args[1] = {}", my_unwrap!(args().nth(1))); But what if you have already published the `my_unwrap` crate that has thousands of users, and you want to maintain API stability? Before Rust 1.XX, the builtin `unwrap()` has the same problem! -## Semantic-inlining +## Adding caller location implicitly The reason a `my_unwrap!` macro works is because it copy-and-paste the entire content of its macro definition everytime it is used. @@ -202,89 +210,23 @@ println!("args[2] = {}", my_unwrap!(args().nth(2))); // is equivalent to: -println!("args[1] = {}", match args().nth(1) { - Some(t) => t, - None => panic!("nothing to see at {}:{}:{}, move along", file!(), line!(), column!()), -}); -println!("args[2] = {}", match args().nth(2) { - Some(t) => t, - None => panic!("nothing to see at {}:{}:{}, move along", file!(), line!(), column!()), -}); +println!("args[1] = {}", my_unwrap(args().nth(1), file!(), line!(), column!())); +println!("args[1] = {}", my_unwrap(args().nth(2), file!(), line!(), column!())); ... ``` -What if we allow *normal functions* be able to copy-and-paste its content as well? This is called -**[inlining]**, which Rust and many other languages have supported this as an optimization step. If -the function is inlined, the compiler can also tell the inlined function the location it is inlined, -and thus able to report this to the user. - -Rust allows developers to use the `#[inline]` attribute to *hint* the optimizer that a function -should be inlined. However, if we want the precise caller location, a hint is not enough, it needs -to be a requirement. Therefore, the `#[inline(semantic)]` attribute is introduced. - -```rust -#![feature(inline_semantic)] - -#[inline(semantic)] // <-- new -pub fn my_unwrap(input: Option) -> T { - match input { - Some(t) => t, - None => panic!("nothing to see here, move along"), - } -} - -println!("args[1] = {}", my_unwrap(args().nth(1))); -println!("args[2] = {}", my_unwrap(args().nth(2))); - -// almost equivalent to: - -println!("args[1] = {}", match args().nth(1) { - Some(t) => t, - None => panic!("nothing to see here, move along"), -}); -println!("args[2] = {}", match args().nth(2) { - Some(t) => t, - None => panic!("nothing to see here, move along"), -}); -``` - -If you try this code above, you will find that the panic still occurs inside `my_unwrap`, not at the -caller's position. `#[inline(semantic)]` is still only a very-insistent inliner, so its behavior is -still like a normal function — the syntactic location of the `panic!` is still inside the `my_wrap`. -The Rust language provides a different way to obtain the caller location after the function is -inlined. - -## Caller location - -The core crate provides three magic constants `core::caller::{FILE, LINE, COLUMN}` which resolves to -the caller's location one the function is inlined. - -```rust -#![feature(inline_semantic, caller_location)] - -extern crate core; - -#[inline_semantic] -pub fn get_caller_loc() -> (&'static str, u32) { - (core::caller::FILE, core::caller::LINE) -} - -assert_eq!(get_caller_loc(), (file!(), line!())); -assert_eq!(get_caller_loc(), (file!(), line!())); -assert_eq!(get_caller_loc(), (file!(), line!())); -``` - -There is also a `caller_location!()` macro to return all three information as a single tuple. +What if we can instruct the compiler to automatically fill in the file-line-column? Rust supported +this starting from 1.YY with the `#[implicit_caller_location]` attribute: ```rust // 3.rs -#![feature(inline_semantic)] +#![feature(implicit_caller_location)] use std::env::args; -#[inline(semantic)] // <-- +#[implicit_caller_location] // <-- Just add this! pub fn my_unwrap(input: Option) -> T { match input { Some(t) => t, - None => panic!("nothing to see at {:?}, move along", caller_location!()), // <-- + None => panic!("nothing to see here, move along"), } } fn main() { @@ -294,23 +236,22 @@ fn main() { } ``` -Now we do get the caller locations, but the location in `my_unwrap` is also shown, which looks very -strange. +Now we magically have truly reproduced how the built-in `unwrap()` is implemented. ```text $ ./3 -thread 'main' panicked at 'nothing to see at ("3.rs", 12, 29), move along', 3.rs:8:16 +thread 'main' panicked at 'nothing to see here, move along', 3.rs:12:29 note: Run with `RUST_BACKTRACE=1` for a backtrace. $ ./3 arg1 args[1] = arg1 -thread 'main' panicked at 'nothing to see at ("3.rs", 13, 29), move along', 3.rs:8:16 +thread 'main' panicked at 'nothing to see here, move along', 3.rs:13:29 note: Run with `RUST_BACKTRACE=1` for a backtrace. $ ./3 arg1 arg2 args[1] = arg1 args[2] = arg2 -thread 'main' panicked at 'nothing to see at ("3.rs", 14, 29), move along', 3.rs:8:16 +thread 'main' panicked at 'nothing to see here, move along', 3.rs:14:29 note: Run with `RUST_BACKTRACE=1` for a backtrace. $ ./3 arg1 arg2 arg3 @@ -319,63 +260,49 @@ args[2] = arg2 args[3] = arg3 ``` -There is a more specialized macro, `panic_at_source_location!`, which allows you to customize where -the panic occurs. Now we have truly reproduced how the built-in `unwrap()` is implemented. +What `#[implicit_caller_location]` does is an automated version of what you've seen last section. +The attribute will copy `my_unwrap` into a new function `my_unwrap_at_source_location`, which will +accept the caller's location as an additional argument. The attribute also instructs the compiler +that, "whenever you see someone calls `my_unwrap(x)`, replace it with +`my_unwrap_at_source_location(x, file!(), line!(), column!())` instead!". This allows us to maintain +the stability guarantee, while allowing user to get the new behavior with just one recompile. + +## Location type + +Let's enhance `my_unwrap`, say, we also log a message to the log file before panicking? We would +need to get the caller's location as a value. This is supported using the method +`Location::caller()`: ```rust -// 4.rs -#![feature(inline_semantic)] -use std::env::args; -#[inline(semantic)] +use std::panic::Location; +#[implicit_caller_location] pub fn my_unwrap(input: Option) -> T { match input { Some(t) => t, - None => panic_at_source_location!(caller_location!(), "nothing to see here, move along"), // <-- + None => { + let location = Location::caller(); + println!("unwrapping a None from {}:{}", location.file(), location.line()); + panic!("nothing to see here, move along") + } } } -fn main() { - println!("args[1] = {}", my_unwrap(args().nth(1))); - println!("args[2] = {}", my_unwrap(args().nth(2))); - println!("args[3] = {}", my_unwrap(args().nth(3))); -} ``` -```text -$ ./4 -thread 'main' panicked at 'nothing to see here, move along', 4.rs:12:29 -note: Run with `RUST_BACKTRACE=1` for a backtrace. - -$ ./4 arg1 -args[1] = arg1 -thread 'main' panicked at 'nothing to see here, move along', 4.rs:13:29 -note: Run with `RUST_BACKTRACE=1` for a backtrace. - -$ ./4 arg1 arg2 -args[1] = arg1 -args[2] = arg2 -thread 'main' panicked at 'nothing to see here, move along', 4.rs:14:29 -note: Run with `RUST_BACKTRACE=1` for a backtrace. - -$ ./4 arg1 arg2 arg3 -args[1] = arg1 -args[2] = arg2 -args[3] = arg3 -``` - -## Why do we use semantic-inlining +## Why do we use implicit caller location If you are learning Rust alongside other languages, you may wonder why Rust obtains the caller information in such a strange way. There are two restrictions that forces us to adapt this solution: 1. Programmatic access to the stack backtrace is often used in interpreted or runtime-heavy - languages like Python and Java. However, the stack backtrace is not suitable for systems - languages like Rust, because optimization often collapses multiple levels of function calls, and - in some embedded system the backtrace may even be unavailable. + languages like Python and Java. However, the stack backtrace is not suitable as the only + solution for systems languages like Rust, because optimization often collapses multiple levels + of function calls, and in some embedded system the backtrace may even be unavailable. 2. Rust does not (yet) support default function arguments or function overloading, because it badly - interferes with type inference. Therefore, solutions that uses default function arguments are - also ruled out. Default-function-arguments-based solutions are often used in languages that does - not perform inference higher than statement level, e.g. Swift and C#. + interferes with type inference. Therefore, solutions that uses default function arguments + alongside normal arguments are also ruled out. Default-function-arguments-based solutions are + often used in languages that does not perform inference higher than statement level, e.g. Swift + and C#. # Reference-level explanation [reference-level-explanation]: #reference-level-explanation @@ -520,47 +447,118 @@ are included.) -This RFC only advocates adding the `#[inline(semantic)]` attribute to the `unwrap` and `expect` -functions. The `index` and `index_mut` functions should also have it if possible, but currently -blocked by lack of post-monomorphized MIR pass. +This RFC only advocates adding the `#[implicit_caller_location]` attribute to the `unwrap` and +`expect` functions. The `index` and `index_mut` functions should also have it if possible, but +currently blocked by lack of post-monomorphized MIR pass. -## Semantic-inlining MIR pass +## Procedural attribute macro -A new inline level `#[inline(semantic)]` is introduced. This is like `#[inline(always)]`, but is -guaranteed that inlining happens before LLVM kicks in. With this guarantee, Rust can maintain a -deterministic set of situation where an `#[inline(semantic)]` function can know the information of -its caller or compile-time call stack. +The `#[implicit_caller_location]` attribute will modify a function at AST level and MIR level, +without touching type-checking (HIR level) or the low-level LLVM passes. -`#[inline(semantic)]` functions must not be recursive, as direct inlining is guaranteed as much as -possible, i.e. the following would be a compile-time error: +It will first wrap the body of the function in a closure, and then call it: ```rust -#[inline(semantic)] -fn factorial(x: u64) -> u64 { - if x <= 1 { - 1 - } else { - x * factorial(x - 1) //~ ERROR: `#[inline(semantic)]` function cannot call itself. - } +#[implicit_caller_location] +fn foo(x: A, y: B, z: C) -> R { + bar(x, y) +} + +// will become: + +#[rustc_implicit_caller_location] +#[inline] +fn foo(x: A, y: B, z: C) -> R { + std::ops::FnOnce::call_once(move |__location| { + bar(x, y) + }, (unsafe { std::intrinsics::caller_location() },)) } ``` -Currently semantic-inlining is performed as a MIR pass. Since MIR pass are run *before* -monomorphization, `#[inline(semantic)]` currently **cannot** be used on trait items: +The reason for this is to split the function into two: the function `foo` itself, and the closure +`foo::{{closure}}` in it. (Technically: it is the simplest way to create two `DefId` at HIR level as +far as I know.) + +The function signature of `foo` remains unchanged, so typechecking can proceed normally. The +attribute will be replaced by `#[rustc_implicit_caller_location]` to let the compiler internals +continue to treat it specially. `#[inline]` is added so external crate can see through `foo` to find +`foo::{{closure}}`. + +The closure `foo::{{closure}}` is a proper function, that the compiler can write calls directly to +`foo::{{closure}}`, skipping `foo`. Multiple calls to `foo` from different location can be done via +calling `foo::{{closure}}` directly, instead of copying the function body everytime which will bloat +the binary size. + +The intrinsic `caller_location()` is a placeholder which will be replaced by the actual caller +location when one calls `foo::{{closure}}` directly. + +## Redirection (MIR inlining) + +After all type-checking and validation is done, we can now inject the caller location. This is done +by redirecting all calls to `foo` to `foo::{{closure}}`. + +```rust +_r = call foo(_1, _2, _3) -> 'bb1; + +// will become: + +_c = call std::intrinsics::caller_location() -> 'bbt; +'bbt: +_r = call foo::{{closure}} (&[closure: x: _1, y: _2], _c) -> 'bb1; +``` + +Then, we will further replace the `caller_location()` intrinsic according to where `foo` is called. +If it is called from an ordinary function, it would be replaced by the callsite's location: + +```rust +// for ordinary functions, + +_c = call std::intrinsics::caller_location() -> 'bbt; + +// will become: + +_c = Location { file: file!(), line: line!(), column: column!() }; +goto -> 'bbt; +``` + +On the other hand, if it is called from an `#[implicit_caller_location]`'s closure e.g. +`foo::{{closure}}`, the intrinsic will be replaced by the closure argument `__location` instead, so +the caller location can propagate directly + +```rust +// for #[implicit_caller_location] closures, + +_c = call std::intrinsics::caller_location() -> 'bbt; + +// will become: + +_c = __location; +goto -> 'bbt; +``` + +These steps are very similar to inlining, and thus the first proof-of-concept is implemented +directly as a variant of MIR inliner (but a separate pass). This also means the redirection pass +currently suffers from all disadvantages of the MIR inliner, namely: + +* Locations will not be propagated into diverging functions (`fn() -> !`), since inlining them is + not supported yet. + +* MIR passes are run *before* monomorphization, meaning `#[implicit_caller_location]` currently + **cannot** be used on trait items: ```rust trait Trait { fn unwrap(&self); } impl Trait for u64 { - #[inline(semantic)] //~ ERROR: `#[inline(semantic)]` is not supported for trait items yet. + #[implicit_caller_location] //~ ERROR: `#[implicit_caller_location]` is not supported for trait items yet. fn unwrap(&self) {} } ``` -To support trait items, the semantic-inlining pass must be run as post-monomorphized MIR pass (which -does not exist yet), or a custom LLVM inlining pass which can extract the caller's source location. -This prevents the `Index` trait from having `#[inline(semantic)]` yet. +To support trait items, the redirection pass must be run as post-monomorphized MIR pass (which does +not exist yet), or a custom LLVM inlining pass which can extract the caller's source location. This +prevents the `Index` trait from having `#[implicit_caller_location]` yet. We cannot hack the impl resolution method into pre-monomorphization MIR pass because of deeply nested functions like @@ -577,16 +575,17 @@ fn f100() { } ``` -`rustc` already has a MIR inlining pass, which is disabled by default. The semantic-inlining pass -should be run before or concurrently with the MIR inlining pass. The semantic-inlining pass is -currently as a separate pass from the normal MIR inlining pass, so that they can become -post-monomorphized MIR passes independently. If the semantic-inlining pass is run after the normal -MIR inlining pass, the normal MIR inliner must treat `#[inline(semantic)]` as `#[inline(never)]`. +Currently the redirection pass always run before the inlining pass. If the redirection pass is run +after the normal MIR inlining pass, the normal MIR inliner must treat `#[implicit_caller_location]` +as `#[inline(never)]`. -`#[inline(semantic)]` functions may stay uninlined. This happens when one takes the address of such -functions. This must be allowed due to backward compatibility. (If post-monomorphized MIR pass -exists, methods via trait objects would be another case of calling `#[inline(semantic)]` functions -without it being inline.) +The closure `foo::{{closure}}` must never be inlined before the redirection pass. + +When `#[implicit_caller_location]` functions are called dynamically, no inlining will occur, and +thus it cannot take the location of the caller. Currently this will report where the function is +declared. Taking the address of such functions must be allowed due to backward compatibility. (If +post-monomorphized MIR pass exists, methods via trait objects would be another case of calling +`#[implicit_caller_location]` functions without caller location.) ```rust let f: fn(Option) -> u32 = Option::unwrap; @@ -596,111 +595,48 @@ f(None); g(None); // The effect of these two calls must be the same. ``` -## Caller location lang-item - -Once a function is semantically-inlined, the content of the function will know the span about the -original call site, which can be used to derive the filename, line, column, mod-path, etc. We would -define the following lang items to retrieve these information: +## Standard libraries -```rust -pub mod caller { - #[lang = "caller_file"] - pub const FILE: &str = ""; - #[lang = "caller_line"] - pub const LINE: u32 = 0; - #[lang = "caller_file"] - pub const COLUMN: u32 = 0; -} -``` - -After the semantic-inlining pass, these constants will be replaced by the corresponding call site -information: - -```rust -#[inline(semantic)] -fn line() -> u32 { core::caller::LINE } -assert_eq!(3, line()); -assert_eq!(4, line()); -assert_eq!(5, line()); -``` +The `caller_location()` intrinsic returns the `Location` structure which encodes the file, line and +column of the callsite. This shares the same structure as the existing type `std::panic::Location`. +Therefore, the type is promoted to a lang-item, and moved into `core::panicking::Location`. It is +re-exported in `libstd`. -These constants only make sense when used directly inside an `#[inline(semantic)]` function. This -excludes closures and const/static items inside the function, since they are represented as separate -MIR items. +Thanks to how `#[implicit_caller_location]` is implemented, we could provide a safe wrapper around +the `caller_location()` intrinsic: ```rust -use core::caller::LINE; - -const L: u32 = LINE; //~ ERROR: Cannot read caller location outside of `#[inline(semantic)]` function - -#[inline(always)] -fn not_inline_semantic() -> u32 { - LINE //~ ERROR: Cannot read caller location outside of `#[inline(semantic)]` function -} - -#[inline(semantic)] -fn inline_semantic() -> u32 { - const L: u32 = LINE; //~ ERROR: Cannot read caller location outside of `#[inline(semantic)]` function - let closure = || LINE; //~ ERROR: Cannot read caller location outside of `#[inline(semantic)]` function - LINE // (only this one is ok.) +impl<'a> Location<'a> { + #[implicit_caller_location] + pub fn caller() -> Location<'static> { + unsafe { + ::intrinsics::caller_location() + } + } } ``` -Const-folding pass before the semantic-inliner must recognize these special constants and *not* fold -them. - -These values are provided as `const` lang items to make MIR transformation as simple as possible. -They could be intrinsic functions, but it would mean changing the `Call` terminator in MIR into an -`Assign` statement followed by a `Goto`, which is a bit complex. - -## Source-location-forwarding methods - -`unwrap()`/`expect()` will become `#[inline(semantic)]`. However if they are called deep inside from -the standard library, the source location would still be useless. Therefore, this RFC also propose -the following new functions to allow caller information be forwarded: +The `panic!` macro is modified to use `Location::caller()` (or the intrinsic directly) so it can +report the caller location inside `#[implicit_caller_location]`. ```rust -pub macro caller_location() { - &($crate::caller::FILE, $crate::caller::LINE, $crate::caller::COLUMN) -} - -pub macro panic_at_source_location { - ($location:expr) => { ... }, - ($location:expr, $fmt:expr) => { ... }, - ($location:expr, $fmt:expr, $($args:tt)*) => { ... }, -} - -impl Option { - pub fn unwrap_at_source_location(self, location: &(&'static str, u32, u32)) -> T; - pub fn expect_at_source_location(self, location: &(&'static str, u32, u32), msg: &str) -> T; -} - -impl Result { - pub fn unwrap_at_source_location(self, location: &(&'static str, u32, u32)) -> T; - pub fn expect_at_source_location(self, location: &(&'static str, u32, u32), msg: &str) -> T; - pub fn unwrap_err_at_source_location(self, location: &(&'static str, u32, u32)) -> E; - pub fn expect_err_at_source_location(self, location: &(&'static str, u32, u32), msg: &str) -> E; +macro_rules! panic { + ($msg:expr) => { + let loc = $crate::panicking::Location::caller(); + $crate::panicking::panic(&($msg, loc.file(), loc.line(), loc.column())) + }; + ... } ``` -Example: - -```rust -impl<'a, K, Q, V> Index<&'a Q> for BTreeMap -where - K: Ord + Borrow, - Q: Ord + ?Sized, -{ - type Output = V; - - #[inline(semantic)] - fn index(&self, key: &Q) -> &V { - self.get(key).expect_at_source_location(caller_location!(), "no entry found for key") - } -} -``` +Actually this is now more natural for `core::panicking::panic_fmt` to take `Location` directly +instead of tuples, so one should consider changing their signature. But this is out-of-scope for +this RFC. -Long names are given since they are considered advanced functions and should not be used normally. +`panic!` is often used about of `#[implicit_caller_location]` functions. In those cases, the +`caller_location()` intrinsic will pass unchanged through all MIR passes into trans. As a fallback, +the intrinsic will expand to `Location { file: file!(), line: line!(), col: column!() }` during +trans. ## Runtime-free backtrace for `?` operator @@ -722,21 +658,36 @@ impl Try for Result { // my_crate: impl Try for Result { - #[inline(semantic)] + #[implicit_caller_location] fn from_error(mut e: Self::Error) -> Self { - e.call_stack.push(*caller_location!()); + e.call_stack.push(Location::caller()); Err(e) } } ``` +## Location detail control + +An unstable flag `-Z location-detail` is added to `rustc` to control how much factural detail will +be emitted when using `caller_location()`. User can toggle `file`, `line` and `column` separately, +e.g. when compiling with: + +```sh +rustc -Zlocation-detail=line +``` + +only the line number will be real, and the file and column will always be dummy value like + + thread 'main' panicked at 'error message', :192:0 + + # Drawbacks [drawbacks]: #drawbacks ## Code bloat -Previously, all call to `unwrap()` and `expect()` will refer to the same location. Therefore, the panicking branch will -only need to reuse a pointer to a single global tuple. +Previously, all call to `unwrap()` and `expect()` will refer to the same location. Therefore, the +panicking branch will only need to reuse a pointer to a single global tuple. After this RFC is implemented, the panicking branch will need to allocate space to store the varying caller location, so the number of instructions per `unwrap()`/`expect()` will increase. @@ -763,18 +714,35 @@ if (b.tag != SOME) { a.value_of_some + b.value_of_some ``` -## Does not support dynamic call - -`#[inline(semantic)]` relies on compile-time inlining to resolve the caller information. This makes it impossible to -work with dynamic function calls (i.e. via a function pointer or trait object v-table). +(One could use `-Z location-detail` to get the old optimization behavior) ## Narrow solution scope -`#[inline(semantic)]` is only useful in solving the "get caller location" problem. Introducing an entirely new feature -just for this problem seems wasteful. +`#[implicit_caller_location]` is only useful in solving the "get caller location" problem. +Introducing an entirely new feature just for this problem seems wasteful. + +[Default function arguments](#default-function-arguments) is another possible solution for this +problem but with much wider application. + +## Confusing scoping rule -[Default function arguments](#default-function-arguments) is another possible solution for this problem but with much -wider application. +Consts, statics and closures are separate MIR items, meaning the following marked places will *not* +get caller locations: + +```rust +#[implicit_caller_location] +fn foo() { + static S: Location = Location::caller(); // will get actual location instead + let f = || Location::caller(); // will get actual location instead + Location::caller(); // this one will get caller location +} +``` + +This is confusing, but if we don't support this, we will need two `panic!` macros which is not much +better. + +Clippy could provide a lint against using `Location::caller()` outside of +`#[implicit_caller_location]`. # Rationale and alternatives [rationale-and-alternatives]: #rationale-and-alternatives @@ -801,43 +769,54 @@ This RFC tries to abide by the following restrictions: location. Restriction 4 "interface independence" is currently not implemented due to lack of -post-monomorphized MIR pass, but `#[inline(semantic)]` itself as a language feature follows this -restriction. +post-monomorphized MIR pass, but `#[implicit_caller_location]` itself as a language feature follows +this restriction. ## Alternatives ### 🚲 Name of everything 🚲 -* Is `#[inline(semantic)]` an accurate description? `#[inline(mir)]`? `#[inline(force)]`? - `#[inline(with_caller_location)]`? -* Use a different attribute, instead of piggybacking on `#[inline]`? -* `***_at_source_location` is too long? -* Should we move `std::panic::Location` into `core`, and not use a 3-tuple to represent the - location? Note that this is also advocated in [RFC 2070]. -* Use intrinsics or static instead of consts for `core::caller::{FILE, LINE, COLUMN}`? +* Is `#[implicit_caller_location]` an accurate description? +* Should we move `std::panic::Location` into `core`, or just use a 3-tuple to represent the + location? Note that the former is advocated in [RFC 2070]. +* Should caller location propagation be implicit (currently) or explicit? -### Avoid introducing new public items +### Using an ABI instead of an attribute -* We could change the meaning of `file!()`, `line!()` and `column!()` so they are only converted to - real constants after semantic-inlining (a MIR pass) instead of early during macro expansion (an - AST pass). Inside `#[inline(semantic)]` functions, these macros behave as this RFC's - `core::caller::{FILE, LINE, COLUMN}`. This way, we don't need to introduce - `panic_at_source_location!()`. The drawback is losing explicit control of whether we want to - report the actual location or the caller's location. +```rust +pub extern "implicit-caller-location" fn my_unwrap() { + panic!("oh no"); +} +``` -* Same as above, but `file!()` etc are converted to a special literal kind, a kind that only the - compiler can create. +Compared with attributes, an ABI is a more natural way to tell the post-typechecking steps about +implicit parameters, pioneered by the `extern "rust-call"` ABI. However, creating a new ABI will +change the type of the function as well, causing the following statement to fail: -* We could make `#[inline(always)]` mean the same as `#[inline(semantic)]` to avoid introducing a - new inline level. +```rust +let f: fn(Option) -> u32 = Option::unwrap; +//~^ ERROR: [E0308]: mismatched types +``` + +Making this pass will require supporting implicitly coercing `extern "implicit-caller-location" fn` +pointer to a normal function pointer. Also, an ABI is not powerful enough to implicitly insert a +parameter, making it less competitive than just using an attribute. + +### Repurposing `file!()`, `line!()`, `column!()` + +We could change the meaning of `file!()`, `line!()` and `column!()` so they are only converted to +real constants after redirection (a MIR or trans pass) instead of early during macro expansion (an +AST pass). Inside `#[implicit_caller_location]` functions, these macros behave as this RFC's +`caller_location()`. The drawback is using these macro will have different values in compile time +(e.g. inside `include!(file!())`) vs. runtime. ### Inline MIR -Introduced as [alternative to RFC 1669][inline_mir], instead of the `caller::{FILE, LINE, COLUMN}` -lang-items, we could provide a full-fledged inline MIR macro `mir!` similar to the inline assembler: +Introduced as [alternative to RFC 1669][inline_mir], instead of the `caller_location()` intrinsic, +we could provide a full-fledged inline MIR macro `mir!` similar to the inline assembler: ```rust -#[inline(semantic)] +#[implicit_caller_location] fn unwrap(self) -> T { let file: &'static str; let line: u32; @@ -861,8 +840,8 @@ fn unwrap(self) -> T { The problem of `mir!` in this context is trying to kill a fly with a sledgehammer. `mir!` is a very generic mechanism, which requiring stabilizing the MIR syntax and considering the interaction with -the surrounding code. Besides, `#[inline(semantic)]` itself still exists and the magic constants -`$CallerFile` etc are still magic. +the surrounding code. Besides, `#[implicit_caller_location]` itself still exists and the magic +constants `$CallerFile` etc are still magic. ### Default function arguments @@ -909,6 +888,12 @@ where This can be resolved if the future default argument proposal takes this into account. But again, this feature itself is going to be large and controversial. +### Semantic inlining + +Treat `#[implicit_caller_location]` as the same as a very forceful `#[inline(always)]`. This +eliminates the procedural macro pass. However, typical implementation will still need to manually +implement the two functions, otherwise the code will become very bloated after all the inlining. + ## Non-viable alternatives Many alternatives have been proposed before but failed to satisfy the restrictions laid out in the @@ -972,7 +957,13 @@ stabilized. Methods applying this solution will also lose object-safety. # Unresolved questions [unresolved]: #unresolved-questions -* Semantic-inlining should run after monomorphization, not before. +* Redirection pass should run after monomorphization, not before. + +* Diverging functions should be supported. + +* The closure `foo::{{closure}}` should inherit most attributes applied to the function `foo`, in + particular `#[inline]`, `#[cold]` and `#[naked]`. Currently a procedural macro won't see any of + these, nor would there be anyway to apply these attributes to a closure. [RFC 1669]: https://github.com/rust-lang/rfcs/pull/1669 [24346]: https://github.com/rust-lang/rust/issues/24346 From 30b12eb60bc6a7913e81e74394823fb7cec530b0 Mon Sep 17 00:00:00 2001 From: kennytm Date: Wed, 9 Aug 2017 16:55:56 +0800 Subject: [PATCH 04/12] Replied on some concerns. --- text/0000-inline-semantic.md | 37 ++++++++++++++++++++++++++++-------- 1 file changed, 29 insertions(+), 8 deletions(-) diff --git a/text/0000-inline-semantic.md b/text/0000-inline-semantic.md index 59cf2f74655..36cc45c897f 100644 --- a/text/0000-inline-semantic.md +++ b/text/0000-inline-semantic.md @@ -898,7 +898,7 @@ implement the two functions, otherwise the code will become very bloated after a Many alternatives have been proposed before but failed to satisfy the restrictions laid out in the [Rationale](#rationale) subsection, thus should *not* be considered viable alternatives within this -RFC. +RFC, at least at the time being. ### Macros @@ -922,17 +922,33 @@ macros~~. ### Backtrace -When given debug information (DWARF section on Linux, `*.pdb` file on Windows, `*.dSYM` folder on -macOS), the program is able to obtain the source code location for each address. This solution is +When given debug information (DWARF section/file on Linux, `*.pdb` file on Windows, `*.dSYM` folder +on macOS), the program is able to obtain the source code location for each address. This solution is often used in runtime-heavy languages like Python, Java and [Go]. For Rust, however: * The debug information is usually not provided in release mode. -* Normal inlining will make the caller information inaccurate. -* Issues [24346] (*Backtrace does not include file and line number on non-Linux platforms*) and - [42295] (*Slow backtrace on panic*) and are still not entirely fixed. -* Backtrace may be disabled in resource-constrained environment. + + In particular, `cargo` defaults to disabling debug symbols in release mode (this default can + certainly be changed). `rustc` itself is tested in CI and distributed in release mode, so + getting a usable location in release mode is a real concern (see also [RFC 1417] for why it was + disabled in the official distribution in the first place). + + Even if this is generated, the debug symbols are generally not distributed to end-users, which + means the error reports will only contain numerical addresses. This can be seen as a benefit, as + the implementation detail won't be exposed. But how to submit/analyze an error report would be + out-of-scope for this RFC. + +* There are multiple issues preventing us to rely on debug info nowadays. + + Issues [24346] (*Backtrace does not include file and line number on non-Linux platforms*) and + [42295] (*Slow backtrace on panic*) and are still not entirely fixed. Even after the debuginfo + is properly handled, if we decide not to expose the whole the full stacktrace, we may still need + to reopen pull request [40264] (*Ignore more frames on backtrace unwinding*). + + These signal that debuginfo support is not reliable enough if we want to solve the unwrap/expect + issue now. These drawbacks are the main reason why restriction 3 "debug-info independence" is added to the motivation. @@ -954,6 +970,8 @@ Unfortunately this violates restriction 4 "interface independence". This solutio `HashMap::index` as this will require a change of the method signature of `index()` which has been stabilized. Methods applying this solution will also lose object-safety. +The same drawback exists if we base the solution on [RFC 2000] (*const generics*). + # Unresolved questions [unresolved]: #unresolved-questions @@ -972,6 +990,9 @@ stabilized. Methods applying this solution will also lose object-safety. [RFC issue 1744]: https://github.com/rust-lang/rfcs/issues/1744 [RFC issue 323]: https://github.com/rust-lang/rfcs/issues/323 [RFC 2070]: https://github.com/rust-lang/rfcs/pull/2070 +[RFC 2000]: https://github.com/rust-lang/rfcs/pull/2000 +[40264]: https://github.com/rust-lang/rust/issues/40264 +[RFC 1417]: https://github.com/rust-lang/rfcs/issues/1417 [a]: https://internals.rust-lang.org/t/rfrfc-better-option-result-error-messages/2904 [b]: https://internals.rust-lang.org/t/line-info-for-unwrap-expect/3753 @@ -986,4 +1007,4 @@ stabilized. Methods applying this solution will also lose object-safety. [Go]: https://golang.org/pkg/runtime/#Caller [C++]: https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html#index-_005f_005fbuiltin_005fLINE -[inlining]: https://en.wikipedia.org/wiki/Inline_expansion \ No newline at end of file +[inlining]: https://en.wikipedia.org/wiki/Inline_expansion From 1093814e925b1b7d601c26af8639ccf9b79893a8 Mon Sep 17 00:00:00 2001 From: kennytm Date: Fri, 11 Aug 2017 13:52:54 +0800 Subject: [PATCH 05/12] Further explained why #[inline(semantic)] is deprecated. --- text/0000-inline-semantic.md | 14 ++++++++++++-- 1 file changed, 12 insertions(+), 2 deletions(-) diff --git a/text/0000-inline-semantic.md b/text/0000-inline-semantic.md index 36cc45c897f..a4625381e78 100644 --- a/text/0000-inline-semantic.md +++ b/text/0000-inline-semantic.md @@ -891,8 +891,18 @@ this feature itself is going to be large and controversial. ### Semantic inlining Treat `#[implicit_caller_location]` as the same as a very forceful `#[inline(always)]`. This -eliminates the procedural macro pass. However, typical implementation will still need to manually -implement the two functions, otherwise the code will become very bloated after all the inlining. +eliminates the procedural macro pass. This was the approach suggested in the first edition of this +RFC, since the target functions (`unwrap`, `expect`, `index`) are just a few lines long. However, it +experienced push-back from the community as: + +1. Inlining causes debugging to be difficult. +2. It does not work with recursive functions. +3. People do want to apply the attribute to long functions. +4. The expected usage of "semantic inlining" and traditional inlining differ a lot, continue calling + it inlining may confuse beginners. + +Therefore the RFC is changed to the current form, and the inlining pass is now described as just an +implementation detail. ## Non-viable alternatives From 82ef9a0034ac2ddfcd6509f059e2ecdbae094c22 Mon Sep 17 00:00:00 2001 From: kennytm Date: Fri, 25 Aug 2017 14:41:31 +0800 Subject: [PATCH 06/12] Fix a typo. --- text/0000-inline-semantic.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/text/0000-inline-semantic.md b/text/0000-inline-semantic.md index a4625381e78..4a5e88e7213 100644 --- a/text/0000-inline-semantic.md +++ b/text/0000-inline-semantic.md @@ -633,7 +633,7 @@ Actually this is now more natural for `core::panicking::panic_fmt` to take `Loca instead of tuples, so one should consider changing their signature. But this is out-of-scope for this RFC. -`panic!` is often used about of `#[implicit_caller_location]` functions. In those cases, the +`panic!` is often used outside of `#[implicit_caller_location]` functions. In those cases, the `caller_location()` intrinsic will pass unchanged through all MIR passes into trans. As a fallback, the intrinsic will expand to `Location { file: file!(), line: line!(), col: column!() }` during trans. From 366351d0df2d3491236215cd33aecb4b008d3f16 Mon Sep 17 00:00:00 2001 From: Jake Goulding Date: Fri, 25 Aug 2017 10:48:27 -0400 Subject: [PATCH 07/12] Minor grammar and typo fixes (#1) --- text/0000-inline-semantic.md | 171 +++++++++++++++++------------------ 1 file changed, 85 insertions(+), 86 deletions(-) diff --git a/text/0000-inline-semantic.md b/text/0000-inline-semantic.md index 4a5e88e7213..05c8deb69b2 100644 --- a/text/0000-inline-semantic.md +++ b/text/0000-inline-semantic.md @@ -13,7 +13,7 @@ the following changes: 1. Support the `#[implicit_caller_location]` function attribute, which guarantees a function has access to the caller information. -2. Adds an intrinsic function `caller_location()` (safe wrapper: `Location::caller()`) to retrieve +2. Add an intrinsic function `caller_location()` (safe wrapper: `Location::caller()`) to retrieve the caller's source location. Example: @@ -71,20 +71,20 @@ let m = n.unwrap(); # Motivation [motivation]: #motivation -It is well-known that the error message reported by `unwrap()` is useless. +It is well-known that the error message reported by `unwrap()` is useless: ```text thread 'main' panicked at 'called `Option::unwrap()` on a `None` value', /checkout/src/libcore/option.rs:335 note: Run with `RUST_BACKTRACE=1` for a backtrace. ``` -There have been numerous discussions ([a], [b], [c]) that wants `unwrap()` and friends to provide -the better information to locate the panic. Previously, [RFC 1669] attempted to address this by +There have been numerous discussions ([a], [b], [c]) that want `unwrap()` and friends to provide +better information to locate the panic. [RFC 1669] attempted to address this by introducing the `unwrap!(x)` macro to the standard library, but it was closed since the `x.unwrap()` convention is too entrenched. -This RFC tries to introduce line numbers into `unwrap()` without requiring users to adapt a new -idiom, i.e. the user should be able to see the precise location without changing any of the source +This RFC introduces line numbers into `unwrap()` without requiring users to adapt a new +idiom, i.e. the user should be able to see the precise location without changing any source code. # Guide-level explanation @@ -93,7 +93,7 @@ code. ## Let's reimplement `unwrap()` `unwrap()` and `expect()` are two methods on `Option` and `Result` that are commonly used when you -are *absolutely sure* they only contains a successful value and you want to extract it. +are *absolutely sure* they contain a successful value and you want to extract it. ```rust // 1.rs @@ -148,8 +148,8 @@ fn main() { } ``` -This trivial implementation, however, will only report the panic happens inside `my_unwrap`. This is -pretty useless, since it is the caller of `my_unwrap` that have made the wrong assumption! +This trivial implementation, however, will only report the panic that happens inside `my_unwrap`. This is +pretty useless since it is the caller of `my_unwrap` that made the wrong assumption! ```text $ ./2 @@ -173,8 +173,8 @@ args[2] = arg2 args[3] = arg3 ``` -The trivial solution would be requiring user to provide `file!()`, `line!()` and `column!()`. A -slightly more ergonomic solution would be changing `my_unwrap` to a macro, thus these constants can +The trivial solution would require the user to provide `file!()`, `line!()` and `column!()`. A +slightly more ergonomic solution would be changing `my_unwrap` into a macro, allowing these constants to be automatically provided. ```rust @@ -195,13 +195,13 @@ println!("args[1] = {}", my_unwrap!(args().nth(1))); ... ``` -But what if you have already published the `my_unwrap` crate that has thousands of users, and you -want to maintain API stability? Before Rust 1.XX, the builtin `unwrap()` has the same problem! +What if you have already published the `my_unwrap` crate that has thousands of users, and you +want to maintain API stability? Before Rust 1.XX, the builtin `unwrap()` had the same problem! ## Adding caller location implicitly -The reason a `my_unwrap!` macro works is because it copy-and-paste the entire content of its macro -definition everytime it is used. +The reason the `my_unwrap!` macro works is because it copy-and-pastes the entire content of its macro +definition every time it is used. ```rust println!("args[1] = {}", my_unwrap!(args().nth(1))); @@ -215,8 +215,8 @@ println!("args[1] = {}", my_unwrap(args().nth(2), file!(), line!(), column!())); ... ``` -What if we can instruct the compiler to automatically fill in the file-line-column? Rust supported -this starting from 1.YY with the `#[implicit_caller_location]` attribute: +What if we could instruct the compiler to automatically fill in the file, line, and column? +Rust 1.YY introduced the `#[implicit_caller_location]` attribute for exactly this reason: ```rust // 3.rs @@ -236,7 +236,7 @@ fn main() { } ``` -Now we magically have truly reproduced how the built-in `unwrap()` is implemented. +Now we have truly reproduced how the built-in `unwrap()` is implemented. ```text $ ./3 @@ -260,16 +260,16 @@ args[2] = arg2 args[3] = arg3 ``` -What `#[implicit_caller_location]` does is an automated version of what you've seen last section. -The attribute will copy `my_unwrap` into a new function `my_unwrap_at_source_location`, which will -accept the caller's location as an additional argument. The attribute also instructs the compiler -that, "whenever you see someone calls `my_unwrap(x)`, replace it with -`my_unwrap_at_source_location(x, file!(), line!(), column!())` instead!". This allows us to maintain -the stability guarantee, while allowing user to get the new behavior with just one recompile. +`#[implicit_caller_location]` is an automated version of what you've seen in the last section. The +attribute copies `my_unwrap` to a new function `my_unwrap_at_source_location` which accepts the +caller's location as an additional argument. The attribute also instructs the compiler to replace +`my_unwrap(x)` with `my_unwrap_at_source_location(x, file!(), line!(), column!())` whenever it sees +it. This allows us to maintain the stability guarantee while allowing the user to get the new +behavior with just one recompile. ## Location type -Let's enhance `my_unwrap`, say, we also log a message to the log file before panicking? We would +Let's enhance `my_unwrap` to also log a message to the log file before panicking. We would need to get the caller's location as a value. This is supported using the method `Location::caller()`: @@ -291,29 +291,28 @@ pub fn my_unwrap(input: Option) -> T { ## Why do we use implicit caller location If you are learning Rust alongside other languages, you may wonder why Rust obtains the caller -information in such a strange way. There are two restrictions that forces us to adapt this solution: +information in such a strange way. There are two restrictions that force us to adopt this solution: 1. Programmatic access to the stack backtrace is often used in interpreted or runtime-heavy languages like Python and Java. However, the stack backtrace is not suitable as the only - solution for systems languages like Rust, because optimization often collapses multiple levels - of function calls, and in some embedded system the backtrace may even be unavailable. + solution for systems languages like Rust because optimization often collapses multiple levels + of function calls. In some embedded systems, the backtrace may even be unavailable! -2. Rust does not (yet) support default function arguments or function overloading, because it badly - interferes with type inference. Therefore, solutions that uses default function arguments - alongside normal arguments are also ruled out. Default-function-arguments-based solutions are - often used in languages that does not perform inference higher than statement level, e.g. Swift - and C#. +2. Solutions that use default function arguments alongside normal arguments are are often used in + languages that do not perform inference higher than statement level, e.g. Swift and C#. Rust + does not (yet) support default function arguments or function overloading because they interfere + with type inference, so such solutions are ruled out. # Reference-level explanation [reference-level-explanation]: #reference-level-explanation ## Survey of panicking standard functions -Many standard functions may panic. These are divided into three categories, on whether they should -receive caller information despite the inlining cost associated with it. +Many standard functions may panic. These are divided into three categories depending on whether they +should receive caller information despite the inlining cost associated with it. -(The list of functions is not exhaustive. Only those with a "Panics" section in the documentation -are included.) +The list of functions is not exhaustive. Only those with a "Panics" section in the documentation +are included. 1. **Must have.** These functions are designed to generate a panic, or used so often that indicating a panic happening from them often gives no useful information. @@ -336,7 +335,7 @@ are included.) | `VecDeque::index` | index out of bounds | 2. **Nice to have.** These functions are not commonly used, or the panicking condition is pretty - rare. Often the panic information contains enough clue to fix the error without backtrace. + rare. Often the panic information contains enough clue to fix the error without a backtrace. Inlining them would bloat the binary size without much benefit.
List of category 2 functions @@ -411,8 +410,8 @@ are included.)
-3. **Not needed.** Panics from these indicate silly programmer error and the panic itself have - enough clue to let programmers figure out where did the error comes from. +3. **Not needed.** Panics from these indicate silly programmer error and the panic itself has + enough clue to let programmers figure out where the error comes from.
List of category 3 functions @@ -449,12 +448,12 @@ are included.) This RFC only advocates adding the `#[implicit_caller_location]` attribute to the `unwrap` and `expect` functions. The `index` and `index_mut` functions should also have it if possible, but -currently blocked by lack of post-monomorphized MIR pass. +this is currently blocked by a lack of a post-monomorphized MIR pass. ## Procedural attribute macro -The `#[implicit_caller_location]` attribute will modify a function at AST level and MIR level, -without touching type-checking (HIR level) or the low-level LLVM passes. +The `#[implicit_caller_location]` attribute will modify a function at the AST and MIR levels +without touching the type-checking (HIR level) or the low-level LLVM passes. It will first wrap the body of the function in a closure, and then call it: @@ -475,18 +474,18 @@ fn foo(x: A, y: B, z: C) -> R { } ``` -The reason for this is to split the function into two: the function `foo` itself, and the closure -`foo::{{closure}}` in it. (Technically: it is the simplest way to create two `DefId` at HIR level as -far as I know.) +This is to split the function into two: the function `foo` itself, and the closure +`foo::{{closure}}` in it. (Technically: it is the simplest way to create two `DefId`s at the HIR +level as far as I know.) The function signature of `foo` remains unchanged, so typechecking can proceed normally. The attribute will be replaced by `#[rustc_implicit_caller_location]` to let the compiler internals -continue to treat it specially. `#[inline]` is added so external crate can see through `foo` to find +continue to treat it specially. `#[inline]` is added so external crates can see through `foo` to find `foo::{{closure}}`. -The closure `foo::{{closure}}` is a proper function, that the compiler can write calls directly to -`foo::{{closure}}`, skipping `foo`. Multiple calls to `foo` from different location can be done via -calling `foo::{{closure}}` directly, instead of copying the function body everytime which will bloat +The closure `foo::{{closure}}` is a proper function so that the compiler can write calls directly to +`foo::{{closure}}`, skipping `foo`. Multiple calls to `foo` from different locations can be done via +calling `foo::{{closure}}` directly, instead of copying the function body every time which would bloat the binary size. The intrinsic `caller_location()` is a placeholder which will be replaced by the actual caller @@ -507,7 +506,7 @@ _c = call std::intrinsics::caller_location() -> 'bbt; _r = call foo::{{closure}} (&[closure: x: _1, y: _2], _c) -> 'bb1; ``` -Then, we will further replace the `caller_location()` intrinsic according to where `foo` is called. +We will further replace the `caller_location()` intrinsic according to where `foo` is called. If it is called from an ordinary function, it would be replaced by the callsite's location: ```rust @@ -521,9 +520,9 @@ _c = Location { file: file!(), line: line!(), column: column!() }; goto -> 'bbt; ``` -On the other hand, if it is called from an `#[implicit_caller_location]`'s closure e.g. +If it is called from an `#[implicit_caller_location]`'s closure e.g. `foo::{{closure}}`, the intrinsic will be replaced by the closure argument `__location` instead, so -the caller location can propagate directly +that the caller location can propagate directly ```rust // for #[implicit_caller_location] closures, @@ -537,7 +536,7 @@ goto -> 'bbt; ``` These steps are very similar to inlining, and thus the first proof-of-concept is implemented -directly as a variant of MIR inliner (but a separate pass). This also means the redirection pass +directly as a variant of the MIR inliner (but a separate pass). This also means the redirection pass currently suffers from all disadvantages of the MIR inliner, namely: * Locations will not be propagated into diverging functions (`fn() -> !`), since inlining them is @@ -575,7 +574,7 @@ fn f100() { } ``` -Currently the redirection pass always run before the inlining pass. If the redirection pass is run +Currently the redirection pass always runs before the inlining pass. If the redirection pass is run after the normal MIR inlining pass, the normal MIR inliner must treat `#[implicit_caller_location]` as `#[inline(never)]`. @@ -584,7 +583,7 @@ The closure `foo::{{closure}}` must never be inlined before the redirection pass When `#[implicit_caller_location]` functions are called dynamically, no inlining will occur, and thus it cannot take the location of the caller. Currently this will report where the function is declared. Taking the address of such functions must be allowed due to backward compatibility. (If -post-monomorphized MIR pass exists, methods via trait objects would be another case of calling +a post-monomorphized MIR pass exists, methods via trait objects would be another case of calling `#[implicit_caller_location]` functions without caller location.) ```rust @@ -600,7 +599,7 @@ g(None); // The effect of these two calls must be the same. The `caller_location()` intrinsic returns the `Location` structure which encodes the file, line and column of the callsite. This shares the same structure as the existing type `std::panic::Location`. Therefore, the type is promoted to a lang-item, and moved into `core::panicking::Location`. It is -re-exported in `libstd`. +re-exported from `libstd`. Thanks to how `#[implicit_caller_location]` is implemented, we could provide a safe wrapper around the `caller_location()` intrinsic: @@ -630,7 +629,7 @@ macro_rules! panic { ``` Actually this is now more natural for `core::panicking::panic_fmt` to take `Location` directly -instead of tuples, so one should consider changing their signature. But this is out-of-scope for +instead of tuples, so one should consider changing their signature, but this is out-of-scope for this RFC. `panic!` is often used outside of `#[implicit_caller_location]` functions. In those cases, the @@ -640,9 +639,9 @@ trans. ## Runtime-free backtrace for `?` operator -The standard `Try` implementations could participate in specialization, so external crates like +The standard `Try` implementations could participate in specialization so that external crates like `error_chain` could provide a specialized impl that prepends the caller location into the callstack -everytime `?` is used. +every time `?` is used. ```rust // libcore: @@ -668,15 +667,15 @@ impl Try for Result { ## Location detail control -An unstable flag `-Z location-detail` is added to `rustc` to control how much factural detail will -be emitted when using `caller_location()`. User can toggle `file`, `line` and `column` separately, +An unstable flag `-Z location-detail` is added to `rustc` to control how much factual detail will +be emitted when using `caller_location()`. The user can toggle `file`, `line` and `column` separately, e.g. when compiling with: ```sh rustc -Zlocation-detail=line ``` -only the line number will be real, and the file and column will always be dummy value like +only the line number will be real. The file and column will always be a dummy value like thread 'main' panicked at 'error message', :192:0 @@ -686,14 +685,14 @@ only the line number will be real, and the file and column will always be dummy ## Code bloat -Previously, all call to `unwrap()` and `expect()` will refer to the same location. Therefore, the -panicking branch will only need to reuse a pointer to a single global tuple. +Previously, all calls to `unwrap()` and `expect()` referred to the same location. Therefore, the +panicking branch will only needed to reuse a pointer to a single global tuple. After this RFC is implemented, the panicking branch will need to allocate space to store the varying caller location, so the number of instructions per `unwrap()`/`expect()` will increase. -Also the optimizer will lose the opportunity to consolidate all jumps to the panicking branch, e.g. with the code -`a.unwrap() + b.unwrap()`, before this RFC LLVM will optimize it to something like +The optimizer will lose the opportunity to consolidate all jumps to the panicking branch. Before +this RFC, LLVM would optimize `a.unwrap() + b.unwrap()`, to something like ```rust if (a.tag != SOME || b.tag != SOME) { @@ -714,7 +713,7 @@ if (b.tag != SOME) { a.value_of_some + b.value_of_some ``` -(One could use `-Z location-detail` to get the old optimization behavior) +One can use `-Z location-detail` to get the old optimization behavior. ## Narrow solution scope @@ -738,8 +737,8 @@ fn foo() { } ``` -This is confusing, but if we don't support this, we will need two `panic!` macros which is not much -better. +This is confusing, but if we don't support this, we will need two `panic!` macros which is not a +better solution. Clippy could provide a lint against using `Location::caller()` outside of `#[implicit_caller_location]`. @@ -751,26 +750,26 @@ Clippy could provide a lint against using `Location::caller()` outside of This RFC tries to abide by the following restrictions: -1. **Precise caller location**. Standard library functions which commonly panics will report the - source location at where the user call them. The source location should never point inside the +1. **Precise caller location**. Standard library functions which commonly panic will report the + source location as where the user called them. The source location should never point inside the standard library. Examples of these functions include `Option::unwrap` and `HashMap::index`. -2. **Source compatibility**. User should never need to modify existing source code to benefit from +2. **Source compatibility**. Users should never need to modify existing source code to benefit from the improved precision. 3. **Debug-info independence**. The precise caller location can still be reported even after stripping of debug information, which is very common on released software. 4. **Interface independence**. The implementation of a trait should be able to decide whether to - accept a the caller information. It shouldn't require the trait itself to enforce it, i.e. it + accepts the caller information; it shouldn't require the trait itself to enforce it. It should not affect the signature of the function. This is an extension of rule 2, since the `Index` trait is involved in `HashMap::index`. The stability of `Index` must be upheld, e.g. it should remain object-safe, and existing implementions should not be forced to accept the caller location. Restriction 4 "interface independence" is currently not implemented due to lack of -post-monomorphized MIR pass, but `#[implicit_caller_location]` itself as a language feature follows -this restriction. +post-monomorphized MIR pass, but implementing `#[implicit_caller_location]` as a language feature +follows this restriction. ## Alternatives @@ -779,7 +778,7 @@ this restriction. * Is `#[implicit_caller_location]` an accurate description? * Should we move `std::panic::Location` into `core`, or just use a 3-tuple to represent the location? Note that the former is advocated in [RFC 2070]. -* Should caller location propagation be implicit (currently) or explicit? +* Should the caller location propagation be implicit (currently) or explicit? ### Using an ABI instead of an attribute @@ -807,12 +806,12 @@ parameter, making it less competitive than just using an attribute. We could change the meaning of `file!()`, `line!()` and `column!()` so they are only converted to real constants after redirection (a MIR or trans pass) instead of early during macro expansion (an AST pass). Inside `#[implicit_caller_location]` functions, these macros behave as this RFC's -`caller_location()`. The drawback is using these macro will have different values in compile time +`caller_location()`. The drawback is using these macro will have different values at compile time (e.g. inside `include!(file!())`) vs. runtime. ### Inline MIR -Introduced as [alternative to RFC 1669][inline_mir], instead of the `caller_location()` intrinsic, +Introduced as an [alternative to RFC 1669][inline_mir], instead of the `caller_location()` intrinsic, we could provide a full-fledged inline MIR macro `mir!` similar to the inline assembler: ```rust @@ -839,7 +838,7 @@ fn unwrap(self) -> T { ``` The problem of `mir!` in this context is trying to kill a fly with a sledgehammer. `mir!` is a very -generic mechanism, which requiring stabilizing the MIR syntax and considering the interaction with +generic mechanism which requires stabilizing the MIR syntax and considering the interaction with the surrounding code. Besides, `#[implicit_caller_location]` itself still exists and the magic constants `$CallerFile` etc are still magic. @@ -915,7 +914,7 @@ RFC, at least at the time being. The `unwrap!()` macro introduced in [RFC 1669] allows the user to write `unwrap!(x)` instead of `x.unwrap()`. -Similar solution is introducing a `loc!()` macro that expands to +A similar solution is introducing a `loc!()` macro that expands to `concat!(file!(), ":", line!(), ":", column!())`, so user writes `x.expect(loc!())` instead of `x.unwrap()`. @@ -923,12 +922,12 @@ There is even the [`better_unwrap` crate](https://github.com/abonander/better_un automatically rewrites all `unwrap()` and `expect()` inside a module to provide the caller location through a procedural attribute. -All of these are non-viable, since they require the user to actively change their source code, thus +All of these are non-viable since they require the user to actively change their source code, thus violating restriction 2 "source compatibility", ~~unless we are willing to drop the `!` from macros~~. -(Also, all pre-typeck rewrites are prone to false-positive affecting unrelated types that have an -`unwrap()` method. Post-typeck rewrites are no different from this RFC.) +All pre-typeck rewrites are prone to false-positive failures affecting unrelated types that have an +`unwrap()` method. Post-typeck rewrites are no different from this RFC. ### Backtrace @@ -947,10 +946,10 @@ For Rust, however: Even if this is generated, the debug symbols are generally not distributed to end-users, which means the error reports will only contain numerical addresses. This can be seen as a benefit, as - the implementation detail won't be exposed. But how to submit/analyze an error report would be + the implementation detail won't be exposed, but how to submit/analyze an error report would be out-of-scope for this RFC. -* There are multiple issues preventing us to rely on debug info nowadays. +* There are multiple issues preventing us from relying on debug info nowadays. Issues [24346] (*Backtrace does not include file and line number on non-Linux platforms*) and [42295] (*Slow backtrace on panic*) and are still not entirely fixed. Even after the debuginfo From 2c89cee66cefa772765b4092dc16b554c720caf0 Mon Sep 17 00:00:00 2001 From: kennytm Date: Tue, 5 Sep 2017 21:00:52 +0800 Subject: [PATCH 08/12] Updated for responses. 1. `#[implicit_caller_location]` -> `#[blame_caller]`. The internal attribute remains as `#[rustc_implicit_caller_location]`. 2. Imported discussion about "my fault" vs "your fault". Ensure only `panic!()` and `assert!()` will get caller's location. 3. Added DbC as a viable alternative (in terms of syntax). 4. Removed the section about `Try` since we decided not to support `#[blame_caller]` on traits. --- text/0000-inline-semantic.md | 311 +++++++++++++++++++++++++---------- 1 file changed, 225 insertions(+), 86 deletions(-) diff --git a/text/0000-inline-semantic.md b/text/0000-inline-semantic.md index 05c8deb69b2..a04ca8129b4 100644 --- a/text/0000-inline-semantic.md +++ b/text/0000-inline-semantic.md @@ -1,4 +1,4 @@ -- Feature Name: `implicit_caller_location` +- Feature Name: `blame_caller` - Start Date: 2017-07-31 - RFC PR: (leave this empty) - Rust Issue: (leave this empty) @@ -11,18 +11,18 @@ Enable accurate caller location reporting during panic in `{Option, Result}::{unwrap, expect}` with the following changes: -1. Support the `#[implicit_caller_location]` function attribute, which guarantees a function has - access to the caller information. +1. Support the `#[blame_caller]` function attribute, which guarantees a function has access to the + caller information. 2. Add an intrinsic function `caller_location()` (safe wrapper: `Location::caller()`) to retrieve the caller's source location. Example: ```rust -#![feature(implicit_caller_location)] +#![feature(blame_caller)] use std::panic::Location; -#[implicit_caller_location] +#[blame_caller] fn unwrap(self) -> T { panic!("{}: oh no", Location::caller()); } @@ -37,7 +37,7 @@ let m = n.unwrap(); - [Motivation](#motivation) - [Guide-level explanation](#guide-level-explanation) - [Let's reimplement `unwrap()`](#lets-reimplement-unwrap) - - [Adding caller location implicitly](#adding-caller-location-implicitly) + - [Blame the caller](#blame-the-caller) - [Location type](#location-type) - [Why do we use implicit caller location](#why-do-we-use-implicit-caller-location) - [Reference-level explanation](#reference-level-explanation) @@ -45,7 +45,7 @@ let m = n.unwrap(); - [Procedural attribute macro](#procedural-attribute-macro) - [Redirection (MIR inlining)](#redirection-mir-inlining) - [Standard libraries](#standard-libraries) - - [Runtime-free backtrace for `?` operator](#runtime-free-backtrace-for--operator) + - [“My fault” vs “Your fault”](#my-fault-vs-your-fault) - [Location detail control](#location-detail-control) - [Drawbacks](#drawbacks) - [Code bloat](#code-bloat) @@ -60,6 +60,7 @@ let m = n.unwrap(); - [Inline MIR](#inline-mir) - [Default function arguments](#default-function-arguments) - [Semantic inlining](#semantic-inlining) + - [Design-by-contract](#design-by-contract) - [Non-viable alternatives](#non-viable-alternatives) - [Macros](#macros) - [Backtrace](#backtrace) @@ -198,7 +199,7 @@ println!("args[1] = {}", my_unwrap!(args().nth(1))); What if you have already published the `my_unwrap` crate that has thousands of users, and you want to maintain API stability? Before Rust 1.XX, the builtin `unwrap()` had the same problem! -## Adding caller location implicitly +## Blame the caller The reason the `my_unwrap!` macro works is because it copy-and-pastes the entire content of its macro definition every time it is used. @@ -216,13 +217,13 @@ println!("args[1] = {}", my_unwrap(args().nth(2), file!(), line!(), column!())); ``` What if we could instruct the compiler to automatically fill in the file, line, and column? -Rust 1.YY introduced the `#[implicit_caller_location]` attribute for exactly this reason: +Rust 1.YY introduced the `#[blame_caller]` attribute for exactly this reason: ```rust // 3.rs -#![feature(implicit_caller_location)] +#![feature(blame_caller)] use std::env::args; -#[implicit_caller_location] // <-- Just add this! +#[blame_caller] // <-- Just add this! pub fn my_unwrap(input: Option) -> T { match input { Some(t) => t, @@ -260,12 +261,12 @@ args[2] = arg2 args[3] = arg3 ``` -`#[implicit_caller_location]` is an automated version of what you've seen in the last section. The -attribute copies `my_unwrap` to a new function `my_unwrap_at_source_location` which accepts the -caller's location as an additional argument. The attribute also instructs the compiler to replace -`my_unwrap(x)` with `my_unwrap_at_source_location(x, file!(), line!(), column!())` whenever it sees -it. This allows us to maintain the stability guarantee while allowing the user to get the new -behavior with just one recompile. +`#[blame_caller]` is an automated version of what you've seen in the last section. The attribute +copies `my_unwrap` to a new function `my_unwrap_at_source_location` which accepts the caller's +location as an additional argument. The attribute also instructs the compiler to replace +`my_unwrap(x)` with `my_unwrap_at_source_location(x, file!(), line!(), column!())` (sort of) +whenever it sees it. This allows us to maintain the stability guarantee while allowing the user to +get the new behavior with just one recompile. ## Location type @@ -275,7 +276,7 @@ need to get the caller's location as a value. This is supported using the method ```rust use std::panic::Location; -#[implicit_caller_location] +#[blame_caller] pub fn my_unwrap(input: Option) -> T { match input { Some(t) => t, @@ -446,19 +447,20 @@ are included.
-This RFC only advocates adding the `#[implicit_caller_location]` attribute to the `unwrap` and -`expect` functions. The `index` and `index_mut` functions should also have it if possible, but -this is currently blocked by a lack of a post-monomorphized MIR pass. +This RFC only advocates adding the `#[blame_caller]` attribute to the `unwrap` and `expect` +functions. The `index` and `index_mut` functions should also have it if possible, but this is +currently postponed as it is not investigated yet how to insert the transformation after +monomorphization. ## Procedural attribute macro -The `#[implicit_caller_location]` attribute will modify a function at the AST and MIR levels -without touching the type-checking (HIR level) or the low-level LLVM passes. +The `#[blame_caller]` attribute will modify a function at the AST and MIR levels without touching +the type-checking (HIR level) or the low-level LLVM passes. It will first wrap the body of the function in a closure, and then call it: ```rust -#[implicit_caller_location] +#[blame_caller] fn foo(x: A, y: B, z: C) -> R { bar(x, y) } @@ -480,13 +482,13 @@ level as far as I know.) The function signature of `foo` remains unchanged, so typechecking can proceed normally. The attribute will be replaced by `#[rustc_implicit_caller_location]` to let the compiler internals -continue to treat it specially. `#[inline]` is added so external crates can see through `foo` to find -`foo::{{closure}}`. +continue to treat it specially. `#[inline]` is added so external crates can see through `foo` to +find `foo::{{closure}}`. The closure `foo::{{closure}}` is a proper function so that the compiler can write calls directly to `foo::{{closure}}`, skipping `foo`. Multiple calls to `foo` from different locations can be done via -calling `foo::{{closure}}` directly, instead of copying the function body every time which would bloat -the binary size. +calling `foo::{{closure}}` directly, instead of copying the function body every time which would +bloat the binary size. The intrinsic `caller_location()` is a placeholder which will be replaced by the actual caller location when one calls `foo::{{closure}}` directly. @@ -520,12 +522,12 @@ _c = Location { file: file!(), line: line!(), column: column!() }; goto -> 'bbt; ``` -If it is called from an `#[implicit_caller_location]`'s closure e.g. -`foo::{{closure}}`, the intrinsic will be replaced by the closure argument `__location` instead, so -that the caller location can propagate directly +If it is called from an `#[rustc_implicit_caller_location]`'s closure e.g. `foo::{{closure}}`, the +intrinsic will be replaced by the closure argument `__location` instead, so that the caller location +can propagate directly ```rust -// for #[implicit_caller_location] closures, +// for #[rustc_implicit_caller_location] closures, _c = call std::intrinsics::caller_location() -> 'bbt; @@ -542,22 +544,23 @@ currently suffers from all disadvantages of the MIR inliner, namely: * Locations will not be propagated into diverging functions (`fn() -> !`), since inlining them is not supported yet. -* MIR passes are run *before* monomorphization, meaning `#[implicit_caller_location]` currently - **cannot** be used on trait items: +* MIR passes are run *before* monomorphization, meaning `#[blame_caller]` currently **cannot** be + used on trait items: ```rust trait Trait { fn unwrap(&self); } impl Trait for u64 { - #[implicit_caller_location] //~ ERROR: `#[implicit_caller_location]` is not supported for trait items yet. + #[blame_caller] //~ ERROR: `#[blame_caller]` is not supported for trait items yet. fn unwrap(&self) {} } ``` To support trait items, the redirection pass must be run as post-monomorphized MIR pass (which does -not exist yet), or a custom LLVM inlining pass which can extract the caller's source location. This -prevents the `Index` trait from having `#[implicit_caller_location]` yet. +not exist yet), or converted to queries provided after resolve, or a custom LLVM inlining pass which +can extract the caller's source location. This prevents the `Index` trait from having +`#[blame_caller]` yet. We cannot hack the impl resolution method into pre-monomorphization MIR pass because of deeply nested functions like @@ -575,16 +578,16 @@ fn f100() { ``` Currently the redirection pass always runs before the inlining pass. If the redirection pass is run -after the normal MIR inlining pass, the normal MIR inliner must treat `#[implicit_caller_location]` -as `#[inline(never)]`. +after the normal MIR inlining pass, the normal MIR inliner must treat +`#[rustc_implicit_caller_location]` as `#[inline(never)]`. The closure `foo::{{closure}}` must never be inlined before the redirection pass. -When `#[implicit_caller_location]` functions are called dynamically, no inlining will occur, and -thus it cannot take the location of the caller. Currently this will report where the function is +When `#[rustc_implicit_caller_location]` functions are called dynamically, no inlining will occur, +and thus it cannot take the location of the caller. Currently this will report where the function is declared. Taking the address of such functions must be allowed due to backward compatibility. (If a post-monomorphized MIR pass exists, methods via trait objects would be another case of calling -`#[implicit_caller_location]` functions without caller location.) +`#[rustc_implicit_caller_location]` functions without caller location.) ```rust let f: fn(Option) -> u32 = Option::unwrap; @@ -601,12 +604,12 @@ column of the callsite. This shares the same structure as the existing type `std Therefore, the type is promoted to a lang-item, and moved into `core::panicking::Location`. It is re-exported from `libstd`. -Thanks to how `#[implicit_caller_location]` is implemented, we could provide a safe wrapper around -the `caller_location()` intrinsic: +Thanks to how `#[blame_caller]` is implemented, we could provide a safe wrapper around the +`caller_location()` intrinsic: ```rust impl<'a> Location<'a> { - #[implicit_caller_location] + #[blame_caller] pub fn caller() -> Location<'static> { unsafe { ::intrinsics::caller_location() @@ -616,7 +619,7 @@ impl<'a> Location<'a> { ``` The `panic!` macro is modified to use `Location::caller()` (or the intrinsic directly) so it can -report the caller location inside `#[implicit_caller_location]`. +report the caller location inside `#[blame_caller]`. ```rust macro_rules! panic { @@ -632,39 +635,116 @@ Actually this is now more natural for `core::panicking::panic_fmt` to take `Loca instead of tuples, so one should consider changing their signature, but this is out-of-scope for this RFC. -`panic!` is often used outside of `#[implicit_caller_location]` functions. In those cases, the +`panic!` is often used outside of `#[blame_caller]` functions. In those cases, the `caller_location()` intrinsic will pass unchanged through all MIR passes into trans. As a fallback, the intrinsic will expand to `Location { file: file!(), line: line!(), col: column!() }` during trans. -## Runtime-free backtrace for `?` operator +## “My fault” vs “Your fault” -The standard `Try` implementations could participate in specialization so that external crates like -`error_chain` could provide a specialized impl that prepends the caller location into the callstack -every time `?` is used. +In a `#[blame_caller]` function, we expect all panics being attributed to the caller (thus the +attribute name). However, sometimes the code panics not due to the caller, but the implementation +itself. It may be important to distinguish between "my fault" (implementation error) and +"your fault" (caller violating API requirement). As an example, ```rust -// libcore: -impl Try for Result { - type Ok = T; - type Error = E; - - fn into_result(self) -> Self { self } - fn from_ok(v: Self::Ok) -> Self { Ok(v) } - default fn from_error(e: Self::Error) -> Self { Err(e) } -// ^~~~~~~ -} - -// my_crate: -impl Try for Result { - #[implicit_caller_location] - fn from_error(mut e: Self::Error) -> Self { - e.call_stack.push(Location::caller()); - Err(e) +use std::collections::HashMap; +use std::hash::Hash; + +fn count_slices(array: &[T], window: usize) -> HashMap<&[T], usize> { + if !(0 < window && window <= array.len()) { + panic!("invalid window size"); + // ^ triggering this panic is "your fault" + } + let mut result = HashMap::new(); + for w in array.windows(window) { + if let Some(r) = result.get_mut(w) { + *r += 1; + } else { + panic!("why??"); + // ^ triggering this panic is "my fault" + // (yes this code is wrong and entry API should be used) + } } + + result } ``` +One simple solution is to separate the "my fault" panic and "your fault" panic into two, but since +[declarative macro 1.0 is insta-stable][insta-stable], this RFC would prefer to postpone introducing +any new public macros until "Macros 2.0" lands, where stability and scoping are better handled. + +For comparison, the Swift language does +[distinguish between the two kinds of panics semantically][swift-panics]. The "your fault" ones are +called `precondition`, while the "my fault" ones are called `assert`, though they don't deal with +caller location, and practically they are equivalent to Rust's `assert!` and `debug_assert!`. +Nevertheless, this also suggests we can still separate existing panicking macros into the "my fault" +and "your fault" camps accordingly: +* Definitely "my fault" (use actual location): `debug_assert!` and friends, `unreachable!`, + `unimplemented!` +* Probably "your fault" (propagate caller location): `assert!` and friends, `panic!` + +The question is, should calling `unwrap()`, `expect()` and `x[y]` (`index()`) be "my fault" or "your +fault"? Let's consider existing implementation of `index()` methods: +```rust +// Vec::index +fn index(&self, index: usize) -> &T { + &(**self)[index] +} + +// BTreeMap::index +fn index(&self, key: &Q) -> &V { + self.get(key).expect("no entry found for key") +} + +// Wtf8::index +fn index(&self, range: ops::RangeFrom) -> &Wtf8 { + // is_code_point_boundary checks that the index is in [0, .len()] + if is_code_point_boundary(self, range.start) { + unsafe { slice_unchecked(self, range.start, self.len()) } + } else { + slice_error_fail(self, range.start, self.len()) + } +} +``` + +If they all get `#[blame_caller]`, the `x[y]`, `expect()` and `slice_error_fail()` should all report +"your fault", i.e. caller location should be propagated downstream. It does mean that the current +default of caller-location-propagation-by-default is more common. This also means "my fault" +happening during development may become harder to spot. This can be solved using `RUST_BACKTRACE=1`, +or workaround by splitting into two functions: + +```rust +use std::collections::HashMap; +use std::hash::Hash; + +#[blame_caller] +fn count_slices(array: &[T], window: usize) -> HashMap<&[T], usize> { + if !(0 < window && window <= array.len()) { + panic!("invalid window size"); // <-- your fault + } + (|| { + let mut result = HashMap::new(); + for w in array.windows(window) { + if let Some(r) = result.get_mut(w) { + *r += 1; + } else { + panic!("why??"); // <-- my fault (caller propagation can't go into closures) + } + } + result + })() +} +``` + +Anyway, treating everything as "your fault" will encourage that `#[blame_caller]` functions should +be short, which goes in line with the ["must have" list](#survey-of-panicking-standard-functions) in +the RFC. Thus the RFC will remain advocating for propagating caller location implicitly. + +[insta-stable]: https://github.com/rust-lang/rust/pull/39229#issuecomment-274348420 +[swift-panics]: https://stackoverflow.com/questions/29673027/difference-between-precondition-and-assert-in-swift + ## Location detail control An unstable flag `-Z location-detail` is added to `rustc` to control how much factual detail will @@ -717,8 +797,8 @@ One can use `-Z location-detail` to get the old optimization behavior. ## Narrow solution scope -`#[implicit_caller_location]` is only useful in solving the "get caller location" problem. -Introducing an entirely new feature just for this problem seems wasteful. +`#[blame_caller]` is only useful in solving the "get caller location" problem. Introducing an +entirely new feature just for this problem seems wasteful. [Default function arguments](#default-function-arguments) is another possible solution for this problem but with much wider application. @@ -729,7 +809,7 @@ Consts, statics and closures are separate MIR items, meaning the following marke get caller locations: ```rust -#[implicit_caller_location] +#[blame_caller] fn foo() { static S: Location = Location::caller(); // will get actual location instead let f = || Location::caller(); // will get actual location instead @@ -740,8 +820,7 @@ fn foo() { This is confusing, but if we don't support this, we will need two `panic!` macros which is not a better solution. -Clippy could provide a lint against using `Location::caller()` outside of -`#[implicit_caller_location]`. +Clippy could provide a lint against using `Location::caller()` outside of `#[blame_caller]`. # Rationale and alternatives [rationale-and-alternatives]: #rationale-and-alternatives @@ -768,17 +847,17 @@ This RFC tries to abide by the following restrictions: location. Restriction 4 "interface independence" is currently not implemented due to lack of -post-monomorphized MIR pass, but implementing `#[implicit_caller_location]` as a language feature -follows this restriction. +post-monomorphized MIR pass, but implementing `#[blame_caller]` as a language feature follows this +restriction. ## Alternatives ### 🚲 Name of everything 🚲 -* Is `#[implicit_caller_location]` an accurate description? +* Is `#[blame_caller]` an accurate description? * Should we move `std::panic::Location` into `core`, or just use a 3-tuple to represent the location? Note that the former is advocated in [RFC 2070]. -* Should the caller location propagation be implicit (currently) or explicit? +* Is `Location::caller()` properly named? ### Using an ABI instead of an attribute @@ -805,7 +884,7 @@ parameter, making it less competitive than just using an attribute. We could change the meaning of `file!()`, `line!()` and `column!()` so they are only converted to real constants after redirection (a MIR or trans pass) instead of early during macro expansion (an -AST pass). Inside `#[implicit_caller_location]` functions, these macros behave as this RFC's +AST pass). Inside `#[blame_caller]` functions, these macros behave as this RFC's `caller_location()`. The drawback is using these macro will have different values at compile time (e.g. inside `include!(file!())`) vs. runtime. @@ -815,7 +894,7 @@ Introduced as an [alternative to RFC 1669][inline_mir], instead of the `caller_l we could provide a full-fledged inline MIR macro `mir!` similar to the inline assembler: ```rust -#[implicit_caller_location] +#[blame_caller] fn unwrap(self) -> T { let file: &'static str; let line: u32; @@ -839,8 +918,8 @@ fn unwrap(self) -> T { The problem of `mir!` in this context is trying to kill a fly with a sledgehammer. `mir!` is a very generic mechanism which requires stabilizing the MIR syntax and considering the interaction with -the surrounding code. Besides, `#[implicit_caller_location]` itself still exists and the magic -constants `$CallerFile` etc are still magic. +the surrounding code. Besides, `#[blame_caller]` itself still exists and the magic constants +`$CallerFile` etc are still magical. ### Default function arguments @@ -889,10 +968,10 @@ this feature itself is going to be large and controversial. ### Semantic inlining -Treat `#[implicit_caller_location]` as the same as a very forceful `#[inline(always)]`. This -eliminates the procedural macro pass. This was the approach suggested in the first edition of this -RFC, since the target functions (`unwrap`, `expect`, `index`) are just a few lines long. However, it -experienced push-back from the community as: +Treat `#[blame_caller]` as the same as a very forceful `#[inline(always)]`. This eliminates the +procedural macro pass. This was the approach suggested in the first edition of this RFC, since the +target functions (`unwrap`, `expect`, `index`) are just a few lines long. However, it experienced +push-back from the community as: 1. Inlining causes debugging to be difficult. 2. It does not work with recursive functions. @@ -903,6 +982,64 @@ experienced push-back from the community as: Therefore the RFC is changed to the current form, and the inlining pass is now described as just an implementation detail. +### Design-by-contract + +This is inspired when investigating the difference in +["my fault" vs "your fault"](#my-fault-vs-your-fault). We incorporate ideas from [design-by-contract] +(DbC) by specifying that "your fault" is a kind of contract violation. Preconditions are listed as +part of the function signature, e.g. + +```rust +// declaration +extern { + #[precondition(fd >= 0, "invalid file descriptor {}", fd)] + fn close_fd(fd: c_int); +} + +// declaration + defintion +#[precondition(option.is_some(), "Trying to unwrap None")] +fn unwrap(option: Option) -> T { + match option { + Some(t) => t, + None => unsafe { std::mem::unchecked_unreachable() }, + } +} +``` + +Code that appears in the `#[precondition]` attribute should be copied to caller site, so when the +precondition is violated, they can get the caller's location. + +Specialization should be treated like subtyping, where preconditions can be *weakened*: + +```rust +trait Foo { + #[precondition(condition_1)] + fn foo(); +} + +impl Foo for T { + #[precondition(condition_2a)] + #[precondition(condition_2b)] + default fn foo() { ... } +} + +impl Foo for u32 { + #[precondition(condition_3)] + fn foo() { ... } +} + +assert!(condition_3 || (condition_2a && condition_2b) || condition_1); +// ^ automatically inserted when the following is called... +::foo(); +``` + +Before Rust 1.0, there was the [`hoare`] compiler plugin which introduces DbC using the similar +syntax. However, the conditions are expanded inside the function, so the assertions will not fail +with the caller's location. A proper solution will be similar to what this RFC proposes. + +[design-by-contract]: https://en.wikipedia.org/wiki/Design_by_contract +[`hoare`]: https://crates.io/crates/hoare + ## Non-viable alternatives Many alternatives have been proposed before but failed to satisfy the restrictions laid out in the @@ -984,7 +1121,9 @@ The same drawback exists if we base the solution on [RFC 2000] (*const generics # Unresolved questions [unresolved]: #unresolved-questions -* Redirection pass should run after monomorphization, not before. +* If we want to support adding `#[blame_caller]` to trait methods, the redirection + pass/query/whatever should be placed after monomorphization, not before. Currently the RFC + simply prohibit applying `#[blame_caller]` to trait methods as a future-proofing measure. * Diverging functions should be supported. From 3ddca9d864107c913f6a9907ca01e01a36f3263e Mon Sep 17 00:00:00 2001 From: kennytm Date: Sat, 16 Sep 2017 16:46:11 +0800 Subject: [PATCH 09/12] Addressed pnkfelix's comments. --- text/0000-inline-semantic.md | 47 ++++++++++++++++++++++++++++++++---- 1 file changed, 42 insertions(+), 5 deletions(-) diff --git a/text/0000-inline-semantic.md b/text/0000-inline-semantic.md index a04ca8129b4..e6c1686ae53 100644 --- a/text/0000-inline-semantic.md +++ b/text/0000-inline-semantic.md @@ -270,9 +270,8 @@ get the new behavior with just one recompile. ## Location type -Let's enhance `my_unwrap` to also log a message to the log file before panicking. We would -need to get the caller's location as a value. This is supported using the method -`Location::caller()`: +Let's enhance `my_unwrap` to also log a message to the log file before panicking. We would need to +get the caller's location as a value. This is supported using the method `Location::caller()`: ```rust use std::panic::Location; @@ -289,6 +288,37 @@ pub fn my_unwrap(input: Option) -> T { } ``` +## Propagation of blame + +When your `#[blame_caller]` function calls another `#[blame_caller]` function, the caller location +will be propagated downwards: + +```rust +use std::panic::Location; +#[blame_caller] +pub fn my_get_index(input: &[T], index: usize) -> &T { + my_unwrap(input.get(index)) // line 4 +} +indirectly_unwrap(None); // line 6 +``` + +When you run this, the panic will refer to line 6, the original caller, instead of line 4 where +`my_get_index` calls `my_unwrap`. When a library function is marked `#[blame_caller]`, it is +expected the function is short, and does not have any logic errors. This allows us to always blame +the caller on failure. + +If a panic that refers to the local location is actually needed, you may workaround by wrapping the +code in a closure which cannot blame the caller: + +```rust +#[blame_caller] +pub fn my_get_index(input: &[T], index: usize) -> &T { + (|| { + my_unwrap(input.get(index)) + })() +} +``` + ## Why do we use implicit caller location If you are learning Rust alongside other languages, you may wonder why Rust obtains the caller @@ -493,6 +523,10 @@ bloat the binary size. The intrinsic `caller_location()` is a placeholder which will be replaced by the actual caller location when one calls `foo::{{closure}}` directly. +Currently the `foo::{{closure}}` cannot inherit attributes defined on the main function. To prevent +problems regarding ABI, using `#[naked]` or `extern "ABI"` together with +`#[rustc_implicit_caller_location]` should raise an error. + ## Redirection (MIR inlining) After all type-checking and validation is done, we can now inject the caller location. This is done @@ -1128,8 +1162,11 @@ The same drawback exists if we base the solution on [RFC 2000] (*const generics * Diverging functions should be supported. * The closure `foo::{{closure}}` should inherit most attributes applied to the function `foo`, in - particular `#[inline]`, `#[cold]` and `#[naked]`. Currently a procedural macro won't see any of - these, nor would there be anyway to apply these attributes to a closure. + particular `#[inline]`, `#[cold]`, `#[naked]` and also the ABI. Currently a procedural macro + won't see any of these, nor would there be anyway to apply these attributes to a closure. + Therefore, `#[rustc_implicit_caller_location]` currently will reject `#[naked]` and ABI, and + leaving `#[inline]` and `#[cold]` mean no-op. There is no semantic reason why these cannot be + used though. [RFC 1669]: https://github.com/rust-lang/rfcs/pull/1669 [24346]: https://github.com/rust-lang/rust/issues/24346 From ffea16fa2ebffd06bdc73c6490de7962947cc0e3 Mon Sep 17 00:00:00 2001 From: kennytm Date: Wed, 20 Sep 2017 18:11:19 +0800 Subject: [PATCH 10/12] Link to RFC #2154. --- text/0000-inline-semantic.md | 3 +++ 1 file changed, 3 insertions(+) diff --git a/text/0000-inline-semantic.md b/text/0000-inline-semantic.md index e6c1686ae53..b488bc18adc 100644 --- a/text/0000-inline-semantic.md +++ b/text/0000-inline-semantic.md @@ -1133,6 +1133,8 @@ For Rust, however: These drawbacks are the main reason why restriction 3 "debug-info independence" is added to the motivation. +(A debuginfo-based stack trace proposal can be found at [RFC 2154].) + ### `SourceContext` generic parameter Introduced as an [alternative in RFC 1669][source_context], inspired by GHC's implicit parameter: @@ -1178,6 +1180,7 @@ The same drawback exists if we base the solution on [RFC 2000] (*const generics [RFC 2000]: https://github.com/rust-lang/rfcs/pull/2000 [40264]: https://github.com/rust-lang/rust/issues/40264 [RFC 1417]: https://github.com/rust-lang/rfcs/issues/1417 +[RFC 2154]: https://github.com/rust-lang/rfcs/pull/2154 [a]: https://internals.rust-lang.org/t/rfrfc-better-option-result-error-messages/2904 [b]: https://internals.rust-lang.org/t/line-info-for-unwrap-expect/3753 From 97747b3df243d8e12a63a4babe95f1021efc7b97 Mon Sep 17 00:00:00 2001 From: kennytm Date: Sun, 28 Jan 2018 01:33:46 +0800 Subject: [PATCH 11/12] s/blame/track/g --- text/0000-inline-semantic.md | 87 ++++++++++++++++++------------------ 1 file changed, 44 insertions(+), 43 deletions(-) diff --git a/text/0000-inline-semantic.md b/text/0000-inline-semantic.md index b488bc18adc..6e5b8c08c6e 100644 --- a/text/0000-inline-semantic.md +++ b/text/0000-inline-semantic.md @@ -1,4 +1,4 @@ -- Feature Name: `blame_caller` +- Feature Name: `track_caller` - Start Date: 2017-07-31 - RFC PR: (leave this empty) - Rust Issue: (leave this empty) @@ -11,7 +11,7 @@ Enable accurate caller location reporting during panic in `{Option, Result}::{unwrap, expect}` with the following changes: -1. Support the `#[blame_caller]` function attribute, which guarantees a function has access to the +1. Support the `#[track_caller]` function attribute, which guarantees a function has access to the caller information. 2. Add an intrinsic function `caller_location()` (safe wrapper: `Location::caller()`) to retrieve the caller's source location. @@ -19,10 +19,10 @@ the following changes: Example: ```rust -#![feature(blame_caller)] +#![feature(track_caller)] use std::panic::Location; -#[blame_caller] +#[track_caller] fn unwrap(self) -> T { panic!("{}: oh no", Location::caller()); } @@ -37,8 +37,9 @@ let m = n.unwrap(); - [Motivation](#motivation) - [Guide-level explanation](#guide-level-explanation) - [Let's reimplement `unwrap()`](#lets-reimplement-unwrap) - - [Blame the caller](#blame-the-caller) + - [Track the caller](#track-the-caller) - [Location type](#location-type) + - [Propagation of tracker](#propagation-of-tracker) - [Why do we use implicit caller location](#why-do-we-use-implicit-caller-location) - [Reference-level explanation](#reference-level-explanation) - [Survey of panicking standard functions](#survey-of-panicking-standard-functions) @@ -199,7 +200,7 @@ println!("args[1] = {}", my_unwrap!(args().nth(1))); What if you have already published the `my_unwrap` crate that has thousands of users, and you want to maintain API stability? Before Rust 1.XX, the builtin `unwrap()` had the same problem! -## Blame the caller +## Track the caller The reason the `my_unwrap!` macro works is because it copy-and-pastes the entire content of its macro definition every time it is used. @@ -217,13 +218,13 @@ println!("args[1] = {}", my_unwrap(args().nth(2), file!(), line!(), column!())); ``` What if we could instruct the compiler to automatically fill in the file, line, and column? -Rust 1.YY introduced the `#[blame_caller]` attribute for exactly this reason: +Rust 1.YY introduced the `#[track_caller]` attribute for exactly this reason: ```rust // 3.rs -#![feature(blame_caller)] +#![feature(track_caller)] use std::env::args; -#[blame_caller] // <-- Just add this! +#[track_caller] // <-- Just add this! pub fn my_unwrap(input: Option) -> T { match input { Some(t) => t, @@ -261,7 +262,7 @@ args[2] = arg2 args[3] = arg3 ``` -`#[blame_caller]` is an automated version of what you've seen in the last section. The attribute +`#[track_caller]` is an automated version of what you've seen in the last section. The attribute copies `my_unwrap` to a new function `my_unwrap_at_source_location` which accepts the caller's location as an additional argument. The attribute also instructs the compiler to replace `my_unwrap(x)` with `my_unwrap_at_source_location(x, file!(), line!(), column!())` (sort of) @@ -275,7 +276,7 @@ get the caller's location as a value. This is supported using the method `Locati ```rust use std::panic::Location; -#[blame_caller] +#[track_caller] pub fn my_unwrap(input: Option) -> T { match input { Some(t) => t, @@ -288,14 +289,14 @@ pub fn my_unwrap(input: Option) -> T { } ``` -## Propagation of blame +## Propagation of tracker -When your `#[blame_caller]` function calls another `#[blame_caller]` function, the caller location +When your `#[track_caller]` function calls another `#[track_caller]` function, the caller location will be propagated downwards: ```rust use std::panic::Location; -#[blame_caller] +#[track_caller] pub fn my_get_index(input: &[T], index: usize) -> &T { my_unwrap(input.get(index)) // line 4 } @@ -303,15 +304,15 @@ indirectly_unwrap(None); // line 6 ``` When you run this, the panic will refer to line 6, the original caller, instead of line 4 where -`my_get_index` calls `my_unwrap`. When a library function is marked `#[blame_caller]`, it is -expected the function is short, and does not have any logic errors. This allows us to always blame +`my_get_index` calls `my_unwrap`. When a library function is marked `#[track_caller]`, it is +expected the function is short, and does not have any logic errors. This allows us to always track the caller on failure. If a panic that refers to the local location is actually needed, you may workaround by wrapping the -code in a closure which cannot blame the caller: +code in a closure which cannot track the caller: ```rust -#[blame_caller] +#[track_caller] pub fn my_get_index(input: &[T], index: usize) -> &T { (|| { my_unwrap(input.get(index)) @@ -477,20 +478,20 @@ are included. -This RFC only advocates adding the `#[blame_caller]` attribute to the `unwrap` and `expect` +This RFC only advocates adding the `#[track_caller]` attribute to the `unwrap` and `expect` functions. The `index` and `index_mut` functions should also have it if possible, but this is currently postponed as it is not investigated yet how to insert the transformation after monomorphization. ## Procedural attribute macro -The `#[blame_caller]` attribute will modify a function at the AST and MIR levels without touching +The `#[track_caller]` attribute will modify a function at the AST and MIR levels without touching the type-checking (HIR level) or the low-level LLVM passes. It will first wrap the body of the function in a closure, and then call it: ```rust -#[blame_caller] +#[track_caller] fn foo(x: A, y: B, z: C) -> R { bar(x, y) } @@ -578,7 +579,7 @@ currently suffers from all disadvantages of the MIR inliner, namely: * Locations will not be propagated into diverging functions (`fn() -> !`), since inlining them is not supported yet. -* MIR passes are run *before* monomorphization, meaning `#[blame_caller]` currently **cannot** be +* MIR passes are run *before* monomorphization, meaning `#[track_caller]` currently **cannot** be used on trait items: ```rust @@ -586,7 +587,7 @@ trait Trait { fn unwrap(&self); } impl Trait for u64 { - #[blame_caller] //~ ERROR: `#[blame_caller]` is not supported for trait items yet. + #[track_caller] //~ ERROR: `#[track_caller]` is not supported for trait items yet. fn unwrap(&self) {} } ``` @@ -594,7 +595,7 @@ impl Trait for u64 { To support trait items, the redirection pass must be run as post-monomorphized MIR pass (which does not exist yet), or converted to queries provided after resolve, or a custom LLVM inlining pass which can extract the caller's source location. This prevents the `Index` trait from having -`#[blame_caller]` yet. +`#[track_caller]` yet. We cannot hack the impl resolution method into pre-monomorphization MIR pass because of deeply nested functions like @@ -638,12 +639,12 @@ column of the callsite. This shares the same structure as the existing type `std Therefore, the type is promoted to a lang-item, and moved into `core::panicking::Location`. It is re-exported from `libstd`. -Thanks to how `#[blame_caller]` is implemented, we could provide a safe wrapper around the +Thanks to how `#[track_caller]` is implemented, we could provide a safe wrapper around the `caller_location()` intrinsic: ```rust impl<'a> Location<'a> { - #[blame_caller] + #[track_caller] pub fn caller() -> Location<'static> { unsafe { ::intrinsics::caller_location() @@ -653,7 +654,7 @@ impl<'a> Location<'a> { ``` The `panic!` macro is modified to use `Location::caller()` (or the intrinsic directly) so it can -report the caller location inside `#[blame_caller]`. +report the caller location inside `#[track_caller]`. ```rust macro_rules! panic { @@ -669,14 +670,14 @@ Actually this is now more natural for `core::panicking::panic_fmt` to take `Loca instead of tuples, so one should consider changing their signature, but this is out-of-scope for this RFC. -`panic!` is often used outside of `#[blame_caller]` functions. In those cases, the +`panic!` is often used outside of `#[track_caller]` functions. In those cases, the `caller_location()` intrinsic will pass unchanged through all MIR passes into trans. As a fallback, the intrinsic will expand to `Location { file: file!(), line: line!(), col: column!() }` during trans. ## “My fault” vs “Your fault” -In a `#[blame_caller]` function, we expect all panics being attributed to the caller (thus the +In a `#[track_caller]` function, we expect all panics being attributed to the caller (thus the attribute name). However, sometimes the code panics not due to the caller, but the implementation itself. It may be important to distinguish between "my fault" (implementation error) and "your fault" (caller violating API requirement). As an example, @@ -743,7 +744,7 @@ fn index(&self, range: ops::RangeFrom) -> &Wtf8 { } ``` -If they all get `#[blame_caller]`, the `x[y]`, `expect()` and `slice_error_fail()` should all report +If they all get `#[track_caller]`, the `x[y]`, `expect()` and `slice_error_fail()` should all report "your fault", i.e. caller location should be propagated downstream. It does mean that the current default of caller-location-propagation-by-default is more common. This also means "my fault" happening during development may become harder to spot. This can be solved using `RUST_BACKTRACE=1`, @@ -753,7 +754,7 @@ or workaround by splitting into two functions: use std::collections::HashMap; use std::hash::Hash; -#[blame_caller] +#[track_caller] fn count_slices(array: &[T], window: usize) -> HashMap<&[T], usize> { if !(0 < window && window <= array.len()) { panic!("invalid window size"); // <-- your fault @@ -772,7 +773,7 @@ fn count_slices(array: &[T], window: usize) -> HashMap<&[T], usize } ``` -Anyway, treating everything as "your fault" will encourage that `#[blame_caller]` functions should +Anyway, treating everything as "your fault" will encourage that `#[track_caller]` functions should be short, which goes in line with the ["must have" list](#survey-of-panicking-standard-functions) in the RFC. Thus the RFC will remain advocating for propagating caller location implicitly. @@ -831,7 +832,7 @@ One can use `-Z location-detail` to get the old optimization behavior. ## Narrow solution scope -`#[blame_caller]` is only useful in solving the "get caller location" problem. Introducing an +`#[track_caller]` is only useful in solving the "get caller location" problem. Introducing an entirely new feature just for this problem seems wasteful. [Default function arguments](#default-function-arguments) is another possible solution for this @@ -843,7 +844,7 @@ Consts, statics and closures are separate MIR items, meaning the following marke get caller locations: ```rust -#[blame_caller] +#[track_caller] fn foo() { static S: Location = Location::caller(); // will get actual location instead let f = || Location::caller(); // will get actual location instead @@ -854,7 +855,7 @@ fn foo() { This is confusing, but if we don't support this, we will need two `panic!` macros which is not a better solution. -Clippy could provide a lint against using `Location::caller()` outside of `#[blame_caller]`. +Clippy could provide a lint against using `Location::caller()` outside of `#[track_caller]`. # Rationale and alternatives [rationale-and-alternatives]: #rationale-and-alternatives @@ -881,14 +882,14 @@ This RFC tries to abide by the following restrictions: location. Restriction 4 "interface independence" is currently not implemented due to lack of -post-monomorphized MIR pass, but implementing `#[blame_caller]` as a language feature follows this +post-monomorphized MIR pass, but implementing `#[track_caller]` as a language feature follows this restriction. ## Alternatives ### 🚲 Name of everything 🚲 -* Is `#[blame_caller]` an accurate description? +* Is `#[track_caller]` an accurate description? * Should we move `std::panic::Location` into `core`, or just use a 3-tuple to represent the location? Note that the former is advocated in [RFC 2070]. * Is `Location::caller()` properly named? @@ -918,7 +919,7 @@ parameter, making it less competitive than just using an attribute. We could change the meaning of `file!()`, `line!()` and `column!()` so they are only converted to real constants after redirection (a MIR or trans pass) instead of early during macro expansion (an -AST pass). Inside `#[blame_caller]` functions, these macros behave as this RFC's +AST pass). Inside `#[track_caller]` functions, these macros behave as this RFC's `caller_location()`. The drawback is using these macro will have different values at compile time (e.g. inside `include!(file!())`) vs. runtime. @@ -928,7 +929,7 @@ Introduced as an [alternative to RFC 1669][inline_mir], instead of the `caller_l we could provide a full-fledged inline MIR macro `mir!` similar to the inline assembler: ```rust -#[blame_caller] +#[track_caller] fn unwrap(self) -> T { let file: &'static str; let line: u32; @@ -952,7 +953,7 @@ fn unwrap(self) -> T { The problem of `mir!` in this context is trying to kill a fly with a sledgehammer. `mir!` is a very generic mechanism which requires stabilizing the MIR syntax and considering the interaction with -the surrounding code. Besides, `#[blame_caller]` itself still exists and the magic constants +the surrounding code. Besides, `#[track_caller]` itself still exists and the magic constants `$CallerFile` etc are still magical. ### Default function arguments @@ -1002,7 +1003,7 @@ this feature itself is going to be large and controversial. ### Semantic inlining -Treat `#[blame_caller]` as the same as a very forceful `#[inline(always)]`. This eliminates the +Treat `#[track_caller]` as the same as a very forceful `#[inline(always)]`. This eliminates the procedural macro pass. This was the approach suggested in the first edition of this RFC, since the target functions (`unwrap`, `expect`, `index`) are just a few lines long. However, it experienced push-back from the community as: @@ -1157,9 +1158,9 @@ The same drawback exists if we base the solution on [RFC 2000] (*const generics # Unresolved questions [unresolved]: #unresolved-questions -* If we want to support adding `#[blame_caller]` to trait methods, the redirection +* If we want to support adding `#[track_caller]` to trait methods, the redirection pass/query/whatever should be placed after monomorphization, not before. Currently the RFC - simply prohibit applying `#[blame_caller]` to trait methods as a future-proofing measure. + simply prohibit applying `#[track_caller]` to trait methods as a future-proofing measure. * Diverging functions should be supported. From dd6febab6c0c385a1b50045996f130db8f61a8bf Mon Sep 17 00:00:00 2001 From: Aaron Turon Date: Sat, 27 Jan 2018 09:46:37 -0800 Subject: [PATCH 12/12] RFC 2091: implicit caller location --- text/{0000-inline-semantic.md => 2019-inline-semantic.md} | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) rename text/{0000-inline-semantic.md => 2019-inline-semantic.md} (99%) diff --git a/text/0000-inline-semantic.md b/text/2019-inline-semantic.md similarity index 99% rename from text/0000-inline-semantic.md rename to text/2019-inline-semantic.md index 6e5b8c08c6e..f5af59de1f7 100644 --- a/text/0000-inline-semantic.md +++ b/text/2019-inline-semantic.md @@ -1,7 +1,7 @@ - Feature Name: `track_caller` - Start Date: 2017-07-31 -- RFC PR: (leave this empty) -- Rust Issue: (leave this empty) +- RFC PR: https://github.com/rust-lang/rfcs/pull/2091 +- Rust Issue: https://github.com/rust-lang/rust/issues/47809 ----