Deprecate Read::initializer in favor of ptr::freeze #58363

Open

wants to merge 11 commits into base: master
Conversation

@sfackler
Member

sfackler commented Feb 10, 2019

Read implementations should only write into the buffer passed to them,
but have the ability to read from it. Access of uninitialized memory can
easily cause UB, so there's then a question of what a user of a reader
should do to initialize buffers.

Previously, we allowed a Read implementation to promise it wouldn't look
at the contents of the buffer, which allows the user to pass
uninitialized memory to it.

Instead, this PR adds a method to "freeze" undefined bytes into
arbitrary-but-defined bytes. This is currently done via an inline
assembly directive noting the address as an output, so LLVM no longer
knows it's uninitialized. There is a proposed "freeze" operation in LLVM
itself that would do this directly, but it hasn't been fully
implemented.

Some targets don't support inline assembly, so there we instead pass the
pointer to an extern "C" function, which is similarly opaque to LLVM.

The current approach is very low level. If we stabilize, we'll probably
want to add something like slice.freeze() to make this easier to use.
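The asm-based trick described above can be sketched as follows. This is an illustrative approximation in today's `asm!` syntax, not the PR's actual implementation: an empty inline-assembly block that claims the pointer as an operand, so LLVM can no longer prove the bytes behind it are uninitialized.

```rust
use std::arch::asm;

/// Sketch of `freeze`: convert uninitialized bytes behind `dst` into
/// arbitrary-but-fixed bytes by making the pointer opaque to the optimizer.
/// `_count` is unused here because the barrier is all-or-nothing, matching
/// the limitation discussed below.
pub unsafe fn freeze<T>(dst: *mut T, _count: usize) {
    // The template only mentions the pointer inside an assembly comment;
    // no instructions are emitted, but since the asm block may read and
    // write memory, the optimizer must assume the pointed-to bytes now
    // hold some fixed (if unknown) value.
    asm!("/* {0} */", in(reg) dst, options(nostack, preserves_flags));
}
```

Since freeze only turns uninitialized bytes into arbitrary fixed ones, it is observably the identity function on memory that was already initialized.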

r? @alexcrichton

@sfackler

Member Author

sfackler commented Feb 10, 2019

This doesn't currently work on asmjs and wasm, since those don't support inline assembly. What's the right way to link an extern function to libcore? It doesn't seem like we currently do anything like that.

/// ```
#[inline]
#[unstable(feature = "ptr_freeze", issue = "0")]
pub unsafe fn freeze<T>(dst: *mut T, count: usize) {


@sfackler

sfackler Feb 10, 2019

Author Member

Is this the right interface? It's currently a bit weird in that we don't actually use the count. It could alternatively just take the pointer, and say that it freezes all memory reachable through it?


@sfackler

sfackler Feb 10, 2019

Author Member

Also, should this be unsafe in the first place? Since it's not actually modifying any of the pointed-to data, does it matter if it's valid or not?


@alexcrichton

alexcrichton Feb 11, 2019

Member

Since it's already "basically stable", I wonder if this should take &mut [T] and move to std::mem?

I'd naively think that it could be safe and probably should be, but I'm not an expert!

We also briefly discussed maybe only taking u8 for now? I'm not sure how useful this would be beyond u8 and other POD types
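A hedged sketch of the safe `&mut [u8]` shape floated here (the name `freeze_bytes` is invented, and `std::hint::black_box` stands in as an optimizer barrier loosely analogous to the PR's asm trick; it carries no real freeze guarantee):

```rust
use std::hint::black_box;

// Hypothetical safe wrapper: restricting to plain bytes avoids raw
// pointers entirely, and every bit pattern of u8 is a valid value.
pub fn freeze_bytes(buf: &mut [u8]) {
    // Pass the pointer through an opacity barrier so the optimizer
    // cannot track whether the bytes behind it are initialized.
    black_box(buf.as_mut_ptr());
}
```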


@RalfJung

RalfJung Feb 11, 2019

Member

I think it should take raw pointers so people don't have to create references to uninitialized data.

count being unused just comes from the fact that LLVM does not support "real" freeze, but I think this is a much better interface than "reachable from".


@Amanieu

Amanieu Feb 12, 2019

Contributor

Could we change this to T: ?Sized so that you can pass a slice in? Then the count parameter would no longer be necessary.


@rkruppe

rkruppe Feb 12, 2019

Member

AFAIK there's currently no way to form a *mut [T] without going through a reference first.

@rust-highfive

Collaborator

rust-highfive commented Feb 10, 2019

The job x86_64-gnu-llvm-6.0 of your PR failed on Travis (raw log). Through arcane magic we have determined that the following fragments from the build log may contain information about the problem.

tidy check
[00:04:00] tidy error: /checkout/src/libcore/ptr.rs:972: unexplained "```ignore" doctest; try one:
[00:04:00] 
[00:04:00] * make the test actually pass, by adding necessary imports and declarations, or
[00:04:00] * use "```text", if the code is not Rust code, or
[00:04:00] * use "```compile_fail,Ennnn", if the code is expected to fail at compile time, or
[00:04:00] * use "```should_panic", if the code is expected to fail at run time, or
[00:04:00] * use "```no_run", if the code should type-check but not necessary linkable/runnable, or
[00:04:00] * explain it like "```ignore (cannot-test-this-because-xxxx)", if the annotation cannot be avoided.
[00:04:00] 
[00:04:02] some tidy checks failed
[00:04:02] 
[00:04:02] 
[00:04:02] 
[00:04:02] command did not execute successfully: "/checkout/obj/build/x86_64-unknown-linux-gnu/stage0-tools-bin/tidy" "/checkout/src" "/checkout/obj/build/x86_64-unknown-linux-gnu/stage0/bin/cargo" "--no-vendor" "--quiet"
[00:04:02] 
[00:04:02] 
[00:04:02] failed to run: /checkout/obj/build/bootstrap/debug/bootstrap test src/tools/tidy
[00:04:02] Build completed unsuccessfully in 0:00:47
[00:04:02] Makefile:68: recipe for target 'tidy' failed
[00:04:02] make: *** [tidy] Error 1
The command "stamp sh -x -c "$RUN_SCRIPT"" exited with 2.
I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact @TimNN. (Feature Requests)

Deprecate Read::initializer in favor of ptr::freeze

@sfackler sfackler force-pushed the sfackler:buffer-freeze branch from d92521f to 5e0fb23 Feb 10, 2019

/// // We're passing this buffer to an arbitrary reader and aren't
/// // guaranteed they won't read from it, so freeze to avoid UB.
/// let mut buf: [u8; 4] = mem::uninitialized();
/// ptr::freeze(&mut buf, 1);


@alexcrichton

alexcrichton Feb 11, 2019

Member

Should this use buf.as_mut_ptr() and 4?


@sfackler

sfackler Feb 11, 2019

Author Member

I don't think it really matters either way - we're either freezing a single [u8; 4] value, or 4 u8 values.


@alexcrichton

alexcrichton Feb 12, 2019

Member

Oh sure, I just figured it was a bit odd compared to how we'd expect it to idiomatically be used

@alexcrichton

Member

alexcrichton commented Feb 11, 2019

I think one thing that might be good to add here as well is a few tests that exercise ptr::freeze in either codegen or run-pass tests. We want to basically make sure that undef doesn't show up in LLVM IR I think.

@petrochenkov

Contributor

petrochenkov commented Feb 11, 2019

@RalfJung

Member

RalfJung commented Feb 11, 2019

Cc @nagisa @rkruppe with whom I had a brief chat about this on Zulip the other day.

@RalfJung

Member

RalfJung commented Feb 11, 2019

Also, do we have some plan for how to let Miri do this? Miri can actually meaningfully take count into account. However, it would have to recognize this function as special and intercept it, or it will bail on the inline assembly. What is a good way to do that? An intrinsic? Cc @oli-obk

@Centril
Contributor

Centril left a comment

Nits :)

/// Uninitialized memory has undefined contents, and interation with that data
/// can easily cause undefined behavior. This function "freezes" memory
/// contents, converting uninitialized memory to initialized memory with
/// arbitrary conents so that use of it is well defined.


@Centril

Centril Feb 11, 2019

Contributor
Suggested change:
- /// arbitrary conents so that use of it is well defined.
+ /// arbitrary contents so that use of it is well defined.


@RalfJung

RalfJung Feb 11, 2019

Member

"arbitrary but fixed contents" might be a better formulation.

Also it might be worth noting that use is only well defined for integer type -- even with arbitrary but fixed contents, using this with bool or &T is UB.


@Centril

Centril Feb 11, 2019

Contributor

Also it might be worth noting that use is only well defined for integer type -- even with arbitrary but fixed contents, using this with bool or &T is UB.

That's a great point! Definitely worth noting.


@sfackler

sfackler Feb 11, 2019

Author Member

I have this phrased as "Every bit representation of T must be a valid value", but I don't think that's the best way of saying that. Ideas?


@Centril

Centril Feb 11, 2019

Contributor

I have this phrased as "Every bit representation of T must be a valid value",

Is there perhaps an auto-trait waiting to be invented for that? We could ostensibly make the API a bit safer by adding a constraint T: TheTrait...?


@sfackler

sfackler Feb 11, 2019

Author Member

There's been talk of this kind of thing for quite a while (pub auto trait Pod {}), but I think that'd be relevant as part of a safe interface over this specific function.


@RalfJung

RalfJung Feb 11, 2019

Member

I don't think this can be an auto trait -- auto traits are always implemented for fieldless enums, but this function is not sufficient to make a fieldless enum valid.

///
/// * `dst` must be [valid] for reads.
///
/// Note that even if `T` has size `0`, the pointer must be non-NULL and properly aligned.


@Centril

Centril Feb 11, 2019

Contributor
Suggested change:
- /// Note that even if `T` has size `0`, the pointer must be non-NULL and properly aligned.
+ /// Note that even if `size_of::<T>() == 0`, the pointer must be non-NULL and properly aligned.


@RalfJung

RalfJung Feb 11, 2019

Member

Note that this formulation is used consistently throughout this file, so I'd prefer that if it gets changed that happens in a separate PR.


@Centril

Centril Feb 11, 2019

Contributor

Ah; that's a good idea; I'll see if I can remember to make such a PR... :)

@oli-obk

Contributor

oli-obk commented Feb 11, 2019

Either an intrinsic or a lang item. I think an intrinsic is the Right Thing here, because it would allow the compiler to choose which magic to apply. So on some platforms the intrinsic lowering would emit assembly, on others a call to the opaque extern function.

Although I can believe that writing this with cfg in pure Rust is easier (like done by this PR). If that is the preferred way, we can just make the function a lang item and thus miri knows when it's calling it and can intercept the call.

Update src/libcore/ptr.rs
Co-Authored-By: sfackler <sfackler@gmail.com>
@nagisa

Contributor

nagisa commented Feb 11, 2019

This doesn't currently work on asmjs and wasm

A cursory look at the LLVM code reveals that an assembly parser exists, which would suggest that wasm does in fact support asm!. And indeed it does :)

@nagisa

Contributor

nagisa commented Feb 11, 2019

I think an intrinsic is the Right Thing here, because it would allow the compiler to choose which magic to apply.

It is also the right thing given that freeze may eventually become an LLVM instruction, but the implementation is also something that can be changed in the future. Nevertheless, there is value in doing things the right way the first time 'round :)

@sfackler

Member Author

sfackler commented Feb 11, 2019

A cursory look at the LLVM code reveals that an assembly parser exists, which would suggest that wasm does in fact support asm!. And indeed it does :)

Oh, great! I was just guessing off the fact that test::black_box is cfg'd off on those platforms.

I'll update it to an intrinsic tonight.

@@ -946,6 +946,58 @@ pub unsafe fn write_volatile<T>(dst: *mut T, src: T) {
intrinsics::volatile_store(dst, src);
}

/// Freezes `count * size_of::<T>()` bytes of memory, converting undefined data into


@RalfJung

RalfJung Feb 12, 2019

Member

I think this is the first time we talk about this kind of data in the docs. I usually call it "uninitialized data" as I feel that is easier to understand. It also is further away from LLVM's undef, which is good -- Rust's "uninitialized data" is much more like poison than undef.

/// arbitrary contents so that use of it is well defined.
///
/// This function has no runtime effect; it is purely an instruction to the
/// compiler. In particular, it does not actually write anything to the memory.


@RalfJung

RalfJung Feb 12, 2019

Member

This function does have an effect in the "Rust abstract machine" though, not just in the compiler. And of course it has a run-time effect by inhibiting optimizations.

Maybe a comparison with a compiler fence helps? Those also clearly have an effect on runtime behavior even though they do not have a runtime effect themselves.


@RalfJung

RalfJung Feb 12, 2019

Member

Oh, also I think we should be clear about this counting as a write access as far as mutability and data races are concerned.

Doing this on a shared reference is UB.

/// unsafe {
/// // We're passing this buffer to an arbitrary reader and aren't
/// // guaranteed they won't read from it, so freeze to avoid UB.
/// let mut buf: [u8; 4] = mem::uninitialized();


@RalfJung

RalfJung Feb 12, 2019

Member

Could you add a FIXME somewhere about porting this to MaybeUninit?


@RalfJung

RalfJung Feb 16, 2019

Member

(This is still open, from what I can see)

@RalfJung

Member

RalfJung commented Feb 12, 2019

Could you add to the PR description an explanation of why you want to move away from the now-deprecated scheme, or add a link to where this was documented?

@RalfJung

Member

RalfJung commented Feb 12, 2019

The docs say

This function has no runtime effect

I think this should be clearly marked as a detail of the current implementation. The specified behavior of this function is to freeze all uninitialized memory and keep initialized memory unchanged. How that is achieved and whether this has any run-time cost is up to the implementation.

I think this is like black_box, where the programmer may only rely on it being the identity function but implementations are encouraged to use this to inhibit optimizations.

@stjepang

Contributor

stjepang commented Feb 12, 2019

How does everyone feel about adding a wrapper struct akin to MaybeUninit, ManuallyDrop, and UnsafeCell? So something like this:

pub struct Frozen<T>(T);

impl<T> Frozen<T> {
    pub fn new(t: T) -> Frozen<T>;
    pub fn into_inner(this: Frozen<T>) -> T;
}

impl<T: ?Sized> Frozen<T> {
    pub fn from_mut(t: &mut T) -> &mut Frozen<T>;
}

impl<T: ?Sized> Deref for Frozen<T> {
    type Target = T;
}
impl<T: ?Sized> DerefMut for Frozen<T> {}
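A compilable version of this sketch might look like the following (the field layout and impl bodies are assumed; `from_mut` is omitted because it would additionally require `#[repr(transparent)]` and an unsafe pointer cast):

```rust
use std::ops::{Deref, DerefMut};

// Hypothetical wrapper type, fleshing out the API sketch above.
pub struct Frozen<T: ?Sized>(T);

impl<T> Frozen<T> {
    pub fn new(t: T) -> Frozen<T> {
        Frozen(t)
    }
    // Associated function (not a method) to avoid shadowing Deref targets,
    // following the ManuallyDrop/MaybeUninit convention.
    pub fn into_inner(this: Frozen<T>) -> T {
        this.0
    }
}

impl<T: ?Sized> Deref for Frozen<T> {
    type Target = T;
    fn deref(&self) -> &T {
        &self.0
    }
}

impl<T: ?Sized> DerefMut for Frozen<T> {
    fn deref_mut(&mut self) -> &mut T {
        &mut self.0
    }
}
```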
Update src/libcore/ptr.rs
Co-Authored-By: sfackler <sfackler@gmail.com>

shepmaster and others added some commits Feb 13, 2019

Update src/libcore/ptr.rs
Co-Authored-By: sfackler <sfackler@gmail.com>

Update src/libcore/intrinsics.rs
Co-Authored-By: sfackler <sfackler@gmail.com>

@sfackler sfackler changed the title WIP: Deprecate Read::initializer in favor of ptr::freeze Deprecate Read::initializer in favor of ptr::freeze Feb 14, 2019

/// Freezes `count * size_of::<T>()` bytes of memory, converting uninitialized data into
/// arbitrary but fixed data.
///
/// Uninitialized memory has undefined contents, and interaction with those contents


@RalfJung

RalfJung Feb 16, 2019

Member

In #58468, I use terminology like "uninitialized memory does not have a fixed value/content", to try and distinguish it from memory that contains unknown but fixed data. Do you think it would make sense to also use such terminology here, to explain why interactions can cause UB?

///
/// * `dst` must be [valid] for writes.
///
/// * Every bit representation of `T` must be a valid value.


@RalfJung

RalfJung Feb 16, 2019

Member

Why is that a precondition to freeze? freeze does not assert validity of anything.

freeze on a *mut bool is not UB. Just dereferencing the pointer later is.


@sfackler

sfackler Feb 16, 2019

Author Member

I thought that's what you meant in this comment: #58363 (comment). If the validity requirements are less strict I can move this to be a more general warning about use of the function rather than a hard UB constraint.


@Centril

Centril Feb 16, 2019

Contributor

The phrasing here seems congruent with the behavior of freeze::<*mut bool>(...) tho. The value is the pointer (*mut bool), not the pointee (bool).


@RalfJung

RalfJung Feb 16, 2019

Member

The validity requirements come in once you actually operate on or make a copy of a value at some type. So the following is UB:

let mut x = MaybeUninit::<bool>::uninitialized();
ptr::freeze(x.as_mut_ptr(), 1);
let x = x.into_initialized(); // UB

But it's not the freeze that is UB, it's the into_initialized!


@Centril

Centril Feb 16, 2019

Contributor

@RalfJung Oh... you're talking about *mut bool as the type of the first argument, not T == *mut bool... i.e. ptr::freeze<bool>(x.as_mut_ptr(): *mut bool) rather than ptr::freeze<*mut bool>(??: *mut *mut bool). As you have phrased it with x.into_initialized() being UB, this seems more of a post-condition?

@RalfJung

Member

RalfJung commented Feb 16, 2019

Aside from the nits and the asm block question, this seems fine to me.

But I have one procedural question for @rust-lang/lang: IMO this is a fairly fundamental addition to the set of operations that a Rust program can perform in the Rust abstract machine, in particular considering the consequences for reading previously deallocated memory. In particular, there is a (closed) RFC by @aturon that is in direct contradiction with what this operation enables. So I'd prefer if an RFC for just the existence of freeze was opened before I r+ this -- and obviously, the RFC blocks freeze from being stabilized. Does that sound reasonable?

@Centril

Contributor

Centril commented Feb 16, 2019

I think a T-libs + T-lang RFC sounds good in this case tho it doesn't have to be done before you r+.

@sfackler

Member Author

sfackler commented Feb 16, 2019

Sure, I'll write up an RFC today.

@eddyb

Member

eddyb commented Feb 17, 2019

I am a bit surprised this is pointer-based, it sounded like LLVM's "freeze" will be an SSA operation taking a value and producing another.
But in Rust a function with that signature would require the input to be valid, so we'd have to have a "frozen" constructor instead and that doesn't work with slices.

Regardless of that, I hope there won't ever be a safe way to get a frozen uninitialized value, even of so-called "POD" types, for the information leak reasons mentioned above.

@RalfJung

Member

RalfJung commented Feb 17, 2019

Regardless of that, I hope there won't ever be a safe way to get a frozen uninitialized value, even of so-called "POD" types, for the information leak reasons mentioned above.

Well, by our definition of memory safety, the following function is sound and safe:

fn get_some_data() -> u128 {
    unsafe {
        let mut x = MaybeUninit::<u128>::uninitialized();
        ptr::freeze(x.as_mut_ptr(), 1);
        x.into_initialized()
    }
}
@RalfJung

Member

RalfJung commented Feb 17, 2019

Btw, this also implies that freeze will never be a const fn. CTFE cannot be non-deterministic.

Well I guess we could make it initialize to 0 or so, but... yuk. From a UB catching perspective, that's extremely incomplete.

@Centril

This comment has been minimized.

Copy link
Contributor

Centril commented Feb 17, 2019

Well, by our definition of memory safety, the following function is sound and safe:

(This might be a distinction between having the safe function in the standard library vs. not...)

Well I guess we could make it initialize to 0 or so, but... yuk.

We could do that in CTFE but presumably not at run-time so they may give different results? and that's bad...

@RalfJung

Member

RalfJung commented Feb 17, 2019

We could do that in CTFE but presumably not at run-time so they may give different results? and that's bad...

Yeah, there's no way we can reproduce the non-deterministic run-time behavior.

@cramertj

Member

cramertj commented Feb 20, 2019

@RalfJung

Btw, this also implies that freeze will never be a const fn. CTFE cannot be non-deterministic.

This seems wrong to me-- I'd expect that it'd initialize to zero or something similar.

from a UB catching perspective, that's extremely incomplete.

What sort of UB are you hoping to catch here? IIUC reading/writing freeze values is completely sound.

@eternaleye

Contributor

eternaleye commented Feb 20, 2019

@cramertj: Consider if a user offsets a pointer by a frozen value. Zero is the least likely value to invoke UB, artificially hiding risk.

@cramertj

Member

cramertj commented Feb 20, 2019

@eternaleye If the problem is literally just with the number zero, that seems solvable by making it 42 / 24601 / funny arbitrary-but-not-random-number-of-choice.

@eternaleye

Contributor

eternaleye commented Feb 20, 2019

@cramertj: The problem is that any fixed choice, precisely by failing to capture the nondeterminism, artificially hides risk. ~0 hides risk in BitAnd, and other values hide risk in other circumstances.

The source of danger is telling the compiler it's allowed to be certain, when the problem is that there may be uncertainty.

EDIT: Nasty case:

const fn foo(x: u64) -> u64 {
    ...
    ptr::freeze(...);
    ...
}

#[test]
fn check_foo() {
    assert!(foo(3) == 7)
}

This test silently became meaningless regarding runtime behavior, when the argument to foo is not a constant.

@Centril

Contributor

Centril commented Feb 20, 2019

This seems wrong to me-- I'd expect that it'd initialize to zero or something similar.

@cramertj That might work during CTFE, but does freeze(args...) do that, for the same args..., when executed at run-time?

@cramertj

Member

cramertj commented Feb 20, 2019

@Centril From your question I take it you view referential transparency of const fns run at runtime as a goal. Can you say more about why that's important to you? Do you know of usecases that it's needed for? I'd been (perhaps mistakenly) taking it as something of a given that we'd expose const fns that wouldn't be referentially transparent at runtime.

@oli-obk

Contributor

oli-obk commented Feb 20, 2019

Do you know of usecases that it's needed for?

One problem that I can foresee is optimizations changing the behavior of code by const evaluating a call to a const fn. If that const fn has different output at runtime than at compile-time, such an optimization will result in behavioral changes. In the worst case, an array length check could be const evaluated, while the access is not, causing a wrong check to be optimized out.

That said, we already have such things as unstable const eval features (e.g. comparing pointers or casting pointers to usize). We marked these operations as unsafe to show that it's UB to use them in ways that cause different output with the same input depending on whether the function is const evaluated or not.

So we can just mark the freeze function as unsafe in const contexts.
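The hazard described above can be made concrete with a sketch (`bar` and `read` are invented names; a truly non-CTFE-correct `bar` is not expressible in real Rust, which is exactly the property at stake, so a deterministic stand-in is used):

```rust
// If `bar` could return different values at compile time and at run time,
// an optimizer that const-folds the assert below using the compile-time
// answer, while the indexing still uses the run-time answer, would delete
// a load-bearing bounds check.
const fn bar(n: usize) -> usize {
    n % 4 // deterministic stand-in; CTFE-correct by construction
}

fn read(table: &[u8; 4], n: usize) -> u8 {
    let i = bar(n);
    assert!(i < table.len()); // sound to const-fold only because bar is deterministic
    table[i]
}
```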

@RalfJung

Member

RalfJung commented Feb 20, 2019

Btw, this also implies that freeze will never be a const fn. CTFE cannot be non-deterministic.

This seems wrong to me-- I'd expect that it'd initialize to zero or something similar.

But then CTFE could yield a different result than run-time code, which is confusing at best.

If that const fn has different output at runtime than at compile-time, such an optimization will result in behavioral changes. In the worst case, an array length check could be const evaluated, while the access is not, causing a wrong check to be optimized out.

Well, as long as the behavior at compile-time is one of the possible behaviors at run-time, such an optimization is still correct. If we say "freeze is non-deterministic", and then some freeze calls use 0 for uninitialized data and others do not, that is a perfectly correct optimization.

@Centril

Contributor

Centril commented Feb 21, 2019

The following is what I think on const fns and determinism at run-time... if you want to discuss the subject further, let's do it somewhere else to not derail this PR too much. For now, we should not make freeze a const unsafe fn.


@Centril From your question I take it you view referential transparency of const fns run at runtime as a goal.

I think referential transparency is nice, but it is a stronger property than I think we can get away with, at least for polymorphic functions, due to the pervasiveness of &mut T in Rust. For example, we presumably want to make Iterator work in const contexts and so we must allow &mut self. See #57349 for an issue tracking &mut in const fn. In the absence of reachable &mut Ts in function arguments, referential transparency is retained and so you can at minimum ensure this for monomorphic functions. Given the loss of parametricity due to specialization, it becomes harder to ensure this for polymorphic functions, but I think you can still do so with a bound on type variables that rule out &mut T.

Instead of a type system guarantee of referential transparency, my goal is a weaker determinism property roughly as outlined by @RalfJung in their thoughts on compile-time function evaluation and type systems. Particularly, I think we should aspire to "CTFE correctness" in Ralf's terminology.

Can you say more about why that's important to you?

For the same reason as outlined by Ralf in the post and here (#58363 (comment)). It would be hugely surprising if execution is deterministic at compile-time but not at run-time. Moreover, I believe that the determinism of const fn represents an opportunity to give people tools to write more robust and maintainable software by being able to restrict power. Setting boundaries where you can divide code up into "functional cores" (pure stuff) and "imperative shells" (io) enhances local reasoning.

I don't think we can bear the complexity cost of another almost-const-fn-but-not and so const fn will have to be it if we are to have such controls. Hitherto I've also not seen use-cases that are significant enough to change anything.

Do you know of usecases that it's needed for?

One could imagine situations where relying on determinism at run-time is important.

If we move beyond const-generics and allow types to depend on run-time computations (this is not something that is on our roadmaps, but it would be sad to take steps to rule out such a long-term future...), e.g.

fn foo(n: usize) {
    let arr: [u8; dyn n] = ...;
}

we cannot say, given const fn bar(n: usize) -> usize which isn't ctfe-correct, that [u8; dyn bar(n)] is the same type as [u8; dyn bar(n)]. Moreover, it would be unsound to encode:

-- This is fine, we can already fake this today in Rust:
data (=) : a -> b -> Type where
   Refl : x = x

-- With non-determinism, `f` may give varying results for equal inputs
-- and so it would be unsound to claim that `f a = f b`.
cong : {f : t -> u} -> a = b -> f a = f b

We do not need β-reduction to be strongly normalizing for such computations to be fine if const fns are ctfe-correct.
