Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Running Miri on Playground does not error when printing non-UTF-8 output #1309

Closed
elichai opened this issue Apr 6, 2020 · 12 comments
Closed
Labels
C-support Category: Not necessarily a bug, but someone asking for support

Comments

@elichai
Copy link
Contributor

elichai commented Apr 6, 2020

So it seems that somehow miri bypasses the the check for UTF8 validity when printing a str

https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=393c2e1fe450d4a03e7a1d293d54d04e

It's a bit too late here for me to debug where it's happening in libstd right now.

@elichai elichai changed the title UB code errors on when compiled and ran but works in miri UB code errors when ran but works fine in miri Apr 6, 2020
@RalfJung RalfJung added the C-support Category: Not necessarily a bug, but someone asking for support label Apr 7, 2020
@RalfJung
Copy link
Member

RalfJung commented Apr 7, 2020

Hm, that is curious.

First of all, I don't think this code has UB, at least not UB that Miri currently is designed to find. The only possible UB here is "creating a non-UTF8 &str", but that is invalid data behind a referece, and as the README states:

In particular, Miri does currently not check that integers/floats are initialized or that references point to valid data.

Moreover, the UTF-8 condition is not something the compiler actually exploits currently. Also see rust-lang/unsafe-code-guidelines#78; the general consensus was that we should just remove the UTF-8 clause from the language spec and make it purely a case of library UB. (In other words, UTF-8 would be a safety invariant, but not a validity invariant.)


Now, to the error you are seeing when pressing "Run" above. I think that error is actually coming from the playground:

Execution operation failed: Output was not valid UTF-8: invalid utf-8 sequence of 1 bytes from index 167

@shepmaster, could you confirm?
If so, then the question is rather why the Miri playground integration does not get the same error. (I am not sure if I would call this a bug, but it is curious.)

@elichai did you try running this example locally with rustc, to exclude playground effects?

@RalfJung RalfJung changed the title UB code errors when ran but works fine in miri Running Miri on Playground does not error when printing non-UTF-8 output Apr 7, 2020
@elichai
Copy link
Contributor Author

elichai commented Apr 7, 2020

@elichai did you try running this example locally with rustc, to exclude playground effects?

I did not know that "playground effects" were a thing, how/what are these?
Just checked locally and you're right, I don't get any error locally.
[src/main.rs:3] s = "2"

@RalfJung
Copy link
Member

RalfJung commented Apr 7, 2020

I did not know that "playground effects" were a thing, how/what are these?

I did not know either. But the output looked different from when the program itself prints something, where we also get

   Compiling playground v0.0.1 (/playground)
    Finished dev [unoptimized + debuginfo] target(s) in 0.40s
     Running `target/debug/playground`

so I concluded it wasn't the program printing this, but playground. Which also made more sense because the error message did not at all sound like UB, and if this was a regular assertion inside the program then Miri would also evaluate it.

@shepmaster
Copy link
Member

@RalfJung
Copy link
Member

RalfJung commented Apr 7, 2020

Ah yes, looks like it's that (though curiously that bug seems to not affect Miri output).

Closing as duplicate of rust-lang/rust-playground#386 then.

@RalfJung RalfJung closed this as completed Apr 7, 2020
@shepmaster
Copy link
Member

There's still some difference in how Miri treats this code that seems suspicious:

% xxd miri
00000000: 5b73 7263 2f6d 6169 6e2e 7273 3a33 5d20  [src/main.rs:3]
00000010: 7320 3d20 22ef bfbd 3222 0a              s = "...2".

% xxd run
00000000: 5b73 7263 2f6d 6169 6e2e 7273 3a33 5d20  [src/main.rs:3]
00000010: 7320 3d20 228b 3222 0a                   s = ".2".

Shouldn't Miri output the same as cargo run?

@RalfJung
Copy link
Member

RalfJung commented Apr 7, 2020

Hm, strange. Looks like "8b" turned into "ef bf bd", whatever that means.

The Miri code that does the printing is here:

let buf_cont = this.memory.read_bytes(buf, Size::from_bytes(n))?;
// We need to flush to make sure this actually appears on the screen
let res = if fd == 1 {
// Stdout is buffered, flush to make sure it appears on the screen.
// This is the write() syscall of the interpreted program, we want it
// to correspond to a write() syscall on the host -- there is no good
// in adding extra buffering here.
let res = io::stdout().write(buf_cont);
io::stdout().flush().unwrap();
res
} else {
// No need to flush, stderr is not buffered.
io::stderr().write(buf_cont)
};

This should not do anything "weird" wrt. UTF-8.

@RalfJung RalfJung reopened this Apr 7, 2020
@RalfJung
Copy link
Member

RalfJung commented Apr 7, 2020

@shepmaster I cannot reproduce this locally, both are the same here.

$ MIRI_SYSROOT=~/.cache/miri/HOST CARGO_EXTRA_FLAGS="-q" ./miri run utf8.rs 2> miri-output
$ xxd miri-output 
00000000: 5b75 7466 382e 7273 3a33 5d20 7320 3d20  [utf8.rs:3] s = 
00000010: 228b 3222 0a                             ".2".
$ rustc utf8.rs && ./utf8 2> rustc-output
$ xxd rustc-output 
00000000: 5b75 7466 382e 7273 3a33 5d20 7320 3d20  [utf8.rs:3] s = 
00000010: 228b 3222 0

What are the exact commands you used to create your two files?

@shepmaster
Copy link
Member

% cargo +nightly-2020-04-07 miri 2> miri

% cargo +nightly-2020-04-07 run 2> run

% cat src/main.rs
fn main() {
    let s = unsafe { std::str::from_utf8_unchecked(&[139, 50]) };
    dbg!(s);
}

I did do some small editing in vim to remove Cargo noise from the generated files, but AFAICT nothing changed.

I am on macOS 10.15.3 (19D76).

@shepmaster
Copy link
Member

Changing to stdout has the same behavior but easier to see:

fn main() {
    let s = unsafe { std::str::from_utf8_unchecked(&[139, 50]) };
    println!("{}", s);
}
% xxd run
00000000: 8b32 0a                                  .2.

% xxd miri
00000000: efbf bd32 0a                             ...2.

@RalfJung
Copy link
Member

RalfJung commented Apr 7, 2020

Hm indeed, I can reproduce this even when using -q (to avoid having to remove any noise). Interesting.

I will open a new issue for that though, seems unrelated to the playground problem this here was originally about. Thanks for pointing this out! I suspect it is a cargo problem, but it is still good to know.

@eddyb
Copy link
Member

eddyb commented Apr 8, 2020

Also see rust-lang/unsafe-code-guidelines#78; the general consensus was that we should just remove the UTF-8 clause from the language spec and make it purely a case of library UB. (In other words, UTF-8 would be a safety invariant, but not a validity invariant.)

That bit nerdsniped me into taking a stab at struct str([u8]);: rust-lang/rust#70911.

(I am aware that doesn't change much, and miri still checks str's UTF-8 validity, but it feels more appropriate to have str be library type)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-support Category: Not necessarily a bug, but someone asking for support
Projects
None yet
Development

No branches or pull requests

4 participants