Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement epoll_wait #2764

Merged
merged 1 commit into from
Feb 3, 2023
Merged

Implement epoll_wait #2764

merged 1 commit into from
Feb 3, 2023

Conversation

DebugSteven
Copy link
Contributor

This PR continues working on #602.

This is an implementation for sleep, though admittedly not a good one. It just does busy waiting.

src/shims/unix/linux/fd.rs Outdated Show resolved Hide resolved
bytes: &[u8],
) -> InterpResult<'tcx, io::Result<usize>> {
let v1 = self.val.get();
let v2 = v1.checked_add(u64::from_be_bytes(bytes.try_into().unwrap())).unwrap();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Leave a FIXME to handle these error cases to behave as described in the doc comment

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI this case is hit by the test suites of a handful of crates. So far I know of aqueue 1.2.10, async-io-typed 3.0.0, terminus-store 0.20.0, quic-rpc 0.3.2, and snocat 0.8.0-alpha.1.

@oli-obk
Copy link
Contributor

oli-obk commented Jan 28, 2023

I guess the important thing is to test that sleep doesn't sleep less than the specified time, so maybe just compare the lower end of the range?

Once/if we figure out how to do tokio tests in isolation mode, we can make the test more reliable with an upper limit

@oli-obk
Copy link
Contributor

oli-obk commented Feb 3, 2023

@bors r+

@bors
Copy link
Collaborator

bors commented Feb 3, 2023

📌 Commit e87acf1 has been approved by oli-obk

It is now in the queue for this repository.

@bors
Copy link
Collaborator

bors commented Feb 3, 2023

⌛ Testing commit e87acf1 with merge 7fc42bf...

@bors
Copy link
Collaborator

bors commented Feb 3, 2023

☀️ Test successful - checks-actions
Approved by: oli-obk
Pushing 7fc42bf to master...

@bors bors merged commit 7fc42bf into rust-lang:master Feb 3, 2023
let _epfd = epfd.as_epoll_handle()?;

// FIXME return number of events ready when scheme for marking events ready exists
Ok(Scalar::from_i32(numevents))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't this tell the caller that numevents events are ready without actually initializing the buffer? That sounds pretty bad, the user will then see UB when trying to access the info in the buffer.

/// supplied buffer is less than 8 bytes, or if an attempt is
/// made to write the value 0xffffffffffffffff.
///
/// FIXME: use endianness
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please don't land code that just silently does the wrong thing on some targets. At the very least it must ICE. Nothing is worse than if Miri pretends to be able to execute something but then executes it incorrectly. That must never happen.

// or fail with EAGAIN if the file descriptor is nonblocking.
let v2 = v1.checked_add(u64::from_be_bytes(bytes.try_into().unwrap())).unwrap();
self.val.set(v2);
assert_eq!(8, bytes.len());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be checked before converting the value to an integer.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The unwrap two lines above already does this check, we can remove the assert

/// ready during the requested timeout milliseconds. On failure,
/// `epoll_wait()` returns -1 and errno is set to indicate the error.
///
/// <https://man7.org/linux/man-pages/man2/epoll_wait.2.html>
Copy link
Member

@RalfJung RalfJung Feb 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Our implementation is very, very far from this documentation, right? How's that not a huge problem? If a random program calls epoll_wait (keep in mind that tokio is not the only thing that can call this function), then one of two things must happen:

  • either it behaves in some reasonable way
  • or it stops execution (ideally via throw_unsupported, or an ICE if need be)

But here we just silently ignore the timeout and a bunch of other things, so execution will just silently go wrong, right? That's not something we want Miri to ever do.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is all an ongoing implementation where each new test fleshes out the implementation details. I don't think it would be good to require testing and implementing the entire thing in one go, as that PR would become unreviewable.

We could hide all of it behind a -Zmiri-experimental-epoll to make it clear this is unfinished

Copy link
Member

@RalfJung RalfJung Feb 14, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is all an ongoing implementation where each new test fleshes out the implementation details. I don't think it would be good to require testing and implementing the entire thing in one go, as that PR would become unreviewable.

That's not what I asked for.

I think ideally, the supported part of the function is explicitly carved out, by aggressively doing throw_unsup for any parameter combination that is potentially not implemented correctly yet. That doesn't mean implementing the entire function, not at all! It could mean doing throw_unsup on literally every input, for instance. That is how we have worked so far for other shims and I think it has worked well. Wouldn't that work here as well?

I find this helps quite a bit with reviewing, since then one can immediately see which things the function intends to support, and can check if those make sense. Right now that seems to be a secret between you and @DebugSteven, and it's not even documented anywhere what the current status is. That is not great. It should be clear from the code where the gaps are, and which things are expected to work correctly. Right now I look at this and I can't see how this works correctly for any input (given that it doesn't fill the events buffer), but I can't even tell if that is a bug since I don't know for which inputs it is supposed to work.

We could hide all of it behind a -Zmiri-experimental-epoll to make it clear this is unfinished

That would be a backup plan, but from my perspective it would mean that when reviewing the PR which removes this flag, I do have to review the entire functionality of all gated shims at once, since incremental review of the previous PRs is not possible. I would prefer to avoid that.

/// stored in the counter is the largest unsigned 64-bit value
/// minus 1 (i.e., 0xfffffffffffffffe). If the addition would
/// cause the counter's value to exceed the maximum, then the
/// write either blocks until a read is performed on the
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This mention of reads is confusing. Reads don't seem to do anything with the counter?

pub val: u32,
/// The object contains an unsigned 64-bit integer (uint64_t) counter that is maintained by the
/// kernel. This counter is initialized with the value specified in the argument initval.
pub val: Cell<u64>,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But what does the value mean? This documentation is rather mysterious.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The original POSIX documentation is not very clear either ^^ the value can mean all kinds of things depending on the usage pattern. All that the storage should know is that it's a 64 bit int

Copy link
Member

@RalfJung RalfJung Feb 14, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be good to have an example then for how the Miri-implemented shims use this. POSIX docs being bad is no excuse for our own docs being bad. ;)

///
/// FIXME: use endianness
fn write<'tcx>(
&self,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to change write to take &mut self and then avoid the Cell?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope, we looked into this, and the issue is that preexisting code needs to write access memory while using the file descriptors. It would be possible, but it would make all code using writes very roundabout. Also I think other write impls already use interior mutability.

I can open a PR if you'd like to see the workarounds needed

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also I think other write impls already use interior mutability.

I don't recall that.

But if you tried the alternative and considered it worse than this, that's enough of an argument. :) Thanks!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So... one write impl that "accidentally" uses interior mutability is our actual File handles. Write is implemented for &File.

The reason we can't easily switch to &mut is this piece of code:

https://github.com/rust-lang/miri/blob/master/src/shims/unix/fs.rs#L814-L819

We could handle all of this correctly, but it would require cloning somewhere, as we can't hold the mutable reference to the file descriptor while also reading from memory.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah we're doing something rather clever there, directly copying from the machine memory to the syscall.

@@ -1,5 +1,5 @@
// Need to disable preemption to stay on the supported MVP codepath in mio.
//@compile-flags: -Zmiri-disable-isolation -Zmiri-permissive-provenance -Zmiri-preemption-rate=0
//@compile-flags: -Zmiri-disable-isolation -Zmiri-permissive-provenance
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removing the -Zmiri-preemption-rate=0 here was premature, this is causing issues again. I'll add the flag back.

bors added a commit that referenced this pull request Apr 21, 2023
disable preemption in tokoo tests again

The comment even still says we need preemption disabled, but the flag got lost in #2764.
bors added a commit that referenced this pull request Apr 21, 2023
disable preemption in tokoo tests again

The comment even still says we need preemption disabled, but the flag got lost in #2764.
bors added a commit that referenced this pull request Apr 21, 2023
disable preemption in tokio tests again

The comment even still says we need preemption disabled, but the flag got lost in #2764.
RalfJung pushed a commit to RalfJung/rust that referenced this pull request Apr 28, 2023
disable preemption in tokio tests again

The comment even still says we need preemption disabled, but the flag got lost in rust-lang/miri#2764.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants