Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Edge-triggered epoll events #3243

Merged
merged 1 commit into from
Jan 25, 2024
Merged

Edge-triggered epoll events #3243

merged 1 commit into from
Jan 25, 2024

Conversation

ppopth
Copy link
Contributor

@ppopth ppopth commented Dec 5, 2023

EDITED: I have already implemented this feature for tcp sockets, udp sockets, unix domain sockets, pipes, eventfds.

@github-actions github-actions bot added the Component: Main Composing the core Shadow executable label Dec 5, 2023
@stevenengler
Copy link
Contributor

The reason why you didn't want to integrate this as you mentioned in #2673 is that Shadow already "does the right thing", but, without this feature, I have to fork the upstream repo and convince the libp2p folks to use the forked repo instead.

Just to clarify, we would like to better follow what Linux does, but we just didn't have time to work on it (look in-depth into the Linux behaviour, consider the best way to design it, any performance implications, etc). If Shadow can better model the Linux behaviour, have a bunch of tests for it, and not significantly slow down the simulator, then I think it would be great to have this behaviour. IIRC the old PR didn't pass all of the tests, and Shadow's old C epoll code was a bit difficult to follow, so we also were more worried about introducing bugs. Now with the epoll code in rust, we can hopefully be more confident in any changes.

I'll leave a few comments that I have, but I'm not as knowledgeable about epoll so I'll let other people weigh in on this PR.

I implemented it only for eventfd and legacy tcp

I think before merging we would want this to be consistent across Shadow for all file types, unless Linux does differently. And this is where tests would be useful, to make sure all of the file types behave this way. IIRC pipes also have this edge-triggered behaviour on Linux.

src/main/host/descriptor/epoll/entry.rs Outdated Show resolved Hide resolved
src/main/host/descriptor/mod.rs Outdated Show resolved Hide resolved
src/main/host/descriptor/eventfd.rs Outdated Show resolved Hide resolved
@robgjansen
Copy link
Member

I totally agree with @stevenengler here. We added epoll in rust but I never got around to adding the tests to help fuzz test the epoll api. Those fuzz tests would help us make sure we are following the Linux behavior, even if the Linux behavior is not the same as what is documented in the man pages. If we have the tests that show the Linux behavior and Shadow behaves exactly the same, that will give us a strong argument to merge.

@stevenengler
Copy link
Contributor

I'd expect that this causes more state updates, resulting in the event source running more callbacks, but the current version of this PR doesn't seem to effect tor simulation runtime performance.

run_time

@ppopth
Copy link
Contributor Author

ppopth commented Dec 10, 2023

I think before merging we would want this to be consistent across Shadow for all file types, unless Linux does differently. And this is where tests would be useful, to make sure all of the file types behave this way. IIRC pipes also have this edge-triggered behaviour on Linux.

Totally agree, however, since this patch is quite hacky and we don't know yet how we should implement it. I will implement the rest when we know it, otherwise the effort will be wasted.

@ppopth
Copy link
Contributor Author

ppopth commented Dec 11, 2023

@robgjansen @stevenengler Are you okay with the FileState::INPUT_BUFFER_PARITY? I feel that it's probably not the most proper way to do, but I guess the most proper way may require a big change.

If you are okay to merge it with that, I will do all the necessary work on that to get it merged: fuzz tests, other tests, and edge-triggered event for all the file types.

My current feeling is that, if we don't want FileState::INPUT_BUFFER_PARITY and the proper way is too big, all the effort will be wasted.

Honestly, I'm not sure if the proper way will be really too big, I just feel that it will be. Let me think a moment to find a better way with small change.

@stevenengler
Copy link
Contributor

stevenengler commented Dec 13, 2023

@robgjansen @stevenengler Are you okay with the FileState::INPUT_BUFFER_PARITY? I feel that it's probably not the most proper way to do, but I guess the most proper way may require a big change.

I think my opinion is that it's okay. In that case the flag should have a comment explaining why it's done this way, and that it's an exception. Alternatively, what do you think of the design in #3250? Do you think this covers everything that you'd need to support epoll? It adds a new argument for listeners so that files can emit signals and listeners can opt to receive them. It might be overkill for this, but we might want this ability to emit other types of signals in the future as well.

@robgjansen
Copy link
Member

I would prefer proper signal listener support from #3250 over using FileState::INPUT_BUFFER_PARITY as an exceptional case. If #3250 is merged, then this PR becomes much easier to implement using the new signaling features from #3250.

@stevenengler
Copy link
Contributor

stevenengler commented Dec 19, 2023

We've now merged #3250, so it should be ready to use in this PR.

So what we're looking for before we can merge this PR is:

  • use the new signal mechanism instead of the INPUT_BUFFER_PARITY state
  • use the new edge-triggered epoll behaviour in all file types (udp, pipe, etc) and not just eventfd/tcp, as long as linux does as well (the tests should show which files do or don't)
  • does this edge-triggered behaviour occur only on the input buffer, or also the output buffer?
  • for the input buffer, does the edge-triggered behaviour occur when either reading or writing from/to the buffer? (Does reading from the buffer also trigger epoll?)
  • new syscall tests in rust that test edge-triggered epoll under these various conditions for different file types to make sure that shadow behaves the same as linux

Thanks for working on this!

Tests

We have some existing C epoll tests in src/test/epoll/test_epoll.c and Rust epoll tests in src/test/epoll/test_epoll.rs. The new tests should be written in rust, but it's up to you whether they should go in test_epoll.rs or a new file. If the tests are large, it might make sense to have these edge-triggered tests in a new file.

You can test the epoll syscalls using the libc library (ex: libc::epoll_ctl). In test_epoll.rs we used the nix library to test the syscalls, but I think that was a mistake and we should have used the libc or linux-api libraries instead. The tests are also stuck on an old version of nix see (#3141 #3265) which uses an old version of their epoll API.

For us as maintainers, we don't really know exactly how linux behaves with this edge-triggered-but-not-really-edge-triggered epoll, so the the more tests you add the easier it will be to convince us that the changes accurately follow linux and that it's good to merge.

@ppopth
Copy link
Contributor Author

ppopth commented Jan 4, 2024

Does reading from the buffer also trigger epoll?

From the test I implemented in #3274, the answer is no in Linux. However, it's surprising that the answer is yes in the current main branch of Shadow.

Since I don't want to pay the effort fixing this and I don't know any application depending on that behaviour, I will not add such the test in this PR.

@stevenengler
Copy link
Contributor

Since I don't want to pay the effort fixing this and I don't know any application depending on that behaviour, I will not add such the test in this PR.

Thanks for making that issue and writing a test case! If you want to, you can add the test to Shadow but in the ShadowTest::new("write-then-read", ...) use set![TestEnvironment::Libc] (and leave a comment referring to #3274). This indicates that the test is supposed to pass in Shadow but doesn't, and you don't need to fix it since it will be skipped when running in Shadow.

@ppopth
Copy link
Contributor Author

ppopth commented Jan 4, 2024

@stevenengler What is the state of SharedBuf? Is it something you still want to use in the long term?

In order to implement edge-triggered epoll for pipe and unix sockets, it seems like I have to implement BufferSignals (just like FileSignals) in addition to BufferState for SharedBuf.

If you are planning to deprecate SharedBuf, it's probably not worth the effort. If no, I will do the BufferSignals.

@ppopth
Copy link
Contributor Author

ppopth commented Jan 6, 2024

does this edge-triggered behaviour occur only on the input buffer, or also the output buffer?

I found strange behaviour on the output buffer. This edge-triggered thing occurs on the output buffer of udp sockets, but not tcp sockets or unix datagram sockets. I think it will take too much effort doing research on this, so I will not add the tests for those cases.

@stevenengler
Copy link
Contributor

stevenengler commented Jan 7, 2024

What is the state of SharedBuf? Is it something you still want to use in the long term?

I think we still want it in the long term. We need some kind of shared buffer between two files (or sometimes more if there are multiple readers or writers).

In order to implement edge-triggered epoll for pipe and unix sockets, it seems like I have to implement BufferSignals (just like FileSignals) in addition to BufferState for SharedBuf.

If you are planning to deprecate SharedBuf, it's probably not worth the effort. If no, I will do the BufferSignals.

I forgot about the SharedBuf when adding the signals. I made #3276 to add the boilerplate code needed to support signals in SharedBuf. I don't want to conflict with the code you're working on, so let me know if you've already started adding signals to SharedBuf and if you have we can use your code instead. If not, hopefully #3276 will make this PR easier.

I found strange behaviour on the output buffer. This edge-triggered thing occurs on the output buffer of udp sockets, but not tcp sockets or unix datagram sockets. I think it will take too much effort doing research on this, so I will not add the tests for those cases.

Okay thanks for checking. I think it's okay to leave the output buffer out of the scope of this PR then. But please open an issue so we can remember that this edge-triggered behaviour doesn't follow linux for all output buffers and that it needs more investigation.

@ppopth
Copy link
Contributor Author

ppopth commented Jan 7, 2024

I don't want to conflict with the code you're working on, so let me know if you've already started adding signals to SharedBuf and if you have we can use your code instead

Yes, I have mine already and it looks exactly like yours in #3276 (except some in unix.rs)

@ppopth
Copy link
Contributor Author

ppopth commented Jan 7, 2024

@stevenengler What is your preferred way to do write to a raw file descriptor in the tests?

  1. Use std::fs::File::from_raw_fd just like in https://stackoverflow.com/questions/54858018/how-do-i-write-to-a-specific-raw-file-descriptor-from-rust
  2. or, use libc::write just like https://github.com/shadow/shadow/blob/348bc4/src/test/pipe/test_pipe.rs#L185-L198

In test_epoll.rs, it uses nix::unistd::write.

@stevenengler
Copy link
Contributor

@stevenengler What is your preferred way to do write to a raw file descriptor in the tests?

  1. Use std::fs::File::from_raw_fd just like in https://stackoverflow.com/questions/54858018/how-do-i-write-to-a-specific-raw-file-descriptor-from-rust

  2. or, use libc::write just like https://github.com/shadow/shadow/blob/348bc4/src/test/pipe/test_pipe.rs#L185-L198

In test_epoll.rs, it uses nix::unistd::write.

The preferred is option (2).

In the tests, if it's a syscall that you're specifically testing, then use plain libc. But you can use either libc or nix for other syscalls that you're not testing. I'd avoid using std functions in tests since it's difficult to know exactly what syscalls they're making.

So in test_epoll.rs since you're testing epoll related syscalls, use libc for the epoll syscalls (libc::epoll_wait), and you can use nix or libc for other syscalls like read or write (libc::write or nix::unistd::write).

@github-actions github-actions bot added Component: Libraries Support functions like LD_PRELOAD and logging Component: Testing Unit and integration tests and frameworks Component: Build Build/install tools and dependencies labels Jan 7, 2024
@ppopth ppopth marked this pull request as ready for review January 7, 2024 14:20
@ppopth
Copy link
Contributor Author

ppopth commented Jan 7, 2024

@stevenengler it's ready for your eyes

Copy link
Contributor

@stevenengler stevenengler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's looking good! I still have a few more files to look through (unix.rs, the rust tcp files, and only quickly looked through test_epoll_edge.rs), but thought I'd leave the comments I have so far. They're all minor comments, and two additional tests I think would be good to have.

src/main/host/descriptor/socket/inet/udp.rs Outdated Show resolved Hide resolved
src/main/host/descriptor/mod.rs Outdated Show resolved Hide resolved
src/main/host/descriptor/epoll/entry.rs Outdated Show resolved Hide resolved
src/main/host/descriptor/epoll/entry.rs Outdated Show resolved Hide resolved
src/main/host/descriptor/eventfd.rs Outdated Show resolved Hide resolved
src/main/host/descriptor/shared_buf.rs Outdated Show resolved Hide resolved
src/main/host/descriptor/epoll/entry.rs Show resolved Hide resolved
@stevenengler stevenengler self-requested a review January 11, 2024 01:53
stevenengler added a commit that referenced this pull request Jan 12, 2024
In the old code when adding or removing data from a socket buffer, a
socket.c function would update the file state (called the "status" in
the C code), then a tcp.c function might also update the file state. For
example `tcp_receiveUserData` would call
`legacysocket_removeFromInputBuffer` which would update the state, and
then `tcp_receiveUserData` would later also update the state. This would
lead to situations where socket.c would remove the READABLE flag, and
then tcp.c would immediately add it back. While the end result is that
the flag would effectively appear unchanged, this would cause
edge-triggered epoll to think that the file's state changed twice even
though it effectively didn't, leading to #3274.

Since the TCP code is the only C socket code left, now the socket.c code
no longer updates the file state (with an exception of
`legacysocket_pullOutPacket`) and the tcp.c code is responsible for
updating the file state.

The added test was authored by @ppoth in #3274.

We should also be testing other socket/file types, but I think this will
be easier once the new `test_epoll_edge.rs` tests in #3243 are merged. I
did test udp and unix sockets manually and the new test passes for them
as well.

Closes #3274.

**Edit:** The runtime performance is equivalent, but the simulation
results change, so I want to take another look at this.
https://github.com/shadow/benchmark-results/blob/master/tor/2024-01-07-T22-11-55/README.md

**Edit 2:** I think it's expected that the simulation results would
change. Before we would remove the `READABLE` flag and then add it back
again, which would reset the epoll entry's priority each time a packet
was pushed to a TCP socket. Now the priority is reset only when the
event is collected. This means that epoll events will be processed in a
different order than they previously were.
Copy link
Contributor

@stevenengler stevenengler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! I think the improved edge-triggered epoll and tests will be a big improvement. All of the requested changes should be small, and then there are two additional tests I think would be good to have. After this I'll do a last benchmark run and then I expect we can merge it.

  • use the new signal mechanism instead of the INPUT_BUFFER_PARITY state

Done.

  • use the new edge-triggered epoll behaviour in all file types (udp, pipe, etc) and not just eventfd/tcp, as long as linux does as well (the tests should show which files do or don't)

Done.

  • does this edge-triggered behaviour occur only on the input buffer, or also the output buffer?

@ppopth Once this is merged, can you make an issue for supporting the UDP output buffer behaviour you observed?

  • for the input buffer, does the edge-triggered behaviour occur when either reading or writing from/to the buffer? (Does reading from the buffer also trigger epoll?)

Done. There was a related bug #3274 where a partial read would trigger edge-triggered epoll in shadow but not Linux. This was fixed in #3277. This current PR contains a test that checks this for all sockets and pipes.

  • new syscall tests in rust that test edge-triggered epoll under these various conditions for different file types to make sure that shadow behaves the same as linux

Done. The new tests check tcp, udp, unix sockets (all three types), pipes (stream and packet modes), and eventfds.

src/lib/tcp/src/connection.rs Outdated Show resolved Hide resolved
src/lib/tcp/src/connection.rs Outdated Show resolved Hide resolved
src/lib/tcp/src/lib.rs Show resolved Hide resolved
src/main/host/descriptor/socket/inet/tcp.rs Outdated Show resolved Hide resolved
src/main/host/descriptor/socket/inet/tcp.rs Outdated Show resolved Hide resolved
src/main/host/descriptor/socket/unix.rs Outdated Show resolved Hide resolved
src/main/host/descriptor/socket/unix.rs Show resolved Hide resolved
src/main/host/descriptor/socket/unix.rs Outdated Show resolved Hide resolved
src/test/epoll/test_epoll_edge.rs Outdated Show resolved Hide resolved
src/test/epoll/test_epoll_edge.rs Outdated Show resolved Hide resolved
@stevenengler
Copy link
Contributor

stevenengler commented Jan 12, 2024

Conflicting files

src/main/host/descriptor/socket.c

In #3277 I moved the legacyfile_adjustStatus((LegacyFile*)socket, STATUS_FILE_READABLE, TRUE, 0) from socket.c to tcp.c, so you'll need to move the signal code there as well to fix the conflict.

@ppopth
Copy link
Contributor Author

ppopth commented Jan 17, 2024

Conflicting files

src/main/host/descriptor/socket.c

In #3277 I moved the legacyfile_adjustStatus((LegacyFile*)socket, STATUS_FILE_READABLE, TRUE, 0) from socket.c to tcp.c, so you'll need to move the signal code there as well to fix the conflict.

I resolved the conflict by merging instead of rebasing so that I would not lose the history. I will rebase it later.

@ppopth
Copy link
Contributor Author

ppopth commented Jan 18, 2024

@stevenengler it's ready for your eyes

@stevenengler
Copy link
Contributor

@stevenengler it's ready for your eyes

Looks good! Just do a rebase, then I'll approve it, run the benchmark, and merge it.

@ppopth
Copy link
Contributor Author

ppopth commented Jan 24, 2024

@stevenengler it's all done. Can you please add a milestone to this PR?

@stevenengler
Copy link
Contributor

No change to the tor benchmark. I don't think tor uses edge-triggered epoll, so this should be expected.

run_time
transfer_time_5242880 exit

@stevenengler stevenengler merged commit d957f4c into shadow:main Jan 25, 2024
23 of 24 checks passed
@stevenengler
Copy link
Contributor

Thanks @ppopth!

@ppopth
Copy link
Contributor Author

ppopth commented Feb 7, 2024

@ppopth Once this is merged, can you make an issue for supporting the UDP output buffer behaviour you observed?

The issue is created at #3295

@ppopth ppopth deleted the edge-trigger2 branch February 22, 2024 10:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Component: Build Build/install tools and dependencies Component: Libraries Support functions like LD_PRELOAD and logging Component: Main Composing the core Shadow executable Component: Testing Unit and integration tests and frameworks
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants