
Prioritize packet events over local events #2522

Merged — 4 commits into shadow:main from the packet-event-order branch on Feb 22, 2023

Conversation

stevenengler (Contributor)

Previously, when the host got its next event and there were multiple events at the same time, it prioritized events with a lower source host ID. As a result, packet events were sometimes prioritized over local events and sometimes the reverse, depending on the source host's ID relative to the current host's ID. This PR removes that inconsistency: packet events are now always prioritized over local events.

This PR also splits the event into two separate event types: packet events (which contain a packet object) and local events (which contain a task). This means we can take the packet object directly from a packet event, which is more flexible than running an arbitrary task closure that captures the packet object. A sketch of the idea follows.
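To make the split concrete, here is a minimal sketch in Rust. All type and method names here are hypothetical stand-ins, not Shadow's actual API (the real types live in src/main/core/work/event.rs): a packet event carries the packet itself, a local event carries a task, and ties at the same simulation time are broken by event kind rather than by source host ID.

```rust
// Hypothetical sketch of the two event kinds and their ordering; not
// Shadow's actual types. The key point: at equal times, packet events
// sort ahead of local events, independent of any host IDs.
use std::cmp::Ordering;

struct Packet;  // stand-in for Shadow's packet object
struct TaskRef; // stand-in for an arbitrary task closure

enum EventData {
    Packet(Packet), // the packet can be taken out directly...
    Local(TaskRef), // ...instead of being hidden inside a task closure
}

struct Event {
    time: u64, // simulation time of the event
    data: EventData,
}

impl Event {
    // Earlier times first; at equal times, packet events before local
    // events. (Previously, ties were broken by source host ID, so the
    // winner depended on which hosts were involved.)
    fn cmp_priority(&self, other: &Self) -> Ordering {
        let rank = |d: &EventData| match d {
            EventData::Packet(_) => 0, // higher priority
            EventData::Local(_) => 1,
        };
        self.time
            .cmp(&other.time)
            .then_with(|| rank(&self.data).cmp(&rank(&other.data)))
    }
}
```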

@stevenengler self-assigned this Nov 3, 2022
@github-actions bot added the "Component: Main" (Composing the core Shadow executable) label Nov 3, 2022
codecov bot commented Nov 3, 2022

Codecov Report

Base: 67.85% // Head: 67.21% // Decreases project coverage by 0.65% ⚠️

Coverage data is based on head (30da0d2) compared to base (7aa8f26).
Patch coverage: 62.02% of modified lines in pull request are covered.

❗ Current head 30da0d2 differs from pull request most recent head 5b72aab. Consider uploading reports for the commit 5b72aab to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2522      +/-   ##
==========================================
- Coverage   67.85%   67.21%   -0.65%     
==========================================
  Files         203      193      -10     
  Lines       30507    28648    -1859     
  Branches     5963     5628     -335     
==========================================
- Hits        20701    19255    -1446     
+ Misses       5113     4949     -164     
+ Partials     4693     4444     -249     
Flag Coverage Δ
tests 67.21% <62.02%> (-0.65%) ⬇️

Flags with carried forward coverage won't be shown.

Impacted Files Coverage Δ
src/main/network/packet.rs 20.28% <0.00%> (-0.15%) ⬇️
src/main/core/work/event.rs 59.77% <59.61%> (+23.50%) ⬆️
src/main/host/host.rs 83.08% <81.81%> (+3.04%) ⬆️
src/main/core/worker.rs 76.47% <90.00%> (-0.92%) ⬇️
src/main/host/syscall/handler/time.rs 29.26% <0.00%> (-38.92%) ⬇️
src/main/host/syscall/handler/ioctl.rs 24.24% <0.00%> (-31.64%) ⬇️
src/main/host/syscall/handler/eventfd.rs 0.00% <0.00%> (-25.72%) ⬇️
src/main/host/descriptor/eventfd.rs 0.00% <0.00%> (-21.22%) ⬇️
src/main/host/syscall/handler/sched.rs 26.47% <0.00%> (-18.85%) ⬇️
src/main/network/router/mod.rs 69.44% <0.00%> (-16.56%) ⬇️
... and 118 more



stevenengler (Contributor, Author)

The network and simulation performance are significantly different with this change: https://github.com/shadow/benchmark-results/tree/master/tor/2022-11-07-T14-56-21

@robgjansen Any issues with this before I merge it?

[Plots: run_time; transfer_time_1048576 exit]

@stevenengler force-pushed the packet-event-order branch 2 times, most recently from cbbc0f2 to 30da0d2, on November 18, 2022 at 15:28
stevenengler (Contributor, Author) commented Nov 18, 2022

We compared a benchmark run of this PR (2022-11-14-T17-46-27) and the previous weekly benchmark run (2022-11-12-T03-50-23) using the host heartbeat messages and the scripts in src/tools, and didn't find the network results to be significantly different.

We found that the numbers of allocations and syscalls were significantly lower in the PR benchmark run.

PR:

Global syscall counts: {read:744607152, epoll_wait:390993063, write:299110729, epoll_ctl:151463882, getsockopt:83843256, ioctl:82845185, sendto:74909883, recvfrom:53884213, timerfd_settime:43373158, getrandom:25515138, close:8878698, lseek:7493425, open:5025897, fstat:4628634, openat:2746635, timerfd_create:2681458, setsockopt:1999664, socket:1893475, connect:1891057, getsockname:1810735, uname:1769203, fcntl:1765211, accept:1757246, shutdown:1545519, futex:1287931, accept4:999312, brk:304482, mmap:16699, shadow_hostname_to_addr_ipv4:16651, rt_sigaction:12176, munmap:12156, mprotect:7713, getpid:2180, shadow_init_memory_manager:2180, bind:2061, listen:2061, epoll_create:1861, fchmod:1190, epoll_create1:833, flock:833, pipe2:833, sysinfo:833, clone:714, getdents64:714, rt_sigprocmask:714, set_robust_list:714, shadow_get_shm_blk:714, eventfd2:357}
Global allocated object counts: {Packet:2028866038, TaskRef:1957435879, Event:1851421582, Payload:973770742, StatusListener:220451349, SysCallCondition:212311463, Timer:162451727, LegacyDescriptor:14232794, RegularFile:7779072, TCP:3766319, TimerFd:2681458, Futex:212057, Epoll:5588, ManagedThread:2894, SysCallHandler:2894, Process:2180, NetworkInterface:1742, FutexTable:871, HostCInternal:871, Router:871, UDP:357}

Weekly benchmark of the main branch:

Global syscall counts: {read:1232331835, epoll_wait:795098338, write:302624449, epoll_ctl:255307445, recvfrom:197039394, getsockopt:85155365, ioctl:84157468, sendto:82957731, timerfd_settime:44315297, getrandom:25595377, close:8885211, lseek:7488831, open:5040479, fstat:4628808, openat:2746662, timerfd_create:2681484, setsockopt:1999283, socket:1893294, connect:1890876, getsockname:1810636, uname:1769222, fcntl:1765249, accept:1757359, shutdown:1545604, futex:1286445, accept4:999112, brk:299065, mmap:16833, shadow_hostname_to_addr_ipv4:16657, munmap:12277, rt_sigaction:12176, mprotect:7510, getpid:2180, shadow_init_memory_manager:2180, bind:2061, listen:2061, epoll_create:1861, fchmod:1190, epoll_create1:833, flock:833, pipe2:833, sysinfo:833, clone:714, getdents64:714, rt_sigprocmask:714, set_robust_list:714, shadow_get_shm_blk:714, eventfd2:357}
Global allocated object counts: {TaskRef:4697115261, Event:2593379194, Packet:2055394681, Payload:989337824, StatusListener:580971615, SysCallCondition:572748357, Timer:521813618, LegacyDescriptor:14247062, RegularFile:7793681, TCP:3765952, TimerFd:2681484, Futex:211922, Epoll:5588, ManagedThread:2894, SysCallHandler:2894, Process:2180, NetworkInterface:1742, FutexTable:871, HostCInternal:871, Router:871, UDP:357}

robgjansen (Member)

With the new TGen benchmark, no change in simulated network performance:

[Plots: Time to first byte; Time to last byte]

But we do still get the improvement in run time.

robgjansen (Member)

@stevenengler please go ahead and merge at your convenience :)

@stevenengler force-pushed the packet-event-order branch 2 times, most recently from 00f7b90 to e91cb8c, on November 28, 2022 at 16:36
@robgjansen added this to "In progress" in Release v2.4 via automation, Dec 5, 2022
@robgjansen
Copy link
Member

We lowered the TGen speed threshold in #2633, so the tgen tests should now be more resilient to minor changes in packet scheduling. (We'll want to increase the speed threshold again when we think our net stack is stable enough to warrant it.)

@stevenengler force-pushed the packet-event-order branch 2 times, most recently from 2cc78f3 to 5ab5d55, on February 17, 2023 at 18:29
stevenengler (Contributor, Author)

Rebased and re-ran the benchmarks, as a lot has changed since this PR was opened (such as the new relay and router code). The run-time performance improves significantly, and the network performance changes only slightly.

Tor

[benchmark plots]

Tgen

[benchmark plots]

stevenengler (Contributor, Author)

@robgjansen Re-marking you as a reviewer. You've already reviewed the code once, but it's been a while, so you may want to take another look.

robgjansen (Member) left a comment:

Here is my understanding of why we see the runtime performance improvement with this PR.

Prior to this PR, local events were sometimes (depending on the host ID) prioritized ahead of packet events occurring at the same instant. We could move a single packet event from the event queue to the router's CoDel queue, which would cause a new local event to start the relay forward loop; that local event would be placed in front of the other packet events still waiting in the event queue. Effectively, we created a new relay forward event for every packet, ran it to forward only that one packet, and then repeated the cycle when the next packet was added to the CoDel queue.

With this PR, all packets are consistently moved from the event queue to the router's CoDel queue in one batch, and only then does the relay forward loop run, forwarding as many packets as possible. This significantly reduces the number of event objects (and the associated push and pop operations) required to receive a set of packets that arrive at the same instant.
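A self-contained toy model of that difference (all names are hypothetical stand-ins, not Shadow's actual API): because packet events drain first, the router's queue holds the whole batch by the time the single relay-forward task runs.

```rust
// Toy model of same-instant event processing; all names are hypothetical
// stand-ins, not Shadow's actual API.
use std::collections::VecDeque;

struct Packet(u32);

struct Host {
    packet_events: VecDeque<Packet>,                    // same-instant packet events
    local_events: VecDeque<Box<dyn FnOnce(&mut Host)>>, // same-instant tasks
    codel_queue: VecDeque<Packet>,                      // router's CoDel queue
}

fn process_instant(host: &mut Host) {
    // Packet events first: every same-time packet moves to the CoDel
    // queue before any local task runs.
    while let Some(p) = host.packet_events.pop_front() {
        host.codel_queue.push_back(p);
    }
    // Then local events: a single relay-forward task sees the whole batch.
    while let Some(task) = host.local_events.pop_front() {
        task(host);
    }
}

fn main() {
    let mut host = Host {
        packet_events: (0..3u32).map(Packet).collect(),
        local_events: VecDeque::new(),
        codel_queue: VecDeque::new(),
    };
    // One relay-forward task, queued at the same instant as the packets.
    host.local_events.push_back(Box::new(|h: &mut Host| {
        // Forwards as many packets as possible in one pass.
        while let Some(Packet(id)) = h.codel_queue.pop_front() {
            println!("relay forwards packet {id}");
        }
    }));
    // With the old tie-breaking, this could instead interleave: one packet
    // into the queue, one relay task forwarding one packet, repeated N times.
    process_instant(&mut host);
}
```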

@stevenengler merged commit 8f1e55b into shadow:main on Feb 22, 2023
Release v2.4 automation moved this from In progress to Done, Feb 22, 2023
@stevenengler deleted the packet-event-order branch on February 22, 2023 at 19:09