Releases: shadow/shadow
v3.2.0
Summary
The primary changes that we've made in this release include:
- Updated our policies regarding supported platforms and contributor guidelines
- Added new documentation about performance-tuning and profiling
- Added or extended support for the `clone3`, `lseek`, and `alarm` syscalls
- New basic support for netlink sockets (thanks @ppopth!)
- Numerous bug fixes and improvements, including to the build system, syscall handlers, the strace facility, and process spawning/forking.
More details about specific changes are below.
Documentation / policy updates:
- We've added support for Ubuntu 24.04 and Fedora 40, and dropped support for EOL versions of Fedora. See supported platforms.
- We've updated our documentation for profiling shadow simulations. See profiling.
- We've updated our contributor guide on merging pull requests to include instructions for rebasing before merging.
- We've added a page on Performance-tuning configuration options.
MAJOR changes (breaking):
- No breaking changes in this release
MINOR changes (backwards-compatible):
- Added support for the `CLONE_CLEAR_SIGHAND` flag for the `clone3` syscall.
- Improved shadow's "strace" output for unreadable memory accesses (#2821).
- Added some support for netlink sockets (#3198).
- Added `lseek` support for pipes (#3320).
- Added support for the `alarm` syscall (#3321).
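
To make the `lseek` entry concrete, here is a minimal Python sketch of what a managed program might observe when it seeks on a pipe. It assumes native Linux semantics, where the call fails with `ESPIPE`; these notes do not spell out the exact result under Shadow, and the helper name `pipe_seek_errno` is purely illustrative.

```python
import errno
import os


def pipe_seek_errno():
    """Return the errno raised by lseek() on a pipe, or 0 if the call succeeds."""
    read_fd, write_fd = os.pipe()
    try:
        os.lseek(read_fd, 0, os.SEEK_CUR)
    except OSError as exc:
        return exc.errno
    finally:
        os.close(read_fd)
        os.close(write_fd)
    return 0


if __name__ == "__main__":
    print(errno.errorcode.get(pipe_seek_errno(), "no error"))
```

Programs that probe whether a descriptor is seekable often rely on this error, so a simulator that handles the call lets them take their normal code path.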
PATCH changes (bugfixes):
- On fork and fork-like invocations of `clone`, signal handlers are now correctly copied from the parent instead of reset to default (unless `CLONE_CLEAR_SIGHAND` is used) (#3284).
- Fix exponential slowdown after repeated usage of the `wait4` syscall.
- Fix the build system's clang version check (#3262).
- Fixed a cmake warning (#3269).
- Fixed epoll edge trigger behavior with files (#3277, fixing issue #3274).
- Fixed `--version` output (#3287).
- Fixed a panic-causing race condition when logging unsupported syscall numbers (#3288).
- Fixed behaviour of edge-triggered epoll when more data arrives (#3243).
- Fixed some panics in debug builds when the `clone` syscall handler returns abnormally (#3291, fixing #3290).
- Improved build time of tests (#3304).
- The build system now logs a warning when detecting golang version 1.21.x, which is incompatible with shadow's golang tests. (#3307, fixing #3267).
- Changed some syscall error cases to return `ENOTSUP` instead of `ENOSYS` (#3314).
- Fixed error detection and handling when spawning processes (#3344).
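
For readers unfamiliar with the edge-triggered epoll behaviour mentioned in the fixes above, the following Python sketch shows the pattern on native Linux: a readiness event is reported when data arrives, is not repeated while nothing new arrives, and is reported again when more data arrives. The function name and the use of a Unix socket pair are assumptions made for illustration; this is not Shadow's test code.

```python
import select
import socket


def edge_triggered_readiness():
    """Return whether epoll reported the reader as readable after each step."""
    reader, writer = socket.socketpair()
    ep = select.epoll()
    try:
        ep.register(reader.fileno(), select.EPOLLIN | select.EPOLLET)
        observed = []

        writer.send(b"first")
        observed.append(bool(ep.poll(1.0)))  # new data: the edge is reported

        observed.append(bool(ep.poll(0)))    # nothing new: no repeated report

        writer.send(b"second")
        observed.append(bool(ep.poll(1.0)))  # more data arrives: reported again
        return observed
    finally:
        ep.close()
        reader.close()
        writer.close()


if __name__ == "__main__":
    print(edge_triggered_readiness())
```

Running the same script natively and under a simulator is a quick way to compare the two behaviours.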
Full changelog since v3.1.0:
Thanks to @jtracey, @ppopth, @robgjansen, @sporksmith, and @stevenengler for their contributions to this release!
v3.1.0
Summary
The primary user-facing changes that we've made in this release include:
- New support for spawning processes inside of a Shadow simulation. Technically, this means it now supports `fork`, `vfork`, `execve`, and other related syscalls.
- A new experimental TCP stack written in Rust, which can be enabled with the experimental command-line flag `--use-new-tcp`. The stack is not yet recommended for default use because it is still missing important TCP features such as congestion control, but work on it continues.
- Substantial progress in migrating the shim's C code to `no_std` Rust code.
- Numerous Socket API improvements, so that Shadow more accurately follows the behaviour of Linux.
- Additional migration of C code to Rust (we're now below 20% C code remaining).
More details about specific changes are below. We also have a much more detailed writeup of many of these changes in our most recent discussion post, #3187.
MAJOR changes (breaking):
- No breaking changes in this release
MINOR changes (backwards-compatible):
- `ERROR`-level log lines are now logged to `stderr` in addition to `stdout` if `stdout` is not a tty but `stderr` is. This helps make errors more visible in the common case that `stdout` is redirected to a log file but `stderr` is not. This can currently be disabled via the (unstable) option `log-errors-to-tty`.
- Added support for subprocess creation and management.
  - The `fork` syscall and `fork`-like invocations of the `clone` and `clone3` syscalls.
  - Process parent PIDs, process group IDs, process session IDs, and related syscalls.
  - Child exit signals (e.g. SIGCHLD).
  - The `execve` syscall.
- Added Debian 12 (Bookworm) to our supported platforms.
- Added support for `sendmsg`, `recvmsg`, and `shutdown` for UDP sockets.
- Added support for `MSG_TRUNC` and `MSG_PEEK` as `recv` syscall argument flags for UDP sockets.
- Added support for `MSG_TRUNC` as a `recv` syscall return flag for UDP and Unix sockets.
- Added support for the `SO_DOMAIN`, `SO_PROTOCOL`, and `SO_ACCEPTCONN` socket options for TCP and UDP sockets.
- Added support for the `SIOCGSTAMP` ioctl for TCP and UDP sockets.
- Improved the simulation run time performance when there are a large number of active sockets on a single host. (#3238)
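
As an illustration of two of the UDP flags listed above, this Python sketch peeks at a datagram with `MSG_PEEK` and then receives it into a buffer that is too small, so that `MSG_TRUNC` comes back in the returned message flags. It describes native Linux behaviour over loopback; the function name, payload, and buffer size are arbitrary choices for the example.

```python
import socket


def peek_then_truncate(payload=b"0123456789", small=4):
    """Peek at a UDP datagram, then receive it into a too-small buffer."""
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        receiver.bind(("127.0.0.1", 0))
        receiver.settimeout(2.0)
        sender.sendto(payload, receiver.getsockname())

        # MSG_PEEK as an argument flag: read the datagram without consuming it.
        peeked = receiver.recv(len(payload), socket.MSG_PEEK)

        # The datagram is still queued. A buffer smaller than the datagram
        # truncates it, which is reported via MSG_TRUNC in the returned flags.
        data, _ancdata, msg_flags, _addr = receiver.recvmsg(small)
        return peeked, data, bool(msg_flags & socket.MSG_TRUNC)
    finally:
        receiver.close()
        sender.close()


if __name__ == "__main__":
    print(peek_then_truncate())
```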
PATCH changes (bugfixes):
- Updated documentation and tests to reflect that shadow no longer requires `/dev/shm` to be executable. (This requirement was actually removed in v3.0.0.)
- Removed several incorrect libc syscall wrappers. These wrappers are a "fast path" for intercepting syscalls at the library level instead of via seccomp. The removed wrappers were for syscalls whose glibc functions have different semantics than the underlying syscall.
- Fixed a bug in `sched_getaffinity`. This bug was previously mostly latent due to an incorrectly generated libc syscall wrapper, though it would have affected managed programs that made the syscall without going through libc.
- Fixed #2681: shadow can now escape spin loops that use an inlined syscall instruction to make `sched_yield` syscalls.
- Fixed a deadlock when the managed process calls `recv` (or similar syscalls) on a TCP or UDP socket with an invalid memory address.
- Fixed a bug that would allow UDP sockets to accept packets from addresses that aren't the peer address.
- Fixed an incorrect return value from the `FIONREAD` ioctl for UDP sockets.
- Fixed the behaviour of the `read` and `recv` syscalls when called with 0-length buffers.
- Fixed incorrect behaviour (incorrect return value or panic) when `connect` is called on a listening unix or tcp socket. (#3191)
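
The `FIONREAD` entry above is easier to picture with an example. On native Linux, `FIONREAD` on a UDP socket reports the size of the next pending datagram rather than the total number of queued bytes. The Python sketch below checks that with two queued datagrams; the helper names are illustrative, and the notes do not say which incorrect value Shadow previously returned.

```python
import array
import fcntl
import select
import socket
import termios


def next_datagram_size(sock):
    """Ask the kernel, via the FIONREAD ioctl, how large the next datagram is."""
    size = array.array("i", [0])
    fcntl.ioctl(sock.fileno(), termios.FIONREAD, size, True)
    return size[0]


def fionread_with_two_datagrams():
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        receiver.bind(("127.0.0.1", 0))
        address = receiver.getsockname()
        sender.sendto(b"12345", address)    # 5 bytes, queued first
        sender.sendto(b"1234567", address)  # 7 bytes, queued second
        # Wait until at least the first datagram is readable.
        select.select([receiver], [], [], 2.0)
        return next_datagram_size(receiver)
    finally:
        receiver.close()
        sender.close()


if __name__ == "__main__":
    print(fionread_with_two_datagrams())
```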
Full changelog since v3.0.0:
Thanks to @stevenengler, @sporksmith, @robgjansen, @rwails for their contributions to this release!
v3.0.0
Summary
The dev team had accumulated a large set of breaking changes that would require a major version bump. In this release, we have focused on clearing our breaking changes queue and merging those improvements. Because these are breaking changes, this release has bumped our major version from 2 to 3. This release also significantly improves the runtime performance compared to Shadow 2.5.0.
Configuration format
- Shadow no longer implicitly searches its working directory for executables to be run under the simulation. If you wish to specify a process path relative to Shadow's working directory, prefix that path with `./`.
- Shadow now supports YAML merge keys and extension fields. This allows you to combine YAML maps using the `<<` key.

  Example:

      # an "extension field" that we use to store common host options
      x-host-client: &host-client
        bandwidth_up: 10Mbps
        bandwidth_down: 10Mbps

      hosts:
        client1:
          # merge the fields from the extension field above
          <<: *host-client
          processes: ...
        client2:
          <<: *host-client
          processes: ...

- Removed the quantity options for hosts and processes. It's now recommended to use YAML anchors and merge keys instead.

  Shadow 2.x:

      hosts:
        client:
          quantity: 3
          processes: ...

  Shadow 3.x:

      hosts:
        client1: &client
          processes: ...
        # copy all fields from 'client1'
        client2: *client
        # copy all fields from 'client1' and add additional fields
        client3:
          <<: *client
          ip_addr: 152.21.4.24

- Renamed the `host_defaults` field to `host_option_defaults` and renamed the host's `options` field to `host_options`.

  Shadow 2.x:

      host_defaults: ...
      hosts:
        client:
          options: ...

  Shadow 3.x:

      host_option_defaults: ...
      hosts:
        client:
          host_options: ...

- Removed the host `pcap_directory` configuration option and replaced it with a new `pcap_enabled` option.

  Shadow 2.x:

      hosts:
        client:
          options:
            pcap_directory: ./

  Shadow 3.x:

      hosts:
        client:
          host_options:
            pcap_enabled: true

- Host names are restricted to the patterns documented in hostname(7).
- The process `environment` configuration option now takes a map instead of a semicolon-delimited string.

  Shadow 2.x:

      hosts:
        client:
          processes:
          - path: curl
            environment: ENV_A=1;ENV_B=foo

  Shadow 3.x:

      hosts:
        client:
          processes:
          - path: curl
            environment:
            - ENV_A: "1"
            - ENV_B: foo

- The per-process option `stop_time` has been replaced with `shutdown_time`. When set, the signal specified by `shutdown_signal` (a new option) will be sent to the process at the specified time. While shadow previously sent `SIGKILL` at a process's `stop_time`, the default `shutdown_signal` is `SIGTERM` to better support graceful shutdown.

  Shadow 2.x:

      hosts:
        client:
          processes:
          - path: curl
            stop_time: 10s

  Shadow 3.x:

      hosts:
        client:
          processes:
          - path: curl
            shutdown_time: 10s
            shutdown_signal: SIGKILL

- A new `expected_final_state` allows you to specify the expected state of the process at the end of the simulation. The supported states are `exited`, `signaled`, or `running`. If any process is not in the correct state at the end of the simulation, Shadow will return a non-zero exit code. The default `expected_final_state` is exited with code 0.

  In Shadow 2.x the behaviour was to consider any processes which exited with code 0, OR which were still running at the end of the simulation, as a success. Shadow 3.x does not support this specific behaviour, and you must choose a single state.

  Example:

      hosts:
        server:
          processes:
          - path: nginx
            # we expect nginx to run until the end of the simulation
            expected_final_state: running

- Added support for a `parallelism` value of 0, which allows Shadow to choose a reasonable parallelism (we currently use the number of physical cores in Shadow's affinity/cgroup). The default value for `parallelism` has also been changed from 1 to 0.
- It is now an error to set a process' `shutdown_time` or `start_time` to be after the simulation's `stop_time`.
- Sub-second configuration values are now allowed for all time-related options, including `start_time`, `stop_time`, etc.
- Removed and updated various experimental options including `use_shim_syscall_handler`, `interface_qdisc`, and `use_extended_yaml`.
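
The renames in this section are mechanical enough to script. The Python sketch below applies them to a configuration that has already been loaded into plain dictionaries (for example with a YAML parser of your choice). It is a rough starting point under several assumptions, not an official migration tool: `migrate_config` and its helpers are hypothetical names, expanded hosts are named `client1`, `client2`, and so on as in the example above, `environment` is emitted as a plain mapping following the prose description, and the per-process `quantity` option is not handled.

```python
import copy


def _migrate_options(options):
    """Replace the 2.x `pcap_directory` host option with `pcap_enabled: true`."""
    options = dict(options)
    if options.pop("pcap_directory", None) is not None:
        options["pcap_enabled"] = True
    return options


def _migrate_process(process):
    process = dict(process)
    env = process.get("environment")
    if isinstance(env, str):
        # "ENV_A=1;ENV_B=foo" -> {"ENV_A": "1", "ENV_B": "foo"}
        process["environment"] = dict(
            pair.split("=", 1) for pair in env.split(";") if pair
        )
    if "stop_time" in process:
        process["shutdown_time"] = process.pop("stop_time")
        # 2.x sent SIGKILL at stop_time; keep that rather than the SIGTERM default.
        process.setdefault("shutdown_signal", "SIGKILL")
    return process


def migrate_config(old):
    """Return a 3.x-style copy of a 2.x config mapping; `old` is left untouched."""
    new = copy.deepcopy(old)
    if "host_defaults" in new:
        new["host_option_defaults"] = _migrate_options(new.pop("host_defaults"))
    hosts = {}
    for name, host in new.get("hosts", {}).items():
        host = dict(host)
        quantity = host.pop("quantity", 1)
        if "options" in host:
            host["host_options"] = _migrate_options(host.pop("options"))
        host["processes"] = [_migrate_process(p) for p in host.get("processes", [])]
        if quantity == 1:
            hosts[name] = host
        else:
            # Assumed naming scheme for expanded hosts: name1, name2, ...
            for index in range(1, quantity + 1):
                hosts[f"{name}{index}"] = copy.deepcopy(host)
    new["hosts"] = hosts
    return new
```

Expanded hosts are written out in full here; when serializing back to YAML by hand, anchors and merge keys give a more compact result.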
File structure
- A host's data files (files in `<data-dir>/hosts/<hostname>/`) are no longer prefixed with the hostname. For example a file that was previously named `shadow.data/hosts/server/server.curl.1000.stdout` is now named `shadow.data/hosts/server/curl.1000.stdout`.
- The per-process `.exitcode` file has been removed due to its confusing semantics, and the new `expected_final_state` attribute replacing its primary use-case.
- Generated pcap files are now named using their interface name instead of their IP address. For example "lo.pcap" and "eth0.pcap" instead of "127.0.0.1.pcap" and "11.0.0.1.pcap".
Performance
Shadow's scheduler is very performance-sensitive and needs to run tasks on worker threads with low latency. We added a spinloop in the scheduler that significantly improves Shadow's runtime performance. Some simulations see more than a 2x runtime performance improvement (for example 160 minutes to 47 minutes in a 5% Tor network simulation).
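
These notes do not describe how the spinloop is implemented. Purely as an illustration of the general spin-then-block idea (and not Shadow's scheduler code, which is not written in Python), the sketch below polls a flag a bounded number of times before falling back to a blocking wait. The class name and the iteration count are invented for the example.

```python
import threading


class SpinThenBlockFlag:
    """A one-shot flag whose waiter spins briefly before blocking."""

    def __init__(self, spin_iterations=1000):
        self._event = threading.Event()
        self._spin_iterations = spin_iterations

    def set(self):
        self._event.set()

    def wait(self):
        """Return 'spin' if the flag was seen while polling, else 'block'."""
        for _ in range(self._spin_iterations):
            if self._event.is_set():
                return "spin"  # low-latency path: no sleep or wake-up needed
        self._event.wait()     # fallback: sleep until another thread sets the flag
        return "block"
```

Spinning trades CPU time for lower wake-up latency, which matches the pre-release note about higher CPU and power usage.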
Supported platforms
We have removed several of our supported platforms. Specifically, we've dropped support for Ubuntu 18.04, Fedora 34/35/36, and CentOS Stream 8. We've also dropped support for Clang, and set a minimum-supported Linux kernel version of 5.4, which requires installing a backports kernel on Debian 10.
Stability guarantees
We've updated our "stability guarantees" document with the following changes:
- Updated the filenames in Shadow's host-data directories to reflect the removal of the hostname prefix.
- Added the ability to drop supported platforms in minor releases if the platforms no longer receive free updates and support from the distribution's developer.
- Shadow no longer guarantees the order in which simulated process IDs (PIDs) are assigned.
- Shadow will not change the criteria for the minimum supported Linux kernel version as documented in our supported platforms. This still allows us to increase the minimum kernel version as a result of dropping support for a platform.
Additional changes
Minor changes
- Support the `MSG_TRUNC` flag for unix sockets. #2841
- Support the `TIMER_ABSTIME` flag for `clock_nanosleep`. #2854
- Removed the `--profile`, `--include`, and `--library` setup script options.
- Added partial support for the `epoll_pwait2` syscall.
- Implemented the `clone3` syscall. Thread libraries we're aware of that use `clone3` were gracefully falling back to `clone`, but eventually they may not do so. This also reduces noise in shadow's log about an unimplemented syscall being attempted.
- Shadow no longer requires `/dev/shm` to be executable.
Bug fixes
- Fixed a memory leak of about 16 bytes per thread due to failing to unregister exited threads with a watchdog thread. This is unlikely to have had a noticeable effect in typical simulations. In particular the per-thread data was already getting freed when the whole process exited, so it would only affect a process that created and terminated many threads over its lifetime.
- Simulated processes are now reaped and deallocated after they exit, reducing run-time memory usage when processes exit over the course of the simulation. This was unlikely to have affected most users, since Shadow currently doesn't support `fork`, so any simulation has a fixed number of processes, all of which are explicitly specified in shadow's config.
- Fixed a potential race condition when exiting managed threads that did not have the `clear_child_tid` attribute set. This is unlikely to have affected most software running under Shadow, since most thread APIs use this attribute.
- Changed an error value in `clock_nanosleep` and `nanosleep` from `ENOSYS` to `ENOTSUP`.
- A managed process that tries to call the `execve` syscall will now get an error instead of escaping the Shadow simulation. #2718
- Stopped overriding libc's `getcwd` with an incorrect wrapper that was returning `-1` instead of `NULL` on errors.
- A call to `epoll_ctl` with an unknown operation will return `EINVAL`.
- Fixed a bug that caused Shadow to panic in some cases when a simulated thread exits. #2913
- Fixed a bug causing `host_options` to undo any changes made to `host_option_defaults`.
Full changelog
Thanks to contributions from @robgjansen,...
v3.0.0-pre
Summary
The dev team had accumulated a large set of breaking changes that would require a major version bump. In this release, we have focused on clearing our breaking changes queue and merging those improvements. Because these are breaking changes, this release has bumped our major version from 2 to 3.
This release is marked as a pre-release because, although our CI tests are passing, we haven't had as long of a testing period as we usually do. Additionally, we have some additional internal improvements we intend to make prior to the full 3.0.0 release. We believe this pre-release should be stable, but please file issues for any bugs you find. Thanks!
Primary user-facing changes since v2.5.0
MAJOR changes (breaking):
- Removed deprecated python scripts that only worked on Shadow 1.x config files and topologies.
- Shadow no longer implicitly searches its working directory for executables to be run under the simulation. If you wish to specify a path relative to Shadow's working directory, prefix that path with `./`.
- Shadow now always enables support for YAML merge keys and extension fields. The experimental configuration option that previously enabled this support, `use_extended_yaml`, has been removed.
- Removed the host `pcap_directory` configuration option and replaced it with a new `pcap_enabled` option.
- A host's data files (files in `<data-dir>/hosts/<hostname>/`) are no longer prefixed with the hostname. For example a file that was previously named `shadow.data/hosts/server/server.curl.1000.stdout` is now named `shadow.data/hosts/server/curl.1000.stdout`.
- The `clang` C compiler is no longer supported.
- Host names are restricted to the patterns documented in hostname(7). #2856
- The per-process option `stop_time` has been replaced with `shutdown_time`. When set, the signal specified by `shutdown_signal` (a new option) will be sent to the process at the specified time. While shadow previously sent `SIGKILL` at a process's `stop_time`, the default `shutdown_signal` is `SIGTERM` to better support graceful shutdown.
- The minimum version of `cmake` has been bumped from 3.2 to 3.13.4.
- The minimum version of `glib` has been bumped from 2.32 to 2.58.
- Generated pcap files are now named using their interface name instead of their IP address. For example "lo.pcap" and "eth0.pcap" instead of "127.0.0.1.pcap" and "11.0.0.1.pcap".
- The process `environment` configuration option now takes a map instead of a semicolon-delimited string.
- Removed the `quantity` options for hosts and processes. It's now recommended to use YAML anchors and merge keys instead.
- Renamed the `host_defaults` configuration field to `host_option_defaults` and renamed the host's `options` field to `host_options`.
- Shadow now interprets a process still running at the end of the simulation as an error by default. This can be overridden by the new per-process option `expected_final_state`. #2886
- The per-process `.exitcode` file has been removed due to its confusing semantics, and the new `expected_final_state` attribute replacing its primary use-case. #2906
- Shadow no longer guarantees the order in which simulated process IDs (PIDs) are assigned. #2908
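
The change in how final process states are judged can be summarized in a few lines. The sketch below contrasts the 2.x rule (exit code 0 or still running counts as success) with the 3.x rule (the state must match `expected_final_state`, which per the v3.0.0 notes defaults to exited with code 0). Representing a state as a `(kind, detail)` tuple is an assumption made for this example and is not Shadow's internal representation.

```python
def process_succeeded_v2(final_state):
    """Shadow 2.x rule: exiting with code 0 or still running both count as success."""
    return final_state in (("exited", 0), ("running", None))


def process_succeeded_v3(final_state, expected=("exited", 0)):
    """Shadow 3.x rule: the final state must equal the single expected state."""
    return final_state == expected


if __name__ == "__main__":
    still_running = ("running", None)
    print(process_succeeded_v2(still_running))                          # accepted in 2.x
    print(process_succeeded_v3(still_running))                          # error by default in 3.x
    print(process_succeeded_v3(still_running, expected=still_running))  # accepted when declared
```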
MINOR changes (backwards-compatible):
- Support the `MSG_TRUNC` flag for unix sockets. #2841
- Support the `TIMER_ABSTIME` flag for `clock_nanosleep`. #2854
- The experimental config option `use_shim_syscall_handler` has been removed. This optimization is now always enabled.
- It is now an error to set a process's `stop_time` or `start_time` to be after the simulation's `stop_time`.
- Sub-second configuration values are now allowed for all time-related options, including `start_time`, `stop_time`, etc.
- Removed the `--profile`, `--include`, and `--library` setup script options.
- Added partial support for the `epoll_pwait2` syscall.
- Enabled CPU spinning in Shadow's scheduler. This significantly improves Shadow's runtime performance, but may have higher CPU and power/battery usage. #2877
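
Config generators can catch the new time-ordering error before Shadow does. The sketch below is a hypothetical pre-flight check: it assumes times have already been parsed into seconds as floats (so sub-second values are simply fractions) and checks `start_time` together with `shutdown_time`, the name the process-level stop option takes after the rename described in these notes.

```python
def find_late_process_times(sim_stop_time, processes):
    """Return (index, option, value) for each process time after the simulation stop time."""
    problems = []
    for index, process in enumerate(processes):
        for option in ("start_time", "shutdown_time"):
            value = process.get(option)
            # Only times strictly after the simulation's stop time are flagged.
            if value is not None and value > sim_stop_time:
                problems.append((index, option, value))
    return problems


if __name__ == "__main__":
    processes = [
        {"start_time": 0.5, "shutdown_time": 9.75},  # sub-second values are fine
        {"start_time": 12.0},                        # starts after the simulation ends
    ]
    print(find_late_process_times(10.0, processes))
```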
PATCH changes (bugfixes):
- Fixed a memory leak of about 16 bytes per thread due to failing to unregister exited threads with a watchdog thread. This is unlikely to have had a noticeable effect in typical simulations. In particular the per-thread data was already getting freed when the whole process exited, so it would only affect a process that created and terminated many threads over its lifetime.
- Fixed a potential race condition when exiting managed threads that did not have the `clear_child_tid` attribute set. This is unlikely to have affected most software running under Shadow, since most thread APIs use this attribute.
- Changed an error value in `clock_nanosleep` and `nanosleep` from `ENOSYS` to `ENOTSUP`.
- A managed process that tries to call the `execve` syscall will now get an error instead of escaping the Shadow simulation. #2718
- Stopped overriding libc's `getcwd` with an incorrect wrapper that was returning `-1` instead of `NULL` on errors.
- A call to `epoll_ctl` with an unknown operation will return `EINVAL`.
- Simulated processes are now reaped and deallocated after they exit, reducing run-time memory usage when processes exit over the course of the simulation. This was unlikely to have affected most users, since Shadow currently doesn't support `fork`, so any simulation has a fixed number of processes, all of which are explicitly specified in shadow's config.
All Merged Pull Requests
- Clear changelog for new release by @robgjansen in #2803
- Add rust syscall handlers for readv/writev and related syscalls by @stevenengler in #2799
- Use posix_spawn instead of vfork + exec by @sporksmith in #2800
- Unregister threads with ChildPidWatcher by @sporksmith in #2806
- Cleanup syscall_types import redirection by @sporksmith in #2807
- Support `sendmsg()` and `recvmsg()` for tcp wrapper by @stevenengler in #2805
- Convert SysCallResult to Rust and rename to SyscallResult by @sporksmith in #2808
- ChildPidWatcher more robust unregistration by @sporksmith in #2809
- ChildPidWatcher: fix a race condition when unregistering a pid by @sporksmith in #2814
- Rename plugin pointer types by @stevenengler in #2813
- Add syscall handlers for `sendmsg()` and `recvmsg()` by @stevenengler in #2811
- Improve the strace formatting of `msghdr` by @stevenengler in #2815
- Move misc code to rust by @stevenengler in #2817
- Remove old setup commands from tor minimal test by @stevenengler in #2819
- Bump tempfile from 3.4.0 to 3.5.0 in /src by @dependabot in #2818
- Convert ManagedThread to Rust by @sporksmith in #2812
- Remove deprecated python tools by @sporksmith in #2825
- Fix some incorrect buffer sizes in the memory manager by @stevenengler in #2826
- Do not implicitly search current directory for executables by @sporksmith in #2824
- Add a generic type to `ForeignPtr` by @stevenengler in #2827
- Simplify memory manager `read_vals` by @stevenengler in #2829
- ChildPidWatcher improvements by @sporksmith in #2823
- Split CHANGELOG by semver-level by @sporksmith in #2828
- Always use YAML extended syntax by @sporksmith in #2831
- Add a `write` method to the memory manager by @stevenengler in #2830
- Add type to thread's `tid_address` by @stevenengler in #2834
- CI: don't run on multiple base images by @sporksmith in #2832
- Add email address for reporting security issues by @stevenengler in #2835
- Change definition of `io::write_partial` by @stevenengler in #2838
- Change the `pcap_directory` config option to `pcap_enabled` by @stevenengler in #2840
- Move and better-document waiting for thread exit by @sporksmith in #2839
- Remove hostname prefix from hosts' data files by @stevenengler in #2833
- Log less in ManagedThread::wait_for_native_exit by @sporksmith in #2843
- Add `MSG_TRUNC` support for unix sockets by @stevenengler in #2841
- Update end date for NSF sponsorship by @robgjansen in #2844
- Rename `RecvmsgReturn.bytes_read` field to `return_val` by @stevenengler in #2845
- Update our semver guidelines and supported platforms by @stevenengler in https://github.com/shadow/shad...
v2.5.0
Summary
In this release, we continue our transition from C to Rust. Most of the changes included in the release are backend changes that support our continued Rust migration. In particular, we've made important progress on migrating some of Shadow's core components, including Host, Process, Thread, and networking code. We also fixed some bugs and made some other changes to improve the experience for users as described below.
This release is intended to be the last stable release in the v2.x series. We have accumulated a fair number of issues that require a major version bump to complete as described in this discussion post, so we intend to take care of these issues within the next couple of weeks. As a result, the next stable release will mark the start of the v3.x series.
Primary user-facing changes since v2.4.0
- Added a contributor code of conduct. https://github.com/shadow/shadow/blob/8003656d94fe781902f8b09420d994963a81c62c/CODE_OF_CONDUCT.md
- Set the shim library's stdout/stderr to the shim log file. This should only affect simulations that use experimental features to disable interposition. #2725
- Removed the experimental options `preload_spin_max` and `use_explicit_block_message`. These options were to support an execution model where Shadow workers ran on different CPU cores than the managed threads they were controlling, and each side would "spin" while waiting for a message from the other side. After extensive benchmarking we found that this was rarely a significant win, and dropped support for this behavior while migrating the core IPC functionality to Rust.
- Changed the order that events are processed in Shadow. Some simulations may see improved runtime performance. #2522
- Removed the experimental Dockerfile and related documentation. This is unrelated to running Shadow in Docker following the existing supported documentation, and we continue to support running Shadow in Docker.
- Fixed the offset calculation in preadv/preadv2/pwritev/pwritev2 to correctly handle negative offsets and large offsets. #2802
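
For context on the offset fix above, the sketch below exercises `preadv` from Python at an ordinary offset and at an offset beyond the 32-bit range. It reflects native Linux behaviour (a read past end-of-file returns no data) and is only a way to compare native and simulated runs, not Shadow's own regression test. The helper name and file content are arbitrary, and negative offsets are left out of the sketch.

```python
import os
import tempfile


def preadv_at(offset, length=5, content=b"hello world"):
    """Read up to `length` bytes at `offset` from a temporary file via preadv()."""
    with tempfile.TemporaryFile() as handle:
        handle.write(content)
        handle.flush()
        buffer = bytearray(length)
        # preadv() reads at an explicit offset and does not move the file position.
        count = os.preadv(handle.fileno(), [buffer], offset)
        return bytes(buffer[:count])


if __name__ == "__main__":
    print(preadv_at(6))        # an ordinary offset inside the file
    print(preadv_at(2 ** 33))  # an offset that does not fit in 32 bits, past EOF
```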
All Merged Pull Requests
- Clear changelog, minor update to version bump docs by @robgjansen in #2696
- Add `DescriptorHandle` and return `Result` from descriptor table methods by @stevenengler in #2691
- Close the `LegacyFile` when the last `LegacyFileCounter` is dropped by @stevenengler in #2698
- Simplify the host's networking cleanup by @stevenengler in #2697
- Update syscall handlers to take explicit argument types by @stevenengler in #2664
- MemoryManager: coalesce initial adjacent compatible regions by @sporksmith in #2699
- process_continue: take thread id instead of thread* by @sporksmith in #2700
- Delete unused process_insertThread by @sporksmith in #2701
- Reordered line styles in `plot-shadow.py` by @stevenengler in #2702
- Plan for interior mutability in Thread / ThreadRef by @sporksmith in #2703
- Migrate packet forwarding functionality to rust by @robgjansen in #2632
- Update to latest tgen ref by @robgjansen in #2690
- Relay design improvements by @robgjansen in #2706
- Make ThreadRef the owner of the C Thread and rename to Thread by @sporksmith in #2705
- Remove unused preload tests by @stevenengler in #2708
- Refill token buckets less often on slow connections by @robgjansen in #2707
- Remove unused async priority queue by @stevenengler in #2714
- Push local packets directly to netif, stop using events by @robgjansen in #2711
- Support rust `InetSocket`s in the network interface by @stevenengler in #2713
- Remove docker bits since we don't support it by @robgjansen in #2719
- Remove unused countdown latch by @robgjansen in #2720
- Don't run workflows on merge to main by @sporksmith in #2722
- Remove github coverage workflow by @sporksmith in #2723
- Remove unused C utility code by @stevenengler in #2721
- Code of conduct by @stevenengler in #2174
- CI: remove debug builds from the primary test matrix by @sporksmith in #2724
- Set shim's stdout/stderr to the shim log file by @stevenengler in #2725
- SysCallHandler: don't store pointers to Process and Thread by @sporksmith in #2731
- Support Rust listeners on C `LegacyFile`s by @stevenengler in #2715
- Fix dangling socket pointer in the network interface by @stevenengler in #2738
- Rename `TcpSocket` to `LegacyTcpSocket` by @stevenengler in #2741
- Add a Rust PluginPhysicalPtr type by @sporksmith in #2740
- Add RootedCell by @sporksmith in #2739
- Support `getsockopt()` and `setsockopt()` for tcp wrapper by @stevenengler in #2742
- Support rust listeners for the tcp wrapper by @stevenengler in #2744
- Rename `Worker::thread_id()` to `Worker::worker_id()` by @stevenengler in #2746
- Thread: remove more circular references and usages by @sporksmith in #2748
- Rooted*: make public APIs inline by @sporksmith in #2750
- Bump bindgen from 0.63.0 to 0.64.0 in /src by @dependabot in #2732
- Add SendPointer by @sporksmith in #2751
- Convert Thread to Rust by @sporksmith in #2752
- Remove optional dependencies by @stevenengler in #2756
- Move the "Shadow Output" docs to the "Developer Guide" section by @stevenengler in #2755
- Prioritize packet events over local events by @stevenengler in #2522
- Remove recommended modules/tools by @stevenengler in #2760
- Fix backwards timeout condition by @stevenengler in #2759
- Support `shutdown()` for tcp wrapper by @stevenengler in #2745
- Add additional epoll tests by @stevenengler in #2758
- Support explicit return types in the syscall handlers by @stevenengler in #2762
- Support `listen()` for tcp wrapper by @stevenengler in #2763
- Support `connect()` for tcp wrapper by @stevenengler in #2764
- Bump tempfile from 3.3.0 to 3.4.0 in /src by @dependabot in #2767
- Support `accept()` for tcp wrapper by @stevenengler in #2768
- Remove unused event queue ffi code by @stevenengler in #2769
- Add SelfContainedChannel by @sporksmith in #2757
- Convert most syscall helper types to Rust by @sporksmith in #2765
- VirtualAddressSpaceIndependent derive macro: support enums and unions by @sporksmith in #2771
- VirtualAddressSpaceIndpendent: use trait bounds in derive macro by @sporksmith in #2772
- Convert ShimEvent to Rust by @sporksmith in #2773
- Derive VirtualAddressSpaceIndependent for ShimEvent by @sporksmith in #2774
- Rename 'tcp.rs' to 'legacy_tcp.rs' by @stevenengler in #2780
- Bump rayon from 1.6.1 to 1.7.0 in /src by @dependabot in #2781
- Add forgotten changelog entries by @stevenengler in #2782
- Upgrade rust version to 1.68.0 by @stevenengler in #2783
- Make read/write offsets an `Option` by @stevenengler in #2784
- Remove reference count from `SysCallHandler` by @stevenengler in #2789
- Update versions of CI tools and rust dependencies by @stevenengler in #2788
- Convert Ipc to Rust by @sporksmith in #2775
- Bump bitflags from 1.3.2 to 2.0.1 in /src by @dependabot in #2792
- Bump syn from 1.0.109 to 2.0.0 in /src by @dependabot in #2793
- ShimEvent: convert from union to enum by @sporksmith in #2776
- Remove deprecated ShimEvents and handlers by @spork...
v2.4.0
Summary
In this release, we continue our transition from C to Rust. Most of the changes included in the release are backend changes that support our continued Rust migration. However, we also fixed many bugs and made some other changes to improve the experience for users as described below.
We intend additional work following this release to focus on changes to some of Shadow's core networking components, including the TCP stack and other facilities for forwarding packets between nodes. This is somewhat higher risk work that could result in bugs that affect Shadow's network performance and stability. We are issuing this v2.4.0 release now to ensure that users have a stable version of Shadow that they can use while we work on the high risk networking code.
Primary user-facing changes since v2.3.0
- Fixed an uncommon memory leak in `epoll_ctl`. #2586
- Tests that use shadow and tgen now use the binaries from `$PATH` and not `~/.local/bin`. #2572
- Shadow now forces the use of a specific Rust version using a `rust-toolchain.toml` file. #2614
- Added official support for Fedora 37. #2687
- Fixed a bug that could leak closed UDP sockets. #2594
- Emulate `sched_{get,set}affinity` syscalls. #2602
- Emulate reading from `/sys/devices/system/cpu/possible` and `/sys/devices/system/cpu/online`. #2602
- Fixed the TCP header sizes in pcap files. #2620
- Various minor improvements to the experimental strace logger (improved formatting of strings, buffers, and socket addresses, added logging of vdso-handled syscalls, etc).
- Added etcd and wget2 examples to the `examples/` directory. #2637, #2659
- Improved the line styles in plotting script. #2638
- Support higher-level host-specific log levels. #2645
- Fixed a bug where a socket can receive packets that were intended for a different socket. #2593
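
Programs typically discover their CPU count either through the affinity syscalls or by reading the sysfs files named above, which use the kernel's CPU-list format (for example `0-3,8`). The sketch below parses that format. The function names are illustrative, and nothing here is specific to Shadow beyond the file path mentioned in the notes.

```python
def parse_cpu_list(text):
    """Parse a kernel CPU list such as '0-3,8' into a set of CPU numbers."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        first, _, last = part.partition("-")
        # A single number such as "8" has no range end; treat it as "8-8".
        cpus.update(range(int(first), int(last or first) + 1))
    return cpus


def online_cpus(path="/sys/devices/system/cpu/online"):
    """Read and parse the list of online CPUs from sysfs."""
    with open(path) as handle:
        return parse_cpu_list(handle.read())
```

On Linux, `os.sched_getaffinity(0)` returns the equivalent set for the affinity syscalls.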
All Merged Pull Requests
- Build shadow once for extra tests by @sporksmith in #2572
- Remove
Transport
by @stevenengler in #2578 - Delete unused hostc_setup by @sporksmith in #2582
- SelfContainedMutex: move from shadow-shim-helper-rs to shadow_shmem by @sporksmith in #2567
- Host::add_application: take Rust objects and remove unsafe by @sporksmith in #2583
- Move process list to Rust Host and delete HostCInternal by @sporksmith in #2584
- Remove "Tor Tests" ci badge from readme by @stevenengler in #2585
- Fix epoll memory leak by @stevenengler in #2586
- Add
SyscallHandler::legacy_syscall
helper function by @stevenengler in #2588 - Add
InetSocket
enum and placeholderTcpSocket
struct by @stevenengler in #2589 - Reset changelog by @stevenengler in #2591
- Extra Tests CI: Improve Rust caching by @sporksmith in #2575
- Make clippy happy by @trinity-1686a in #2587
- Bump nix from 0.25.0 to 0.26.1 in /src by @dependabot in #2580
- Update network interface association/disassociation by @stevenengler in #2594
- Add initial version of the `TCP` wrapper by @stevenengler in #2595
- Remove rust code that is no longer used by @stevenengler in #2598
- Support `getsockname`/`getpeername()` for tcp wrapper by @stevenengler in #2599
- Support `ioctl()` for tcp wrapper by @stevenengler in #2600
- Add support for rust sockets in the network interface by @stevenengler in #2603
- Don't refcount Process by @sporksmith in #2601
- Remove `Host::setup()` by @stevenengler in #2606
- Add examples for dynamic shadow config generation by @sporksmith in #2605
- Changed systemctl TasksMax Parameter by @Ti-ger in #2609
- Move host networking to new `NetworkNamespace` object by @stevenengler in #2607
- emulate sched_{get,set}affinity and `sysconf(_SC_NPROCESSORS_*)` by @trinity-1686a in #2602
- TypedPluginPtr: quote code in doc comment by @sporksmith in #2613
- regular_file.c: don't treat 0 as a bad file descriptor by @sporksmith in #2616
- Bump memoffset from 0.7.1 to 0.8.0 in /src by @dependabot in #2615
- maintainer playbook: Handle project-issues by @sporksmith in #2610
- Enforce clippy by @sporksmith in #2614
- Support `bind()` for tcp wrapper by @stevenengler in #2611
- Make the Rust Process own the C Process by @sporksmith in #2612
- Rename borrowing accessors by @sporksmith in #2623
- Changed two debug logs to trace by @stevenengler in #2627
- Add rust `TcpSocket` to network interface as `LegacySocket` by @stevenengler in #2626
- Simplify syscall logging by @stevenengler in #2624
- Remove unused router code by @robgjansen in #2628
- Use 0 as last refill time for new token buckets by @robgjansen in #2629
- Fix an issue with the `_test_implicit_bind` test by @stevenengler in #2630
- TGen speed test threshold by @robgjansen in #2633
- Issue 2619; fixes TCP header sizes in pcap files by @rwails in #2620
- Bump once_cell from 1.16.0 to 1.17.0 in /src by @dependabot in #2635
- updating tgen getting started notes to better address the ../../.. prefix by @rwails in #2634
- Add example (and test) for etcd by @stevenengler in #2637
- Improve line styles in plotting script by @stevenengler in #2638
- Small code improvements by @stevenengler in #2641
- Check for specific socket association before wildcard by @stevenengler in #2640
- Show nanoseconds in strace log by @stevenengler in #2642
- Bump lzma-rs from 0.2.0 to 0.3.0 in /src by @dependabot in #2644
- Migrate most of Process from C to Rust by @sporksmith in #2631
- Update dependencies by @stevenengler in #2647
- Log syscalls that are handled in the shim by @stevenengler in #2646
- Simplify strace logging in `Process` by @stevenengler in #2649
- Support displaying more-complex syscall arguments by @stevenengler in #2648
- Process: move thread list to Rust by @sporksmith in #2650
- Bump Rust to 1.66.1 by @sporksmith in #2655
- Log socket addresses in the syscall logger by @stevenengler in #2652
- Allow higher-level host log levels by @stevenengler in #2645
- Fix `sched_{get,set}affinity` bug by @stevenengler in #2657
- Added example for Wget2 by @stevenengler in #2659
- Convert some Process code to Rust by @sporksmith in #2651
- bind-shadow test: extend timeout by @sporksmith in #2663
- Remove unneeded workaround in `syscallhandler_epoll_ctl` by @stevenengler in #2660
- Bump clap from 4.0.32 to 4.1.0 in /src by @dependabot in #2666
- Don't stall when already delivered packets are queued by @robgjansen in #2667
- Convert remaining Process code to Rust by @sporksmith in #2661
- Change log msg level from info to debug by @stevenengler in #2672
- Improve strace string logging by @stevenengler in #2669
- Remove C Process by @sporksmith in #2665
- Warn if running with root privileges by @stevenengler in #2670
- Support `InetSocket` in the tracker by @stevenengler in #2671
- Multithread death by @sporksmith in #2680
- Bump which from 4.3.0 to 4.4.0 in /src ...
v2.3.0
Summary
Shadow v2.3.0 is a minor release that contains many bug fixes as well as a large push to convert more code from C to Rust; ~54% of our code is now written in Rust compared to just 39% in C. We have incorporated many improvements to Shadow's design as we migrate to Rust, making the code easier to understand, better tested, and easier to maintain. We plan to continue our focus on migrating code to Rust in our next release.
Primary user-facing changes since v2.2.0:
- If running Shadow in Docker, you should use `--tmpfs /dev/shm:rw,nosuid,nodev,exec,size=1024g` rather than `--shm-size=1024g` to mount `/dev/shm` as executable. This fixes errors when the managed process maps executable pages. #2400
- Added latency modeling and potential thread-yield to rdtsc emulation, allowing managed code to avoid deadlock in busy-loops that use only the rdtsc instruction and no syscalls. #2314
- The build now internally uses `pkg-config` to locate glib, instead of a custom cmake module. This is the recommended way of getting the appropriate glib compile flags, and works better in non-standard layouts such as in a guix environment.
- The `setup` script now has a `--search` option, which can be used to add additional directories to search for pkg-config files, C headers, and libraries. It obsoletes the options `--library` and `--include`.
- Fixed a bug causing `mmap` to fail when called on a file descriptor that was opened with `O_NOFOLLOW`. #2353
- Bare executable names are now resolved by searching shadow's `PATH`. Previously these were interpreted as relative to the current directory. For backwards compatibility, Shadow will currently prefer a binary in that location if one is found but log a warning. Such cases should be disambiguated by using an absolute path or prefixing with `./`.
- Fixed order-of-operations bug in CoDel control law that could lead to an unexpected packet drop schedule. We think the bug could have caused Shadow to slightly more aggressively drop packets that have already been sitting in the CoDel queue for longer than 110 milliseconds. Based on the results of some Tor network simulations, the bug didn't appear to affect Tor network performance enough to lead us to believe that previous Tor simulations are invalid. #2479
- Changed the default scheduler from `thread-per-host` to `thread-per-core`, which has better performance on most machines.
- Experimental host heartbeat log messages are enabled by default (`experimental.host_heartbeat_interval` defaults to `"1 sec"`), but the format of these messages is not stable.
- Some of Shadow's emulated syscalls and object allocations are counted and written to a `shadow.data/sim-stats.json` file.
- Improved experimental strace logging for `brk`, `mmap`, `munmap`, `mremap`, `mprotect`, `open`, and `openat` syscalls.
- Several small simulation examples were added to an `examples/` directory.
- Fixed the file access mode for stdin in the managed process (changed from `O_WRONLY` to `O_RDONLY`).
- Fixed support for `readv` and `writev` syscalls, and added support for `preadv` and `pwritev`.
- Fixed a rare crash in Shadow's shim while logging. #2459
- Set the `ifa_netmask` field in `getifaddrs()` to improve compatibility with Node.js applications. #2456
- Shadow no longer depends on its absolute installed location, allowing the installation directory to be safely moved. #2391
- Shadow now emulates `PR_SET_DUMPABLE`, allowing it to work for programs that try to disable memory inspection. #2370
- Added new test cases to check Shadow's simulated network performance. These new tests help us verify that Shadow's network stack is capable of facilitating high-bandwidth transfers when using a single TCP stream or when using many streams in parallel, and across networks with various latency and bandwidth characteristics. Since we run the tests as part of our CI, it is now much more likely that we will notice when we make changes that significantly reduce Shadow's simulated network performance. We plan to expand the cases that we test in future releases. #2549
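The lookup rule for bare executable names described in the list above can be sketched in a few lines of Python. This is only an illustration of the stated behaviour, not Shadow's actual code. The function name `resolve_executable` and its parameters are hypothetical.

```python
import logging
import os
import shutil


def resolve_executable(name, cwd, search_path):
    """Resolve a process path following the rule in the release note (sketch)."""
    # Absolute paths are unambiguous and are used as-is.
    if os.path.isabs(name):
        return name
    # Names containing a separator (e.g. "./tool") are relative to cwd.
    if os.sep in name:
        return os.path.normpath(os.path.join(cwd, name))

    # Bare name: for backwards compatibility, prefer a binary in the
    # current directory if one exists, but warn about the ambiguity.
    legacy = os.path.join(cwd, name)
    if os.path.isfile(legacy) and os.access(legacy, os.X_OK):
        logging.warning("'%s' resolved relative to cwd; use './%s' or an "
                        "absolute path to disambiguate", name, name)
        return legacy

    # Otherwise search the given PATH; returns None when nothing is found.
    return shutil.which(name, path=search_path)
```

For example, `resolve_executable("./tool", "/work", "")` yields `/work/tool` without touching the search path, while a bare `"tool"` is looked up in `search_path` unless an executable `tool` already sits in the working directory.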
New Contributors
- @Congyu-Liu made their first contribution in #2353
- @zyansheep made their first contribution in #2447
- @valdaarhun made their first contribution in #2474
Thanks also to Shadow devs @sporksmith, @stevenengler, and @robgjansen!
Full Changelog
The full changelog can be viewed here: v2.2.0...v2.3.0
v2.2.0
Summary
Shadow v2.2.0 is a rather small minor release that contains mostly bug fixes but also some new support for `dup()`ing file descriptors. We believe that our bug fixes improved Shadow's stability enough to warrant a release.
Rust became the majority language in Shadow in this release, and we plan to focus the next release on continuing our Rust migration.
Here is a log of the primary user-facing changes we made since the previous release:
- We have removed ptrace-mode, and the associated experimental options `use-o-n-waitpid-workaround` and `--interpose-method`. ptrace-mode was an alternative to Shadow's current interposition mechanism that uses `LD_PRELOAD` and `seccomp`. This change should be transparent to most users, since it hasn't been the default for several releases, and was only accessible via experimental options. See #1945
- `dup()` and related syscalls are now supported for all file descriptors
- Fixed behavior when multiple threads are blocked in `epoll_wait` on the same epoll file description. #2260
- Fixed bugs causing `timerfd_settime` to not reset the internal timer's expiration count (#2279), and not cancel previously scheduled timer-fire events (#2282).
- Fixed a panic when patching the VDSO in newer kernels, such as those in Ubuntu 22.04. #2273
- Fixed the errno returned from calling `connect()` on a unix socket. This fixes a `getaddrinfo()` test failure on some systems. #2286
- Fixed minor memory leaks. #2249
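As a reminder of what `dup()` support means for a managed application, here is a small Python sketch that duplicates the write end of a pipe and keeps using the duplicate after the original descriptor is closed. The function name `write_via_duplicate` is made up for this example. It exercises ordinary POSIX semantics and says nothing about how Shadow implements them.

```python
import os


def write_via_duplicate(payload):
    """Write to a pipe through a dup()'d descriptor and read the data back."""
    read_fd, write_fd = os.pipe()
    dup_fd = os.dup(write_fd)  # second descriptor for the same open file description
    os.close(write_fd)         # the duplicate keeps the write end alive
    os.write(dup_fd, payload)
    os.close(dup_fd)           # last writer closed: the reader sees EOF after the data
    data = os.read(read_fd, len(payload) + 1)
    os.close(read_fd)
    return data
```

Because both descriptors refer to the same open file description, closing the original does not close the pipe. The pipe's write end is only released once the duplicate is closed as well.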
All Merged Pull Requests
- Added empty CHANGELOG file by @stevenengler in #2221
- Delete ptrace by @sporksmith in #2222
- Fix default value of 'use_syscall_counters' in docs by @stevenengler in #2224
- Simplified the logging init, and other small code simplifications by @stevenengler in #2225
- Replace 'Controller' with a rust version by @stevenengler in #2226
- Prepare manager for rust conversion by @stevenengler in #2230
- Upgrade bindgen/cbindgen versions and fix double bindings by @stevenengler in #2234
- Revert "Change random/seed/ordering behaviour to match older Shadow" by @stevenengler in #2236
- Bump anyhow from 1.0.57 to 1.0.58 in /src by @dependabot in #2232
- Bump syn from 1.0.96 to 1.0.98 in /src by @dependabot in #2231
- Replace 'execinfo.h' backtrace with a rust-generated backtrace by @stevenengler in #2237
- Bump quote from 1.0.18 to 1.0.19 in /src by @dependabot in #2233
- Bump quote from 1.0.19 to 1.0.20 in /src by @dependabot in #2238
- Fixed typos by @stevenengler in #2239
- Remove handle field from `LegacyDescriptor` by @stevenengler in #2240
- Renamed `descriptor_` functions to `legacydesc_` by @stevenengler in #2241
- Don't warn if null pointer address isn't mapped into shadow by @stevenengler in #2244
- Restructure descriptors to support dup() for all descriptor/file types by @stevenengler in #2242
- Remove `ownerProcess` field from `LegacyDescriptor` by @stevenengler in #2245
- Fixed misc memory leaks by @stevenengler in #2249
- Fix unsafe accesses of host objects from the manager by @stevenengler in #2248
- Compress CI artifacts by @stevenengler in #2251
- Changes to controller, manager, and scheduler by @stevenengler in #2255
- Add 'shadow' and 'linux' labels to tests by @stevenengler in #2252
- Bump regex from 1.5.6 to 1.6.0 in /src by @dependabot in #2257
- Bump serde from 1.0.137 to 1.0.138 in /src by @dependabot in #2253
- Bump once_cell from 1.12.0 to 1.13.0 in /src by @dependabot in #2256
- epoll_wait: re-block when events were consumed by another thread by @sporksmith in #2261
- Re-enable fcntl locking for regular files by @sporksmith in #2259
- More organization changes to the manager / worker pool by @stevenengler in #2262
- Improve gdb documentation by @sporksmith in #2265
- Pre-allocate space in shared memory files by @sporksmith in #2267
- Add 'x86-64' to supported platforms doc by @stevenengler in #2272
- Updated versions used in the CI workflows by @stevenengler in #2278
- Ensure timerfd_settime resets expiration count by @sporksmith in #2279
- Add support for smaller vdso trampolines by @sporksmith in #2275
- shadow version string: get commit hash and dirty status more robustly by @sporksmith in #2281
- Replace the C `Manager` with a rust version by @stevenengler in #2277
- timerfd: ensure previously scheduled events are cancelled when re-arming by @sporksmith in #2282
- Rename `LegacyDescriptor` to `LegacyFile` by @stevenengler in #2284
- Return `ENOENT` for unix socket `connect()` with pathname address by @stevenengler in #2287
- Reorganization before converting router* to Rust by @robgjansen in #2285
- Fixed a warning in the unix socket code by @stevenengler in #2288
- Fix `getaddrinfo()` error on some systems by @stevenengler in #2292
- Tweak shadow tests CI matrix by @sporksmith in #2283
- Added entries to CHANGELOG by @stevenengler in #2291
- Add `debug_panic` macro by @stevenengler in #2294
Full Changelog: v2.1.0...v2.2.0
v2.1.0
Shadow v2.1.0 is a minor release following our significant redesign of Shadow in v2.0.0. See the v2.0.0 release notes for more details about Shadow's new multi-process architecture.
This v2.1.0 release has largely focused on improving support for running various types of applications in Shadow while smoothing some of the rough edges introduced in v2.0.0.
We plan to focus our next release on Rust migration, and we expect that Rust will become Shadow's primary programming language in v2.2.0!
New features
- support for applications that depend on signals
- support for dynamically linked Go applications
- support for abstract-named unix sockets
- VDSO-based function interposition
- simulation progress messages (`general.progress`)
- custom capture sizes for the pcap writer (`host_defaults.pcap_capture_size`)
- system call latency modeling (`general.model_unblocked_syscall_latency`). This feature allows Shadow to escape some "busy loops" it couldn't before, avoiding deadlock in e.g. some versions of curl, iperf, libopenblas, and the golang runtime.
- a `--debug-hosts` option to make debugging managed processes easier
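To see why modeling the latency of unblocked system calls can break a busy loop, consider a toy model in which simulated time only advances when the simulator decides it does. The sketch below is not Shadow's implementation. `SimClock`, `unblocked_latency_ns`, and `busy_wait` are hypothetical names chosen for this illustration.

```python
class SimClock:
    """Toy simulated clock: time moves only when the simulator advances it."""

    def __init__(self, unblocked_latency_ns=0):
        self.now_ns = 0
        # Hypothetical knob standing in for the latency-modeling feature.
        self.unblocked_latency_ns = unblocked_latency_ns

    def clock_gettime(self):
        """A non-blocking 'syscall' that returns the current simulated time."""
        t = self.now_ns
        # With latency modeling, every unblocked call costs some simulated time.
        self.now_ns += self.unblocked_latency_ns
        return t


def busy_wait(clock, deadline_ns, max_iterations=10_000):
    """Spin until the deadline; return the iteration count, or None if stuck."""
    for i in range(max_iterations):
        if clock.clock_gettime() >= deadline_ns:
            return i
    return None  # without a cap, this loop would spin forever
```

With `SimClock(0)` the loop never observes time passing and gives up at the iteration cap, which corresponds to the deadlock described above. With a non-zero per-call latency, such as `SimClock(1_000)`, the deadline is eventually reached.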
New system calls
- `select()`
- `getitimer()`
- `setitimer()`
- `SYS_rseq`
New supported platforms
- Ubuntu 22.04
- Fedora 35 and 36
Changes
- significantly improved determinism
- `O_DIRECT` flag (packet mode) support for pipes
- `ioctl()` support for pipes
- SYN packets have a sequence number and are now retransmitted if they are not acknowledged
- TCP sockets have finite-sized accept queues/backlogs
- `listen()` can be called more than once for TCP sockets to set the backlog
- improved error reporting if a process did not start or was killed unexpectedly (for example by the OS)
- improved `ioctl()` file flag handling for regular files
- `TCP_NODELAY` can be enabled for TCP sockets
- added limited support for the `TCP_CONGESTION` socket option
- pcap writer supports UDP packets
- improved behaviour when one thread closes a file while another thread is blocked on a syscall that uses that file
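Two of the socket changes above, enabling `TCP_NODELAY` and calling `listen()` more than once, correspond to application code like the following Python sketch. The function name `make_listener` and the backlog values are arbitrary choices for this example. It uses only standard socket calls and makes no claim about Shadow's internals.

```python
import socket


def make_listener(first_backlog=1, second_backlog=16):
    """Create a TCP listener, enable TCP_NODELAY, and resize its backlog."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Disable Nagle's algorithm so small writes are sent immediately.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    sock.bind(("127.0.0.1", 0))  # port 0: let the kernel pick a free port
    sock.listen(first_backlog)
    # A second listen() on an already-listening socket updates the backlog.
    sock.listen(second_backlog)
    return sock
```

The caller owns the returned socket and should close it when done. `sock.getsockname()` reports the port that the kernel assigned.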
Bug fixes
- fixed broken eventfd and exit syscalls
- fixed behaviour when reading/writing to a pipe
- fixed memory leak when calling `getservbyname_r()`
- fixed incorrect return value for `getaddrinfo()`
- host log level is propagated to the shim log level
- fixed crash on systems with 64-bit inodes
We've made many other internal improvements, added new test cases, and expanded our documentation.
v2.0.0
Shadow v2.0.0 is released!
Version 2 is the first major version bump since Shadow v1.0.0 was tagged over a decade ago!
In Shadow v2, we completely redesigned the architecture for executing and interacting with applications running in Shadow. To understand the importance of the redesign, let's first look at the previous design and its limitations.
Limitations of the previous version 1 design
In the previous version 1, the underlying architecture was that Shadow would load applications as plugins into the Shadow process space, and as of v1.12.0 the plugins were loaded into independent namespaces in the Shadow process space. This underlying plugin-based architecture had several limitations:
- Compatibility: The domain of supported applications was limited to those that are compiled as position-independent libraries (PIC) or executables (PIE) that export their symbols to the dynamic symbol table (rdynamic), are dynamically linked to libc, and make all system calls through libc. Rebuilding applications so that they could be loaded into Shadow was tedious, and impossible if the source code was not available (e.g., closed-source software).
- Correctness: Relying solely on preloading as a mechanism to intercept system calls is unreliable because only dynamically linked functions (e.g., those in libc) can be intercepted using LD_PRELOAD; system calls invoked via statically linked code or assembly instructions could leak outside of the simulation and cause errors.
- Maintainability: A custom dynamic loader was required to load more than 16 plugin namespaces at once, and a portable threading library was used to support multi-threaded applications (these used to account for 62k LoC in Shadow). libc functions with nontrivial functionality would need to be reimplemented in order to intercept the system calls they make.
All of these issues led to stability problems in Shadow and limited its use to niche applications like Tor.
Our new v2 design
In version 2, we designed a new architecture for executing and interacting with applications running in Shadow.
In our new design, applications are executed as standard Linux processes and hooked into the simulation through the system call interface using standard kernel facilities (primarily using preloading with seccomp as a backstop). This design overcomes many of the limitations of the previous plugin architecture:
- The simulator can now execute any existing application without rebuilding it, given just a binary executable and its command line arguments.
- Linux kernel subsystems guarantee reliable process isolation and correct system call interception.
- The maintenance of a custom loader, threading libraries, and reimplemented libc functions is no longer required.
The new design allows Shadow to focus on supporting core functionality, i.e. system calls, rather than designing and maintaining code to work around issues brought about by the v1 design limitations. Separating Shadow from applications through a system call interface will enable Shadow to become a much more general purpose tool supporting a large number of use-cases.
Performance
One of our primary concerns while considering a new design was performance: Shadow is designed to run large-scale distributed systems, and we wanted to make sure that any inter-process communication overhead necessitated by the new multi-process design would not significantly detract from Shadow's target use-cases.
We're happy to report that the performance in our new v2 design is in most cases comparable to or faster than the performance of the v1 design! This means the v2 design is an all-around win relative to the v1 design, and it should significantly improve the Shadow user experience.
Rust
During the development of the new architecture, we began the process of migrating Shadow's programming language from C to Rust to further improve stability and correctness. We have made big improvements in this regard, prioritizing user-facing changes into this v2.0.0 release. The most noticeable user-facing change is that we updated our subsystems for specifying command line arguments and Shadow config files. We hope our new yaml-based config files are easier to manually read and edit than the old xml format.
We have further Rust migration tasks planned in future releases. We don't think the remaining migration work will be very noticeable from a user perspective, but please bear with us as we dust off some cobwebs and continue our transition to a safer language.
More information
Our new design is the focus of a research paper that will appear at the 2022 USENIX Annual Technical Conference. See our design document for more details on the new v2 design and a reference to our published research article.