v3.0.0

@robgjansen robgjansen released this 18 May 22:31
e502d20

Summary

The dev team had accumulated a large queue of breaking changes that required a major version bump. This release clears that queue and merges those improvements, bumping our major version from 2 to 3. It also significantly improves runtime performance compared to Shadow 2.5.0.

Configuration format

  • Shadow no longer implicitly searches its working directory for executables to be run under the simulation. If you wish to specify a process path relative to Shadow's working directory, prefix that path with ./.
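
    Example (a minimal sketch; my-server is a hypothetical executable located in Shadow's working directory):

    hosts:
      server:
        processes:
        # explicit "./" prefix: resolved relative to Shadow's working directory
        - path: ./my-server
        # bare name: no longer looked up in the working directory
        - path: curl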

  • Shadow now supports YAML merge keys and extension fields. This allows you to combine YAML maps using the << key.

    Example:

    # an "extension field" that we use to store common host options
    x-host-client: &host-client
      bandwidth_up: 10Mbps
      bandwidth_down: 10Mbps
    hosts:
      client1:
        # merge the fields from the extension field above
        <<: *host-client
        processes: ...
      client2:
        <<: *host-client
        processes: ...
  • Removed the quantity options for hosts and processes. It's now recommended to use YAML anchors and merge keys instead.

    Shadow 2.x:

    hosts:
      client:
        quantity: 3
        processes: ...

    Shadow 3.x:

    hosts:
      client1: &client
        processes: ...
      # copy all fields from 'client1'
      client2: *client
      # copy all fields from 'client1' and add additional fields
      client3:
        <<: *client
        ip_addr: 152.21.4.24
  • Renamed the host_defaults field to host_option_defaults and renamed the host's options field to host_options.

    Shadow 2.x:

    host_defaults:
      ...
    hosts:
      client:
        options:
          ...

    Shadow 3.x:

    host_option_defaults:
      ...
    hosts:
      client:
        host_options:
          ...
  • Removed the host pcap_directory configuration option and replaced it with a new pcap_enabled option.

    Shadow 2.x:

    hosts:
      client:
        options:
          pcap_directory: ./

    Shadow 3.x:

    hosts:
      client:
        host_options:
          pcap_enabled: true
  • Host names are restricted to the patterns documented in hostname(7).
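
    Example (a sketch based on hostname(7), which allows letters, digits, and hyphens in each label; the host names are hypothetical):

    hosts:
      # accepted: letters, digits, and a hyphen
      relay-1:
        processes: ...
      # would be rejected: underscores are not valid hostname characters
      # relay_1:
      #   processes: ...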

  • The process environment configuration option now takes a map instead of a semicolon-delimited string.

    Shadow 2.x:

    hosts:
      client:
        processes:
        - path: curl
          environment: ENV_A=1;ENV_B=foo

    Shadow 3.x:

    hosts:
      client:
        processes:
        - path: curl
          environment:
            ENV_A: "1"
            ENV_B: foo
  • The per-process option stop_time has been replaced with shutdown_time. When set, the signal specified by shutdown_signal (a new option) is sent to the process at the specified time. Shadow previously sent SIGKILL at a process's stop_time; the default shutdown_signal is now SIGTERM to better support graceful shutdown. To keep the 2.x behaviour, set shutdown_signal to SIGKILL explicitly.

    Shadow 2.x:

    hosts:
      client:
        processes:
        - path: curl
          stop_time: 10s

    Shadow 3.x:

    hosts:
      client:
        processes:
        - path: curl
          shutdown_time: 10s
          shutdown_signal: SIGKILL
  • A new expected_final_state option allows you to specify the expected state of the process at the end of the simulation. The supported states are exited, signaled, or running. If any process is not in its expected state at the end of the simulation, Shadow will return a non-zero exit code. The default expected_final_state is exited with code 0.

    In Shadow 2.x the behaviour was to consider any processes which exited with code 0, OR which were still running at the end of the simulation, as a success. Shadow 3.x does not support this specific behaviour, and you must choose a single state.

    Example:

    hosts:
      server:
        processes:
        - path: nginx
          # we expect nginx to run until the end of the simulation
          expected_final_state: running
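
    The other states can be sketched the same way. The map form {signaled: <signal>} used below is an assumption taken from Shadow's configuration documentation and is not spelled out in these notes; my-daemon is a hypothetical executable:

    hosts:
      server:
        processes:
        - path: ./my-daemon
          shutdown_time: 10s
          shutdown_signal: SIGKILL
          # the process is killed by the shutdown signal, so expect "signaled"
          # rather than the default of exited with code 0
          expected_final_state: {signaled: SIGKILL}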
  • Added support for a parallelism value of 0, which allows Shadow to choose a reasonable parallelism (we currently use the number of physical cores in Shadow's affinity/cgroup). The default value for parallelism has also been changed from 1 to 0.
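
    Example (a minimal sketch, assuming parallelism sits under the general section as in Shadow's configuration documentation):

    general:
      stop_time: 10s
      # 0 (the new default) lets Shadow choose; use a positive value to set
      # the parallelism yourself
      parallelism: 0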

  • It is now an error to set a process's shutdown_time or start_time to be after the simulation's stop_time.
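
    Example of a configuration that is now rejected (a sketch; placing the simulation's stop_time under the general section is an assumption taken from Shadow's configuration documentation):

    general:
      stop_time: 10s
    hosts:
      client:
        processes:
        - path: curl
          # error: the process would start after the simulation ends at 10s
          start_time: 20s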

  • Sub-second configuration values are now allowed for all time-related options, including start_time, stop_time, etc.
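
    Example (a sketch; the "ms" unit suffix is assumed from Shadow's unit syntax and is not shown elsewhere in these notes):

    hosts:
      client:
        processes:
        - path: curl
          # start 1.5 simulated seconds into the simulation
          start_time: 1500ms
          # send the shutdown signal 2.75 simulated seconds in
          shutdown_time: 2750ms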

  • Removed and updated various experimental options including use_shim_syscall_handler, interface_qdisc, and use_extended_yaml.

File structure

  • A host's data files (files in <data-dir>/hosts/<hostname>/) are no longer prefixed with the hostname. For example, a file that was previously named shadow.data/hosts/server/server.curl.1000.stdout is now named shadow.data/hosts/server/curl.1000.stdout.
  • The per-process .exitcode file has been removed because its semantics were confusing and the new expected_final_state option replaces its primary use case.
  • Generated pcap files are now named using their interface name instead of their IP address. For example "lo.pcap" and "eth0.pcap" instead of "127.0.0.1.pcap" and "11.0.0.1.pcap".

Performance

Shadow's scheduler is very performance-sensitive and needs to run tasks on worker threads with low latency. We added a spinloop in the scheduler that significantly improves Shadow's runtime performance. Some simulations run more than twice as fast; for example, a 5% Tor network simulation dropped from 160 minutes to 47 minutes, roughly a 3.4x speedup.

Supported platforms

We have removed several of our supported platforms. Specifically, we've dropped support for Ubuntu 18.04, Fedora 34/35/36, and CentOS Stream 8. We've also dropped support for Clang, and set a minimum-supported Linux kernel version of 5.4, which requires installing a backports kernel on Debian 10.

Stability guarantees

We've updated our "stability guarantees" document with the following changes:

  • Updated the filenames in Shadow's host-data directories to reflect the removal of the hostname prefix.
  • Added the ability to drop supported platforms in minor releases if the platforms no longer receive free updates and support from the distribution's developer.
  • Shadow no longer guarantees the order in which simulated process IDs (PIDs) are assigned.
  • Shadow will not change the criteria for the minimum supported Linux kernel version as documented in our supported platforms. This still allows us to increase the minimum kernel version as a result of dropping support for a platform.

Additional changes

Minor changes

  • Support the MSG_TRUNC flag for unix sockets. #2841
  • Support the TIMER_ABSTIME flag for clock_nanosleep. #2854
  • Removed the --profile, --include, and --library setup script options.
  • Added partial support for the epoll_pwait2 syscall.
  • Implemented the clone3 syscall. The thread libraries we're aware of that use clone3 were gracefully falling back to clone, but they may eventually stop doing so. This also reduces noise in Shadow's log about an unimplemented syscall being attempted.
  • Shadow no longer requires /dev/shm to be executable.

Bug fixes

  • Fixed a memory leak of about 16 bytes per thread caused by failing to unregister exited threads with a watchdog thread. This is unlikely to have had a noticeable effect in typical simulations. In particular, the per-thread data was already freed when the whole process exited, so the leak only affected processes that created and terminated many threads over their lifetime.
  • Simulated processes are now reaped and deallocated after they exit, reducing run-time memory usage when processes exit over the course of the simulation. This was unlikely to have affected most users: Shadow currently doesn't support fork, so any simulation has a fixed number of processes, all of which are explicitly specified in Shadow's config.
  • Fixed a potential race condition when exiting managed threads that did not have the clear_child_tid attribute set. This is unlikely to have affected most software running under Shadow, since most thread APIs use this attribute.
  • Changed an error value in clock_nanosleep and nanosleep from ENOSYS to ENOTSUP.
  • A managed process that tries to call the execve syscall will now get an error instead of escaping the Shadow simulation. #2718
  • Stopped overriding libc's getcwd with an incorrect wrapper that was returning -1 instead of NULL on errors.
  • A call to epoll_ctl with an unknown operation will return EINVAL.
  • Fixed a bug that caused Shadow to panic in some cases when a simulated thread exits. #2913
  • Fixed a bug causing host_options to undo any changes made to host_option_defaults.

Full changelog

Thanks to contributions from @robgjansen, @stevenengler, @sporksmith, @jtracey, @dependabot