Investigate performance regression #1699
Comments
One hypothesis is that the difference is due to a change in the behavior of the Shadow event scheduler and how we are calculating the run-ahead time (the time that each worker thread is allowed to execute into the future) for each scheduling round. We use a conservative scheduling algorithm, so we must always guarantee that time only moves forward. The run-ahead time represents the earliest time from "now" at which any host could cause a new event to occur at any other host. Generally, hosts are only sending "packet" events to other hosts, and so we use network latency as the run-ahead time. More precisely, this is how we compute the run-ahead time:
The run-ahead calculated in the v2.0.0-pre.4 version of the code is an upper bound on what will be calculated in the 154a11d version. It's possible that, due to a smaller run-ahead time, Shadow is not able to parallelize work as effectively, leading to lower CPU utilization and longer real times to execute simulations. It's also possible that the run-ahead time is smaller now primarily due to the fixes in #1611 and not due to computing the run-ahead at startup. We have an experiment planned to investigate.
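To make the hypothesis concrete, here is a minimal sketch of a conservative run-ahead window derived from network latency. It is not Shadow's actual code: the function name `run_ahead_ns`, the dictionary layout, and the latency values are all made up for illustration. It only shows why the smallest latency between any two hosts bounds how far workers may safely execute into the future, and why observing a smaller latency shrinks the window.

```python
# Hypothetical sketch, not Shadow's implementation.
# Assumption: packet events are the only cross-host events, so the smallest
# path latency is the earliest a host can affect any other host.

def run_ahead_ns(latencies_ns):
    """Return the smallest latency (in nanoseconds) over all host pairs."""
    if not latencies_ns:
        raise ValueError("need at least one path latency")
    return min(latencies_ns.values())


# Made-up latencies between three hypothetical hosts.
latencies = {
    ("a", "b"): 10_000_000,  # 10 ms
    ("b", "c"): 2_000_000,   # 2 ms
    ("a", "c"): 50_000_000,  # 50 ms
}

window = run_ahead_ns(latencies)  # bounded by the 2 ms path

# Adding a faster path can only keep the window the same or make it smaller.
latencies[("c", "d")] = 500_000  # 0.5 ms
smaller_window = run_ahead_ns(latencies)
```

In this sketch each worker could process events up to `now + window` in a round without risking an event arriving in its past.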
We discovered the following:
We believe we found a second performance issue: in preload mode, hostname-to-address lookups using a custom Shadow syscall number were being sent to the kernel and failing. This would cause the shim to fall back to doing scans over the /etc/hosts file. We think this was introduced in c0d1064, and should be fixed in #1710.
Fix hostname to addr lookups Fix getaddrinfo so that looking up hostnames through Shadow's custom SYS_shadow_hostname_to_addr_ipv4 syscall works again in preload mode and enables us to avoid scans over the /etc/hosts file. Also enabled the getaddrinfo tests so they run in Shadow, and did some refactoring. I believe this is a second performance fix for #1699 (in addition to setting a min runahead of 1 millisecond), but need an experiment to verify.
It turns out the /etc/hosts scan was not affecting performance all that much after all (see #1710 (comment)).
Set default min runahead to 1ms Larger values for the --runahead option improve our ability to parallelize work across workers, leading to faster run-times. 1 millisecond was the lowest possible minimum in Shadow 1.x, and I think a reasonable default since most latencies for our target networks are in terms of milliseconds. refs #1699
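As a rough illustration of why a larger minimum helps, the sketch below applies a 1 millisecond floor to a computed run-ahead and counts how many scheduling rounds a simulation would need. The function names and the fixed-size-round model are assumptions made for this example, not Shadow's scheduler; real rounds need not all have the same length.

```python
# Hypothetical sketch, not Shadow's implementation.
# Assumption: every scheduling round advances simulated time by exactly the
# run-ahead, so fewer rounds means fewer synchronization points for workers.

ONE_MS_NS = 1_000_000


def effective_runahead_ns(min_latency_ns, min_runahead_ns=ONE_MS_NS):
    """Apply a floor to the latency-derived run-ahead."""
    return max(min_latency_ns, min_runahead_ns)


def scheduling_rounds(sim_duration_ns, runahead_ns):
    """Rounds needed to cover the simulation, using integer ceiling division."""
    return -(-sim_duration_ns // runahead_ns)


one_second_ns = 1_000_000_000
tiny_latency_ns = 10_000  # 10 microseconds, a made-up value

rounds_without_floor = scheduling_rounds(one_second_ns, tiny_latency_ns)
rounds_with_floor = scheduling_rounds(
    one_second_ns, effective_runahead_ns(tiny_latency_ns)
)
```

Under these assumptions, the floor cuts the round count for one simulated second from 100,000 to 1,000.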
We believe we have discovered a performance regression between shadow v2.0.0-pre.4 and 154a11d.
We should investigate.