
Investigate performance regression #1699

Closed
robgjansen opened this issue Oct 5, 2021 · 5 comments
Assignees
Labels
  • Component: Main (Composing the core Shadow executable)
  • Priority: Critical (Major issue that requires immediate attention)
  • Tag: Performance (Related to improving shadow's run-time)
  • Type: Bug (Error or flaw producing unexpected results)

Comments

@robgjansen
Member

We believe we have discovered a performance regression between shadow v2.0.0-pre.4 and 154a11d.

  • In v2.0.0-pre.4: a 10% Tor network took about 10 real hours to complete 1 simulated hour. We believe CPU utilization was high (>90%) across all cores during the simulation.
  • In 154a11d, using the same machine, a 10% Tor network took about 25 real hours to complete 1 simulated hour, with CPU utilization hovering between 40% and 50%.

We should investigate.

@robgjansen robgjansen added the Priority: Critical, Type: Bug, Component: Main, and Tag: Performance labels Oct 5, 2021
@robgjansen robgjansen added this to the Code health and maintenance milestone Oct 5, 2021
@robgjansen robgjansen added this to To do in Release v2.0 via automation Oct 5, 2021
@robgjansen
Member Author

robgjansen commented Oct 5, 2021

One hypothesis is that the difference is due to a change in the behavior of the Shadow event scheduler and how we are calculating the run-ahead time (the time that each worker thread is allowed to execute into the future) for each scheduling round.

We use a conservative scheduling algorithm, so we must always guarantee that time only moves forward. The run-ahead time represents the earliest time from "now" at which any host could cause a new event to occur at any other host. Generally, hosts are only sending "packet" events to other hosts, and so we use network latency as the run-ahead time.
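
To make the round structure concrete, here is a minimal sketch of one conservative scheduling round, using simplified placeholder types rather than Shadow's actual scheduler interfaces; the SimTime alias, the Worker type, and execute_events_until are assumptions made for illustration only.

```rust
/// Simplified stand-ins for illustration; not Shadow's real types.
type SimTime = u64; // simulation time in nanoseconds (assumed unit)

struct Worker; // placeholder for a worker thread's host/event state

impl Worker {
    /// Execute all of this worker's pending events scheduled before `barrier`.
    fn execute_events_until(&mut self, _barrier: SimTime) {
        // ... process this worker's host events here ...
    }
}

/// One conservative round: every worker may execute events only up to
/// `now + runahead`, which guarantees that no host can receive an event
/// created "in its past" by a host on another worker.
fn run_round(workers: &mut [Worker], now: SimTime, runahead: SimTime) -> SimTime {
    let barrier = now + runahead;
    for w in workers.iter_mut() {
        w.execute_events_until(barrier); // in Shadow these run in parallel
    }
    barrier // the next round begins at the barrier
}
```

A larger runahead means a larger window per round, so more work can be done in parallel between synchronization points.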

More precisely, this is how we compute the run-ahead time:

  • In v2.0.0-pre.4, we compute it as the min over the inter-host latencies only for those hosts that are communicating with one another. I.e., we start at a large value, and then whenever a packet is sent between a pair of hosts, we lower the run-ahead time if the latency between those hosts is smaller. (This was the intended behavior, though see "Incorrect calculation for minimum time jump", #1611.)
  • In 154a11d, we have a new network graph implementation in which we calculate the run-ahead time at startup as the min latency between all pairs of hosts in the config file whether or not they ever communicate.

The run-ahead calculated in the v2.0.0-pre.4 version of the code is an upper bound on what will be calculated in the 154a11d version. It's possible that, due to a smaller run-ahead time, Shadow is not able to parallelize work as effectively, leading to lower CPU utilization and longer real times to execute simulations. It's also possible that the run-ahead time is smaller now primarily due to the fixes in #1611 and not due to computing the run-ahead at startup. We have an experiment planned to investigate.
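
For illustration, here is a rough sketch of the two strategies described above, with latencies in nanoseconds and placeholder names (DynamicRunahead, on_packet, static_runahead_ns); this is not the actual Shadow code in either version.

```rust
/// v2.0.0-pre.4-style: start at a large value and tighten the run-ahead only
/// when a pair of hosts actually exchanges a packet.
struct DynamicRunahead {
    min_latency_ns: u64,
}

impl DynamicRunahead {
    fn new() -> Self {
        Self { min_latency_ns: u64::MAX }
    }

    /// Called whenever a packet is sent between a pair of hosts.
    fn on_packet(&mut self, pair_latency_ns: u64) {
        self.min_latency_ns = self.min_latency_ns.min(pair_latency_ns);
    }
}

/// 154a11d-style: take the min over all host pairs in the config at startup,
/// whether or not those hosts ever communicate.
fn static_runahead_ns(all_pair_latencies_ns: &[u64]) -> u64 {
    all_pair_latencies_ns.iter().copied().min().unwrap_or(u64::MAX)
}
```

The startup value can only be less than or equal to whatever the dynamic approach converges to, which is why the old behavior is an upper bound on the new one.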

@robgjansen
Member Author

We discovered the following:

  • In a 10% Tor network generated using tornettools, Shadow quickly (within a few simulated minutes) converges the runahead value to the minimum latency across all pairs of hosts in the config file, even when using the dynamically computed min-latency algorithm. So the dynamic algorithm doesn't help much in this case (though it could still help in more sparsely populated networks).
  • The old C graph code read latencies from the graph in milliseconds, so any positive latency value less than 1 millisecond effectively got rounded up to 1 millisecond. The new Rust graph code properly handles latency values below 1 millisecond. Because the atlas graph used by tornettools contains latency values under 1 ms, the Rust code effectively runs with smaller runaheads than before, which could hurt parallelization (see the sketch below).
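
A toy comparison of the two rounding behaviors, as a hedged sketch (the conversion functions and the exact rounding rule are simplifications, not the real graph-parsing code):

```rust
const NS_PER_MS: u64 = 1_000_000;

/// Old C-style behavior (simplified): latencies were read in whole
/// milliseconds, so any positive value below 1 ms effectively became 1 ms.
fn old_c_latency_ns(latency_ns: u64) -> u64 {
    (latency_ns / NS_PER_MS).max(1) * NS_PER_MS
}

/// New Rust-style behavior (simplified): sub-millisecond precision is kept.
fn new_rust_latency_ns(latency_ns: u64) -> u64 {
    latency_ns
}

fn main() {
    // A hypothetical 300 µs edge from the atlas graph:
    let edge_ns = 300_000;
    assert_eq!(old_c_latency_ns(edge_ns), 1_000_000); // floored at 1 ms
    assert_eq!(new_rust_latency_ns(edge_ns), 300_000); // stays at 300 µs
    // The smaller value feeds directly into the runahead, shrinking the
    // parallel window per scheduling round.
}
```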

@robgjansen robgjansen moved this from To do to In progress in Release v2.0 Oct 8, 2021
@robgjansen robgjansen self-assigned this Oct 8, 2021
@robgjansen
Member Author

We believe we found a second performance issue: in preload mode, hostname-to-address lookups using a custom Shadow syscall number were being sent to the kernel and failing. This caused the shim to fall back to scanning an /etc/hosts file instead, which is less efficient; the overhead is superlinear in the number of hosts, since n hosts each scan a file containing n hosts.

We think this was introduced in c0d1064, and should be fixed in #1710.
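
As a hedged sketch of the lookup path described above: the syscall number, argument layout, and return convention below are placeholders (only the name SYS_shadow_hostname_to_addr_ipv4 comes from the commit that follows), and the libc crate is assumed.

```rust
use std::fs;
use std::net::Ipv4Addr;

// Placeholder syscall number; Shadow's real custom number is not shown here.
const SYS_SHADOW_HOSTNAME_TO_ADDR_IPV4: libc::c_long = 1000;

fn lookup_ipv4(hostname: &str) -> Option<Ipv4Addr> {
    // Fast path: ask Shadow via its custom syscall. If the call is not
    // intercepted (the preload-mode bug), the real kernel rejects it and we
    // get an error back.
    let mut addr: u32 = 0;
    let rc = unsafe {
        libc::syscall(
            SYS_SHADOW_HOSTNAME_TO_ADDR_IPV4,
            hostname.as_ptr(),
            hostname.len(),
            &mut addr as *mut u32,
        )
    };
    if rc == 0 {
        return Some(Ipv4Addr::from(addr)); // byte order assumed for illustration
    }

    // Slow path: scan /etc/hosts line by line. With n simulated hosts each
    // scanning a file of n entries, total work grows superlinearly in n.
    let hosts = fs::read_to_string("/etc/hosts").ok()?;
    for line in hosts.lines() {
        let mut fields = line.split_whitespace();
        if let Some(addr_str) = fields.next() {
            if fields.any(|name| name == hostname) {
                return addr_str.parse().ok();
            }
        }
    }
    None
}
```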

robgjansen added a commit that referenced this issue Oct 8, 2021
Fix hostname to addr lookups

Fix getaddrinfo so that looking up hostnames through Shadow's custom SYS_shadow_hostname_to_addr_ipv4 syscall works again in preload mode, which lets us avoid scans over the /etc/hosts file.

Also enabled the getaddrinfo tests so they run in Shadow, and did some refactoring.

I believe this is a second performance fix for #1699 (in addition to setting a min runahead of 1 millisecond), but need an experiment to verify.
@robgjansen
Member Author

It turns out the /etc/hosts scan was not affecting performance all that much after all (see #1710 (comment)).

robgjansen added a commit that referenced this issue Oct 11, 2021
Set default min runahead to 1ms

Larger values for the --runahead option improve our ability to parallelize work across workers, leading to faster run-times. 1 millisecond was the lowest possible minimum in Shadow 1.x, and I think it is a reasonable default, since most latencies in our target networks are on the order of milliseconds.

refs #1699
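
As a rough illustration of the floor this commit describes (a sketch, not the actual option-handling code), the effective runahead could be clamped like this:

```rust
use std::time::Duration;

const MIN_RUNAHEAD: Duration = Duration::from_millis(1);

/// Clamp whatever runahead the scheduler derived to the 1 ms default minimum.
fn effective_runahead(derived: Duration) -> Duration {
    derived.max(MIN_RUNAHEAD)
}

fn main() {
    // A 300 µs minimum network latency would otherwise yield a 300 µs runahead.
    assert_eq!(effective_runahead(Duration::from_micros(300)), MIN_RUNAHEAD);
    // Larger derived values pass through unchanged.
    assert_eq!(
        effective_runahead(Duration::from_millis(5)),
        Duration::from_millis(5)
    );
}
```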
@stevenengler
Contributor

stevenengler commented Oct 13, 2021

We also tested the performance of a 1% network in preload mode with a 1 ms minimum runahead for 25 simulated minutes, and found no performance difference between commits bd13002 (with patches from fdeff64, 51662bd, 4a8a3f0, de1edc3, and 2dbbedd) and 4718871.

Release v2.0 automation moved this from In progress to Done Oct 17, 2021