Skip to content
This repository has been archived by the owner on Nov 21, 2018. It is now read-only.

Add a script to compare two compilers. #17

Merged
merged 4 commits into from
Sep 29, 2016

Conversation

nnethercote
Copy link
Contributor

This commit adds a new script, compare.py, that can be used to compared the
speed of two compilers. Example output:

futures-rs-test  4.689s vs  4.668s --> 1.004x faster (variance: 1.001x, 1.008x)
helloworld       0.232s vs  0.230s --> 1.007x faster (variance: 1.009x, 1.012x)
html5ever-2016-  7.670s vs  7.669s --> 1.000x faster (variance: 1.008x, 1.009x)
hyper.0.5.0      5.304s vs  5.308s --> 0.999x faster (variance: 1.007x, 1.005x)
inflate-0.1.0    4.849s vs  4.884s --> 0.993x faster (variance: 1.019x, 1.009x)
issue-32062-equ  0.400s vs  0.396s --> 1.009x faster (variance: 1.014x, 1.021x)
issue-32278-big  1.872s vs  1.833s --> 1.022x faster (variance: 1.021x, 1.018x)
jld-day15-parse  1.903s vs  1.875s --> 1.015x faster (variance: 1.006x, 1.002x)
piston-image-0. 12.910s vs 12.932s --> 0.998x faster (variance: 1.010x, 1.006x)
regex.0.1.30     2.622s vs  2.629s --> 0.997x faster (variance: 1.020x, 1.018x)
rust-encoding-0  3.269s vs  3.245s --> 1.007x faster (variance: 1.022x, 1.022x)
syntex-0.42.2    0.240s vs  0.242s --> 0.992x faster (variance: 1.011x, 1.004x)
syntex-0.42.2-i 48.252s vs 48.070s --> 1.004x faster (variance: 1.011x, 1.006x)

In support of this, the commit also does the following.

  • Clarifies the meaning of the touch target. It now touches or removes
    files in such a way that subsequent make invocations will rebuild
    just the crate of interest for each benchmark. Most of the touch
    targets required changing to achieve this.
  • Replaces use of the CARGO_BUILD environment variable with
    CARGO_RUSTC_OPTS. This means that all makefiles are now more uniform
    -- they now all invoke cargo rust directly, and none of them
    specify -Ztime-passes -Zinput-stats. The commit also updates
    appropriately process.sh for this change.
  • Uses cargo clean more extensively in makefile clean targets.
  • Adds .PHONY declarations to all makefiles missing them. It can't
    hurt.
  • Rewrites README.md to reflect all the above changes, and to remove
    outdated and incorrect information.

@nnethercote
Copy link
Contributor Author

Some comments about the patch.

  • I'm unsure about the Cargo.lock changes. They happened automatically. Is this ok?
  • syntex is currently measuring the compilation speed of the last crate, which is trivial. (See the 0.240s time in the previous comment.) I'm pretty sure this is true for the perf.rust-lang.org runs as well. Doesn't seem right -- surely we want to measure the speed of the second last crate, which is the big slow one?
  • syntex-incr, in contrast, is measuring the time taken to compile all crates thanks to the cargo clean used for its touch target. At least, that's what's measured by compare.py... process.sh is still measuring just the last, trivial crate, because that's the only crate for which the -Ztime-passes output is produced.
    These syntax problems need working out.

@nnethercote
Copy link
Contributor Author

@nrc confirmed on IRC that perf.rust-lang.org has been measuring compilation time for the trivial crate for syntex ever since it got added on 2016-08-14.

@Mark-Simulacrum
Copy link
Contributor

cc @nikomatsakis for the syntex/syntax problem; he might be able to clarify what was intended, though I suspect you're right.

Copy link
Contributor

@Mark-Simulacrum Mark-Simulacrum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great other than the one thing, which isn't required by any means.

I suspect the Cargo.lock changes are because you are using a non-nightly cargo, at least for the removal of checksums; once the Cargo.lock is changed I suspect the libc crate updated of its own accord. Not too important.

'regex.0.1.30',
'rust-encoding-0.3.0',
'syntex-0.42.2',
'syntex-0.42.2-incr-clean',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice if we could collect this information at runtime; maybe even as a list of shell arguments?

I think we already somewhat do that, but I'm not sure that there are benefits to keeping a static list here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True. It was helpful having the static list during development, but there's no need for it now. I will fix.

This commit adds a new script, `compare.py`, that can be used to compared the
speed of two compilers. Example output:
```
futures-rs-test  4.689s vs  4.668s --> 1.004x faster (variance: 1.001x, 1.008x)
helloworld       0.232s vs  0.230s --> 1.007x faster (variance: 1.009x, 1.012x)
html5ever-2016-  7.670s vs  7.669s --> 1.000x faster (variance: 1.008x, 1.009x)
hyper.0.5.0      5.304s vs  5.308s --> 0.999x faster (variance: 1.007x, 1.005x)
inflate-0.1.0    4.849s vs  4.884s --> 0.993x faster (variance: 1.019x, 1.009x)
issue-32062-equ  0.400s vs  0.396s --> 1.009x faster (variance: 1.014x, 1.021x)
issue-32278-big  1.872s vs  1.833s --> 1.022x faster (variance: 1.021x, 1.018x)
jld-day15-parse  1.903s vs  1.875s --> 1.015x faster (variance: 1.006x, 1.002x)
piston-image-0. 12.910s vs 12.932s --> 0.998x faster (variance: 1.010x, 1.006x)
regex.0.1.30     2.622s vs  2.629s --> 0.997x faster (variance: 1.020x, 1.018x)
rust-encoding-0  3.269s vs  3.245s --> 1.007x faster (variance: 1.022x, 1.022x)
syntex-0.42.2    0.240s vs  0.242s --> 0.992x faster (variance: 1.011x, 1.004x)
syntex-0.42.2-i 48.252s vs 48.070s --> 1.004x faster (variance: 1.011x, 1.006x)
```

In support of this, the commit also does the following.

- Clarifies the meaning of the `touch` target. It now touches or removes
  files in such a way that subsequent `make` invocations will rebuild
  just the crate of interest for each benchmark. Most of the `touch`
  targets required changing to achieve this.

- Replaces use of the CARGO_BUILD environment variable with
  CARGO_RUSTC_OPTS. This means that all makefiles are now more uniform
  -- they now all invoke `cargo rust` directly, and none of them
  specify `-Ztime-passes -Zinput-stats`. The commit also updates
  appropriately `process.sh` for this change.

- Uses `cargo clean` more extensively in makefile `clean` targets.

- Adds `.PHONY` declarations to all makefiles missing them. It can't
  hurt.

- Rewrites `README.md` to reflect all the above changes, and to remove
  outdated and incorrect information.
@nnethercote
Copy link
Contributor Author

The new version has these changes.

  • I undid the Cargo.lock changes.
  • I made compare.py executable.
  • I removed the hard-coding of the test names.
  • I changed syntex-incr's makefile back to using RUSTFLAGS. I had moved the incremental options from RUSTFLAGS to after the -- but that means they only apply to the final crate, rather than all crates.

W.r.t. the syntex mess, I think the current patch doesn't change the current behaviour. So we could defer fixing that up to a follow-up if we wanted to.

@nikomatsakis
Copy link
Contributor

um yeah the syntex stuff is all messed up. My bad. I'm going to try and patch it up to do more what I intended. =)

subprocess.check_call('make touch > /dev/null 2>&1', shell=True)
t1 = time.time()
subprocess.check_call('make > /dev/null 2>&1', env=make_env, shell=True)
t2 = time.time()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this is interesting to me. I feel like measuring time from outside the process is pretty imprecise, though at the scales we're currently talking about this might not matter much. But also, it's just a different measurement than what the normal process script is doing, right? (Presumably, this is why we had to adjust all the make touch calls...)

I think we should land this PR as is, but I also think we should follow-up and merge the logic of the two scripts better. (For that matter, I think we should rewrite all these bash/python scripts into Rust with some common libraries, though if we just used Python that'd be fine too).

(In general scraping numbers out of the output feels like it will be more precise to me, though, and it gives finer-grained data. It'd be interesting, e.g., to be able to use this script to compare the timing for individual passes, and not just the overall run.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed that finer-grained timing is a good idea, but can definitely be done later.

Rewriting these bash and python scripts in Rust or Python has been a goal of mine for a while, but since they "worked" I wasn't inclined to change anything, especially since I'm not 100% clear on what the current situation is as to what is used and what isn't.

subprocess.check_call('make touch > /dev/null 2>&1', shell=True)
t1 = time.time()
subprocess.check_call('make > /dev/null 2>&1', env=make_env, shell=True)
t2 = time.time()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed that finer-grained timing is a good idea, but can definitely be done later.

Rewriting these bash and python scripts in Rust or Python has been a goal of mine for a while, but since they "worked" I wasn't inclined to change anything, especially since I'm not 100% clear on what the current situation is as to what is used and what isn't.


A historical record of timings is shown at: http://perf.rust-lang.org/. This
site makes use of the `process.sh` script plus some auxiliary scripts not in
this repository.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can note here that the site source and backend is in rust-lang-nursery/rustc-perf, but this isn't too important.

We are now timing specifically the syntex_syntax crate. Also, do not
remove the Cargo.lock file -- I want to ensure we continue to build the
same dependencies so our measurements are more reliable.
@nikomatsakis
Copy link
Contributor

I opened up nnethercote#1 that updates this PR to handle the syntex tests better.

@nnethercote
Copy link
Contributor Author

I feel like measuring time from outside the process is pretty imprecise, though at the scales we're currently talking about this might not matter much. But also, it's just a different measurement than what the normal process script is doing, right?

I think measuring from outside the process is better because it includes process start-up and shutdown, which is what the user experiences. (But the difference between external and internal measurement is probably small in practice.)

As for additional rewriting, I'd lean towards Python rather than Rust, due to (a) no need for compilation, and (b) it's easy to do complex shell stuff in Python with subprocess.call(..., shell=True), which I've taken advantage of several times in process.py. (Or does Rust have something similar?)

@nrc nrc merged commit 39120d7 into rust-lang-deprecated:master Sep 29, 2016
@nnethercote nnethercote deleted the compare-script branch September 29, 2016 01:22
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants