Add a script to compare two compilers. #17

nnethercote · 2016-09-28T06:34:58Z

This commit adds a new script, compare.py, that can be used to compared the
speed of two compilers. Example output:

futures-rs-test  4.689s vs  4.668s --> 1.004x faster (variance: 1.001x, 1.008x)
helloworld       0.232s vs  0.230s --> 1.007x faster (variance: 1.009x, 1.012x)
html5ever-2016-  7.670s vs  7.669s --> 1.000x faster (variance: 1.008x, 1.009x)
hyper.0.5.0      5.304s vs  5.308s --> 0.999x faster (variance: 1.007x, 1.005x)
inflate-0.1.0    4.849s vs  4.884s --> 0.993x faster (variance: 1.019x, 1.009x)
issue-32062-equ  0.400s vs  0.396s --> 1.009x faster (variance: 1.014x, 1.021x)
issue-32278-big  1.872s vs  1.833s --> 1.022x faster (variance: 1.021x, 1.018x)
jld-day15-parse  1.903s vs  1.875s --> 1.015x faster (variance: 1.006x, 1.002x)
piston-image-0. 12.910s vs 12.932s --> 0.998x faster (variance: 1.010x, 1.006x)
regex.0.1.30     2.622s vs  2.629s --> 0.997x faster (variance: 1.020x, 1.018x)
rust-encoding-0  3.269s vs  3.245s --> 1.007x faster (variance: 1.022x, 1.022x)
syntex-0.42.2    0.240s vs  0.242s --> 0.992x faster (variance: 1.011x, 1.004x)
syntex-0.42.2-i 48.252s vs 48.070s --> 1.004x faster (variance: 1.011x, 1.006x)

In support of this, the commit also does the following.

Clarifies the meaning of the touch target. It now touches or removes
files in such a way that subsequent make invocations will rebuild
just the crate of interest for each benchmark. Most of the touch
targets required changing to achieve this.
Replaces use of the CARGO_BUILD environment variable with
CARGO_RUSTC_OPTS. This means that all makefiles are now more uniform
-- they now all invoke cargo rust directly, and none of them
specify -Ztime-passes -Zinput-stats. The commit also updates
appropriately process.sh for this change.
Uses cargo clean more extensively in makefile clean targets.
Adds .PHONY declarations to all makefiles missing them. It can't
hurt.
Rewrites README.md to reflect all the above changes, and to remove
outdated and incorrect information.

nnethercote · 2016-09-28T06:39:37Z

Some comments about the patch.

I'm unsure about the Cargo.lock changes. They happened automatically. Is this ok?
syntex is currently measuring the compilation speed of the last crate, which is trivial. (See the 0.240s time in the previous comment.) I'm pretty sure this is true for the perf.rust-lang.org runs as well. Doesn't seem right -- surely we want to measure the speed of the second last crate, which is the big slow one?
syntex-incr, in contrast, is measuring the time taken to compile all crates thanks to the cargo clean used for its touch target. At least, that's what's measured by compare.py... process.sh is still measuring just the last, trivial crate, because that's the only crate for which the -Ztime-passes output is produced.
These syntax problems need working out.

nnethercote · 2016-09-28T07:17:40Z

@nrc confirmed on IRC that perf.rust-lang.org has been measuring compilation time for the trivial crate for syntex ever since it got added on 2016-08-14.

Mark-Simulacrum · 2016-09-28T12:54:51Z

cc @nikomatsakis for the syntex/syntax problem; he might be able to clarify what was intended, though I suspect you're right.

Mark-Simulacrum

Looks great other than the one thing, which isn't required by any means.

I suspect the Cargo.lock changes are because you are using a non-nightly cargo, at least for the removal of checksums; once the Cargo.lock is changed I suspect the libc crate updated of its own accord. Not too important.

Mark-Simulacrum · 2016-09-28T12:56:10Z

compare.py

+    'regex.0.1.30',
+    'rust-encoding-0.3.0',
+    'syntex-0.42.2',
+    'syntex-0.42.2-incr-clean',


It would be nice if we could collect this information at runtime; maybe even as a list of shell arguments?

I think we already somewhat do that, but I'm not sure that there are benefits to keeping a static list here.

True. It was helpful having the static list during development, but there's no need for it now. I will fix.

This commit adds a new script, `compare.py`, that can be used to compared the speed of two compilers. Example output: ``` futures-rs-test 4.689s vs 4.668s --> 1.004x faster (variance: 1.001x, 1.008x) helloworld 0.232s vs 0.230s --> 1.007x faster (variance: 1.009x, 1.012x) html5ever-2016- 7.670s vs 7.669s --> 1.000x faster (variance: 1.008x, 1.009x) hyper.0.5.0 5.304s vs 5.308s --> 0.999x faster (variance: 1.007x, 1.005x) inflate-0.1.0 4.849s vs 4.884s --> 0.993x faster (variance: 1.019x, 1.009x) issue-32062-equ 0.400s vs 0.396s --> 1.009x faster (variance: 1.014x, 1.021x) issue-32278-big 1.872s vs 1.833s --> 1.022x faster (variance: 1.021x, 1.018x) jld-day15-parse 1.903s vs 1.875s --> 1.015x faster (variance: 1.006x, 1.002x) piston-image-0. 12.910s vs 12.932s --> 0.998x faster (variance: 1.010x, 1.006x) regex.0.1.30 2.622s vs 2.629s --> 0.997x faster (variance: 1.020x, 1.018x) rust-encoding-0 3.269s vs 3.245s --> 1.007x faster (variance: 1.022x, 1.022x) syntex-0.42.2 0.240s vs 0.242s --> 0.992x faster (variance: 1.011x, 1.004x) syntex-0.42.2-i 48.252s vs 48.070s --> 1.004x faster (variance: 1.011x, 1.006x) ``` In support of this, the commit also does the following. - Clarifies the meaning of the `touch` target. It now touches or removes files in such a way that subsequent `make` invocations will rebuild just the crate of interest for each benchmark. Most of the `touch` targets required changing to achieve this. - Replaces use of the CARGO_BUILD environment variable with CARGO_RUSTC_OPTS. This means that all makefiles are now more uniform -- they now all invoke `cargo rust` directly, and none of them specify `-Ztime-passes -Zinput-stats`. The commit also updates appropriately `process.sh` for this change. - Uses `cargo clean` more extensively in makefile `clean` targets. - Adds `.PHONY` declarations to all makefiles missing them. It can't hurt. - Rewrites `README.md` to reflect all the above changes, and to remove outdated and incorrect information.

nnethercote · 2016-09-28T20:07:38Z

The new version has these changes.

I undid the Cargo.lock changes.
I made compare.py executable.
I removed the hard-coding of the test names.
I changed syntex-incr's makefile back to using RUSTFLAGS. I had moved the incremental options from RUSTFLAGS to after the -- but that means they only apply to the final crate, rather than all crates.

W.r.t. the syntex mess, I think the current patch doesn't change the current behaviour. So we could defer fixing that up to a follow-up if we wanted to.

nikomatsakis · 2016-09-28T20:19:03Z

um yeah the syntex stuff is all messed up. My bad. I'm going to try and patch it up to do more what I intended. =)

nikomatsakis · 2016-09-28T20:27:18Z

compare.py

+        subprocess.check_call('make touch > /dev/null 2>&1', shell=True)
+        t1 = time.time()
+        subprocess.check_call('make > /dev/null 2>&1', env=make_env, shell=True)
+        t2 = time.time()


So this is interesting to me. I feel like measuring time from outside the process is pretty imprecise, though at the scales we're currently talking about this might not matter much. But also, it's just a different measurement than what the normal process script is doing, right? (Presumably, this is why we had to adjust all the make touch calls...)

I think we should land this PR as is, but I also think we should follow-up and merge the logic of the two scripts better. (For that matter, I think we should rewrite all these bash/python scripts into Rust with some common libraries, though if we just used Python that'd be fine too).

(In general scraping numbers out of the output feels like it will be more precise to me, though, and it gives finer-grained data. It'd be interesting, e.g., to be able to use this script to compare the timing for individual passes, and not just the overall run.)

Agreed that finer-grained timing is a good idea, but can definitely be done later.

Rewriting these bash and python scripts in Rust or Python has been a goal of mine for a while, but since they "worked" I wasn't inclined to change anything, especially since I'm not 100% clear on what the current situation is as to what is used and what isn't.

Mark-Simulacrum · 2016-09-28T20:38:25Z

compare.py

+        subprocess.check_call('make touch > /dev/null 2>&1', shell=True)
+        t1 = time.time()
+        subprocess.check_call('make > /dev/null 2>&1', env=make_env, shell=True)
+        t2 = time.time()


Agreed that finer-grained timing is a good idea, but can definitely be done later.

Rewriting these bash and python scripts in Rust or Python has been a goal of mine for a while, but since they "worked" I wasn't inclined to change anything, especially since I'm not 100% clear on what the current situation is as to what is used and what isn't.

Mark-Simulacrum · 2016-09-28T20:40:30Z

README.md

+
+A historical record of timings is shown at: http://perf.rust-lang.org/. This
+site makes use of the `process.sh` script plus some auxiliary scripts not in
+this repository.


We can note here that the site source and backend is in rust-lang-nursery/rustc-perf, but this isn't too important.

We are now timing specifically the syntex_syntax crate. Also, do not remove the Cargo.lock file -- I want to ensure we continue to build the same dependencies so our measurements are more reliable.

nikomatsakis · 2016-09-28T20:58:37Z

I opened up nnethercote#1 that updates this PR to handle the syntex tests better.

Fix syntex benchmarks

nnethercote · 2016-09-29T01:10:22Z

I feel like measuring time from outside the process is pretty imprecise, though at the scales we're currently talking about this might not matter much. But also, it's just a different measurement than what the normal process script is doing, right?

I think measuring from outside the process is better because it includes process start-up and shutdown, which is what the user experiences. (But the difference between external and internal measurement is probably small in practice.)

As for additional rewriting, I'd lean towards Python rather than Rust, due to (a) no need for compilation, and (b) it's easy to do complex shell stuff in Python with subprocess.call(..., shell=True), which I've taken advantage of several times in process.py. (Or does Rust have something similar?)

Mark-Simulacrum reviewed Sep 28, 2016

View reviewed changes

nnethercote force-pushed the compare-script branch from 96e22e2 to 853db5f Compare September 28, 2016 20:04

nikomatsakis reviewed Sep 28, 2016

View reviewed changes

lock syntex version

eca02b9

Mark-Simulacrum approved these changes Sep 28, 2016

View reviewed changes

update the syntex-0.42.2 makefiles

17e6309

We are now timing specifically the syntex_syntax crate. Also, do not remove the Cargo.lock file -- I want to ensure we continue to build the same dependencies so our measurements are more reliable.

Merge pull request #1 from nikomatsakis/fix-syntex-benchmarks

7afc56b

Fix syntex benchmarks

nrc merged commit 39120d7 into rust-lang-deprecated:master Sep 29, 2016

nnethercote deleted the compare-script branch September 29, 2016 01:22

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add a script to compare two compilers. #17

Add a script to compare two compilers. #17

nnethercote commented Sep 28, 2016

nnethercote commented Sep 28, 2016

nnethercote commented Sep 28, 2016

Mark-Simulacrum commented Sep 28, 2016

Mark-Simulacrum left a comment

Mark-Simulacrum Sep 28, 2016

nnethercote Sep 28, 2016

nnethercote commented Sep 28, 2016

nikomatsakis commented Sep 28, 2016

nikomatsakis Sep 28, 2016

Mark-Simulacrum Sep 28, 2016

Mark-Simulacrum Sep 28, 2016

Mark-Simulacrum Sep 28, 2016

nikomatsakis commented Sep 28, 2016

nnethercote commented Sep 29, 2016

Add a script to compare two compilers. #17

Add a script to compare two compilers. #17

Conversation

nnethercote commented Sep 28, 2016

nnethercote commented Sep 28, 2016

nnethercote commented Sep 28, 2016

Mark-Simulacrum commented Sep 28, 2016

Mark-Simulacrum left a comment

Choose a reason for hiding this comment

Mark-Simulacrum Sep 28, 2016

Choose a reason for hiding this comment

nnethercote Sep 28, 2016

Choose a reason for hiding this comment

nnethercote commented Sep 28, 2016

nikomatsakis commented Sep 28, 2016

nikomatsakis Sep 28, 2016

Choose a reason for hiding this comment

Mark-Simulacrum Sep 28, 2016

Choose a reason for hiding this comment

Mark-Simulacrum Sep 28, 2016

Choose a reason for hiding this comment

Mark-Simulacrum Sep 28, 2016

Choose a reason for hiding this comment

nikomatsakis commented Sep 28, 2016

nnethercote commented Sep 29, 2016