New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a script to compare two compilers. #17

Merged
merged 4 commits into from Sep 29, 2016

Conversation

Projects
None yet
4 participants
@nnethercote
Contributor

nnethercote commented Sep 28, 2016

This commit adds a new script, compare.py, that can be used to compared the
speed of two compilers. Example output:

futures-rs-test  4.689s vs  4.668s --> 1.004x faster (variance: 1.001x, 1.008x)
helloworld       0.232s vs  0.230s --> 1.007x faster (variance: 1.009x, 1.012x)
html5ever-2016-  7.670s vs  7.669s --> 1.000x faster (variance: 1.008x, 1.009x)
hyper.0.5.0      5.304s vs  5.308s --> 0.999x faster (variance: 1.007x, 1.005x)
inflate-0.1.0    4.849s vs  4.884s --> 0.993x faster (variance: 1.019x, 1.009x)
issue-32062-equ  0.400s vs  0.396s --> 1.009x faster (variance: 1.014x, 1.021x)
issue-32278-big  1.872s vs  1.833s --> 1.022x faster (variance: 1.021x, 1.018x)
jld-day15-parse  1.903s vs  1.875s --> 1.015x faster (variance: 1.006x, 1.002x)
piston-image-0. 12.910s vs 12.932s --> 0.998x faster (variance: 1.010x, 1.006x)
regex.0.1.30     2.622s vs  2.629s --> 0.997x faster (variance: 1.020x, 1.018x)
rust-encoding-0  3.269s vs  3.245s --> 1.007x faster (variance: 1.022x, 1.022x)
syntex-0.42.2    0.240s vs  0.242s --> 0.992x faster (variance: 1.011x, 1.004x)
syntex-0.42.2-i 48.252s vs 48.070s --> 1.004x faster (variance: 1.011x, 1.006x)

In support of this, the commit also does the following.

  • Clarifies the meaning of the touch target. It now touches or removes
    files in such a way that subsequent make invocations will rebuild
    just the crate of interest for each benchmark. Most of the touch
    targets required changing to achieve this.
  • Replaces use of the CARGO_BUILD environment variable with
    CARGO_RUSTC_OPTS. This means that all makefiles are now more uniform
    -- they now all invoke cargo rust directly, and none of them
    specify -Ztime-passes -Zinput-stats. The commit also updates
    appropriately process.sh for this change.
  • Uses cargo clean more extensively in makefile clean targets.
  • Adds .PHONY declarations to all makefiles missing them. It can't
    hurt.
  • Rewrites README.md to reflect all the above changes, and to remove
    outdated and incorrect information.
@nnethercote

This comment has been minimized.

Show comment
Hide comment
@nnethercote

nnethercote Sep 28, 2016

Contributor

Some comments about the patch.

  • I'm unsure about the Cargo.lock changes. They happened automatically. Is this ok?
  • syntex is currently measuring the compilation speed of the last crate, which is trivial. (See the 0.240s time in the previous comment.) I'm pretty sure this is true for the perf.rust-lang.org runs as well. Doesn't seem right -- surely we want to measure the speed of the second last crate, which is the big slow one?
  • syntex-incr, in contrast, is measuring the time taken to compile all crates thanks to the cargo clean used for its touch target. At least, that's what's measured by compare.py... process.sh is still measuring just the last, trivial crate, because that's the only crate for which the -Ztime-passes output is produced.
    These syntax problems need working out.
Contributor

nnethercote commented Sep 28, 2016

Some comments about the patch.

  • I'm unsure about the Cargo.lock changes. They happened automatically. Is this ok?
  • syntex is currently measuring the compilation speed of the last crate, which is trivial. (See the 0.240s time in the previous comment.) I'm pretty sure this is true for the perf.rust-lang.org runs as well. Doesn't seem right -- surely we want to measure the speed of the second last crate, which is the big slow one?
  • syntex-incr, in contrast, is measuring the time taken to compile all crates thanks to the cargo clean used for its touch target. At least, that's what's measured by compare.py... process.sh is still measuring just the last, trivial crate, because that's the only crate for which the -Ztime-passes output is produced.
    These syntax problems need working out.
@nnethercote

This comment has been minimized.

Show comment
Hide comment
@nnethercote

nnethercote Sep 28, 2016

Contributor

@nrc confirmed on IRC that perf.rust-lang.org has been measuring compilation time for the trivial crate for syntex ever since it got added on 2016-08-14.

Contributor

nnethercote commented Sep 28, 2016

@nrc confirmed on IRC that perf.rust-lang.org has been measuring compilation time for the trivial crate for syntex ever since it got added on 2016-08-14.

@Mark-Simulacrum

This comment has been minimized.

Show comment
Hide comment
@Mark-Simulacrum

Mark-Simulacrum Sep 28, 2016

Contributor

cc @nikomatsakis for the syntex/syntax problem; he might be able to clarify what was intended, though I suspect you're right.

Contributor

Mark-Simulacrum commented Sep 28, 2016

cc @nikomatsakis for the syntex/syntax problem; he might be able to clarify what was intended, though I suspect you're right.

@Mark-Simulacrum

Looks great other than the one thing, which isn't required by any means.

I suspect the Cargo.lock changes are because you are using a non-nightly cargo, at least for the removal of checksums; once the Cargo.lock is changed I suspect the libc crate updated of its own accord. Not too important.

Show outdated Hide outdated compare.py
'regex.0.1.30',
'rust-encoding-0.3.0',
'syntex-0.42.2',
'syntex-0.42.2-incr-clean',

This comment has been minimized.

@Mark-Simulacrum

Mark-Simulacrum Sep 28, 2016

Contributor

It would be nice if we could collect this information at runtime; maybe even as a list of shell arguments?

I think we already somewhat do that, but I'm not sure that there are benefits to keeping a static list here.

@Mark-Simulacrum

Mark-Simulacrum Sep 28, 2016

Contributor

It would be nice if we could collect this information at runtime; maybe even as a list of shell arguments?

I think we already somewhat do that, but I'm not sure that there are benefits to keeping a static list here.

This comment has been minimized.

@nnethercote

nnethercote Sep 28, 2016

Contributor

True. It was helpful having the static list during development, but there's no need for it now. I will fix.

@nnethercote

nnethercote Sep 28, 2016

Contributor

True. It was helpful having the static list during development, but there's no need for it now. I will fix.

Add a script to compare two compilers.
This commit adds a new script, `compare.py`, that can be used to compared the
speed of two compilers. Example output:
```
futures-rs-test  4.689s vs  4.668s --> 1.004x faster (variance: 1.001x, 1.008x)
helloworld       0.232s vs  0.230s --> 1.007x faster (variance: 1.009x, 1.012x)
html5ever-2016-  7.670s vs  7.669s --> 1.000x faster (variance: 1.008x, 1.009x)
hyper.0.5.0      5.304s vs  5.308s --> 0.999x faster (variance: 1.007x, 1.005x)
inflate-0.1.0    4.849s vs  4.884s --> 0.993x faster (variance: 1.019x, 1.009x)
issue-32062-equ  0.400s vs  0.396s --> 1.009x faster (variance: 1.014x, 1.021x)
issue-32278-big  1.872s vs  1.833s --> 1.022x faster (variance: 1.021x, 1.018x)
jld-day15-parse  1.903s vs  1.875s --> 1.015x faster (variance: 1.006x, 1.002x)
piston-image-0. 12.910s vs 12.932s --> 0.998x faster (variance: 1.010x, 1.006x)
regex.0.1.30     2.622s vs  2.629s --> 0.997x faster (variance: 1.020x, 1.018x)
rust-encoding-0  3.269s vs  3.245s --> 1.007x faster (variance: 1.022x, 1.022x)
syntex-0.42.2    0.240s vs  0.242s --> 0.992x faster (variance: 1.011x, 1.004x)
syntex-0.42.2-i 48.252s vs 48.070s --> 1.004x faster (variance: 1.011x, 1.006x)
```

In support of this, the commit also does the following.

- Clarifies the meaning of the `touch` target. It now touches or removes
  files in such a way that subsequent `make` invocations will rebuild
  just the crate of interest for each benchmark. Most of the `touch`
  targets required changing to achieve this.

- Replaces use of the CARGO_BUILD environment variable with
  CARGO_RUSTC_OPTS. This means that all makefiles are now more uniform
  -- they now all invoke `cargo rust` directly, and none of them
  specify `-Ztime-passes -Zinput-stats`. The commit also updates
  appropriately `process.sh` for this change.

- Uses `cargo clean` more extensively in makefile `clean` targets.

- Adds `.PHONY` declarations to all makefiles missing them. It can't
  hurt.

- Rewrites `README.md` to reflect all the above changes, and to remove
  outdated and incorrect information.
@nnethercote

This comment has been minimized.

Show comment
Hide comment
@nnethercote

nnethercote Sep 28, 2016

Contributor

The new version has these changes.

  • I undid the Cargo.lock changes.
  • I made compare.py executable.
  • I removed the hard-coding of the test names.
  • I changed syntex-incr's makefile back to using RUSTFLAGS. I had moved the incremental options from RUSTFLAGS to after the -- but that means they only apply to the final crate, rather than all crates.

W.r.t. the syntex mess, I think the current patch doesn't change the current behaviour. So we could defer fixing that up to a follow-up if we wanted to.

Contributor

nnethercote commented Sep 28, 2016

The new version has these changes.

  • I undid the Cargo.lock changes.
  • I made compare.py executable.
  • I removed the hard-coding of the test names.
  • I changed syntex-incr's makefile back to using RUSTFLAGS. I had moved the incremental options from RUSTFLAGS to after the -- but that means they only apply to the final crate, rather than all crates.

W.r.t. the syntex mess, I think the current patch doesn't change the current behaviour. So we could defer fixing that up to a follow-up if we wanted to.

@nikomatsakis

This comment has been minimized.

Show comment
Hide comment
@nikomatsakis

nikomatsakis Sep 28, 2016

Collaborator

um yeah the syntex stuff is all messed up. My bad. I'm going to try and patch it up to do more what I intended. =)

Collaborator

nikomatsakis commented Sep 28, 2016

um yeah the syntex stuff is all messed up. My bad. I'm going to try and patch it up to do more what I intended. =)

subprocess.check_call('make touch > /dev/null 2>&1', shell=True)
t1 = time.time()
subprocess.check_call('make > /dev/null 2>&1', env=make_env, shell=True)
t2 = time.time()

This comment has been minimized.

@nikomatsakis

nikomatsakis Sep 28, 2016

Collaborator

So this is interesting to me. I feel like measuring time from outside the process is pretty imprecise, though at the scales we're currently talking about this might not matter much. But also, it's just a different measurement than what the normal process script is doing, right? (Presumably, this is why we had to adjust all the make touch calls...)

I think we should land this PR as is, but I also think we should follow-up and merge the logic of the two scripts better. (For that matter, I think we should rewrite all these bash/python scripts into Rust with some common libraries, though if we just used Python that'd be fine too).

(In general scraping numbers out of the output feels like it will be more precise to me, though, and it gives finer-grained data. It'd be interesting, e.g., to be able to use this script to compare the timing for individual passes, and not just the overall run.)

@nikomatsakis

nikomatsakis Sep 28, 2016

Collaborator

So this is interesting to me. I feel like measuring time from outside the process is pretty imprecise, though at the scales we're currently talking about this might not matter much. But also, it's just a different measurement than what the normal process script is doing, right? (Presumably, this is why we had to adjust all the make touch calls...)

I think we should land this PR as is, but I also think we should follow-up and merge the logic of the two scripts better. (For that matter, I think we should rewrite all these bash/python scripts into Rust with some common libraries, though if we just used Python that'd be fine too).

(In general scraping numbers out of the output feels like it will be more precise to me, though, and it gives finer-grained data. It'd be interesting, e.g., to be able to use this script to compare the timing for individual passes, and not just the overall run.)

This comment has been minimized.

@Mark-Simulacrum

Mark-Simulacrum Sep 28, 2016

Contributor

Agreed that finer-grained timing is a good idea, but can definitely be done later.

Rewriting these bash and python scripts in Rust or Python has been a goal of mine for a while, but since they "worked" I wasn't inclined to change anything, especially since I'm not 100% clear on what the current situation is as to what is used and what isn't.

@Mark-Simulacrum

Mark-Simulacrum Sep 28, 2016

Contributor

Agreed that finer-grained timing is a good idea, but can definitely be done later.

Rewriting these bash and python scripts in Rust or Python has been a goal of mine for a while, but since they "worked" I wasn't inclined to change anything, especially since I'm not 100% clear on what the current situation is as to what is used and what isn't.

subprocess.check_call('make touch > /dev/null 2>&1', shell=True)
t1 = time.time()
subprocess.check_call('make > /dev/null 2>&1', env=make_env, shell=True)
t2 = time.time()

This comment has been minimized.

@Mark-Simulacrum

Mark-Simulacrum Sep 28, 2016

Contributor

Agreed that finer-grained timing is a good idea, but can definitely be done later.

Rewriting these bash and python scripts in Rust or Python has been a goal of mine for a while, but since they "worked" I wasn't inclined to change anything, especially since I'm not 100% clear on what the current situation is as to what is used and what isn't.

@Mark-Simulacrum

Mark-Simulacrum Sep 28, 2016

Contributor

Agreed that finer-grained timing is a good idea, but can definitely be done later.

Rewriting these bash and python scripts in Rust or Python has been a goal of mine for a while, but since they "worked" I wasn't inclined to change anything, especially since I'm not 100% clear on what the current situation is as to what is used and what isn't.

A historical record of timings is shown at: http://perf.rust-lang.org/. This
site makes use of the `process.sh` script plus some auxiliary scripts not in
this repository.

This comment has been minimized.

@Mark-Simulacrum

Mark-Simulacrum Sep 28, 2016

Contributor

We can note here that the site source and backend is in rust-lang-nursery/rustc-perf, but this isn't too important.

@Mark-Simulacrum

Mark-Simulacrum Sep 28, 2016

Contributor

We can note here that the site source and backend is in rust-lang-nursery/rustc-perf, but this isn't too important.

update the syntex-0.42.2 makefiles
We are now timing specifically the syntex_syntax crate. Also, do not
remove the Cargo.lock file -- I want to ensure we continue to build the
same dependencies so our measurements are more reliable.
@nikomatsakis

This comment has been minimized.

Show comment
Hide comment
@nikomatsakis

nikomatsakis Sep 28, 2016

Collaborator

I opened up nnethercote#1 that updates this PR to handle the syntex tests better.

Collaborator

nikomatsakis commented Sep 28, 2016

I opened up nnethercote#1 that updates this PR to handle the syntex tests better.

@nnethercote

This comment has been minimized.

Show comment
Hide comment
@nnethercote

nnethercote Sep 29, 2016

Contributor

I feel like measuring time from outside the process is pretty imprecise, though at the scales we're currently talking about this might not matter much. But also, it's just a different measurement than what the normal process script is doing, right?

I think measuring from outside the process is better because it includes process start-up and shutdown, which is what the user experiences. (But the difference between external and internal measurement is probably small in practice.)

As for additional rewriting, I'd lean towards Python rather than Rust, due to (a) no need for compilation, and (b) it's easy to do complex shell stuff in Python with subprocess.call(..., shell=True), which I've taken advantage of several times in process.py. (Or does Rust have something similar?)

Contributor

nnethercote commented Sep 29, 2016

I feel like measuring time from outside the process is pretty imprecise, though at the scales we're currently talking about this might not matter much. But also, it's just a different measurement than what the normal process script is doing, right?

I think measuring from outside the process is better because it includes process start-up and shutdown, which is what the user experiences. (But the difference between external and internal measurement is probably small in practice.)

As for additional rewriting, I'd lean towards Python rather than Rust, due to (a) no need for compilation, and (b) it's easy to do complex shell stuff in Python with subprocess.call(..., shell=True), which I've taken advantage of several times in process.py. (Or does Rust have something similar?)

@nrc nrc merged commit 39120d7 into rust-lang-deprecated:master Sep 29, 2016

@nnethercote nnethercote deleted the nnethercote:compare-script branch Sep 29, 2016

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment