Fix SR-4590 compare_perf_tests.py fails when new benchmarks are added #8974

palimondo · 2017-04-24T21:35:22Z

Also fix phantom “number” test result parsed from Totals.

Broken out of #8793.

palimondo · 2017-04-24T21:35:38Z

@gottesmm Please review

gottesmm · 2017-04-24T22:30:57Z

@palimondo Can you put in a warning for the new tests that were added? I think you would just do a set subtraction and do a simple print.
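A minimal sketch of the set subtraction suggested above, not code from this PR. It assumes results are held in dictionaries keyed by test name; the function name `added_and_removed`, the sample benchmark names, and the warning text are hypothetical.

```python
def added_and_removed(old_results, new_results):
    """Return (added, removed) test names, each sorted alphabetically."""
    old_names = set(old_results)
    new_names = set(new_results)
    added = sorted(new_names - old_names)      # only in the new run
    removed = sorted(old_names - new_names)    # only in the old run
    return added, removed


# Hypothetical sample data: test name -> some timing value.
old = {"Ackermann": 100, "ArrayAppend": 200}
new = {"Ackermann": 98, "ArrayAppend": 205, "DictionarySwap": 50}

added, removed = added_and_removed(old, new)
for name in added:
    print("WARNING: new benchmark without a baseline: " + name)
for name in removed:
    print("WARNING: benchmark missing from new results: " + name)
```

Only tests present in both runs can be compared, so the comparison itself would iterate over the intersection of the two name sets.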

gottesmm · 2017-04-24T22:35:43Z

Also, why is the digit test not necessary any more?

gottesmm · 2017-04-24T22:36:48Z

Another small concern of mine is about the row column numbers. Are you sure you got those correct? @atrick is more familiar with this script, so he may be the right person to just quickly confirm that.

atrick · 2017-04-24T23:04:33Z

I glanced at the diff but I don't understand how removing all that logic for finding the min/max scores pertains to this bug.
Please make sure that any changes you make to this script continue to handle the concatenated output from multiple runs of the benchmark driver.

Incidentally, I once had an earlier version of the script that handled added and removed tests. That shouldn't be hard.

palimondo · 2017-04-24T23:05:15Z

New tests that were added are coming in the fix for SR-4601 - separate PR, after we land the refactoring.

That whole code around row[MIN].isdigit() must have been some legacy from who-knows-when. It looks like there was a time when Benchmark_Driver wasn't reporting aggregate stats, but repeated tests were printed out multiple times? Given the current output from Benchmark_Driver, this works just fine.

I wasn't changing row column numbers… if you mean going from len(row) > 7 to len(row) > 8 in that test: MEDIAN = 7 and it is followed by MAX_RSS, which makes for a total of 9 columns.
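To illustrate the column arithmetic, here is a sketch under stated assumptions. Only MEDIAN = 7 and the trailing MAX_RSS column come from the comment above; the other column names, the helper `parse_row`, and the sample row are assumptions made for illustration.

```python
# Zero-based column indices. MEDIAN = 7 followed by MAX_RSS is from the
# thread; the names of columns 0 through 6 are assumed.
NUM, TEST, SAMPLES, MIN, MAX, MEAN, SD, MEDIAN, MAX_RSS = range(9)


def parse_row(line):
    """Return (test name, median, max RSS) for a complete data row, else None.

    Assumes a comma-separated data row with numeric fields; header rows
    are not handled here.
    """
    row = line.strip().split(",")
    # The last index is 8, so a row with all 9 columns satisfies len(row) > 8.
    if len(row) > MAX_RSS:
        return row[TEST], int(row[MEDIAN]), int(row[MAX_RSS])
    return None


print(parse_row("1,Ackermann,3,100,120,110,8,109,10000"))
print(parse_row("1,Ackermann,3,100,120,110,8,109"))
```

With the older check, len(row) > 7, a row ending at MEDIAN would have been accepted, and reading row[MAX_RSS] from it would raise an IndexError.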

palimondo · 2017-04-24T23:20:41Z

@atrick:

"Please make sure that any changes you make to this script continue to handle the concatenated output from multiple runs of the benchmark driver."

What?! Where, how? I don't understand. When is it invoked like that?

atrick · 2017-04-24T23:21:13Z

The compare script needs to handle multiple invocations of the driver. That's literally the only way that I use the driver.

1/benchmark_driver > out1
1/benchmark_driver >> out1
...

2/benchmark_driver > out2
2/benchmark_driver >> out2
...

compare out1 out2

Also, I usually only rerun some subset of tests and concatenate those to the same output.
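One plausible way to handle such concatenated output is sketched below; it is not the script's actual logic. It assumes comma-separated rows with the test name in column 1, MIN in column 3, and MAX in column 4 (these positions are assumptions), and it keeps the smallest MIN and largest MAX seen for each test. The function name `merge_runs` and the sample rows are hypothetical.

```python
def merge_runs(lines):
    """Merge repeated rows per test into {name: (lowest MIN, highest MAX)}."""
    merged = {}
    for line in lines:
        row = line.strip().split(",")
        # Skip blank lines, header rows, and anything without numeric scores.
        if len(row) <= 8 or not (row[3].isdigit() and row[4].isdigit()):
            continue
        name, low, high = row[1], int(row[3]), int(row[4])
        if name in merged:
            old_low, old_high = merged[name]
            merged[name] = (min(old_low, low), max(old_high, high))
        else:
            merged[name] = (low, high)
    return merged


# Two driver runs appended to the same file; the second reran one test only.
out1 = [
    "#,TEST,SAMPLES,MIN(us),MAX(us),MEAN(us),SD(us),MEDIAN(us),MAX_RSS(B)",
    "1,Ackermann,3,100,120,110,8,109,10000",
    "2,ArrayAppend,3,200,210,205,4,204,20000",
    "#,TEST,SAMPLES,MIN(us),MAX(us),MEAN(us),SD(us),MEDIAN(us),MAX_RSS(B)",
    "1,Ackermann,3,95,115,105,8,104,10000",
]
print(merge_runs(out1))
```

Because rows are merged by test name, rerunning only a subset of tests and appending the output to the same file is handled without special cases.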

palimondo · 2017-04-24T23:38:09Z

Oh, my… I can see how compare_perf_tests.py is used by other scripts. There was no documentation about your use case anywhere. 🤷‍♂️🙎‍♂️

palimondo · 2017-04-24T23:40:23Z

@atrick What you describe here seems like a manual version of the --rerun option you suggested in SR-4669. Am I correct?

atrick · 2017-04-24T23:53:23Z

Well, compare_perf_test was around long before the driver. I've always used a script similar to that. Mishal made it work with CI.
As I've been saying (over and over), I always run multiple invocations of the driver. I will actually use compare_perf_test on just a single set of results just to aggregate the information, but that's not so common.


palimondo · 2017-04-25T00:12:45Z

That was just great. 🙅‍♂️
You should have taken https://bugs.swift.org/browse/SR-4590 when I filed it.

palimondo · 2017-04-25T00:15:34Z

@atrick Please go ahead and do the book-keeping in Jira, too. Thanks!

palimondo · 2017-04-25T00:28:15Z

Sorry @atrick, I thought the conflict was due to your commit, but it was @moiseev.

Why didn't you take SR-4590 when I filed it 11 days ago? I filed PR #8793 some 10 days ago. Now you jump in and mix it all up. Have you seen the refactoring I did there? We are duplicating efforts… Why?

atrick · 2017-04-25T00:38:34Z

@palimondo I think you misunderstood. I'm not fixing SR-4590. I'm saying I'm surprised that it's broken because I expected it to be able to handle added/removed benchmarks.

The confusing thing is that your bug title and this PR title are unrelated to the functionality that you're removing.

palimondo · 2017-04-25T00:59:15Z

@atrick It wasn't you. This PR is broken out from #8793 by request from @gottesmm to land it per-partes. @moiseev just landed the fix for SR-4590 in #8923. And a logical change in sorting order. 👍

I'm upset because my changes in #8793 do that too, as part of a much bigger refactoring of compare_perf_tests.py. Please have a look, both of you! To keep such a massive change safe, I've been using diff to verify that all outputs from my refactored script match those of the legacy version.

Your quick fixes make this unnecessarily hard, as I must rebase on top of your changes. The legacy script contains a ton of little quirks I had to replicate, and now I need to redo that. It's quite upsetting.

palimondo · 2017-04-25T01:01:06Z

I thought that filing bugs that nobody took, self-assigning them, and opening a PR with fixes would be enough to coordinate the work on this part of the code and prevent conflicts. I was apparently mistaken.

atrick · 2017-04-25T02:35:03Z

Ok. Thanks for working at this. Rebasing is always frustrating. But what you're experiencing is pretty normal. The reality is that I miss a lot of bug and commit activity, especially when I'm on vacation for a week then working on a deadline.

Fix SR-4590 compare_perf_tests.py fails when new benchmarks are added

fee490f

Also fix phantom “number” test result parsed from Totals.

palimondo force-pushed the SR-4590 branch from 9f12fdc to fee490f · April 25, 2017 00:06

palimondo closed this Apr 25, 2017

palimondo deleted the SR-4590 branch April 25, 2017 00:14

palimondo mentioned this pull request Apr 25, 2017

Fix SR-4601 Report Added and Removed Benchmarks in Performance Comparison #8991

Merged
