
Sage-patchbot improvements #13950

Closed
robertwb opened this issue Jan 13, 2013 · 25 comments

Comments

@robertwb
Contributor

Spkg up at http://sage.math.washington.edu/home/robertwb/patches/patchbot-1.2.5.spkg

Most notably, this adds:

  • Startup-time plugin.
  • Non-ascii plugin.
  • Spkg inspection.
  • Better redundancy avoidance.

CC: @vbraun

Component: packages: optional

Author: Robert Bradshaw

Reviewer: Volker Braun

Issue created by migration from https://trac.sagemath.org/ticket/13950

@vbraun
Member

vbraun commented Jan 13, 2013

comment:2

Looks good to me!

The non-ascii plugin causes lots of false positives. It is fine to use non-ASCII in the documentation if the correct coding was specified in the file, while non-ASCII in the actual code would be wrong, of course. Though that can't be checked by looking at the patch only; it requires some actual understanding of the code...
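
For reference, the coding check amounts to something like this (a minimal sketch following PEP 263; this is illustrative, not the plugin's actual code):

import re

# PEP 263: a coding declaration must appear on line 1 or 2 of the file.
CODING_RE = re.compile(r"coding[:=]\s*([-\w.]+)")

def declared_coding(first_lines):
    # Return the declared source coding from the first two lines, or None.
    for line in first_lines[:2]:
        match = CODING_RE.search(line)
        if match:
            return match.group(1)
    return None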

@vbraun
Member

vbraun commented Jan 13, 2013

comment:3

Is this ticket ready for review?

@robertwb
Contributor Author

comment:4

OK, a couple more tweaks on spkg detection. Yeah, the non-ascii plugin probably has some false positives, but as we've seen, those are better than false negatives going in. In any case, the blue means "check this out / are you sure you want to do this" rather than "this is a blocker." Thanks for looking at this (and for running a patchbot!)


@vbraun
Member

vbraun commented Jan 15, 2013

comment:5

Looks good to me. Can you commit the change to the SPKG.txt?

@vbraun
Member

vbraun commented Jan 15, 2013

comment:6

I get this mysterious error:

[...]
----------------------------------------------------------------------
All tests passed!
Total time for all tests: 814.2 seconds
Tests -- 814 seconds

Apply -- 19 seconds
Build -- 1 seconds
plugins.commit_messages -- 0 seconds
plugins.coverage -- 2 seconds
plugins.non_ascii -- 0 seconds
plugins.startup_time -- 52 seconds
plugins.startup_modules -- 1 seconds
Tests -- 814 seconds
2013-01-15 15:53:13 +0000
890 seconds
Reporting 0 PluginFailed
0 PluginFailed
ok
Done reporting 0



Failing tests in your install: PluginFailed. Continue anyways? [y/N] 


@robertwb
Contributor Author

comment:7

OK, new spkg.

It looks like something happened on your system around [..., 0.718256950378418, 0.7461650371551514, 0.8157329559326172, ...]. I've changed it to be more aggressive about keeping only the lower timings, and to switch between ticket/pristine one more time.
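
The filtering amounts to something like this (a minimal sketch; the keep fraction is illustrative, not the value actually used):

def lower_timings(samples, keep=0.5):
    # Drop the slowest samples: large timings are assumed to be noise
    # from unrelated system activity rather than real startup cost.
    samples = sorted(samples)
    return samples[:max(1, int(len(samples) * keep))]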

@vbraun
Member

vbraun commented Jan 15, 2013

comment:8

I still got "Reporting 0 PluginFailed"....

@robertwb
Contributor Author

comment:9

I am mystified. Here's the log:

http://patchbot.sagemath.org/log/0/Fedora/18/x86_64/3.7.2-201.fc18.x86_64/volker-desktop.stp.dias.ie/2013-01-15%2023:18:53%20+0000?plugin=plugins.startup_time

The timings are right there (and I even ignore the top 15, but that doesn't impact the results much here as there's little warm-up needed).

What the patchbot did is:

import os

for k in range(10):
    os.system("sage -b 0")       # rebuild; successive k alternate between main and the ticket branch
    for n in range(8):
        os.system("sage -c ''")  # time a bare sage startup

and, with high statistical significance, it took longer when k was even than when k was odd. Unless I'm completely off on the math here...

Any ideas?

@vbraun
Member

vbraun commented Jan 16, 2013

comment:10

The even/odd values of k are main vs ticket?

Main:   0.65512 sec (25 samples, std_dev=0.00494)
Ticket: 0.65906 sec (25 samples, std_dev=0.00382)

Just eyeballing things, the two means are within one standard deviation of each other. But with 25 samples each, the standard error of the difference is only about std_dev/5, so the Student t-statistic comes out around 3, which makes a pure-chance difference pretty unlikely.

I think it's dangerous to remove the top 15 values here, though. It makes the distribution artificially narrower, which inflates the t-value.

Also, I was using the computer at the time, and it's possible that I did something that was periodic with the frequency of the tests, e.g. clicking on web links in a ~5 second rhythm. In other words, I'm not sure that it is valid to treat the samples as statistically independent. It's probably better to randomize which test counts as main and which as ticket for the startup.

With 70% confidence, startup time increased by at least 0.5%
With 75% confidence, startup time increased by at least 0.47%
With 90% confidence, startup time increased by at least 0.36%
With 95% confidence, startup time increased by at least 0.29%
With 99% confidence, startup time increased by at least 0.17%
With 99.7% confidence, startup time increased by at least 0.1%
With 99.9% confidence, startup time increased by at least 0.035%

I'm a bit confused here. You can't be 100% sure that the startup time increased at all. Which statistical test is going on? (Yes, I could read the source ;-)


@robertwb
Contributor Author

comment:11

Replying to @vbraun:

> The even/odd values of k are main vs ticket?
>
> Main:   0.65512 sec (25 samples, std_dev=0.00494)
> Ticket: 0.65906 sec (25 samples, std_dev=0.00382)
>
> Just eyeballing things, the two means are within one standard deviation of each other. But with 25 samples each, the standard error of the difference is only about std_dev/5, so the Student t-statistic comes out around 3, which makes a pure-chance difference pretty unlikely.
>
> I think it's dangerous to remove the top 15 values here, though. It makes the distribution artificially narrower, which inflates the t-value.

Really, what we're trying to do is estimate the minimum startup time and cut out the possibility of some short-term activity messing up the timings. I'm open to other suggestions.

> Also, I was using the computer at the time, and it's possible that I did something that was periodic with the frequency of the tests, e.g. clicking on web links in a ~5 second rhythm. In other words, I'm not sure that it is valid to treat the samples as statistically independent. It's probably better to randomize which test counts as main and which as ticket for the startup.

I'd think it'd already be randomized based on when the test starts. I alternate back and forth to try to eliminate any bias, but I suppose this won't catch something periodic.

> With 70% confidence, startup time increased by at least 0.5%
> With 75% confidence, startup time increased by at least 0.47%
> With 90% confidence, startup time increased by at least 0.36%
> With 95% confidence, startup time increased by at least 0.29%
> With 99% confidence, startup time increased by at least 0.17%
> With 99.7% confidence, startup time increased by at least 0.1%
> With 99.9% confidence, startup time increased by at least 0.035%
>
> I'm a bit confused here. You can't be 100% sure that the startup time increased at all. Which statistical test is going on? (Yes, I could read the source ;-)

Argh, found the error (https://github.com/robertwb/sage-patchbot/blob/master/src/plugins.py#L244): I should be using variance, not standard deviation. I'm using Welch's t-test (well, actually computing various confidence intervals around the observed mean).

Thanks for taking a deeper look at this. Wonder why it never tripped up on my computer. Now I just hope we're doing enough samples to ever detect anything...

@vbraun
Member

vbraun commented Jan 16, 2013

comment:12

Welch's t-test is appropriate, but I don't think this is right:

err = math.sqrt(s1**2 * (n1-1) / n1 + s2**2 * (n2-1) / n2)

The t-value should asymptotically be proportional to sqrt(n), so that more samples mean more certainty that the observed difference is a real effect (if the sample means and stddevs remain unchanged). With the formula above, err tends to the constant sqrt(s1**2 + s2**2) as n1 and n2 grow, so the t-value stops growing.

@robertwb
Contributor Author

comment:13

Originally I had

err = math.sqrt(s1**2 / n1 + s2**2 / n2)

which, after more thought (and actually looking it up), was correct, so I've changed it back. I've also changed it so that the number of runs on each branch varies, to avoid periodic issues. You're right that dropping the top values influences the distribution, but it seems the simplest way to keep one or two random actions on the system from completely throwing things off.
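
Putting it together, the confidence computation is now roughly this (a sketch; it uses a normal quantile z for simplicity, and the names are illustrative rather than the plugin's actual ones):

import math

def welch_lower_bound(mean1, s1, n1, mean2, s2, n2, z):
    # Welch's standard error for the difference of two sample means.
    err = math.sqrt(s1**2 / n1 + s2**2 / n2)
    # One-sided lower confidence bound on the increase (mean2 - mean1):
    # with confidence ~Phi(z), the true increase is at least this much.
    return (mean2 - mean1) - z * err

E.g. z = 1.645 would give the 95% line of the table above, after dividing the bound by the main-branch mean.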


@vbraun
Member

vbraun commented Jan 19, 2013

comment:14

Aren't the large outliers (in one direction only!) a sign that the distribution is not normal? The standard technique for dealing with that is a non-parametric test (one that does not depend on the actual values, only on comparisons between samples), in particular the Wilcoxon rank-sum test: http://en.wikipedia.org/wiki/Mann%E2%80%93Whitney_U You wouldn't have to change much; it's just a different test statistic that you plug into the cumulative distribution function.
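
A self-contained sketch of that statistic with the usual normal approximation (ties and the continuity correction are omitted for brevity; this is illustrative, not the plugin's final code):

import math

def mann_whitney_z(sample1, sample2):
    # Rank the pooled observations (1-based; ties ignored for brevity).
    pooled = sorted((value, which)
                    for which, sample in enumerate((sample1, sample2))
                    for value in sample)
    rank_sum1 = sum(rank for rank, (value, which)
                    in enumerate(pooled, 1) if which == 0)
    n1, n2 = len(sample1), len(sample2)
    # U statistic for sample1, standardized under the null hypothesis
    # that both samples come from the same distribution.
    u = rank_sum1 - n1 * (n1 + 1) / 2.0
    mean_u = n1 * n2 / 2.0
    std_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    return (u - mean_u) / std_u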

@robertwb
Contributor Author

comment:15

Yes, that looks better suited, though for your data it still gives z=2.76 (!). Is there any way to get at the probability of the magnitude of the change? E.g., could I reject/accept the null hypothesis for [x - 0.001 for x in ticket_timings], or is that statistically unsound? (I haven't taken stats since undergrad.)

@vbraun
Member

vbraun commented Jan 19, 2013

comment:16

I think it's perfectly fine to re-run the test with a shift in the timings to estimate the probability of a certain magnitude of increase.

One of the hidden assumptions of the test is that the distribution, even though it is not normal, is at least the same for the two populations being compared. That, at least, is not changed by a constant shift.
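
Concretely, using the mann_whitney_z sketch from comment:14 above and the 0.001-second shift from comment:15 (the timing lists here are illustrative data, not real measurements):

main_timings = [0.652, 0.655, 0.657, 0.654]    # illustrative timings (sec)
ticket_timings = [0.658, 0.660, 0.659, 0.661]

# Re-test against the null "the ticket costs at most 0.001s extra":
# shift the ticket timings down by that amount and rerun the test.
shifted = [x - 0.001 for x in ticket_timings]
z = mann_whitney_z(main_timings, shifted)
# A large positive z would still reject "at most 0.001s slower".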

@robertwb
Contributor Author

comment:17

OK, I've implemented the Mann-Whitney U-test, which seems to be a much better fit. PTAL.


@vbraun
Member

vbraun commented Jan 28, 2013

Reviewer: Volker Braun

@vbraun
Member

vbraun commented Jan 28, 2013

comment:18

Sounds good to me!

@vbraun
Member

vbraun commented Jan 28, 2013

Author: Robert Bradshaw

@haraldschilly
Member

comment:20

spkg on the servers is updated!
