
Sage-patchbot improvements #13950

Closed
robertwb opened this issue Jan 13, 2013 · 25 comments

Comments

@robertwb
Contributor

Spkg up at http://sage.math.washington.edu/home/robertwb/patches/patchbot-1.2.5.spkg

Most notably, this adds:

  • Startup-time plugin.
  • Non-ascii plugin.
  • Spkg inspection.
  • Better redundancy avoidance.

CC: @vbraun

Component: packages: optional

Author: Robert Bradshaw

Reviewer: Volker Braun

Issue created by migration from https://trac.sagemath.org/ticket/13950

@vbraun
Member

vbraun commented Jan 13, 2013

comment:2

Looks good to me!

The non-ascii plugin causes lots of false positives. It is fine to use non-ASCII in the documentation if the correct coding was specified in the file, while non-ASCII in the actual code would be wrong, of course. Though that can't be checked by looking at the patch only; it requires some actual understanding of the code...
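
For reference, the coding check amounts to something like this (a minimal sketch following PEP 263; this is illustrative, not the plugin's actual code):

import re

# PEP 263: a coding declaration must appear on line 1 or 2 of the file.
CODING_RE = re.compile(r"coding[:=]\s*([-\w.]+)")

def declared_coding(first_lines):
    # Return the declared source coding from the first two lines, or None.
    for line in first_lines[:2]:
        match = CODING_RE.search(line)
        if match:
            return match.group(1)
    return None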

@vbraun
Member

vbraun commented Jan 13, 2013

comment:3

Is this ticket ready for review?

@robertwb
Contributor Author

comment:4

OK, a couple more tweaks on spkg detection. Yeah, the non-ascii plugin probably has some false positives, but as we've seen, those are better than false negatives going in. In any case, the blue means "check this out / are you sure you want to do this" rather than "this is a blocker." Thanks for looking at this (and for running a patchbot!)


@vbraun
Member

vbraun commented Jan 15, 2013

comment:5

Looks good to me. Can you commit the change to the SPKG.txt?

@vbraun
Member

vbraun commented Jan 15, 2013

comment:6

I get this mysterious error:

[...]
----------------------------------------------------------------------
All tests passed!
Total time for all tests: 814.2 seconds
Tests -- 814 seconds

Apply -- 19 seconds
Build -- 1 seconds
plugins.commit_messages -- 0 seconds
plugins.coverage -- 2 seconds
plugins.non_ascii -- 0 seconds
plugins.startup_time -- 52 seconds
plugins.startup_modules -- 1 seconds
Tests -- 814 seconds
2013-01-15 15:53:13 +0000
890 seconds
Reporting 0 PluginFailed
0 PluginFailed
ok
Done reporting 0



Failing tests in your install: PluginFailed. Continue anyways? [y/N] 


@robertwb
Contributor Author

comment:7

OK, new spkg.

It looks like something happened on your system around [..., 0.718256950378418, 0.7461650371551514, 0.8157329559326172, ...]. I've changed it to be more aggressive about keeping only the lower timings, and to switch between ticket/pristine one more time.
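
The filtering amounts to something like this (a minimal sketch; the keep fraction is illustrative, not the value actually used):

def lower_timings(samples, keep=0.5):
    # Drop the slowest samples: large timings are assumed to be noise
    # from unrelated system activity rather than real startup cost.
    samples = sorted(samples)
    return samples[:max(1, int(len(samples) * keep))]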

@vbraun
Member

vbraun commented Jan 15, 2013

comment:8

I still got "Reporting 0 PluginFailed"....

@robertwb
Contributor Author

comment:9

I am mystified. Here's the log:

http://patchbot.sagemath.org/log/0/Fedora/18/x86_64/3.7.2-201.fc18.x86_64/volker-desktop.stp.dias.ie/2013-01-15%2023:18:53%20+0000?plugin=plugins.startup_time

The timings are right there (and I even ignore the top 15, but that doesn't impact the results much here as there's little warm-up needed).

What the patchbot did is:

import os

for k in range(10):
    os.system("sage -b 0")       # rebuild; successive k alternate between main and the ticket branch
    for n in range(8):
        os.system("sage -c ''")  # time a bare sage startup

and, with high statistical significance, it took longer when k was even than when k was odd. Unless I'm completely off on the math here...

Any ideas?

@vbraun
Member

vbraun commented Jan 16, 2013

comment:10

The even/odd values of k are main vs ticket?

Main:   0.65512 sec (25 samples, std_dev=0.00494)
Ticket: 0.65906 sec (25 samples, std_dev=0.00382)

Just eyeballing things, the two means are within one standard deviation of each other. But with 25 samples each, the standard error of the difference is only about std_dev/5, so the Student t-statistic comes out around 3, which makes a pure-chance difference pretty unlikely.

I think it's dangerous to remove the top 15 values here, though. It makes the distribution artificially narrower, which inflates the t-value.

Also, I was using the computer at the time, and it's possible that I did something that was periodic with the frequency of the tests, e.g. clicking on web links in a ~5 second rhythm. In other words, I'm not sure that it is valid to treat the samples as statistically independent. It's probably better to randomize which test counts as main and which as ticket for the startup.

With 70% confidence, startup time increased by at least 0.5%
With 75% confidence, startup time increased by at least 0.47%
With 90% confidence, startup time increased by at least 0.36%
With 95% confidence, startup time increased by at least 0.29%
With 99% confidence, startup time increased by at least 0.17%
With 99.7% confidence, startup time increased by at least 0.1%
With 99.9% confidence, startup time increased by at least 0.035%

I'm a bit confused here. You can't be 100% sure that the startup time increased at all. Which statistical test is going on? (Yes, I could read the source ;-)


@robertwb
Contributor Author

comment:11

Replying to @vbraun:

> The even/odd values of k are main vs ticket?
>
> Main:   0.65512 sec (25 samples, std_dev=0.00494)
> Ticket: 0.65906 sec (25 samples, std_dev=0.00382)
>
> Just eyeballing things, the two means are within one standard deviation of each other. But with 25 samples each, the standard error of the difference is only about std_dev/5, so the Student t-statistic comes out around 3, which makes a pure-chance difference pretty unlikely.
>
> I think it's dangerous to remove the top 15 values here, though. It makes the distribution artificially narrower, which inflates the t-value.

Really, what we're trying to do is estimate the minimum startup time and cut out the possibility of some short-term activity messing up the timings. I'm open to other suggestions.

> Also, I was using the computer at the time, and it's possible that I did something that was periodic with the frequency of the tests, e.g. clicking on web links in a ~5 second rhythm. In other words, I'm not sure that it is valid to treat the samples as statistically independent. It's probably better to randomize which test counts as main and which as ticket for the startup.

I'd think it'd already be randomized based on when the test starts. I alternate back and forth to try to eliminate any bias, but I suppose this won't catch something periodic.

> With 70% confidence, startup time increased by at least 0.5%
> With 75% confidence, startup time increased by at least 0.47%
> With 90% confidence, startup time increased by at least 0.36%
> With 95% confidence, startup time increased by at least 0.29%
> With 99% confidence, startup time increased by at least 0.17%
> With 99.7% confidence, startup time increased by at least 0.1%
> With 99.9% confidence, startup time increased by at least 0.035%
>
> I'm a bit confused here. You can't be 100% sure that the startup time increased at all. Which statistical test is going on? (Yes, I could read the source ;-)

Argh, found the error (https://github.com/robertwb/sage-patchbot/blob/master/src/plugins.py#L244): I should be using variance, not standard deviation. I'm using Welch's t-test (well, actually computing various confidence intervals around the observed mean).

Thanks for taking a deeper look at this. Wonder why it never tripped up on my computer. Now I just hope we're doing enough samples to ever detect anything...

@vbraun
Member

vbraun commented Jan 16, 2013

comment:12

Welch's t-test is appropriate, but I don't think this is right:

err = math.sqrt(s1**2 * (n1-1) / n1 + s2**2 * (n2-1) / n2)

The t-value should asymptotically be proportional to sqrt(n), so that more samples mean more certainty that the observed difference is a real effect (if the sample means and stddevs remain unchanged). With the formula above, err tends to the constant sqrt(s1**2 + s2**2) as n1 and n2 grow, so the t-value stops growing.

@robertwb
Contributor Author

comment:13

Originally I had

err = math.sqrt(s1**2 / n1 + s2**2 / n2)

which, after more thought (and actually looking it up), was correct, so I've changed it back. I've also changed it so that the number of runs on each branch varies, to avoid periodic issues. You're right that dropping the top values influences the distribution, but it seems the simplest way to keep one or two random actions on the system from completely throwing things off.
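
Putting it together, the confidence computation is now roughly this (a sketch; it uses a normal quantile z for simplicity, and the names are illustrative rather than the plugin's actual ones):

import math

def welch_lower_bound(mean1, s1, n1, mean2, s2, n2, z):
    # Welch's standard error for the difference of two sample means.
    err = math.sqrt(s1**2 / n1 + s2**2 / n2)
    # One-sided lower confidence bound on the increase (mean2 - mean1):
    # with confidence ~Phi(z), the true increase is at least this much.
    return (mean2 - mean1) - z * err

E.g. z = 1.645 would give the 95% line of the table above, after dividing the bound by the main-branch mean.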


@vbraun
Member

vbraun commented Jan 19, 2013

comment:14

Aren't the large outliers (in one direction only!) a sign that the distribution is not normal? The standard technique for dealing with that is a non-parametric test (one that does not depend on the actual values, only on comparisons between samples), in particular the Wilcoxon rank-sum test: http://en.wikipedia.org/wiki/Mann%E2%80%93Whitney_U You wouldn't have to change much; it's just a different test statistic that you plug into the cumulative distribution function.
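
A self-contained sketch of that statistic with the usual normal approximation (ties and the continuity correction are omitted for brevity; this is illustrative, not the plugin's final code):

import math

def mann_whitney_z(sample1, sample2):
    # Rank the pooled observations (1-based; ties ignored for brevity).
    pooled = sorted((value, which)
                    for which, sample in enumerate((sample1, sample2))
                    for value in sample)
    rank_sum1 = sum(rank for rank, (value, which)
                    in enumerate(pooled, 1) if which == 0)
    n1, n2 = len(sample1), len(sample2)
    # U statistic for sample1, standardized under the null hypothesis
    # that both samples come from the same distribution.
    u = rank_sum1 - n1 * (n1 + 1) / 2.0
    mean_u = n1 * n2 / 2.0
    std_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    return (u - mean_u) / std_u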

@robertwb
Contributor Author

comment:15

Yes, that looks better suited, though for your data it still gives z=2.76 (!). Is there any way to get at the probability of the magnitude of the change? E.g., could I reject/accept the null hypothesis for [x - 0.001 for x in ticket_timings], or is that statistically unsound? (I haven't taken stats since undergrad.)

@vbraun
Member

vbraun commented Jan 19, 2013

comment:16

I think it's perfectly fine to re-run the test with a shift in the timings to estimate the probability of a certain magnitude of increase.

One of the hidden assumptions of the test is that the distribution, even though it is not normal, is at least the same for the two populations being compared. That, at least, is not changed by a constant shift.
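
Concretely, using the mann_whitney_z sketch from comment:14 above and the 0.001-second shift from comment:15 (the timing lists here are illustrative data, not real measurements):

main_timings = [0.652, 0.655, 0.657, 0.654]    # illustrative timings (sec)
ticket_timings = [0.658, 0.660, 0.659, 0.661]

# Re-test against the null "the ticket costs at most 0.001s extra":
# shift the ticket timings down by that amount and rerun the test.
shifted = [x - 0.001 for x in ticket_timings]
z = mann_whitney_z(main_timings, shifted)
# A large positive z would still reject "at most 0.001s slower".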

@robertwb
Contributor Author

comment:17

OK, I've implemented the Mann-Whitney U-test, which seems to be a much better fit. PTAL.


@vbraun
Member

vbraun commented Jan 28, 2013

Reviewer: Volker Braun

@vbraun
Member

vbraun commented Jan 28, 2013

comment:18

Sounds good to me!

@vbraun
Member

vbraun commented Jan 28, 2013

Author: Robert Bradshaw

@haraldschilly
Member

comment:20

spkg on the servers is updated!
