
better numerical accuracy testing #10952

Closed
robertwb opened this issue Mar 17, 2011 · 64 comments

Comments

@robertwb
Contributor

If a line contains tol or tolerance, numerical results are only
verified to the given tolerance. This may be prefixed by abs[olute] or rel[ative] to specify whether to measure absolute or relative error; it defaults to relative error, except when the expected value is exactly zero:

        sage: RDF(pi)                               # abs tol 1e-5 
        3.14159 
        sage: [10^n for n in [0.0 .. 4]]            # rel tol 2e-4 
        [0.9999, 10.001, 100.01, 999.9, 10001] 

This can be useful when the exact output is subject to rounding error and/or processor floating point arithmetic variation.
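The comparison semantics described above can be sketched as a small standalone checker. This is a hypothetical illustration, not the ticket's actual implementation: relative error by default, falling back to absolute error when the expected value is exactly zero (where relative error is undefined).

```python
def within_tolerance(expected, actual, tol, kind="rel"):
    """Sketch of the '# tol' comparison described in this ticket.

    Relative error by default; absolute error when 'abs' is requested
    or when the expected value is exactly zero.
    """
    err = abs(expected - actual)
    if kind == "abs" or expected == 0:
        return err <= tol
    return err <= tol * abs(expected)
```

For instance, `within_tolerance(3.141592653589793, 3.14159, 1e-5, "abs")` accepts the `RDF(pi)` example above.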


Related:


Apply

  1. attachment: 10952-tol-bin.2.patch
  2. attachment: trac_10952-ref.patch
    to the Sage scripts repository.

Apply

  1. attachment: 10952-tol-doc.2.patch
  2. attachment: trac_10952-reviewer-docs-v3.patch
    to the Sage library repository.

CC: @jasongrout @kcrisman

Component: doctest coverage

Keywords: sd32 noise noisy doctest failure error tolerance

Author: Robert Bradshaw, Rob Beezer

Reviewer: Jason Grout, Mariah Lenox, William Stein, John Palmieri

Merged: sage-4.7.2.alpha3

Issue created by migration from https://trac.sagemath.org/ticket/10952

@robertwb
Contributor Author

Attachment: noise.patch.gz


@jasongrout
Member

comment:2

Amazing speed getting this up! Shouldn't "((?:abs(?:solute)?)" be "((?:abs(?:olute)?)" ?

@jasongrout
Member

comment:3

Nitpicky: This comment is now incorrect:

# following three: used only for parsing only_optional; list of comments 

Also, something should be added to the docs about this new option. Perhaps here: http://sagemath.org/doc/developer/conventions.html#further-conventions-for-automated-testing-of-examples

@robertwb
Contributor Author

apply only this patch to the bin repo

@robertwb
Contributor Author

Attachment: 10952-tol-bin.patch.gz

@robertwb
Contributor Author

comment:4

Attachment: 10952-tol-doc.patch.gz

Thanks for the comments, new patches attached.

@kiwifb
Member

kiwifb commented Apr 24, 2011

comment:6

Would this work with tests of this kind as well:

File "/usr/share/sage/devel/sage-main/sage/stats/hmm/chmm.pyx", line 579:
    sage: m.viterbi([-2,-1,.1,0.3])
Expected:
    ([1, 1, 1, 0], -9.5660236533785135)
Got:
    ([1, 1, 1, 0], -9.566023653378513)

or even

File "/usr/share/sage/devel/sage-main/sage/categories/examples/semigroups.py", line 32:
    sage: S.some_elements()
Expected:
    [3, 42, 'a', 3.3999999999999999, 'raton laveur']
Got:
    [3, 42, 'a', 3.4, 'raton laveur']

(test failures courtesy of my work on python-2.7)

@robertwb
Contributor Author

comment:7

Yes, it would (though you do have to explicitly annotate the test with a tolerance, and if Python 2.7 consistently gives, e.g., 3.4 then we should change the doctest rather than make it fuzzy.)

@sagetrac-mariah
Mannequin

sagetrac-mariah mannequin commented May 13, 2011

comment:8

In trying to review this ticket, I applied 10952-tol-bin.patch in the local/bin directory of a skynet/taurus (x86_64-Linux-nehalem) build of sage-4.7.rc2. I next did 'sage -b'. Yet I get

sage: print "The answers are", 1.5, 2, 1e-12 # tol 1e-3
The answers are 1.50000000000000 2 1.00000000000000e-12

Am I missing something?

@robertwb
Contributor Author

comment:9

No, that is correct.

The example in the description is an example of something that would pass doctests (even though the true output doesn't match exactly, you're seeing what the actual output is).

@jasongrout
Member

comment:10

Another nitpick: the doc patch misspells "arithmetic". Definitely not big enough to stop a positive review, but also not big enough to be worth my effort in making a new patch right now.

@robertwb
Contributor Author

comment:11

Attachment: 10952-tol-doc.2.patch.gz

Argh. Fixed.

@jasongrout
Member

comment:12

Thanks! If we had a github pull system, or a system where I could edit the patch inline, I would have fixed it.

@kcrisman
Member

comment:13

Replying to @jasongrout:

Thanks! If we had a github pull system, or a system where I could edit the patch inline, I would have fixed it.

I have to admit that would be convenient. But wouldn't that create a privileged class of commit people who don't need review? At least that seems to be the model for the projects I'm familiar with.

@kcrisman
Member

Reviewer: Jason Grout

@kcrisman
Member

Author: Robert Bradshaw

@jasongrout
Member

comment:14

That person would be the release manager. He would be the only one able (or supposed) to merge into the master branch.

On the other hand, given our problems with finding release managers who could do the entire release process, maybe sharing the burden to commit patches to master wouldn't be a bad idea.

@sagetrac-mariah
Mannequin

sagetrac-mariah mannequin commented May 16, 2011

comment:15

I applied 10952-tol-bin.patch in the local/bin directory of a skynet/taurus (x86_64-Linux-nehalem) build of sage-4.7.rc2. I next did 'sage -b'. I then modified a doctest to include the lines in
the ticket description

sage: print "The answers are", 1.5, 2, 1e-12 # tol 1e-3
The answers are 1.499999 2.0001 0

I then did 'sage -b' again. When I run the doctest, I get

% ./sage -t  -long -force_lib "devel/sage/sage/symbolic/units.py"
sage -t -long -force_lib "devel/sage/sage/symbolic/units.py"
**********************************************************************
File "/home/mariah/sage/sage-4.7.rc2-x86_64-Linux-nehalem-fc-review-10952/devel/sage/sage/symbolic/units.py", line 13:
    sage: print "The answers are", 1.5, 2, 1e-12 # tol 1e-3
Exception raised:
    Traceback (most recent call last):
      File "/home/mariah/sage/sage-4.7.rc2-x86_64-Linux-nehalem-fc-review-10952/local/bin/ncadoctest.py", line 1231, in run_one_test
        self.run_one_example(test, example, filename, compileflags)
      File "/home/mariah/sage/sage-4.7.rc2-x86_64-Linux-nehalem-fc-review-10952/local/bin/sagedoctest.py", line 38, in run_one_example
        OrigDocTestRunner.run_one_example(self, test, example, filename, compileflags)
      File "/home/mariah/sage/sage-4.7.rc2-x86_64-Linux-nehalem-fc-review-10952/local/bin/ncadoctest.py", line 1172, in run_one_example
        compileflags, 1) in test.globs
      File "<doctest __main__.example_0[6]>", line 1
         res =  print "The answers are", RealNumber('1.5'), Integer(2), RealNumber('1e-12') # tol 1e-3###line 13:
    sage: print "The answers are", 1.5, 2, 1e-12 # tol 1e-3
                    ^
     SyntaxError: invalid syntax
**********************************************************************

Note that if I put the lines

    sage: RDF(pi)                               # abs tol 1e-5
    3.14159

in the doctest, then the doctest passes.

@nexttime
Mannequin

nexttime mannequin commented Sep 9, 2011

comment:31

Replying to @jhpalmieri:

I think the regexp should be something like

' ((\.[0-9]+|[0-9]+(\.[0-9]*)?)e[+-]?[0-9]+)'

But that doesn't match 1ee7 :(

You can of course match funny things first, i.e. use more general patterns, but you then have to check the matched expression further before passing it to float().
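As an illustration of that trade-off (the stricter pattern is copied from the comment above; the looser one is a hypothetical alternative): the strict pattern rejects malformed tokens like 1ee7 up front, while a looser pattern matches them and has to rely on a later check before float() is called.

```python
import re

# jhpalmieri's stricter pattern: only well-formed scientific notation.
strict = re.compile(r'(\.[0-9]+|[0-9]+(\.[0-9]*)?)e[+-]?[0-9]+')

# A looser, hypothetical pattern: also catches typos like "1ee7",
# which must then be vetted before being passed to float().
loose = re.compile(r'[0-9.eE+-]+')

assert strict.fullmatch('1e-5') is not None
assert strict.fullmatch('3.14e+2') is not None
assert strict.fullmatch('1ee7') is None      # rejected by the strict pattern
assert loose.fullmatch('1ee7') is not None   # matched, but float('1ee7') raises
```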

The whole idea isn't that bad, but carries the danger that people simply increase the tolerance (too much) just to make doctests pass somehow, without caring where the variations originate from.

@jhpalmieri
Member

comment:32

But float(1ee7) raises an error. Why would we want to match that?

Re the issue of raising tolerance, I hope that referees will keep an eye on that sort of thing. Dave Kirkby, for example, is a strong advocate of not arbitrarily raising tolerance.

@robertwb
Contributor Author

robertwb commented Sep 9, 2011

Attachment: 10952-tol-bin.2.patch.gz

@robertwb
Contributor Author

robertwb commented Sep 9, 2011

comment:33

I've fixed the issue with blank lines terminating example blocks. As for the regular expression for floats, I don't see any need to make it more complicated--I'd rather match mistyped numbers and raise an error than silently ignore them. If you feel strongly, this could be changed.

Referees should keep an eye on precision. This is back to my philosophy that a computer should run doctests and a human read the code (though we could add a patchbot plugin to flag drops in precision). It's not a new issue--we've been using "..." for quite a while now which is less flexible and more dangerous (as it can match more than just a single number).


@nexttime nexttime mannequin added s: needs review and removed s: needs work labels Sep 9, 2011
@nexttime
Mannequin

nexttime mannequin commented Sep 9, 2011

comment:35

How about emitting the extra code only if # tol ... is used at all in the file?

That's a simple """... %s ...""" % (tolerance_code if uses_tol else "").
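A minimal sketch of that suggestion (all names here are hypothetical stand-ins, not the actual Sage doctest machinery): track whether the source uses # tol and splice the helper into the generated doctest file only in that case.

```python
# Hypothetical stand-ins for the real helper code and file template.
TOLERANCE_CODE = (
    "def check_with_tolerance(expected, res, eps, kind):\n"
    "    ...\n"
)
TEMPLATE = "# auto-generated doctest module\n%s# ... tests follow ...\n"

def emit_doctest_module(uses_tol):
    """Include the tolerance helper only when '# tol' appears in the source."""
    return TEMPLATE % (TOLERANCE_CODE if uses_tol else "")
```

Files whose doctests never use # tol would then carry no extra helper code at all.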

@robertwb
Contributor Author

robertwb commented Sep 9, 2011

comment:36

You're talking about the check_with_tolerance function and helpers. One could add a variable to keep track of this and conditionally emit it; not sure if the added complexity is worth the tiny savings.

@nexttime
Mannequin

nexttime mannequin commented Sep 9, 2011

comment:37

Replying to @robertwb:

You're talking about the check_with_tolerance function and helpers. One could add a variable to keep track of this and conditionally emit it; not sure if the added complexity is worth the tiny savings.

Added complexity? One boolean variable, initialized to False and set to True in only a few places?

This not only reduces the file size and run (import!) time but also avoids more potential name clashes.

Why should one add useless code to each and every doctested file that doesn't use # tol?

@robertwb
Contributor Author

robertwb commented Sep 9, 2011

comment:38

Either we make this variable a global (ugly) or return it from the call to doc_preparse along with the parsed docstring (also ugly). All to save a fraction of a block (~1KB) of an ephemeral file and < 1ms on an already fairly expensive operation. If we're looking to cut fat, it'd be better to avoid the quadratic-time creation of the doctest file string.

The name clash is a red herring--if we avoid it only because # tol is not used, that's a potentially latent bug in my mind. The function could be renamed if need be.

@jhpalmieri
Member

comment:39

I don't see anything wrong with more complicated regular expressions, but I like regular expressions. So leave it as is if you want.

Meanwhile, the output for failures using tolerances isn't nice, compared to other sorts of failures:

sage -t  "builds/sage-4.7.2.alpha2/devel/sage-new/sage/homology/new.py"
**********************************************************************
File "/Applications/sage_builds/sage-4.7.2.alpha2/devel/sage-new/sage/homology/new.py", line 4:
    sage: 3+3
Expected:
    8
Got:
    6
**********************************************************************
File "/Applications/sage_builds/sage-4.7.2.alpha2/devel/sage-new/sage/homology/new.py"
Failed example:
    check_with_tolerance('''
        3.14159
    ''', res, 9.9999999999999998e-13, 'abs')
Exception raised:
    Traceback (most recent call last):
      File "/Applications/sage/local/bin/ncadoctest.py", line 1231, in run_one_test
        self.run_one_example(test, example, filename, compileflags)
      File "/Applications/sage/local/bin/sagedoctest.py", line 38, in run_one_example
        OrigDocTestRunner.run_one_example(self, test, example, filename, compileflags)
      File "/Applications/sage/local/bin/ncadoctest.py", line 1172, in run_one_example
        compileflags, 1) in test.globs
      File "<doctest __main__.example_0[11]>", line 3, in <module>
        ''', res, 9.9999999999999998e-13, 'abs')
      File "/Users/palmieri/.sage//tmp/new.py", line 35, in check_with_tolerance
        assert abs(expected_value - actual_value) < epsilon, "Out of tolerance %s vs %s" % (expected_value, actual_value)
    AssertionError: Out of tolerance 3.14159 vs 3.14159265359
**********************************************************************
1 items had failures:
   2 of  13 in __main__.example_0
***Test Failed*** 2 failures.
For whitespace errors, see the file /Users/palmieri/.sage//tmp/.doctest_new.py
	 [3.9 s]

Notice that the line number is missing and the traceback is present. I'm attaching a referee patch which tries to fix this. Please take a look. For me, this gives this sort of output (with a slightly different file):

sage -t  "builds/sage-4.7.2.alpha2/devel/sage-new/new.py"   
**********************************************************************
File "/Applications/sage_builds/sage-4.7.2.alpha2/devel/sage-new/new.py", line 4:
    sage: 3+3
Expected:
    8
Got:
    6
**********************************************************************
File "/Applications/sage_builds/sage-4.7.2.alpha2/devel/sage-new/new.py", line 6:
    sage: RDF(pi)                               # abs tol 1e-6
Out of tolerance 3.14159 vs 3.14159265359
**********************************************************************
File "/Applications/sage_builds/sage-4.7.2.alpha2/devel/sage-new/new.py", line 16:
    sage: RDF(pi)                               # abs tol 1e-8
Out of tolerance 3.14159 vs 3.14159265359
**********************************************************************
1 items had failures:
   3 of  16 in __main__.example_0
***Test Failed*** 3 failures.
For whitespace errors, see the file /Users/palmieri/.sage//tmp/.doctest_new.py
	 [4.2 s]

Since the whole ticket had a positive review earlier, I think if you're happy with my changes, you can switch it back to that status.

Finally, the issue with doctests in conventions.rst is, I think, that the various triple quotes """ confuse the parsing routine (the function pythonify_rst in particular). This belongs on another ticket. That file contains enough incomplete code stubs that I wonder if it shouldn't be doctested at all...


@jhpalmieri
Member

scripts repo; apply on top of other scripts patch

@nexttime
Mannequin

nexttime mannequin commented Sep 9, 2011

comment:40

Attachment: trac_10952-ref.patch.gz

Replying to @jhpalmieri:

I don't see anything wrong with more complicated regular expressions, but I like regular expressions. So leave it as is if you want.

Yes, only

sage: foo   # tol 1ee7
anything

should always pass. ;-)

Meanwhile, the output for failures using tolerances isn't nice, compared to other sorts of failures [...]

Notice that the line number is missing and the traceback is present. I'm attaching a referee patch which tries to fix this. Please take a look.

Haven't tested your patch (nor looked at it), but the output you gave looks much better.

Since the whole ticket had a positive review earlier, I think if you're happy with my changes, you can switch it back to that status.

I won't object, though in general I don't like the idea of adding code unconditionally (regardless of how small the overhead or impact might be), i.e., when there's no need to do so.

We have similar situations all around, where everybody adds "just a little", IMHO.

@robertwb
Contributor Author

comment:41

Replying to @nexttime:

Replying to @jhpalmieri:

I don't see anything wrong with more complicated regular expressions, but I like regular expressions. So leave it as is if you want.

Yes, only

sage: foo   # tol 1ee7
anything

should always pass. ;-)

Currently, it raises an error indicating that the doctest itself is bad.

Meanwhile, the output for failures using tolerances isn't nice, compared to other sorts of failures [...]

Notice that the line number is missing and the traceback is present. I'm attaching a referee patch which tries to fix this. Please take a look.

Haven't tested your patch (nor looked at it), but the output you gave looks much better.

Since the whole ticket had a positive review earlier, I think if you're happy with my changes, you can switch it back to that status.

I won't object, though I in general don't like the idea of adding code (regardless how small the overhead or impact might be) unconditionally, i.e., if there's no need to do so.

We have similar situations all around, where everybody adds "just a little", IMHO.

Yes, it's a tradeoff between adding "just a little" Python parsing overhead to the testing infrastructure or "just a little" human parsing overhead for developers. I'd rather put the burden on machines than on developers in this case.

The reviewer patches look good, certainly an improvement, so I'm setting this back to positive review unless there's something else that needs to be taken care of.


@nexttime
Mannequin

nexttime mannequin commented Sep 26, 2011

Changed keywords from sd32 to sd32 noise noisy doctest failure error tolerance

@nexttime
Mannequin

nexttime mannequin commented Sep 27, 2011

Merged: sage-4.7.2.alpha3

@nexttime nexttime mannequin removed the s: positive review label Sep 27, 2011
@nexttime nexttime mannequin closed this as completed Sep 27, 2011
@kcrisman
Member

comment:44

I'd like to draw the attention of folks here to #12493 comment:7. Apparently one can't do only-optional tests along with tol tests at the same time. My guess is that not too many people use only-optional and the tol stuff is pretty new.

In fact, I only found one occurrence in 5.0.beta3. Can that be right? This has been in Sage for months!

sage: search_src(" tol ","#")
symbolic/integration/integral.py:587:        sage: error.numerical_approx() # abs tol 10e-10

Anyway, even if my analysis is wrong (let's hope it's easier than that), I figure the people here can give a quick diagnosis of #12493.

@kcrisman
Member

comment:45

Actually, #12493 comment:8 is even better! Optional and tol don't play well together.
