
better numerical accuracy testing #10952

Closed
robertwb opened this issue Mar 17, 2011 · 64 comments

Comments

@robertwb
Contributor

If a line contains tol or tolerance, numerical results are only
verified to the given tolerance. This may be prefixed by abs[olute] or rel[ative] to specify whether to measure absolute or relative error; it defaults to relative error, except when the expected value is exactly zero:

        sage: RDF(pi)                               # abs tol 1e-5 
        3.14159 
        sage: [10^n for n in [0.0 .. 4]]            # rel tol 2e-4 
        [0.9999, 10.001, 100.01, 999.9, 10001] 

This can be useful when the exact output is subject to rounding error and/or processor floating point arithmetic variation.
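The comparison semantics described above can be sketched as a small standalone checker. This is a hypothetical illustration, not the ticket's actual implementation: relative error by default, falling back to absolute error when the expected value is exactly zero (where relative error is undefined).

```python
def within_tolerance(expected, actual, tol, kind="rel"):
    """Sketch of the '# tol' comparison described in this ticket.

    Relative error by default; absolute error when 'abs' is requested
    or when the expected value is exactly zero.
    """
    err = abs(expected - actual)
    if kind == "abs" or expected == 0:
        return err <= tol
    return err <= tol * abs(expected)
```

For instance, `within_tolerance(3.141592653589793, 3.14159, 1e-5, "abs")` accepts the `RDF(pi)` example above.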


Related:


Apply

  1. attachment: 10952-tol-bin.2.patch
  2. attachment: trac_10952-ref.patch
    to the Sage scripts repository.

Apply

  1. attachment: 10952-tol-doc.2.patch
  2. attachment: trac_10952-reviewer-docs-v3.patch
    to the Sage library repository.

CC: @jasongrout @kcrisman

Component: doctest coverage

Keywords: sd32 noise noisy doctest failure error tolerance

Author: Robert Bradshaw, Rob Beezer

Reviewer: Jason Grout, Mariah Lenox, William Stein, John Palmieri

Merged: sage-4.7.2.alpha3

Issue created by migration from https://trac.sagemath.org/ticket/10952

@robertwb
Contributor Author

Attachment: noise.patch.gz


@jasongrout
Member

comment:2

Amazing speed getting this up! Shouldn't "((?:abs(?:solute)?)" be "((?:abs(?:olute)?)" ?

@jasongrout
Member

comment:3

Nitpicky: This comment is now incorrect:

# following three: used only for parsing only_optional; list of comments 

Also, something should be added to the docs about this new option. Perhaps here: http://sagemath.org/doc/developer/conventions.html#further-conventions-for-automated-testing-of-examples

@robertwb
Contributor Author

apply only this patch to the bin repo

@robertwb
Contributor Author

Attachment: 10952-tol-bin.patch.gz

@robertwb
Contributor Author

comment:4

Attachment: 10952-tol-doc.patch.gz

Thanks for the comments, new patches attached.

@kiwifb
Member

kiwifb commented Apr 24, 2011

comment:6

Would this work with tests of this kind as well:

File "/usr/share/sage/devel/sage-main/sage/stats/hmm/chmm.pyx", line 579:
    sage: m.viterbi([-2,-1,.1,0.3])
Expected:
    ([1, 1, 1, 0], -9.5660236533785135)
Got:
    ([1, 1, 1, 0], -9.566023653378513)

or even

File "/usr/share/sage/devel/sage-main/sage/categories/examples/semigroups.py", line 32:
    sage: S.some_elements()
Expected:
    [3, 42, 'a', 3.3999999999999999, 'raton laveur']
Got:
    [3, 42, 'a', 3.4, 'raton laveur']

(test failures courtesy of my work on python-2.7)

@robertwb
Contributor Author

comment:7

Yes, it would (though you do have to explicitly annotate the test with a tolerance, and if Python 2.7 consistently gives, e.g., 3.4 then we should change the doctest rather than make it fuzzy.)

@sagetrac-mariah
Mannequin

sagetrac-mariah mannequin commented May 13, 2011

comment:8

In trying to review this ticket, I applied 10952-tol-bin.patch in the local/bin directory of a skynet/taurus (x86_64-Linux-nehalem) build of sage-4.7.rc2. I next did 'sage -b'. Yet I get

sage: print "The answers are", 1.5, 2, 1e-12 # tol 1e-3
The answers are 1.50000000000000 2 1.00000000000000e-12

Am I missing something?

@robertwb
Contributor Author

comment:9

No, that is correct.

The example in the description is an example of something that would pass doctests (even though the true output doesn't match exactly, you're seeing what the actual output is).

@jasongrout
Member

comment:10

Another nitpick: the doc patch misspells "arithmetic". Definitely not big enough to stop a positive review, but also not big enough to be worth my effort in making a new patch right now.

@robertwb
Contributor Author

comment:11

Attachment: 10952-tol-doc.2.patch.gz

Argh. Fixed.

@jasongrout
Member

comment:12

Thanks! If we had a github pull system, or a system where I could edit the patch inline, I would have fixed it.

@kcrisman
Member

comment:13

Replying to @jasongrout:

Thanks! If we had a github pull system, or a system where I could edit the patch inline, I would have fixed it.

I have to admit that would be convenient. But wouldn't that create a privileged class of commit people who don't need review? At least that seems to be the model for the projects I'm familiar with.

@kcrisman
Member

Reviewer: Jason Grout

@kcrisman
Member

Author: Robert Bradshaw

@jasongrout
Member

comment:14

That person would be the release manager. He would be the only one able (or supposed) to merge into the master branch.

On the other hand, given our problems with finding release managers who could do the entire release process, maybe sharing the burden to commit patches to master wouldn't be a bad idea.

@sagetrac-mariah
Mannequin

sagetrac-mariah mannequin commented May 16, 2011

comment:15

I applied 10952-tol-bin.patch in the local/bin directory of a skynet/taurus (x86_64-Linux-nehalem) build of sage-4.7.rc2. I next did 'sage -b'. I then modified a doctest to include the lines in
the ticket description

sage: print "The answers are", 1.5, 2, 1e-12 # tol 1e-3
The answers are 1.499999 2.0001 0

I then did 'sage -b' again. When I run the doctest, I get

% ./sage -t  -long -force_lib "devel/sage/sage/symbolic/units.py"
sage -t -long -force_lib "devel/sage/sage/symbolic/units.py"
**********************************************************************
File "/home/mariah/sage/sage-4.7.rc2-x86_64-Linux-nehalem-fc-review-10952/devel/sage/sage/symbolic/units.py", line 13:
    sage: print "The answers are", 1.5, 2, 1e-12 # tol 1e-3
Exception raised:
    Traceback (most recent call last):
      File "/home/mariah/sage/sage-4.7.rc2-x86_64-Linux-nehalem-fc-review-10952/local/bin/ncadoctest.py", line 1231, in run_one_test
        self.run_one_example(test, example, filename, compileflags)
      File "/home/mariah/sage/sage-4.7.rc2-x86_64-Linux-nehalem-fc-review-10952/local/bin/sagedoctest.py", line 38, in run_one_example
        OrigDocTestRunner.run_one_example(self, test, example, filename, compileflags)
      File "/home/mariah/sage/sage-4.7.rc2-x86_64-Linux-nehalem-fc-review-10952/local/bin/ncadoctest.py", line 1172, in run_one_example
        compileflags, 1) in test.globs
      File "<doctest __main__.example_0[6]>", line 1
         res =  print "The answers are", RealNumber('1.5'), Integer(2), RealNumber('1e-12') # tol 1e-3###line 13:
    sage: print "The answers are", 1.5, 2, 1e-12 # tol 1e-3
                    ^
     SyntaxError: invalid syntax
**********************************************************************

Note that if I put the lines

    sage: RDF(pi)                               # abs tol 1e-5
    3.14159

in the doctest, then the doctest passes.

@nexttime
Mannequin

nexttime mannequin commented Sep 9, 2011

comment:31

Replying to @jhpalmieri:

I think the regexp should be something like

' ((\.[0-9]+|[0-9]+(\.[0-9]*)?)e[+-]?[0-9]+)'

But that doesn't match 1ee7 :(

You can of course match funny things first, i.e. use more general patterns, but you then have to check the matched expression further before passing it to float().
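As an illustration of that trade-off (the stricter pattern is copied from the comment above; the looser one is a hypothetical alternative): the strict pattern rejects malformed tokens like 1ee7 up front, while a looser pattern matches them and has to rely on a later check before float() is called.

```python
import re

# jhpalmieri's stricter pattern: only well-formed scientific notation.
strict = re.compile(r'(\.[0-9]+|[0-9]+(\.[0-9]*)?)e[+-]?[0-9]+')

# A looser, hypothetical pattern: also catches typos like "1ee7",
# which must then be vetted before being passed to float().
loose = re.compile(r'[0-9.eE+-]+')

assert strict.fullmatch('1e-5') is not None
assert strict.fullmatch('3.14e+2') is not None
assert strict.fullmatch('1ee7') is None      # rejected by the strict pattern
assert loose.fullmatch('1ee7') is not None   # matched, but float('1ee7') raises
```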

The whole idea isn't that bad, but carries the danger that people simply increase the tolerance (too much) just to make doctests pass somehow, without caring where the variations originate from.

@jhpalmieri
Member

comment:32

But float(1ee7) raises an error. Why would we want to match that?

Re the issue of raising tolerance, I hope that referees will keep an eye on that sort of thing. Dave Kirkby, for example, is a strong advocate of not arbitrarily raising tolerance.

@robertwb
Contributor Author

robertwb commented Sep 9, 2011

Attachment: 10952-tol-bin.2.patch.gz

@robertwb
Contributor Author

robertwb commented Sep 9, 2011

comment:33

I've fixed the issue with blank lines terminating example blocks. As for the regular expression for floats, I don't see any need to make it more complicated--I'd rather match mistyped numbers and raise an error than silently ignore them. If you feel strongly, this could be changed.

Referees should keep an eye on precision. This is back to my philosophy that a computer should run doctests and a human read the code (though we could add a patchbot plugin to flag drops in precision). It's not a new issue--we've been using "..." for quite a while now which is less flexible and more dangerous (as it can match more than just a single number).


@nexttime nexttime mannequin added s: needs review and removed s: needs work labels Sep 9, 2011
@nexttime
Mannequin

nexttime mannequin commented Sep 9, 2011

comment:35

How about emitting the extra code only if # tol ... is used at all in the file?

That's a simple """... %s ...""" % (tolerance_code if uses_tol else "").
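A minimal sketch of that suggestion (all names here are hypothetical stand-ins, not the actual Sage doctest machinery): track whether the source uses # tol and splice the helper into the generated doctest file only in that case.

```python
# Hypothetical stand-ins for the real helper code and file template.
TOLERANCE_CODE = (
    "def check_with_tolerance(expected, res, eps, kind):\n"
    "    ...\n"
)
TEMPLATE = "# auto-generated doctest module\n%s# ... tests follow ...\n"

def emit_doctest_module(uses_tol):
    """Include the tolerance helper only when '# tol' appears in the source."""
    return TEMPLATE % (TOLERANCE_CODE if uses_tol else "")
```

Files whose doctests never use # tol would then carry no extra helper code at all.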

@robertwb
Contributor Author

robertwb commented Sep 9, 2011

comment:36

You're talking about the check_with_tolerance function and helpers. One could add a variable to keep track of this and conditionally emit it; not sure if the added complexity is worth the tiny savings.

@nexttime
Mannequin

nexttime mannequin commented Sep 9, 2011

comment:37

Replying to @robertwb:

You're talking about the check_with_tolerance function and helpers. One could add a variable to keep track of this and conditionally emit it; not sure if the added complexity is worth the tiny savings.

Added complexity? One boolean variable, initialized to False and set to True in only a few places?

This not only reduces the file size and run (import!) time but also avoids more potential name clashes.

Why should one add useless code to each and every doctested file that doesn't use # tol?

@robertwb
Contributor Author

robertwb commented Sep 9, 2011

comment:38

Either we make this variable a global (ugly) or return it from the call to doc_preparse along with the parsed docstring (also ugly). All to save a fraction of a block (~1KB) of an ephemeral file and < 1ms on an already fairly expensive operation. If we're looking to cut fat, it'd be better to avoid the quadratic-time creation of the doctest file string.

The name clash is a red herring--if we avoid it only because # tol is not used, that's a potentially latent bug in my mind. The function could be renamed if need be.

@jhpalmieri
Member

comment:39

I don't see anything wrong with more complicated regular expressions, but I like regular expressions. So leave it as is if you want.

Meanwhile, the output for failures using tolerances isn't nice, compared to other sorts of failures:

sage -t  "builds/sage-4.7.2.alpha2/devel/sage-new/sage/homology/new.py"
**********************************************************************
File "/Applications/sage_builds/sage-4.7.2.alpha2/devel/sage-new/sage/homology/new.py", line 4:
    sage: 3+3
Expected:
    8
Got:
    6
**********************************************************************
File "/Applications/sage_builds/sage-4.7.2.alpha2/devel/sage-new/sage/homology/new.py"
Failed example:
    check_with_tolerance('''
        3.14159
    ''', res, 9.9999999999999998e-13, 'abs')
Exception raised:
    Traceback (most recent call last):
      File "/Applications/sage/local/bin/ncadoctest.py", line 1231, in run_one_test
        self.run_one_example(test, example, filename, compileflags)
      File "/Applications/sage/local/bin/sagedoctest.py", line 38, in run_one_example
        OrigDocTestRunner.run_one_example(self, test, example, filename, compileflags)
      File "/Applications/sage/local/bin/ncadoctest.py", line 1172, in run_one_example
        compileflags, 1) in test.globs
      File "<doctest __main__.example_0[11]>", line 3, in <module>
        ''', res, 9.9999999999999998e-13, 'abs')
      File "/Users/palmieri/.sage//tmp/new.py", line 35, in check_with_tolerance
        assert abs(expected_value - actual_value) < epsilon, "Out of tolerance %s vs %s" % (expected_value, actual_value)
    AssertionError: Out of tolerance 3.14159 vs 3.14159265359
**********************************************************************
1 items had failures:
   2 of  13 in __main__.example_0
***Test Failed*** 2 failures.
For whitespace errors, see the file /Users/palmieri/.sage//tmp/.doctest_new.py
	 [3.9 s]

Notice that the line number is missing and the traceback is present. I'm attaching a referee patch which tries to fix this. Please take a look. For me, this gives this sort of output (with a slightly different file):

sage -t  "builds/sage-4.7.2.alpha2/devel/sage-new/new.py"   
**********************************************************************
File "/Applications/sage_builds/sage-4.7.2.alpha2/devel/sage-new/new.py", line 4:
    sage: 3+3
Expected:
    8
Got:
    6
**********************************************************************
File "/Applications/sage_builds/sage-4.7.2.alpha2/devel/sage-new/new.py", line 6:
    sage: RDF(pi)                               # abs tol 1e-6
Out of tolerance 3.14159 vs 3.14159265359
**********************************************************************
File "/Applications/sage_builds/sage-4.7.2.alpha2/devel/sage-new/new.py", line 16:
    sage: RDF(pi)                               # abs tol 1e-8
Out of tolerance 3.14159 vs 3.14159265359
**********************************************************************
1 items had failures:
   3 of  16 in __main__.example_0
***Test Failed*** 3 failures.
For whitespace errors, see the file /Users/palmieri/.sage//tmp/.doctest_new.py
	 [4.2 s]

Since the whole ticket had a positive review earlier, I think if you're happy with my changes, you can switch it back to that status.

Finally, the issue with doctests in conventions.rst is, I think, that the various triple quotes """ confuse the parsing routine (the function pythonify_rst in particular). This belongs on another ticket. That file contains enough incomplete code stubs that I wonder if it shouldn't be doctested at all...


@jhpalmieri
Member

scripts repo; apply on top of other scripts patch

@nexttime
Mannequin

nexttime mannequin commented Sep 9, 2011

comment:40

Attachment: trac_10952-ref.patch.gz

Replying to @jhpalmieri:

I don't see anything wrong with more complicated regular expressions, but I like regular expressions. So leave it as is if you want.

Yes, only

sage: foo   # tol 1ee7
anything

should always pass. ;-)

Meanwhile, the output for failures using tolerances isn't nice, compared to other sorts of failures [...]

Notice that the line number is missing and the traceback is present. I'm attaching a referee patch which tries to fix this. Please take a look.

Haven't tested your patch (nor looked at it), but the output you gave looks much better.

Since the whole ticket had a positive review earlier, I think if you're happy with my changes, you can switch it back to that status.

I won't object, though in general I don't like the idea of adding code unconditionally (regardless of how small the overhead or impact might be), i.e., when there's no need to do so.

We have similar situations all around, where everybody adds "just a little", IMHO.

@robertwb
Contributor Author

comment:41

Replying to @nexttime:

Replying to @jhpalmieri:

I don't see anything wrong with more complicated regular expressions, but I like regular expressions. So leave it as is if you want.

Yes, only

sage: foo   # tol 1ee7
anything

should always pass. ;-)

Currently, it raises an error indicating that the doctest itself is bad.

Meanwhile, the output for failures using tolerances isn't nice, compared to other sorts of failures [...]

Notice that the line number is missing and the traceback is present. I'm attaching a referee patch which tries to fix this. Please take a look.

Haven't tested your patch (nor looked at it), but the output you gave looks much better.

Since the whole ticket had a positive review earlier, I think if you're happy with my changes, you can switch it back to that status.

I won't object, though I in general don't like the idea of adding code (regardless how small the overhead or impact might be) unconditionally, i.e., if there's no need to do so.

We have similar situations all around, where everybody adds "just a little", IMHO.

Yes, it's a tradeoff between adding "just a little" Python parsing overhead to the testing infrastructure or "just a little" human parsing overhead for developers. I'd rather put the burden on machines than on developers in this case.

The reviewer patches look good, certainly an improvement, so I'm setting this back to positive review unless there's something else that needs to be taken care of.


@nexttime
Mannequin

nexttime mannequin commented Sep 26, 2011

Changed keywords from sd32 to sd32 noise noisy doctest failure error tolerance

@nexttime
Mannequin

nexttime mannequin commented Sep 27, 2011

Merged: sage-4.7.2.alpha3

@nexttime nexttime mannequin removed the s: positive review label Sep 27, 2011
@nexttime nexttime mannequin closed this as completed Sep 27, 2011
@kcrisman
Member

comment:44

I'd like to draw the attention of folks here to #12493 comment:7. Apparently one can't do only-optional tests along with tol tests at the same time. My guess is that not too many people use only-optional and the tol stuff is pretty new.

In fact, I only found one occurrence in 5.0.beta3. Can that be right? This has been in Sage for months!

sage: search_src(" tol ","#")
symbolic/integration/integral.py:587:        sage: error.numerical_approx() # abs tol 10e-10

Anyway, even if my analysis is wrong (let's hope it's easier than that), I figure the people here can give a quick diagnosis of #12493.

@kcrisman
Member

comment:45

Actually, #12493 comment:8 is even better! Optional and tol don't play well together.
