Match absolute tolerance of `assert_allclose` with `allclose` and `isclo... #4880

tonysyu · 2014-07-18T04:33:08Z

Note that rtol wasn't altered to reduce backward incompatibility. Using an absolute tolerance of 0 prevents many common cases from passing. Also, matching the absolute tolerance of allclose and isclose makes debugging much easier.

See discussion:

http://mail.scipy.org/pipermail/numpy-discussion/2014-July/070602.html

…close` Note that `rtol` wasn't altered to reduce backward incompatibility. Using an absolute tolerance of 0 prevents many common cases from passing. Also, matching the absolute tolerance of `allclose` and `isclose` makes debugging much easier.

rgommers · 2014-07-18T05:58:47Z

LGTM, thanks Tony.

njsmith · 2014-07-18T09:42:09Z

Note that @josef-pkt has a non-trivial objection to this on the mailing list. I still think we ought to do it, but more discussion might be needed first.

josef-pkt · 2014-07-18T10:24:44Z

statsmodels has

399 assert_allclose(
1541 assert_almost_equal(

We are almost only using now assert_allclose, since we have a minimum version of numpy that has it. When I'm changing old tests then I replace assert_almost_equal with assert_allclose because it's more precise.
A rough guess is that 90% of our asser_allclose don't use the atol argument.
I find atol most of the time much worse than rtol because often the scale of the elements of an array are not very close to each other.

And if we have values close to zero, then often close to zero is much smaller, e.g.

assert_allclose(pvals, res.pvalues, rtol=5e-10, atol=1e-25)

changing the default dilutes the current test suite of packages that use assert_allclose, and we would start to convert and add atol=0 to all affected tests.

assert_allclose(res.model.endog - fitted, res.resid, rtol=1e-12)

tonysyu · 2014-07-18T13:06:13Z

And if we have values close to zero, then often close to zero is much smaller, e.g.

assert_allclose(pvals, res.pvalues, rtol=5e-10, atol=1e-25)

Maybe I'm missing something, but when I mentioned testing values close to zero, I didn't mean testing an array of all zeros. For example, if you have some values which should be 1 and some that should be 0, then round off error will cause the test to fail:

eps = np.finfo(np.double).eps
assert_allclose([1, 0], [1, eps], rtol=5e-10, atol=1e-25)

I think it's reasonable to set atol below 1e-8, but 1e-25 wouldn't help very common cases pass these tests. Something close to, but above, eps would be reasonable. That said, not matching the defaults for allclose and assert_allclose is incredibly counter-intuitive.

josef-pkt · 2014-07-18T20:19:30Z

@tonysyu As I tried to show on some examples on the mailing list.
In most test cases we use rtol because the relative magnitudes are not very close, and we want to test more for significant digits than decimals.
Similar happens for values close to zero. Very often there is no roundoff error based on values in the scale of 1, we can have 1e-20 or 1e-40 accurate to 5 or 10 digits, or in some cases we lump everything that is smaller than 1e-25 together as "essentially zero", and then we use atol=1e-25.
And in some cases we only have atol=1e-6 as in some cases nonlinear optimization.

But the general rule is that we are ok with rtol, but need atol>0 only in a minority of cases, and then atol=1e-8 would be much too large most of the time, forcing us to add atol=0 to the majority of cases instead of relying on the default.

tonysyu · 2014-07-19T20:30:18Z

@josef-pkt: I understand your use-case, and I can see that this change would cause problems. I just think my use-case is very common, and should be the default, but you probably think your use-case is more common. I'm not really sure how to break the deadlock.

njsmith · 2014-07-19T20:58:41Z

Tony- would adding zerotol argument with non-zero default, and setting the
threshold formula to

atol + rtol * desired + (zerotol if desired == 0 else 0)

handle your use case?

On Sat, Jul 19, 2014 at 9:30 PM, Tony S Yu notifications@github.com wrote:

@josef-pkt https://github.com/josef-pkt: I understand your use-case,
and I can see that this change would cause problems. I just think my
use-case is very common, and should be the default, but you probably think
your use-case is more common. I'm not really sure how to break the deadlock.

—
Reply to this email directly or view it on GitHub
#4880 (comment).

Nathaniel J. Smith
Postdoctoral researcher - Informatics - University of Edinburgh
http://vorpus.org

josef-pkt · 2014-07-19T21:51:09Z

just browing a few pages on github code search "numpy assert_allclose"
https://github.com/search?p=1&q=assert_allclose+numpy&ref=cmdform&type=Code

almost 5000 results, very few define atol, and then some do in case of desired is zero, some don't even in that case.

numpy.testing.assert_allclose(Y.mean(axis=1), 0, atol=1e-10)
npt.assert_allclose(basis.eigval(0),3.278960295066-40.)
assert_allclose(n.forward([0]), [0])

argriffing · 2014-07-19T23:05:45Z

I don't have any data, but I'd guess that I usually use the default tolerances, and I rarely specify atol except to compare vs. arrays with a desired 0. I've wasted time tracking down problems due to different default tolerances in allclose vs. assert_allclose.

josef-pkt · 2014-07-20T06:03:58Z

I think agriffing's comment would be a good argument to set the default atol=0 in np.allclose. It improves the unit tests for "default" users.

assert np.allclose(design_matrix, design_matrix_R, rtol=1e-12, atol=0.) (from patsy)

In any case it will affect a lot of code, github search for np.allclose has 13,118 results (including repos that keep a copy of scipy or numpy)

njsmith · 2014-07-20T07:47:09Z

@josef-pkt: Does it help if I promise no one is going to go breaking all
your tests? No one is going to go breaking all your tests. But we do have a
problem here and I don't believe your solution is acceptable, so please try
to see the other point of view so we can come up that works for everyone?

I don't see how agriffing's comment is an argument for atol=0 being more
default user friendly at all. As you've noticed, patsy has >100 allclose
tests, and lots of them do contain exact zeros and no atol= arguments.
This default user wrote those with the default expectation that allclose(x,
0) and x == 0 were not the same. I think that's a pretty reasonable
assumption for someone to make.

The defaults really need to work ok, somehow, for someone who doesn't
understand floating point at all except that they've heard that you
shouldn't use ==. It won't be perfect, but I'd rather someone like that be
writing imperfect tests than writing no tests at all...
On 20 Jul 2014 07:04, "Josef Perktold" notifications@github.com wrote:

I think agriffing's comment would be a good argument to set the default
atol=0 in np.allclose. It improves the unit tests for "default" users.

assert np.allclose(design_matrix, design_matrix_R, rtol=1e-12, atol=0.)
(from patsy)

In any case it will affect a lot of code, github search for np.allclose
has 13,118 results (including repos that keep a copy of scipy or numpy)

—
Reply to this email directly or view it on GitHub
#4880 (comment).

rgommers · 2014-07-20T11:51:23Z

I like the zerotol compromise proposal for assert_allclose, it is backwards compatible except for the case where atol=0 doesn't work anyway. Not all zeros compare equal, but most do so this should work pretty well.

@njsmith are you proposing to change allclose to also use zerotol and atol=0, or leave the delta in signatures and just document that?

njsmith · 2014-07-20T12:59:38Z

Eh, if we're going to mess with this stuff we should probably do it right so we don't have to deal with it again later. Here's a concrete proposal to discuss:

allclose and assert_allclose both gain a zerotol= argument, and their threshold becomes: atol + rtol * abs(desired) + (zerotol if desired == 0 else 0)
For a release or two we leave the default for both as zerotol=0, so that there is no immediate behavioral change. This gives us:

       allclose:   rtol=1e-05   atol=1e-08   zerotol=0.0
        isclose:   rtol=1e-05   atol=1e-08   zerotol=0.0
assert_allclose:   rtol=1e-07   atol=0.0     zerotol=0.0

Eventually we want to move to something like:

       allclose:   rtol=1e-07   atol=0.0     zerotol=1e-08
        isclose:   rtol=1e-07   atol=0.0     zerotol=1e-08
assert_allclose:   rtol=1e-07   atol=0.0     zerotol=1e-08

(actual values to be determined!)

The way we do this is to emit a FutureWarning whenever we encounter a call which produces a different result under the two defaults. (By "a call" I mean "a particular combination of actual, desired and any explicit atol, rtol, zerotol values", i.e. if a call isn't using the defaults anyway it won't care if we change them.)

njsmith · 2014-07-20T13:04:53Z

I guess one downside to this proposal is that supporting 3 arguments indefinitely is kind of ugly, esp. when we'd be recommending that everyone use zerotol instead of atol. An alternative would be to support all three during the deprecation period (step 2), but in step 3 have atol be implemented only as a deprecated alias for zerotol.

pv · 2014-07-20T13:56:37Z

I don't really like zerotol --- the change will break backward compat in both functions at some point, and I don't think the kwarg has a valid use case in itself.

The easiest change in my opinion for backward compat would be to just bump assert_allclose tolerances up to match allclose. This should be combined with changing the default sentinel value of atol to def assert_allclose(..., atol='default'), raise FutureWarning if atol=='default', and have 'default' imply 1e-8. The value atol=None is also to be accepted, and as usual mean the same thing as 'default', but without FutureWarning. The FutureWarning needs to tell the user to choose between atol=None and atol=0. After a few releases, we change the default sentinel to atol=None and remove all warnings. The behavior of allclose is to remain unchanged --- we don't want to break stuff unnecessarily. This approach will allow people to easily locate places where they need to make changes.

The rtol bump from 1e-7 to 1e-5 does not need deprecation warnings IMHO.

Any change in any case will result to people having to revise hundreds of test cases using allclose or assert_allclose, and curse our name because we are changing stuff for aesthetic reasons. We need to be sure consistency is worth the hassle of changing anything at all, and I am undecided on this point. (Personally, I also think that the default tolerances should be more like atol=1e3*tiny, rtol=1e3*eps, but this would break even more things.)

rgommers · 2014-07-20T15:21:24Z

@pv changing to atol=None will not be possible for any code supporting older numpy versions, so that doesn't really help.

Released versions of libraries libraries like statsmodels and scipy starting to spew 100s of FutureWarnings when running their test suite with a newer numpy would also be quite annoying. Probably worse than just documenting the inconsistency.

pv · 2014-07-20T16:30:39Z

@rgommers: indeed, so the warning message should then recommend setting explicitly atol=1e-8 or atol=0.

The warning issue will probably turn up in any scheme that spews warnings. If it is done similarly as @njsmith suggests above (only on calls that produce different results), then it'll be slightly less bad.

rgommers · 2014-07-20T16:54:49Z

@pv I expect that the difference wil be huge. Maybe even no warnings at all for scipy, because all uses of assert_allclose that compare to an array which has a zero should have explicit nonzero atol. Maybe with the exception of a couple of tests that need to test for equality to zero.

There's no ideal solution here, but to me 100s of warnings is worse than just documenting the inconsistency.

njsmith · 2014-07-20T17:32:48Z

AFAIU, the issue is that there is lots of code that depends on
assert_allclose's default atol=0 (like statsmodels), and also lots of code
that depends on allclose's default atol > 0. So it is nice to have a
solution in which neither big pile of code needs to be changed to add
explicit atol= arguments everywhere.

Also, both of these kinds of code will continue to be written, and it is
nice if we can provide a tool that will seamlessly support both usecases
without one group or the other being forced to constantly think about atol=.

The idea of zerotol is that in fact this is what everyone really wants, and
that it will allow support both uses cases without anyone having to rewrite
a bunch of code.

(Assuming for the moment that we don't formally deprecate atol= anytime
soon, then I think the plan described above will actually produce few or
zero warnings with existing code bases? Of course this is an empirical
question.)

On Sun, Jul 20, 2014 at 5:30 PM, Pauli Virtanen notifications@github.com
wrote:

@rgommers https://github.com/rgommers: indeed, so the warning message
should then recommend setting explicitly atol=1e-8 or atol=0.

The warning issue will probably turn up in any scheme that spews warnings.
If it is done similarly as @njsmith https://github.com/njsmith suggests
above (only on calls that produce different results), then it'll be
slightly less bad.

—
Reply to this email directly or view it on GitHub
#4880 (comment).

Nathaniel J. Smith
Postdoctoral researcher - Informatics - University of Edinburgh
http://vorpus.org

charris · 2014-07-20T17:33:34Z

I lean towards Ralf on this one, the number of fixes that would need to go in would be enormous. That said, it might be possible to make a script to do that job and submit PR's to the bigger users of assert_allclose, then make the change.

josef-pkt · 2014-07-20T18:01:00Z

I'm against changing assert_allclose to the large defaults of allclose now mainly for strategic reason. I think in the long term it's better if the scipy related testing culture keeps increasing and uses tools that have reasonably strict tests as default (instead of automatically defaulting to true for any small values).

I also agree with Ralf that any change that would produce a huge number of warnings if someone updates numpy but doesn't have a statsmodels that handles it would be pretty bad.

In terms of actual changes to the code, I think I would just add a wrapper function in a statsmodels.testing module and change the imports instead of trying to change most of the assert_equal. That's inconvenient since we are very used to from numpy.testing import assert_xxx, but it wouldn't be an excessive amount of work.

I think Nathaniel's proposal is ok. That removing assert equal for exact zeros will not have much effect on statsmodels, for example. There might be just a few cases that would be affected. (I guess most intentional cases use assert_equal in that case.)

charris · 2014-07-20T18:07:51Z

Another possibility is to make the change(s), and if the test fails, then repeat with the old default and if it passes, raise a FutureWarning. That would probably get rid of most of the warnings.

josef-pkt · 2014-07-20T18:19:21Z

@charris that only works if you make the test requirement more strict, but not if you weaken them.
If our unit tests pass at atol=0, then they also pass at atol>0. That's essentially the same as giving out no warnings at all.

pv · 2014-07-20T18:20:45Z

Re: zerotol --- the expected results are often not manually typed exact values, and can contain e.g. numbers of order eps instead of zeros, in which case atol needs to be given, so it cannot be deprecated. In other cases, exact zeros in expected can also be produced accidentally, e.g. by underflows.

Additionally (also in manually typed cases), you usually don't want different tolerances for zero and nonzero values if atol= is given. So in practice, adding a nonzero zerotol= default value requires changing atol=x to atol=x, zerotol=0 just to avoid potential accidental loosening of the tests (there are ~400 instances of atol= in scipy, ie., it's present in about half of the calls to assert_allclose).

njsmith · 2014-07-20T18:54:29Z

Pauli- yeah, the very nature of this problem means that all solutions have
imperfections that we have to weigh. I'm happy to forget about deprecating
atol=. Regarding the other two points you raise -- accidental exact zeros
in 'desired' that trigger accidentally lenient checking, and
double-counting of atol + zerotol -- my intuition is that these zerotol
sins are much milder than the pure-atol sins? Right now by default you
either get lenient checking or not at random depending on which of two
apparently equivalent functions you used, for all values, not just the
occasional accidental zero. And you're right that there's a potential
problem with using atol=x instead atol=x, zerotol=0, but my first guess is
that this is a much smaller problem than some of the other ones we're
talking about here, because (a) if we're talking about manually typed cases
then you almost certainly want atol=0, zerotol=x, (b) atol=x, zerotol=x is
almost never going to matter in practice, because 'x' is probably only set
to a rough order of magnitude to start with -- a little perturbation in how
current tests are handled is inevitable, the hope is that we can keep it as
little as possible. AND of course these are just guesses -- the advantage
of the transition strategy described above is that we would get to see
which proportion of tests it actually makes a difference on before actually
committing to anything.

On Sun, Jul 20, 2014 at 7:20 PM, Pauli Virtanen notifications@github.com
wrote:

Re: zerotol --- the expected results are often not manually typed exact
values, and can contain e.g. numbers of order eps instead of zeros, in
which case atol needs to be given, so it cannot be deprecated. In other
cases, exact zeros in expected can also be produced accidentally, e.g. by
underflows.

Additionally, also in manually typed cases, you don't want different
tolerances for zero and nonzero values if atol= is given. So in practice,
adding a nonzero zerotol requires changing atol=x to atol=x, zerotol=0
just to avoid breakage (there are ~400 instances of atol= in scipy, ie.,
it's present in about half of the calls to assert_allclose).

—
Reply to this email directly or view it on GitHub
#4880 (comment).

Nathaniel J. Smith
Postdoctoral researcher - Informatics - University of Edinburgh
http://vorpus.org

pv · 2014-07-20T20:16:31Z

I think in cases where you know what to put into zerotol, just putting it into atol is numerically better.

Re: (b) assert_allclose(atol=1e-14) differs from assert_allclose(atol=1e-14, zerotol=0), by ~half floating point precision. Existing code that uses atol to avoid exact zero comparisons (esp. in manually typed results) gets this upward bump in test tolerances. If backward compat is the aim, it would be best to just ignore zerotol if atol is given, or to ignore it unless explicitly given.

In any case, 3rd party projects will need to do some review on their tests. I'd guess that for the most part, tolerance changes have little effect, since often the point of allclose tests is to check that results are not ridiculously wrong. However, the intent of each test is not easy to guess, and at least for Scipy it varies, so someone needs to go through the grep outputs. Changes to allclose will also cause some behavior changes in non-test code (e.g. in scipy.signal and scipy.linalg).

njsmith · 2014-07-20T21:53:05Z

Hmm, yeah, I can see an argument for making the end-goal default something
like rtol=1e-whatever, atol=None, zerotol=1e-whatever, and if atol is given
it overrides zerotol.

What do you all think? Obviously the current (a) inconsistency, (b) lack of
sensible defaults for handling both exact-zeros and small-value
calculations is causing problems. Is something like this zerotol idea (1) a
good all-round default, (b) worth the disruption to get to?

On Sun, Jul 20, 2014 at 9:16 PM, Pauli Virtanen notifications@github.com
wrote:

I think in cases where you know what to put into zerotol, just putting it
into atol is numerically better.

Re: (b) assert_allclose(atol=1e-14) differs from assert_allclose(atol=1e-14,
zerotol=0), by ~half floating point precision. Existing code that uses
atol to avoid exact zero comparisons (esp. in manually typed results)
gets this upward bump in test tolerances. If backward compat is the aim, it
would be best to just ignore zerotol if atol is given, or to ignore it
unless explicitly given.

In any case, 3rd party projects will need to do some review on their
tests. I'd guess that for the most part, tolerance changes have little
effect, since often the point of allclose tests is to check that results
are not ridiculously wrong. However, the intent of each test is not easy to
guess, and at least for Scipy it varies, so someone needs to go through the
grep outputs.

—
Reply to this email directly or view it on GitHub
#4880 (comment).

Nathaniel J. Smith
Postdoctoral researcher - Informatics - University of Edinburgh
http://vorpus.org

rgommers · 2014-07-21T05:33:55Z

My summary:

really we'd like assert_allclose(a, b, atol=1e3*tiny, rtol=1e3*eps)
we can't get there without a lot of disruption --> not worth it
introducing zerotol is a compromise to get similar behavior with less disruption at the cost of an extra kwarg
it still requires people to check their tests for unwanted bumps in precision
not clear if allclose should then also get zerotol, because it may change numerical precision of non-test code.

My opinion: the current situation isn't that bad that it justifies the effort, especially if we can't end up with the optimal solution after all. +0.5 for just documenting current behavior better.

charris · 2014-08-04T21:35:15Z

I'm with @rgommers on this one, so closing. If anyone feels strongly that it still needs to be addressed, please reopen.

njsmith · 2014-08-04T21:53:57Z

Noting for the record that I actually think a nonzero zerotol is actually a
better all-around default than either version of allclose currently has,
but I don't have the bandwidth to push for this right now.

On Mon, Aug 4, 2014 at 10:35 PM, Charles Harris notifications@github.com
wrote:

I'm with @rgommers https://github.com/rgommers on this one, so closing.
If anyone feels strongly that it still needs to be addressed, please reopen.

—
Reply to this email directly or view it on GitHub
#4880 (comment).

Nathaniel J. Smith
Postdoctoral researcher - Informatics - University of Edinburgh
http://vorpus.org

rgommers added the component: numpy.testing label Jul 18, 2014

charris closed this Aug 4, 2014

njsmith mentioned this pull request Dec 12, 2017

numpy.isclose vs math.isclose #10161

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Match absolute tolerance of `assert_allclose` with `allclose` and `isclo... #4880

Match absolute tolerance of `assert_allclose` with `allclose` and `isclo... #4880

tonysyu commented Jul 18, 2014

rgommers commented Jul 18, 2014

njsmith commented Jul 18, 2014

josef-pkt commented Jul 18, 2014

tonysyu commented Jul 18, 2014

josef-pkt commented Jul 18, 2014

tonysyu commented Jul 19, 2014

njsmith commented Jul 19, 2014

josef-pkt commented Jul 19, 2014

argriffing commented Jul 19, 2014

josef-pkt commented Jul 20, 2014

njsmith commented Jul 20, 2014

rgommers commented Jul 20, 2014

njsmith commented Jul 20, 2014

njsmith commented Jul 20, 2014

pv commented Jul 20, 2014

rgommers commented Jul 20, 2014

pv commented Jul 20, 2014

rgommers commented Jul 20, 2014

njsmith commented Jul 20, 2014

charris commented Jul 20, 2014

josef-pkt commented Jul 20, 2014

charris commented Jul 20, 2014

josef-pkt commented Jul 20, 2014

pv commented Jul 20, 2014

njsmith commented Jul 20, 2014

pv commented Jul 20, 2014

njsmith commented Jul 20, 2014

rgommers commented Jul 21, 2014

charris commented Aug 4, 2014

njsmith commented Aug 4, 2014

Match absolute tolerance of assert_allclose with allclose and `isclo... #4880

Match absolute tolerance of assert_allclose with allclose and `isclo... #4880

Conversation

tonysyu commented Jul 18, 2014

rgommers commented Jul 18, 2014

njsmith commented Jul 18, 2014

josef-pkt commented Jul 18, 2014

tonysyu commented Jul 18, 2014

josef-pkt commented Jul 18, 2014

tonysyu commented Jul 19, 2014

njsmith commented Jul 19, 2014

josef-pkt commented Jul 19, 2014

argriffing commented Jul 19, 2014

josef-pkt commented Jul 20, 2014

njsmith commented Jul 20, 2014

rgommers commented Jul 20, 2014

njsmith commented Jul 20, 2014

njsmith commented Jul 20, 2014

pv commented Jul 20, 2014

rgommers commented Jul 20, 2014

pv commented Jul 20, 2014

rgommers commented Jul 20, 2014

njsmith commented Jul 20, 2014

charris commented Jul 20, 2014

josef-pkt commented Jul 20, 2014

charris commented Jul 20, 2014

josef-pkt commented Jul 20, 2014

pv commented Jul 20, 2014

njsmith commented Jul 20, 2014

pv commented Jul 20, 2014

njsmith commented Jul 20, 2014

rgommers commented Jul 21, 2014

charris commented Aug 4, 2014

njsmith commented Aug 4, 2014

Match absolute tolerance of `assert_allclose` with `allclose` and `isclo... #4880

Match absolute tolerance of `assert_allclose` with `allclose` and `isclo... #4880