
BUG: Overflow in tan and tanh for large complex values #3010

Closed · ewmoore wants to merge 35 commits

Conversation

@ewmoore (Contributor) commented Feb 22, 2013

This is a fix for gh-2321. I've imported the implementation of ctanh from FreeBSD's math library, which handles large arguments correctly; this fixes this bug and the equivalent bug in np.tan.

The problem here is that importing the function into npy_math_complex.c and adding a config check causes us to use the implementation from glibc on Linux, which also has this bug, although it seems to have been fixed in glibc last April (http://sourceware.org/bugzilla/show_bug.cgi?id=11521).

I see several things that could be done here; the easiest is probably to use our version of ctanh unconditionally, although there are a multitude of ways to go about that. For instance, should I remove the implementation from npy_math_complex.c and just place it directly in umath/funcs.inc, where the buggy version was? Or should I just remove the config check and add a note about it somewhere? Or some other choice altogether.

Some kind of test will also need to be added, but I thought I'd get some input on how to handle this before doing anything else.
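
For context on the failure mode: the textbook identity ctanh(x + iy) = (sinh 2x + i sin 2y) / (cosh 2x + cos 2y) overflows in its intermediates for large |x| even though the true result is finite. A minimal sketch of the effect (illustrative only, not the actual buggy numpy code):

#include <stdio.h>
#include <math.h>
#include <complex.h>

int main(void)
{
    double x = 372.0, y = 1.0;
    double num_re = sinh(2.0 * x);               /* sinh(744) overflows to +inf */
    double num_im = sin(2.0 * y);                /* finite */
    double den = cosh(2.0 * x) + cos(2.0 * y);   /* cosh(744) overflows to +inf */

    /* Real part is inf/inf = NaN, although ctanh(372 + 1i) is ~1. */
    printf("%g %+gi\n", num_re / den, num_im / den);
    return 0;
}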

@ewmoore (Contributor, Author) commented Feb 22, 2013

Actually I don't think that tanh has any tests at all for real or complex values. This also appears true for some of the other ufuncs.

@charris (Member) commented Feb 23, 2013

Looks good at a first glance. Some of the indentation looks off; the numpy standard is no hard tabs and 4-space indentation. There is also a long line or two that could maybe be wrapped.

As to your questions, I don't know the answers ;) I'm not sure there are any real drawbacks to having our own version always used except maybe speed. Working around compiler/library bugs is always a bit unsettling. There used to be the WTF-MathExtras from Apple dealing with similar problems. We could maybe run some accuracy checks during configuration, but I suspect that would be the start of a long, twisty road.

@njsmith (Member) commented Feb 23, 2013

Yeah, this is a tricky situation, isn't it... I guess the thing I'd be worried most about is that in many cases our local version of a function like 'ctanh' is not going to be as well-maintained as the upstream version. E.g. following your link, I see that glibc's tanh has recently received bug fixes to correctly handle denormal underflow and things; I have no idea whether the version in this PR also handles such subtleties correctly, and after we merge it we'll probably never touch it again.

I agree that 'accuracy checks' in general are unsettling -- we probably don't want to get into the business of checking for correct rounding etc. -- but in this specific case it seems like there are three categories:

  1. poorly-maintained system libraries that return flat-out-wrong answers for certain (large) inputs
  2. our local version, which we believe does not return flat-out-wrong answers
  3. well-maintained system libraries that also do not return flat-out-wrong answers, and probably receive more bug fixes overall

And it seems to me our preference should be (1) < (2) < (3)? And fortunately in this case it is easy to tell whether the system library falls into class (1) or (3), because we can just call tanh with some large value and see if we get a NaN back -- that's pretty unambiguous, no fiddling around with ulps there. So maybe we should do that.
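
A sketch of the probe being suggested (illustrative only, not the check that eventually landed): a tiny program whose exit status reports whether the system ctanh survives a large argument.

#include <math.h>
#include <complex.h>

/* Exit 0 if the system ctanh returns a non-NaN value for a large
 * argument, nonzero if it falls into the "flat-out-wrong answers" class. */
int main(void)
{
    double complex r = ctanh(1000.0 + 1.0 * I);
    return (isnan(creal(r)) || isnan(cimag(r))) ? 1 : 0;
}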

@ewmoore (Contributor, Author) commented Feb 27, 2013

I've been looking at this some more. And while I like the idea to just try calling the system ctanh, if it exists, to check if it has this bug, I suspect requiring that will break cross-compiling. Aren't the Windows builds cross-compiled? Is it worth detecting that we are cross-compiling then? This seems, as @charris suggested, to be a little fraught. I'll note too that numpy's distutils includes a warning against running code on the target machine as part of the configure checks.

The implementation from FreeBSD that I've pulled in seems to have issues when the result will be a denormal, as was corrected in glibc, although I think I understand where that comes from and might be able to fix it. FWIW, the version in Python's cmath module also suffers from this problem. For instance:

In [1]: import numpy as np

In [2]: import cmath

In [3]: np.tanh(372+1j)
Out[3]: (1+1.4821969375237396e-323j)

In [4]: cmath.tanh(372+1j)
Out[4]: (1+1.5e-323j)

The correct answer, according to Mathematica or Wolfram Alpha, at least, is 1+1.3952e-323j (everything smaller than ~1e-308 is denormal.)
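
For reference, the large-|x| branch of the FreeBSD-derived code looks roughly like the sketch below (simplified, not the exact imported source); the denormal trouble comes from the final product underflowing gradually:

#include <math.h>
#include <complex.h>

/* Simplified sketch of the FreeBSD-style large-|x| branch of ctanh.
 * Once |x| is large enough that tanh(|x|) rounds to 1 (about |x| >= 22
 * for doubles), ctanh(x + iy) ~= copysign(1, x) + i*4*sin(y)*cos(y)*exp(-2|x|).
 * Squaring exp(-|x|) keeps the intermediate from underflowing too early,
 * but the final product can still land in the denormal range, where it
 * loses precision -- the effect seen in the tanh(372+1j) example above. */
static double complex ctanh_large_x(double x, double y)
{
    double e = exp(-fabs(x));
    return copysign(1.0, x) + I * (4.0 * sin(y) * cos(y) * e * e);
}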

The other thought that I had is: do we really want to continue to use our own versions of the complex functions when they are available in libm? There are a number of them in umath/funcs.inc.src for which we don't even check for a platform version. I'd propose moving all of them to npymath with checks, and probably importing the FreeBSD code for the ones we currently have custom code for. I suppose the drawback to that is possibly changing the behavior of these functions in corner cases, although I suspect that all is not perfect there already.

Thoughts?

@charris (Member) commented Feb 27, 2013

That sounds like a good idea to me. The npymath library hasn't been much worked on since it was created several years ago and could use a little TLC.

@pv (Member) commented Feb 27, 2013

I don't think you can cross-compile Numpy, at least with Python's distutils, so that doesn't need to be a worry. The Windows binaries are built either natively or on Wine, I believe.

+1 from me for using platform complex functions, if available and determined to be reasonable. As a data point, our complex functions used to have accuracy issues (and some of them possibly still do).

@ewmoore (Contributor, Author) commented Mar 8, 2013

This is going to fail the Travis build again. I think that the glibcs currently common on Linux don't have complex functions as good as ours, even if the latest release has equally good implementations. So I'm going to disable using the functions from libc. The machinery to use them is in place and worth keeping, though. There are also a number of improvements I'll import to our versions from those used in Python's cmath. There is no reason we shouldn't be as good as Python.

@njsmith (Member) commented Mar 8, 2013

Those improvements sound excellent. But I'm still worried about this plan of just disabling the platform functions, because even if we leapfrog past the platform versions now, they may pass us again in a few years, after we've all forgotten about this...

@ewmoore (Contributor, Author) commented Mar 9, 2013

I agree with you in principle about a few years from now. Practically, I'm not entirely sure what the answer is since we (or perhaps I should say, I) don't really want to copy the test suite to C just to check for these things. I can write some spot checks for the tests that fail for me and include those, which I guess is my plan for now.

For reference, the functions that cause test failures against eglibc 2.13-38 (Debian wheezy) are: cpow, cacos, casin, catan, cacosh, casinh, catanh.

@ewmoore (Contributor, Author) commented Apr 1, 2013

As a start for checking the system-provided functions, I've written some tests for clog and ctanh which include everything from Annex G of the C99 standard and whatever tests are in our test suite. My current work is available here.

Attaching this to the numpy build doesn't seem so straightforward, though. The tests need to run after we check whether the functions exist, so a fine-looking place to me is around line 200 of core/setup.py. However, if I use config.try_run there is no way to get either the output of the run or its return code, so I'm a little stuck plugging this into the numpy build. Hopefully someone who knows more about distutils can provide some guidance.

@charris (Member) commented Apr 1, 2013

@cournape Any suggestions?

@cournape (Member) commented Apr 6, 2013

There is no obvious good solution. We generally want to avoid running configuration tests; OTOH, the risk of forgetting about our implementation is real. Maybe we could check explicitly for the glibc version?

@ewmoore (Contributor, Author) commented Aug 26, 2013

Sorry I disappeared on this for so long. If I can get some pointers on how to find the glibc version at compile time, I'm happy to try and insert a check.

Also, I rebased this PR onto current master.

@juliantaylor (Contributor) commented

What's the reason for trying to avoid running configuration checks? It's pretty well suited to libm tests, as libm is always available in the system paths.
Checking functions individually is preferable to a version check in this case.

@ewmoore (Contributor, Author) commented Aug 30, 2013

For future reference, commit 56e3b7d95a3906ec932d7ded39ccc1a1820e1a64 in this pull request fixes #2354.

@ewmoore (Contributor, Author) commented Sep 4, 2013

Okay. I've made a stab at detecting the glibc version, and then adjusting.

I'm doing this using ctypes, to call a function unique to glibc. One would hope that we could just use Python's platform.libc_ver() to find this information; however, it doesn't work, and returns 2.7 instead of the correct 2.17 on my system. Obviously, for this to be a complete solution, information for some more versions needs to be included, and possibly equivalent functions on Windows and Mac.

Note that libc.so.6 is the Linux Standard Base name for libc most places (x86, x86_64, PPC32, PPC64, S390, S390X), but it's libc.so.6.1 on IA64. So I guess that issue would need to be handled too.

The Travis build should pass since it uses glibc 2.15 and for anything but 2.17 I'm forcing nearly all of our own functions right now.
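
For the record, one glibc-only entry point suited to this kind of probe is gnu_get_libc_version(); whether it is the exact function the ctypes code binds is my assumption. The C-level equivalent of the version check:

#include <stdio.h>
#include <gnu/libc-version.h>

int main(void)
{
    /* Prints e.g. "2.17". The symbol exists only in glibc, so its mere
     * presence identifies the C library as well as its version. */
    printf("%s\n", gnu_get_libc_version());
    return 0;
}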

@njsmith (Member) commented Sep 5, 2013

Instead of checking the version number, the configuration check should check whether the functions in question actually work. Like, call libc.tanh and see what happens.

The libc issue is not too hard to fix: just have a function that tries everything (libc.so.6, libc.so.6.1, msvcrt, etc.) and returns the one that works.

@njsmith (Member) commented Sep 5, 2013

Also, how does this interact with libraries like MKL that provide their own trig functions? Are we using them now, and how does the build/link work so that we do? Because we should make sure to check the version actually available...

If it's easier we could also consider making this a runtime check, where the first time tanh() or whatever is called we just spend 1 ms and figure out which version is best.

@ewmoore (Contributor, Author) commented Sep 5, 2013

I put in a check on the glibc version based on @cournape's comment above.

I can call libm.tanh and friends using ctypes, but if we are going to add such checks, why not just compile something and do the checks there? That would eliminate trying to figure out the library name or which library to use, and push that to the compiler, which already knows how to do it. This does bring me back to where I was earlier, though (see my earlier comment): I don't see a way to use distutils to compile, link, and run something that also allows me to retrieve either the output or a return code.

I don't have access to the MKL to know, although unless it comes with a drop-in replacement for libm, I don't think that we do. Might be a nice thing to do, though. Using MKL for trig functions looks like #671.

@ewmoore (Contributor, Author) commented Sep 11, 2013

We need to make a decision on what we want to do here. As I see it we have a number of options:

  1. Punt on using any system C99 complex functions beyond those we were using previously. This has been done before (inf/nan handling in umath is undefined (Trac #886) #1484), since there isn't an easy answer here. I can pull the non-build-related changes out and submit a different pull request if this is our choice.
  2. Add build time checks in C. I wrote some checks for clog and ctanh (see linked gist above) but there isn't an obvious way to insert this into the build system.
  3. Add build time checks in python using ctypes.
  4. Add a check for the glibc version (and similar on other platforms) and use whitelists. (This is currently partially implemented in this pull request.)

I don't see any benefit to option 3: it is just as much work as option 2, with drawbacks such as needing a list of possible libm names.

Personally, I vote for option 4. It is the least invasive and improves on the current situation.

Also, I pushed a commit that (finally) makes travis green on this pull request.

@juliantaylor (Contributor) commented

-1 on option 4; maintaining a whitelist is a pain.
For me option 2 is the best. I'll check how to do it in the build system; off the top of my head I don't see an issue.

@juliantaylor (Contributor) commented

config.try_run(body, libraries=["m"]) in check_math_capabilities of numpy/core/setup.py seems to work fine.
You can't get the exit code directly, but you can tell whether or not it exited with 0, so you have to run it once for every function.
E.g.

body = obody.replace("PYTESTFUNCTION", "test_clog")
if config.try_run(body, libraries=["m"]):
    moredefs.append("HAVE_CMATH_GOOD_CLOG")

with

return PYTESTFUNCTION() != 0 ? 0 : 1;

in main() of your gist.

BTW, your gist reports that clog has bad branch cuts in glibc 2.17; has this been filed as a bug against glibc?

@ewmoore (Contributor, Author) commented Sep 12, 2013

It has not been filed as a bug against glibc, but I wouldn't yet trust that the code in the gist is correct. It has various issues as it stands. Give me some time and I'll see what I can do about option 4.

@njsmith (Member) commented Sep 12, 2013

To make sure everyone's on the same page: what you're hearing from us is, option 4 is not going to be accepted, so probably not something worth your spending time on.

@juliantaylor (Contributor) commented

Why?

Edit: I mixed up options 2 and 4; I agree option 4 is not good.

@ewmoore (Contributor, Author) commented Sep 12, 2013

@njsmith, that was my understanding from earlier. But what I don't know is what will be accepted. An earlier comment from @cournape suggested checking the glibc version. You've suggested adding build-time checks in Python using ctypes. Hence my "We need to make a decision..." post above.

@njsmith (Member) commented Sep 12, 2013

I don't know what will be accepted either -- something good :-). I wasn't saying ctypes was necessarily the way to go -- in fact my next message raised the issue that making ctypes work at all seemed hard -- but just that doing functionality checks using ctypes > doing version checks using ctypes.

The ideal approach is one that checks the functionality rather than the version, and does this in some sort of reliable and maintainable way. Build-time compiler-based checks and runtime checks seem like the two best options for this that I've seen.

@juliantaylor (Contributor) commented

I wouldn't use ctypes; during configure you can build and run your test file easily. Then you define preprocessor macros and use those to choose during the numpy build.

Doing it at runtime would be more flexible, but I doubt many people exchange the math library often during the lifetime of a system.
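
A sketch of the dispatch this implies at numpy build time (macro and function names are assumed here, following the HAVE_CMATH_GOOD_* pattern from the snippet above):

#include <complex.h>

/* Bundled fallback implementation, assumed to exist in npymath. */
double complex npy_ctanh_bundled(double complex z);

double complex npy_ctanh(double complex z)
{
#ifdef HAVE_CMATH_GOOD_CTANH
    return ctanh(z);              /* system version passed the configure probe */
#else
    return npy_ctanh_bundled(z);  /* fall back to the bundled version */
#endif
}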

@njsmith (Member) commented Sep 12, 2013

Linux systems do upgrade libc all the time without recompiling, but yeah, who cares :-). The runtime check would only really be justified if it were way easier to implement/more maintainable somehow. (I don't see how, but I haven't looked very hard.)

@ewmoore (Contributor, Author) commented Sep 12, 2013

Sorry for the confusion; I mixed up my options. I'll see what I can do about checking at build time in C.

Commit messages pushed to the pull request:

- The code from FreeBSD was lightly adapted to fit with the numpy style.
- Several incorrect branch cut tests were fixed. (All of these, for all of the arc* functions that were changed, have to do with getting the correct branch cut for (x,0) or (0,x).)
- With this commit npy_cacos(f), npy_casin(f), npy_cacosh(f), and npy_casinh(f) pass all of the tests in test_c99complex.c. Now we have our own implementations that can pass these tests, and we only link against the system versions if they can also pass. Additionally, run the branch cut tests for complex64.
- Annex G of the C99 standard does not define any special values for cpow, allowing the implementation to be as simple as cpow(z,w) = cexp(w*clog(z)). We have a large number of tests for our cpow function, both in the test suite and in test_c99complex.c. (There are actually more in test_c99complex.c, since I'm doing all combinations from TestCpow::test_scalar in test_umath_complex.py.) As of right now, these tests probably mean that we will never use a system cpow implementation. This is fine, since we handle a large number of edge cases in a sane way and at least glibc does not.
- With this commit all 48 of our complex functions pass test_c99complex.c. Some of the other tests in test_umath_complex.py are duplicates of these tests.
- The names and copyright dates were updated earlier.
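
Since Annex G leaves cpow unconstrained, the "as simple as" implementation named in that commit message is literally a one-liner; a sketch:

#include <complex.h>

/* Minimal conforming cpow per C99 Annex G, which defines no special
 * values for it. Real implementations (and this PR's tests) demand
 * saner edge-case behavior than this naive form provides. */
static double complex naive_cpow(double complex z, double complex w)
{
    return cexp(w * clog(z));
}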
@ewmoore (Contributor, Author) commented Feb 26, 2014

Rebased.

@ewmoore (Contributor, Author) commented Feb 27, 2014

So this failed the Travis tests since they now try to do a build with bento.

@charris (Member) commented Jan 27, 2015

Current state on Fedora 21.

In [2]: np.tanh(372+1j)
Out[2]: (nan+nan*j)

Looks like the library function is not used. gcc 4.9.2 gives 1.000000000000000000+0.000000000000000000i, where denormals look to be flushed to zero, probably a side effect of how the function compiles. The code for that is:

#include <stdio.h>
#include <math.h>
#include <complex.h>

int main(int argc, char **args)
{
    double complex z = 372.0 + 1.0 * I;
    double complex res = ctanh(z);

    printf("%20.18f%+20.18fi\n", creal(res), cimag(res));
    return 0;
}

Do we have a policy on denormals? Do we need one?

I haven't followed this PR, but we should bring it to conclusion.

@charris (Member) commented Jan 27, 2015

@ewmoore IIRC, the changes to the code generators are already merged and the documentation for frexp and ldexp moved.

I managed to get these patches applied without too much work, but no guarantee that the result was correct. There was one test failure:

FAIL: test_ldexp_overflow (test_umath.TestLdexp)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/charris/Workspace/numpy.git/build/testenv/lib64/python2.7/site-packages/numpy/core/tests/test_umath.py", line 548, in test_ldexp_overflow
    assert_equal(ncu.ldexp(2., imax), np.inf)
  File "/home/charris/Workspace/numpy.git/build/testenv/lib64/python2.7/site-packages/numpy/testing/utils.py", line 322, in assert_equal
    raise AssertionError(msg)
AssertionError: 
Items are not equal:
 ACTUAL: 1.0
 DESIRED: inf

@charris (Member) commented Jan 27, 2015

Rebased and squashed in #5510.

@charris closed this Jan 27, 2015
@charris (Member) commented Jan 27, 2015

I'd be happy to just use it when there are no system-supplied functions. GCC seems OK now. I think the thing to do is run the improved tests now and then and submit bug reports to the relevant projects. If they can't fix things, then we might consider using these functions instead.

@charris (Member) commented Jan 27, 2015

One thing that does bother me is that the system complex functions do not seem to be detected, at least on Fedora 21.

@ewmoore (Contributor, Author) commented Jan 28, 2015

I guess this discussion should move elsewhere, but do you mean that the system functions are not detected here or with master? Do the system functions pass the tests?

@charris (Member) commented Jan 28, 2015

With master. I've got that figured out now, which I assume you already did ;) I'm working through things and will probably take your work and modify it a bit.

charris referenced this pull request Feb 13, 2015: BUG: Simplified fix, Overflow in tan and tanh for large complex values.