Intermittent NaNs appearing in results from calling shared-library blis from Julia #12

tkelman · 2014-07-03T02:17:32Z

CPU: Core2 Duo E8400 (old machine)
OS: Ubuntu 14.04, x86-64

Compiled BLIS reference configuration, setting BLIS_ENABLE_DYNAMIC_BUILD := yes. By itself, BLIS passes its own make test.

I'm calling into BLIS from Julia by the following steps:

git clone https://github.com/julialang/julia
cd julia
mkdir -p $PWD/usr/lib
cp /path/to/libblis.so $PWD/usr/lib
echo 'override USE_SYSTEM_BLAS = 1' > Make.user
echo 'override USE_BLAS64 = 0' >> Make.user
echo 'override LIBBLAS = -L$(build_libdir) -lblis -lm' >> Make.user
echo 'override LIBBLASNAME = libblis' >> Make.user
make -j8   # this will take quite a while - Julia has lots of big dependencies
make testall

This gives me a different failure each time I repeat make testall. Here are some examples, from the first couple of files in Julia's test suite (linalg1 and linalg2):
https://gist.github.com/47dbd5517c4a6f56fb2e
https://gist.github.com/51835795c8ada7c0f2a1
https://gist.github.com/552ec09e5e78d1cd3da7
https://gist.github.com/7cce0057bf9a3009a92a
https://gist.github.com/0f01364072634cc95be9
https://gist.github.com/9292fc3fe1a2e9afe09d

I'll see if I can translate a few of these test cases into C to figure out whether the problem is reproducible outside of Julia. I'll also try setting the BLIS integer size to 64-bit and delete the USE_BLAS64 = 0 line, see whether that changes anything.

The text was updated successfully, but these errors were encountered:

fgvanzee · 2014-07-03T19:54:51Z

Tyler took a look at your output and had an idea: is Julia calling BLIS via multiple threads? (The reference configuration is not thread-safe.)

tkelman · 2014-07-03T20:44:37Z

Aha. Well not exactly multiple threads, but the master julia process launches separate worker julia processes to run the tests in parallel. The subprocesses should have separate memory spaces in which they load dynamic libraries, I think. Even when I run the tests in a single julia process, I still get similar-looking non-reproducible results and intermittent NaNs though.

Is there a list of which configurations are or are not thread-safe anywhere? Or should I assume none of them are at this point?

fgvanzee · 2014-07-03T23:44:20Z

Is there a list of which configurations are or are not thread-safe anywhere? Or should
I assume none of them are at this point?

Sorry, I should have been more descriptive. The reference configuration isn't thread-safe because it is single-threaded and does not enable OpenMP. (OpenMP is currently the only threading model we support, but POSIX threads could be supported in the future.) In order to achieve thread safety, a critical section in the BLIS memory allocator must be enabled. Currently, it's only enabled when OpenMP is enabled, via BLIS_ENABLE_OPENMP.

Now, I'm not sure if the critical section works when BLIS is called from two different application threads. Tyler thinks that it may not do what we want it to do in that situation.

This all may be unrelated, of course. But it would be interesting to enable OpenMP and see if that fixes some or all of the NaNs.

tkelman · 2014-07-03T23:56:22Z

But it would be interesting to enable OpenMP and see if that fixes some or all of the NaNs.

Sure, happy to try. Where would that flag be located?

fgvanzee · 2014-07-03T23:58:49Z

Sure, happy to try. Where would that flag be located?

Just add

#define BLIS_ENABLE_OPENMP

to bli_config.h.

tkelman · 2014-07-04T02:46:40Z

OK, I also had to uncomment -fopenmp from CMISCFLAGS and add it to LDFLAGS (and LIBBLAS in Julia's Make.user). This is giving a different set of errors - either index out of bounds, or LAPACKException(6597069766656), or SingularException(140011638882304), or determinants of the wrong signs.

Trying with 64-bit integers in BLIS resulted in a segfault, I'll see if I can get a backtrace.

Maratyszcza · 2014-07-04T21:29:57Z

This kind of error might also result from reading uninitialized memory.
Could you try to run under valgrind memcheck to eliminate this possibility?

On Wednesday, July 2, 2014, Tony Kelman notifications@github.com wrote:

CPU: Core2 Duo E8400 (old machine)
OS: Ubuntu 14.04, x86-64

Compiled BLIS reference configuration, setting BLIS_ENABLE_DYNAMIC_BUILD
:= yes. By itself, BLIS passes its own make test.

I'm calling into BLIS from Julia https://github.com/julialang/julia by
the following steps:

git clone https://github.com/julialang/julia
cd julia
mkdir -p $PWD/usr/lib
cp /path/to/libblis.so $PWD/usr/lib
echo 'override USE_SYSTEM_BLAS = 1' > Make.user
echo 'override USE_BLAS64 = 0' >> Make.user
echo 'override LIBBLAS = -L$(build_libdir) -lblis' >> Make.user
echo 'override LIBBLASNAME = libblis' >> Make.user
make -j8 # this will take quite a while - Julia has lots of big dependencies
make testall

This gives me a different failure each time I repeat make testall. Here
are some examples, from the first couple of files in Julia's test suite (
linalg1 https://github.com/JuliaLang/julia/blob/master/test/linalg1.jl
and linalg2
https://github.com/JuliaLang/julia/blob/master/test/linalg2.jl):
https://gist.github.com/47dbd5517c4a6f56fb2e
https://gist.github.com/51835795c8ada7c0f2a1
https://gist.github.com/552ec09e5e78d1cd3da7
https://gist.github.com/7cce0057bf9a3009a92a
https://gist.github.com/0f01364072634cc95be9
https://gist.github.com/9292fc3fe1a2e9afe09d

I'll see if I can translate a few of these test cases into C to figure out
whether the problem is reproducible outside of Julia. I'll also try setting
the BLIS integer size to 64-bit and delete the USE_BLAS64 = 0 line, see
whether that changes anything.

—
Reply to this email directly or view it on GitHub
#12.

tkelman · 2014-07-04T21:59:20Z

~~Not sure whether this is useful: https://gist.github.com/tkelman/f03fb2b9779b7adac894~~ edit: whoops, definitely not useful, valgrind doesn't apply to processes run by make

Regarding the 64-bit integer segfaults, something seemed to be calling lapack dpotrf at the very end of creating Julia's system image, which then went into BLIS with what looked to be garbage dimensions. I wouldn't expect anything to be doing a Cholesky factorization at that moment, so not sure what was happening there.

Maratyszcza · 2014-07-04T22:07:10Z

I think you'll need to run it as
valgrind --tool=memcheck --trace-children=yes -- make testall
By default it does not instrument child processes.

On Fri, Jul 4, 2014 at 2:59 PM, Tony Kelman notifications@github.com
wrote:

Not sure whether this is useful:
https://gist.github.com/tkelman/f03fb2b9779b7adac894

Regarding the 64-bit integer segfaults, something seemed to be calling
lapack dpotrf at the very end of creating Julia's system image, which
then went into BLIS with what looked to be garbage dimensions. I wouldn't
expect anything to be doing a Cholesky factorization at that moment, so not
sure what was happening there.

—
Reply to this email directly or view it on GitHub
#12 (comment).

tkelman · 2014-07-04T22:26:48Z

Thanks yeah I realized that a few minutes after initially posting.

So there's a lot going on here https://gist.github.com/tkelman/c55da37d849d01585739 that I don't quite know how to make sense of, though the calls to lapack sgecon_ and dgecon_ look most relevant.

@ViralBShah now might not be a good time, but in case you want to play with an alternate BLAS at all...

tkelman · 2014-07-05T00:17:49Z

I think I've figured out what's happening. These first few tests in linalg are often going to lapack, which complicates matters. It was more useful to start at the tests that use blas directly.

Julia offers both in-place and allocating versions of blas calls. The allocating versions do not explicitly initialize the contents of the output array before calling blas, they just send in whatever garbage is in the memory and use beta = 0. BLIS appears to be following the mathematical statement of BLAS operations a bit more literally than the reference and other implementations I'm aware of, and is actually multiplying my input garbage by zero. When my input garbage has NaNs (or presumably +/- Inf as well, but that's not as likely to happen randomly), the output ends up with corresponding NaNs in the same locations.

Should BLIS be checking for beta == 0 and special-casing to an assign-only operation, instead of multiplying the input by 0? Many applications may have grown accustomed to relying on this behavior over the years without realizing it.

ViralBShah · 2014-07-05T13:18:55Z

@tkelman I want to definitely try this out, but only after the 0.3 release. Would be great to have a BLIS.jl julia package for now. You could just replace all calls to Array with zeros on the julia side to get the initializations zeroed out.

tkelman · 2014-07-05T14:43:09Z

@ViralBShah that's about what I thought you'd say right now, no worries. I'll see what I can do with Clang auto-wrapping and coming up with a package.

With this patch to Julia https://gist.github.com/a16ce380d174b67561cd to initialize the allocating BLAS calls to 0, I see BLAS tests passing reliably for Float32 and Float64. There appears to be a problem with the imaginary part of the result of cdotc, this is likely due to complex type convention. Julia had switched here JuliaLang/julia#5291 to use cblas for these calls due to some troubles with MKL, but BLIS does't provide a cblas compatibility API. Looks like no difference from setting BLIS_ENABLE_C99_COMPLEX.

The linalg tests still see a few nans, lapack might be making some similar beta == 0 initialization assumptions to what unpatched Julia was making.

fgvanzee · 2014-07-05T16:34:53Z

Should BLIS be checking for beta == 0 and special-casing to an assign-only operation,
instead of multiplying the input by 0? Many applications may have grown accustomed to
relying on this behavior over the years without realizing it.

Generally speaking, BLIS should already be employing this write vs. update optimization for beta == 0 cases. Could you give me more details about how your are running BLIS? Are you configuring with the reference configuration? Which operation is being called that causes the NaNs to pop up?

Looking at the reference gemm micro-kernel, I can't see any reason it would allow uninitialized memory to get in the way of a beta = 0 invocation.

Maybe it's a different operation altogether (e.g. level-1 / level-2) that is causing the problem?

tkelman · 2014-07-05T16:44:39Z

Are you configuring with the reference configuration?

Yes, only change here is BLIS_ENABLE_DYNAMIC_BUILD := yes.

Which operation is being called that cause the NaNs to pop up? Looking at the reference gemm micro-kernel

Not sure whether gemm was ever causing trouble, I'll see how many test failures I can cause from direct BLAS calls. syrk definitely was leaving input NaNs in place with beta = 0.

tkelman · 2014-07-05T17:12:33Z

I can cause a problem with sgemm_, but dgemm_ looks ok. Test program, blistest.c:

#include <stdio.h>
#include "blis.h"

int main() {
    float A[16], B[16], C[16], alpha = 1.0, beta = 0.0;
    int i, ld = 4;
    char trans = 'N';
    for (i=0; i<16; i++) {
        A[i] = 0.0;
        B[i] = 0.0;
        C[i] = NAN;
    }
    A[0]  = 1.0;
    A[5]  = 1.0;
    A[10] = 1.0;
    A[15] = 1.0;
    B[0]  = 1.0;
    B[5]  = 1.0;
    B[10] = 1.0;
    B[15] = 1.0;
    sgemm_(&trans, &trans, &ld, &ld, &ld, &alpha, A, &ld, B, &ld, &beta, C, &ld);
    for (i=0; i<16; i++)
        printf("C[%d] = %e\n", i, C[i]);
}

Compiled and run as:

tkelman@ygdesk:~/github/blis-build06$ gcc blistest.c -o blistest -Iinclude/blis -Llib -lblis -lm -std=gnu99
tkelman@ygdesk:~/github/blis-build06$ LD_LIBRARY_PATH=$PWD/lib ./blistest
C[0] = nan
C[1] = nan
C[2] = nan
C[3] = nan
C[4] = nan
C[5] = nan
C[6] = nan
C[7] = nan
C[8] = nan
C[9] = nan
C[10] = nan
C[11] = nan
C[12] = nan
C[13] = nan
C[14] = nan
C[15] = nan

fgvanzee · 2014-07-05T19:19:43Z

Thanks for isolating the problem, Tony. I have been able to reproduce the bug. It turns out there are two places where the "overwrite or update" logic must exist. The first is in the micro-kernel itself, which is the first place that comes to mind when I think of this issue. The second is in the macro-kernel, immediately after the micro-kernel is called for edge cases. There, the micro-kernel computes into a temporary buffer, and then the elements corresponding to the edge case elements that actually exist are copied to the output matrix. This second location only has "update" logic, hence the bug. However, I can't figure out why it is not also manifesting for double-precision.

I'll add this fix to the top of my queue.

fgvanzee · 2014-07-07T19:30:52Z

@tkelman Just a quick update: I figured out why the NaNs were only being produced by single-precision real, and not double-precision real gemm. It has to do with default micro-kernel sizes. The default sgemm micro-kernel size is 8x4, whereas for dgemm it is 4x4. The latter perfectly matches your 4x4 test case, and thus there is no edge case handling needed there (unlike for sgemm). It is in this edge case handling where the NaNs sneak through. (If you were to change the test matrix size in your test driver, for the double-precision case, they would likely show up.)

tkelman · 2014-07-07T19:35:55Z

Yep, good catch, thanks for the heads up. 3x3 double lets nans through too. For 9x9 single or double, just the last row and last column are nans.

fgvanzee · 2014-07-08T15:16:34Z

@tkelman Should be fixed now. Honestly, I don't know how this bug went undetected for so long, aside from dumb luck that, up until now, none of the infs or NaNs were inside the test matrices' edge cases.

tkelman · 2014-07-08T15:40:35Z

Great, thanks Field. This is now passing Julia's blas tests for single and double. I'll see if I can reproduce any issues with complex in a standalone example and open a new issue if so.

tkelman · 2014-07-08T20:53:01Z

Okay I'm not sure what's at fault with the complex dot product, I don't think it's worth opening a new issue on. The problem only occurs with cdotu_ and cdotc_, they give the correct real part but wrong imaginary part. It might be Julia's C FFI getting slightly confused by the function returning a struct of 2 floats, though it also happens if I turn on BLIS_ENABLE_C99_COMPLEX.

I can work around the issue by applying this patch to Julia http://gist.github.com/43cece304e7247307e47 to account for the extra inputs to bli_cdotv relative to cblas_cdotu_sub. The good news is, with this change Julia's entire test suite passes using Blis in place of OpenBLAS.

fgvanzee · 2014-07-08T21:14:28Z

@tkelman Hmm, If the problem were about return value conventions, you would think it would affect zdotc_ and zdotu_ too.

I did a quick test on my end and was not able to reproduce any issue using 4x1 vectors.

Would it be easy to check exactly how wrong the imaginary result is? (Perhaps this is subtle numerical issue, which of course single-precision floats are quite vulnerable to.)

tkelman · 2014-07-08T21:48:10Z

It's far enough off that this isn't a matter of numerical differences.

julia> elty = Complex64; n32 = int32(10); inc32 = int32(1); result = Array(Complex64, 1);

julia> z1 = convert(Vector{elty}, complex(randn(10),randn(10)))
10-element Array{Complex{Float32},1}:
   0.530397-0.547332im
 -0.0749473-0.128646im
     1.5251-0.105443im
   0.785446-0.0529454im
   -1.02396+0.624246im
   0.743135-1.66501im
   -0.46022+0.759484im
  -0.952381-0.12021im
   -1.65526+0.263255im
  -0.316383+1.36757im

julia> z2 = convert(Vector{elty}, complex(randn(10),randn(10)))
10-element Array{Complex{Float32},1}:
 -0.538045+0.899379im
 -0.303094-0.121614im
   0.25909+0.881399im
 -0.968432+0.460136im
  -1.29269+0.17956im
   1.04964+0.295628im
  0.533711+0.66925im
  0.374576+0.708107im
 -0.310224+0.204892im
 -0.437401-0.184823im

julia> sum(z1.*z2)
2.2747393f0 - 1.5521777f0im

julia> ccall((:cdotu_, "libblis"), Complex64, (Ptr{Int32}, Ptr{Complex64}, Ptr{Int32}, Ptr{Complex64}, Ptr{Int32}), &n32, z1, &inc32, z2, &inc32)
2.2747393f0 + 0.39114362f0im

julia> ccall((:cdotu_, "libblas"), Complex64, (Ptr{Int32}, Ptr{Complex64}, Ptr{Int32}, Ptr{Complex64}, Ptr{Int32}), &n32, z1, &inc32, z2, &inc32)
2.2747393f0 + 0.39114362f0im

julia> ccall((:cblas_cdotu_sub, "libblas"), Complex64, (Int32, Ptr{Complex64}, Int32, Ptr{Complex64}, Int32, Ptr{Complex64}), n32, z1, inc32, z2, inc32, result)
-1.5521777f0 + 0.39114362f0im

julia> result[1]
2.2747393f0 - 1.5521777f0im

julia> result2 = Array(Complex64, 1);

julia> ccall((:bli_cdotv, "libblis"), Void, (Cint, Cint, Int32, Ptr{Complex64}, Int32, Ptr{Complex64}, Int32, Ptr{Complex64}), 0x0, 0x0, n32, z1, inc32, z2, inc32, result2)

julia> result2[1]
2.2747393f0 - 1.5521777f0im

So libblis and libblas (system reference blas from Ubuntu package) give the same wrong result from cdotu_, which leads me to believe it's a Julia C FFI quirk. Interestingly if I ask for a Complex64 (Julia's name for complex single, 32 bits each for real and imaginary parts) return value from cblas_cdotu_sub out of libblas, which should actually be returning void, I get the imaginary part of the correct dot product value returned in the real part, and the imaginary part of the incorrect dot product value returned in the imaginary part. I'm not sure where that value's coming from, but the connection is interesting.

tkelman · 2014-07-08T21:57:37Z

cc @andreasnoackjensen does this stuff with complex return types from ccall look familiar to you?

fgvanzee · 2014-07-08T22:17:28Z

@tkelman If netlib BLAS exhibits the problem too, then it sounds like it's on Julia's side. I'll let you take it from here! :)

ViralBShah · 2014-07-08T22:31:50Z

The cdotc stuff is ABI related to returning values from fortran functions. I wouldn't worry too much about it for now. I can't find the relevant julia issue right now.

ViralBShah · 2014-07-08T22:32:24Z

One possibility is to just use the CBLAS wrapper instead of calling it directly.

fgvanzee · 2014-07-08T22:41:53Z

FWIW, I ran into this issue a lot back when Kazushige Goto maintained GotoBLAS. It basically boiled down to making sure that the Fortran components of both the BLAS and the application were compiled with either -ff2c or -fno-f2c. (The important thing was that they matched, though I preferred the more sane -fno-f2c option.)

https://gcc.gnu.org/onlinedocs/gfortran/Code-Gen-Options.html

I assume the issue with Julia is something similar.

tkelman · 2014-07-08T22:44:16Z

Thanks Viral. ~~I'll have to check, I think the Ubuntu libblas package is cblas, rather than Netlib Fortran blas.~~ edit: nevermind, this is wrong

The Julia issue was JuliaLang/julia#5291 where we did switch to using cblas for these functions, the trouble is that BLIS doesn't recreate an exact copy of cblas. BLIS' C API is similar to cblas for this particular function, but more general.

ViralBShah · 2014-07-08T22:56:43Z

We can't use f2c with Julia, as we use the C calling convention for calling gfortran compiled libraries.

fgvanzee · 2014-07-08T22:59:29Z

We can't use f2c with Julia, as we use the C calling convention for calling gfortran compiled libraries.

That's probably for the best. f2c calling conventions are horrible. (I only mentioned it because the real/imaginary return value issue seemed familiar.)

Co-authored-by: Michael Yeh <myeh@sw01.internal.sifive.com>

fgvanzee closed this as completed Jul 8, 2014

fgvanzee reopened this Jul 8, 2014

tkelman closed this as completed Jul 8, 2014

songmaotian mentioned this issue Apr 22, 2016

multi thread crash #68

Closed

heroxbd mentioned this issue Feb 20, 2017

test segment fault on Intel Knight Landing #116

Closed

loveshack mentioned this issue Mar 5, 2018

knl build fails #169

Closed

loveshack mentioned this issue Apr 5, 2018

Add skx, knl to x86_64 configuration family #183

Closed

Aaron-Hutchinson pushed a commit to sifive/sifive-blis that referenced this issue Apr 4, 2023

Add invscalv, invertv (flame#12)

c33f542

Co-authored-by: Michael Yeh <myeh@sw01.internal.sifive.com>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Intermittent NaNs appearing in results from calling shared-library blis from Julia #12

Intermittent NaNs appearing in results from calling shared-library blis from Julia #12

tkelman commented Jul 3, 2014

fgvanzee commented Jul 3, 2014

tkelman commented Jul 3, 2014

fgvanzee commented Jul 3, 2014

tkelman commented Jul 3, 2014

fgvanzee commented Jul 3, 2014

tkelman commented Jul 4, 2014

Maratyszcza commented Jul 4, 2014

tkelman commented Jul 4, 2014

Maratyszcza commented Jul 4, 2014

tkelman commented Jul 4, 2014

tkelman commented Jul 5, 2014

ViralBShah commented Jul 5, 2014

tkelman commented Jul 5, 2014

fgvanzee commented Jul 5, 2014

tkelman commented Jul 5, 2014

tkelman commented Jul 5, 2014

fgvanzee commented Jul 5, 2014

fgvanzee commented Jul 7, 2014

tkelman commented Jul 7, 2014

fgvanzee commented Jul 8, 2014

tkelman commented Jul 8, 2014

tkelman commented Jul 8, 2014

fgvanzee commented Jul 8, 2014

tkelman commented Jul 8, 2014

tkelman commented Jul 8, 2014

fgvanzee commented Jul 8, 2014

ViralBShah commented Jul 8, 2014

ViralBShah commented Jul 8, 2014

fgvanzee commented Jul 8, 2014

tkelman commented Jul 8, 2014

ViralBShah commented Jul 8, 2014

fgvanzee commented Jul 8, 2014

Intermittent NaNs appearing in results from calling shared-library blis from Julia #12

Intermittent NaNs appearing in results from calling shared-library blis from Julia #12

Comments

tkelman commented Jul 3, 2014

fgvanzee commented Jul 3, 2014

tkelman commented Jul 3, 2014

fgvanzee commented Jul 3, 2014

tkelman commented Jul 3, 2014

fgvanzee commented Jul 3, 2014

tkelman commented Jul 4, 2014

Maratyszcza commented Jul 4, 2014

tkelman commented Jul 4, 2014

Maratyszcza commented Jul 4, 2014

tkelman commented Jul 4, 2014

tkelman commented Jul 5, 2014

ViralBShah commented Jul 5, 2014

tkelman commented Jul 5, 2014

fgvanzee commented Jul 5, 2014

tkelman commented Jul 5, 2014

tkelman commented Jul 5, 2014

fgvanzee commented Jul 5, 2014

fgvanzee commented Jul 7, 2014

tkelman commented Jul 7, 2014

fgvanzee commented Jul 8, 2014

tkelman commented Jul 8, 2014

tkelman commented Jul 8, 2014

fgvanzee commented Jul 8, 2014

tkelman commented Jul 8, 2014

tkelman commented Jul 8, 2014

fgvanzee commented Jul 8, 2014

ViralBShah commented Jul 8, 2014

ViralBShah commented Jul 8, 2014

fgvanzee commented Jul 8, 2014

tkelman commented Jul 8, 2014

ViralBShah commented Jul 8, 2014

fgvanzee commented Jul 8, 2014