HiOp segfaults on nlpDenseCons_ex1 tests when built with default options #1
Comments
Thanks, will look into it once I find a Mac. Can you provide any info on the LAPACK/BLAS you're using? At first sight, it looks like the issue is related to that. |
I think it's linking to the Accelerate framework right now. I'll try linking to a different LAPACK library. |
I see BLAS calls in HiOp that look like, for instance,
snippet of /Users/oxberry1/spack/opt/spack/darwin-sierra-x86_64/clang-8.1.0-apple/openblas-0.2.20-dxpaoysvipwyqcscj5n5j6t5cvqtldr2/lib/libopenblas.a(dgemv.o):
U ___assert_rtn
U ___stack_chk_fail
U ___stack_chk_guard
U _blas_memory_alloc
U _blas_memory_free
0000000000000000 T _dgemv_
U _dgemv_n
U _dgemv_t
U _dscal_k
U _xerbla_
0000000000000300 s l_dgemv_.gemv
snippet of /Users/oxberry1/spack/opt/spack/darwin-sierra-x86_64/clang-8.1.0-apple/netlib-lapack-3.7.1-hxqwiepbfb3forawjin3gmj6rpb6cmxk/lib/libblas.a(dgemv.f.o):
0000000000000780 s EH_frame1
0000000000000000 T _dgemv_
U _lsame_
U _xerbla_
00000000000006c3 s lC1
00000000000006c4 s lC2
00000000000006c5 s lC3
00000000000006d0 s lC4
00000000000006c6 s lC5
The compiler stack I've been using is Apple Clang 8.1 (from Xcode 8.3) and gfortran 7.2.0 (from GCC 7.2.0). There are a bunch of ways around this issue, but none of them are quick:
For now, I'll stick to using HiOp on the clusters, because I know it works there. If I need to run HiOp on my laptop for some reason, I can consult with you and put together a patch, if you're interested. |
@junkudo has a fix for the fortran name mangling and will get in here soon... |
I can fix the fortran name mangling this weekend. :) |
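For readers unfamiliar with the term: the nm output above shows the BLAS routine exported as _dgemv_, that is, the Fortran name plus a trailing underscore (the leading underscore is added by macOS). One common way to handle this from C++ is a small name-mangling macro. The sketch below is only an illustration under that assumption; FORTRAN_NAME, STRINGIFY, and mangled_dgemv_name are hypothetical names, and this is not necessarily how the fix in HiOp was implemented.

```cpp
#include <cassert>
#include <string>

// Hypothetical macros, for illustration only (not taken from HiOp).
// Assumption: the Fortran compiler emits lowercase external names with one
// trailing underscore, which is the gfortran default.
#define FORTRAN_NAME(name) name##_
#define STRINGIFY_IMPL(x) #x
#define STRINGIFY(x) STRINGIFY_IMPL(x)

// Declaration of the BLAS routine DGEMV through the mangling macro.
// Fortran passes every argument by reference, hence the pointers.
extern "C" void FORTRAN_NAME(dgemv)(const char* trans, const int* m, const int* n,
                                    const double* alpha, const double* a, const int* lda,
                                    const double* x, const int* incx,
                                    const double* beta, double* y, const int* incy);

// Returns the symbol name the macro produces, so the mangling can be
// checked without linking against a BLAS library.
inline std::string mangled_dgemv_name()
{
  return STRINGIFY(FORTRAN_NAME(dgemv));
}
```

If a different compiler uses another convention (no underscore, uppercase names), only the FORTRAN_NAME definition would need to change; CMake's FortranCInterface module can generate such a macro automatically.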
I get segfaults as well using clang5 on linux when I try to run the examples. With GCC everything works fine. |
Julian, thanks for reporting. Apparently, HiOp has all kinds of issues with clang. Does clang5 mean it comes with llvm5, or is it clang version 5.0? I only have clang v3.4.2 on my Linux box. clang --version |
It's clang version 5.0 from the llvm ubuntu repositories. Let me know if I can give further information which could help.
|
Just for your info: the disassembly shows a UD2 instruction, which means clang recognized undefined behavior in the code. It's located in bool hiopHessianLowRank::updateLogBarrierDiagonal(const hiopVector& Dx). A run with -fsanitize=undefined returns
|
@goxberry : I've finally got my hands on a Mac laptop and tested the solver. @junkudo's Fortran name mangling works like a champ. I've only fixed a couple of compilation warnings. Everything works fine. I've used clang + gfortran + BLAS from Accelerate (thanks again for the instructions!). It would be awesome if you could give it a try on your system. |
@jandrej : tried -fsanitize and could not replicate your errors. Probably because those problems were fixed in the meantime -- I did a lot of valgrinding on Linux on the library under many use cases within mfem recently and fixed a couple of uninitialized memory accesses. It would be awesome if you can check again and see what happens. Thanks |
My cmake command is
which still produces the runtime error when running ex1
using the latest version of the repo. |
Thank you both for patching this issue! I’m currently on travel, and will take a look after I return, probably no later than Monday. |
thanks! I did get runtime errors with your cmake command, one coming from hiopMatrix.cpp:137:56, though they were all related to allocating an array of size 0. It may be that we're using different versions of clang (?). In any case, I've addressed all the errors I was seeing. Could you please pull the master and see what you're getting? |
The example nlpDenseCons_ex1 is not crashing but still throws the "runtime error" from clang sanitize. |
Could not replicate on my Red Hat machine; I used clang 3.4.2, though. @goxberry Could you please see if you get any errors with -fsanitize when you're testing the fix for this issue? Use something like
rm -rf *; CC=clang CXX=clang++ cmake -DCMAKE_CXX_FLAGS="-fsanitize=nullability,undefined,integer,alignment" -DHIOP_USE_MPI=ON -DHIOP_DEEPCHECKS=ON -DCMAKE_BUILD_TYPE=DEBUG ..; make -j4
and run
./src/Drivers/nlpDenseCons_ex1.exe |
Yeah, it's strange we don't get @jandrej's runtime error. I looked at the code again and, apparently, passing null pointers to memcpy is not allowed even when the number of bytes to be copied is zero. I safeguarded memcpy from null pointers in the latest commit. @jandrej : is it too much to ask to try again? :) |
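To illustrate the kind of guard described in the comment above: a minimal sketch, assuming a small wrapper that skips the memcpy call when there is nothing to copy. The helper name copy_doubles is hypothetical and this is not HiOp's actual code; it only shows the pattern that keeps -fsanitize=undefined quiet for empty arrays whose data pointers are null.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper (not HiOp's actual code): copies n doubles from src
// to dst, and never calls memcpy when n is zero.
inline void copy_doubles(double* dst, const double* src, std::size_t n)
{
  if (n == 0) {
    return;  // empty arrays may carry null pointers; nothing to copy
  }
  // For a non-empty copy both pointers must be valid.
  assert(dst != nullptr && src != nullptr);
  std::memcpy(dst, src, n * sizeof(double));
}
```

Calling copy_doubles(nullptr, nullptr, 0) is then a no-op, whereas a direct std::memcpy(nullptr, nullptr, 0) is what the sanitizer flags as a runtime error.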
The last commit seemed to clear the errors for clang! I don't see any warnings/errors anymore during the runs of the examples. |
great! closing the issue. |
Building HiOp with default options on macOS 10.12.6 and running the tests yields errors for the nlpDenseCons_ex1 tests. Log files can be found at: https://gist.github.com/goxberry/8bdc80e6dcd4d15ed0a7c5130009d6aa
The configuration I'm using is built by spack, so I have some flexibility in choosing libraries, but all of these libraries are included via RPATH directives. My impression is that linking isn't an issue, but I could be wrong about that.