outstanding allocations on cleanup in BuddyAllocator #81

Closed
svigerske opened this issue May 2, 2022 · 7 comments

@svigerske
Contributor

In coin-or/Ipopt#530, issues with running Spral SSIDS under Ipopt compiled with GCC on Windows were reported.

ssids_factor terminates with

 Warning from ssids_factor. Warning flag =   8
Matching-based ordering used but associated scaling ignored
terminate called after throwing an instance of 'std::runtime_error'
  what():  outstanding allocations on cleanup

See coin-or/Ipopt#530 for more issues.

CC @tyronerees @jfowkes

@jfowkes jfowkes added the bug label Nov 14, 2022
@jfowkes jfowkes added this to the New Release milestone Nov 14, 2022
@jfowkes
Contributor

jfowkes commented Nov 14, 2022

This definitely looks like an issue deep within SPRAL:
https://github.com/ralna/spral/blob/master/src/ssids/cpu/BuddyAllocator.hxx#L105
Looks like some memory doesn't get deallocated on cleanup when it should.

@jfowkes
Contributor

jfowkes commented Jun 14, 2023

@mjacobse I don't suppose you've run into this issue at all? The exception comes from deep within SPRAL but I don't understand why we're only seeing it on Windows:

~Page() noexcept(false) {
   if(next_ && head_[nlevel-1] != 0)
      throw std::runtime_error("outstanding allocations on cleanup\n");
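
To illustrate the pattern in the excerpt above, here is a minimal sketch of a destructor declared `noexcept(false)` that throws when allocations are still outstanding. `ToyPage`, its members, and `destroy_with_leaks` are hypothetical names for illustration only; this is not SPRAL's `Page` class and it does not reproduce the buddy allocator's bookkeeping.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a pool page; NOT SPRAL's Page class.
class ToyPage {
public:
   void allocate() { ++outstanding_; }
   void release() { if (outstanding_ > 0) --outstanding_; }

   // Destructors are implicitly noexcept; noexcept(false) is needed for
   // the throw below to propagate instead of terminating immediately.
   ~ToyPage() noexcept(false) {
      if (outstanding_ != 0)
         throw std::runtime_error("outstanding allocations on cleanup");
   }

private:
   int outstanding_ = 0;
};

// Destroys a page that still has `leaked` unreleased allocations and
// returns the message of the exception thrown by the destructor, or an
// empty string if the destructor did not throw.
std::string destroy_with_leaks(int leaked) {
   try {
      ToyPage page;
      for (int i = 0; i < leaked; ++i) page.allocate();
      // page is destroyed at the end of this block; its destructor may throw
   } catch (const std::runtime_error& e) {
      return e.what();
   }
   return "";
}
```

In this sketch the exception is caught right next to the destructor call. If no handler exists anywhere up the call stack, the C++ runtime calls `std::terminate`, which is consistent with the "terminate called after throwing an instance of 'std::runtime_error'" message in the report.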

Full backtrace from the original IPOPT bug report (coin-or/Ipopt#530):

Thread 1 hit Catchpoint 1 (exception thrown), 0x000000000062bb38 in __cxa_throw ()
(gdb) bt
#0  0x000000000062bb38 in __cxa_throw ()
#1  0x000000000066eb74 in spral::ssids::cpu::buddy_alloc_internal::Page<std::allocator<char> >::~Page (this=<optimized out>, __in_chrg=<optimized out>)
    at c:/program files/gnu octave/octave-6.2.0/mingw64/lib/gcc/x86_64-w64-mingw32/9.3.0/include/c++/ext/new_allocator.h:89
#2  std::_Destroy<spral::ssids::cpu::buddy_alloc_internal::Page<std::allocator<char> > > (__pointer=<optimized out>)
    at c:/program files/gnu octave/octave-6.2.0/mingw64/lib/gcc/x86_64-w64-mingw32/9.3.0/include/c++/bits/stl_construct.h:98
#3  std::_Destroy_aux<false>::__destroy<spral::ssids::cpu::buddy_alloc_internal::Page<std::allocator<char> >*> (__last=<optimized out>, 
    __first=<optimized out>)
    at c:/program files/gnu octave/octave-6.2.0/mingw64/lib/gcc/x86_64-w64-mingw32/9.3.0/include/c++/bits/stl_construct.h:108
#4  std::_Destroy<spral::ssids::cpu::buddy_alloc_internal::Page<std::allocator<char> >*> (__last=<optimized out>, __first=<optimized out>)
    at c:/program files/gnu octave/octave-6.2.0/mingw64/lib/gcc/x86_64-w64-mingw32/9.3.0/include/c++/bits/stl_construct.h:137
#5  std::_Destroy<spral::ssids::cpu::buddy_alloc_internal::Page<std::allocator<char> >*, spral::ssids::cpu::buddy_alloc_internal::Page<std::allocator<char> > > (__last=0x16414d0, __first=<optimized out>)
    at c:/program files/gnu octave/octave-6.2.0/mingw64/lib/gcc/x86_64-w64-mingw32/9.3.0/include/c++/bits/stl_construct.h:206
#6  std::vector<spral::ssids::cpu::buddy_alloc_internal::Page<std::allocator<char> >, std::allocator<spral::ssids::cpu::buddy_alloc_internal::Page<std::allocator<char> > > >::~vector (this=0x1621090, __in_chrg=<optimized out>)
    at c:/program files/gnu octave/octave-6.2.0/mingw64/lib/gcc/x86_64-w64-mingw32/9.3.0/include/c++/bits/stl_vector.h:677
#7  spral::ssids::cpu::buddy_alloc_internal::Table<std::allocator<char> >::~Table (this=0x1621080, __in_chrg=<optimized out>)
    at ../src/ssids/cpu/BuddyAllocator.hxx:286
#8  std::_Sp_counted_ptr<spral::ssids::cpu::buddy_alloc_internal::Table<std::allocator<char> >*, (__gnu_cxx::_Lock_policy)2>::_M_dispose (
    this=<optimized out>)
    at c:/program files/gnu octave/octave-6.2.0/mingw64/lib/gcc/x86_64-w64-mingw32/9.3.0/include/c++/bits/shared_ptr_base.h:377
#9  0x00000000005c78a2 in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (this=0x16411f0)
    at c:/program files/gnu octave/octave-6.2.0/mingw64/lib/gcc/x86_64-w64-mingw32/9.3.0/include/c++/bits/shared_ptr_base.h:148
#10 std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (
    this=0x16411f0)
    at c:/program files/gnu octave/octave-6.2.0/mingw64/lib/gcc/x86_64-w64-mingw32/9.3.0/include/c++/bits/shared_ptr_base.h:148
#11 std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count (
    this=0x1620f40, __in_chrg=<optimized out>)
    at c:/program files/gnu octave/octave-6.2.0/mingw64/lib/gcc/x86_64-w64-mingw32/9.3.0/include/c++/bits/shared_ptr_base.h:730
#12 std::__shared_ptr<spral::ssids::cpu::buddy_alloc_internal::Table<std::allocator<char> >, (__gnu_cxx::_Lock_policy)2>::~__shared_ptr (this=0x1620f38, 
    __in_chrg=<optimized out>)
    at c:/program files/gnu octave/octave-6.2.0/mingw64/lib/gcc/x86_64-w64-mingw32/9.3.0/include/c++/bits/shared_ptr_base.h:1169
#13 std::shared_ptr<spral::ssids::cpu::buddy_alloc_internal::Table<std::allocator<char> > >::~shared_ptr (this=0x1620f38, __in_chrg=<optimized out>)
    at c:/program files/gnu octave/octave-6.2.0/mingw64/lib/gcc/x86_64-w64-mingw32/9.3.0/include/c++/bits/shared_ptr.h:103
#14 spral::ssids::cpu::BuddyAllocator<double, std::allocator<double> >::~BuddyAllocator (this=0x1620f38, __in_chrg=<optimized out>)
    at ../src/ssids/cpu/BuddyAllocator.hxx:382
#15 spral::ssids::cpu::NumericSubtree<false, double, 8388608ull, spral::ssids::cpu::AppendAlloc<double> >::~NumericSubtree (this=<optimized out>, 
    __in_chrg=<optimized out>) at ../src/ssids/cpu/NumericSubtree.hxx:278
#16 spral_ssids_cpu_destroy_num_subtree_dbl (posdef=<optimized out>, 
    target=0x1620f20) at ../src/ssids/cpu/NumericSubtree.cxx:70
#17 0x00000000005c2e6c in spral_ssids_cpu_subtree::factor (this=..., 
    posdef=.FALSE., aval=..., child_contrib=..., options=..., inform=..., 
    scaling=...) at ../src/ssids/cpu/subtree.f90:273
#18 0x00000000005c0a63 in spral_ssids_fkeep::inner_factor_cpu (fkeep=..., 
    akeep=..., val=..., options=..., inform=...) at ../src/ssids/fkeep.F90:156
#19 0x000000000059ddab in spral_ssids::ssids_factor_ptr64_double (
    posdef=.FALSE., val=..., akeep=..., fkeep=..., options=..., inform=..., 
    scale=..., ptr=..., row=...) at ../src/ssids/ssids.f90:1044
#20 0x000000000059e4e8 in spral_ssids::ssids_factor_ptr32_double (
    posdef=.FALSE., val=..., akeep=..., fkeep=..., options=..., inform=..., 
    scale=..., ptr=..., row=...) at ../src/ssids/ssids.f90:757
#21 0x000000000059a06b in spral_ssids_factor_ptr32 (cposdef=<optimized out>, 
    cptr=<optimized out>, crow=<optimized out>, val=..., 
    cscale=<error reading variable: Attempt to dereference a generic pointer.>, cakeep=<error reading variable: Attempt to dereference a generic pointer.>, 
    cfkeep=0x16203c0, coptions=..., cinform=...)
    at ../interfaces/C/ssids.f90:599
#22 0x000000000056424e in Ipopt::SpralSolverInterface::MultiSolve(bool, int const*, int const*, int, double*, bool, int) ()

Looks like the same issue has also been reported in jump-dev/Ipopt.jl#374

@mjacobse
Collaborator

I have not come across this, no, but I'd never tried SSIDS on Windows.

I gave it a try just now, but at least confined to within the SPRAL sources I could not reproduce this. I did run into a few compile-time issues (missing <cstdint> include in ThreadStats.hxx, unavailability of posix_memalign in AlignedAllocator.hxx), but after fixing those the ssids_test.exe runs and passes all tests just fine. The example examples/C/ssids.exe worked as expected too. The spral_ssids.exe runner also seemed to work just fine on some test matrices from the SuiteSparse matrix collection. Wonder if the crash happens with any matrix or just particular ones? Would be good to have one of those in the latter case.
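
For readers who hit the same `posix_memalign` compile error, one common workaround is sketched below. It is an assumption about how such a fix could look, not the change actually made to AlignedAllocator.hxx; `portable_aligned_alloc`, `portable_aligned_free`, and `is_aligned` are hypothetical helper names. It relies on `_aligned_malloc` and `_aligned_free` from `<malloc.h>` on Windows and on `posix_memalign` elsewhere.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdlib.h>   // posix_memalign and free on POSIX systems
#if defined(_WIN32)
#include <malloc.h>   // _aligned_malloc, _aligned_free
#endif

// Hypothetical helpers; not the code in SPRAL's AlignedAllocator.hxx.
// `alignment` must be a power of two and a multiple of sizeof(void*).
inline void* portable_aligned_alloc(std::size_t alignment, std::size_t size) {
#if defined(_WIN32)
   // Note the argument order: size first, then alignment.
   return _aligned_malloc(size, alignment);
#else
   void* ptr = nullptr;
   if (posix_memalign(&ptr, alignment, size) != 0) return nullptr;
   return ptr;
#endif
}

inline void portable_aligned_free(void* ptr) {
#if defined(_WIN32)
   // Memory from _aligned_malloc must be released with _aligned_free.
   _aligned_free(ptr);
#else
   free(ptr);
#endif
}

// True if `ptr` is a multiple of `alignment` bytes.
inline bool is_aligned(const void* ptr, std::size_t alignment) {
   return reinterpret_cast<std::uintptr_t>(ptr) % alignment == 0;
}
```

The allocate and free functions are kept as a pair because the Windows variant cannot be mixed with plain `free`.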

I do see one failure in the test_maxloc_torture test within ssids_kernel_test, and the ssmfe_ciface_test runner seems to break with double complex, but I doubt those are related to this issue.

@jfowkes
Contributor

jfowkes commented Jun 16, 2023

Many thanks for testing. It may be that this is an issue with the IPOPT SPRAL interface on Windows.

@svigerske
Contributor Author

The version that didn't work for me was 6c56924. I tried again with the current master and now the test in Ipopt (hs071) passes!

git bisect points me to 0c2b9d2 as the commit that probably fixed the problem (this commit doesn't compile). That makes sense, as it fixed an assumption that long is 64-bit, which doesn't hold on Windows (#96).
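
As background on the `long` assumption: 64-bit Linux and macOS use the LP64 data model, where `long` is 64 bits, while 64-bit Windows uses LLP64, where `long` stays at 32 bits. The sketch below only illustrates that general pitfall; the thread does not show what 0c2b9d2 changed, and `long_bits`, `fits_in_long`, and `truncate_to_32` are hypothetical names.

```cpp
#include <climits>
#include <cstdint>

// Width of `long` in bits on the platform this is compiled for:
// 64 under LP64 (Linux, macOS), 32 under LLP64 (64-bit Windows).
constexpr int long_bits = static_cast<int>(sizeof(long) * CHAR_BIT);

// Hypothetical helper: true if `value` can be stored in a `long`
// on this platform without loss.
constexpr bool fits_in_long(std::int64_t value) {
   return value >= LONG_MIN && value <= LONG_MAX;
}

// Emulates what happens when a 64-bit quantity is squeezed into a
// 32-bit slot. Unsigned types are used so the conversion is
// well-defined (reduction modulo 2^32).
constexpr std::uint32_t truncate_to_32(std::uint64_t value) {
   return static_cast<std::uint32_t>(value);
}

// Fixed-width types do not depend on the data model.
static_assert(sizeof(std::int64_t) * CHAR_BIT == 64,
              "int64_t is 64 bits on every platform that provides it");
```

For example, `fits_in_long(std::int64_t{1} << 40)` is true under LP64 and false under LLP64, which is why code that passes 64-bit values through `long` can work on Linux and misbehave on Windows.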

@amontoison
Member

amontoison commented Aug 18, 2023

@svigerske
Should we not replace all long by int64_t in the file IpSpralSolverInterface.cpp of Ipopt?

I added a test with the linear solver SPRAL (release 2023.8.2) today for the Julia interface of Ipopt and it works on all platforms: jump-dev/Ipopt.jl#380 🚀

@svigerske
Contributor Author

Ah yes, I should adapt to the API change of Spral. The current uses of long don't create problems here, but it would be cleaner to use int64_t now.
