Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: linprog, with highs is killed by the OOM killer when called with dense and large matrices #15888

Open
brunorigal opened this issue Mar 28, 2022 · 13 comments · Fixed by #19255
Labels
defect A clear bug or issue that prevents SciPy from being installed or used as expected scipy.optimize scipy.sparse
Milestone

Comments

@brunorigal
Copy link

Describe your issue.

Hi,
I have a bug when using the highs linprog solver with the code below.

The script is killed by the OOM killer. The problem arises when calling csc_matrix, which itself builds a coo_matrix, which calls the nonzeros method which is supposed to return the indices of non zero (far too many in this dense case).

Is it necessary to call csc_matrix here? It seems very inefficient for dense matrices.
If such a transformation is necessary, it would be better that the code returns a python error to handle the problem correctly.

Reproducing Code Example

from scipy.optimize import linprog
import numpy as np
np.random.seed(0)
n_ctr = 500000
n_var = 500
c_ = np.ones(n_var)
A_ub = np.random.random((n_ctr, n_var))
b_ub = np.zeros((n_ctr, ))

res = linprog(c_, A_ub=A_ub, b_ub=b_ub, method='highs-ds', bounds=(0, None), options={'disp': True})

print('End')

Error message

150075 killed     python highs.py

SciPy/NumPy/Python version information

scipy==1.7.3, numpy==1.22.2, python==3.8.10

@brunorigal brunorigal added the defect A clear bug or issue that prevents SciPy from being installed or used as expected label Mar 28, 2022
@mdhaber
Copy link
Contributor

mdhaber commented Mar 30, 2022

@mckib2, is there a different interface for dense matrices or other sparse matrix formats? (I'm not aware of one.)

@brunorigal does method='interior-point' fare any better here? (We were deprecating it, but I wonder if this would be a reason to keep it around.)

@mckib2
Copy link
Contributor

mckib2 commented Mar 31, 2022

@mckib2, is there a different interface for dense matrices or other sparse matrix formats? (I'm not aware of one.)

No, HIGHS will always expect a sparse constraint matrix here. Is there potentially a better/more direct way to get to a csc_matrix on the Python end that would accommodate these kinds of large, dense matrices? All HiGHS needs are the 3 arrays: indices, indptr, and data

@brunorigal
Copy link
Author

On the same script the interior point method raises an _ArrayMemoryError error, which is better.

@mdhaber
Copy link
Contributor

mdhaber commented Mar 31, 2022

OK so @brunorigal we ultimately need a csc_matrix representation, so it is reasonable to call csc_matrix. Is the failure actually happening during the call to csc_matrix? (In that case, I think the question of how best to convert from dense to CSC matrix would be a sparse issue, not an optimize issue.)

@brunorigal
Copy link
Author

What about something like this specially for dense matrices:

n_ctr, n_var = A.shape
A_indptr = (np.arange(n_ctr+1)*n_var).astype(np.int32)
A_indices =  (np.meshgrid(np.arange(n_ctr), np.arange(n_var))[0].reshape(-1)).astype(np.int32)
A_data = A.reshape(-1, order='F')

res = _highs_wrapper(c, A_indptr, A_indices, A_data, lhs, rhs, lb, ub, options)

in replacement for A = csc_matrix(A)

I checked it yields the same results than csc_matrix on this problem. However, the same problem is then observed in the highs wrapper.

@mdhaber
Copy link
Contributor

mdhaber commented Mar 31, 2022

in replacement for

But why would the code go in optimize and not in sparse.csc_matrix itself? It sounds like the problem was experienced in the context of using linprog, but is it very unlikely to be a problem in other contexts?

However, the same problem is then observed in the highs wrapper.

Which problem, specifically? "The script is killed by the OOM killer"?

@brunorigal
Copy link
Author

I guess you are right, it could be a part of the csc_matrix function.

The "same problem" does refer to the OOM killer (it would then be a problem for the highs team).

@mdhaber
Copy link
Contributor

mdhaber commented Mar 31, 2022

Ok, I'll add the sparse label.
@mckib2 it sounds like if this gets past csc_matrix, HiGHS should fail more gracefully.

@mckib2
Copy link
Contributor

mckib2 commented Jun 29, 2022

The "same problem" does refer to the OOM killer (it would then be a problem for the highs team).

Looking into this a little more, HiGHS underneath relies on std::vector which does not support copy-less assignment from C arrays. Even the C interface incurs copies when initializing.

The stack overflow answer above suggests that a custom allocator could be used that simply "reuses" the array allocation for the vector, but it's not clear to me currently how well Cython supports custom C++ allocators/deleters.

EDIT: followup: without C++17 features and a big rewrite of the HiGHS C++ interfaces, achieving this with custom allocators is probably not possible

HiGHS should fail more gracefully

By checking size of matrices and available memory? I'm not sure that's necessary or desirable, it seems standard to let the user try even to the point of crashing. What would we consider the "correct" behavior for the HiGHS wrapper?

@brunorigal
Copy link
Author

brunorigal commented Jun 29, 2022

I do agree that checking the size of matrices is perhaps not the best solution. However, it would be desirable that the code fail with a python error instead of crashing python. For example if I try to allocate a very large matrix, numpy raises an ArrayMemoryError, which is good because I can catch it.

@mdhaber
Copy link
Contributor

mdhaber commented Dec 17, 2022

@mckib2 is it possible for a Python error to be raised like NumPy's ArrayMemoryError?

@mdhaber
Copy link
Contributor

mdhaber commented May 12, 2023

@mckib2 quick question here: is it possible for a Python error to be raised like NumPy's ArrayMemoryError?

@rgommers
Copy link
Member

By checking size of matrices and available memory? I'm not sure that's necessary or desirable, it seems standard to let the user try even to the point of crashing.

Crashing is always a bug. I see slightly more verbose output with the reproducer in the issue description, but that should be considered a bug:

Running HiGHS 1.2.0 [date: 2021-07-09, git hash: n/a]
Copyright (c) 2022 ERGO-Code under MIT licence terms
Presolving model
terminate called after throwing an instance of 'std::length_error'
  what():  vector::_M_default_append
Aborted (core dumped)

What would we consider the "correct" behavior for the HiGHS wrapper?

It can't be fixed in the wrapper if it's crashing in HiGHS itself. Looking at how memory allocations in HiGHS, there is a random mix of usages of malloc, new and make_unique, and the malloc instances seem unprotected by checks for a null return value.

It seems to be like the HiGHS code should be audited for all memory allocations and improved, and once that is done it's probably a matter of catching std::bad_alloc in the Cython wrapper and converting that into a Python exception.

HaoZeke added a commit to HaoZeke/scipy that referenced this issue Jul 30, 2023
More of a dodge than a fix, but will essentially work with exceptions now.
HaoZeke added a commit to HaoZeke/scipy that referenced this issue Jul 30, 2023
More of a dodge than a fix, but will essentially work with exceptions now.
HaoZeke added a commit to HaoZeke/scipy that referenced this issue Aug 6, 2023
More of a dodge than a fix, but will essentially work with exceptions now.
HaoZeke added a commit to HaoZeke/scipy that referenced this issue Oct 21, 2023
HaoZeke added a commit to HaoZeke/scipy that referenced this issue Oct 22, 2023
HaoZeke added a commit to HaoZeke/scipy that referenced this issue Oct 22, 2023
HaoZeke added a commit to HaoZeke/scipy that referenced this issue Oct 24, 2023
HaoZeke added a commit to HaoZeke/scipy that referenced this issue Oct 28, 2023
HaoZeke added a commit to HaoZeke/scipy that referenced this issue Dec 3, 2023
MAINT: Switch to the scipy fork of highs

TST: xfail known flaky test

Co-authored-by: mckib2 <mckib2@users.noreply.github.com>

MAINT: Add a note on .gitmodule location

Co-authored-by: h-vetinari <h-vetinari@users.noreply.github.com>
HaoZeke added a commit to HaoZeke/scipy that referenced this issue Dec 3, 2023
MAINT: Switch to the scipy fork of highs

TST: xfail known flaky test

Co-authored-by: mckib2 <mckib2@users.noreply.github.com>

MAINT: Add a note on .gitmodule location

Co-authored-by: h-vetinari <h-vetinari@users.noreply.github.com>
rgommers pushed a commit to HaoZeke/scipy that referenced this issue Jan 17, 2024
MAINT: Switch to the scipy fork of highs

TST: xfail known flaky test

Co-authored-by: mckib2 <mckib2@users.noreply.github.com>

MAINT: Add a note on .gitmodule location

Co-authored-by: h-vetinari <h-vetinari@users.noreply.github.com>
HaoZeke added a commit to HaoZeke/scipy that referenced this issue Feb 3, 2024
MAINT: Switch to the scipy fork of highs

TST: xfail known flaky test

Co-authored-by: mckib2 <mckib2@users.noreply.github.com>

MAINT: Add a note on .gitmodule location

Co-authored-by: h-vetinari <h-vetinari@users.noreply.github.com>
HaoZeke added a commit to HaoZeke/scipy that referenced this issue Feb 17, 2024
MAINT: Switch to the scipy fork of highs

TST: xfail known flaky test

Co-authored-by: mckib2 <mckib2@users.noreply.github.com>

MAINT: Add a note on .gitmodule location

Co-authored-by: h-vetinari <h-vetinari@users.noreply.github.com>
HaoZeke added a commit to HaoZeke/scipy that referenced this issue Mar 12, 2024
MAINT: Switch to the scipy fork of highs

TST: xfail known flaky test

Co-authored-by: mckib2 <mckib2@users.noreply.github.com>

MAINT: Add a note on .gitmodule location

Co-authored-by: h-vetinari <h-vetinari@users.noreply.github.com>
@rgommers rgommers added this to the 1.14.0 milestone Apr 25, 2024
@j-bowhay j-bowhay reopened this Apr 27, 2024
@h-vetinari h-vetinari modified the milestones: 1.14.0, 1.15.0 May 28, 2024
HaoZeke added a commit to HaoZeke/scipy that referenced this issue Jul 20, 2024
MAINT: Switch to the scipy fork of highs

TST: xfail known flaky test

Co-authored-by: mckib2 <mckib2@users.noreply.github.com>

MAINT: Add a note on .gitmodule location

Co-authored-by: h-vetinari <h-vetinari@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
defect A clear bug or issue that prevents SciPy from being installed or used as expected scipy.optimize scipy.sparse
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants