Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Segmentation fault while solving basis pursuit problem with convex.jl #24

Closed
eamartin opened this issue Apr 25, 2015 · 10 comments
Closed

Comments

@eamartin
Copy link

I have the following code for a basis pursuit simulation:

using Convex
import SCS.SCSSolver
solver = SCSSolver(verbose=0);

function sparse_recovery(matrix_gen; n=100, k_range=1:20, trials=100)
    for k = k_range
        for m = [[1:4]; [5:5:n]]
            recoveries = 0.
            for i = 1:trials
                A = matrix_gen(m, n)
                x0 = sprandn(n, 1, k / n)
                y = A * x0

                x = Variable(n)
                problem = minimize(norm(x, 1), [y == A * x])
                solve!(problem, solver)

                # perfect recovery is infinity norm < 0.001
                recoveries += (norm(x.value - x0, Inf) < 0.001)
            end

            success_rate = recoveries / trials

            if success_rate <= 0.05 || success_rate >= 0.95
                @printf("%d-sparse, %d measures: %.2f recovery\n", k, m, success_rate)
            end

            if success_rate >= 0.95
                break
            end
        end
        println()
    end
end

sparse_recovery(randn)

Running this works for a while (several thousand calls to solve!) but eventually seg-faults:

signal (11): Segmentation fault
unNormalizeA at /home/emartin/.julia/v0.3/SCS/deps/src/scs-1.0.7/linsys/common.c:198
scs_finish at /home/emartin/.julia/v0.3/SCS/deps/src/scs-1.0.7/src/scs.c:706
SCS_finish at /home/emartin/.julia/v0.3/SCS/src/low_level_wrapper.jl:42
jlcall_SCS_finish_21851 at  (unknown line)
jl_apply_generic at /usr/bin/../lib/x86_64-linux-gnu/julia/libjulia.so (unknown line)
SCS_solve at /home/emartin/.julia/v0.3/SCS/src/high_level_wrapper.jl:131
julia_SCS_solve_21832 at  (unknown line)
jlcall_SCS_solve_21832 at  (unknown line)
optimize! at /home/emartin/.julia/v0.3/SCS/src/SCSSolverInterface.jl:145
jlcall_optimize!_21830 at  (unknown line)
jl_apply_generic at /usr/bin/../lib/x86_64-linux-gnu/julia/libjulia.so (unknown line)
solve! at /home/emartin/.julia/v0.3/Convex/src/solution.jl:58
julia_solve!_21465 at  (unknown line)
jl_apply_generic at /usr/bin/../lib/x86_64-linux-gnu/julia/libjulia.so (unknown line)
sparse_recovery at none:16
julia_sparse_recovery_21384 at  (unknown line)
jlcall_sparse_recovery_21384 at  (unknown line)
jl_apply_generic at /usr/bin/../lib/x86_64-linux-gnu/julia/libjulia.so (unknown line)
unknown function (ip: -1343619028)
unknown function (ip: -1343624175)
unknown function (ip: -1343562680)
jl_f_top_eval at /usr/bin/../lib/x86_64-linux-gnu/julia/libjulia.so (unknown line)
eval_user_input at REPL.jl:53
jlcall_eval_user_input_19972 at  (unknown line)
jl_apply_generic at /usr/bin/../lib/x86_64-linux-gnu/julia/libjulia.so (unknown line)
anonymous at task.jl:95
jl_handle_stack_switch at /usr/bin/../lib/x86_64-linux-gnu/julia/libjulia.so (unknown line)
julia_trampoline at /usr/bin/../lib/x86_64-linux-gnu/julia/libjulia.so (unknown line)
unknown function (ip: 4199453)
__libc_start_main at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
unknown function (ip: 4199509)
zsh: segmentation fault (core dumped)  julia

I'm not sure if the bug lies in Convex.jl, SCS.jl, or SCS. I'm debugging now with gdb, will update issue as I learn more.

@eamartin
Copy link
Author

For reference, I'm running with Julia 0.3.7 with Convex.jl 0.0.4 and SCS.jl 0.0.3 (which is based on SCS 1.0.7). I'm on 64 bit Ubuntu 12.04.

I did a bit of debugging with gdb.
The segfault occurs at https://github.com/cvxgrp/scs/blob/v1.0.7/linsys/common.c#L198

At the time of the crash, j=0. The D[A->i[j]] access is causing the segfault.
GDB session

(gdb) p A->i[0]
$27 = 516524992
(gdb) p A->i[1]                                                                                                                                                                                                      
$28 = 140737332598648
(gdb) p A->i[2]                                                                                                                                                                                                      
$29 = 0
(gdb) p A->i[3]                                                                                                                                                                                                      
$30 = 0
(gdb) p A->i[4]                                                                                                                                                                                                      
$31 = 5
(gdb) p A->i[5]                                                                                                                                                                                                      
$32 = 6
(gdb) p A->i[6]                                                                                                                                                                                                      
$33 = 7
(gdb) p A->i[7]                                                                                                                                                                                                      
$34 = 8
(gdb) p A->i[8]                                                                                                                                                                                                      
$35 = 9

It looks like A->i should point to A->i + 2, because everything index 2 on looks like a valid part of the CSC data structure (and D[A->i[2]] does not cause a segfault, same if you replace 2 with 3, 4, etc).

Debugging further is beyond my interest (and time availability) at the moment, but I hope this is useful in fixing the bug.

@IainNZ
Copy link
Contributor

IainNZ commented Apr 25, 2015

It'd be good to file this on the SCS repository too - I'm not sure the maintainer of SCS looks/is subscribed to the issues for the Julia wrapper

@mlubin
Copy link
Member

mlubin commented Apr 25, 2015

I'd also recommend trying a different LP solver like Clp.

@eamartin
Copy link
Author

I am solving my problem using ECOS. Interestingly, I had a similar bug with ECOS that happened twice but then went away (even with a fixed random seed at the beginning of the program, it failed once and then didn't fail again).

@mlubin
Copy link
Member

mlubin commented Jul 10, 2015

Does this still happen with the latest version of SCS.jl?

@madeleineudell
Copy link
Contributor

In fact, Convex.jl can call an LP solver when the 1-norm is used; we can clarify that in the documentation. You can take a look at the norm atom to see exactly what we do.

@eamartin
Copy link
Author

I'm unsure if the error is fixed (since I need to run the code for a long time to find it). After a decent number of trials, I haven't seen a crash yet. However, I do get relatively frequent

WARNING: Problem status UnknownError; solution may be inaccurate.

error messages. I'm doing some more testing now and will report back if I get a segfault or any other kind of failure.

@bodono
Copy link
Contributor

bodono commented Jul 12, 2015

This is still happening (julia v0.4, SCS 1.1.5, latest SCS.jl). I ran it for a long time to get:

----------------------------------------------------------------------------
    SCS v1.1.5 - Splitting Conic Solver
    (c) Brendan O'Donoghue, Stanford University, 2012
----------------------------------------------------------------------------
Lin-sys: sparse-direct, nnz in A = 4001
eps = 1.00e-04, alpha = 1.80, max_iters = 20000, normalize = 1, scale = 5.00
Variables n = 201, constraints m = 236
Cones:  primal zero / dual free vars: 36
    linear vars: 200
ERROR: ReadOnlyMemoryError()
 in SCS_solve at /Users/bodonoghue/.julia/v0.4/SCS/src/low_level_wrapper.jl:44
 in SCS_solve at /Users/bodonoghue/.julia/v0.4/SCS/src/high_level_wrapper.jl:140
 in SCS_solve at /Users/bodonoghue/.julia/v0.4/SCS/src/high_level_wrapper.jl:154
 in optimize! at /Users/bodonoghue/.julia/v0.4/SCS/src/SCSSolverInterface.jl:154
 in solve! at /Users/bodonoghue/.julia/v0.4/Convex/src/solution.jl:53
 in sparse_recovery at none:12

Does anybody know what exactly the ReadOnlyMemoryError() actually means?

I think this is caused by a memory leak. While running this script the memory used by Julia creeps up, in the screenshot below it's taking up a whopping 2.4Gb of memory, just solving these tiny problems.

screen shot 2015-07-12 at 1 52 09 pm

My best guess is that the leak is actually in Convex.jl (jump-dev/Convex.jl#83), since the exact same thing happens using ECOS (at least the memory usage increase), and I can't reproduce the problem by calling SCS directly.

@arsenal9971
Copy link

You are trying to solve the problem with equality constraints with arbitrary precision try to reformulate the part of the problem by minimizing the same objective function with a precision epsilon
solver = SCSSolver(verbose=0); x = Variable(n) problem = minimize(norm(x, 1), norm(y - A * x)<=epsilon) solve!(problem, solver)

@mlubin
Copy link
Member

mlubin commented Dec 2, 2017

This was possibly caused by #90.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

No branches or pull requests

6 participants