Problems with Linux (Arch) #25
When you use:

julia> ]
pkg> test PRIMA
Hello, I have also encountered this issue. I have tested it on multiple devices; this is what information I could gather.
Unfortunately, the error message does not provide any information (not even a stacktrace), so this may be difficult to debug. I will try to provide as much information as I can.

MWE

Consider the following MWE:

using PRIMA

# Serial version: run `tasks` NEWUOA solves one after another.
function prima_serial(; tasks=1)
    obj = (x) -> abs(5.0 - x[1])
    start = [0.0]
    results = [newuoa(obj, start)[1] for _ in 1:tasks]
end

# Parallel version: spawn each solve in its own task, then fetch the results.
function prima_parallel(; tasks=1)
    obj = (x) -> abs(5.0 - x[1])
    start = [0.0]
    tasks = [Threads.@spawn newuoa(obj, start)[1] for _ in 1:tasks]
    results = fetch.(tasks)
end
Note that I am running only a single task when parallelizing, so there are not actually multiple PRIMA instances running in parallel. But somehow it causes errors on some devices anyway.

Test Results

The following table summarizes the results of running the two functions.
Correct output:

julia> prima_serial()
1-element Vector{Vector{Float64}}:
[5.0000000000000115]
Error message for prima_serial():

julia> prima_serial()
ERROR: StackOverflowError:
Error message for prima_parallel():

julia> prima_parallel()
ERROR: TaskFailedException
Stacktrace:
[1] wait
@ ./task.jl:352 [inlined]
[2] fetch
@ ./task.jl:372 [inlined]
[3] _broadcast_getindex_evalf
@ ./broadcast.jl:709 [inlined]
[4] _broadcast_getindex
@ ./broadcast.jl:682 [inlined]
[5] getindex
@ ./broadcast.jl:636 [inlined]
[6] copy
@ ./broadcast.jl:942 [inlined]
[7] materialize
@ ./broadcast.jl:903 [inlined]
[8] prima_parallel(; tasks::Int64)
@ Main ~/julia-sandbox/prima_parallel/test.jl:15
[9] prima_parallel()
@ Main ~/julia-sandbox/prima_parallel/test.jl:10
[10] top-level scope
@ REPL[3]:1
nested task error: StackOverflowError:
Device Specifications

PC-1 and PC-2 are my personal computers. PC-2 has both Windows and Linux on dual boot. Cluster-1 and Cluster-2 are academic clusters that I have access to. The information below contains specs of both the "login" and "work" nodes from the clusters. I've tested the MWE on both the login and work nodes, and the behavior does not differ between them.

PC-1
OS: Microsoft Windows 10 Pro
Julia version: 1.10.2

PC-2 (Windows)
OS: Microsoft Windows 10 Home
Julia version: 1.10.2

PC-2 (Linux)
OS: Ubuntu 20.04.6 LTS
Julia version: 1.10.2

Cluster-1
Login Node:
Work Node:
Julia version: 1.10.0

Cluster-2
Login Node:
Work Node:
Julia version: 1.10.2

PC-3 (MacOS)
OS: macOS (arm64-apple-darwin22.4.0)
Julia version: 1.11.1

Let me know if I can help with any additional information or testing. :)

EDIT-1: Added PC-2 (Linux) VSCode and "non-VSCode" versions to the test result table.
EDIT-2: Added PC-3 (MacOS).
I have run the tests, and they all pass:

Test Summary: | Pass  Total   Time
PRIMA.jl      |   81     81  12.9s
     Testing PRIMA tests passed
Thank you for all these details. I have tested your examples on my Linux laptop (Ubuntu 23.10 with 6.0.0 kernel) with the following results:

julia> prima_serial()
1-element Vector{Vector{Float64}}:
[5.0000000000000115]
julia> prima_parallel()
ERROR: TaskFailedException
Stacktrace:
[1] wait
@ ./task.jl:352 [inlined]
[2] fetch
@ ./task.jl:372 [inlined]
[3] _broadcast_getindex_evalf
@ ./broadcast.jl:709 [inlined]
[4] _broadcast_getindex
@ ./broadcast.jl:682 [inlined]
[5] getindex
@ ./broadcast.jl:636 [inlined]
[6] copy
@ ./broadcast.jl:942 [inlined]
[7] materialize
@ ./broadcast.jl:903 [inlined]
[8] prima_parallel(; tasks::Int64)
@ Main ./REPL[3]:6
[9] prima_parallel()
@ Main ./REPL[3]:1
[10] top-level scope
@ REPL[7]:1
nested task error: StackOverflowError:

So the serial version worked, not the parallel one. Note that the serial version also worked for me. Are you sure that the serial version failed on your PC-2 (Linux)? For the parallel version, I can see some questions that need to be answered.
In Julia, the use of

Yes

Ok, I see 👍
I have tested it again to be sure. The serial version really fails on my Linux PC, but only in Julia started by VSCode. When I run the two functions from a Julia REPL started by VSCode's Julia extension, the serial version fails. When I start the Julia REPL from bash myself, only the parallel version fails and the serial one works fine, as on the other Linux devices. I don't know what to make of this, but at least it is consistent when tried multiple times.

Version info

The only difference between the two sessions is in versioninfo():

julia> versioninfo()
Julia Version 1.10.2
Commit bd47eca2c8a (2024-03-01 10:14 UTC)
Build Info:
Official https://julialang.org/ release
Platform Info:
OS: Linux (x86_64-linux-gnu)
CPU: 8 × Intel(R) Core(TM) i7-8550U CPU @ 1.80GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-15.0.7 (ORCJIT, skylake)
Threads: 8 default, 0 interactive, 4 GC (on 8 virtual cores)
Environment:
LD_LIBRARY_PATH = :/opt/gurobi10.0.0_linux64/gurobi1000/linux64/lib
JULIA_EDITOR = code
JULIA_NUM_THREADS = 8
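To rule out a difference in thread configuration between the two sessions, a hedged check (the count of 8 is just what JULIA_NUM_THREADS above suggests):

julia> Threads.nthreads()   # expect 8 in this session
8

and, from bash, starting Julia with an explicit thread count to mimic the VSCode session:

julia --threads=8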
Ok, that's really puzzling... We have started to figure out a way to deal with thread-safety (and hierarchical optimization) differently than it is currently done.
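In the meantime, a possible workaround sketch (untested, and the names PRIMA_LOCK and locked_newuoa are mine, not part of PRIMA.jl): serialize every call into the library behind a single lock, so that no two tasks enter it concurrently. Note that this may not help here, since the failure above occurs even with a single spawned task.

using PRIMA

# A global lock used to serialize all calls into the underlying library.
# Hypothetical helper, not part of PRIMA.jl.
const PRIMA_LOCK = ReentrantLock()

function locked_newuoa(obj, start)
    # Hold the lock for the whole duration of the solve.
    lock(PRIMA_LOCK) do
        newuoa(obj, start)
    end
end

# Same shape as the parallel MWE above.
tasks = [Threads.@spawn locked_newuoa(x -> abs(5.0 - x[1]), [0.0]) for _ in 1:4]
results = first.(fetch.(tasks))   # extract the solution vectors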
Hi, I have the same issue on my Linux machine. Has there been any progress on this bug?

OS: Ubuntu 22.04.4 LTS
Hi, I got my hands on a Mac, so I've tested the parallelization on it as well. The MWE works fine there, so the issue seems to be purely Linux-related.
Interestingly, some tests do not pass on my Mac. I believe this to be unrelated to this parallelization issue, so I've created a new issue.
It seems that the Julia package has a problem when running on a Linux machine. I get the StackOverflowError after running the newuoa algorithm; this problem does not occur in my Windows partition. On Windows I have the Intel Fortran compiler, on Linux just the gcc compiler. Which libraries do you require to run the package on Linux?

Regards,
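(A hedged way to see which PRIMA binary is actually loaded, using only the standard Libdl library; wrapper packages like this one normally ship their own compiled binary as an artifact, so the system Fortran compiler should not be what differs:)

using PRIMA, Libdl

# Shared libraries currently loaded that mention "prima";
# this shows which binary the package actually uses.
filter(contains("prima"), Libdl.dllist())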