using GAP fails when julia is using multiple threads #960

Closed
mjrodgers opened this issue Jan 19, 2024 · 8 comments

@mjrodgers

mjrodgers commented Jan 19, 2024

Edit (@thoma): As a workaround, start julia with --gcthreads=1.

When starting julia using multiple threads (julia -t 4), using GAP can fail. This also seems to leave julia in a fragile state, and I get a segfault when quitting.

Interestingly, using GAP seems to work fine if I launch julia using 3 threads, but 4 is a problem.

I'm running on an Intel Mac running Sonoma 14.2.1, julia 1.10.0, GAP 0.10.1

(base) ➜  ~ julia -t 4
               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.10.0 (2023-12-25)
 _/ |\__'_|_|_|\__'_|  |  Official https://julialang.org/ release
|__/                   |

julia> using GAP
 ┌───────┐   GAP 4.12.2 of 2022-12-18
 │  GAP  │   https://www.gap-system.org
 └───────┘   Architecture: x86_64-apple-darwin14-julia1.10-64-kv8
 Configuration:  gmp 6.2.1, Julia GC, Julia 1.10.0, readline
 Loading the library Error, IS_SUBSET_FLAGS: <flags1> must be a flags list (not a plain list) in
  IS_SUBSET_FLAGS( imp2[1], imp[2]
 ) at /Users/mrodgers/.julia/artifacts/b5c2f0f824457e5c391fb24916f94d5d91c62c4f/share/gap/lib/filter.g:151 called from
InstallTrueMethodNewFilter( tofilt, from
 ); at /Users/mrodgers/.julia/artifacts/b5c2f0f824457e5c391fb24916f94d5d91c62c4f/share/gap/lib/filter.g:298 called from
<function "InstallTrueMethod">( <arguments> )
 called from read-eval loop at /Users/mrodgers/.julia/artifacts/b5c2f0f824457e5c391fb24916f94d5d91c62c4f/share/gap/lib/pcgsspec.gd:253
ERROR: InitError: GAP variable _JULIAINTERFACE_ERROR_BUFFER not bound
Stacktrace:
  [1] error(s::String)
    @ Base ./error.jl:35
  [2] getproperty(::GAP.GlobalsType, name::Symbol)
    @ GAP ~/.julia/packages/GAP/aJO9M/src/globals.jl:42
  [3] error_handler()
    @ GAP ~/.julia/packages/GAP/aJO9M/src/GAP.jl:72
  [4] initialize(argv::Vector{String})
    @ GAP ~/.julia/packages/GAP/aJO9M/src/GAP.jl:154
  [5] __init__()
    @ GAP ~/.julia/packages/GAP/aJO9M/src/GAP.jl:294
  [6] run_module_init(mod::Module, i::Int64)
    @ Base ./loading.jl:1128
  [7] register_restored_modules(sv::Core.SimpleVector, pkg::Base.PkgId, path::String)
    @ Base ./loading.jl:1116
  [8] _include_from_serialized(pkg::Base.PkgId, path::String, ocachepath::String, depmods::Vector{Any})
    @ Base ./loading.jl:1061
  [9] _require_search_from_serialized(pkg::Base.PkgId, sourcepath::String, build_id::UInt128)
    @ Base ./loading.jl:1575
 [10] _require(pkg::Base.PkgId, env::String)
    @ Base ./loading.jl:1932
 [11] __require_prelocked(uuidkey::Base.PkgId, env::String)
    @ Base ./loading.jl:1806
 [12] #invoke_in_world#3
    @ Base ./essentials.jl:921 [inlined]
 [13] invoke_in_world
    @ Base ./essentials.jl:918 [inlined]
 [14] _require_prelocked(uuidkey::Base.PkgId, env::String)
    @ Base ./loading.jl:1797
 [15] macro expansion
    @ Base ./loading.jl:1784 [inlined]
 [16] macro expansion
    @ Base ./lock.jl:267 [inlined]
 [17] __require(into::Module, mod::Symbol)
    @ Base ./loading.jl:1747
 [18] #invoke_in_world#3
    @ Base ./essentials.jl:921 [inlined]
 [19] invoke_in_world
    @ Base ./essentials.jl:918 [inlined]
 [20] require(into::Module, mod::Symbol)
    @ Base ./loading.jl:1740
during initialization of module GAP

julia>

julia> exit()

[37561] signal (11): Segmentation fault: 11
in expression starting at REPL[2]:1
ExecProccall0args at /Users/mrodgers/.julia/artifacts/a7bcd955e05e9f268114b41a0606a2fc5e3dbb06/lib/libgap.8.dylib (unknown line)
ExecSeqStat at /Users/mrodgers/.julia/artifacts/a7bcd955e05e9f268114b41a0606a2fc5e3dbb06/lib/libgap.8.dylib (unknown line)
EXEC_CURR_FUNC at /Users/mrodgers/.julia/artifacts/a7bcd955e05e9f268114b41a0606a2fc5e3dbb06/lib/libgap.8.dylib (unknown line)
DoExecFunc1args at /Users/mrodgers/.julia/artifacts/a7bcd955e05e9f268114b41a0606a2fc5e3dbb06/lib/libgap.8.dylib (unknown line)
DoOperation1Args at /Users/mrodgers/.julia/artifacts/a7bcd955e05e9f268114b41a0606a2fc5e3dbb06/lib/libgap.8.dylib (unknown line)
DoProperty at /Users/mrodgers/.julia/artifacts/a7bcd955e05e9f268114b41a0606a2fc5e3dbb06/lib/libgap.8.dylib (unknown line)
EvalFunccall1args at /Users/mrodgers/.julia/artifacts/a7bcd955e05e9f268114b41a0606a2fc5e3dbb06/lib/libgap.8.dylib (unknown line)
EvalUnknownBool at /Users/mrodgers/.julia/artifacts/a7bcd955e05e9f268114b41a0606a2fc5e3dbb06/lib/libgap.8.dylib (unknown line)
EvalNot at /Users/mrodgers/.julia/artifacts/a7bcd955e05e9f268114b41a0606a2fc5e3dbb06/lib/libgap.8.dylib (unknown line)
ExecWhile2 at /Users/mrodgers/.julia/artifacts/a7bcd955e05e9f268114b41a0606a2fc5e3dbb06/lib/libgap.8.dylib (unknown line)
ExecSeqStat2 at /Users/mrodgers/.julia/artifacts/a7bcd955e05e9f268114b41a0606a2fc5e3dbb06/lib/libgap.8.dylib (unknown line)
EXEC_CURR_FUNC at /Users/mrodgers/.julia/artifacts/a7bcd955e05e9f268114b41a0606a2fc5e3dbb06/lib/libgap.8.dylib (unknown line)
DoExecFunc0args at /Users/mrodgers/.julia/artifacts/a7bcd955e05e9f268114b41a0606a2fc5e3dbb06/lib/libgap.8.dylib (unknown line)
_call_gap_func at /Users/mrodgers/.julia/packages/GAP/aJO9M/src/ccalls.jl:318 [inlined]
GapObj at /Users/mrodgers/.julia/packages/GAP/aJO9M/src/ccalls.jl:301
unknown function (ip: 0x10250361c)
_jl_invoke at /Users/julia/.julia/scratchspaces/a66863c6-20e8-4ff4-8a62-49f30b1f605e/agent-cache/default-grannysmith-C07ZQ07RJYVY.0/build/default-grannysmith-C07ZQ07RJYVY-0/julialang/julia-release-1-dot-10/src/gf.c:0 [inlined]
ijl_apply_generic at /Users/julia/.julia/scratchspaces/a66863c6-20e8-4ff4-8a62-49f30b1f605e/agent-cache/default-grannysmith-C07ZQ07RJYVY.0/build/default-grannysmith-C07ZQ07RJYVY-0/julialang/julia-release-1-dot-10/src/gf.c:3076
#1 at /Users/mrodgers/.julia/packages/GAP/aJO9M/src/GAP.jl:256
unknown function (ip: 0x1025032f1)
_jl_invoke at /Users/julia/.julia/scratchspaces/a66863c6-20e8-4ff4-8a62-49f30b1f605e/agent-cache/default-grannysmith-C07ZQ07RJYVY.0/build/default-grannysmith-C07ZQ07RJYVY-0/julialang/julia-release-1-dot-10/src/gf.c:0 [inlined]
ijl_apply_generic at /Users/julia/.julia/scratchspaces/a66863c6-20e8-4ff4-8a62-49f30b1f605e/agent-cache/default-grannysmith-C07ZQ07RJYVY.0/build/default-grannysmith-C07ZQ07RJYVY-0/julialang/julia-release-1-dot-10/src/gf.c:3076
_atexit at ./initdefs.jl:428
jfptr__atexit_79385.1 at /Applications/Julia-1.10.app/Contents/Resources/julia/lib/julia/sys.dylib (unknown line)
_jl_invoke at /Users/julia/.julia/scratchspaces/a66863c6-20e8-4ff4-8a62-49f30b1f605e/agent-cache/default-grannysmith-C07ZQ07RJYVY.0/build/default-grannysmith-C07ZQ07RJYVY-0/julialang/julia-release-1-dot-10/src/gf.c:0 [inlined]
ijl_apply_generic at /Users/julia/.julia/scratchspaces/a66863c6-20e8-4ff4-8a62-49f30b1f605e/agent-cache/default-grannysmith-C07ZQ07RJYVY.0/build/default-grannysmith-C07ZQ07RJYVY-0/julialang/julia-release-1-dot-10/src/gf.c:3076
jl_apply at /Users/julia/.julia/scratchspaces/a66863c6-20e8-4ff4-8a62-49f30b1f605e/agent-cache/default-grannysmith-C07ZQ07RJYVY.0/build/default-grannysmith-C07ZQ07RJYVY-0/julialang/julia-release-1-dot-10/src/./julia.h:1982 [inlined]
ijl_atexit_hook at /Users/julia/.julia/scratchspaces/a66863c6-20e8-4ff4-8a62-49f30b1f605e/agent-cache/default-grannysmith-C07ZQ07RJYVY.0/build/default-grannysmith-C07ZQ07RJYVY-0/julialang/julia-release-1-dot-10/src/init.c:280
ijl_exit at /Users/julia/.julia/scratchspaces/a66863c6-20e8-4ff4-8a62-49f30b1f605e/agent-cache/default-grannysmith-C07ZQ07RJYVY.0/build/default-grannysmith-C07ZQ07RJYVY-0/julialang/julia-release-1-dot-10/src/init.c:207
exit at ./initdefs.jl:28 [inlined]
exit at ./initdefs.jl:29
jfptr_exit_79210.1 at /Applications/Julia-1.10.app/Contents/Resources/julia/lib/julia/sys.dylib (unknown line)
_jl_invoke at /Users/julia/.julia/scratchspaces/a66863c6-20e8-4ff4-8a62-49f30b1f605e/agent-cache/default-grannysmith-C07ZQ07RJYVY.0/build/default-grannysmith-C07ZQ07RJYVY-0/julialang/julia-release-1-dot-10/src/gf.c:0 [inlined]
ijl_apply_generic at /Users/julia/.julia/scratchspaces/a66863c6-20e8-4ff4-8a62-49f30b1f605e/agent-cache/default-grannysmith-C07ZQ07RJYVY.0/build/default-grannysmith-C07ZQ07RJYVY-0/julialang/julia-release-1-dot-10/src/gf.c:3076
jl_apply at /Users/julia/.julia/scratchspaces/a66863c6-20e8-4ff4-8a62-49f30b1f605e/agent-cache/default-grannysmith-C07ZQ07RJYVY.0/build/default-grannysmith-C07ZQ07RJYVY-0/julialang/julia-release-1-dot-10/src/./julia.h:1982 [inlined]
do_call at /Users/julia/.julia/scratchspaces/a66863c6-20e8-4ff4-8a62-49f30b1f605e/agent-cache/default-grannysmith-C07ZQ07RJYVY.0/build/default-grannysmith-C07ZQ07RJYVY-0/julialang/julia-release-1-dot-10/src/interpreter.c:126
eval_body at /Users/julia/.julia/scratchspaces/a66863c6-20e8-4ff4-8a62-49f30b1f605e/agent-cache/default-grannysmith-C07ZQ07RJYVY.0/build/default-grannysmith-C07ZQ07RJYVY-0/julialang/julia-release-1-dot-10/src/interpreter.c:0
jl_interpret_toplevel_thunk at /Users/julia/.julia/scratchspaces/a66863c6-20e8-4ff4-8a62-49f30b1f605e/agent-cache/default-grannysmith-C07ZQ07RJYVY.0/build/default-grannysmith-C07ZQ07RJYVY-0/julialang/julia-release-1-dot-10/src/interpreter.c:775
jl_toplevel_eval_flex at /Users/julia/.julia/scratchspaces/a66863c6-20e8-4ff4-8a62-49f30b1f605e/agent-cache/default-grannysmith-C07ZQ07RJYVY.0/build/default-grannysmith-C07ZQ07RJYVY-0/julialang/julia-release-1-dot-10/src/toplevel.c:934
jl_toplevel_eval_flex at /Users/julia/.julia/scratchspaces/a66863c6-20e8-4ff4-8a62-49f30b1f605e/agent-cache/default-grannysmith-C07ZQ07RJYVY.0/build/default-grannysmith-C07ZQ07RJYVY-0/julialang/julia-release-1-dot-10/src/toplevel.c:877
jl_toplevel_eval_flex at /Users/julia/.julia/scratchspaces/a66863c6-20e8-4ff4-8a62-49f30b1f605e/agent-cache/default-grannysmith-C07ZQ07RJYVY.0/build/default-grannysmith-C07ZQ07RJYVY-0/julialang/julia-release-1-dot-10/src/toplevel.c:877
ijl_toplevel_eval at /Users/julia/.julia/scratchspaces/a66863c6-20e8-4ff4-8a62-49f30b1f605e/agent-cache/default-grannysmith-C07ZQ07RJYVY.0/build/default-grannysmith-C07ZQ07RJYVY-0/julialang/julia-release-1-dot-10/src/toplevel.c:943 [inlined]
ijl_toplevel_eval_in at /Users/julia/.julia/scratchspaces/a66863c6-20e8-4ff4-8a62-49f30b1f605e/agent-cache/default-grannysmith-C07ZQ07RJYVY.0/build/default-grannysmith-C07ZQ07RJYVY-0/julialang/julia-release-1-dot-10/src/toplevel.c:985
eval at ./boot.jl:385 [inlined]
eval_user_input at /Users/julia/.julia/scratchspaces/a66863c6-20e8-4ff4-8a62-49f30b1f605e/agent-cache/default-grannysmith-C07ZQ07RJYVY.0/build/default-grannysmith-C07ZQ07RJYVY-0/julialang/julia-release-1-dot-10/usr/share/julia/stdlib/v1.10/REPL/src/REPL.jl:150
repl_backend_loop at /Users/julia/.julia/scratchspaces/a66863c6-20e8-4ff4-8a62-49f30b1f605e/agent-cache/default-grannysmith-C07ZQ07RJYVY.0/build/default-grannysmith-C07ZQ07RJYVY-0/julialang/julia-release-1-dot-10/usr/share/julia/stdlib/v1.10/REPL/src/REPL.jl:246
#start_repl_backend#46 at /Users/julia/.julia/scratchspaces/a66863c6-20e8-4ff4-8a62-49f30b1f605e/agent-cache/default-grannysmith-C07ZQ07RJYVY.0/build/default-grannysmith-C07ZQ07RJYVY-0/julialang/julia-release-1-dot-10/usr/share/julia/stdlib/v1.10/REPL/src/REPL.jl:231
start_repl_backend at /Users/julia/.julia/scratchspaces/a66863c6-20e8-4ff4-8a62-49f30b1f605e/agent-cache/default-grannysmith-C07ZQ07RJYVY.0/build/default-grannysmith-C07ZQ07RJYVY-0/julialang/julia-release-1-dot-10/usr/share/julia/stdlib/v1.10/REPL/src/REPL.jl:228
_jl_invoke at /Users/julia/.julia/scratchspaces/a66863c6-20e8-4ff4-8a62-49f30b1f605e/agent-cache/default-grannysmith-C07ZQ07RJYVY.0/build/default-grannysmith-C07ZQ07RJYVY-0/julialang/julia-release-1-dot-10/src/gf.c:0 [inlined]
ijl_apply_generic at /Users/julia/.julia/scratchspaces/a66863c6-20e8-4ff4-8a62-49f30b1f605e/agent-cache/default-grannysmith-C07ZQ07RJYVY.0/build/default-grannysmith-C07ZQ07RJYVY-0/julialang/julia-release-1-dot-10/src/gf.c:3076
#run_repl#59 at /Users/julia/.julia/scratchspaces/a66863c6-20e8-4ff4-8a62-49f30b1f605e/agent-cache/default-grannysmith-C07ZQ07RJYVY.0/build/default-grannysmith-C07ZQ07RJYVY-0/julialang/julia-release-1-dot-10/usr/share/julia/stdlib/v1.10/REPL/src/REPL.jl:389
run_repl at /Users/julia/.julia/scratchspaces/a66863c6-20e8-4ff4-8a62-49f30b1f605e/agent-cache/default-grannysmith-C07ZQ07RJYVY.0/build/default-grannysmith-C07ZQ07RJYVY-0/julialang/julia-release-1-dot-10/usr/share/julia/stdlib/v1.10/REPL/src/REPL.jl:375
jfptr_run_repl_91808.1 at /Applications/Julia-1.10.app/Contents/Resources/julia/lib/julia/sys.dylib (unknown line)
_jl_invoke at /Users/julia/.julia/scratchspaces/a66863c6-20e8-4ff4-8a62-49f30b1f605e/agent-cache/default-grannysmith-C07ZQ07RJYVY.0/build/default-grannysmith-C07ZQ07RJYVY-0/julialang/julia-release-1-dot-10/src/gf.c:0 [inlined]
ijl_apply_generic at /Users/julia/.julia/scratchspaces/a66863c6-20e8-4ff4-8a62-49f30b1f605e/agent-cache/default-grannysmith-C07ZQ07RJYVY.0/build/default-grannysmith-C07ZQ07RJYVY-0/julialang/julia-release-1-dot-10/src/gf.c:3076
#1013 at ./client.jl:432
jfptr_YY.1013_82797.1 at /Applications/Julia-1.10.app/Contents/Resources/julia/lib/julia/sys.dylib (unknown line)
_jl_invoke at /Users/julia/.julia/scratchspaces/a66863c6-20e8-4ff4-8a62-49f30b1f605e/agent-cache/default-grannysmith-C07ZQ07RJYVY.0/build/default-grannysmith-C07ZQ07RJYVY-0/julialang/julia-release-1-dot-10/src/gf.c:0 [inlined]
ijl_apply_generic at /Users/julia/.julia/scratchspaces/a66863c6-20e8-4ff4-8a62-49f30b1f605e/agent-cache/default-grannysmith-C07ZQ07RJYVY.0/build/default-grannysmith-C07ZQ07RJYVY-0/julialang/julia-release-1-dot-10/src/gf.c:3076
jl_apply at /Users/julia/.julia/scratchspaces/a66863c6-20e8-4ff4-8a62-49f30b1f605e/agent-cache/default-grannysmith-C07ZQ07RJYVY.0/build/default-grannysmith-C07ZQ07RJYVY-0/julialang/julia-release-1-dot-10/src/./julia.h:1982 [inlined]
jl_f__call_latest at /Users/julia/.julia/scratchspaces/a66863c6-20e8-4ff4-8a62-49f30b1f605e/agent-cache/default-grannysmith-C07ZQ07RJYVY.0/build/default-grannysmith-C07ZQ07RJYVY-0/julialang/julia-release-1-dot-10/src/builtins.c:812
#invokelatest#2 at ./essentials.jl:887 [inlined]
invokelatest at ./essentials.jl:884 [inlined]
run_main_repl at ./client.jl:416
exec_options at ./client.jl:333
_start at ./client.jl:552
jfptr__start_82823.1 at /Applications/Julia-1.10.app/Contents/Resources/julia/lib/julia/sys.dylib (unknown line)
_jl_invoke at /Users/julia/.julia/scratchspaces/a66863c6-20e8-4ff4-8a62-49f30b1f605e/agent-cache/default-grannysmith-C07ZQ07RJYVY.0/build/default-grannysmith-C07ZQ07RJYVY-0/julialang/julia-release-1-dot-10/src/gf.c:0 [inlined]
ijl_apply_generic at /Users/julia/.julia/scratchspaces/a66863c6-20e8-4ff4-8a62-49f30b1f605e/agent-cache/default-grannysmith-C07ZQ07RJYVY.0/build/default-grannysmith-C07ZQ07RJYVY-0/julialang/julia-release-1-dot-10/src/gf.c:3076
jl_apply at /Users/julia/.julia/scratchspaces/a66863c6-20e8-4ff4-8a62-49f30b1f605e/agent-cache/default-grannysmith-C07ZQ07RJYVY.0/build/default-grannysmith-C07ZQ07RJYVY-0/julialang/julia-release-1-dot-10/src/./julia.h:1982 [inlined]
true_main at /Users/julia/.julia/scratchspaces/a66863c6-20e8-4ff4-8a62-49f30b1f605e/agent-cache/default-grannysmith-C07ZQ07RJYVY.0/build/default-grannysmith-C07ZQ07RJYVY-0/julialang/julia-release-1-dot-10/src/jlapi.c:582
jl_repl_entrypoint at /Users/julia/.julia/scratchspaces/a66863c6-20e8-4ff4-8a62-49f30b1f605e/agent-cache/default-grannysmith-C07ZQ07RJYVY.0/build/default-grannysmith-C07ZQ07RJYVY-0/julialang/julia-release-1-dot-10/src/jlapi.c:731
Allocations: 848880 (Pool: 846702; Big: 2178); GC: 3
[1]    37561 segmentation fault  julia -t 4
@ThomasBreuer
Member

I cannot reproduce this problem under Ubuntu 20.04 with Julia 1.8.5 and 1.9.0, with various values for the -t command line option. Would it make sense to try other Julia versions with this setup?

@mjrodgers
Author

mjrodgers commented Jan 19, 2024 via email

@benlorenz
Member

benlorenz commented Jan 21, 2024

A workaround to load GAP in a julia session with threads enabled is to disable gcthreads with --gcthreads=1 (this corresponds to how the GC works in 1.9).
GC threads are new in julia 1.10 and are enabled by default when threads are enabled; by default the number of GC threads is half the number of compute threads. This explains why the problem starts with 4 threads.

The following seems to work fine with julia 1.10:

$ julia-1.10.0 --project=. -t 4 --gcthreads=1
               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.10.0 (2023-12-25)
 _/ |\__'_|_|_|\__'_|  |  Official https://julialang.org/ release
|__/                   |

julia> using GAP
 ┌───────┐   GAP 4.12.2 of 2022-12-18
 │  GAP  │   https://www.gap-system.org
 └───────┘   Architecture: x86_64-pc-linux-gnu-julia1.10-64-kv8
 Configuration:  gmp 6.2.1, Julia GC, Julia 1.10.0, readline
 Loading the library and packages ...
 Packages:   AClib 1.3.2, Alnuth 3.2.1, AtlasRep 2.1.6, AutPGrp 1.11, Browse 1.8.21, CRISP 1.4.6, 
             Cryst 4.1.25, CrystCat 1.1.10, CTblLib 1.3.4, FactInt 1.6.3, FGA 1.4.0, GAPDoc 1.6.6, 
             IRREDSOL 1.4.4, JuliaInterface 0.10.1, LAGUNA 3.9.5, Polenta 1.3.10, Polycyclic 2.16, 
             PrimGrp 3.4.3, RadiRoot 2.9, ResClasses 4.7.3, SmallGrp 1.5.1, Sophus 1.27, SpinSym 1.5.2, 
             TomLib 1.2.9, TransGrp 3.6.3, utils 0.81
 Try '??help' for help. See also '?copyright', '?cite' and '?authors'

Tests also work in this configuration: Pkg.test("GAP", julia_args=["-t 4", "--gcthreads=1"]).
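
The thread arithmetic described in this comment can be written out as a small C sketch. It only encodes the rule stated here (GC threads default to half the compute threads); rounding down and the floor of 1 are assumptions of this sketch, chosen because they match the report that 3 threads work and 4 do not. Julia's actual startup logic is not shown in this thread, and both function names are hypothetical.

```c
/* Assumed default when --gcthreads is not given: half the number of
 * compute threads. Rounding down and the minimum of 1 are assumptions
 * of this sketch, not taken from the Julia sources. */
int default_gc_threads(int compute_threads)
{
    int n = compute_threads / 2; /* integer division rounds down */
    return n < 1 ? 1 : n;
}

/* Returns 1 if this thread count would lead to more than one GC thread,
 * which is the situation in which loading GAP failed. */
int uses_parallel_gc(int compute_threads)
{
    return default_gc_threads(compute_threads) > 1;
}
```

Under these assumptions, `julia -t 3` stays at a single GC thread while `julia -t 4` gets two, and passing `--gcthreads=1` forces the single-threaded case regardless of `-t`.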

@fingolfin
Member

Thanks for writing up a report, @mjrodgers. I can reproduce it with Julia 1.10.0 on macOS. I will look into this, but possibly only after February 1st, due to the book

@fingolfin
Member

There are multiple issues here in the GAP-Julia GC integration. One is the now-incorrect usage of the variable JuliaTLS in the GAP kernel, which is based on the assumption that there is a single GC job. JuliaTLS is mostly an optimization, and all its uses could simply be replaced by a call to jl_get_ptls_states(). Most uses could also be avoided by a change to the GAP kernel: basically, MarkBag and all related functions would need an extra argument, a void * ref pointer, which is set to the tls pointer when using the Julia GC in GAP and is otherwise ignored.

Then there are a bunch of global variables in there (in GAP's src/julia_gc.c, to be precise) whose use is not thread-safe and which need some strategy for that (be it locks, or making them thread local, or ...):

  • MarkCache and friends: could probably be resolved by activating REQUIRE_PRECISE_MARKING, which should be safe these days.
  • RootTaskOfMainThread and GapStackBottom: (I think) not used for GAP.jl, only for the CI tests over at the GAP repository.
  • YoungRef: could be made thread local, but if we add a void * ref to MarkBag and friends anyway, then that could just point at a struct on the stack which contains Julia's ptls and YoungRef.
  • DatatypeGapObj, DatatypeSmallBag, DatatypeLargeBag, IsJuliaMultiThreaded, MaxPoolObjSize, TabMarkFuncBags: only set once at the start and otherwise read-only, so hopefully they are fine.
  • ExtraMarkFuncBags: unused.
  • FullGC: set at the start of each GC. I hope this is fine, but I should have a closer look.
  • StartTime, TotalTime: should be fine (but even if not, nothing bad can happen if they are wrong).
  • GlobalAddr, GlobalCookie, GlobalCount: should be fine... I think.
  • task_stacks: requires closer inspection.
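
The idea of passing a `void * ref` to MarkBag and friends, pointing at a stack-allocated struct that holds Julia's ptls and YoungRef, might look roughly like the following C sketch. Everything here is a hypothetical stand-in rather than GAP kernel code: the `Bag` typedef, the `MarkContext` struct, the `MarkBagRef` and `MarkTwoSubBagsRef` names, and the stub `mark_object` are invented for illustration, and YoungRef is assumed to be a simple counter.

```c
#include <stddef.h>

/* Hypothetical stand-in for the GAP kernel's bag type. */
typedef void *Bag;

/* Per-collection state passed down explicitly instead of living in globals. */
typedef struct {
    void *ptls;      /* Julia thread-local state of the marking thread */
    long  young_ref; /* assumed to be a counter, replacing the global YoungRef */
} MarkContext;

/* Stand-in for handing an object to the Julia GC. It reports 1 when the
 * object counts as a newly found young reference; here every non-NULL
 * bag does, which is purely for illustration. */
static int mark_object(void *ptls, Bag bag)
{
    (void)ptls; /* the real call would need this pointer */
    return bag != NULL;
}

/* MarkBag with the extra `void *ref` argument described above. */
void MarkBagRef(Bag bag, void *ref)
{
    MarkContext *ctx = ref;
    if (mark_object(ctx->ptls, bag))
        ctx->young_ref++;
}

/* A marking function for a bag with two sub-bags just forwards `ref`. */
void MarkTwoSubBagsRef(Bag first, Bag second, void *ref)
{
    MarkBagRef(first, ref);
    MarkBagRef(second, ref);
}
```

Because each GC thread would build its own `MarkContext` on its stack, no marking state is shared between threads, which is the point of the proposal.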

@simonbrandhorst

simonbrandhorst commented Apr 20, 2024

I just hit the same bug when installing Oscar and got a segmentation fault.
Since it will be a while until this is fixed, could we check the number of GC threads at startup and raise a useful error?

@fingolfin
Member

I made some progress on this today: it turns out I have to change the function ScanTaskStack in the GAP kernel from MarkFromList(jl_get_ptls_states(), stack); to MarkFromList(task->ptls, stack); because apparently jl_get_ptls_states is not safe to use in a GC thread.

I also added a mutex to protect access to task_stacks, and enabled REQUIRE_PRECISE_MARKING to completely eliminate a bunch of other global variables (which are problematic in multi-threading). With all these changes, using GAP works with multiple GC threads (in contrast to stock GAP.jl 0.11.0 using GAP 4.13.0, which immediately hard crashes, i.e., is worse than GAP.jl 0.10.0 with GAP 4.12.2).
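
Guarding a shared structure such as task_stacks with a mutex could be sketched in C as below. This is not the actual kernel change: the fixed-size array, the `task_stacks_lock` name, and the functions `register_task_stack` and `task_stack_count` are assumptions made for illustration, and the real task_stacks container may be organized differently.

```c
#include <pthread.h>
#include <stddef.h>

#define MAX_TASK_STACKS 64

/* Hypothetical stand-in for the kernel's task_stacks container. */
static void *task_stacks[MAX_TASK_STACKS];
static size_t n_task_stacks = 0;
static pthread_mutex_t task_stacks_lock = PTHREAD_MUTEX_INITIALIZER;

/* Record a stack once. Returns 1 if it was newly added, 0 if it was
 * already present or the table is full. */
int register_task_stack(void *stack)
{
    int added = 0;
    size_t i;

    pthread_mutex_lock(&task_stacks_lock);
    for (i = 0; i < n_task_stacks; i++) {
        if (task_stacks[i] == stack)
            break;
    }
    if (i == n_task_stacks && n_task_stacks < MAX_TASK_STACKS) {
        task_stacks[n_task_stacks++] = stack;
        added = 1;
    }
    pthread_mutex_unlock(&task_stacks_lock);
    return added;
}

/* Read the current count under the same lock, so that a concurrent
 * registration is never observed half-done. */
size_t task_stack_count(void)
{
    size_t n;

    pthread_mutex_lock(&task_stacks_lock);
    n = n_task_stacks;
    pthread_mutex_unlock(&task_stacks_lock);
    return n;
}
```

Every reader and writer takes the same lock, so several GC threads scanning task stacks at once cannot corrupt the table.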

Alas, there are still issues. So the next thing I did was to add a global GAP list: any allocation is added to that list, and any GC marks that list. This should prevent any GAP allocation from ever being garbage collected. (Clearly this is not useful for a production system, but it helps to exclude certain issues.)
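
The keep-everything-alive debugging trick described above can be illustrated with a plain C sketch. It uses malloc and a growable array rather than GAP bags and a GAP list, so all names (`keep_alive`, `debug_alloc`, `keep_alive_mark_all`) are hypothetical, and the sketch is single-threaded for brevity.

```c
#include <stdlib.h>

/* Global list of every object ever allocated (debugging aid only). */
static void **keep_alive = NULL;
static size_t keep_alive_len = 0;
static size_t keep_alive_cap = 0;

/* Append one object to the list. Returns 0 on success, -1 if out of memory. */
static int keep_alive_add(void *obj)
{
    if (keep_alive_len == keep_alive_cap) {
        size_t new_cap = keep_alive_cap ? keep_alive_cap * 2 : 16;
        void **grown = realloc(keep_alive, new_cap * sizeof *grown);
        if (grown == NULL)
            return -1;
        keep_alive = grown;
        keep_alive_cap = new_cap;
    }
    keep_alive[keep_alive_len++] = obj;
    return 0;
}

/* Allocation wrapper: every successful allocation is also recorded. */
void *debug_alloc(size_t size)
{
    void *obj = malloc(size);
    if (obj != NULL && keep_alive_add(obj) != 0) {
        free(obj);
        return NULL;
    }
    return obj;
}

/* Called once per collection: hand every recorded object to `mark`
 * (if given), so that nothing is ever freed. Returns the number of
 * recorded objects. */
size_t keep_alive_mark_all(void (*mark)(void *))
{
    for (size_t i = 0; i < keep_alive_len; i++) {
        if (mark != NULL)
            mark(keep_alive[i]);
    }
    return keep_alive_len;
}
```

If objects still appear to vanish with such a list in place, the list itself is ruling out ordinary missed roots as the cause, which is why the remaining errors are puzzling.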

But despite this, I am still seeing errors that are highly suggestive of GAP objects being GCed prematurely. Hrm.

@fingolfin
Member

This should be fixed in GAP.jl 0.11.2 and later.
