Errored during testing (received signal: KILL) #59

Open
bradcarman opened this issue Aug 16, 2022 · 12 comments

Comments

@bradcarman

bradcarman commented Aug 16, 2022

Is there a time limit that is invoked? I received the following error:

LoadError: Package <my package> errored during testing (received signal: KILL)

This happened 54 minutes into the job. I'm assuming I reached some kind of time limit? Is it possible to extend it?

@SaschaMann
Member

There's no time limit from the action specifically; you might have configured a timeout in the workflow. The default (and maximum?) is 6 hours.

I'm not entirely sure if I remember correctly, but this may also be caused by running out of memory. Is there a chance that might be it?

@simonbyrne

We've had this issue recently as well: https://github.com/CliMA/ClimaCore.jl/actions/runs/3661766725/jobs/6190294187#step:7:517

ERROR: LoadError: Package ClimaCore errored during testing (received signal: KILL)
Stacktrace:
 [1] pkgerror(msg::String)
   @ Pkg.Types /opt/hostedtoolcache/julia/1.8.3/x64/share/julia/stdlib/v1.8/Pkg/src/Types.jl:67
 [2] test(ctx::Pkg.Types.Context, pkgs::Vector{Pkg.Types.PackageSpec}; coverage::Bool, julia_args::Cmd, test_args::Cmd, test_fn::Nothing, force_latest_compatible_version::Bool, allow_earlier_backwards_compatible_versions::Bool, allow_reresolve::Bool)
   @ Pkg.Operations /opt/hostedtoolcache/julia/1.8.3/x64/share/julia/stdlib/v1.8/Pkg/src/Operations.jl:1813
 [3] test(ctx::Pkg.Types.Context, pkgs::Vector{Pkg.Types.PackageSpec}; coverage::Bool, test_fn::Nothing, julia_args::Vector{String}, test_args::Cmd, force_latest_compatible_version::Bool, allow_earlier_backwards_compatible_versions::Bool, allow_reresolve::Bool, kwargs::Base.Pairs{Symbol, IOContext{Base.PipeEndpoint}, Tuple{Symbol}, NamedTuple{(:io,), Tuple{IOContext{Base.PipeEndpoint}}}})
   @ Pkg.API /opt/hostedtoolcache/julia/1.8.3/x64/share/julia/stdlib/v1.8/Pkg/src/API.jl:434
 [4] test(pkgs::Vector{Pkg.Types.PackageSpec}; io::IOContext{Base.PipeEndpoint}, kwargs::Base.Pairs{Symbol, Any, Tuple{Symbol, Symbol, Symbol}, NamedTuple{(:coverage, :julia_args, :force_latest_compatible_version), Tuple{Bool, Vector{String}, Bool}}})
   @ Pkg.API /opt/hostedtoolcache/julia/1.8.3/x64/share/julia/stdlib/v1.8/Pkg/src/API.jl:156
 [5] test(; name::Nothing, uuid::Nothing, version::Nothing, url::Nothing, rev::Nothing, path::Nothing, mode::Pkg.Types.PackageMode, subdir::Nothing, kwargs::Base.Pairs{Symbol, Any, Tuple{Symbol, Symbol, Symbol}, NamedTuple{(:coverage, :julia_args, :force_latest_compatible_version), Tuple{Bool, Vector{String}, Bool}}})
   @ Pkg.API /opt/hostedtoolcache/julia/1.8.3/x64/share/julia/stdlib/v1.8/Pkg/src/API.jl:171
 [6] top-level scope
   @ ~/work/_actions/julia-actions/julia-runtest/v1/test_harness.jl:15
 [7] include(fname::String)
   @ Base.MainInclude ./client.jl:476
 [8] top-level scope
   @ none:1
in expression starting at /home/runner/work/_actions/julia-actions/julia-runtest/v1/test_harness.jl:7
Error: Process completed with exit code 1.

It's not timing out (this occurred at 36 minutes; we have a 60-minute limit). The tests seem to pass when switching to julia -e 'using Pkg; Pkg.test()':
https://github.com/CliMA/ClimaCore.jl/actions/runs/3680205377/jobs/6225541194

@DilumAluthge
Member

Maybe code coverage is the problem? By default, IIRC, this action will turn code coverage on.

@simonbyrne

It doesn't appear to be an OOM (that is usually exit code 137).

@simonbyrne

Ah, you're right, it does appear to be coverage related:
https://github.com/CliMA/ClimaCore.jl/actions/runs/3687762429/jobs/6241745116#step:7:480

@DilumAluthge
Member

This action has a coverage input. So you can provide coverage: false in the with: section.

@DilumAluthge
Member

@bradcarman Can you try disabling coverage, to see if that fixes your original issue?

@simonbyrne

Ah, ok: I think what is happening is that the new process Pkg.test launches is being OOM-killed (and doesn't produce a stacktrace, because SIGKILL won't let it exit gracefully). This process presumably exits with 137 (128 + 9). The outer process just sees this as a test failure, and so exits with error code 1.
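
For illustration, the exit-status arithmetic described here can be reproduced with a small shell sketch. It assumes a bash- or dash-like sh, where a child killed by signal N is reported as status 128 + N. The self-killing child and the run_tests wrapper are hypothetical stand-ins for the OOM-killed test process and the outer process; they are not the action's actual code.

```shell
# Stand-in for the inner test process: it sends SIGKILL to itself,
# so it dies the same way an OOM-killed process would.
inner_status=0
sh -c 'kill -KILL $$' || inner_status=$?
echo "inner status: $inner_status"

# Hypothetical stand-in for the outer process: it only checks whether the
# inner process succeeded, and reports a plain failure (1) otherwise.
run_tests() {
  sh -c 'kill -KILL $$' || return 1
}
outer_status=0
run_tests || outer_status=$?
echo "outer status: $outer_status"
```

On Linux with bash or dash, this should report 137 for the inner status and 1 for the outer status, which matches the pattern seen in the logs: the signal information is lost one level up.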

@SaschaMann
Member

To confirm this, could you manually run, as a workflow step, the same process that Pkg.test launches?

If it's an OOM issue, I'm not sure there's anything we can do in the action to prevent it, though.

@simonbyrne

I was able to confirm it by running Pkg.test() under a memory-restricted cgroup, and saw the same error.

You're right, we probably can't do much to prevent it, but ideally we would be able to give a more informative error message?
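
As a rough sketch of what a more informative message could look like, a wrapper could inspect the exit status of the test process and mention a possible out-of-memory kill when it sees SIGKILL. The describe_status helper below is hypothetical and is not part of the action. It assumes a bash- or dash-like shell, where statuses above 128 indicate death by signal and `kill -l <status>` maps such a status to a signal name. SIGKILL can also come from other sources, so the message only says "possibly".

```shell
# Hypothetical helper: turn an exit status into a more descriptive message.
describe_status() {
  code=$1
  if [ "$code" -eq 0 ]; then
    echo "tests passed"
  elif [ "$code" -gt 128 ]; then
    # `kill -l <status>` prints the name of the signal behind the status.
    sig=$(kill -l "$code" 2>/dev/null || echo "unknown")
    if [ "$sig" = "KILL" ]; then
      echo "tests killed by SIGKILL (status $code); possibly out of memory"
    else
      echo "tests terminated by signal $sig (status $code)"
    fi
  else
    echo "tests failed with exit status $code"
  fi
}

describe_status 137
describe_status 1
```

This only helps where the signal-derived status is still visible, that is, in whatever process directly waits on the killed test process.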

@simonbyrne

One option might be to use the eventfd: https://docs.kernel.org/admin-guide/cgroup-v1/memory.html#oom-control

@simonbyrne

In our case, enabling coverage does make it a lot worse.
