This repository has been archived by the owner on Oct 31, 2023. It is now read-only.

Stop Golem from crashing on benchmark failure #4449

Merged: mplebanski merged 5 commits into b0.20 from mplebanski-patch-benchmark-crashing on Jul 10, 2019


mplebanski
Contributor

The rationale is that Golem should not crash when a benchmark fails. Ideally it would disable the app for that particular run. Since that is not straightforward to add, this PR instead sets the app's performance to 0, which should at least stabilize startup behavior.
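
The fallback described here could look roughly like the following minimal sketch. The names `run_benchmark_safely` and `FAILED_PERFORMANCE`, and the callable-based interface, are hypothetical and not taken from Golem's code base; the only thing borrowed from this PR is the idea of reporting a performance of 0 instead of letting the exception propagate.

```python
import logging
from typing import Callable

logger = logging.getLogger(__name__)

# Hypothetical constant: the score recorded when a benchmark cannot complete.
FAILED_PERFORMANCE = 0.0


def run_benchmark_safely(name: str, benchmark: Callable[[], float]) -> float:
    """Run a benchmark and return its score, or 0 if it raises."""
    try:
        return float(benchmark())
    except Exception:  # deliberately broad: no benchmark failure may crash startup
        logger.exception("Benchmark %r failed; setting performance to 0", name)
        return FAILED_PERFORMANCE
```

A score of 0 keeps the failure visible to whatever consumes the performance value while the rest of startup carries on.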

The context is gVisor, GLambda's secure runtime for Docker, which crashes randomly because of permission problems on /sys/fs/cgroup/cpuset/docker/cpuset.cpus. We are probably hitting the issue described here. In my trials, one way to force such a cpuset mask refresh is, for example, to run docker run -it --cpuset-cpus=0 some_image, although I'm not sure it always works.

@mplebanski
Contributor Author

A command-line fix for the crashing GLambda app:

# cat /sys/fs/cgroup/cpuset/cpuset.cpus > /sys/fs/cgroup/cpuset/docker/cpuset.cpus
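
For anyone who would rather apply the same fix from Python (for example from an install script), a rough equivalent of the command above might look like this. The function name `sync_cpuset` is hypothetical; the two default paths are the ones from the command, and writing to them needs the same root privileges that the `#` prompt implies.

```python
from pathlib import Path

# Paths taken from the shell command above (cgroup v1 layout).
PARENT_CPUS = Path("/sys/fs/cgroup/cpuset/cpuset.cpus")
DOCKER_CPUS = Path("/sys/fs/cgroup/cpuset/docker/cpuset.cpus")


def sync_cpuset(source: Path = PARENT_CPUS, target: Path = DOCKER_CPUS) -> str:
    """Copy the CPU mask from `source` into `target` and return the mask."""
    mask = source.read_text().strip()
    # Like the shell redirect, this overwrites the target's current mask.
    target.write_text(mask + "\n")
    return mask
```

Passing the paths as parameters makes it possible to try the function on ordinary files before pointing it at the real cgroup hierarchy.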

@mfranciszkiewicz
Contributor

mfranciszkiewicz commented Jul 10, 2019

@mplebanski Maybe it would be a good idea to update the Docker cpuset during installation?

We should also disable the environment. That could be done with pydispatcher if it's difficult to reach the EnvironmentsManager instance.
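
A minimal sketch of that signal-based approach, assuming a tiny in-process dispatcher as a stand-in for pydispatcher (whose API is not reproduced here). `EnvironmentsRegistry`, the signal name `benchmark.failed`, and the `env_id` argument are all hypothetical and only illustrate how the benchmark code could ask for an environment to be disabled without holding a reference to the manager.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List

# Stand-in signal registry; in Golem this role would be played by pydispatcher.
_receivers: DefaultDict[str, List[Callable[..., None]]] = defaultdict(list)


def connect(receiver: Callable[..., None], signal: str) -> None:
    """Register `receiver` to be called whenever `signal` is sent."""
    _receivers[signal].append(receiver)


def send(signal: str, **named) -> None:
    """Call every receiver registered for `signal` with the given keywords."""
    for receiver in list(_receivers[signal]):
        receiver(**named)


class EnvironmentsRegistry:
    """Hypothetical stand-in for the EnvironmentsManager instance."""

    def __init__(self) -> None:
        self.enabled: Dict[str, bool] = {}
        connect(self.on_benchmark_failed, signal="benchmark.failed")

    def on_benchmark_failed(self, env_id: str) -> None:
        # Disable the environment whose benchmark failed.
        self.enabled[env_id] = False
```

The benchmark code would then call `send("benchmark.failed", env_id=...)` from its failure path, and the registry reacts without either side importing the other.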

@mplebanski mplebanski force-pushed the mplebanski-patch-benchmark-crashing branch from 349ff5e to 41b7608 Compare July 10, 2019 12:04
Contributor

@shadeofblue shadeofblue left a comment

as far as I could tell, looks okay :)

@codecov

codecov bot commented Jul 10, 2019

Codecov Report

Merging #4449 into b0.20 will increase coverage by 0.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##            b0.20    #4449      +/-   ##
==========================================
+ Coverage   88.32%   88.33%   +0.01%     
==========================================
  Files         223      223              
  Lines       19783    19784       +1     
==========================================
+ Hits        17474    17477       +3     
+ Misses       2309     2307       -2

@mplebanski
Contributor Author

I moved the check to a separate method and added handling for FileNotFoundError.

@mplebanski mplebanski force-pushed the mplebanski-patch-benchmark-crashing branch from 2b8482d to b2a74a6 Compare July 10, 2019 14:47
@mplebanski mplebanski merged commit f45c31a into b0.20 Jul 10, 2019
@mplebanski mplebanski deleted the mplebanski-patch-benchmark-crashing branch July 10, 2019 15:23
3 participants