[feature] Running each job phase as a child of a "job leader" process #78

Open
cipriancraciun opened this issue Dec 14, 2018 · 6 comments

@cipriancraciun
commented Dec 14, 2018

[This is a feature request for which I am willing to provide a pull request, if deemed useful.]

At the moment a laminar job is composed of multiple phases (init / before / run / after, etc.), and each of these commands is started as a direct child of the laminard process.

However, this causes a few "minor issues" (inconveniences, really):

  • debug-ability -- say I have 20 builds of the same job "type": I can't easily tell which process belongs to build 15 or build 16 (I can't even seem to find the job PID in the web UI);
  • accountability -- one can't easily measure from the OS the resources used by a particular job run, as there is no stable "common ancestor process" (except laminard itself);
  • job suspension -- somewhat solving #68 -- with a "parent job process" I can use a tool like htop or pgrep / pkill to find and pause an entire process tree;
  • double-forking processes (or run-away processes, especially when dealing with make) -- if a process double-forks, it ends up as a child of PID 1, and it is quite hard to find and "stop" such processes;

What I am proposing is that, for each job, laminard forks itself and runs all the other steps as children of this "job leader".

Thus a ps axf might look like this:

laminard
+ [laminar-job] whatever:23
|  +-- make
|    +- ...
+ [laminar-job] whatever:24
| ...

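Then, by calling prctl with PR_SET_CHILD_SUBREAPER in the job leader, one can solve all of the above issues: orphaned descendants (e.g. double-forked processes) are reparented to the job leader instead of PID 1, so the whole job stays under a single process tree.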
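
A minimal sketch of such a job leader, under stated assumptions: `spawn_job_leader` and `run_phase` are hypothetical names (not laminar's API), each phase is given as an argv vector, and the leader stops at the first failing phase. It only illustrates the fork + `PR_SET_CHILD_SUBREAPER` + reap pattern, not how laminard would actually integrate it.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/prctl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run one phase as a direct child of the leader.
 * Returns its exit status (128 + signal if killed), or -1 on error. */
static int run_phase(char *const argv[])
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        execvp(argv[0], argv);
        _exit(127);                      /* exec failed */
    }
    int status;
    while (waitpid(pid, &status, 0) < 0)
        if (errno != EINTR)
            return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : 128 + WTERMSIG(status);
}

/* Fork a job leader that runs the phases in order (hypothetical helper).
 * Returns the leader's PID in the caller, or -1 if fork failed. */
pid_t spawn_job_leader(char *const *phases[], size_t nphases)
{
    pid_t leader = fork();
    if (leader != 0)
        return leader;                   /* caller: leader PID, or -1 */

    /* From here on we are the job leader. Orphaned descendants are
     * now reparented to us instead of PID 1. */
    if (prctl(PR_SET_CHILD_SUBREAPER, 1, 0, 0, 0) != 0)
        _exit(125);

    int rc = 0;
    for (size_t i = 0; i < nphases && rc == 0; i++)
        rc = run_phase(phases[i]);

    /* Reap everything still attached to this job, including adopted
     * orphans; wait() fails with ECHILD once nothing is left. */
    while (wait(NULL) > 0 || errno == EINTR)
        ;

    _exit(rc < 0 ? 126 : rc);
}
```

With this shape, the parent only needs `waitpid()` on the returned PID to learn the outcome of the whole job. Whether the leader should wait for leftover descendants (as here) or kill them is a policy choice this sketch does not settle.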

Moreover, this opens the path for some other enhancements like nice, CPU affinity, etc. (I understand these can already be achieved, just not as easily.)
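
As an illustration of why a job leader makes this easy: the nice value and the CPU affinity mask are inherited across `fork()` and preserved across `execve()`, so setting them once in the leader covers every phase. The sketch below assumes a hypothetical helper `apply_job_limits` that the leader would call before starting any phase; pinning to a single CPU is only an example policy.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE                      /* for sched_setaffinity, CPU_* */
#endif
#include <sched.h>
#include <sys/resource.h>

/* Lower the priority of the calling process and pin it to one CPU
 * (hypothetical helper). Children created afterwards inherit both
 * settings. Returns 0 on success, -1 on error. */
int apply_job_limits(int niceness, int cpu)
{
    if (setpriority(PRIO_PROCESS, 0, niceness) != 0)
        return -1;

    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return sched_setaffinity(0, sizeof set, &set);
}
```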

@ohwgiles

Owner

commented Dec 14, 2018

I think this is a totally awesome idea, and I'd definitely welcome a pull request. In terms of design, it could also bring looser coupling: the "main" laminard could be simplified to only know how to wait on one process (the job leader) per job and handle its standard output, while the work of actually running the appropriate scripts serially could be fully delegated to the Run class or some other logical encapsulation. At the moment that work is rather untidily split between the Run and Laminar classes.
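
A rough sketch of what "wait on one process per job and handle its standard output" could reduce to. Assumptions: `supervise_job` is a hypothetical name, the job leader is stood in for by an arbitrary command (the real leader would be a fork of laminard), and the blocking read loop replaces whatever event loop laminard actually uses.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Write the whole buffer, retrying on short writes and EINTR. */
static int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Start one leader process, copy its stdout/stderr to log_fd, and
 * return its exit status (128 + signal if killed), or -1 on error. */
int supervise_job(char *const leader_argv[], int log_fd)
{
    int fds[2];
    if (pipe(fds) != 0)
        return -1;

    pid_t leader = fork();
    if (leader < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (leader == 0) {
        close(fds[0]);
        if (dup2(fds[1], STDOUT_FILENO) < 0 || dup2(fds[1], STDERR_FILENO) < 0)
            _exit(126);
        if (fds[1] > STDERR_FILENO)
            close(fds[1]);
        execvp(leader_argv[0], leader_argv);
        _exit(127);                      /* exec failed */
    }

    close(fds[1]);
    char buf[4096];
    for (;;) {
        ssize_t n = read(fds[0], buf, sizeof buf);
        if (n > 0) {
            if (write_all(log_fd, buf, (size_t)n) != 0)
                break;
        } else if (n == 0 || errno != EINTR) {
            break;                       /* EOF or a real error */
        }
    }
    close(fds[0]);

    int status;
    while (waitpid(leader, &status, 0) < 0)
        if (errno != EINTR)
            return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : 128 + WTERMSIG(status);
}
```

Because every descendant inherits the write end of the pipe, end-of-file only arrives once the whole process tree has closed it, which matches the one-process-per-job view described above.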

@cipriancraciun

Author

commented Dec 14, 2018

One small question: what operating systems are you targeting? (I.e. Linux only, or Linux+BSD+etc.)

@ohwgiles

Owner

commented Dec 14, 2018

Just Linux, so far anyway.

@cipriancraciun

Author

commented Dec 14, 2018

> Just Linux, so far anyway.

Perfect (although this is not so "perfect" for non-Linux people), because prctl is a Linux-only feature. :)

Then when the time comes, the Linux-specific syscall can be compiled out.
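
One way the compile-time guard might look, as a sketch only: `become_subreaper` is a hypothetical wrapper, and on platforms without the call it simply reports that the feature is unavailable, so orphans keep being reparented to PID 1 as they are today.

```c
#include <stdbool.h>
#ifdef __linux__
#include <sys/prctl.h>
#endif

/* Try to make the calling process a child subreaper.
 * Returns true on success, false if unsupported or if the call failed. */
bool become_subreaper(void)
{
#if defined(__linux__) && defined(PR_SET_CHILD_SUBREAPER)
    return prctl(PR_SET_CHILD_SUBREAPER, 1, 0, 0, 0) == 0;
#else
    return false;                        /* compiled out on other systems */
#endif
}
```

Callers can treat a `false` result as "best effort": the job still runs, it just loses the guarantee that double-forked processes stay under the leader.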

@ohwgiles

Owner

commented Mar 29, 2019

Hi @cipriancraciun, how is this looking? Can I help somehow?

@cipriancraciun

Author

commented Apr 2, 2019

> Hi @cipriancraciun, how is this looking? Can I help somehow?

I haven't managed to work on this. (Unfortunately I haven't ended up using laminar yet, so I haven't had an opportunity to focus on it.)
