Does it support multiple users sharing one queue? #5

Open
shaoyucheng opened this issue Aug 19, 2021 · 33 comments
Labels
enhancement New feature or request

Comments

@shaoyucheng

If so, how can I set it up? Thanks.

@justanhduc
Owner

Hi @shaoyucheng. No, multiple users cannot share a queue. Each user will create their own server based on their UID.

@shaoyucheng
Author

shaoyucheng commented Aug 20, 2021

Got it. I think it would be a good feature, one that would make your project something like an enhanced version of the atd service.

@justanhduc
Owner

Sounds like an interesting feature. I will keep this issue open for updates of this feature.

@justanhduc justanhduc added the enhancement New feature or request label Aug 23, 2021
@justanhduc justanhduc added this to To do in Global Task Spooler Nov 9, 2021
@wolfram77

I need this too for our shared Volta GPU server.

@wolfram77

It seems I was able to set up a shared queue with $TS_SOCKET, as mentioned in TRICKS. Thanks for making task-spooler.

@justanhduc
Owner

Hi @wolfram77. Yes, sharing the server file can be a quick and dirty way to share the queue, but be aware that it has a lot of limitations because jobs are not tied to any user (for example, -C will erase all your colleagues' queues, and -K can be invoked by anyone).

@wolfram77

@justanhduc While trying it out yesterday, I saw that -K deletes the socket file, so I had to chmod it again. It shouldn't be a problem, but I have now put a note about it in the help text on the server.

@fearedspark

fearedspark commented Sep 1, 2022

I too would be interested in the multi-user mode, even if it means all users can kill tasks from anyone.

@justanhduc
Owner

Hey @fearedspark. Thanks for your interest. In fact, there is a working prototype in the branch global. However, there's an ambiguity in setting the number of slots. Should we use the same number of slots for all users, or different ones? What is the proper number? Or is it something the users have to agree on among themselves? I have not been able to come up with a good solution, so any suggestion is welcome.

@fearedspark

Well, I will describe how I manage it on our machine, and maybe it will provide some insight.
I have it configured with as many slots as there are threads on the machine. A user starting a task sets the number of slots it takes based on the number of threads it can use. It would be nice to have a configurable default slot count, so that when a user doesn't give a number of slots, it defaults to the max.
Each user is then free to use as many slots as they want. This works well only if all users behave properly, which is the case for us. It could be a good idea to have a maximum number of slots allowed per user, defaulting to the max number of slots.
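
The slot policy described here could be sketched as a small wrapper around ts. This is only an illustration under assumptions: the names total_slots, clamp_slots, submit_job and MAX_SLOTS_PER_JOB are hypothetical and not part of ts, and the cap is applied per job by the wrapper, which only approximates a true per-user limit. The -N flag (slots required by a job) is the one used elsewhere in this thread.

```shell
#!/bin/bash
# Hypothetical wrapper functions; none of these are provided by ts.

# Total slots: one per hardware thread, as in the setup described above.
total_slots() {
    nproc
}

# clamp_slots REQUESTED MAX: print REQUESTED, capped at MAX.
# An empty REQUESTED defaults to MAX.
clamp_slots() {
    local requested=${1:-$2} max=$2
    if [ "$requested" -gt "$max" ]; then
        echo "$max"
    else
        echo "$requested"
    fi
}

# submit_job SLOTS COMMAND...: queue COMMAND with a capped slot count.
# MAX_SLOTS_PER_JOB is an assumed site-wide setting, defaulting to all slots.
submit_job() {
    local slots
    slots=$(clamp_slots "$1" "${MAX_SLOTS_PER_JOB:-$(total_slots)}")
    shift
    ts -N "$slots" "$@"
}
```

A call such as submit_job 8 ./a.out would then queue the job with at most 8 slots, or fewer if the cap is lower.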

@justanhduc
Owner

Hey @fearedspark. Yeah, basically we still have to depend on the kindness of other users 😅. I will look at the prototype again and see whether I can make it stable. Thanks a lot for the suggestions!

@kylincaster

kylincaster commented Oct 28, 2022

Dear all,
I have already developed a multi-user version, for the CPU-only build only, at task-spooler.
If you find it interesting or useful, maybe we could try to merge it back.
However, I am not an expert on Linux, so there is still much room for improvement and there are bugs to fix.
Cheers

@justanhduc
Owner

Hey @kylincaster. Awesome! Would you mind sending a PR? I will try to review it and we can discuss more how to improve from there.

@kylincaster

I just submitted the PR. You could give it a try, @justanhduc.

@justanhduc
Owner

Hey @kylincaster. You made the PR in your own fork. Could you please open it again here?

@kylincaster

OK, done, with full details about the features and bugs in my work.

@kylincaster

@justanhduc I found that to control a task precisely, the PIDs of all its subprocesses need to be known in advance.
So I use a bash script to control the running state of the task.
Porting the bash script to C would be hard work.

@justanhduc
Owner

Hi @kylincaster. Sorry for the late reply. What do you mean by "precise control"? What is your use case for which -p is not enough?

@kylincaster

Hi @justanhduc, I mean pausing or killing a process through ts. Not only the process itself but also all of its subprocesses should be handled, so recursive code is needed to find the PIDs of all subprocesses.

@justanhduc
Owner

Hi @kylincaster. To kill or pause a process and its children, can we just simply send the signal to the whole process group like the memo here? Or is there anything I missed?
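
Sending a signal to a whole process group can be sketched as follows. This is a minimal illustration, not code from ts: the function names pgid_of and signal_group are hypothetical, and it assumes a procps-style ps that supports -o pgid=. It only reaches processes that stay in the job's process group, so a launcher that moves its children into another group would not be covered.

```shell
#!/bin/bash
# pgid_of PID: print the process group ID of PID (empty if PID does not exist).
pgid_of() {
    ps -o pgid= -p "$1" 2>/dev/null | tr -d ' '
}

# signal_group PID SIGNAL: send SIGNAL to every process in PID's process group.
signal_group() {
    local pgid
    pgid=$(pgid_of "$1")
    if [ -z "$pgid" ]; then
        echo "no such process: $1" >&2
        return 1
    fi
    # A negative PID argument addresses the whole process group.
    kill -s "$2" -- "-$pgid"
}
```

With the -p option mentioned above, pausing a job's group would look like signal_group "$(ts -p <jobid>)" STOP, and resuming it like signal_group "$(ts -p <jobid>)" CONT.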

@kylincaster

Hi @justanhduc. I did try killing the process directly. Unfortunately, the stop signal does not work for a task with subprocesses. The following is an example script that cannot be paused with the kill -stop -- -XXX command:

#!/bin/bash
# loop.sh: append a timestamped line to log.txt once per second

for i in {2..1000}
do
        dt=$(date)
        echo "output: ${dt} $i" >> log.txt
        sleep 1
done

When it is submitted with the ts command ts mpirun -np 1 loop.sh,
only the parent process mpirun is paused, not the bash subprocess.

@justanhduc
Owner

justanhduc commented Nov 17, 2022

Hey @kylincaster. According to the documentation of mpirun 2.1.1 on Ubuntu 18.04, mpirun only propagates a selected set of signals. When dealing with a program like mpirun, in my opinion, ts has no authority to manipulate the subprocesses it creates because, well, that would defeat the purpose of such a program.

As for your specific problem, be sure to check your Ubuntu version and mpirun version. On 18.04 with mpirun 2.1.1, as in my setup, I can successfully stop and continue the job with the following commands:

ts mpirun --mca orte_forward_job_control 1 -np 1 toy.sh
kill -20 $(ts -p <jobid>)  # stop the MPI process (20 is SIGTSTP on Linux). Note that SIGSTOP does not work, per the documentation
kill -18 $(ts -p <jobid>)  # continue (18 is SIGCONT on Linux)
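
The numbers 20 and 18 map to SIGTSTP and SIGCONT on common Linux platforms, but signal numbers are not the same on every system, so the same commands can be sketched with signal names instead. The helper names pause_pid, resume_pid and sig_number below are hypothetical, and sig_number relies on the bash builtin kill.

```shell
#!/bin/bash
# pause_pid PID: ask PID to stop with SIGTSTP (the signal mpirun forwards above).
pause_pid() {
    kill -s TSTP "$1"
}

# resume_pid PID: let a stopped PID continue with SIGCONT.
resume_pid() {
    kill -s CONT "$1"
}

# sig_number NAME: print the number this system assigns to signal NAME.
sig_number() {
    kill -l "$1"
}
```

Usage would then be pause_pid "$(ts -p <jobid>)" followed later by resume_pid "$(ts -p <jobid>)".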

PS: Our discussion about sending signals seems to be outside the scope of this issue, so if you still have any problems, it's better to open another ticket and we can continue there.

@kylincaster

Thanks, @justanhduc, for the comments on the behavior of mpirun.
Unfortunately, it depends on the MPI implementation. Intel MPI processes did not forward such signals.
So my solution to this problem is the following bash script, which is called from inside task-spooler.

#!/bin/bash
# Usage: <script> PID [SIGNAL]
# Lists PID and all of its descendants, or sends SIGNAL to each of them.

# Print the direct children of every PID passed as an argument.
get_child() {
    local IFS=,
    pgrep -P "$*"
}

# Print all descendants of PID $1, level by level, followed by $1 itself.
get_children() {
    local ret children=
    ret=$(get_child "$1")
    while [ -n "$ret" ]; do
        children+="$ret "
        # Unquoted on purpose: each PID becomes its own argument.
        ret=$(get_child $ret)
    done

    echo "${children}$1"
}

if [ 1 -gt $# ]
then
    echo "no input PID"
    exit 1
fi

owner=$(ps -o user= -p "$1")
if [ -z "$owner" ]
then
    # echo "not a valid PID"
    exit 1
fi
pids=$(get_children "$1")

user=$(whoami)

# Use sudo when the process belongs to another user.
extra=""
if [[ "$owner" != "$user" ]]; then
    extra="sudo"
fi

for pid in ${pids}
do
    if [ -z "$2" ]
    then
        # No signal given: just print what would be targeted.
        echo "${extra} ${pid}"
    else
        ${extra} kill -s "$2" "${pid}"
    fi
done

@sadikyalcin

> It seems i was able to setup a shared queue with $TS_SOCKET, as mentioned in TRICKS. Thanks for making task-spooler.

Can you share details on how you got this set up? I've defined a socket but still can't see anything from other users... @justanhduc, would you be able to help with this?

@wolfram77

I set TS_SOCKET=/tmp/ts.socket in /etc/environment and ran chmod 777 "$TS_SOCKET".
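
That setup can be sketched as a small helper. This is an illustration only: the function name share_ts_socket is hypothetical, and it assumes the socket file already exists because a ts server has been started on that path. As noted earlier in the thread, -K deletes the socket file, so the chmod has to be repeated after the server is restarted.

```shell
#!/bin/bash
# share_ts_socket [PATH]: export TS_SOCKET and open the socket file to all users.
# PATH defaults to /tmp/ts.socket, the location used in this thread.
share_ts_socket() {
    export TS_SOCKET="${1:-/tmp/ts.socket}"
    # The socket file only exists once a ts server has been started on it.
    if [ -e "$TS_SOCKET" ]; then
        chmod 777 "$TS_SOCKET"
    else
        echo "no socket at $TS_SOCKET yet; start ts once, then run this again" >&2
        return 1
    fi
}
```

Mode 777 lets every local user submit, clear and kill jobs on the shared queue, which is the trade-off already discussed above.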

@sadikyalcin

Thanks. I was calling tsp from a bash script, and it turns out the environment variable wasn't visible to the script by default.
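
One way to make a script independent of how it was launched is to load the variable explicitly. The sketch below is an assumption-laden illustration: /etc/environment is typically applied at login, so a script started by a web server or cron may not inherit it, and the function name load_ts_socket is hypothetical.

```shell
#!/bin/bash
# load_ts_socket [FILE]: export TS_SOCKET from a KEY=VALUE file such as
# /etc/environment, unless it is already set in the environment.
load_ts_socket() {
    local file=${1:-/etc/environment} value
    if [ -n "${TS_SOCKET:-}" ]; then
        return 0
    fi
    # Take the last TS_SOCKET= line and strip optional surrounding double quotes.
    value=$(sed -n 's/^TS_SOCKET=//p' "$file" 2>/dev/null | tail -n 1)
    value=${value#\"}
    value=${value%\"}
    if [ -z "$value" ]; then
        return 1
    fi
    export TS_SOCKET="$value"
}
```

Calling load_ts_socket at the top of the script, before any ts command, would then point the client at the shared socket.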

What about your logs though? I've got the shared queue working but still can't access logs from tasks queued from other users.

@wolfram77

Is it tsp? I am able to see the tasks queued by other users with ts or ts -l. I store the program output with a pipe like stdbuf --output=L ts -nf -N 32 ./a.out | tee -a "a.log" from a script. Are you interested in the program output of other users?

@sadikyalcin

I run a Node process, which can take an arbitrary duration {x} and prints progress and results. For example, the job below finished with an error and was run by the web server, but /tmp/ts-out.1LkaYj doesn't exist for me. I run Apache and ssh into the server as the same user (ubuntu).

52 finished /tmp/ts-out.1LkaYj 1 84.95/1.43/0.16 {my_command}

@wolfram77

Could you try redirecting both stdout and stderr to a file? If that does not work for you, @justanhduc may be able to help you.

@justanhduc
Owner

justanhduc commented Aug 2, 2023

Hi @sadikyalcin @wolfram77. First of all, tsp is the original version, not the one in this fork. Please uninstall it using apt and install the one here using make cpu. If the same problem happens, could you verify that you have write permission in /tmp? Also, why is the ts.socket file not in /tmp?

@justanhduc
Owner

Also, if you want a proper multi-user task spooler, the fork of @kylincaster is probably a better choice.

@kylincaster

Dear all,

If anyone is looking for a multi-queue task manager, you are welcome to try my fork at kylincaster/task-spooler-PLUS. It has been enhanced with numerous useful features, including multiple user support, fatal crash recovery, and processor allocation and binding.

Best regards,
Kylin


6 participants