Magic
qstat displays basic information about the currently running (R), queued (Q), and recently completed (C) jobs. The most basic usage of qstat is:
[user@magic]$ qstat
This displays all current and recently completed jobs. If a number of people are using the cluster, this is likely too much information. To restrict the output to a single user's jobs, use:
[user@magic]$ qstat -au <username>
The output then looks like:
[[/images/qstat.png]]
The columns of this output show:
* Job ID - When communicating about or querying a specific job, it is possible to use either the numeric job identifier (3574202) or the fully qualified name (3574204.systemimager.magic).
* Username - The user who initiated the job.
* Queue - The queue in which the job is waiting or running.
* Jobname - The user defined jobname.
* SessID - A system generated identifier.
* NDS - The number of nodes requested.
* Req'd TSK - The total number of cores requested as computed by
* ##### qselect
qselect is a convenient way to get a listing of Job IDs based on some criteria. For example, if I want all the jobs for a specific user I can run:
[user@magic]$ qselect -u jlaura
This provides a listing of fully qualified Job IDs. It is also possible to get only the jobs with a specific status using:
[user@magic]$ qselect -u jlaura -s <Status Code>
For example, if I want to know the Job IDs for all currently running jobs for my user I would use:
[user@magic]$ qselect -u jlaura -s R
The ability to get a listing of Job IDs without additional processing information is useful both for Bash scripting (see qdel below) and for parsing within a Python script.
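As a minimal sketch of the Python-side parsing mentioned above, the helpers below build the qselect command, split its output into Job IDs, and strip a fully qualified ID down to its numeric part. The function names are hypothetical, and the sketch assumes that qselect prints one fully qualified Job ID per line.

```python
import subprocess


def qselect_command(user, status=None):
    """Build the qselect argument list for a user and an optional status code."""
    cmd = ["qselect", "-u", user]
    if status is not None:
        cmd += ["-s", status]
    return cmd


def parse_job_ids(output):
    """Split qselect output into Job IDs (assumes one ID per line)."""
    return [line.strip() for line in output.splitlines() if line.strip()]


def numeric_id(job_id):
    """Return the numeric part of a fully qualified Job ID."""
    return job_id.split(".", 1)[0]


def select_job_ids(user, status=None):
    """Run qselect and return the Job IDs (only works where qselect is installed)."""
    result = subprocess.run(
        qselect_command(user, status),
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_job_ids(result.stdout)
```

On the cluster, `select_job_ids("jlaura", "R")` would correspond to the running-jobs query shown above; the parsing helpers can be exercised anywhere because they only operate on strings.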
* ##### qdel
qdel is used to delete queued or held jobs. The most basic usage, to delete a single job, is:
[user@magic]$ qdel <Job ID>
This removes the job from the queue and sets the job status to completed (C).
When running a larger job it may be possible that a log file is being incorrectly written, or that the first jobs to be run take significantly longer than expected. In this case it is convenient to be able to delete all the queued jobs using a single command. qselect and xargs can be used in conjunction with qdel to do this:
[user@magic]$ qselect -u jlaura -s Q | xargs qdel
Here, qselect selects the jobs, -u flags that a single user's jobs are to be listed, and -s flags that only jobs with a specific status are to be listed (queued (Q) in this case). The output of this select statement is then piped to qdel using xargs.
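The same pattern can be driven from a Python script. The following is a sketch under assumptions, not a tested cluster tool: `delete_jobs` and its `runner` parameter are hypothetical names, and the Job IDs are assumed to arrive as a list of strings, for example collected from qselect output. The `runner` argument exists only so the function can be tried without touching the scheduler.

```python
import subprocess


def delete_jobs(job_ids, runner=subprocess.run):
    """Pass all Job IDs to a single qdel call, much as xargs does for a short list.

    Returns the command that was run, or None when there was nothing to delete.
    """
    job_ids = list(job_ids)
    if not job_ids:
        # Nothing selected: do not call qdel with an empty argument list.
        return None
    cmd = ["qdel"] + job_ids
    runner(cmd, check=True)
    return cmd
```

Passing a stand-in `runner` that only records the command is a cheap way to check what would be deleted before running it for real.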
For full documentation see the cluster resources manual page for qdel.
* ##### qrun
qrun can be used to force a queued job to run. On Magic I have been seeing jobs sit queued when all resources are available. It is unclear whether this is due to how the Python launching script is working, the way that mpi4py and the scheduler are interacting to lock and free resources, or some other issue. If resources are available and a job is sitting queued for a long period of time, it can be manually run using:
[user@magic]$ qrun <Job ID>
For full documentation see the cluster resources manual page for qrun.
* ##### qsig
qsig can be used to send Linux-style signals to the processes of a job. This is useful if a job has hung and should be killed. qsig usage is:
[user@magic]$ qsig -s <SIGNAL CODE> <Job Id>
For example, if Job ID 3574204.systemimager.magic is not responding I can send a KILL (SIGKILL / 9) to the processes using:
[user@magic]$ qsig -s KILL 3574204.systemimager.magic
This should terminate the job.
Alternatively, if the job is hung and you might normally kill it with Ctrl-C, you can send a SIGINT to interrupt it. For example:
[user@magic]$ qsig -s SIGINT 3574204.systemimager.magic
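When qsig is called from a script, it can help to check the signal name before building the command. The sketch below uses a hypothetical helper, `qsig_command`, that accepts the name with or without the SIG prefix, as in the two examples above. It validates against the signals known to the local Python installation, which is assumed (not guaranteed) to match the signals available on the cluster nodes.

```python
import signal


def qsig_command(job_id, sig="SIGINT"):
    """Build a qsig argument list after checking the signal name.

    Accepts names with or without the SIG prefix ("KILL" or "SIGKILL").
    """
    name = sig.upper()
    if not name.startswith("SIG"):
        name = "SIG" + name
    if name not in signal.Signals.__members__:
        raise ValueError(f"unknown signal: {sig}")
    return ["qsig", "-s", name, job_id]
```

The returned list can be handed to `subprocess.run` on the cluster; a misspelled signal name fails early with a `ValueError` instead of reaching the scheduler.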
For full documentation see the cluster resources manual page for qsig.
* ##### qalter
qalter allows the user to alter queued jobs. This is useful when the requested walltime is less than the total expected walltime. For example:
[user@magic]$ qalter -l walltime=<HH:MM:SS> <Job ID>
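The walltime value is given as hours, minutes, and seconds. As a small illustration, the hypothetical helpers below convert a duration in seconds into that form and build the corresponding qalter argument list; the names `format_walltime` and `qalter_walltime_command` are assumptions made for this sketch.

```python
def format_walltime(seconds):
    """Format a duration in seconds as an HH:MM:SS walltime string."""
    if seconds < 0:
        raise ValueError("walltime must be non-negative")
    hours, remainder = divmod(int(seconds), 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def qalter_walltime_command(job_id, seconds):
    """Build the qalter argument list that sets a new walltime for a job."""
    return ["qalter", "-l", f"walltime={format_walltime(seconds)}", job_id]
```

For example, `format_walltime(5400)` gives `01:30:00`, so a job expected to need ninety minutes would be altered with `walltime=01:30:00`.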
For full documentation see the cluster resources manual page for qalter.