jlaura edited this page Oct 15, 2013 · 18 revisions

Magic

  1. Basic Commands

    • qstat

qstat displays basic information about the currently running (R), queued (Q), and recently completed (C) jobs. The most basic usage of qstat is:

        [user@magic]$ qstat

This displays all current and recently completed jobs. If a number of people are using the cluster, this is likely too much information. To look at just a single user's jobs, use:

        [user@magic]$ qstat -au <username>

The output then looks like:

        [[/images/qstat.png]]

The columns of this output show:

• Job ID - When communicating about or querying a specific job, it is possible to use either the numeric job identifier (3574204) or the fully qualified name (3574204.systemimager.magic).
• Username - The user who initiated the job.
• Queue - The queue the job is waiting or running in.
• Jobname - The user-defined job name.
• SessID - A system-generated session identifier.
• NDS - The number of nodes requested.
• Req'd TSK - The total number of cores requested, computed as nodes requested × cores per node requested.
• Req'd Memory - The total amount of memory requested for the job.
• Req'd Time - The maximum walltime requested for the job.
• S - The job's status:
    • R - Running
    • Q - Queued
    • C - Completed
    • E - Exiting (the job is exiting after having run)
• Elap Time - The total time (hours:minutes) that the job ran or has been running for.

Many of these parameters are user defined when initiating the job. For full documentation see the cluster resources manual page for qstat.

    • qselect

qselect is a convenient way to get a listing of Job IDs based on some criteria. For example, if I want all the jobs for a specific user I can run:

        [user@magic]$ qselect -u jlaura

This provides a listing of fully qualified Job Ids. It is also possible to get jobs of only a specific status using:

        [user@magic]$ qselect -u jlaura -s <Status Code>

For example, if I want to know the Job Ids for all currently running jobs for my user I would use:

        [user@magic]$ qselect -u jlaura -s R
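Because this output is nothing but Job IDs, it needs very little handling in Python. The following is a minimal sketch, assuming qselect prints whitespace-separated fully qualified Job IDs (one per line in practice). The helper names `parse_job_ids`, `numeric_id`, and `select_job_ids` are hypothetical and not part of any PBS library.

```python
import subprocess


def parse_job_ids(qselect_output):
    """Split raw qselect output into a list of fully qualified Job IDs."""
    # split() with no arguments drops blank lines and surrounding whitespace.
    return qselect_output.split()


def numeric_id(job_id):
    """Return the numeric part of an ID such as 3574204.systemimager.magic."""
    return job_id.split(".", 1)[0]


def select_job_ids(user, status=None):
    """Run qselect for one user (optionally one status code) and parse the result.

    Assumption: qselect is on the PATH, i.e. this runs on the cluster itself.
    """
    command = ["qselect", "-u", user]
    if status is not None:
        command += ["-s", status]
    output = subprocess.check_output(command, universal_newlines=True)
    return parse_job_ids(output)
```

On the cluster, `select_job_ids("jlaura", "R")` would mirror the command above. The parsing is kept in its own function so it can be checked against saved output without access to the scheduler.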

The ability to get a bare listing of Job IDs, without additional processing information, is useful both for Bash scripting (see qdel below) and for parsing within a Python script.

    • qdel

qdel is used to delete queued or held jobs. The most basic usage, to delete a single job, is:

        [user@magic]$ qdel <Job ID>

This removes the job from the queue and sets the job status to completed (C). When running a larger job, you may find that a log file is being written incorrectly, or that the first jobs to run take significantly longer than expected. In this case it is convenient to be able to delete all the queued jobs using a single command. qselect and xargs can be used in conjunction with qdel to do this:

        [user@magic]$ qselect -u jlaura -s Q | xargs qdel

Here, qselect selects the jobs, -u restricts the listing to a single user's jobs, and -s restricts it to jobs with a specific status (queued (Q) in this case). The output of qselect is then piped to qdel using xargs. For full documentation see the cluster resources manual page for qdel.

    • qrun

qrun can be used to force a queued job to run. On Magic I have been seeing jobs sit queued when all resources are available. It is unclear whether this is due to how the Python launching script works, the way that mpi4py and the scheduler interact to lock and free resources, or some other issue. If resources are available and a job has been sitting in the queue for a long period of time, it can be run manually using:

        [user@magic]$ qrun <Job ID>

For full documentation see the cluster resources manual page for qrun.

    • qsig

qsig can be used to send Linux-style signals to a job's processes. This is useful if a job has hung and should be killed. qsig usage is:

        [user@magic]$ qsig -s <SIGNAL CODE> <Job Id>

For example, if Job ID 3574204.systemimager.magic is not responding I can send a KILL (SIGKILL / 9) to the processes using:

        [user@magic]$ qsig -s KILL 3574204.systemimager.magic

This should terminate the job. Alternatively, if the job is hung in a way that you would normally interrupt with Ctrl-C, you can send a SIGINT instead. For example:

        [user@magic]$ qsig -s SIGINT 3574204.systemimager.magic
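The two examples above write the signal as KILL and as SIGINT. When launching qsig from a Python script, it can help to normalize whatever spelling the caller supplies before building the command. This is a rough sketch using only Python's standard `signal` module. `normalize_signal` and `build_qsig_command` are hypothetical helper names, and the set of valid names is whatever the local platform defines, which is assumed to match the cluster.

```python
import signal


def normalize_signal(sig):
    """Return a canonical name such as 'SIGKILL' for 'KILL', 'sigkill', '9', or 9."""
    if isinstance(sig, str) and sig.isdigit():
        sig = int(sig)
    if isinstance(sig, int):
        # Raises ValueError if the number is not a known signal.
        return signal.Signals(sig).name
    name = str(sig).upper()
    if not name.startswith("SIG"):
        name = "SIG" + name
    # Raises KeyError if the name is not a signal known on this platform.
    return signal.Signals[name].name


def build_qsig_command(job_id, sig="SIGINT"):
    """Build the argument list for qsig; pass it to subprocess to actually send it."""
    return ["qsig", "-s", normalize_signal(sig), job_id]
```

Validating the name first means a typo such as `SIGKIL` fails in the script rather than at the scheduler.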

For full documentation see the cluster resources manual page for qsig.

    • qalter

qalter allows the user to alter queued jobs. This is useful when the requested walltime is less than the total expected walltime. For example:

        [user@magic]$ qalter -l walltime=<HH:MM:SS> <Job ID>

For full documentation see the cluster resources manual page for qalter.
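When the new walltime is computed inside a Python script, it has to be turned into an hours:minutes:seconds string before it can be handed to qalter. A small sketch of that arithmetic follows. `format_walltime` and `build_qalter_command` are hypothetical helpers, and the sketch assumes the scheduler accepts hour values above 24 rather than a separate day field.

```python
from datetime import timedelta


def format_walltime(duration):
    """Format a timedelta as an HH:MM:SS walltime string."""
    total_seconds = int(duration.total_seconds())
    if total_seconds <= 0:
        raise ValueError("walltime must be positive")
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    # Hours are not wrapped at 24, so two days becomes 48:00:00.
    return "{:02d}:{:02d}:{:02d}".format(hours, minutes, seconds)


def build_qalter_command(job_id, duration):
    """Build the argument list for qalter, with the options before the Job ID."""
    return ["qalter", "-l", "walltime=" + format_walltime(duration), job_id]
```

For example, `format_walltime(timedelta(hours=36, minutes=5))` gives `36:05:00`.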

  1. MPI
    • Overview
    • mpi4py
    • Data Passing
    • One off run example
  2. PBS
    • Basic Script
    • Arguments
  3. PBS via Python
    • Basic Wrapper
  4. Fisher-Jenks Example
    • Fisher-Jenks Python / PBS Script
    • Fisher-Jenks MPI Script
    • Analytics (Pandas)
  5. Python on the Cluster
    • Parallel Reads and Writes
    • Logging
    • Timing
  6. MPE
    • Better profiling
    • Example MPE Script
