This short document presents some tools available to manage the execution of GAUSSIAN job on a PBS-compatible platform (by default, Avogadro@SNS).
The following tools are available:
gxx_qsub.py
-
Main submission script, written in Python (version 3.5 and later recommended).
gjobadd.bash
-
Updates an internal data file (
gjoblist.txt
), adding a new job. gjobrun.bash
-
Runs a job from
gjoblist.txt
gjobres.bash
-
Resets a job in
gjoblist.txt
, erasing information on previous executions. gjobchk.bash
-
Returns the list of jobs with a given status over a chosen period of time.
gjobupd.py
-
Updates the job statuses in
gjoblist.txt
(generally used in automated scripts). gxxrun.bash
-
Script acting as a wrapper to
gxx_qsub.py
, normally not run directly.
The following developer-oriented scripts are available (in a separate directory):
gxx_build_cluster.py
::
Deploys a GAUSSIAN or working tree archive, create a microarchitecture-based tree and compile on different machines.
Submission of a GAUSSIAN job can be done with gxx_qsub.py
.
Note
|
|
In its simplest form, gxx_qsub.py
only needs a valid GAUSSIAN input file.
$ gxx_qsub.py file.gjf
All optional arguments are then filled with default values.
The program executes the following operations sequentially:
-
Definition of the execution options and check on file existence
-
Definition of the available resources on computing nodes where GAUSSIAN is expected to run:
-
Number of processors available
-
Total memory available
-
-
Parsing of GAUSSIAN's input file and, except if specified otherwise:
-
Setup of the total number of processor for GAUSSIAN to use (
%NProcShared
). -
Setup of the maximum memory to be used for the job (
%Mem
). -
Definition of a permanent checkpoint file (
%Chk
) based on the input filename. While parsing the file, the script collects all filenames, which may be relevant to the calculation requested by the user.
-
-
Creation of the script to be submitted to
qsub
-
Submission of the PBS job
The script contains 2 parts:
* Definition of all relevant variables
* Sequence of operations to be run by the computing node, normally:
. Creation of a temporary directory on the local “scratch” space on the computing node.
. Copy of the input file generated by gxx_qsub.py
while parsing/completing the input file given by the user.
. Copy of all other files relevant to the calculations (generally the checkpoint files) in the temporary directory.
. Execution of GAUSSIAN on the computing node with the output directly written in the same directory as where the original input was (may require network file systems for this).
. Copy back of all relevant files (generally the file specified with %Chk
).
. Remove the temporary directory
Note
|
The way the script is submitted, the full list of commands and other important information are printed in the |
Warning
|
Because the job generated by |
Only the most relevant/useful keywords are listed below
-c
,--chk
-
Name of the checkpoint file (default: generated from the input file)
-g
,--gaussian
-
Version of GAUSSIAN to use, as
gXXYYY
whereXX
is the major revision,YYY
the minor revision.
ex.:g16a03
,g09e01
-j
,--jobname
-
Name of the job (default: name of the input file, truncated to fit PBS requirements)
-k
,--keep
-
Keep unchanged some parameters from the original input file in the generated input file. Possible values:
c
,chk
-
checkpoint file
m
,mem
-
memory
p
,proc
-
number of processors
a
,all
-
all of the above Note that GAUSSIAN may fail to run if the requested resources are not available on the node. Multiple resources can be combined by invoking the keyword multiple times (ex:
-kc -km
)
-m
,--mail
-
Uses PBS capabilities to send emails on job execution, end and/or failure. The option depends strongly on the installation.
--mailto
can be used to specify an email address (default: user@server) --multi
-
Runs multiple GAUSSIAN jobs in serial (
serial
, default) or parallel (parallel
). -o
,--out
-
Name of the log/output file (default: generated from the input file)
-p
,--project
-
Name of the project for processor time accounting (default: none).
-q
,--queue
-
A PBS or virtual queue on which the job should be supported.
The list of available PBS queues is generated by the modulehpcnodes.py
.
A virtual queue has the form: queue[:[nprocs][:node_id]], with- queue
-
A valid PBS queue
- nprocs
-
H Use half of the physical cores on 1 processor
S Use a single physical core
>0 Use nprocs cores
<0 Use nprocs physical CPUs
0 Use all cores available on the node (same behavior if nprocs is entirely missing)
- node_id
-
Name of a node on which the job must run.
-r
,--rwf
-
Name of the read-write file (default: unset).
Normally only used for development purposes -w
,--wrkdir
-
List of working trees (as root directories of
exe-dir
)
--cpto
-
Copies a list of files from the local directory to the scratch directory (added to the list automatically generated by
gxx_qsub.py
). --cpfrom
-
Copies a list of files from the scratch directory to the local directory (added to the list automatically generated by
gxx_qsub.py
). --group
-
User group to run on access-restricted nodes.
Since version 18.12.19 of the hpcnodes
module, administrators (or users) can set their own limitations on the use of resources by each job independently of the total hardware resources available on each node.
Two types of limits are available for the number of processing units and the RAM:
- soft
-
This limit should be followed by users for standard jobs and only exceptionally exceeded.
- hard
-
This limit should not be exceeded, whatever the type of job.
If the user does not set the quantity of RAM or the number of processors to be used in a GAUSSIAN job, gxx_qsub.py
will set the resources based on the first condition verified:
-
if a soft limit is set, the value will be used.
-
if a hard limit is set, the value will be used.
-
all available resources will be used (with a small reduction on the memory to prevent the risk of swapping in some extreme cases).
To override the limitation on the soft limit, the user must explicit set the resources in the GAUSSIAN input file(s), and use the command-line option --keep
to prevent gxx_qsub
from resetting those values.
If the resources requested do not exceed the hard limit or the total available hardware resources, gxx_qsub
will proceed with a simple comment on the fact that one or more soft limits have been exceeded.
Otherwise, as before, it will stop, preventing the execution of the job.
gxx_qsub.py
simply runs a GAUSSIAN job but does not keep track of the jobs submitted and their status.
The following scripts provide a very basic structure to manage jobs, by recording them, facilitating their submission and keeping track of their status.
The list of jobs is stored by default in ${HOME}/gjoblist.txt
The script gjobadd.bash
handles the insertion of new jobs.
$ gjobadd.bash input.gjf
It is possible to specify the following options (positional order, the previous options must be set by the user for a given option):
-
input file
-
queue name (default:
q02curie
) -
jobname (default: input file name)
-
comment (default:
false
) -
path to the input file (default: current directory)
$ gjobadd.bash input.gjf q02curie test01 "This is a test"
To run a job present in the job list, simply run gjobrun.bash
with the ID of the job (leading 0s are not necessary).
$ gjobrun.bash 1
The following positional arguments can be specified:
-
job ID
-
Gaussian version, in a format compatible with
gxx_qsub.py
. (default:g16b01
) -
option to force the copy of a file before execution of the job (only relevant for very specific structures of filenames, do not use).
Possible values:0
(do not copy),1
(do copy),auto
(default)
Note
|
|
A list of jobs with a specific status can be obtained with the script gjobchk.bash
.
$ gjobchk.bash
By default, all jobs with status WAIT
will be listed.
It is possible to specify the following options (positional order, the previous options must be set by the user for a given option):
. job status: WAIT
(default), GOOD
, FAIL
, QSUB
. starting date, when the job started running (all formats supported by date
are accepted)
. ending date (same formats as for starting date)
When added, a job gets the status WAIT
.
Once submitted, it becomes QSUB
and later EXEC
once running on a computing node.
Upon termination, it can have a status of FAIL
or GOOD
depending on the final result.
The status of any job can be reset to WAIT
with the script gjobres.bash
followed by the job ID.
$ gjobres.bash 1
It is also possible to change the queue as a secondary argument,
$ gjobres.bash 1 q07lee
The status of all jobs can be updated with the program gjobupd.py
.
Normally, this program should not be run directly but instead periodically invoked by a job scheduler like cron
.
To do this, open the crontab file
$ crontab -e
and insert: .crontab file
# ENVIRONMENT VARIABLES
# We need to load python 3
PATH=/cm/shared/apps/python/3.6.2/bin:/cm/shared/apps/pbspro/12.2.4.142262/bin:/bin:$PATH
LD_LIBRARY_PATH=/cm/shared/apps/python/3.6.2/lib:$LD_LIBRARY_PATH
# CRON JOBS
*/30 * * * * /home/j.bloino/bin/gjobupd.py
where the path to gjobupd.py
should be updated accordingly.
Warning
|
If a different frequency (default: 30 minutes) is desired, |
gxx_build_cluster.py
is a basic tool to facilitate the deployment and compilation of GAUSSIAN on a heterogeneous HPC infrastructure (nodes with different CPUs).
$ gxx_build_cluster.py what
what
can be:
-
A GAUSSIAN archive file, in the form:
gAA[.]BCC[C].ext
-
A working tree archive file, in the form:
working_gAA[.]BCC[C]_YYYY[-]MM[-]DD.ext
-
A GAUSSIAN version, in the form:
DDD[.]BCC[C]
. The script looks for the actual archive in a hardcoded directory (default:/cm/shared/gaussian/repository/src
).
with:
AA
-
GAUSSIAN major version. (ex.:
09
,16
,DV
) BCC[C]
-
GAUSSIAN minor revision. (ex.:
B01
,I04+
,I04p
) YYYY[-]MM[-]DD
-
Working tree version as (year, month, day)
ext
-
Extension of the archive (supported compressions: gzip, bz2, lzma)
DDD
-
Alias. Either
gAA
or any trigram representing the working.NoteFor the working, only valid for a re-compilation.
Default installation path: /cm/shared/gaussian
-
checks if a previous installation exists and removes it.
-
creates a root directory
<gAA.BCC[C]>
in the default installation path. -
for each microarchitecture (
<arch>
), creates a directory<gAA.BCC[C]>/<arch>
and extracts the archive in the directory. -
prepares a PBS submission script for each architecture and submits the compilation job.
Default installation path: ${HOME}/gxxwork
-
checks if a previous installation exists and asks the user how to proceed: [horizontal]
- backup
-
renames the working tree, adding
bak.YYYY-MM-DD
, withYYYY-MM-DD
today’s date - remove
-
deletes the previous installation
- keep
-
keeps the installation, only recompiling the existing tree (jumps to 4).
-
creates a
src
directory and extracts the archive -
creates microarchitecture-specific directories (
<arch>
) at the same level assrc
and links the source files fromsrc
to<arch>
-
prepares a PBS submission script for each architecture and submits the compilation job.
-c
,--compile
-
Only compiles an existing installation. Archive names or aliases can be used.
-m
,--mach
-
List of microarchitecture (supported by GAUSSIAN) on which the installation/compilation will be done
-u
,--update
-
Updates an existing installation with the files in the archive given in input (only for the working tree).
--gpath
-
Root path of the Gaussian installation. The created structure will have the form:
<gpath>/ <gAA.BCC[C]>/ <arch>/ <gAA>/
--wpath
-
Root path for the working tree. The created structure will have the form:
<wpath>/ <gAA.BCC[C]>/ src/ <arch>/ ... exe-dir