Verification cluster configuration and execution files

README

This is a brief intro on how to use this platform package. Refer to the Confluence documentation
for much greater detail.

This package is used in conjunction with the utilities in ctrl_orca and 
ctrl_execute using the lsstvc cluster as a target platform.

The current configuration of this package is set to use slurm as a submit 
mechanism to start HTCondor clients which join the HTCondor submitter.

To allocate nodes from the lsstvc cluster and add them to the 
HTCondor submitter, execute the allocateNodes.py command in ctrl_execute.



BEFORE YOU START:

Create the scratch and work directories (replace "username" with your user name):

mkdir /scratch/username/condor_scratch
mkdir /scratch/username/condor_work

Then set up the required packages:

setup ctrl_execute
setup ctrl_platform_lsstvc



ALLOCATING NODES:

The following command reserves four nodes, with 24 condor slots on each node, 
for 30 minutes, on the "normal" queue:

$ allocateNodes.py -n 4 -s 24 -m 00:30:00 lsstvc -q normal
4 nodes will be allocated on lsstvc with 24 slots per node and maximum time limit of 00:30:00
Node set name:
srp_333
$


Take note of the "Node set name". This is the name of the set of machines you allocated, 
and you'll use it when you submit jobs to HTCondor. The name will be 
different each time you use allocateNodes.py, unless you specify it on the 
command line with the -N option:

$ allocateNodes.py -n 2 -s 4 -m 00:30:00 lsstvc -N srp_450 -q normal
2 nodes will be allocated on lsstvc with 4 slots per node and maximum time limit of 00:30:00
Node set name:
srp_450
$

Running "sinfo" results in output containing the line:

normal*      up   infinite      2  alloc lsst-verify-worker[01-02]

This varies depending on who is running and which machines are allocated.

Running "condor_status" results in output similiar to:

$ condor_status
Name           OpSys      Arch   State     Activity LoadAv Mem    ActvtyTime

slot1@worker01 LINUX      X86_64 Unclaimed Idle      0.000 32163  0+00:00:04
slot2@worker01 LINUX      X86_64 Unclaimed Idle      0.000 32163  0+00:00:05
slot3@worker01 LINUX      X86_64 Unclaimed Idle      0.000 32163  0+00:00:06
slot4@worker01 LINUX      X86_64 Unclaimed Idle      0.000 32163  0+00:00:07
slot1@worker02 LINUX      X86_64 Unclaimed Idle      0.000 32163  0+00:00:04
slot2@worker02 LINUX      X86_64 Unclaimed Idle      0.000 32163  0+00:00:05
slot3@worker02 LINUX      X86_64 Unclaimed Idle      0.000 32163  0+00:00:06
slot4@worker02 LINUX      X86_64 Unclaimed Idle      0.000 32163  0+00:00:07

The HTCondor glidein configuration labels each allocated machine with its node set name, and
allocated machines will only run jobs whose "JOB_NODE_SET" attribute matches it.  In order to run
jobs on these allocated nodes, you must set this value in your HTCondor submit file.  Add these
lines to your submit file:

+JOB_NODE_SET="srp_450"

Requirements = (Arch != "") && (OpSys != "") && (Disk != -1) && (Memory != -1) && (DiskUsage >= 0) && (Target.ALLOCATED_NODE_SET == "srp_450")


(Note the "Requirements" line is all one line).

If your jobs do not run on the nodes you allocated, it's likely you left these lines 
out of your HTCondor submit file.
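
For reference, a complete minimal submit file using these lines might look like the sketch
below; the executable path, log/output file names, and "queue 1" statement are placeholders
for your own job, and only the two lines above are specific to this setup:

universe     = vanilla
executable   = /scratch/username/myjob.sh
log          = myjob.log
output       = myjob.$(Cluster).$(Process).out
error        = myjob.$(Cluster).$(Process).err

+JOB_NODE_SET="srp_450"
Requirements = (Arch != "") && (OpSys != "") && (Disk != -1) && (Memory != -1) && (DiskUsage >= 0) && (Target.ALLOCATED_NODE_SET == "srp_450")

queue 1

Submit it with "condor_submit myjob.sub".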

Please note that specifying "-N" on the allocateNodes.py invocation lets
you name your own node set; it's likely you'll want to do this, because 
otherwise you'll have to change the HTCondor submit file each time you run.
It defaults to allocating a new name each time because of the way we wanted
to run orchestration with the ctrl_orca package.


NOTE ON HOW JOBS/MACHINES ARE LABELED

There are a few reasons that things are set up with NODE_SET_NAME:

1) This prevents users from accidentally using nodes that they didn't
allocate.  The HTCondor job/slot matching mechanism labels the jobs and
nodes with the "node set" name.  

2) The nodes are also labeled with the user name of the person that allocated
them.   Both conditions (user name and node set name) must match in order for
the job to run.   Another user launching jobs from their account using
someone else's allocated "node set" name will not work.

3) It's possible to run allocateNodes.py twice, allocate another node set, and
run two different sets of jobs concurrently.  Since you specify the "node set"
name for each, the jobs only run on those nodes.

4) It's possible to run allocateNodes.py and specify your own "node set"
name.  If you do this, you can just keep the same HTCondor submit file and
use it over and over again.  Note that you can allocate additional nodes
to an existing node set in this way.  

For example, say that you want to allocate 20 nodes.  When you go to type the
command, you accidentally allocate two nodes with the node set name "srp_665",
start some HTCondor jobs, and then realize you really meant to allocate
18 more.   You can re-run allocateNodes.py while the HTCondor jobs are in
progress, and as long as you specify the node set name "srp_665", those 
nodes will be added to the allocation; as soon as HTCondor starts on them, any 
jobs in the queue will start being scheduled to those nodes.
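
As a sketch, the commands for that scenario might look like the following (the slot count
and queue name are just examples, and "myjob.sub" is a hypothetical submit file that sets
the node set name to "srp_665"):

$ allocateNodes.py -n 2 -s 24 -m 00:30:00 lsstvc -N srp_665 -q normal
$ condor_submit myjob.sub
$ allocateNodes.py -n 18 -s 24 -m 00:30:00 lsstvc -N srp_665 -q normal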


USING SLURM RESERVATIONS

If you have a Slurm reservation, you can use it with the "-r" option.  For 
a reservation named "res_33", the allocation would look like this:

$ allocateNodes.py -n 4 -s 24 -r res_33 -m 00:30:00 lsstvc
4 nodes will be allocated on lsstvc with 24 slots per node and maximum time limit of 00:30:00
Node set name:
srp_888
$

Note that if you have a reservation but do not specify it, your jobs 
should run anyway if there are free nodes, since Slurm does not force them to
run only on the reserved nodes.

If you try a reservation and get the error:

$ allocateNodes.py -n 4 -s 24 -r res_43 -m 00:30:00 lsstvc
error running sbatch /scratch/srp/condor_scratch/configs/alloc_srp_2017_1107_194557.slurm
$

It is likely that you tried to use a reservation you do not have permission
to use. You can use the "-v" option, and among the output you'll see a line
that's something like:

sbatch: error: Batch job submission failed: Access denied to requested reservation

which verifies that this is the case.
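
For example, re-running the failed allocation with "-v" added might look like this sketch
(the exact option placement and surrounding output are illustrative):

$ allocateNodes.py -v -n 4 -s 24 -r res_43 -m 00:30:00 lsstvc
...
sbatch: error: Batch job submission failed: Access denied to requested reservation
...
$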

ORCA


To use this with Orca, specify the node set name using the "-N" option in 
runOrca.py.   The following command targets jobs to the "srp_450" node set, 
executing the command "/scratch/srp/myjob.sh" with input file "~/medinput.txt",
using the EUPS stack located at "/scratch/srp/lsstsw", on the target platform
"lsstvc":

$ runOrca.py -N srp_450 -c /scratch/srp/myjob.sh -i ~/medinput.txt -e /scratch/srp/lsstsw -p lsstvc
setup_using = getenv
runid for this run is  srp_2016_1202_145231
orca.py /scratch/srp/condor_scratch/configs/srp_2016_1202_145231.config srp_2016_1202_145231
Production launched.
Waiting for shutdown request.
sending last Logger Event
logger handled... and... done!
$ 

In this case, files were created in:

/scratch/srp/condor_scratch/srp_2016_1202_145231
/scratch/srp/condor_work/srp_2016_1202_145231

The files in condor_scratch/srp_2016_1202_145231 make up the working directory and
files for a DAGMan submission by HTCondor.

Files in:

condor_work/srp_2016_1202_145231/logs

contain log files from the stdout of the jobs that ran, identified by machine 
name and the HTCondor slot the job ran on.  The condor_work run directory also
contains a top-level output directory, but it's up to the job to decide how that's used.
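
For example, to look over the results of the run above (the directory contents will vary
with the jobs you ran):

$ ls /scratch/srp/condor_scratch/srp_2016_1202_145231
$ ls /scratch/srp/condor_work/srp_2016_1202_145231/logs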

RELINQUISHING NODES

To remove the nodes from use, clear out all currently running HTCondor jobs:

$ condor_rm username
All jobs of user "username" have been marked for removal
$

Then run "squeue" to see which nodes you've allocated:

$ squeue
JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
32507     debug  srp_450      srp  R      26:11      4 lsst-verify-worker[01-04]
$

Then run "scancel" on that JOBID:

$ scancel 32507

This will remove your allocated nodes, and return them to the general slurm pool.

WARNING

Keep in mind that the current HTCondor configuration is set to remove all worker nodes after 15
minutes of inactivity.  This means that if no jobs are running for 15 minutes, the worker nodes
will be deallocated and will no longer be part of the HTCondor pool. If this happens, deallocate
your Slurm nodes with scancel (as described above) and execute allocateNodes.py again.
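
A sketch of that recovery sequence, reusing the job ID and node set name from the examples above:

$ squeue
$ scancel 32507
$ allocateNodes.py -n 4 -s 24 -m 00:30:00 lsstvc -N srp_450 -q normal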