Sun Grid Engine
- Gridengine.org
- Gridengine.info
- Gridscheduler.sourceforge.net
- Grid Engine HOWTOs
- Archive of defunct Sun gridengine.sunsource.net
- Bioteam SGE Administration Training Slides
- Bioteam SGE Quick Reference Guide
- Bioteam SGE for Users Slides
- For generic instructions, see Tight MPICH2 Integration in Grid Engine
- For Rocks-specific instructions, see sge-tight-mpich2-integration
Rocks comes with a default parallel environment for SGE named mpich that facilitates tight integration of MPICH1. Unfortunately, it is not quite complete. For the ch_p4 MPICH device, the environment variable MPICH_PROCESS_GROUP must be set to no on both the frontend and the compute nodes in order for SGE to maintain itself as the process group leader. These are the steps I took to get it working in Rocks 4.1 (I opted for the 2nd solution described in Tight MPICH Integration in Grid Engine, Nmichaud@jhu.edu, 14:11, 15 February 2006 EST).
1. Edit /opt/gridengine/default/common/sge_request and add the following line at the end:
-v MPICH_PROCESS_GROUP=no
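To double-check that the option was appended (illustrative verification only):
# tail -1 /opt/gridengine/default/common/sge_request
-v MPICH_PROCESS_GROUP=no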
2. For a default Rocks setup, SGE calls /opt/gridengine/mpi/startmpi when starting an MPI job, which in turn calls /opt/gridengine/mpi/rsh. Both of these files must be changed. However, each compute node has its own copy of them. Instead of editing them on the frontend and copying them to all the compute nodes, I found it easier to place my own copies in a subdirectory of /share/apps called mpi and then change the mpich parallel environment to call my own copies of startmpi and stopmpi (and, by extension, rsh), as sketched below. This way a single copy is exported to all the nodes and I don't have to worry about keeping them in sync.
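A minimal sketch of that copy step, assuming the default Rocks locations (adjust the paths if your layout differs):
# mkdir -p /share/apps/mpi
# cp -a /opt/gridengine/mpi/. /share/apps/mpi/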
3. Edit /share/apps/mpi/startmpi. Change the line:
rsh_wrapper=$SGE_ROOT/mpi/rsh
to:
rsh_wrapper=/share/apps/mpi/rsh
4. Edit /share/apps/mpi/rsh. Change the following lines:
    echo $SGE_ROOT/bin/$ARC/qrsh -inherit -nostdin $rhost $cmd
    exec $SGE_ROOT/bin/$ARC/qrsh -inherit -nostdin $rhost $cmd
else
    echo $SGE_ROOT/bin/$ARC/qrsh -inherit $rhost $cmd
    exec $SGE_ROOT/bin/$ARC/qrsh -inherit $rhost $cmd
to:
    echo $SGE_ROOT/bin/$ARC/qrsh -V -inherit -nostdin $rhost $cmd
    exec $SGE_ROOT/bin/$ARC/qrsh -V -inherit -nostdin $rhost $cmd
else
    echo $SGE_ROOT/bin/$ARC/qrsh -V -inherit $rhost $cmd
    exec $SGE_ROOT/bin/$ARC/qrsh -V -inherit $rhost $cmd
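If you'd rather script the edit, a one-liner along these lines (illustrative; inspect the file afterwards) adds the -V flag to the qrsh invocations in your copy:
# sed -i 's/qrsh -inherit/qrsh -V -inherit/' /share/apps/mpi/rsh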
5. Finally, run qconf -mp mpich. Change it from:
pe_name            mpich
slots              9999
user_lists         NONE
xuser_lists        NONE
start_proc_args    /opt/gridengine/mpi/startmpi.sh -catch_rsh $pe_hostfile
stop_proc_args     /opt/gridengine/mpi/stopmpi.sh
allocation_rule    $fill_up
control_slaves     TRUE
job_is_first_task  FALSE
urgency_slots      min
to:
pe_name            mpich
slots              9999
user_lists         NONE
xuser_lists        NONE
start_proc_args    /share/apps/mpi/startmpi.sh -catch_rsh $pe_hostfile
stop_proc_args     /share/apps/mpi/stopmpi.sh
allocation_rule    $fill_up
control_slaves     TRUE
job_is_first_task  FALSE
urgency_slots      min
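To confirm the new paths were saved, display the modified parallel environment (expected output, assuming the configuration above):
# qconf -sp mpich | grep proc_args
start_proc_args    /share/apps/mpi/startmpi.sh -catch_rsh $pe_hostfile
stop_proc_args     /share/apps/mpi/stopmpi.sh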
This is a place for information about the prologue and epilogue scripts that run before and after a job, respectively.
I have found that SGE is not particularly good at cleaning up after MPI jobs; it does not keep track of which other nodes a job is using, nor does it kill leftover user processes on those nodes. If anyone has a good solution for this, I'd love to see it.
(Note: MPI tight integration is supposed to fix this; see http://web.archive.org/web/20080916135356/http://gridengine.sunsource.net/howto/lam-integration/lam-integration.html)
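As an illustration only (none of this is part of the stock Rocks configuration), a crude cleanup epilogue along the following lines could sweep the slave nodes listed in the job's PE hostfile and kill leftover MPICH processes owned by the job's user. Whether $PE_HOSTFILE and $SGE_JOB_SPOOL_DIR are available in the epilogue environment on your installation is an assumption, and the script would still need to be registered via the queue's epilogue parameter (see man queue_conf):
#!/bin/sh
# Hypothetical cleanup epilogue (sketch only, not from the original page).
# Assumes $PE_HOSTFILE points at the job's PE hostfile; falls back to the
# copy kept in the job spool directory if it is not set.
hostfile=${PE_HOSTFILE:-$SGE_JOB_SPOOL_DIR/pe_hostfile}
[ -f "$hostfile" ] || exit 0
me=`whoami`
for node in `awk '{print $1}' "$hostfile" | sort -u`; do
    # Kill any leftover MPICH processes belonging to the job owner on that node
    ssh "$node" "pkill -u $me -f mpich" < /dev/null
done
exit 0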
To set up the frontend node to also be an SGE execution host on which queued jobs can run (like the compute nodes), do the following:
# cd /opt/gridengine
# ./install_execd    (accept all of the default answers)
# qconf -mq all.q    (if needed, adjust the number of slots for [Frontend.local=4] and other parameters)
# /etc/init.d/sgemaster.Frontend stop
# /etc/init.d/sgemaster.Frontend start
# /etc/init.d/sgeexecd.Frontend stop
# /etc/init.d/sgeexecd.Frontend start
1. As root, make sure $SGE_ROOT, etc. are set up correctly on the frontend:
# env | grep SGE
It should return something like:
SGE_CELL=default
SGE_ARCH=lx26-amd64
SGE_EXECD_PORT=537
SGE_QMASTER_PORT=536
SGE_ROOT=/opt/gridengine
If not, source the file /etc/profile.d/sge-binaries.[c]sh or check that the SGE Roll is properly installed and enabled:
# rocks list roll
NAME     VERSION   ARCH     ENABLED
sge:     5.2       x86_64   yes
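For example, under a bash shell the profile script can be sourced directly (use the .csh variant under csh/tcsh):
# . /etc/profile.d/sge-binaries.sh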
2. Run the install_execd script to set up the frontend as an SGE execution host:
# cd $SGE_ROOT
# ./install_execd
Accept all of the default answers as suggested by the script.
-
NOTE: In the following examples, the text Frontend should be substituted with the actual "short hostname" of your frontend (as reported by the command hostname -s).
For example, if hostname on your frontend returns the FQDN long hostname:
# hostname
mycluster.mydomain.org
then hostname -s should return just:
# hostname -s
mycluster
3. Verify that the number of job slots for the frontend is equal to the number of physical processors/cores on your frontend that you wish to make available for queued jobs by checking the value of the slots parameter of the queue configuration for all.q:
# qconf -sq all.q | grep slots
slots                 1,[compute-0-0.local=4],[Frontend.local=4]
The [Frontend.local=4] means that SGE can run up to 4 jobs on the frontend. Since the frontend is normally used for other tasks besides running compute jobs, it is recommended not to make all of its installed physical processors/cores available for scheduling by SGE, to avoid overloading the frontend.
For example, on a 4-core frontend, to configure SGE to use only up to 3 of the 4 cores, you can change the slots for Frontend.local from 4 to 3 by typing:
# qconf -mattr queue slots '[Frontend.local=3]' all.q
If there are additional queues besides the default all.q, repeat the above for each queue.
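To confirm the change took effect, re-check the slots value (output shown for the 3-slot example above):
# qconf -sq all.q | grep slots
slots                 1,[compute-0-0.local=4],[Frontend.local=3]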
Read "man queue_conf"
for a list of resource limit parameters such as s_cpu
, h_cpu
, s_vmem
, and h_vmem
that can be adjusted to prevent jobs from overloading the frontend.
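For example, a host-specific hard CPU time limit for jobs on the frontend can be set with the same qconf -mattr mechanism used above (the 24-hour value is purely illustrative):
# qconf -mattr queue h_cpu '[Frontend.local=24:00:00]' all.q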
-
NOTE: For Rocks 5.2 or older, the frontend may have been configured by default during installation with only 1 job slot ([Frontend.local=1]) in the default all.q queue, which will only allow up to 1 queued job to run on the frontend. To check the value of the slots parameter of the queue configuration for all.q, type:
# qconf -sq all.q | grep slots
slots                 1,[compute-0-0.local=4],[Frontend.local=1]
If needed, modify the slots for <frontend>.local from 1 to 4 (or up to the maximum number of physical processors/cores on your frontend that you wish to use) by typing:
# qconf -mattr queue slots '[Frontend.local=4]' all.q
-
NOTE: For Rocks 5.3 or older, create the file /opt/gridengine/default/common/host_aliases to contain both the .local hostname and the FQDN long hostname of your frontend:
# vi $SGE_ROOT/default/common/host_aliases
Frontend.local Frontend.mydomain.org
-
NOTE: For Rocks 5.3 or older, edit the file /opt/gridengine/default/common/act_qmaster to contain the .local hostname of your frontend:
# vi $SGE_ROOT/default/common/act_qmaster
Frontend.local
-
NOTE: For Rocks 5.3 or older, edit the file /etc/init.d/sgemaster.<frontend>:
# vi /etc/init.d/sgemaster.Frontend
and comment out the line:
/bin/hostname --fqdn > $SGE_ROOT/default/common/act_qmaster
by inserting a # character at the beginning, so it becomes:
#/bin/hostname --fqdn > $SGE_ROOT/default/common/act_qmaster
in order to prevent the file /opt/gridengine/default/common/act_qmaster from getting overwritten with incorrect data every time sgemaster.Frontend is run during bootup.
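If you prefer to script it, a one-liner such as the following (illustrative; verify the result afterwards) comments out that line in place:
# sed -i '\|/bin/hostname --fqdn|s|^|#|' /etc/init.d/sgemaster.Frontend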
4. Restart both qmaster and execd for SGE on the frontend:
# /etc/init.d/sgemaster.Frontend stop
# /etc/init.d/sgemaster.Frontend start
# /etc/init.d/sgeexecd.Frontend stop
# /etc/init.d/sgeexecd.Frontend start
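To verify that the frontend is now an execution host, check that it appears in the host and queue listings:
# qhost
# qstat -f
The frontend should be listed alongside the compute nodes, and all.q should show an instance on Frontend.local with the configured number of slots.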
And everything will start working. :)
References:
https://lists.sdsc.edu/pipermail/npaci-rocks-discussion/2009-September/042678.html