Default behaviour of SGE inhibits the load balancer from shutting down nodes #158

Open
scrappythekangaroo opened this Issue Oct 28, 2012 · 1 comment

@scrappythekangaroo

It seems that SGE's default scheduler configuration uses "load_formula = np_load_avg" (see qconf -ssconf), which spreads jobs across nodes by sending each new job to the least-loaded host.

For example:
1. My cluster has three nodes up and the queue is empty.
2. Three new jobs come in -- these will most likely be spread out, one per node.
3. Since all three nodes now have processes on them, the load balancer cannot shut any of them down, even though the cluster is under-utilised.
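
To make the scenario above concrete, here is a small toy simulation of the two placement policies. It is only a sketch and does not call SGE: the function names, the policy labels and the figure of four slots per node are all assumptions made for illustration.

```python
def place_jobs(num_jobs, slots_per_node, num_nodes, policy):
    """Return the number of jobs assigned to each node under a policy.

    This is a toy model, not SGE itself. "least_loaded" mimics balancing
    by load; "fill_up" packs jobs onto nodes that are already busy.
    """
    load = [0] * num_nodes
    for _ in range(num_jobs):
        # Only nodes with a free slot can accept another job.
        candidates = [i for i in range(num_nodes) if load[i] < slots_per_node]
        if not candidates:
            break  # cluster is full; remaining jobs would wait in the queue
        if policy == "least_loaded":
            # Lowest load wins; ties go to the lowest node index.
            target = min(candidates, key=lambda i: (load[i], i))
        elif policy == "fill_up":
            # Highest load wins; ties go to the lowest node index.
            target = max(candidates, key=lambda i: (load[i], -i))
        else:
            raise ValueError("unknown policy: %r" % policy)
        load[target] += 1
    return load


def idle_nodes(load):
    """Indices of nodes with no jobs, i.e. candidates for shutdown."""
    return [i for i, jobs in enumerate(load) if jobs == 0]


if __name__ == "__main__":
    for policy in ("least_loaded", "fill_up"):
        load = place_jobs(3, 4, 3, policy)
        print(policy, load, "idle:", idle_nodes(load))
```

With three jobs on three four-slot nodes, the balancing policy leaves no node idle, while the fill-up policy leaves two nodes free for the load balancer to remove.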

I'd suggest modifying the SGE setup to use the "fill up host" configuration according to:
http://wiki.gridengine.info/wiki/index.php/StephansBlog

Even better would be to configure SGE to send jobs to the most recently booted node first so that we may shut down older nodes first (hopefully before their hour is up). I'm not yet sure if this is possible.
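
As a rough sketch of the "most recently booted first" idea, the ordering itself is easy to express. Whether SGE can be made to follow it is still the open question. The node names, boot times and the assumption that each node is billed in whole hours counted from its boot time are all hypothetical.

```python
from datetime import datetime


def newest_first(boot_times):
    """Return node names ordered from most recently booted to oldest."""
    return sorted(boot_times, key=lambda name: boot_times[name], reverse=True)


def minutes_left_in_hour(boot_time, now):
    """Minutes until the node's current hour (counted from boot) ends.

    Assumes hourly billing that starts at boot time.
    """
    elapsed = (now - boot_time).total_seconds()
    return (3600 - elapsed % 3600) / 60


if __name__ == "__main__":
    now = datetime(2012, 10, 28, 12, 0)
    # Hypothetical boot times for a three-node cluster.
    boot_times = {
        "node001": datetime(2012, 10, 28, 11, 10),
        "node002": datetime(2012, 10, 28, 11, 40),
        "node003": datetime(2012, 10, 28, 11, 55),
    }
    for name in newest_first(boot_times):
        left = minutes_left_in_hour(boot_times[name], now)
        print(name, "minutes left in current hour:", left)
```

Sending jobs to nodes in this order would leave the oldest node (node001 here) idle first, which is also the one closest to the end of its hour.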

@scrappythekangaroo

Example code that applies the "fill up host" change here:
scrappythekangaroo@fb54595
