SGE Changes Tracking #24

Open
tatarsky opened this issue Nov 4, 2015 · 180 comments

@tatarsky (Collaborator) commented Nov 4, 2015

I will document changes I make to the SGE config here as we move toward the first goal of a working queue and settings configuration. I will also attempt to add the details and usage of these changes to the wiki.

@tatarsky (Collaborator Author) commented Nov 4, 2015

Changed the default qsub priority from 0 to 100. This allows users to qalter the priority of their own queued jobs later, or set it on the qsub command line, without SGE manager intervention. Details coming to the wiki when I get a moment. The goal is to let users re-order their own priorities, within reason, once jobs are queued.
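For reference, a minimal sketch of how that plays out (the job ID, script name, and priority values below are hypothetical, not taken from the cluster):

# Submit below the new 100 default, or lower an already-queued job to push it back:
qsub -p 50 myjob.sh
qalter -p 10 12345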

@tatarsky (Collaborator Author) commented Nov 4, 2015

The initial default qsub memory target will be 4GB. Memory as a consumable is NOT enabled yet, but noting this selection from the Skype call.

@tatarsky (Collaborator Author) commented Nov 6, 2015

Due to the priority request, the two high-memory nodes will be removed from all.q (the default) in a few minutes. Please confirm the name of the desired long-running queue; I believe it was long.

@nariai (Contributor) commented Nov 6, 2015

long queue is fine. thanks!

@tatarsky (Collaborator Author) commented Nov 6, 2015

Nodes 15 and 16 will shortly not be in the default queue (all.q).

Do you want just those two nodes in the long-running queue, or do you want all the nodes to be able to run some number of long.q jobs as well when they are idle?

@tatarsky (Collaborator Author) commented Nov 6, 2015

Memory as a consumable will be prepared today but NOT activated, as I believe @hurleyLi is running a stack of jobs. I will be oversubscribing RAM on the nodes by 20% to start. The default request, if not specified at qsub time, will be 4GB.
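For reference, a hedged sketch of the kind of configuration this involves (the host name and capacity figure are placeholders, not the values that will actually be used):

# Make h_vmem a requestable consumable with a 4G default via the complex editor:
qconf -mc     # edit the h_vmem line to read: h_vmem  h_vmem  MEMORY  <=  YES  YES  4G  0
# Then give each exec host a capacity roughly 20% above its physical RAM, e.g.:
qconf -me node1     # set: complex_values  h_vmem=300G   (host name and figure are examples only)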

@nariai (Contributor) commented Nov 6, 2015

Let's do the first option (nodes 15 and 16 as the long queue).
We want to use the long queue for jobs that require high memory.

@tatarsky (Collaborator Author) commented Nov 6, 2015

When the jobs on nodes 15 and 16 are complete, I will test that long.q is correct and ready for use, and add the usage statements to the wiki.

What was the other queue you wanted?

@nariai (Contributor) commented Nov 6, 2015

Also, as discussed, please make 128 cores (the equivalent of four nodes) the short queue (2 days maximum). In this case, let all the nodes be able to run short-queue jobs as well if they are idle.

The remaining nodes (10 nodes) will be the week queue (7 days maximum). In this case, if the high-memory nodes (2 nodes) are idle, they can also be used for the week queue.

@tatarsky (Collaborator Author) commented Nov 6, 2015

Noted.

@tatarsky (Collaborator Author) commented Nov 6, 2015

Which queue do you want as the default if users don't specify one? I assume week.

@nariai (Contributor) commented Nov 6, 2015

Yes, let's make the week queue the default.

@tatarsky (Collaborator Author) commented Nov 6, 2015

For simplicity, the week queue is simply going to be all.q for now. I will set the maximum run time to one week after @hurleyLi's jobs are done.
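For reference, a hedged sketch of how those wallclock limits would appear once set (2 days for short and 7 days for week, per the plan above; this assumes the queues end up named short.q and week.q, and shows only the relevant field):

qconf -sq short.q | grep h_rt    # expected: h_rt  48:00:00
qconf -sq week.q  | grep h_rt    # expected: h_rt  168:00:00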

@hurleyLi commented Nov 6, 2015

I'll stop submitting jobs for now then.

@tatarsky (Collaborator Author) commented Nov 6, 2015

No, that has not been part of anything I've recommended. I was told you should be pushing jobs through! My efforts here are meant NOT to conflict with that, which is a stated goal from Kelly.

tatarsky self-assigned this Nov 6, 2015
@hurleyLi commented Nov 6, 2015

Sorry, I misunderstood. But if I keep running those jobs continuously, they probably won't be finished for at least a week. So will you activate the changes then?
Another thing I noticed: there are two jobs running on n16 right now, although both of them are specified as using 32 cores.

@tatarsky (Collaborator Author) commented Nov 6, 2015

I've been told your jobs are the priority. If my changes are going to interfere with that, I will hold off on them until you are done.

However, yes, you are correct about the n16 matter. That is a config item I still need to finish from when I believed getting that queue set up was my priority.

I might as well finish it now if you can hold on a moment.

@tatarsky (Collaborator Author) commented Nov 6, 2015

I've disabled that queue for now. It shouldn't happen again, and I will let your jobs finish.

@tatarsky (Collaborator Author) commented Nov 6, 2015

Basically, I could still use some clarification: are people besides @hurleyLi waiting to run jobs? If so, why? Just because we don't have named queues yet? I am trying to honor a request not to cause issues with his jobs, but configuring a scheduler while people are using it is not really my favorite thing to do. However, there is nothing preventing anyone from submitting jobs to the default queue.

If @hurleyLi reduces his slot count a bit, as noted in #34, there will even be room to run such jobs. But I defer to his goals in getting things done for those numbers.

@hurleyLi commented Nov 6, 2015

Hi Paul, how long will it take you to activate/implement those changes?

@tatarsky (Collaborator Author) commented Nov 6, 2015

I have the first pass ready for some testing to make sure that what has been described is what will occur. I am not, however, renaming all.q to week.q at the moment, due to your jobs in that queue.

@tatarsky (Collaborator Author) commented Nov 6, 2015

For example, since node 3 is idle at the moment, I am testing the short queue.

@hurleyLi commented Nov 6, 2015

What I can do is kill the jobs and let you make the changes, because I also want these to be implemented and tested sooner rather than later. I have to stop at some point anyway for you to activate these changes; if I don't stop, the whole process after these jobs will last a month.

Right now I'm really just waiting for 2-3 jobs to finish: 276, 279, and 285. They'll probably finish around 2-3pm this afternoon. Do you think you can activate the changes this afternoon, so I can restart those jobs later today? In the meantime, I can test some of my changes as mentioned in #34.

Does that sound like a plan?

@tatarsky (Collaborator Author) commented Nov 6, 2015

I will implement the changes when I see your jobs exit. Be aware that your 2-3PM is my end of day. After-hours monitoring is best effort, which is why I normally do not make heavy changes on a Friday.

But in the interest of moving this along, and given that I am mostly around this weekend, I will attempt to implement the queue items above.

IMPORTANT question, however: are you really ready for memory reservations? I can do that separately, but you would need to add the proper -l h_vmem=XXG to your jobs....
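For example, a minimal submit-script sketch with the reservation in place (the 8G figure and the script contents are purely illustrative; size the request to the actual job):

#!/bin/bash
#$ -l h_vmem=8G    # memory reservation; 8G is an example value only
#$ -cwd
./my_analysis.sh   # placeholder for the real workload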

As we are outpacing my ability to document and configure, we'll have to sort out any issues as we go.

@hurleyLi commented Nov 6, 2015

Would you prefer to do this on Monday? We can do it next week if it's better for you to be able to monitor the cluster after these changes. I don't think other people are going to use the cluster besides me.

@tatarsky (Collaborator Author) commented Nov 6, 2015

If that is truly the case, I would prefer Monday. That way you will get uninterrupted use this weekend. While SGE is fairly straightforward, it periodically gets confused, and I'd rather not scramble through the config when I could do it during regular work hours.

@nariai (Contributor) commented Nov 6, 2015

That's fine with me. Let's change the config on Monday.

@tatarsky (Collaborator Author) commented:

The old nodes, except for cn7 and cn12, are in the SGE opt.q, which is selected with the -l opt resource. Report issues in #33 and I'll get to them fairly quickly, though I'll be in and out today.

Remember, it's a single 1G link over to those nodes, and for now we are using an NFS->Lustre method.

@tatarsky (Collaborator Author) commented:

So this is to remind me: I have NOT turned on memory as a consumable on opt.q yet. I will shortly, after an errand. I am going to start with an oversubscription to 74G (real = 54G).

There is no walltime limit on opt.q either. I assume for now that is fine.

@tatarsky (Collaborator Author) commented:

Walltime on opt.q set to one week per the above, and memory as a consumable activated with some oversubscription.

@tatarsky (Collaborator Author) commented Feb 4, 2016

The SGE s_core limit (soft core limit) was changed from UNLIMITED to 0 a moment ago per #102.

This means the default UNIX "coredump" limit is set to zero, so a crashing program will not dump a potentially large core file. A recent reminder of the joys of this, with many large-memory jobs all crashing due to a bug, was the impetus for doing what I normally set as a default and had forgotten.

If you NEED core dumps in a job, add a ulimit -c unlimited to your submit script.
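For example (the program name is a placeholder):

#!/bin/bash
#$ -cwd
ulimit -c unlimited    # re-enable core dumps for this job only
./my_program           # placeholder for the real command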

@nariai (Contributor) commented Feb 19, 2016

I submitted a group of jobs (job IDs 405730-405751) with:
qsub -l long
However, it looks like only two of the submitted jobs are using long.q, whereas the other jobs are using all.q.

I'm expecting the submitted jobs to run for more than two days (that is why I submitted them to the long queue), but it looks like some of the jobs running in all.q will be terminated after two days? Is this situation (submitting jobs to long.q but actually running in all.q) expected? Do you have any ideas?

@tatarsky (Collaborator Author) commented:

I'm looking at job 405731 as an example of this.

No, I would say that is not expected, but I will look at the logs to see whether I can find something that explains it, or a config item that needs to change.

@tatarsky (Collaborator Author) commented:

And if you have the precise qsub command you issued handy, that would be helpful.

@tatarsky (Collaborator Author) commented:

Assuming it wasn't just qsub -l long calc_LD_005_ALL.sh, BTW... just checking whether any other command-line args were used...

@nariai (Contributor) commented Feb 19, 2016

Thank you for taking a look.

The command was (in Makefile):

calc_LD_005_ALL:
	number=1 ; while [[ $$number -le 22 ]] ; do \
		qsub -l long -p 0 -pe smp 1 -cwd -e logs -o stdouts -v ARG1=$$number calc_LD_005_ALL.sh ; \
		((number = number + 1)) ; \
	done

The path is:
/projects/T2D/analysis/common_variants

@tatarsky (Collaborator Author) commented:

OK. I noted the "-pe smp 1" in there and wanted to check on it. Is there a reason for requesting a PE of only one processor? (Compared to more than one; it is really the same as not asking for a PE at all.) It shouldn't matter, but I've actually never used that construct.

I will try to reproduce this with a similar Makefile. I wonder if the -l long is somehow not being fully preserved in the make commands that are finally issued.

I wonder if the behavior will change if you embed the SGE hard resource in the script, e.g.:

#$ -l long
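For example, a sketch of how that would look at the top of calc_LD_005_ALL.sh (the surrounding lines are illustrative):

#!/bin/bash
#$ -l long    # request the long queue resource from inside the script
#$ -cwd
# ...rest of calc_LD_005_ALL.sh unchanged...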

@tatarsky (Collaborator Author) commented:

I have reproduced the above. For whatever reason, the first job is issued to long.q as expected, but the following jobs land in all.q.

I am trying to step through the output to see why.

@nariai (Contributor) commented Feb 19, 2016

When I changed "-l long" to "-l week", all of the jobs ran in week.q (405752-405773).
The other parameters were exactly the same.
Is this something long.q-specific?

@tatarsky (Collaborator Author) commented:

Interesting. When I removed "-pe smp 1", they all ended up in long.q ;)

So I suspect something about the final commands being emitted here.

Is there a reason you are using this make loop rather than an SGE array job?

@nariai (Contributor) commented Feb 19, 2016

Very interesting. No, I didn't have a specific reason for not using an array job. Next time I'll try removing "-pe smp 1" when I run jobs with one CPU, or try using an array job.

@tatarsky (Collaborator Author) commented:

Let me look at it a bit more in the morning. Whatever is going on, it's subtle.

I can explain the array job suggestion in more detail later; it's just a simpler way of running a series of jobs from the same submit script, using SGE's built-in shell variables that increment much like your Makefile loop counter. Basically a notch less setup work, and SGE is a notch more efficient with array jobs, particularly for large arrays.
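For reference, a sketch of the array-job equivalent of that make loop (the flags and script name are taken from your Makefile above; the task-ID handling is the only new part):

# One array job covering tasks 1-22; $SGE_TASK_ID replaces the make loop counter.
qsub -l long -t 1-22 -cwd -e logs -o stdouts calc_LD_005_ALL.sh
# Inside calc_LD_005_ALL.sh, use $SGE_TASK_ID wherever $ARG1 was used.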

But I'd still like to understand what is going on there.

I am suspicious that the PE is somehow confusing things, but I don't know why it would drop jobs into all.q.

So "unsure" is the answer for now, and I will look closer with some coffee in the morning.

@nariai (Contributor) commented Feb 19, 2016

Thanks; please let me know if you figure it out in more detail. I just re-submitted the jobs to the long queue.

@tatarsky (Collaborator Author) commented:

So basically it has something to do with the -pe smp 1, but I don't know why. Tracing the Makefile shows clearly that your qsubs are as expected, but under some circumstances it is as if the -l long were dropped.

Remove the -pe smp 1 and it works fine. Placing the -l long into the submit script does NOT change the outcome. Changing the argument order does NOT change the outcome.

qsub -l long -pe smp 1 -p 0 -cwd -e logs -o stdouts -v ARG1=17 foo.sh

So I'm suspecting some kind of qsub argument-handling bug at the moment, but I will double-check my queue assumptions.

I am also going to try different -pe smp values. Technically "-pe smp 1" is not something you ever need to specify, but I would assume that even if you did, SGE wouldn't care.

@tatarsky (Collaborator Author) commented:

Interestingly, using -pe smp 2 results in everything going into long.q fine, at least in my tests.

So I'm guessing it's a form of bug triggered by asking for an SMP environment (multiple processors) and then effectively saying "not really," which is what "1" would mean.

@tatarsky (Collaborator Author) commented:

There are a few mutterings on the mailing lists about somewhat similar experiences, but nothing conclusive. I believe the queues are configured correctly and that the outcome when the PE slot count is 1 is probably a bug. I'll look a little more, but my belief is also that that stanza is never needed (asking for one processor slot).

@tatarsky (Collaborator Author) commented:

BTW, random choices of -pe smp values above 1 all seem to result in the jobs correctly landing in long.q, so I'm leaning more toward it being a bug.

@tatarsky (Collaborator Author) commented:

The SGE spool was moved to fl-ims per #178 to reduce the impact if fl-hn1 crashes during attempts to determine the reason. The fl-hn2 shadow qmaster will now not hang on an NFS spool. No cross-head-node dependencies.
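For reference, a hedged sketch of how the active and shadow qmaster assignment can be checked (this assumes $SGE_ROOT is set and the default cell name is in use):

cat $SGE_ROOT/default/common/act_qmaster      # host currently acting as qmaster
cat $SGE_ROOT/default/common/shadow_masters   # shadow master host list (should include fl-hn2)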

@tatarsky (Collaborator Author) commented:

Moved the seq_no values for juplow and juphigh so they no longer match those of the other queues.
This is based on possible assignment to those queues in some cases where jobs do not reference their required resource. #188

Current settings for reference:

short     0,[@notlonghosts=0],[@longhosts=5]
juplow    1,[@notlonghosts=1],[@longhosts=6]
juphigh   2,[@notlonghosts=2],[@longhosts=7]
opt       3
week      5,[@notlonghosts=5],[@longhosts=10]
all       10,[@notlonghosts=10],[@longhosts=15]
long      10
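As a reminder, those values live in each queue's seq_no field and can be checked with qconf, e.g.:

qconf -sq juplow.q | grep seq_no
# expected per the table above:  seq_no  1,[@notlonghosts=1],[@longhosts=6]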

@tatarsky (Collaborator Author) commented:

Just for reference: c7.q was added, with a single host (cn12) in its hostgroup and queue resource c7. Defaults for everything else.
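For reference, selecting it should look something like the following (the script name is a placeholder), following the same pattern as -l opt above:

qsub -l c7 myjob.sh    # should land on cn12 via c7.q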

@tatarsky (Collaborator Author) commented:

Make a "jupsmp" PE environment in prep for possible fix for #188
Remove make and mpi from juplow.q and juphigh.q as never needed for those queues and in theory related to proposed solution to #188 (although we don't see much mpi or make PE on this cluster)
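For reference, a hedged sketch of how to verify the PE change (exact output will vary):

qconf -sp jupsmp                     # show the new jupsmp parallel environment
qconf -sq juplow.q | grep pe_list    # make and mpi should no longer appear here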

@tatarsky (Collaborator Author) commented:

For reference: per #278, s_rt and h_rt were removed from juphigh.q.

@tatarsky (Collaborator Author) commented Jan 7, 2020

opt.q and all cn* nodes will be removed shortly. I made a backup of the SGE config and will start removing them. They have already been removed as an option in the JupyterHubs.
