Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proto-scheduler class, pre-support for slurm #271

Merged
merged 6 commits into from
Jun 18, 2020
Merged

Conversation

marshallward
Copy link
Collaborator

This is a very minimal implementation of a Scheduler class, in order to
handle scheduler-specific operations.

It includes a basic Factory for controlling the Scheduler instantiation,
and a single method for returning the submission command line.

I do not expect things to keep working like this, it is only meant to be
the most minimal implementation to support both PBS and Slurm.

Hopefully more changes to come.

This is a very minimal implementation of a Scheduler class, in order to
handle scheduler-specific operations.

It includes a basic Factory for controlling the Scheduler instantiation,
and a single method for returning the submission command line.

I do not expect things to keep working like this, it is only meant to be
the most minimal implementation to support both PBS and Slurm.

Hopefully more changes to come.
@pep8speaks
Copy link

pep8speaks commented Jun 16, 2020

Hello @marshallward! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2020-06-17 03:25:28 UTC

@marshallward
Copy link
Collaborator Author

I have not yet tested this on Gadi, so it may fail spectacularly on there. Will try to get it going...

@coveralls
Copy link

coveralls commented Jun 16, 2020

Coverage Status

Coverage decreased (-0.8%) to 41.274% when pulling 98de47d on proto_scheduler into bb72250 on master.

@marshallward
Copy link
Collaborator Author

marshallward commented Jun 16, 2020

Already found one bug:

 539                 #model_prog.append('-np {0}'.format(model_ncpus))                                                                                                                                       
 540                 model_prog.append('-n {0}'.format(model_ncpus))                 

(or maybe that works? mpirun appears to support both...)

@aidanheerdegen
Copy link
Collaborator

I don't understand why coveralls says coverage reduced by 42%.

A few thoughts:

  1. With the site specific hacks required for both GFDL and NCI perhaps it would be best to create clean scheduler classes and put the site specific stuff into some site/platform class. It could contain methods to modify the mpirun command, as well as queue information maybe?

  2. This is creating code that I can't test, as I don't have access to a slurm machine, and is more difficult for you. Might be time for me to resuscitate my Pawsey account ...

  3. Some of the recent code reorganisation was to make it easier to write tests for the code. So for example have a functions that returns a command string that can be checked for correctness rather than submitting that directly. In case it wasn't obvious why I'd made those changes.

@marshallward
Copy link
Collaborator Author

marshallward commented Jun 17, 2020

  1. I don't understand either what's going on with code coverage, but will try to see what's happening. (So much has changed, I don't recognize the output much anymore...)

  2. I agree, we should try to nail down these site-specific hacks. An NCI scheduler which subclasses PBS is not a bad idea IMO, for example. (Maybe revive the old "ANUPBS" name?)

    For now, I think it's probably OK to just say "PBS" is synonymous with "NCI" and "Slurm" is synonymous with "GFDL".

  3. I agree (again) and think the safest thing to do is for me not to merge anything until you've tested it out, or at least signed off on it. And no need to return the favor; at this time I'm the only GFDL user.

  4. Yeah, the testing changes are a bit confusing to me at the moment (see first point), but hopefully I'll get my head around them soon.

Pytest is only used for Python3 testing, and our environment needs a
newer version than is being installed by Travis, so we force it to be at
least 4.6.
@aidanheerdegen
Copy link
Collaborator

I have an idea for making the site specific stuff a little tidier. How about moving to a system where all arguments to stuff like mpirun and qsub is a dict or ordereddict. Then that can be returned, and checked, really easily, and also modified in a very modular way. You can delete and replace options for your MPICH wrapper, I can add storage options for our PBS system.

Rather than parsing stupid effing stringsI think this could be a winner.

@marshallward
Copy link
Collaborator Author

I agree with much of this, and consider most of those string checks to be stubs.

I think that a lot of the proposed structure will emerge as I get the GFDL version working correctly.

@marshallward marshallward merged commit e5f3787 into master Jun 18, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants