Skip to content

Conversation

sjpb
Copy link
Collaborator

@sjpb sjpb commented Jul 7, 2020

List nodes and nodes in partitions in slurm.conf using ansible inventory group contents rather than assuming they follow the pattern cluster_name-group_name-{0..num_nodes-1}. This allows compute nodes to have arbitrary hostnames.

I think this addresses #3 in a more flexible way than the current code.

Note that:

  • The parameter num_nodes is no longer required & will be ignored if provided.
  • The requirement for appropriately-named inventory group(s) existed already; this is not changed by this PR.
  • Node definition now makes use of NodeName=DEFAULT which is recommended to improve scheduler performance.
  • The case with a single node in a partition/group is now handled without being a special case.
  • Combined conditionals on this line avoids a bug in the current master branch with escaped line-ends if ram_mb is not provided and group_hosts is empty.
  • Newlines/empty lines in template are required to put linebreaks in completed file in the correct place.

Tested (manually, and now via Molecule running locally with different node/partition/group names - this is not yet in Travis) on cases:

        # test 1: single partition, group is partition
        openhpc_slurm_partitions:
         - name: "compute"
        # test 1b: single node [OK]
        # test 1c: multiple nodes, non-sequential names
        
        # test 2: two partitions, groups are partitions
        openhpc_slurm_partitions:
          - name: "compute"
          - name: "fake"
        
        # test 3: one partition with groups
        openhpc_slurm_partitions:
          - name: "compute"
            groups:
            - name: "compute"
            - name: "fake"

@sjpb sjpb marked this pull request as ready for review July 8, 2020 13:31
@sjpb sjpb requested review from brtkwr, markgoddard and oneswig July 9, 2020 07:46
@sjpb sjpb self-assigned this Jul 9, 2020
@sjpb sjpb added the enhancement New feature or request label Jul 9, 2020
@brtkwr
Copy link

brtkwr commented Jul 9, 2020

FYI the molecule tests pass and also deployed a basic slurm cluster which also appears to work.

Copy link
Member

@oneswig oneswig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor observation on the README content. I tested it with OpenHPC2 and CentOS 8 - there's some work to do there but I ran on top of this branch and it all seemed good.

In due course it would be neat to make a python filter to find run-length encodings from an arbitrary list of hostnames!

@oneswig oneswig mentioned this pull request Jul 9, 2020
Copy link

@brtkwr brtkwr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@sjpb sjpb merged commit 9a76c34 into master Jul 10, 2020
@sjpb sjpb deleted the anyhostname branch July 10, 2020 12:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants