4 changes: 3 additions & 1 deletion README.md
@@ -31,10 +31,12 @@ package in the image.

`openhpc_slurmdbd_host`: Optional. Where to deploy slurmdbd if using this role to deploy slurmdbd, otherwise where an existing slurmdbd is running. This should be the name of a host in your inventory. Set this to `none` to prevent the role from managing slurmdbd. Defaults to `openhpc_slurm_control_host`.

`openhpc_slurm_configless`: Optional, default false. If True then slurm's ["configless" mode](https://slurm.schedmd.com/configless_slurm.html) is used. **NB: Requires Centos8/OpenHPC v2.**
`openhpc_slurm_configless`: Optional, default false. If true then slurm's ["configless" mode](https://slurm.schedmd.com/configless_slurm.html) is used. **NB: Requires Centos8/OpenHPC v2.**

`openhpc_munge_key_path`: Optional, default ''. Path to a local file containing a munge key to use; otherwise one will be generated on the slurm control node.

`openhpc_login_only_nodes`: Optional. If using "configless" mode, specify the name of an Ansible group containing nodes which are login-only nodes (i.e. not also control nodes), if required. These nodes will run `slurmd` to contact the control node for config; see the sketch below.
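
For illustration, a minimal sketch of these variables in a play, mirroring the test converge playbooks later in this PR (values are examples, not defaults):

```yaml
- hosts: all
  tasks:
    - include_role:
        name: ansible-role-openhpc
      vars:
        openhpc_cluster_name: testohpc
        openhpc_slurm_control_host: "{{ groups['testohpc_control'] | first }}"
        openhpc_slurmdbd_host: none  # don't manage slurmdbd
        openhpc_slurm_configless: true
        openhpc_login_only_nodes: 'testohpc_login'  # inventory group of login-only nodes
```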

### slurm.conf

`openhpc_slurm_partitions`: list of one or more slurm partitions. Each partition may contain the following values:
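
As a minimal sketch, the test scenarios in this PR define a single partition:

```yaml
openhpc_slurm_partitions:
  - name: "compute"
```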
1 change: 1 addition & 0 deletions defaults/main.yml
@@ -50,3 +50,4 @@ ohpc_release_repos:
"8": "http://repos.openhpc.community/OpenHPC/2/CentOS_8/x86_64/ohpc-release-2-1.el8.x86_64.rpm" # ohpc v2 for Centos 8
openhpc_slurm_configless: false
openhpc_munge_key: ''
openhpc_login_only_nodes: ''
2 changes: 2 additions & 0 deletions molecule/README.md
@@ -15,6 +15,8 @@ test4 | 1 | N | 2x compute node, accounting en
test5 | 1 | N | As for #1 but configless
test6 | 1 | N | 0x compute nodes, configless
test7 | 1 | N | 1x compute node, no login node, configless
test8 | 1 | N | 2x compute node, 2x login-only nodes, configless
test9 | 1 | N | As test8 but uses `--limit=testohpc-control,testohpc-compute-0` and checks login nodes still end up in slurm.conf

# Local Installation & Running

22 changes: 22 additions & 0 deletions molecule/test8/INSTALL.rst
@@ -0,0 +1,22 @@
********************************
Docker driver installation guide
********************************

Requirements
============

* Docker Engine

Install
=======

Please refer to the `Virtual environment`_ documentation for installation best
practices. If not using a virtual environment, please consider passing the
widely recommended `'--user' flag`_ when invoking ``pip``.

.. _Virtual environment: https://virtualenv.pypa.io/en/latest/
.. _'--user' flag: https://packaging.python.org/tutorials/installing-packages/#installing-to-the-user-site

.. code-block:: bash

$ python3 -m pip install 'molecule[docker]'
19 changes: 19 additions & 0 deletions molecule/test8/converge.yml
@@ -0,0 +1,19 @@
---
- name: Converge
  hosts: all
  tasks:
    - name: "Include ansible-role-openhpc"
      include_role:
        name: "ansible-role-openhpc/"
      vars:
        openhpc_enable:
          control: "{{ inventory_hostname in groups['testohpc_control'] }}"
          batch: "{{ inventory_hostname in groups['testohpc_compute'] }}"
          runtime: true
        openhpc_slurm_control_host: "{{ groups['testohpc_control'] | first }}"
        openhpc_slurm_partitions:
          - name: "compute"
        openhpc_cluster_name: testohpc
        openhpc_slurm_configless: true
        openhpc_login_only_nodes: 'testohpc_login'

79 changes: 79 additions & 0 deletions molecule/test8/molecule.yml
@@ -0,0 +1,79 @@
---
name: 2x compute node, 2x login-only nodes, configless
driver:
  name: docker
platforms:
  - name: testohpc-control
    image: ${MOLECULE_IMAGE}
    pre_build_image: true
    groups:
      - testohpc_control
    command: /sbin/init
    tmpfs:
      - /run
      - /tmp
    volumes:
      - /sys/fs/cgroup:/sys/fs/cgroup:ro
    networks:
      - name: net1

  - name: testohpc-login-0
    image: ${MOLECULE_IMAGE}
    pre_build_image: true
    groups:
      - testohpc_login
    command: /sbin/init
    tmpfs:
      - /run
      - /tmp
    volumes:
      - /sys/fs/cgroup:/sys/fs/cgroup:ro
    networks:
      - name: net1

  - name: testohpc-login-1
    image: ${MOLECULE_IMAGE}
    pre_build_image: true
    groups:
      - testohpc_login
    command: /sbin/init
    tmpfs:
      - /run
      - /tmp
    volumes:
      - /sys/fs/cgroup:/sys/fs/cgroup:ro
    networks:
      - name: net1

  - name: testohpc-compute-0
    image: ${MOLECULE_IMAGE}
    pre_build_image: true
    groups:
      - testohpc_compute
    command: /sbin/init
    tmpfs:
      - /run
      - /tmp
    volumes:
      - /sys/fs/cgroup:/sys/fs/cgroup:ro
    networks:
      - name: net1

  - name: testohpc-compute-1
    image: ${MOLECULE_IMAGE}
    pre_build_image: true
    groups:
      - testohpc_compute
    command: /sbin/init
    tmpfs:
      - /run
      - /tmp
    volumes:
      - /sys/fs/cgroup:/sys/fs/cgroup:ro
    networks:
      - name: net1
provisioner:
  name: ansible
  # ansible_args:
  #   - --limit=testohpc-control,testohpc-compute-0
verifier:
  name: ansible
12 changes: 12 additions & 0 deletions molecule/test8/verify.yml
@@ -0,0 +1,12 @@
---

- name: Check slurm hostlist
  hosts: testohpc_login # NB for this test this is 2x non-control nodes, so tests they can contact slurmctld too
  tasks:
    - name: Get slurm partition info
      command: sinfo --noheader --format="%P,%a,%l,%D,%t,%N" # using --format ensures we control whitespace
      register: sinfo
    - name: Check partition info is as expected
      assert: # PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
        that: "sinfo.stdout_lines == ['compute*,up,60-00:00:00,2,idle,testohpc-compute-[0-1]']"
        fail_msg: "FAILED - actual value: {{ sinfo.stdout_lines }}"
22 changes: 22 additions & 0 deletions molecule/test9/INSTALL.rst
@@ -0,0 +1,22 @@
********************************
Docker driver installation guide
********************************

Requirements
============

* Docker Engine

Install
=======

Please refer to the `Virtual environment`_ documentation for installation best
practices. If not using a virtual environment, please consider passing the
widely recommended `'--user' flag`_ when invoking ``pip``.

.. _Virtual environment: https://virtualenv.pypa.io/en/latest/
.. _'--user' flag: https://packaging.python.org/tutorials/installing-packages/#installing-to-the-user-site

.. code-block:: bash

$ python3 -m pip install 'molecule[docker]'
19 changes: 19 additions & 0 deletions molecule/test9/converge.yml
@@ -0,0 +1,19 @@
---
- name: Converge
  hosts: all
  tasks:
    - name: "Include ansible-role-openhpc"
      include_role:
        name: "ansible-role-openhpc/"
      vars:
        openhpc_enable:
          control: "{{ inventory_hostname in groups['testohpc_control'] }}"
          batch: "{{ inventory_hostname in groups['testohpc_compute'] }}"
          runtime: true
        openhpc_slurm_control_host: "{{ groups['testohpc_control'] | first }}"
        openhpc_slurm_partitions:
          - name: "compute"
        openhpc_cluster_name: testohpc
        openhpc_slurm_configless: true
        openhpc_login_only_nodes: 'testohpc_login'

79 changes: 79 additions & 0 deletions molecule/test9/molecule.yml
@@ -0,0 +1,79 @@
---
name: as test8 but uses --limit, checks login nodes still end up in slurm.conf
driver:
  name: docker
platforms:
  - name: testohpc-control
    image: ${MOLECULE_IMAGE}
    pre_build_image: true
    groups:
      - testohpc_control
    command: /sbin/init
    tmpfs:
      - /run
      - /tmp
    volumes:
      - /sys/fs/cgroup:/sys/fs/cgroup:ro
    networks:
      - name: net1

  - name: testohpc-login-0
    image: ${MOLECULE_IMAGE}
    pre_build_image: true
    groups:
      - testohpc_login
    command: /sbin/init
    tmpfs:
      - /run
      - /tmp
    volumes:
      - /sys/fs/cgroup:/sys/fs/cgroup:ro
    networks:
      - name: net1

  - name: testohpc-login-1
    image: ${MOLECULE_IMAGE}
    pre_build_image: true
    groups:
      - testohpc_login
    command: /sbin/init
    tmpfs:
      - /run
      - /tmp
    volumes:
      - /sys/fs/cgroup:/sys/fs/cgroup:ro
    networks:
      - name: net1

  - name: testohpc-compute-0
    image: ${MOLECULE_IMAGE}
    pre_build_image: true
    groups:
      - testohpc_compute
    command: /sbin/init
    tmpfs:
      - /run
      - /tmp
    volumes:
      - /sys/fs/cgroup:/sys/fs/cgroup:ro
    networks:
      - name: net1

  - name: testohpc-compute-1
    image: ${MOLECULE_IMAGE}
    pre_build_image: true
    groups:
      - testohpc_compute
    command: /sbin/init
    tmpfs:
      - /run
      - /tmp
    volumes:
      - /sys/fs/cgroup:/sys/fs/cgroup:ro
    networks:
      - name: net1
provisioner:
  name: ansible
  ansible_args:
    - --limit=testohpc-control,testohpc-compute-0
verifier:
  name: ansible
13 changes: 13 additions & 0 deletions molecule/test9/verify.yml
@@ -0,0 +1,13 @@
---

- hosts: testohpc_control
  tasks:
    - name: Check both compute nodes are listed and compute-0 is up
      shell: 'sinfo --noheader --Node --format="%P,%a,%l,%D,%t,%N"' # using --format ensures we control whitespace
      register: sinfo
    - assert: # PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
        that: "sinfo.stdout_lines == ['compute*,up,60-00:00:00,1,idle,testohpc-compute-0','compute*,up,60-00:00:00,1,unk*,testohpc-compute-1']" # NB: -1 goes 'down' after a while!
        fail_msg: "FAILED - actual value: {{ sinfo.stdout_lines }}"
    - name: Check login nodes in config
      command: "grep NodeName={{ item }} /etc/slurm/slurm.conf"
      loop: "{{ groups['testohpc_login'] }}"
5 changes: 5 additions & 0 deletions tasks/main.yml
@@ -7,6 +7,11 @@
when: "openhpc_enable.get(item, false)"
tags: always

- name: Set slurmd as service for openhpc_login_only_nodes
set_fact:
openhpc_slurm_service: "slurmd"
when: openhpc_login_only_nodes and (openhpc_login_only_nodes in group_names)

- name: Install packages
block:
- include_tasks: install.yml
2 changes: 1 addition & 1 deletion tasks/runtime.yml
@@ -75,7 +75,7 @@
    group: root
    mode: 0644
  when:
    - openhpc_enable.batch | default(false) | bool
    - openhpc_slurm_service == 'slurmd'
    - openhpc_slurm_configless
  notify:
    - Reload SLURM service
6 changes: 6 additions & 0 deletions templates/slurm.conf.j2
@@ -137,6 +137,12 @@ PartitionName={{part.name}} \
{% endfor %}{# group #}
{% endfor %}{# partitions #}


# Define slurmd nodes not in partitions for configless login-only nodes:
{% if openhpc_login_only_nodes %}{% for node in groups[openhpc_login_only_nodes] %}
NodeName={{ node }}
{% endfor %}{% endif %}
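
With `openhpc_login_only_nodes: 'testohpc_login'` and the test8/test9 inventory above, this loop would render something like the following in slurm.conf (a sketch of the expected output):

```
# Define slurmd nodes not in partitions for configless login-only nodes:
NodeName=testohpc-login-0
NodeName=testohpc-login-1
```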

# Want nodes that drop out of SLURM's configuration to be automatically
# returned to service when they come back.
ReturnToService=2