Add support for autoscaling #120
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -12,6 +12,8 @@ | |
# License for the specific language governing permissions and limitations | ||
# under the License. | ||
|
||
# NB: To test this from the repo root run: | ||
# ansible-playbook -i tests/inventory -i tests/inventory-mock-groups tests/filter.yml | ||
|
||
from ansible import errors | ||
import jinja2 | ||
|
@@ -30,11 +32,25 @@ def _get_hostvar(context, var_name, inventory_hostname=None): | |
namespace = context["hostvars"][inventory_hostname] | ||
return namespace.get(var_name) | ||
|
||
@jinja2.contextfilter | ||
def group_hosts(context, group_names): | ||
return {g:_group_hosts(context["groups"].get(g, [])) for g in sorted(group_names)} | ||
def hostlist_expression(hosts): | ||
""" Group hostnames using Slurm's hostlist expression format. | ||
E.g. with an inventory containing: | ||
[compute] | ||
dev-foo-0 ansible_host=localhost | ||
dev-foo-3 ansible_host=localhost | ||
my-random-host | ||
dev-foo-4 ansible_host=localhost | ||
dev-foo-5 ansible_host=localhost | ||
dev-compute-0 ansible_host=localhost | ||
dev-compute-1 ansible_host=localhost | ||
Then "{{ groups['compute'] | hostlist_expression }}" will return: | ||
["dev-foo-[0,3-5]", "dev-compute-[0-1]", "my-random-host"] | ||
""" | ||
|
||
def _group_hosts(hosts): | ||
results = {} | ||
unmatchable = [] | ||
for v in hosts: | ||
|
@@ -58,9 +74,16 @@ def _group_numbers(numbers): | |
prev = v | ||
return ','.join(['{}-{}'.format(u[0], u[-1]) if len(u) > 1 else str(u[0]) for u in units]) | ||
|
||
def error(condition, msg): | ||
""" Raise an error if condition is not True """ | ||
|
||
if not condition: | ||
raise errors.AnsibleFilterError(msg) | ||
|
||
class FilterModule(object): | ||
|
||
def filters(self): | ||
return { | ||
'group_hosts': group_hosts | ||
Review comment: That looks like it's backwards incompatible? Could we keep old and new easily enough?
Reply: The problem is the filter name is massively confusing when used in the templating which also used
||
'hostlist_expression': hostlist_expression, | ||
'error': error, | ||
} |
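The docstring for `hostlist_expression` describes the grouping behaviour, but most of the function body falls outside the diff hunks shown here. The following is a minimal standalone sketch of that behaviour, not the role's actual implementation. The names `sketch_hostlist_expression` and `compress_numbers` and the regular expression are assumptions made for illustration.

```python
import re

# Assumption: a host is "groupable" if it ends in digits preceded by a non-digit.
_TRAILING_NUMBER = re.compile(r"^(.*\D)(\d+)$")


def compress_numbers(numbers):
    """Collapse integers into comma-separated ranges, e.g. [0, 3, 4, 5] -> '0,3-5'."""
    runs = []
    for n in sorted(set(numbers)):
        if runs and n == runs[-1][-1] + 1:
            runs[-1].append(n)  # extends the current consecutive run
        else:
            runs.append([n])  # starts a new run
    return ",".join(
        f"{run[0]}-{run[-1]}" if len(run) > 1 else str(run[0]) for run in runs
    )


def sketch_hostlist_expression(hosts):
    """Group hostnames sharing a prefix into Slurm-style hostlist expressions.

    Hypothetical helper: numeric suffixes are parsed with int(), so any
    zero padding is lost. Hosts without a numeric suffix are appended as-is.
    """
    by_prefix = {}  # dicts preserve insertion order, so groups keep first-seen order
    unmatched = []
    for host in hosts:
        match = _TRAILING_NUMBER.match(host)
        if match:
            prefix, number = match.groups()
            by_prefix.setdefault(prefix, []).append(int(number))
        else:
            unmatched.append(host)
    grouped = [f"{p}[{compress_numbers(n)}]" for p, n in by_prefix.items()]
    return grouped + unmatched


if __name__ == "__main__":
    compute = [
        "dev-foo-0", "dev-foo-3", "my-random-host", "dev-foo-4",
        "dev-foo-5", "dev-compute-0", "dev-compute-1",
    ]
    print(sketch_hostlist_expression(compute))
```

Run against the inventory from the docstring, the sketch yields the three entries the docstring lists, with the unmatchable host placed last.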
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,30 @@ | ||
--- | ||
- name: Converge | ||
hosts: all | ||
tasks: | ||
- name: "Include ansible-role-openhpc" | ||
include_role: | ||
name: "{{ lookup('env', 'MOLECULE_PROJECT_DIRECTORY') | basename }}" | ||
vars: | ||
openhpc_enable: | ||
control: "{{ inventory_hostname in groups['testohpc_login'] }}" | ||
batch: "{{ inventory_hostname in groups['testohpc_compute'] }}" | ||
runtime: true | ||
openhpc_slurm_control_host: "{{ groups['testohpc_login'] | first }}" | ||
openhpc_slurm_partitions: | ||
- name: "compute" | ||
extra_nodes: | ||
# Need to specify IPs for the non-existent State=DOWN nodes, because otherwise even in this state slurmctld will exclude a node with no lookup information from the config. | ||
# We use invalid IPs here (i.e. starting 0.) to flag the fact the nodes shouldn't exist. | ||
# Note this has to be done via slurm config rather than /etc/hosts due to Docker limitations on modifying the latter. | ||
- NodeName: fake-x,fake-y | ||
NodeAddr: 0.42.42.0,0.42.42.1 | ||
Review comment: Should we use internal IPs here? like 10.42.42.0?
Review comment: Does this also affect cloud nodes that do not exist?
Reply: @JohnGarbutt does 9b04782 clarify why I'm using invalid IPs rather than internal ones? It doesn't affect cloud nodes which don't exist as they will have been listed in the config as State=CLOUD, not State=DOWN. The former specifically means
||
State: DOWN | ||
CPUs: 1 | ||
- NodeName: fake-2cpu-[3,7-9] | ||
NodeAddr: 0.42.42.3,0.42.42.7,0.42.42.8,0.42.42.9 | ||
State: DOWN | ||
CPUs: 2 | ||
openhpc_cluster_name: testohpc | ||
openhpc_slurm_configless: true | ||
|
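The `extra_nodes` entries above pair a hostlist expression in `NodeName` with a comma-separated `NodeAddr` list of the same length. As a rough illustration of how the two line up, here is a small Python sketch. The helpers `expand_hostlist` and `pair_nodes_with_addrs` are hypothetical and handle only the simple single-bracket form used in this scenario, not Slurm's full hostlist syntax.

```python
import re

# Assumption: at most one [..] group per name, containing plain integers and ranges.
_BRACKETED = re.compile(r"^(.*)\[([^\]]+)\](.*)$")


def split_top_level(expr):
    """Split on commas that are not inside square brackets."""
    parts, depth, current = [], 0, ""
    for ch in expr:
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
        if ch == "," and depth == 0:
            parts.append(current)
            current = ""
        else:
            current += ch
    parts.append(current)
    return parts


def expand_hostlist(expr):
    """Expand e.g. 'fake-2cpu-[3,7-9]' into individual node names."""
    names = []
    for part in split_top_level(expr):
        match = _BRACKETED.match(part)
        if not match:
            names.append(part)  # plain name such as 'fake-x'
            continue
        prefix, ranges, suffix = match.groups()
        for item in ranges.split(","):
            low, _, high = item.partition("-")
            for n in range(int(low), int(high or low) + 1):
                names.append(f"{prefix}{n}{suffix}")
    return names


def pair_nodes_with_addrs(node_name, node_addr):
    """Map each expanded node name to the address in the same position."""
    names = expand_hostlist(node_name)
    addrs = node_addr.split(",")
    if len(names) != len(addrs):
        raise ValueError(f"{len(names)} node names but {len(addrs)} addresses")
    return dict(zip(names, addrs))


if __name__ == "__main__":
    print(pair_nodes_with_addrs("fake-x,fake-y", "0.42.42.0,0.42.42.1"))
    print(pair_nodes_with_addrs(
        "fake-2cpu-[3,7-9]", "0.42.42.3,0.42.42.7,0.42.42.8,0.42.42.9"
    ))
```

The length check reflects the layout of this test scenario, where every fake node gets exactly one deliberately invalid address.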
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,60 @@ | ||
--- | ||
name: single partition, group is partition | ||
driver: | ||
name: docker | ||
platforms: | ||
- name: testohpc-login-0 | ||
image: ${MOLECULE_IMAGE} | ||
pre_build_image: true | ||
groups: | ||
- testohpc_login | ||
command: /sbin/init | ||
tmpfs: | ||
- /run | ||
- /tmp | ||
volumes: | ||
- /sys/fs/cgroup:/sys/fs/cgroup:ro | ||
networks: | ||
- name: net1 | ||
docker_networks: | ||
- name: net1 | ||
driver_options: | ||
com.docker.network.driver.mtu: ${DOCKER_MTU:-1500} # 1500 is docker default | ||
- name: testohpc-compute-0 | ||
image: ${MOLECULE_IMAGE} | ||
pre_build_image: true | ||
groups: | ||
- testohpc_compute | ||
command: /sbin/init | ||
tmpfs: | ||
- /run | ||
- /tmp | ||
volumes: | ||
- /sys/fs/cgroup:/sys/fs/cgroup:ro | ||
networks: | ||
- name: net1 | ||
docker_networks: | ||
- name: net1 | ||
driver_options: | ||
com.docker.network.driver.mtu: ${DOCKER_MTU:-1500} # 1500 is docker default | ||
- name: testohpc-compute-1 | ||
image: ${MOLECULE_IMAGE} | ||
pre_build_image: true | ||
groups: | ||
- testohpc_compute | ||
command: /sbin/init | ||
tmpfs: | ||
- /run | ||
- /tmp | ||
volumes: | ||
- /sys/fs/cgroup:/sys/fs/cgroup:ro | ||
networks: | ||
- name: net1 | ||
docker_networks: | ||
- name: net1 | ||
driver_options: | ||
com.docker.network.driver.mtu: ${DOCKER_MTU:-1500} # 1500 is docker default | ||
provisioner: | ||
name: ansible | ||
verifier: | ||
name: ansible |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,12 @@ | ||
--- | ||
|
||
- name: Check slurm hostlist | ||
hosts: testohpc_login | ||
tasks: | ||
- name: Get slurm partition info | ||
command: sinfo --noheader --format="%P,%a,%l,%D,%t,%N" # using --format ensures we control whitespace | ||
register: sinfo | ||
- name: Check nodes have expected slurm state | |
assert: # PARTITION AVAIL TIMELIMIT NODES STATE NODELIST | ||
that: "sinfo.stdout_lines == ['compute*,up,60-00:00:00,6,down*,fake-2cpu-[3,7-9],fake-x,fake-y', 'compute*,up,60-00:00:00,2,idle,testohpc-compute-[0-1]']" | ||
fail_msg: "FAILED - actual value: {{ sinfo.stdout_lines }}" |
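The assertion above compares whole `sinfo` output lines. The node list field can itself contain commas, which makes the lines awkward to read by eye. A small sketch of how one such line breaks into the six fields named in the task comment follows. The `SinfoRow` type and `parse_sinfo_line` helper are hypothetical names for illustration and are not part of the role.

```python
from typing import NamedTuple


class SinfoRow(NamedTuple):
    """One line of `sinfo --noheader --format="%P,%a,%l,%D,%t,%N"` output."""
    partition: str
    avail: str
    timelimit: str
    nodes: int
    state: str
    nodelist: str


def parse_sinfo_line(line):
    """Split into six fields; only the first five commas act as separators,
    because the trailing node list may contain commas of its own."""
    partition, avail, timelimit, nodes, state, nodelist = line.split(",", 5)
    return SinfoRow(partition, avail, timelimit, int(nodes), state, nodelist)


if __name__ == "__main__":
    expected_lines = [
        "compute*,up,60-00:00:00,6,down*,fake-2cpu-[3,7-9],fake-x,fake-y",
        "compute*,up,60-00:00:00,2,idle,testohpc-compute-[0-1]",
    ]
    for line in expected_lines:
        print(parse_sinfo_line(line))
```

Read this way, the first expected line reports six nodes in state `down*` (the four `fake-2cpu` nodes plus `fake-x` and `fake-y`), and the second reports the two real compute containers as `idle`.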