Description
Hi, I'm wondering if you have a suggestion for how to do this:
I want to run a parameterised test using:
- all cores of the nodes
- a geometric series of number of nodes: 2, 4, 8, 16, ... up to the maximum available.
The ReFrame partition has a Slurm --partition directive in its access property to select the Slurm partition to use.
Inside a test I can parse the output of sinfo to get the number of nodes and the number of cores per node, and reading self.current_partition.access lets me restrict this to the right Slurm partition. From that I can compute self.num_tasks_per_node and self.num_tasks for ReFrame.
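For concreteness, the sinfo parsing I have in mind looks roughly like this (the helper name and the optional sinfo_output argument are just for illustration; sinfo's %D/%c format specifiers give the node count and CPUs per node):

```python
import subprocess


def partition_resources(partition, sinfo_output=None):
    """Return (num_nodes, cores_per_node) for a Slurm partition.

    If sinfo_output is given, parse that string instead of
    invoking sinfo (handy for testing without a Slurm cluster).
    """
    if sinfo_output is None:
        # '%D %c' prints the number of nodes and CPUs per node
        sinfo_output = subprocess.run(
            ['sinfo', '-p', partition, '--noheader', '-o', '%D %c'],
            capture_output=True, text=True, check=True,
        ).stdout
    nodes, cores = sinfo_output.split()[:2]
    return int(nodes), int(cores)
```

Given that, num_tasks_per_node would just be the cores value and num_tasks the product of the two.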
However, the current ReFrame partition is only set during the setup() phase, whereas test parameterisation happens before even the test's __init__(). That makes complete sense from a Python perspective, but it is problematic here.
The best approach I can think of so far is to parameterise over a fixed range, say range(6), and then inside each parameterised test compute the number of processes/nodes to set, with e.g. __init__(1) meaning a single node and __init__(6) meaning all nodes. But that is not really ideal, as I won't get the nice geometric series that lets me compare 2-, 4-, ... node results across e.g. Slurm partitions.
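The mapping from the parameter index to a node count would be something like the following sketch (the function name and the exact capping scheme are made up; the point is that the geometric series gets clipped at the partition size, so the last steps collapse onto the same node count):

```python
def tasks_for_step(step, max_nodes, cores_per_node):
    """Map a parameter step (1, 2, 3, ...) to (num_nodes, num_tasks).

    Step k targets 2**k nodes, capped at the partition's maximum,
    so the series is 2, 4, 8, ... up to max_nodes.
    """
    num_nodes = min(2 ** step, max_nodes)
    return num_nodes, num_nodes * cores_per_node
```

On a 16-node partition with 128 cores per node, steps 1..4 give 2, 4, 8 and 16 nodes, but steps 5 and 6 both clip to 16, which is exactly the duplication I'd like to avoid.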
The only other idea I have is to somehow launch subtests: e.g. the main test runs on all nodes, and then I run multiple srun commands on subsets of those. But that doesn't fit ReFrame's model, and is obviously inefficient with resources.
Any suggestions gratefully appreciated!