@JavierCladellas JavierCladellas commented Jan 22, 2025

Major changes:

  • App config now has a resources field, containing tasks, nodes, exclusive_access and memory values. It is intended to be used together with parameters, like this:
     "resources":{
         "tasks":"{{parameters.resources.tasks.value}}",
         "exclusive_access":"{{parameters.resources.exclusive_access.value}}",
         "memory":"{{parameters.memory.value}}",
     },
  • Added a conditions field to parameters, which allows selecting specific combinations of the parameter space (pruning).
    For example,
"parameters": [
    {
        "name": "tasks",
        "sequence": [64, 128, 256]
    },
    {
        "name": "mesh",
        "sequence": ["M1", "M2", "M3", "M4"],
        "conditions": {
            "M2": { "tasks": [128] },
            "M3": { "tasks": [128, 256] },
            "M4": { "tasks": [] }
        }
    }
]

This will only run the following tests:

  • M1 with 64, 128 and 256 tasks (M1 is not listed in conditions, so it defaults to "all")

  • M2 only with 128 tasks

  • M3 with 128 and 256 tasks

  • M4 is not run at all

Multiple conditions are also accepted for a given value, e.g. "M2":{ "tasks": [128], "discretization": ["P1","P3"] }
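The pruning rule described above can be sketched in a few lines of Python. This is only an illustration, not the code from this PR: the function names `prune_parameter_space` and `_is_allowed` are hypothetical, and it assumes that multiple conditions on the same value are combined with a logical AND. The example data mirrors the configuration above, with M4 mapped to an empty list so that it is skipped.

```python
from itertools import product


def _is_allowed(parameter, combo):
    """Check one parameter's conditions against a full combination (hypothetical helper)."""
    conditions = parameter.get("conditions", {})
    value = combo[parameter["name"]]
    if value not in conditions:
        return True  # values without an entry in conditions default to "all"
    # Assumption: every constraint listed for this value must hold (logical AND).
    return all(combo[other] in allowed for other, allowed in conditions[value].items())


def prune_parameter_space(parameters):
    """Return the combinations of the cartesian product that satisfy all conditions."""
    names = [p["name"] for p in parameters]
    sequences = [p["sequence"] for p in parameters]
    kept = []
    for values in product(*sequences):
        combo = dict(zip(names, values))
        if all(_is_allowed(p, combo) for p in parameters):
            kept.append(combo)
    return kept


parameters = [
    {"name": "tasks", "sequence": [64, 128, 256]},
    {
        "name": "mesh",
        "sequence": ["M1", "M2", "M3", "M4"],
        "conditions": {
            "M2": {"tasks": [128]},
            "M3": {"tasks": [128, 256]},
            "M4": {"tasks": []},
        },
    },
]

combos = prune_parameter_space(parameters)
for combo in combos:
    print(combo)
```

With this data the sketch keeps six combinations: three for M1, one for M2, two for M3 and none for M4.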

  • Removed the memory field from the root of the configuration file; it should now be specified under the resources field. The --mem directive is still added to the scheduler script.

  • [❗ IMPORTANT ❗] Removed the use_nodes_option. The --nodes directive will no longer be emitted, but --ntasks and --ntasks-per-node will always appear; we should let the scheduler allocate the number of nodes it needs. (Done this way because ReFrame does not allow specifying the number of nodes. We could pass the --nodes directive explicitly, but that would be scheduler dependent.) What do you think, @vincentchabannes?

  • The scheduler script is now exported with the ReFrame report, along with the output and error logs. Links to these are now included in the report (adoc) parameter table.

  • Separated resource handling into resources.py, using a strategy + factory pattern to handle tasks-nodes-tasks_per_node-memory combinations.

  • Also updated tests.

  • The actual values of num_nodes, tasks and tasks_per_node (or None) should be shown in the report, not only the parameter values (as they might differ or be updated)

  • Scripts and logs should be exported even on failed tests (if the job fails during the run phase, logs will not be exported 😢)

  • Implement a copying mechanism from an input_user_dir folder to input_dataset_base_dir. Only compatible with execution_policy:serial. If input_user_dir is defined, all files listed under input_file_dependencies will be copied (and removed after the test is done).
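
The copy-then-clean-up behaviour described in the last item could look roughly like the sketch below. It is not the implementation from this PR: `staged_inputs` is a hypothetical helper, and it assumes that input_file_dependencies can be treated as a list of paths relative to input_user_dir.

```python
import shutil
import tempfile
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def staged_inputs(input_user_dir, input_dataset_base_dir, input_file_dependencies):
    """Copy the listed files into the dataset directory and remove them on exit."""
    src_root = Path(input_user_dir)
    dst_root = Path(input_dataset_base_dir)
    copied = []
    try:
        for relative_path in input_file_dependencies:
            destination = dst_root / relative_path
            destination.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src_root / relative_path, destination)
            copied.append(destination)
        yield copied
    finally:
        # Clean up only the files this helper copied, even if the test failed.
        for destination in copied:
            destination.unlink(missing_ok=True)


# Demonstration with temporary directories standing in for the two folders.
with tempfile.TemporaryDirectory() as user_dir, tempfile.TemporaryDirectory() as base_dir:
    (Path(user_dir) / "mesh.geo").write_text("// hypothetical input file")
    with staged_inputs(user_dir, base_dir, ["mesh.geo"]) as files:
        present_during_test = files[0].exists()
    removed_after_test = not files[0].exists()

print(present_during_test, removed_after_test)
```

Using a context manager keeps the removal step tied to the copy step, so the files are cleaned up whether or not the body raises.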

@JavierCladellas JavierCladellas marked this pull request as ready for review January 22, 2025 15:30
@vincentchabannes vincentchabannes merged commit dd51660 into master Feb 10, 2025
10 checks passed

Labels

benchmarking (issues related to benchmarking), enhancement (a new feature or request), hpc (high-performance computing related)


Development

Successfully merging this pull request may close these issues.

make memory a parameter
