srunalloc launcher makes reframe lose track of the prerun_cmds etc. output #3044

@vkarak

Description

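The problem is that, with the srunalloc launcher, the test's standard output/error files are passed as the --output and --error options to the srun command, so srun's output overrides the output of the whole script, including anything printed by prerun_cmds. Here's how to reproduce: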

Configuration file (you can add the access options accordingly if needed):

site_configuration = {
    'systems': [
        {
            'name': 'system',
            'hostnames': ['nid0'],
            'partitions': [
                {
                    'name': 'part',
                    'scheduler': 'local',
                    'launcher': 'srunalloc',
                    'environs': ['builtin']
                }
            ]
        }
    ]
}

And the test file:

import reframe as rfm
import reframe.utility.sanity as sn


@rfm.simple_test
class srunalloc_fail_test(rfm.RunOnlyRegressionTest):
    executable = 'hostname'
    prerun_cmds = ['echo hello']
    valid_systems = ['system:part']
    valid_prog_environs = ['*']

    @sanity_function
    def validate(self):
        return sn.assert_found('hello', self.stdout)

Running the test fails as follows:

SUMMARY OF FAILURES
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
FAILURE INFO for srunalloc_fail_test (run: 1/1)
  * Description:
  * System partition: system:part
  * Environment: builtin
  * Stage directory: /home/user/reframe/stage/system/part/builtin/srunalloc_fail_test
  * Node list: nid0001
  * Job type: local (id=83006)
  * Dependencies (conceptual): []
  * Dependencies (actual): []
  * Maintainers: []
  * Failing phase: sanity
  * Rerun with '-n /b359e5de -p builtin --system system:part -r'
  * Reason: sanity error: pattern 'hello' not found in 'rfm_job.out'
--- rfm_job.out (first 10 lines) ---
nid0001
--- rfm_job.out ---
--- rfm_job.err (first 10 lines) ---
--- rfm_job.err ---
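
The contents of rfm_job.out above, where only the hostname survives, are consistent with srun reopening the file that the job script is already writing to. The following is a minimal simulation of that effect in plain Python, not ReFrame or Slurm code. It assumes that srun opens the file named by --output in truncate mode, which is an assumption about the site's Slurm configuration; the function name `simulate` and its flag are made up for illustration.

```python
import os
import tempfile


def simulate(srun_reopens_file: bool) -> str:
    """Return the final contents of a stand-in for rfm_job.out."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, 'rfm_job.out')

        # The job script's stdout is redirected to the output file.
        with open(path, 'w') as script_stdout:
            # prerun_cmds = ['echo hello']
            script_stdout.write('hello\n')
            script_stdout.flush()

            if srun_reopens_file:
                # Assumed effect of `srun --output=rfm_job.out hostname`:
                # the same file is opened again in truncate mode.
                with open(path, 'w') as srun_stdout:
                    srun_stdout.write('nid0001\n')
            else:
                # Without --output, srun inherits the script's stdout.
                script_stdout.write('nid0001\n')

        with open(path) as fp:
            return fp.read()


print(repr(simulate(srun_reopens_file=True)))
print(repr(simulate(srun_reopens_file=False)))
```

In this simulation, the 'hello' line is lost only when the file is reopened, which matches the sanity error reported above.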

Removing the following --output and --error srun options from the launcher solves the issue:

if job.stdout:
    ret += ['--output=%s' % job.stdout]
if job.stderr:
    ret += ['--error=%s' % job.stderr]
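
To make the effect of that change concrete, here is a small self-contained sketch of how the generated srun command line would differ. `FakeJob`, `srun_command` and the `forward_files` flag are hypothetical names used only for illustration; this is not ReFrame's launcher class, and the real command includes other options that are left out here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FakeJob:
    """Hypothetical stand-in for a job object; not ReFrame's Job class."""
    stdout: Optional[str] = 'rfm_job.out'
    stderr: Optional[str] = 'rfm_job.err'


def srun_command(job: FakeJob, forward_files: bool) -> List[str]:
    """Build a simplified srun command line for the given job."""
    ret = ['srun']
    if forward_files:
        # Current behaviour: the test's files are handed to srun.
        if job.stdout:
            ret += ['--output=%s' % job.stdout]
        if job.stderr:
            ret += ['--error=%s' % job.stderr]

    # With forward_files=False, srun simply inherits the stdout/stderr
    # of the enclosing job script.
    return ret


print(' '.join(srun_command(FakeJob(), forward_files=True)))
print(' '.join(srun_command(FakeJob(), forward_files=False)))
```

With the options dropped, srun writes to whatever the job script's own stdout and stderr point to, so its output lands alongside the output of the prerun_cmds.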
