ReFrame crashes when trying to retrieve the completion time of a Slurm job #1349

@vkarak

Description

Slurm reports the job's end time as 'Unknown', and ReFrame crashes with a ValueError when it tries to convert that value to a float while computing the job's completion time.

Traceback (most recent call last):
  File "/scratch-shared/meteoswiss/scratch/jenscscs/reframe-ci-kesch-9039b57-3467/reframe/frontend/executors/__init__.py", line 169, in _safe_call
    return fn(*args, **kwargs)
  File "/scratch-shared/meteoswiss/scratch/jenscscs/reframe-ci-kesch-9039b57-3467/reframe/core/pipeline.py", line 114, in _wrapped
    return fn(*args, **kwargs)
  File "/scratch-shared/meteoswiss/scratch/jenscscs/reframe-ci-kesch-9039b57-3467/reframe/core/pipeline.py", line 1329, in poll
    return self._job.finished()
  File "/scratch-shared/meteoswiss/scratch/jenscscs/reframe-ci-kesch-9039b57-3467/reframe/core/schedulers/__init__.py", line 387, in finished
    done = self.scheduler.finished(self)
  File "/scratch-shared/meteoswiss/scratch/jenscscs/reframe-ci-kesch-9039b57-3467/reframe/core/schedulers/slurm.py", line 449, in finished
    self._update_state(job)
  File "/scratch-shared/meteoswiss/scratch/jenscscs/reframe-ci-kesch-9039b57-3467/reframe/core/schedulers/slurm.py", line 363, in _update_state
    job, ','.join(s.group('nodespec') for s in state_match)
  File "/scratch-shared/meteoswiss/scratch/jenscscs/reframe-ci-kesch-9039b57-3467/reframe/core/schedulers/slurm.py", line 332, in _set_nodelist
    job.nodelist = [n.name for n in self._get_nodes_by_name(nodespec)]
  File "/scratch-shared/meteoswiss/scratch/jenscscs/reframe-ci-kesch-9039b57-3467/reframe/core/schedulers/slurm.py", line 323, in _get_nodes_by_name
    nodespec)
  File "/scratch-shared/meteoswiss/scratch/jenscscs/reframe-ci-kesch-9039b57-3467/reframe/utility/os_ext.py", line 33, in run_command
    log=log)
  File "/scratch-shared/meteoswiss/scratch/jenscscs/reframe-ci-kesch-9039b57-3467/reframe/utility/os_ext.py", line 62, in run_command_async
    getlogger().debug('executing OS command: ' + cmd)
  File "/apps/escha/UES/jenkins/RH7.5/generic/easybuild/software/python-bare/3.6.8/lib/python3.6/logging/__init__.py", line 1630, in debug
    self.log(DEBUG, msg, *args, **kwargs)
  File "/scratch-shared/meteoswiss/scratch/jenscscs/reframe-ci-kesch-9039b57-3467/reframe/core/logging.py", line 477, in log
    super().log(level, msg, *args, **kwargs)
  File "/apps/escha/UES/jenkins/RH7.5/generic/easybuild/software/python-bare/3.6.8/lib/python3.6/logging/__init__.py", line 1673, in log
    msg, kwargs = self.process(msg, kwargs)
  File "/scratch-shared/meteoswiss/scratch/jenscscs/reframe-ci-kesch-9039b57-3467/reframe/core/logging.py", line 466, in process
    self._update_check_extras()
  File "/scratch-shared/meteoswiss/scratch/jenscscs/reframe-ci-kesch-9039b57-3467/reframe/core/logging.py", line 445, in _update_check_extras
    if self.check.job.completion_time:
  File "/scratch-shared/meteoswiss/scratch/jenscscs/reframe-ci-kesch-9039b57-3467/reframe/core/schedulers/__init__.py", line 310, in completion_time
    return self.scheduler.completion_time(self) or self._completion_time
  File "/scratch-shared/meteoswiss/scratch/jenscscs/reframe-ci-kesch-9039b57-3467/reframe/core/schedulers/slurm.py", line 129, in completion_time
    self._completion_time = max(float(s.group('end')) for s in state_match)
  File "/scratch-shared/meteoswiss/scratch/jenscscs/reframe-ci-kesch-9039b57-3467/reframe/core/schedulers/slurm.py", line 129, in <genexpr>
    self._completion_time = max(float(s.group('end')) for s in state_match)
ValueError: could not convert string to float: 'Unknown'
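
One possible way to avoid the crash is to treat a non-numeric end time as "not finished yet" instead of passing it straight to float(). The following is only a minimal sketch under stated assumptions: the pipe-separated record format, the regex, and the helper names parse_end_time and latest_completion_time are hypothetical and are not taken from ReFrame's slurm.py; only the 'end' group name and the max(float(...)) pattern come from the traceback above.

```python
import re

# Hypothetical record format "jobid|state|end", one job step per line.
# The real regex used in reframe/core/schedulers/slurm.py may differ.
_STATE_RE = re.compile(
    r'^(?P<jobid>[^|\n]+)\|(?P<state>[^|\n]+)\|(?P<end>[^|\n]+)$',
    re.MULTILINE
)


def parse_end_time(value):
    """Return the end time as a float, or None if it is not numeric.

    Slurm may report 'Unknown' for a step that has no end time yet.
    """
    try:
        return float(value)
    except ValueError:
        return None


def latest_completion_time(output):
    """Return the latest end time over all steps, or None if unavailable.

    Assumption: if any step still has an unknown end time, the job as a
    whole is treated as having no completion time yet.
    """
    ends = [parse_end_time(m.group('end'))
            for m in _STATE_RE.finditer(output)]
    if not ends or any(t is None for t in ends):
        return None

    return max(ends)
```

With this shape, a caller such as the completion_time property could fall back to its cached value when None is returned, which is what the `or self._completion_time` in the traceback appears to intend. Whether an unknown end time on one step should hide the known end times of the others is a design choice; the sketch takes the conservative option.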
