Conversation

@teojgo teojgo commented May 25, 2018

  • Reduce the squeue rate by introducing the constant
    SACCT_SQUEUE_RATIO.

  • Fix some warnings regarding regexes.

Fixes #283

@victorusu victorusu left a comment

lgtm

'ReqNodeNotAvail', # Inaccurate SLURM doc
'QOSUsageThreshold']
self._is_cancelling = False
self._update_state_count = 0

Is it state or status?

@victorusu I think 'state' is fine.

self._state = SlurmJobState(state_match.group('state'))
self._cancel_if_blocked()

if self._update_state_count == SACCT_SQUEUE_RATIO:

@vkarak vkarak May 28, 2018

I would simply do a modulo operation here:

if not self._update_state_count % SACCT_SQUEUE_RATIO:
    self._cancel_if_blocked()
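
To illustrate the modulo suggestion above, here is a minimal, self-contained sketch. The class `ThrottledPoller` and its method names are hypothetical stand-ins, not the actual `SlurmJob` code, and it assumes the counter is incremented after the check, so the expensive call also fires on the very first update:

```python
class ThrottledPoller:
    """Hypothetical stand-in for a job object; not the real SlurmJob."""

    def __init__(self, ratio):
        self.ratio = ratio          # plays the role of SACCT_SQUEUE_RATIO
        self.update_count = 0       # plays the role of _update_state_count
        self.expensive_calls = 0    # counts how often the costly path ran

    def expensive_check(self):
        # Stands in for the costly operation (e.g. a method that runs squeue)
        self.expensive_calls += 1

    def update_state(self):
        # The cheap per-update work would go here.
        # Run the costly check only when the counter is a multiple of ratio.
        if not self.update_count % self.ratio:
            self.expensive_check()

        self.update_count += 1


poller = ThrottledPoller(ratio=20)
for _ in range(100):
    poller.update_state()

print(poller.expensive_calls)
```

With this form there is no need to reset the counter: the check runs on every `ratio`-th update for as long as the counter keeps growing.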

SLURM_JOB_TIMEOUT = SlurmJobState('TIMEOUT')

# Number of _update_state calls per which _cancel_if_blocked is called
SACCT_SQUEUE_RATIO = 20

I think this constant should be defined inside the SlurmJob class as a class attribute, since it is very specific to this implementation.
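
As a rough sketch of what the class-attribute variant could look like (the class names `SlurmJobSketch` and `EagerSlurmJobSketch` and the helper `should_check_blocked` are hypothetical, used only for illustration; the value 20 is the one from the diff under review):

```python
class SlurmJobSketch:
    """Hypothetical sketch; not the real SlurmJob implementation."""

    # Defined on the class, so it lives next to the code that uses it
    SACCT_SQUEUE_RATIO = 20

    def __init__(self):
        self._update_state_count = 0

    def should_check_blocked(self):
        # Looked up through self, so a subclass can override the ratio
        return self._update_state_count % self.SACCT_SQUEUE_RATIO == 0


class EagerSlurmJobSketch(SlurmJobSketch):
    # A smaller ratio means the costly check runs more often
    SACCT_SQUEUE_RATIO = 10


job = EagerSlurmJobSketch()
job._update_state_count = 10
print(job.should_check_blocked())
```

One possible benefit of this placement is that the ratio can be tuned per subclass without touching module-level state.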

@victorusu Should we start with 10 and see if ops are happy?

SLURM_JOB_SUSPENDED = SlurmJobState('SUSPENDED')
SLURM_JOB_TIMEOUT = SlurmJobState('TIMEOUT')

# Number of _update_state calls per which _cancel_if_blocked is called

I don't like this comment very much; it contradicts the name of the variable. If we use SACCT_SQUEUE_RATIO as the name, I'd like to see a comment describing the rationale behind it instead, because what this variable controls is obvious from its name.

@vkarak vkarak changed the title from "Introduce SACCT_SQUEUE_RATIO to reduce squeue calls" to "Add a knob for controlling the 'sacct'/'squeue' ratio in Slurm 'sacct' backend." May 28, 2018
@vkarak vkarak changed the title from "Add a knob for controlling the 'sacct'/'squeue' ratio in Slurm 'sacct' backend." to "Add a knob for controlling the 'sacct'/'squeue' ratio in Slurm 'sacct' backend" May 28, 2018
@vkarak vkarak merged commit 3976f0e into master May 28, 2018
@vkarak vkarak deleted the enhancement/reduce_squeue_ratio branch May 28, 2018 18:08
4 participants