New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DM-43299: changes to evaluate RequestMemory expressions #26
Conversation
print("Auto: No Large BPS Jobs.") | ||
return | ||
except Exception as exc: | ||
raise type(exc)("Problem querying condor schedd for jobs") from None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd move lines 219-244 out of the try-except statement. I don't see how anything in these lines could go wrong, i.e., raise an exception. In other words, it looks like only condor_q()
call is worth checking for possible exceptions.
condorq_large = [] | ||
condorq_small = [] | ||
schedd_name = list(condorq_data.keys())[0] | ||
condorq_full = condorq_data[schedd_name] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lines 259 and 260 could be replaced by a single statement:
schedd_name, condor_full = condorq_data.popitem()
condorq_full = condorq_data[schedd_name] | ||
|
||
print("Auto: Search for Large htcondor jobs.") | ||
for jid in list(condorq_full.keys()): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's no need to create a list (and use keys()
) when iterating over dictionary keys in a for loop.
for jid in condorq_full:
...
should work just fine and will be more memory efficient. Though in this particular case
for jid, ajob in condorq_full.items():
...
will work even better for assigning values to jid
and ajob
variables.
this_list.append(ajob) | ||
|
||
for job_label in unique_labels: | ||
print(f"Making an evaluation {thisEvalMemory}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason why we can't always add the RequestMemoryEval
(i.e. the attribute representing numerical value of the requested memory) to the job's ClassAd regardless whether the RequestMemory
is an integer or a ClassAd expression? It looks like doing so would allow us to avoid rechecking what RequestMemory
really is in other places (lines 289-293 and 342-346).
return | ||
|
||
def countIdleSlurmJobs(self, jobname): | ||
"""Check Slurm queue for Idle Glideins""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As it looks like a newly added method please include Parameters and Returns sections in its docstring. The same goes for countRunningSlurmJobs()
below.
Also, it looks like these two methods could be static as neither of them is accessing/modifying instance members.
Finally, countIdleSlurmJobs()
and countRunningSlurmJobs()
are practically identical. Can't we replace them by a single, more generic one by introducing additional parameter, say jobstates
, and rewriting the Slurm query as f"squeue --noheader --states={jobstates} --name={jobname} | wc -l"
?
Even if you prefer to have "convenience" methods for counting Slurm jobs at the given state, they could be just appropriate calls of such a generic function. For example
@staticmethod
def countIdleSlurmJobs(jobname):
print(f"Checking if idle Slurm job {jobname} exists:")
return self.countSlurmJob(jobname, jobstates="PD")
@staticmethod
def countRunningSlurmJobs(jobname):
print(f"Checking if running Slurm job {jobname} exists:")
return self.countSlurmJob(jobnames)
@staticmethod
def countSlurmJobs(jobname, jobstates="R"):
batcmd = f"squeue --noheader --states={jobstates} --name={jobname} | wc -l"
print(f"The squeue command is {batcmd}")
time.sleep(3)
try:
resultPD = subprocess.check_output(batcmd, shell=True)
except subprocess.CalledProcessError as e:
print(e.output)
numberOfJobs = int(resultPD.decode("UTF-8"))
return numberOfJobs
if verbose: | ||
print(f"{job_label} reduced {numberOfGlideinsReduced}") | ||
|
||
print(f"jobname {jobname}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like a "leftover" after some debugging session. If so, please remove. Otherwise, I'd only print it if verbose
is True
as this piece of information may have little use for a regular user.
if maxNumberOfGlideins > maxAllowedNumberOfGlideins: | ||
maxNumberOfGlideins = maxAllowedNumberOfGlideins | ||
print("Reducing Small Glidein limit due to threshold.") | ||
# initialize counter |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMHO, instead of "singling out" this fairly obvious fact, a comment briefly describing what is the purpose of this section of the code (lines 339-360) would be much more useful here.
Also, I think the readability of the entire section could be improved by small adjustments of the variables names. For example, I'd use:
requested(Cpus|Memory)
(or similar) instead ofthis(Cpus|Memory)
,neededCpus
(or similar) instead ofthisRatio
.
print("smallGlideins: Reducing due to threshold.") | ||
print( | ||
f"smallGlideins: Number of Glideins to submit is {numberOfGlideinsReduced}" | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I may be wrong, but Judging by earlier code, it feels like all these messages (lines 377 - 399) should be printed out only if verbose
is True
.
No description provided.