DM-43299: changes to evaluate RequestMemory expressions #26

daues · 2024-03-27T13:42:55Z

No description provided.

mxk62 · 2024-04-05T02:09:19Z

python/lsst/ctrl/execute/slurmPlugin.py

-                print("Auto: No Large BPS Jobs.")
-                return
+        except Exception as exc:
+            raise type(exc)("Problem querying condor schedd for jobs") from None


I'd move lines 219-244 out of the try-except statement. I don't see how anything in these lines could raise an exception. In other words, it looks like the condor_q() call is the only thing worth guarding.
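To illustrate the suggestion, here is a minimal sketch of a try block that wraps only the query. The helper names fetch_condorq_data, summarize, and fake_condor_q are hypothetical and stand in for the real code; the re-raise line is the one from the diff.

```python
def fetch_condorq_data(condor_q):
    """Call condor_q() and re-raise any failure with a clearer message."""
    try:
        # Only the query itself is guarded.
        condorq_data = condor_q()
    except Exception as exc:
        raise type(exc)("Problem querying condor schedd for jobs") from None
    return condorq_data


def summarize(condorq_data):
    """Plain dictionary handling, kept outside the try block."""
    return {schedd: len(jobs) for schedd, jobs in condorq_data.items()}


def fake_condor_q():
    # Hypothetical stand-in for the real schedd query.
    return {"schedd01": {"1.0": {"RequestMemory": 2048}}}


data = fetch_condorq_data(fake_condor_q)
summary = summarize(data)
```

One caveat: type(exc)(message) assumes the exception class accepts a single string argument, which is not true of every exception type.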

mxk62 · 2024-04-05T02:20:31Z

python/lsst/ctrl/execute/slurmPlugin.py

+        condorq_large = []
+        condorq_small = []
+        schedd_name = list(condorq_data.keys())[0]
+        condorq_full = condorq_data[schedd_name]


Lines 259 and 260 could be replaced by a single statement:

schedd_name, condorq_full = condorq_data.popitem()
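A small sketch of the difference, using made-up data shaped like a one-schedd query result. Note that popitem() removes the entry from the dictionary and returns the most recently inserted pair, so it matches the original two lines only when the dictionary holds a single schedd; next(iter(...)) is a non-mutating alternative.

```python
# Hypothetical query result with a single schedd entry.
condorq_data = {"schedd01": {"1.0": {"RequestMemory": 2048}}}

# Non-mutating: take the first (here, only) key/value pair.
schedd_name, condorq_full = next(iter(condorq_data.items()))

# Mutating: popitem() returns the last-inserted pair and removes it.
scratch = dict(condorq_data)
popped_name, popped_full = scratch.popitem()
```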

mxk62 · 2024-04-05T02:24:54Z

python/lsst/ctrl/execute/slurmPlugin.py

+        condorq_full = condorq_data[schedd_name]
+
+        print("Auto: Search for Large htcondor jobs.")
+        for jid in list(condorq_full.keys()):


There's no need to create a list (and use keys()) when iterating over dictionary keys in a for loop.

for jid in condorq_full: ...

should work just fine and will be more memory efficient. Though in this particular case

for jid, ajob in condorq_full.items(): ...

will work even better for assigning values to jid and ajob variables.
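For reference, a short sketch of the items() form with made-up job data. Copying the keys with list(...) is only needed when the loop body adds or removes keys, since changing a dictionary's size during iteration raises a RuntimeError.

```python
# Hypothetical job ads keyed by job id.
condorq_full = {
    "1.0": {"JobStatus": 1},
    "1.1": {"JobStatus": 2},
}

seen = []
for jid, ajob in condorq_full.items():
    # jid and ajob are bound together; no second lookup is needed.
    seen.append((jid, ajob["JobStatus"]))
```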

mxk62 · 2024-04-05T02:58:34Z

python/lsst/ctrl/execute/slurmPlugin.py

-                this_list.append(ajob)
-
-            for job_label in unique_labels:
+                    print(f"Making an evaluation {thisEvalMemory}")


Is there a reason why we can't always add RequestMemoryEval (i.e., the attribute holding the numerical value of the requested memory) to the job's ClassAd, regardless of whether RequestMemory is an integer or a ClassAd expression? Doing so would let us avoid rechecking what RequestMemory really is in other places (lines 289-293 and 342-346).
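Something along these lines is what I have in mind. This is only a sketch: it assumes an expression value exposes an eval() method (FakeExpr below is a hypothetical stand-in for a ClassAd expression), and the helper name add_request_memory_eval is made up. A real expression may also need the job ad as its evaluation scope.

```python
class FakeExpr:
    """Hypothetical stand-in for a ClassAd expression object."""

    def __init__(self, result):
        self._result = result

    def eval(self):
        return self._result


def add_request_memory_eval(ajob):
    """Always store a numeric RequestMemoryEval next to RequestMemory."""
    value = ajob["RequestMemory"]
    if hasattr(value, "eval"):
        # Expression case: evaluate it once, here.
        value = value.eval()
    ajob["RequestMemoryEval"] = int(value)
    return ajob


plain_job = add_request_memory_eval({"RequestMemory": 2048})
expr_job = add_request_memory_eval({"RequestMemory": FakeExpr(4096)})
```

Later code can then read RequestMemoryEval directly without caring which form RequestMemory took.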

mxk62 · 2024-04-05T03:36:04Z

python/lsst/ctrl/execute/slurmPlugin.py

+        return
+
+    def countIdleSlurmJobs(self, jobname):
+        """Check Slurm queue for Idle Glideins"""


Since this looks like a newly added method, please include Parameters and Returns sections in its docstring. The same goes for countRunningSlurmJobs() below.

Also, it looks like these two methods could be static as neither of them is accessing/modifying instance members.

Finally, countIdleSlurmJobs() and countRunningSlurmJobs() are practically identical. Can't we replace them with a single, more generic method by introducing an additional parameter, say jobstates, and rewriting the Slurm query as f"squeue --noheader --states={jobstates} --name={jobname} | wc -l"?

Even if you prefer to have "convenience" methods for counting Slurm jobs in a given state, they could simply be calls to such a generic function. For example (the wrappers stay regular methods here so they can reach the static helper through self):

    def countIdleSlurmJobs(self, jobname):
        print(f"Checking if idle Slurm job {jobname} exists:")
        return self.countSlurmJobs(jobname, jobstates="PD")

    def countRunningSlurmJobs(self, jobname):
        print(f"Checking if running Slurm job {jobname} exists:")
        return self.countSlurmJobs(jobname)

    @staticmethod
    def countSlurmJobs(jobname, jobstates="R"):
        batcmd = f"squeue --noheader --states={jobstates} --name={jobname} | wc -l"
        print(f"The squeue command is {batcmd}")
        time.sleep(3)
        try:
            result = subprocess.check_output(batcmd, shell=True)
        except subprocess.CalledProcessError as e:
            print(e.output)
            # Re-raise so that result is never used unassigned below.
            raise
        numberOfJobs = int(result.decode("UTF-8"))
        return numberOfJobs
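As a variation, here is a rough sketch of the generic counter that avoids shell=True and the wc -l pipe by counting output lines in Python. The function name count_slurm_jobs and the injectable runner parameter are my own assumptions (the latter only so the logic can be exercised without a Slurm installation); the squeue options are the ones quoted above.

```python
import subprocess
from types import SimpleNamespace


def count_slurm_jobs(jobname, jobstates="R", runner=subprocess.run):
    """Count Slurm jobs named `jobname` in the given state(s)."""
    cmd = ["squeue", "--noheader", f"--states={jobstates}", f"--name={jobname}"]
    completed = runner(cmd, capture_output=True, text=True, check=True)
    # With --noheader, each non-empty output line is one job.
    return sum(1 for line in completed.stdout.splitlines() if line.strip())


def fake_runner(cmd, **kwargs):
    # Hypothetical stand-in for subprocess.run that pretends two jobs are pending.
    return SimpleNamespace(stdout="101 PD\n102 PD\n", args=cmd)


count = count_slurm_jobs("glide_demo", jobstates="PD", runner=fake_runner)
```

Passing the command as a list also means jobname is never interpreted by a shell.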

mxk62 · 2024-04-05T12:27:16Z

python/lsst/ctrl/execute/slurmPlugin.py

-                if verbose:
-                    print(f"{job_label} reduced {numberOfGlideinsReduced}")
-
+                print(f"jobname {jobname}")


Looks like a "leftover" after some debugging session. If so, please remove. Otherwise, I'd only print it if verbose is True as this piece of information may have little use for a regular user.

mxk62 · 2024-04-05T12:55:04Z

python/lsst/ctrl/execute/slurmPlugin.py

+            if maxNumberOfGlideins > maxAllowedNumberOfGlideins:
+                maxNumberOfGlideins = maxAllowedNumberOfGlideins
+                print("Reducing Small Glidein limit due to threshold.")
+            # initialize counter


IMHO, instead of "singling out" this fairly obvious fact, a comment briefly describing the purpose of this section of the code (lines 339-360) would be much more useful here.

Also, I think the readability of the entire section could be improved by small adjustments to the variable names. For example, I'd use:

requested(Cpus|Memory) (or similar) instead of this(Cpus|Memory),

neededCpus (or similar) instead of thisRatio.

mxk62 · 2024-04-05T13:23:17Z

python/lsst/ctrl/execute/slurmPlugin.py

+                print("smallGlideins: Reducing due to threshold.")
+            print(
+                f"smallGlideins: Number of Glideins to submit is {numberOfGlideinsReduced}"
+            )


I may be wrong, but judging by the earlier code, it feels like all these messages (lines 377-399) should be printed only if verbose is True.
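One possible way to keep this consistent is a tiny helper, sketched below. The name vprint is hypothetical, and the messages are just examples in the style of the ones in the diff.

```python
def vprint(verbose, message):
    """Print `message` only when verbose output was requested."""
    if verbose:
        print(message)


# Silent in normal runs, chatty when verbose is True.
vprint(False, "smallGlideins: Reducing due to threshold.")
vprint(True, "smallGlideins: Number of Glideins to submit is 4")
```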

daues added 2 commits March 27, 2024 06:41

changes to evaluate RequestMemory expressions

b9474ab

python formatting

e090c48

mxk62 approved these changes Apr 5, 2024


daues added 2 commits April 10, 2024 08:52

updates in response to review of pr 26

23d21b3

comment one character too long

eb27653

daues merged commit 2e90341 into main Apr 10, 2024
3 checks passed

daues deleted the tickets/DM-43299 branch April 10, 2024 16:08
