DM-27100: Add PanDA support #17

SergeyPod · 2020-12-14T21:51:15Z

No description provided.

timj · 2020-12-14T22:01:26Z

Please rebase to get rid of that bad merge commit you have there.

SergeyPod · 2020-12-15T21:24:02Z

> Please rebase to get rid of that bad merge commit you have there.

done. Thank you.

timj

I've had a quick look and I have many comments. I'm not willing to approve this just yet because there is so much missing in terms of docstrings and code comments that it's hard to see what's really going on.

I'm also a bit concerned that there is LSST code referenced here that is being developed somewhere else that we can't see (DomaLSSTWork) so I'd like to have some insight into that (unless the "LSST" in the name is some fluke).

timj · 2021-01-05T20:09:58Z

python/lsst/ctrl/bps/wms/panda/conf_example/example_panda.yaml

+  bucket: "s3://ci_hsc_w_2020_48"
+  s3_endpoint_url: "https://storage.googleapis.com"
+  payload_folder: payload
+  singulatiry_prefix: '"cd /tmp;export HOME=/tmp;export S3_ENDPOINT_URL={s3_endpoint_url};export AWS_ACCESS_KEY_ID={aws_access_key}; export AWS_SECRET_ACCESS_KEY={aws_secret_access_key}; . /opt/lsst/software/stack/loadLSST.bash; setup lsst_distrib -t w_2020_48;'


This doesn't seem correct. "singulatiry"

Thanks. Fixed

The DomaLSSTWork class in iDDS will be renamed in the next pip release of that module. I'll update the BPS plugin accordingly once it appears.

timj · 2021-01-05T20:14:25Z

python/lsst/ctrl/bps/wms/panda/edgenode/sw_runner

@@ -0,0 +1,8 @@
+#!/bin/bash
+echo "I am in container"


You need to put some commentary in here explaining why it exists.

Added description

timj · 2021-01-05T20:14:41Z

python/lsst/ctrl/bps/wms/panda/edgenode/cmd_line_encoder.py

@@ -0,0 +1,6 @@
+#!/usr/bin/python
+import sys


Please add a docstring to let people know what this command is for.

timj · 2021-01-05T20:15:13Z

python/lsst/ctrl/bps/wms/panda/idds_tasks.py

@@ -0,0 +1,126 @@
+class LSSTTask(object):


Never inherit from object in Python 3. Include a docstring. Convention requires this be LsstTask, but I imagine RubinTask would also work.

Can this be a dataclass?

Reimplemented as a dataclass
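For illustration, a minimal sketch of what a dataclass form might look like. The field names `step`, `name`, `queue`, and `lfns` are taken from the attribute assignments visible elsewhere in this diff; the class name `RubinTask`, the types, and the defaults are assumptions, not the code that was actually merged.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class RubinTask:
    """Container for one task definition (hypothetical sketch)."""

    # Pipeline step this task executes, e.g. "makeWarp".
    step: Optional[str] = None
    # Name of the task as submitted to the workflow system.
    name: Optional[str] = None
    # Computing queue the task is sent to.
    queue: Optional[str] = None
    # Logical file names, one entry per job in the task.
    lfns: List[str] = field(default_factory=list)


task = RubinTask(step="makeWarp", queue="DOMA_LSST_GOOGLE_TEST")
task.lfns.append("job_0001")
```

Using `field(default_factory=list)` gives every instance its own list, which avoids the shared-mutable-default pitfall.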

timj · 2021-01-05T21:03:50Z

python/lsst/ctrl/bps/wms/panda/idds_tasks.py

+
+            # We take the commandline only from first job  because PanDA uses late binding and
+            # command line for each job in task is equal to each other in exception to the processing
+            # file name which is substitutes by PanDA


substituted?

timj · 2021-01-05T21:08:56Z

python/lsst/ctrl/bps/wms/panda/idds_tasks.py

+            task.step = task_step
+            task.name = self.define_task_name(task.step)
+            task.queue = self.computing_queue_himem if task_step in self.himem_tasks else self.computing_queue
+            task.lfns = list(jobs.keys())


If jobs is dict-like, then list(jobs) is shorter and gives the same result.

Thanks. Fixed.
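A quick check of the equivalence, using a small hypothetical stand-in for the `jobs` mapping:

```python
# Hypothetical stand-in for the ``jobs`` mapping in the diff.
jobs = {"job_a": object(), "job_b": object()}

# Iterating a dict yields its keys, so both spellings build the same list.
lfns_long = list(jobs.keys())
lfns_short = list(jobs)
```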

timj · 2021-01-05T21:10:42Z

python/lsst/ctrl/bps/wms/panda/idds_tasks.py

+    def pick_non_init_cmdline(self):
+        for node_name in self.bps_workflow.nodes:
+            if node_name != 'pipetaskInit':
+                return self.bps_workflow.nodes[node_name]['job'].cmdline


This picks the first command line that isn't called pipetaskInit?

I believe this is an over-optimized workaround for something ctrl_bps is currently doing. We should make a ctrl_bps ticket explaining exactly what PanDA needs/does to make sure those of us unfamiliar with PanDA don't make wrong assumptions. Currently this works, but there will be more jobs that do not share the same command line. Users are already allowed to change the command based upon which PipelineTask is being executed (for example, to turn on DEBUG logging only for assembleCoadd), although so far no one has needed this. Coming soon will be the ability to use a single full QuantumGraph for every job, but that means having different Quantum node IDs on each command line.

Perhaps add an explicit return of None if it can't find a command line?
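A sketch of the explicit-return variant, written as a free function so it runs on its own. The `nodes` argument is a hypothetical plain mapping standing in for `self.bps_workflow.nodes`, and the command-line strings are placeholders.

```python
from types import SimpleNamespace


def pick_non_init_cmdline(nodes):
    """Return the command line of the first node not named pipetaskInit.

    ``nodes`` is a hypothetical mapping of node name -> attribute dict,
    standing in for ``self.bps_workflow.nodes``.
    """
    for node_name, attrs in nodes.items():
        if node_name != "pipetaskInit":
            return attrs["job"].cmdline
    # Make the fall-through explicit instead of relying on the implicit None.
    return None


nodes = {
    "pipetaskInit": {"job": SimpleNamespace(cmdline="init-cmd")},
    "isr_1": {"job": SimpleNamespace(cmdline="isr-cmd")},
}
```

The behavior is unchanged; the explicit `return None` only documents that callers have to handle the "no command line found" case.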

timj · 2021-01-05T21:11:28Z

python/lsst/ctrl/bps/wms/panda/idds_tasks.py

+    def define_execution_command(self):
+        exec_str = ""
+        if self.bps_config.get("computing_queue") == 'docker':
+            pass


What's the code meant to be doing? Should the if be deleted?

Code cleaned.

timj · 2021-01-05T21:14:27Z

python/lsst/ctrl/bps/wms/panda/panda_service.py

+            Extra message for report command to print.  This could be pointers to documentation or
+            to WMS specific commands.
+        """
+        message = ""


Should probably document that this method does nothing.

I'm wondering why having it return None is better than raising NotImplementedError. With None, bps report will always behave as if there are no runs. I would rather have folks know that "bps report" doesn't work with PanDA yet than have them conclude there's a bug in "bps report".
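To make the difference concrete, a sketch of the raising variant. `ExampleWmsService` and the `**kwargs` signature are hypothetical stand-ins, not the actual plugin class or the ctrl_bps method signature.

```python
class ExampleWmsService:
    """Hypothetical stand-in for a WMS service plugin class."""

    def report(self, **kwargs):
        """Query the WMS for run status; not supported by this plugin yet."""
        raise NotImplementedError("bps report is not yet supported by this WMS plugin")


try:
    ExampleWmsService().report()
except NotImplementedError as exc:
    # The caller can tell "unsupported" apart from "no runs found".
    outcome = f"unsupported: {exc}"
```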

Update doc/conf.py to new version. Add a :no-inherited-members: option to the automodapi directive in ctrl_bps/doc/lsst.ctrl.bps/index.rst to work around a bug. Since the docs had to be test-built, included minor text changes: update pipeline syntax from ':' to '#' to match stack changes; fix one missed capitalization change (qgraph_file to qgraphFile).

MichelleGower

I ran a pipeline using HTCondor and it still worked fine (as expected because of where the changes are). After rebasing latest changes on master, there is nothing that says this shouldn't be merged. There are various LSST guideline comments and a few other smaller changes that could be done before merging if time allows. There are a couple places we should follow up on for ctrl_bps tickets and some other longer term questions and comments.

MichelleGower · 2021-02-24T16:16:33Z

python/lsst/ctrl/bps/wms/panda/conf_example/example_panda.yaml

@@ -0,0 +1,43 @@
+operator: jdoe


Just a comment for the future: once we're past the prototyping period, if the wms plugins stay in ctrl_bps, I think this example should go in the ctrl_bps docs.

MichelleGower · 2021-02-24T16:18:53Z

python/lsst/ctrl/bps/wms/panda/conf_example/example_panda.yaml

+
+computing_queue: DOMA_LSST_GOOGLE_TEST
+computing_queue_himem: DOMA_LSST_GOOGLE_TEST_HIMEM
+himem_steps: ['makeWarp', 'assembleCoadd', 'deblend', 'measure', 'pipetaskInit']


Wasn't there a discussion about using requestMemory to determine which queue it should go to?
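One way memory-based selection could look, as a sketch only. The function name, the threshold value, and the assumption that the memory request is expressed in MB are all hypothetical; nothing here reflects what the plugin actually does.

```python
def choose_queue(request_memory_mb, queue, queue_himem, himem_threshold_mb=4096):
    """Pick the high-memory queue when a job asks for more than the threshold.

    The threshold and the MB unit are assumptions made for this sketch.
    """
    if request_memory_mb is not None and request_memory_mb > himem_threshold_mb:
        return queue_himem
    return queue


normal = choose_queue(2048, "DOMA_LSST_GOOGLE_TEST", "DOMA_LSST_GOOGLE_TEST_HIMEM")
himem = choose_queue(8192, "DOMA_LSST_GOOGLE_TEST", "DOMA_LSST_GOOGLE_TEST_HIMEM")
```

This would replace the hard-coded himem_steps list with a decision driven by each job's own memory request.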

MichelleGower · 2021-02-24T16:36:44Z

python/lsst/ctrl/bps/wms/panda/idds_tasks.py

+    def pick_non_init_cmdline(self):
+        for node_name in self.bps_workflow.nodes:
+            if node_name != 'pipetaskInit':
+                return self.bps_workflow.nodes[node_name]['job'].cmdline


I believe this is an over-optimized workaround for something ctrl_bps is currently doing. We should make a ctrl_bps ticket explaining exactly what PanDA needs/does to make sure those of us unfamiliar with PanDA don't make wrong assumptions. Currently this works, but there will be more jobs that do not share the same command line. Users are already allowed to change the command based upon which PipelineTask is being executed (for example, to turn on DEBUG logging only for assembleCoadd), although so far no one has needed this. Coming soon will be the ability to use a single full QuantumGraph for every job, but that means having different Quantum node IDs on each command line.

MichelleGower · 2021-02-24T16:40:32Z

python/lsst/ctrl/bps/wms/panda/idds_tasks.py

+            Tasks filled with parameters provided in workflow configuration and generated pipeline.
+        """
+        tasks = []
+        raw_dependency_map = self.create_raw_jobs_dependency_map()


raw is an overloaded term in the LSST project. I was wondering why the code was assuming that the pipelines always started with raw images. If there's an equivalent term, it would be helpful to use it instead.

MichelleGower · 2021-02-24T16:42:27Z

python/lsst/ctrl/bps/wms/panda/idds_tasks.py

+class IDDSWorkflowGenerator:
+    """
+    Class generates a iDDS workflow to be submitted into PanDA. Workflow includes definition of each task and
+    definition of dependencies for each task input.


Is task PanDA's name for a compute job?

MichelleGower · 2021-02-24T17:55:49Z

python/lsst/ctrl/bps/wms/panda/panda_service.py

+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <https://www.gnu.org/licenses/>.
+


Missing module docstring

MichelleGower · 2021-02-24T17:59:27Z

python/lsst/ctrl/bps/wms/panda/panda_service.py

+        workflow.write(out_prefix)
+        return workflow
+
+    def convert_exec_string_to_hex(self, cmdline):


Why passing in self? And then why is this a class method?
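If the method never touches instance state, one option is to mark it as a static method (or move it to module level). A sketch, where `ExampleService` is a hypothetical class and the hex-encoding body is an assumption based only on the method name:

```python
class ExampleService:
    """Hypothetical stand-in for the service class in the diff."""

    @staticmethod
    def convert_exec_string_to_hex(cmdline):
        """Hex-encode a command line string.

        The body is assumed from the method name; the real
        implementation may differ.
        """
        return cmdline.encode("utf-8").hex()


encoded = ExampleService.convert_exec_string_to_hex("ab")
decoded = bytes.fromhex(encoded).decode("utf-8")
```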

MichelleGower · 2021-02-24T18:30:29Z

python/lsst/ctrl/bps/wms/panda/idds_tasks.py

+        nodes_from_edges = set(list(dependency_map.keys()))
+        extra_nodes = [node for node in all_nodes if node not in nodes_from_edges]
+        for node in extra_nodes:
+            dependency_map.setdefault(node, [])


Does this need to be a setdefault()?
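Since `extra_nodes` only contains nodes that are not already keys, plain assignment appears to be equivalent here. A sketch with hypothetical sample data:

```python
# Hypothetical sample data standing in for the real graph.
all_nodes = ["a", "b", "c"]
dependency_map = {"b": ["a"]}

# Nodes that never appear as a key have no recorded dependencies.
extra_nodes = [node for node in all_nodes if node not in dependency_map]
for node in extra_nodes:
    # The key is known to be absent, so plain assignment is enough.
    dependency_map[node] = []
```

Testing membership against the dict directly also removes the need for the intermediate `set(list(dependency_map.keys()))`.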

MichelleGower · 2021-02-24T18:36:49Z

python/lsst/ctrl/bps/wms/panda/idds_tasks.py

+            tasks_dependency_map.setdefault(self.get_task_by_job_name(job), {})[file_name] = \
+                self.split_dependencies_by_tasks(dependency)
+            self.tasks_inputs.setdefault(self.define_task_name(
+                self.get_task_by_job_name(job)), []).append(file_local_src)


These lines would be shorter and easier to read if self.get_task_by_job_name(job) were pulled out and run once. I think this is possible, but I'm having trouble understanding these two setdefault lines of code.
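A sketch of the suggested refactor, with the lookup done once. The three helper functions are hypothetical stubs written only so the example runs on its own; their real behavior in the plugin is not shown in this diff.

```python
def get_task_by_job_name(job):
    """Hypothetical stub: derive the task step from a job name."""
    return job.split("_")[0]


def define_task_name(step):
    """Hypothetical stub: build a task name from a step."""
    return f"task_{step}"


def split_dependencies_by_tasks(dependency):
    """Hypothetical stub: group dependency job names by task step."""
    grouped = {}
    for dep in dependency:
        grouped.setdefault(get_task_by_job_name(dep), []).append(dep)
    return grouped


tasks_dependency_map = {}
tasks_inputs = {}

# Placeholder values for one loop iteration.
job, file_name, file_local_src = "warp_1", "warp_1.qgraph", "/tmp/warp_1.qgraph"
dependency = ["isr_1", "isr_2"]

# Look the task up once and reuse it.
task_step = get_task_by_job_name(job)
task_name = define_task_name(task_step)

# First setdefault: per-task dict mapping an input file to its dependencies.
tasks_dependency_map.setdefault(task_step, {})[file_name] = split_dependencies_by_tasks(dependency)
# Second setdefault: per-task list of local input file paths.
tasks_inputs.setdefault(task_name, []).append(file_local_src)
```

Naming the intermediate values also makes it clearer that the two maps are keyed differently: one by task step, the other by task name.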

MichelleGower · 2021-02-24T21:13:24Z

python/lsst/ctrl/bps/wms/panda/panda_service.py

+            target = ButlerURI(cloud_prefix + '/' + os.path.basename(src_path))
+            target.transfer_from(src, transfer="copy")
+
+    def write(self, out_prefix):


If this is not being implemented, should it just be removed (after removing the call from prepare)?

SergeyPod force-pushed the tickets/DM-27100 branch from 20c4779 to 70ace2c December 15, 2020 21:09

Added initial implementation of the PanDA BPS plugin

b701a24

SergeyPod force-pushed the tickets/DM-27100 branch from 70ace2c to b701a24 December 15, 2020 21:22

timj changed the title ~~Tickets/dm 27100~~ DM-27100: Add PanDA support Jan 4, 2021

SergeyPod requested a review from timj January 4, 2021 22:47

timj reviewed Jan 5, 2021

timj and others added 14 commits January 26, 2021 15:27

Switch action trigger

b15ff9f

Change action to py3.8

ded71e2

Switch to ignore W503

6954675

Merge pull request #18 from lsst/tickets/DM-28480

4087a9f

DM-28480: Update action to py3.8 and ignore W503

Implementation of sugggestions provided in code review. Added usage o…

a68ccfd

…f PanDA server as authorization proxy.

Shortened a string in cmd_line_decoder.py to pass check

ebca945

Code refactoring to pass check

02c7075

Code refactoring to pass check

0ca8222

Code polish according to received feedback

6ebb7da

Code polish according to linter feedback

34dc3f7

Change code to handle non-string attribute values.

ad1da3c

Modify per review comments.

034aae9

Merge branch 'tickets/DM-28929'

89fe1a4

MichelleGower approved these changes Feb 24, 2021

SergeyPod added 7 commits March 1, 2021 13:19

Added initial implementation of the PanDA BPS plugin

eab335a

Implementation of sugggestions provided in code review. Added usage o…

bdd8751

…f PanDA server as authorization proxy.

Shortened a string in cmd_line_decoder.py to pass check

e17b4bd

Code refactoring to pass check

8e8436b

Code refactoring to pass check

2a275f1

Code polish according to received feedback

65888f3

Code polish according to linter feedback

ab9d57a

Merge rebased branch 'tickets/DM-27100' of https://github.com/lsst/ct…

e49e227

…rl_bps into tickets/DM-27100

SergeyPod merged commit f221100 into master Mar 1, 2021

SergeyPod deleted the tickets/DM-27100 branch March 1, 2021 19:20
