tasks: addition of MQ publisher class #17

dinosk · 2018-06-27T12:43:22Z

ADD publisher class with reconnection strategy based
on https://stackoverflow.com/questions/35193335/how-to-reconnect-to-rabbitmq

Addresses reanahub/reana-workflow-controller#102
Depends on reanahub/reana-workflow-controller#101

Signed-off-by: Dinos Kousidis <dinos.kousidis@cern.ch>

diegodelemos · 2018-06-28T07:23:13Z

reana_workflow_engine_serial/publisher.py

+        except pika.exceptions.ConnectionClosed:
+            logging.debug('Publisher: ConnectionClosed, reconnecting.')
+            print('Publisher: ConnectionClosed, reconnecting.')
+            self.connect()


Is it enough to try to reconnect only twice?

FWIW it worked on my cluster.

$ kubectl get pods
NAME                                     READY   STATUS    RESTARTS   AGE
cwl-default-worker-8494bdcfc4-ckpg8      1/1     Running   2          7m
db-54bd479dd8-pbndf                      1/1     Running   0          7m
job-controller-65f87df9df-kzmcs          1/1     Running   0          8m
message-broker-69ff599898-qqcl9          1/1     Running   0          8m
serial-default-worker-5788fb67f5-v96fc   1/1     Running   0          7m
server-645485b68-tprtg                   1/1     Running   2          8m
wdb-8b76ff44b-f8gzn                      1/1     Running   0          7m
workflow-controller-76d7b87887-7kblz     2/2     Running   2          8m
workflow-monitor-84bcfd65c7-wvjt8        1/1     Running   0          8m
yadage-default-worker-7d5dbf956d-mls49   1/1     Running   0          7m
zeromq-msg-proxy-6c4f6bd6b-2h4lc         1/1     Running   0          7m

The fact that the server pod restarts may be a bit annoying, since it means that the endpoint may change and that people have to do eval $(reana-cluster env) twice...

@diegodelemos good question. The reconnection attempt happens for each outgoing message if necessary, so the engine shouldn't get completely disconnected. Still, if a specific update gets lost, the reported count may be wrong from that point on. Also, adding a "while not connected, retry" loop would risk stuck workflows whenever the message broker has issues.
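
For illustration, the trade-off discussed here (a bounded number of reconnection attempts per message rather than an unbounded retry loop) could be sketched roughly as follows. This is not the code in publisher.py: `BoundedRetryPublisher`, `FlakyChannel`, the `max_retries` parameter and the local `ConnectionClosed` class are all hypothetical names, and the local exception only stands in for pika.exceptions.ConnectionClosed so that the sketch runs without a broker.

```python
import logging


class ConnectionClosed(Exception):
    """Stand-in for pika.exceptions.ConnectionClosed (illustration only)."""


class BoundedRetryPublisher:
    """Publish messages, reconnecting at most `max_retries` times per message."""

    def __init__(self, connect, max_retries=2):
        # `connect` is a hypothetical factory returning an object with a
        # `publish(body)` method that raises ConnectionClosed on a dead link.
        self._connect = connect
        self._max_retries = max_retries
        self._channel = connect()

    def publish(self, body):
        """Return True if the message was sent, False if it was dropped."""
        for attempt in range(self._max_retries + 1):
            try:
                self._channel.publish(body)
                return True
            except ConnectionClosed:
                if attempt == self._max_retries:
                    break  # out of retries: drop the message instead of blocking
                logging.debug('Publisher: ConnectionClosed, reconnecting.')
                self._channel = self._connect()
        logging.warning('Publisher: message dropped after %d retries.',
                        self._max_retries)
        return False


class FlakyChannel:
    """Test double: the next `remaining_failures` publishes raise ConnectionClosed."""

    remaining_failures = 0
    sent = []

    def publish(self, body):
        if FlakyChannel.remaining_failures > 0:
            FlakyChannel.remaining_failures -= 1
            raise ConnectionClosed()
        FlakyChannel.sent.append(body)
```

Because the retry budget is per message, a broker outage costs at most the messages published during the outage and never blocks the workflow; the price is that a dropped update is not resent, which is the lost-count risk mentioned above.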

tiborsimko · 2018-06-28T09:02:19Z

Confirmed working with:

reana-job-controller @ master
reana-message-broker @ master
reana-server @ master
reana-workflow-controller @ pr-101
reana-workflow-engine-cwl @ master
reana-workflow-engine-serial @ pr-17
reana-workflow-engine-yadage @ master
reana-workflow-monitor @ master

All three workflow engines (CWL, Serial, Yadage) report progress well 👍, although we should think about updating the messages and how the client parses and displays them:

serial:

$ reana-client workflow status -w workflow.1
NAME       RUN_NUMBER   CREATED               STATUS     PROGRESS   COMMAND                                                   
workflow   1            2018-06-28T08:56:58   finished   2/2        root -b -q '../code/fitdata.C(\"data.root\",\"plot.png\")'

cwl:

$ reana-client workflow status -w workflow.2
NAME       RUN_NUMBER   CREATED               STATUS     PROGRESS   COMMAND
workflow   2            2018-06-28T08:57:32   finished   2/2        /bin/sh -c 'export PATH="/usr/local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin";export TMPDIR="/data/00000000-0000-0000-0000-000000000000/analyses/291323df-feb5-44ce-a36b-7067e40f80ff/workspace/cwl/tmpdir/_A33Un";export HOME="/data/00000000-0000-0000-0000-000000000000/analyses/291323df-feb5-44ce-a36b-7067e40f80ff/workspace/cwl/docker_outdir";cp -a /data/00000000-0000-0000-0000-000000000000/analyses/291323df-feb5-44ce-a36b-7067e40f80ff/workspace/cwl/outdir/w_pMyc/data.root /data/00000000-0000-0000-0000-000000000000/analyses/291323df-feb5-44ce-a36b-7067e40f80ff/workspace/cwl/docker_outdir/data.root ;cp -a /data/00000000-0000-0000-0000-000000000000/analyses/291323df-feb5-44ce-a36b-7067e40f80ff/inputs/fitdata.C /data/00000000-0000-0000-0000-000000000000/analyses/291323df-feb5-44ce-a36b-7067e40f80ff/workspace/cwl/docker_outdir/fitdata.C ;mkdir -p /data/00000000-0000-0000-0000-000000000000/analyses/291323df-feb5-44ce-a36b-7067e40f80ff/workspace/cwl/docker_outdir && cd /data/00000000-0000-0000-0000-000000000000/analyses/291323df-feb5-44ce-a36b-7067e40f80ff/workspace/cwl/docker_outdir && root -b -q '"'"'fitdata.C("data.root","/data/00000000-0000-0000-0000-000000000000/analyses/291323df-feb5-44ce-a36b-7067e40f80ff/workspace/cwl/docker_outdir/plot.png")'"'"'; cp -r /data/00000000-0000-0000-0000-000000000000/analyses/291323df-feb5-44ce-a36b-7067e40f80ff/workspace/cwl/docker_outdir/* /data/00000000-0000-0000-0000-000000000000/analyses/291323df-feb5-44ce-a36b-7067e40f80ff/workspace/cwl/outdir/MDga4P'

yadage:

$ reana-client workflow status -w workflow.3
NAME       RUN_NUMBER   CREATED               STATUS     PROGRESS   COMMAND
workflow   3            2018-06-28T08:57:58   finished   2/2        sh -c 'echo cm9vdCAtYiAtcSAnL2RhdGEvMDAwMDAwMDAtMDAwMC0wMDAwLTAwMDAtMDAwMDAwMDAwMDAwL2FuYWx5c2VzLzAwYThlOWUyLWJkNjktNGFkYy04ZmE3LWIxNzkwZWMwM2E2Ni95YWRhZ2VfaW5wdXRzL2NvZGUvZml0ZGF0YS5DKCIvZGF0YS8wMDAwMDAwMC0wMDAwLTAwMDAtMDAwMC0wMDAwMDAwMDAwMDAvYW5hbHlzZXMvMDBhOGU5ZTItYmQ2OS00YWRjLThmYTctYjE3OTBlYzAzYTY2L3dvcmtzcGFjZS9nZW5kYXRhL2RhdGEucm9vdCIsIi9kYXRhLzAwMDAwMDAwLTAwMDAtMDAwMC0wMDAwLTAwMDAwMDAwMDAwMC9hbmFseXNlcy8wMGE4ZTllMi1iZDY5LTRhZGMtOGZhNy1iMTc5MGVjMDNhNjYvd29ya3NwYWNlL2ZpdGRhdGEvcGxvdC5wbmciKSc=|base64 -d|sh'
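
As one possible direction for a nicer display, a client could unwrap the base64-encoded command seen in the yadage output before printing it. A minimal sketch, assuming the command always has the `sh -c 'echo <base64>|base64 -d|sh'` shape shown here; `unwrap_yadage_command` is a hypothetical helper, not an existing reana-client function:

```python
import base64
import binascii
import re

# Matches the wrapper seen in the yadage output (assumed to be stable).
_WRAPPED = re.compile(r"^sh -c 'echo ([A-Za-z0-9+/=]+)\|base64 -d\|sh'$")


def unwrap_yadage_command(command):
    """Return the decoded inner command, or the input unchanged if it does not match."""
    match = _WRAPPED.match(command.strip())
    if match is None:
        return command
    try:
        return base64.b64decode(match.group(1), validate=True).decode('utf-8')
    except (binascii.Error, UnicodeDecodeError):
        # Not valid base64 or not text: fall back to the raw command.
        return command
```

Commands that do not match the wrapper (for example the serial one) pass through untouched, so the helper could be applied to every engine's output.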

tasks: publish accumulative progress reports

255997f

Signed-off-by: Dinos Kousidis <dinos.kousidis@cern.ch>

dinosk added Status: in review labels Jun 27, 2018

dinosk mentioned this pull request Jun 27, 2018

MQ: reconnection strategy reanahub/reana-workflow-controller#102

Closed

global: addition of MQ publisher class

1fd8672

dinosk force-pushed the MQ-publisher branch from 3e77048 to 1fd8672 Compare June 27, 2018 16:16

dinosk removed the Status: in work label Jun 27, 2018

dinosk requested review from diegodelemos and tiborsimko June 27, 2018 16:20

diegodelemos reviewed Jun 28, 2018

tiborsimko added this to the Internal-Consolidation milestone Jun 28, 2018

tiborsimko approved these changes Jun 28, 2018

lukasheinrich mentioned this pull request Jun 28, 2018

submission: auxiliary one-off data as files mounted into job container reanahub/reana-job-controller#45

Open

tiborsimko merged commit 1fd8672 into reanahub:master Jun 28, 2018

tiborsimko removed the Status: in review label Jun 28, 2018

This was referenced Jun 29, 2018

cwl: nicer progress command reanahub/reana-client#125

Closed

yadage: nicer progress command reanahub/reana-client#126

Closed

tiborsimko added this to Done in Internal-Consolidation Oct 7, 2019
