
tasks: addition of MQ publisher class #17

Merged: 2 commits, Jun 28, 2018


@dinosk (Member) commented Jun 27, 2018

* ADD publisher class with reconnection strategy based
  on https://stackoverflow.com/questions/35193335/how-to-reconnect-to-rabbitmq

Signed-off-by: Dinos Kousidis <dinos.kousidis@cern.ch>
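The description mentions a publisher class with a reconnection strategy. Below is a minimal sketch of that kind of reconnect-on-failure publisher, not the PR's actual code. So that it runs without a RabbitMQ broker, it assumes a hypothetical connect factory returning any object with a publish(body) method, and it defines a local ConnectionClosed stand-in for pika.exceptions.ConnectionClosed. The class name ReconnectingPublisher and the max_attempts parameter are illustrative only.

```python
import logging
from typing import Callable, Protocol


class ConnectionClosed(Exception):
    """Local stand-in for pika.exceptions.ConnectionClosed (assumption: lets the sketch run without a broker)."""


class Channel(Protocol):
    """Hypothetical minimal channel interface: anything with a publish(body) method."""

    def publish(self, body: str) -> None: ...


class ReconnectingPublisher:
    """Hypothetical publisher that reconnects whenever the broker connection drops.

    `connect` is an assumed factory; in a real pika-based class it would open a
    connection and return a channel to publish on.
    """

    def __init__(self, connect: Callable[[], Channel], max_attempts: int = 2) -> None:
        self._connect = connect
        self.max_attempts = max_attempts
        self.channel = self._connect()

    def publish(self, body: str) -> bool:
        """Publish `body`, reconnecting after each ConnectionClosed.

        Returns True on success, or False once `max_attempts` attempts have
        failed, in which case the message is dropped instead of blocking the caller.
        """
        for attempt in range(1, self.max_attempts + 1):
            try:
                self.channel.publish(body)
                return True
            except ConnectionClosed:
                logging.debug('Publisher: ConnectionClosed, reconnecting (attempt %d).', attempt)
                self.channel = self._connect()
        return False
```

With max_attempts=2, a message survives one dropped connection; two consecutive failures drop it, and the next publish call starts again from a fresh connection.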
    except pika.exceptions.ConnectionClosed:
        logging.debug('Publisher: ConnectionClosed, reconnecting.')
        print('Publisher: ConnectionClosed, reconnecting.')
        self.connect()
Member commented:

Is it enough to try to reconnect only twice?

Member commented:

FWIW it worked on my cluster.

$ kubectl get pods
NAME                                     READY     STATUS    RESTARTS   AGE
cwl-default-worker-8494bdcfc4-ckpg8      1/1       Running   2          7m
db-54bd479dd8-pbndf                      1/1       Running   0          7m
job-controller-65f87df9df-kzmcs          1/1       Running   0          8m
message-broker-69ff599898-qqcl9          1/1       Running   0          8m
serial-default-worker-5788fb67f5-v96fc   1/1       Running   0          7m
server-645485b68-tprtg                   1/1       Running   2          8m
wdb-8b76ff44b-f8gzn                      1/1       Running   0          7m
workflow-controller-76d7b87887-7kblz     2/2       Running   2          8m
workflow-monitor-84bcfd65c7-wvjt8        1/1       Running   0          8m
yadage-default-worker-7d5dbf956d-mls49   1/1       Running   0          7m
zeromq-msg-proxy-6c4f6bd6b-2h4lc         1/1       Running   0          7m

The fact that the server pod restarts may be a bit annoying, since the endpoint may change and people then have to run eval $(reana-cluster env) twice...
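The restarts mentioned here show up in the RESTARTS column of the kubectl get pods listing above. A small sketch for spotting restarted pods from that plain-text output follows; it assumes the default column layout shown above (NAME READY STATUS RESTARTS AGE), and the function name restarted_pods is hypothetical. In practice, kubectl get pods -o json would be a sturdier source than parsing the table.

```python
from typing import Dict


def restarted_pods(kubectl_output: str) -> Dict[str, int]:
    """Return {pod name: restart count} for pods that restarted at least once.

    Assumes the default whitespace-separated table printed by `kubectl get pods`,
    whose header contains a RESTARTS column.
    """
    lines = kubectl_output.strip().splitlines()
    header = lines[0].split()
    restarts_col = header.index('RESTARTS')
    result: Dict[str, int] = {}
    for line in lines[1:]:
        fields = line.split()
        # Columns before RESTARTS never contain spaces, so the index stays valid.
        count = int(fields[restarts_col])
        if count > 0:
            result[fields[0]] = count
    return result
```

Applied to the listing above, it would flag cwl-default-worker, server and workflow-controller, each with 2 restarts.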

@dinosk (Member, author) commented:

@diegodelemos good question. The reconnection attempt happens for each outgoing message if necessary, though, so the engine shouldn't get completely disconnected. Still, if a specific update gets lost, the count is wrong from that point on. Conversely, adding a "while not connected, retry" loop risks stuck workflows whenever the message broker has issues.
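The trade-off described in this reply can be sketched as a simplified model, under stated assumptions: a bounded retry gives up on a message rather than blocking, and a consumer that counts "finished" updates incrementally then reports a wrong total. The names publish_bounded, progress and make_flaky_send are hypothetical, the message shape is illustrative, and the built-in ConnectionError stands in for pika's connection exceptions; this is not how REANA actually computes progress.

```python
from typing import Callable, List


def publish_bounded(send: Callable[[dict], None], message: dict, max_attempts: int = 2) -> bool:
    """Try to send `message` at most `max_attempts` times; give up instead of blocking.

    `send` is a hypothetical callable that raises ConnectionError when the broker is unreachable.
    """
    for _ in range(max_attempts):
        try:
            send(message)
            return True
        except ConnectionError:
            continue  # a real publisher would reconnect here before retrying
    return False


def progress(received: List[dict], total_jobs: int) -> str:
    """Consumer side: count 'finished' updates as they arrive (incremental counting)."""
    done = sum(1 for m in received if m['status'] == 'finished')
    return f'{done}/{total_jobs}'


def make_flaky_send(received: List[dict], failures: int) -> Callable[[dict], None]:
    """Fake transport: fails the next `failures` sends, then delivers into `received`."""
    state = {'left': failures}

    def send(message: dict) -> None:
        if state['left'] > 0:
            state['left'] -= 1
            raise ConnectionError('broker unreachable')
        received.append(message)

    return send


if __name__ == '__main__':
    received: List[dict] = []
    send = make_flaky_send(received, failures=2)  # broker unreachable for the next 2 sends
    publish_bounded(send, {'job': 1, 'status': 'finished'})  # both attempts fail: update dropped
    publish_bounded(send, {'job': 2, 'status': 'finished'})  # delivered
    print(progress(received, total_jobs=2))  # prints 1/2 although both jobs finished
```

An unbounded "while not connected" loop would never drop the first update, but with the broker down it would never return either, which is the stuck-workflow risk mentioned above.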

@tiborsimko added this to the Internal-Consolidation milestone Jun 28, 2018

@tiborsimko (Member) commented:
Confirmed working with:

  • reana-job-controller @ master
  • reana-message-broker @ master
  • reana-server @ master
  • reana-workflow-controller @ pr-101
  • reana-workflow-engine-cwl @ master
  • reana-workflow-engine-serial @ pr-17
  • reana-workflow-engine-yadage @ master
  • reana-workflow-monitor @ master

All three workflow engines (CWL, Serial, Yadage) report progress well 👍, although we should think about updating the messages and how the client parses and displays them:

  • serial:
$ reana-client workflow status -w workflow.1
NAME       RUN_NUMBER   CREATED               STATUS     PROGRESS   COMMAND                                                   
workflow   1            2018-06-28T08:56:58   finished   2/2        root -b -q '../code/fitdata.C(\"data.root\",\"plot.png\")'
  • cwl:
$ reana-client workflow status -w workflow.2
NAME       RUN_NUMBER   CREATED               STATUS     PROGRESS   COMMAND
workflow   2            2018-06-28T08:57:32   finished   2/2        /bin/sh -c 'export PATH="/usr/local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin";export TMPDIR="/data/00000000-0000-0000-0000-000000000000/analyses/291323df-feb5-44ce-a36b-7067e40f80ff/workspace/cwl/tmpdir/_A33Un";export HOME="/data/00000000-0000-0000-0000-000000000000/analyses/291323df-feb5-44ce-a36b-7067e40f80ff/workspace/cwl/docker_outdir";cp -a /data/00000000-0000-0000-0000-000000000000/analyses/291323df-feb5-44ce-a36b-7067e40f80ff/workspace/cwl/outdir/w_pMyc/data.root /data/00000000-0000-0000-0000-000000000000/analyses/291323df-feb5-44ce-a36b-7067e40f80ff/workspace/cwl/docker_outdir/data.root ;cp -a /data/00000000-0000-0000-0000-000000000000/analyses/291323df-feb5-44ce-a36b-7067e40f80ff/inputs/fitdata.C /data/00000000-0000-0000-0000-000000000000/analyses/291323df-feb5-44ce-a36b-7067e40f80ff/workspace/cwl/docker_outdir/fitdata.C ;mkdir -p /data/00000000-0000-0000-0000-000000000000/analyses/291323df-feb5-44ce-a36b-7067e40f80ff/workspace/cwl/docker_outdir && cd /data/00000000-0000-0000-0000-000000000000/analyses/291323df-feb5-44ce-a36b-7067e40f80ff/workspace/cwl/docker_outdir && root -b -q '"'"'fitdata.C("data.root","/data/00000000-0000-0000-0000-000000000000/analyses/291323df-feb5-44ce-a36b-7067e40f80ff/workspace/cwl/docker_outdir/plot.png")'"'"'; cp -r /data/00000000-0000-0000-0000-000000000000/analyses/291323df-feb5-44ce-a36b-7067e40f80ff/workspace/cwl/docker_outdir/* /data/00000000-0000-0000-0000-000000000000/analyses/291323df-feb5-44ce-a36b-7067e40f80ff/workspace/cwl/outdir/MDga4P' 
  • yadage:
$ reana-client workflow status -w workflow.3
NAME       RUN_NUMBER   CREATED               STATUS     PROGRESS   COMMAND
workflow   3            2018-06-28T08:57:58   finished   2/2        sh -c 'echo cm9vdCAtYiAtcSAnL2RhdGEvMDAwMDAwMDAtMDAwMC0wMDAwLTAwMDAtMDAwMDAwMDAwMDAwL2FuYWx5c2VzLzAwYThlOWUyLWJkNjktNGFkYy04ZmE3LWIxNzkwZWMwM2E2Ni95YWRhZ2VfaW5wdXRzL2NvZGUvZml0ZGF0YS5DKCIvZGF0YS8wMDAwMDAwMC0wMDAwLTAwMDAtMDAwMC0wMDAwMDAwMDAwMDAvYW5hbHlzZXMvMDBhOGU5ZTItYmQ2OS00YWRjLThmYTctYjE3OTBlYzAzYTY2L3dvcmtzcGFjZS9nZW5kYXRhL2RhdGEucm9vdCIsIi9kYXRhLzAwMDAwMDAwLTAwMDAtMDAwMC0wMDAwLTAwMDAwMDAwMDAwMC9hbmFseXNlcy8wMGE4ZTllMi1iZDY5LTRhZGMtOGZhNy1iMTc5MGVjMDNhNjYvd29ya3NwYWNlL2ZpdGRhdGEvcGxvdC5wbmciKSc=|base64 -d|sh'  
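One way the client display could be tidied, sketched under assumptions: the Yadage COMMAND above is a base64 payload wrapped in sh -c 'echo ...|base64 -d|sh' (the payload begins with root -b -q '), so a client could unwrap it before display and truncate very long commands such as the CWL one. The helper name display_command and the default 60-character width are hypothetical, not part of reana-client.

```python
import base64
import re

# Matches commands wrapped as: sh -c 'echo <base64>|base64 -d|sh'
_B64_WRAPPED = re.compile(r"^sh -c 'echo ([A-Za-z0-9+/=]+)\|base64 -d\|sh'$")


def display_command(command: str, width: int = 60) -> str:
    """Make a COMMAND cell readable: unwrap base64-encoded commands, then truncate."""
    command = command.strip()
    match = _B64_WRAPPED.match(command)
    if match:
        # Decode the payload so the user sees the actual command being run.
        command = base64.b64decode(match.group(1)).decode('utf-8')
    if len(command) > width:
        command = command[: width - 3] + '...'
    return command
```

Truncating with a trailing "..." keeps the table aligned; the full command could still be shown on request.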
