tasks: addition of MQ publisher class #17
Signed-off-by: Dinos Kousidis <dinos.kousidis@cern.ch>
* ADD publisher class with reconnection strategy based on https://stackoverflow.com/questions/35193335/how-to-reconnect-to-rabbitmq Signed-off-by: Dinos Kousidis <dinos.kousidis@cern.ch>
except pika.exceptions.ConnectionClosed:
    logging.debug('Publisher: ConnectionClosed, reconnecting.')
    print('Publisher: ConnectionClosed, reconnecting.')
    self.connect()
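The handler in this hunk follows a reconnect-then-retry pattern. Below is a minimal sketch of that pattern, not the class added in this PR. It runs without a broker: ConnectionClosed is a local stand-in for pika.exceptions.ConnectionClosed, and FakeBroker, FakeChannel and their send/open_channel methods are hypothetical names introduced only for illustration.

```python
import logging


class ConnectionClosed(Exception):
    """Local stand-in for pika.exceptions.ConnectionClosed (assumption)."""


class FakeBroker:
    """Hypothetical in-memory broker whose first channel is already stale."""

    def __init__(self):
        self.connections = 0
        self.messages = []

    def open_channel(self):
        self.connections += 1
        # Only the very first channel simulates a dropped connection.
        return FakeChannel(self, stale=(self.connections == 1))


class FakeChannel:
    """Hypothetical channel; `send` is a simplified publish call."""

    def __init__(self, broker, stale):
        self._broker = broker
        self._stale = stale

    def send(self, body):
        if self._stale:
            raise ConnectionClosed()
        self._broker.messages.append(body)


class Publisher:
    """Publishes messages and reconnects once if the connection was closed."""

    def __init__(self, broker):
        self._broker = broker
        self.connect()

    def connect(self):
        self._channel = self._broker.open_channel()

    def publish(self, body):
        try:
            self._channel.send(body)
        except ConnectionClosed:
            logging.debug('Publisher: ConnectionClosed, reconnecting.')
            self.connect()
            # Single retry on the fresh channel; a second failure propagates.
            self._channel.send(body)


broker = FakeBroker()
publisher = Publisher(broker)
publisher.publish('progress: 1/2')
```

In this sketch the message is delivered on the second channel. If the retry also fails, the exception reaches the caller instead of being swallowed.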
Is it enough to try to reconnect only twice?
FWIW it worked on my cluster.
$ kubectl get pods
NAME                                     READY   STATUS    RESTARTS   AGE
cwl-default-worker-8494bdcfc4-ckpg8      1/1     Running   2          7m
db-54bd479dd8-pbndf                      1/1     Running   0          7m
job-controller-65f87df9df-kzmcs          1/1     Running   0          8m
message-broker-69ff599898-qqcl9          1/1     Running   0          8m
serial-default-worker-5788fb67f5-v96fc   1/1     Running   0          7m
server-645485b68-tprtg                   1/1     Running   2          8m
wdb-8b76ff44b-f8gzn                      1/1     Running   0          7m
workflow-controller-76d7b87887-7kblz     2/2     Running   2          8m
workflow-monitor-84bcfd65c7-wvjt8        1/1     Running   0          8m
yadage-default-worker-7d5dbf956d-mls49   1/1     Running   0          7m
zeromq-msg-proxy-6c4f6bd6b-2h4lc         1/1     Running   0          7m
The fact that the server pod restarts may be a bit annoying, since it means that the endpoint may change and that people have to do eval $(reana-cluster env) twice...
@diegodelemos good question. The reconnection attempt happens for each outgoing message if necessary, so the engine shouldn't get completely disconnected. Still, if a specific update gets lost, the count may be wrong from that point on. Also, adding a "while not connected, retry" loop would risk workflows getting stuck because of issues in the message broker.
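One possible middle ground between two attempts and an unbounded loop is a bounded retry that reports failure to the caller. The sketch below is an illustration under stated assumptions, not the behaviour implemented in this PR: publish_with_retry, its parameters and the ConnectionClosed stand-in are hypothetical names, and send/reconnect are whatever callables the publisher would provide.

```python
import time


class ConnectionClosed(Exception):
    """Local stand-in for the broker client's connection error (assumption)."""


def publish_with_retry(send, reconnect, body, max_attempts=3, delay=0.5,
                       sleep=time.sleep):
    """Try to send `body`, reconnecting between attempts.

    Returns True once the message is sent and False when the attempt budget
    is exhausted, so a broker outage cannot block the workflow forever.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            send(body)
            return True
        except ConnectionClosed:
            if attempt == max_attempts:
                break  # budget exhausted: give up instead of looping
            sleep(delay * attempt)  # linear backoff before reconnecting
            try:
                reconnect()
            except ConnectionClosed:
                # Broker still unreachable; the next send attempt will fail
                # and count against the remaining budget.
                pass
    return False


# Usage with hypothetical fakes: the first two sends fail, the third succeeds.
failures = {'left': 2}
sent = []
reconnects = []


def fake_send(body):
    if failures['left'] > 0:
        failures['left'] -= 1
        raise ConnectionClosed()
    sent.append(body)


delivered = publish_with_retry(fake_send, lambda: reconnects.append(1),
                               'progress: 1/2', sleep=lambda seconds: None)
```

A False return value lets the caller decide what to do with a lost update, for example logging it so that a wrong count can be traced later.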
Confirmed working with:
All three workflow engines (CWL, Serial, Yadage) report progress well 👍 although we should think about updating the messages and how the client parses and displays them:
$ reana-client workflow status -w workflow.1
NAME       RUN_NUMBER   CREATED               STATUS     PROGRESS   COMMAND
workflow   1            2018-06-28T08:56:58   finished   2/2        root -b -q '../code/fitdata.C(\"data.root\",\"plot.png\")'
$ reana-client workflow status -w workflow.3
NAME       RUN_NUMBER   CREATED               STATUS     PROGRESS   COMMAND
workflow   3            2018-06-28T08:57:58   finished   2/2        sh -c 'echo cm9vdCAtYiAtcSAnL2RhdGEvMDAwMDAwMDAtMDAwMC0wMDAwLTAwMDAtMDAwMDAwMDAwMDAwL2FuYWx5c2VzLzAwYThlOWUyLWJkNjktNGFkYy04ZmE3LWIxNzkwZWMwM2E2Ni95YWRhZ2VfaW5wdXRzL2NvZGUvZml0ZGF0YS5DKCIvZGF0YS8wMDAwMDAwMC0wMDAwLTAwMDAtMDAwMC0wMDAwMDAwMDAwMDAvYW5hbHlzZXMvMDBhOGU5ZTItYmQ2OS00YWRjLThmYTctYjE3OTBlYzAzYTY2L3dvcmtzcGFjZS9nZW5kYXRhL2RhdGEucm9vdCIsIi9kYXRhLzAwMDAwMDAwLTAwMDAtMDAwMC0wMDAwLTAwMDAwMDAwMDAwMC9hbmFseXNlcy8wMGE4ZTllMi1iZDY5LTRhZGMtOGZhNy1iMTc5MGVjMDNhNjYvd29ya3NwYWNlL2ZpdGRhdGEvcGxvdC5wbmciKSc=|base64 -d|sh'
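As one idea for the display side, a client could unwrap commands of the shape shown in the COMMAND column above before printing them. This is only a sketch, not what reana-client does: display_command is a hypothetical helper, and the pattern it matches is assumed from the single sample in this thread.

```python
import base64
import re

# Wrapper shape assumed from the sample output: sh -c 'echo <b64>|base64 -d|sh'
_WRAPPED = re.compile(
    r"^sh -c 'echo (?P<payload>[A-Za-z0-9+/=]+)\|base64 -d\|sh'$")


def display_command(command):
    """Return the decoded inner command, or the input unchanged."""
    match = _WRAPPED.match(command)
    if match is None:
        return command
    try:
        decoded = base64.b64decode(match.group('payload'), validate=True)
        return decoded.decode('utf-8')
    except ValueError:
        # Covers invalid base64 and non-UTF-8 payloads; fall back to raw text.
        return command
```

Commands that do not match the wrapper, such as the plain root invocation in the first status output, pass through unchanged.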
Addresses reanahub/reana-workflow-controller#102
Depends on reanahub/reana-workflow-controller#101