Coexecution jobs don't work with non-default managers #313
Comments
Ok, confirmed, if I configure the following in the config file referenced by ---
message_queue_url: amqp://user:pass@host:5671//main_pulsar?ssl=1
managers:
  tacc_k8s:
    type: coexecution
    monitor: background

Which is maybe not ideal, but at least there's a workaround without the change in #314.
I've tried to get away from having to define managers at all in there, and I'd like to stick with the default manager - I think #315 should fix that? You can just define
The type: coexecution and monitor: background settings shouldn't be needed anymore either - you should just use the PulsarCoexecutionJobRunner.
Okay @natefoo - I think this is all ready to go on the dev side - the TL;DR:
This looks perfect, thanks, I'll try it out. Yes, the reason I had a named manager was for AMQP purposes. Having a way to specify the AMQP exchange but still use the default manager seems to me like the best solution for the coexecution case, where named managers don't make sense.
In the absence of news I assume this is working now, but please re-open if that's not the case, @natefoo.
A non-default manager is necessary in this case because it's how we route messages to the correct Pulsar via AMQP.
Upon upgrading usegalaxy.org to 23.0, Pulsar Kubernetes runner jobs are failing with the following error (where tacc_k8s is the manager defined in the runner plugin and is the destination id):

usegalaxy.org had been running a custom coexecution image, and unfortunately I no longer have any recollection as to why or how that image was built. But the version of Pulsar appears to be from somewhere around d9b7102, and I can't see any differences that suggest I manually hacked a fix. In local testing I can reproduce this error all the way back to very old 0.14.x versions, so I have been unable to figure out how this was working until now. This image might still work if not for the fact that the client is adding --wait to the pulsar-submit args and the version of pulsar-submit in that image doesn't have --wait.

It may be possible/correct to set a pulsar app config on the Galaxy side that explicitly defines the tacc_k8s manager. The app conf generated for pulsar-submit's --app_conf_base64 option for these jobs contains:

I just wish I understood how this worked before.
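
For anyone debugging the same thing, here is a rough sketch of how the generated app conf could be inspected to see whether it defines the named manager. It assumes the --app_conf_base64 value is a JSON document wrapped in base64, which I have not verified against the Pulsar source. The helper names (encode_app_conf, decode_app_conf, defines_manager) are made up for illustration and are not part of Pulsar or Galaxy.

```python
import base64
import json


def encode_app_conf(conf: dict) -> str:
    """Serialize a config dict to JSON and wrap it in base64.

    Assumption: this mirrors the format expected by --app_conf_base64.
    """
    raw = json.dumps(conf).encode("utf-8")
    return base64.b64encode(raw).decode("ascii")


def decode_app_conf(encoded: str) -> dict:
    """Reverse of encode_app_conf: base64 text back to a config dict."""
    raw = base64.b64decode(encoded)
    return json.loads(raw.decode("utf-8"))


def defines_manager(conf: dict, name: str) -> bool:
    """Return True if the config has an explicit entry for the manager."""
    managers = conf.get("managers") or {}
    return name in managers


if __name__ == "__main__":
    # Sample values taken from the workaround config in this thread.
    sample = {
        "message_queue_url": "amqp://user:pass@host:5671//main_pulsar?ssl=1",
        "managers": {"tacc_k8s": {"type": "coexecution", "monitor": "background"}},
    }
    encoded = encode_app_conf(sample)
    decoded = decode_app_conf(encoded)
    print("defines tacc_k8s:", defines_manager(decoded, "tacc_k8s"))
```

Pasting the value passed to --app_conf_base64 from a failing job into decode_app_conf should show whether a managers entry for tacc_k8s is present, assuming the encoding guess above holds.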