Zombie scp processes and failed jobs #2651
Comments
Looking through the release notes from 2.7.3 to 2.8.3, I found two commits relating to ssh. Not sure if they are relevant. |
it looks like you are using the |
No, I'm not using a plugin. I'm using the following configuration:
"service.FileCopier.default.provider": "script-copy",
"service.NodeExecutor.default.provider": "script-exec",
"plugin.script-copy.default.command": "/usr/bin/scp -q -o StrictHostKeyChecking=no ${file-copy.file} ${node.username}@${node.hostname}:${file-copy.destination}",
"plugin.script-copy.default.remote-filepath": "/tmp/${file-copy.filename}",
"plugin.script-exec.default.command": "/usr/bin/ssh -q -o StrictHostKeyChecking=no ${node.username}@${node.hostname} ${exec.command}" |
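For readers unfamiliar with these command templates, here is a rough sketch of how the `${...}` placeholders in the scp command above could expand into a concrete command line. This is an illustration only, not Rundeck's actual implementation; the `expand` helper and all context values (file path, username, hostname) are hypothetical.

```python
import re

# Matches ${some.key}; keys may contain dots and hyphens,
# which is why string.Template is not used here.
PLACEHOLDER = re.compile(r"\$\{([^}]+)\}")


def expand(template: str, context: dict) -> str:
    """Replace each ${key} with context[key]; unknown keys are left untouched."""
    return PLACEHOLDER.sub(lambda m: context.get(m.group(1), m.group(0)), template)


# Template copied from the configuration above.
template = (
    "/usr/bin/scp -q -o StrictHostKeyChecking=no ${file-copy.file} "
    "${node.username}@${node.hostname}:${file-copy.destination}"
)

# Hypothetical values for a single target node.
context = {
    "file-copy.file": "/tmp/job.sh",
    "file-copy.destination": "/tmp/job.sh",
    "node.username": "deploy",
    "node.hostname": "web01.example.com",
}

command = expand(template, context)
print(command)
```

With one such command spawned per target node, a job over several hundred nodes starts several hundred scp child processes at roughly the same time.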
I continued testing this in 2.7.3 and it's also reproducible, just less likely to happen. |
ok this is possibly a bug in the script-copy/exec plugin |
Same issue here when we upgraded from 2.6.9-1 to 2.9.2-1-GA. We got 5k to 10k ssh zombies after 10 days of normal work (cronjobs, deployment jobs...). |
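For anyone who wants to track how many zombies accumulate over time, a minimal sketch follows. It assumes Linux `ps` output with a state column followed by a command name (for example from `ps -eo stat=,comm=`), where a state starting with `Z` marks a zombie. The `count_zombies` helper and the sample output are made up for illustration.

```python
from collections import Counter


def count_zombies(ps_output: str) -> Counter:
    """Count zombie processes per command name.

    Expects lines of the form "<state> <command> [...]", e.g. the output
    of `ps -eo stat=,comm=` (assumed format). A state beginning with "Z"
    marks a zombie; any trailing "<defunct>" marker is ignored.
    """
    counts = Counter()
    for line in ps_output.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        state, name = fields[0], fields[1]
        if state.startswith("Z"):
            counts[name] += 1
    return counts


# Hypothetical sample; on a live system the text could instead come from
# subprocess.run(["ps", "-eo", "stat=,comm="], capture_output=True, text=True).stdout
sample = """\
Ss   systemd
Z    scp <defunct>
Z    scp <defunct>
Z+   ssh <defunct>
Sl   java
"""

zombies = count_zombies(sample)
print(dict(zombies))
```

Running this periodically and logging the counts would show whether zombies appear only during large jobs or build up steadily.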
I had time for some deeper testing across different Rundeck releases, from 2.7.3 through 2.8.* to 2.9.3, and found out this is not a regression. I was just confused by a different Rundeck 2.9 issue, #2756. For our 12-CPU instance, the threshold for zombie creation remains the same: somewhere between 650 and 700 target nodes being accessed simultaneously ( |
@pbenas do you have a minimum heap size setting, e.g. |
@gschueler Yes, we run rundeck with |
In an effort to focus on bugs and issues that impact currently supported versions of Rundeck, we have elected to notify GitHub issue creators if their issue is classified as stale and close the issue. An issue is identified as stale when there have been no new comments, responses or other activity within the last 12 months. If a closed issue is still present please feel free to open a new Issue against the current version and we will review it. If you are an enterprise customer, please contact your Rundeck Support to assist in your request. |
Issue type: Bug report
My Rundeck detail
Expected Behavior
No failed jobs, no zombie SCP processes.
Actual Behavior
Rundeck executes the scp, then fails the job. A couple of zombie scp processes remain.
Digging through the logs, I found the following two errors:
and
How to reproduce Behavior
Not reproducible when running over smaller sets of nodes. Our instance has 12 CPUs and 72G of RAM. Short-term load peaks at low single-digit values.