Refresh Project Nodes does not always work #3967
I believe I'm encountering a similar issue; after discussion in #rundeck on freenode, I was directed here. Immediately after the job fails with the above error, I can go to project B's node list in the web UI and confirm the new node is present. I was able to recreate this issue with all jobs in the same project.
Hi @n-cc, I found something with an example to reproduce (three jobs).
JobA (which adds a new server to the Ansible node source):
<joblist>
  <job>
    <defaultTab>nodes</defaultTab>
    <description></description>
    <executionEnabled>true</executionEnabled>
    <id>2f71b2c0-991b-4191-9373-feec9c8c7514</id>
    <loglevel>INFO</loglevel>
    <name>JobA</name>
    <nodeFilterEditable>false</nodeFilterEditable>
    <plugins />
    <scheduleEnabled>true</scheduleEnabled>
    <sequence keepgoing='false' strategy='node-first'>
      <command>
        <exec>echo "updating..."</exec>
      </command>
      <command>
        <fileExtension>.sh</fileExtension>
        <script><![CDATA[echo -e "\n192.168.33.22" >> /home/user/Downloads/hosts]]></script>
        <scriptargs />
        <scriptinterpreter>/bin/bash</scriptinterpreter>
      </command>
      <command>
        <step-plugin type='source-refresh-plugin'>
          <configuration>
            <entry key='sleep' value='15' />
          </configuration>
        </step-plugin>
      </command>
      <command>
        <exec>echo "done"</exec>
      </command>
    </sequence>
    <uuid>2f71b2c0-991b-4191-9373-feec9c8c7514</uuid>
  </job>
</joblist>
JobB (which runs a command on the not-yet-existing node):
<joblist>
  <job>
    <defaultTab>nodes</defaultTab>
    <description></description>
    <dispatch>
      <excludePrecedence>true</excludePrecedence>
      <keepgoing>false</keepgoing>
      <rankOrder>ascending</rankOrder>
      <successOnEmptyNodeFilter>false</successOnEmptyNodeFilter>
      <threadcount>1</threadcount>
    </dispatch>
    <executionEnabled>true</executionEnabled>
    <id>7a616d10-bf61-4953-a09e-d294287269c4</id>
    <loglevel>INFO</loglevel>
    <name>JobB</name>
    <nodeFilterEditable>false</nodeFilterEditable>
    <nodefilters>
      <filter>192.168.33.22</filter>
    </nodefilters>
    <nodesSelectedByDefault>true</nodesSelectedByDefault>
    <plugins />
    <scheduleEnabled>true</scheduleEnabled>
    <sequence keepgoing='false' strategy='node-first'>
      <command>
        <exec>echo "hi"</exec>
      </command>
    </sequence>
    <uuid>7a616d10-bf61-4953-a09e-d294287269c4</uuid>
  </job>
</joblist>
ParentJob (which calls JobA and JobB):
<joblist>
  <job>
    <defaultTab>nodes</defaultTab>
    <description></description>
    <executionEnabled>true</executionEnabled>
    <id>5ff04e74-156c-4525-a666-76d3e5d30e72</id>
    <loglevel>INFO</loglevel>
    <name>Parent</name>
    <nodeFilterEditable>false</nodeFilterEditable>
    <plugins />
    <scheduleEnabled>true</scheduleEnabled>
    <sequence keepgoing='false' strategy='node-first'>
      <command>
        <jobref name='JobA' nodeStep='true'>
          <uuid>2f71b2c0-991b-4191-9373-feec9c8c7514</uuid>
        </jobref>
      </command>
      <command>
        <exec>echo "hi"</exec>
      </command>
      <command>
        <jobref name='JobB' nodeStep='true'>
          <uuid>7a616d10-bf61-4953-a09e-d294287269c4</uuid>
        </jobref>
      </command>
    </sequence>
    <uuid>5ff04e74-156c-4525-a666-76d3e5d30e72</uuid>
  </job>
</joblist>
Executing that parent job, I got the same error reported above. But putting a single "sleep 10" in the parent job before the JobB step, it works like a charm (of course, keep in mind that the 192.168.33.22 node doesn't exist before JobB runs):
<joblist>
  <job>
    <defaultTab>nodes</defaultTab>
    <description></description>
    <executionEnabled>true</executionEnabled>
    <id>5ff04e74-156c-4525-a666-76d3e5d30e72</id>
    <loglevel>INFO</loglevel>
    <name>Parent</name>
    <nodeFilterEditable>false</nodeFilterEditable>
    <plugins />
    <scheduleEnabled>true</scheduleEnabled>
    <sequence keepgoing='false' strategy='node-first'>
      <command>
        <jobref name='JobA' nodeStep='true'>
          <uuid>2f71b2c0-991b-4191-9373-feec9c8c7514</uuid>
        </jobref>
      </command>
      <command>
        <exec>sleep 10</exec>
      </command>
      <command>
        <jobref name='JobB' nodeStep='true'>
          <uuid>7a616d10-bf61-4953-a09e-d294287269c4</uuid>
        </jobref>
      </command>
    </sequence>
    <uuid>5ff04e74-156c-4525-a666-76d3e5d30e72</uuid>
  </job>
</joblist>
Result:
Perhaps the timing of the "Refresh Project Nodes" step needs some adjustment to work correctly. Workaround: put a "sleep 10" before the step that points to the new remote node. Hope it helps!
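Rather than a blind fixed sleep, a script step could poll until some check of the new node succeeds. This is only a sketch of such a retry helper; `wait_for` and the check command passed to it are hypothetical names, not anything provided by Rundeck.

```shell
# Hypothetical retry helper: run a check command repeatedly until it
# succeeds or a timeout expires.
# Usage: wait_for TIMEOUT_SECONDS INTERVAL_SECONDS CMD [ARGS...]
wait_for() {
  timeout=$1
  interval=$2
  shift 2
  elapsed=0
  until "$@"; do
    elapsed=$((elapsed + interval))
    if [ "$elapsed" -ge "$timeout" ]; then
      return 1   # gave up: the check never succeeded within the timeout
    fi
    sleep "$interval"
  done
  return 0
}
```

In the parent job, a step like `wait_for 600 10 some_node_check 192.168.33.22` (where `some_node_check` is whatever command confirms the node is resolvable in your setup) would replace the guessed `sleep 10` with an explicit bound.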
Thanks for the tip - looks like the sleep allows the node to be found when all jobs are in the same project, but I'm still having issues getting it to work cross-project. I'll do a bit more testing.
I'm not sure what changed on my side, but the local sleep is no longer working for new nodes even when all jobs are within the same project.
A bit more insight into the problem: we are using the Ansible plugin, and in the project's Nodes -> Sources tab we have an Ansible Resource Model source with the inventory file path set to a local directory on the Rundeck server. Inside this directory is a script that Ansible runs to grab a list of remote hosts/nodes (scripts in inventory directories get run automatically, and the returned hosts are added to the inventory). The plugin parses this output to create the project's list of nodes. The job I'm testing is simple: it runs a command to create a new node on a remote host. I can run the script inside the inventory directory to confirm the new node is immediately showing up. It then refreshes the project nodes, which in the log output generates a bunch of these lines:
The newly created node DOES show up in this output, but after sleeping for X minutes (see data below on failure rates), Rundeck fails with the "No nodes matched for the filters..." error when it attempts to contact the node. In short: create node -> refresh project nodes -> sleep X -> attempt to contact node. Here are my results of testing different sleep values after the Refresh Project Nodes step (a success means it could contact the node and run steps; a failure means it hit the "No nodes matched for the filters" error):
10 second sleep: 3 out of 8 jobs succeeded
3 minute sleep: 3 out of 8 jobs succeeded
10 minute sleep: 12 out of 12 jobs succeeded
Given the consistent success of jobs that sleep for 10 minutes, I'm wondering if there's something internal to Rundeck causing a roughly 10 minute delay before nodes can be reliably referenced.
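For context, the dynamic inventory mechanism described above works roughly like this: Ansible executes each script in the inventory directory with `--list` and parses the JSON groups it prints, and the resource model plugin turns those hosts into Rundeck nodes. A minimal sketch, wrapped in a shell function for illustration (the group name and host here are made up, not our actual inventory):

```shell
# Sketch of a minimal Ansible dynamic inventory script. Saved as an
# executable file in the inventory directory, the case statement below
# would be the whole script body.
inventory() {
  case "$1" in
    --list)
      # Groups and their hosts, as JSON on stdout; the Ansible resource
      # model source turns these hosts into Rundeck project nodes.
      printf '%s\n' '{"rundeck_nodes": {"hosts": ["192.168.33.22"]}, "_meta": {"hostvars": {}}}'
      ;;
    --host)
      # Per-host variables; empty here since _meta already covers them.
      printf '%s\n' '{}'
      ;;
    *)
      return 1
      ;;
  esac
}
```

Running the script manually with `--list` is a quick way to confirm the new host is being returned before Rundeck refreshes, which is exactly the check described above.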
Today I created a fresh Rundeck server, configured a project similar to the project on the server above, gave it the same list of nodes, and gave it the same jobs. The same job (create node, access node) that produced a ~50% failure rate as noted above ran with a 100% success rate. I tried using 1 and 3 minute sleeps after Refresh Project Nodes and was unable to recreate the "No nodes matched for the filters" error. Since the node list, jobs, and project configuration are all identical between the two Rundeck servers, the issue with the first server outlined in the above comment must lie somewhere else. Is it possible some sort of garbage builds up in servers that have been running for a while, causing the kind of delay in node recognition we've been seeing? That's my first thought, since the only things dissimilar between the two servers are their history and database.
Ping on ^... the fact that a new Rundeck server doesn't experience the same delay in node recognition as a longstanding one, even when both servers have the same node list and the same project/job configuration, leads me to believe there's an issue on the application's side. Are there tunables we can tweak to fix problems with longstanding servers, or anything else we can look into to help debug this issue?
In an effort to focus on bugs and issues that impact currently supported versions of Rundeck, we have elected to notify GitHub issue creators if their issue is classified as stale and close the issue. An issue is identified as stale when there have been no new comments, responses or other activity within the last 12 months. If a closed issue is still present please feel free to open a new issue against the current version and we will review it. If you are an enterprise customer, please contact your Rundeck Support to assist in your request.
Describe the bug
The job step "Refresh Project Nodes" does not always work. In a workflow where a node resource is added and then a refresh step is run, the third step, which refers to the newly added node, sometimes fails with "no nodes matched" (best guess: 1 in 10 runs).
My Rundeck detail
To Reproduce
Steps to reproduce the behavior:
Expected behavior
The Refresh Project Nodes step should always synchronously refresh the project's node resource XML files.
Screenshots