
Refresh Project Nodes does not always work #3967

Closed
eblikstad opened this issue Aug 31, 2018 · 8 comments


Describe the bug
The job step "Refresh Project Nodes" does not always work. In a workflow where a node resource is added and then a refresh step is run, a third step that refers to the newly added node sometimes fails with "no nodes matched" (roughly 1 run in 10).

My Rundeck detail

  • Rundeck version: 2.10.6
  • install type: launcher
  • OS Name/version: Windows 2012 R2
  • DB Type/version: Microsoft SQL Server

To Reproduce
Steps to reproduce the behavior:

  1. Create a new job
  2. Add a job step which adds a node
  3. Add a refresh project nodes step
  4. Add a job reference step which has a node override for the newly added node
  5. Re-run this job until it fails

Expected behavior
The refresh project nodes step should always synchronously refresh the project's node resource XML files.

Screenshots
(screenshot attached in the original issue)


n-cc commented Mar 23, 2020

I believe I'm encountering a similar issue, and after discussion in #rundeck on freenode, I was directed here.

Describe the bug
I have a job that runs a command on a node in project A that creates a node in project B, then cross-references jobs in project B in order to run Ansible against the new node. To make the new node available, I cross-reference a job in project B that runs "Refresh Project Nodes" to refresh project B's node list. However, no sleep period for "Refresh Project Nodes" (I've tested up to 1 hour) makes the new node available to the cross-referenced jobs in project B. The following error is encountered:

No nodes matched for the filters: NodeSet{includes={name=nodename, dominant=false, }}

Immediately after the job fails with the above error, I can go to project B's node list in the web UI and confirm that nodename doesn't show up. After waiting ~3 minutes, it then shows up. This ~3 minute wait is consistent whether the "Refresh Project Nodes" job has a sleep period of 5 minutes or 1 hour. It seems the new node only shows up after the job fails.

My Rundeck detail

  • Rundeck version: 3.2.3-20200221
  • install type: deb
  • OS Name/version: Ubuntu 18.04
  • DB Type/version: local mysql

To Reproduce
Steps to reproduce the behavior:

  1. Create a new job in project A that:
  • Creates a node in project B
  • Runs a cross-project job in project B to run "Refresh Project Nodes," with any sleep period
  • Attempts to contact the new node via a cross-project job in project B
  2. Run this job and wait for the error pasted above
  3. Enter the Nodes tab of project B, confirm the new node doesn't show up, wait ~3 minutes, and it shows up

Expected behavior
The new node would be available after running "Refresh Project Nodes" (instead, it only shows up ~3 minutes after the job fails, regardless of the sleep value).

Screenshots
See the above post; it's essentially the same thing.

I was able to recreate this issue with all jobs in the same project.


MegaDrive68k commented Mar 23, 2020

Hi @n-cc

I found something with an example to reproduce (three jobs):

JobA (that adds a new server on Ansible node sources).

<joblist>
  <job>
    <defaultTab>nodes</defaultTab>
    <description></description>
    <executionEnabled>true</executionEnabled>
    <id>2f71b2c0-991b-4191-9373-feec9c8c7514</id>
    <loglevel>INFO</loglevel>
    <name>JobA</name>
    <nodeFilterEditable>false</nodeFilterEditable>
    <plugins />
    <scheduleEnabled>true</scheduleEnabled>
    <sequence keepgoing='false' strategy='node-first'>
      <command>
        <exec>echo "updating..."</exec>
      </command>
      <command>
        <fileExtension>.sh</fileExtension>
        <script><![CDATA[echo -e "\n192.168.33.22" >> /home/user/Downloads/hosts]]></script>
        <scriptargs />
        <scriptinterpreter>/bin/bash</scriptinterpreter>
      </command>
      <command>
        <step-plugin type='source-refresh-plugin'>
          <configuration>
            <entry key='sleep' value='15' />
          </configuration>
        </step-plugin>
      </command>
      <command>
        <exec>echo "done"</exec>
      </command>
    </sequence>
    <uuid>2f71b2c0-991b-4191-9373-feec9c8c7514</uuid>
  </job>
</joblist>

JobB (that executes a command on the future new node).

<joblist>
  <job>
    <defaultTab>nodes</defaultTab>
    <description></description>
    <dispatch>
      <excludePrecedence>true</excludePrecedence>
      <keepgoing>false</keepgoing>
      <rankOrder>ascending</rankOrder>
      <successOnEmptyNodeFilter>false</successOnEmptyNodeFilter>
      <threadcount>1</threadcount>
    </dispatch>
    <executionEnabled>true</executionEnabled>
    <id>7a616d10-bf61-4953-a09e-d294287269c4</id>
    <loglevel>INFO</loglevel>
    <name>JobB</name>
    <nodeFilterEditable>false</nodeFilterEditable>
    <nodefilters>
      <filter>192.168.33.22</filter>
    </nodefilters>
    <nodesSelectedByDefault>true</nodesSelectedByDefault>
    <plugins />
    <scheduleEnabled>true</scheduleEnabled>
    <sequence keepgoing='false' strategy='node-first'>
      <command>
        <exec>echo "hi"</exec>
      </command>
    </sequence>
    <uuid>7a616d10-bf61-4953-a09e-d294287269c4</uuid>
  </job>
</joblist>

ParentJob (that calls JobA and JobB).

<joblist>
  <job>
    <defaultTab>nodes</defaultTab>
    <description></description>
    <executionEnabled>true</executionEnabled>
    <id>5ff04e74-156c-4525-a666-76d3e5d30e72</id>
    <loglevel>INFO</loglevel>
    <name>Parent</name>
    <nodeFilterEditable>false</nodeFilterEditable>
    <plugins />
    <scheduleEnabled>true</scheduleEnabled>
    <sequence keepgoing='false' strategy='node-first'>
      <command>
        <jobref name='JobA' nodeStep='true'>
          <uuid>2f71b2c0-991b-4191-9373-feec9c8c7514</uuid>
        </jobref>
      </command>
      <command>
        <exec>echo "hi"</exec>
      </command>
      <command>
        <jobref name='JobB' nodeStep='true'>
          <uuid>7a616d10-bf61-4953-a09e-d294287269c4</uuid>
        </jobref>
      </command>
    </sequence>
    <uuid>5ff04e74-156c-4525-a666-76d3e5d30e72</uuid>
  </job>
</joblist>

Executing that parent job, I obtained the same error as reported:

(screenshot: failed execution)

But putting a single "sleep 10" in the parent job before the JobB execution makes it work like a charm (of course, keep in mind that the 192.168.33.22 node doesn't exist before the execution of JobB).

<joblist>
  <job>
    <defaultTab>nodes</defaultTab>
    <description></description>
    <executionEnabled>true</executionEnabled>
    <id>5ff04e74-156c-4525-a666-76d3e5d30e72</id>
    <loglevel>INFO</loglevel>
    <name>Parent</name>
    <nodeFilterEditable>false</nodeFilterEditable>
    <plugins />
    <scheduleEnabled>true</scheduleEnabled>
    <sequence keepgoing='false' strategy='node-first'>
      <command>
        <jobref name='JobA' nodeStep='true'>
          <uuid>2f71b2c0-991b-4191-9373-feec9c8c7514</uuid>
        </jobref>
      </command>
      <command>
        <exec>sleep 10</exec>
      </command>
      <command>
        <jobref name='JobB' nodeStep='true'>
          <uuid>7a616d10-bf61-4953-a09e-d294287269c4</uuid>
        </jobref>
      </command>
    </sequence>
    <uuid>5ff04e74-156c-4525-a666-76d3e5d30e72</uuid>
  </job>
</joblist>

Result:

(screenshot: successful execution)

Perhaps the timing of the "Refresh Project Nodes" step needs some adjustment to work correctly.

Workaround: put a "sleep 10" before the step that points to the new remote node.

Hope it helps!
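A variation on the fixed sleep is to poll until the node is actually visible, instead of guessing a delay. This is only a sketch: the RD_URL, API_TOKEN, and PROJECT variables are hypothetical, and the resource-info endpoint/API version should be verified against your Rundeck install before relying on it.

```shell
#!/bin/sh
# Sketch of a polling workaround (hypothetical env vars RD_URL, API_TOKEN,
# PROJECT): retry until Rundeck reports the node, rather than a fixed sleep.
wait_for_node() {
  node="$1"
  tries="${2:-30}"   # number of polls before giving up
  i=0
  while [ "$i" -lt "$tries" ]; do
    # -f makes curl return non-zero on HTTP errors (e.g. 404 while missing)
    if curl -sf -H "X-Rundeck-Auth-Token: $API_TOKEN" \
        "$RD_URL/api/14/project/$PROJECT/resource/$node" >/dev/null; then
      echo "node $node is visible"
      return 0
    fi
    i=$((i + 1))
    sleep 10
  done
  echo "node $node never appeared" >&2
  return 1
}
```

A step like `wait_for_node 192.168.33.22` could then replace the fixed `sleep 10` between the JobA and JobB references.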

@jtobard jtobard self-assigned this Mar 24, 2020

n-cc commented Apr 2, 2020

Thanks for the tip - looks like the sleep allows the node to be found when all jobs are in the same project, but I'm still having issues getting it to work cross-project. I'll do a bit more testing.


n-cc commented Apr 29, 2020

I'm not sure what changed on my side, but the local sleep is no longer working for new nodes when all jobs are within the same project. No length of sleep, before or after the Refresh Project Nodes step (or even without it), allows new nodes to be recognized from within the same job that creates them. Edit: not entirely true; see the failure rates for different sleep values below.


n-cc commented May 1, 2020

A bit more insight into the problem: we are using the Ansible plugin, and in the project's Nodes -> Sources tab we have an Ansible Resource Model source with the inventory file path set to a local directory on the Rundeck server. Inside this directory is a script that Ansible runs to grab a list of remote hosts/nodes (scripts in inventory directories get run automatically, and the returned hosts are added to the inventory). The plugin parses this output to create the project's list of nodes.

The job I'm testing is simple - it runs a command to create a new node on a remote host. I can run the script inside the inventory directory to confirm the new node is immediately showing up. It then refreshes the project nodes, which in the log output generates a bunch of these lines:

13:17:38 |   | node1 -> localhost]
13:17:39 |   | node2 -> localhost]
13:17:40 |   | node3 -> localhost]

The newly created node DOES show up in this output, but after sleeping for X minutes (see the failure-rate data below), Rundeck fails with the "No nodes matched for the filters..." error when it attempts to contact the node.

In short:

Create node -> refresh project nodes -> sleep X -> attempt to contact node

Here are my results of testing different sleep values after the refresh project nodes step (a success means it could contact the node and run steps, failure means it hit the "No nodes matched for the filters" error):

10 second sleep: 3 out of 8 jobs succeeded

3 minute sleep: 3 out of 8 jobs succeeded

10 minute sleep: 12 out of 12 succeeded

Given the consistent success of jobs that sleep for 10 minutes, I'm wondering if there's something internal to Rundeck that's causing a 10-or-so minute delay in allowing nodes to be reliably referenced.


n-cc commented May 5, 2020

Today I created a fresh Rundeck server, configured a project similar to the project on the server above, gave it the same list of nodes, and gave it the same jobs. I was able to run the same job (create node, access node) that produced a ~50% failure rate as noted above with a 100% success rate. I tried using 1 and 3 minute sleeps after the Refresh Project nodes, and was unable to recreate the "No nodes matched for the filters" error.

Since the node list, jobs, and project configuration are all identical between the two Rundeck servers, the issue with the first server outlined in the above comment must lie somewhere else. Is it possible some sort of garbage buildup on servers that have been running for a while causes the delay in node recognition we've been seeing? That's my first thought, since the only things dissimilar between the two servers are their history and database.


n-cc commented Jul 24, 2020

Ping on ^... The fact that a new Rundeck server doesn't experience the same delay in node recognition as a long-running Rundeck server, even when both have the same node list and the same project/job configuration, leads me to believe there's an issue on the application's side. Are there any tunables we can tweak on long-running servers, or anything else we can look into to help debug this issue?
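One set of tunables that may be relevant here are Rundeck's project node-cache settings. The property names below are taken from Rundeck's node-cache documentation, but I'm not certain they explain the delay seen in this thread, so treat this as something to verify rather than a confirmed fix:

```properties
# project.properties (names per Rundeck's node-cache docs; verify for your version)
project.nodeCache.enabled=true
# seconds before cached node data is considered stale
project.nodeCache.delay=30
# load the node source synchronously on first request
project.nodeCache.firstLoadSynch=true
```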


stale bot commented Jul 24, 2021

In an effort to focus on bugs and issues that impact currently supported versions of Rundeck, we have elected to notify GitHub issue creators if their issue is classified as stale and close the issue. An issue is identified as stale when there have been no new comments, responses or other activity within the last 12 months. If a closed issue is still present please feel free to open a new Issue against the current version and we will review it. If you are an enterprise customer, please contact your Rundeck Support to assist in your request.
Thank you, The Rundeck Team


4 participants