Skip to content

[JENKINS-53401] Random FileNotFoundException when creating lots of agents in parallel threads #19979

@jenkins-infra-bot

Description

@jenkins-infra-bot

Upon creating lots of agents in parallel (Cloud provisioning containers), I see sometimes random exceptions reported moving temporary files to node/config.xml.

Also:   java.nio.file.NoSuchFileException: /var/jenkins_home/nodes/myagent-5pr7b/atomic4488666319135941520tmp -> /var/jenkins_home/nodes/myagent-5pr7b/config.xml
		at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
		at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
		at sun.nio.fs.UnixCopyFile.move(UnixCopyFile.java:396)
		at sun.nio.fs.UnixFileSystemProvider.move(UnixFileSystemProvider.java:262)
		at java.nio.file.Files.move(Files.java:1395)
		at hudson.util.AtomicFileWriter.commit(AtomicFileWriter.java:191)
java.nio.file.NoSuchFileException: /var/jenkins_home/nodes/myagent-5pr7b/atomic4488666319135941520tmp
	at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
	at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
	at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
	at sun.nio.fs.UnixCopyFile.move(UnixCopyFile.java:409)
	at sun.nio.fs.UnixFileSystemProvider.move(UnixFileSystemProvider.java:262)
	at java.nio.file.Files.move(Files.java:1395)
	at hudson.util.AtomicFileWriter.commit(AtomicFileWriter.java:206)
	at hudson.XmlFile.write(XmlFile.java:198)
	at jenkins.model.Nodes.save(Nodes.java:289)
	at hudson.util.PersistedList.onModified(PersistedList.java:173)
	at hudson.util.PersistedList.replaceBy(PersistedList.java:85)
	at hudson.model.Slave.(Slave.java:198)
	at hudson.slaves.AbstractCloudSlave.(AbstractCloudSlave.java:51)
	at org.csanchez.jenkins.plugins.kubernetes.KubernetesSlave.(KubernetesSlave.java:116)
	at org.csanchez.jenkins.plugins.kubernetes.KubernetesSlave$Builder.build(KubernetesSlave.java:408)
	at com.cloudbees.jenkins.plugins.kube.PlannedKubernetesSlave.call(PlannedKubernetesSlave.java:122)
	at com.cloudbees.jenkins.plugins.kube.PlannedKubernetesSlave.call(PlannedKubernetesSlave.java:35)
	at jenkins.util.ContextResettingExecutorService$2.call(ContextResettingExecutorService.java:46)
	at jenkins.security.ImpersonatingExecutorService$2.call(ImpersonatingExecutorService.java:71)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

I tracked the root cause being the nodeProperties field in hudson.model.Slave.

If you have a lot of agents created in different threads, this will cause to call Jenkins.get().getNodesObject().save in each thread. This method is not thread-safe, and affects all nodes storage. As a result, in some threads, save() throws an exception because the node has been already processed through another thread.

In JENKINS-31055, Stephen made Node implement Saveable, which means the persisted lists should be tied to the node instead of the Nodes object. The corresponding save() operation is fine-grained, so the issue would be avoided completely.


Originally reported by vlatombe, imported from: Random FileNotFoundException when creating lots of agents in parallel threads
  • assignee: vlatombe
  • status: Resolved
  • priority: Major
  • component(s): core
  • label(s): 2.138.2-fixed, robustness
  • resolution: Fixed
  • resolved: 2018-09-26T08:51:44+00:00
  • votes: 0
  • watchers: 4
  • imported: 2025-11-24
Raw content of original issue

Upon creating lots of agents in parallel (Cloud provisioning containers), I see sometimes random exceptions reported moving temporary files to node/config.xml.

Also:   java.nio.file.NoSuchFileException: /var/jenkins_home/nodes/myagent-5pr7b/atomic4488666319135941520tmp -> /var/jenkins_home/nodes/myagent-5pr7b/config.xml
		at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
		at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
		at sun.nio.fs.UnixCopyFile.move(UnixCopyFile.java:396)
		at sun.nio.fs.UnixFileSystemProvider.move(UnixFileSystemProvider.java:262)
		at java.nio.file.Files.move(Files.java:1395)
		at hudson.util.AtomicFileWriter.commit(AtomicFileWriter.java:191)
java.nio.file.NoSuchFileException: /var/jenkins_home/nodes/myagent-5pr7b/atomic4488666319135941520tmp
	at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
	at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
	at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
	at sun.nio.fs.UnixCopyFile.move(UnixCopyFile.java:409)
	at sun.nio.fs.UnixFileSystemProvider.move(UnixFileSystemProvider.java:262)
	at java.nio.file.Files.move(Files.java:1395)
	at hudson.util.AtomicFileWriter.commit(AtomicFileWriter.java:206)
	at hudson.XmlFile.write(XmlFile.java:198)
	at jenkins.model.Nodes.save(Nodes.java:289)
	at hudson.util.PersistedList.onModified(PersistedList.java:173)
	at hudson.util.PersistedList.replaceBy(PersistedList.java:85)
	at hudson.model.Slave.<init>(Slave.java:198)
	at hudson.slaves.AbstractCloudSlave.<init>(AbstractCloudSlave.java:51)
	at org.csanchez.jenkins.plugins.kubernetes.KubernetesSlave.<init>(KubernetesSlave.java:116)
	at org.csanchez.jenkins.plugins.kubernetes.KubernetesSlave$Builder.build(KubernetesSlave.java:408)
	at com.cloudbees.jenkins.plugins.kube.PlannedKubernetesSlave.call(PlannedKubernetesSlave.java:122)
	at com.cloudbees.jenkins.plugins.kube.PlannedKubernetesSlave.call(PlannedKubernetesSlave.java:35)
	at jenkins.util.ContextResettingExecutorService$2.call(ContextResettingExecutorService.java:46)
	at jenkins.security.ImpersonatingExecutorService$2.call(ImpersonatingExecutorService.java:71)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

I tracked the root cause being the nodeProperties field in hudson.model.Slave.

If you have a lot of agents created in different threads, this will cause to call Jenkins.get().getNodesObject().save in each thread. This method is not thread-safe, and affects all nodes storage. As a result, in some threads, save() throws an exception because the node has been already processed through another thread.

In JENKINS-31055, Stephen made Node implement Saveable, which means the persisted lists should be tied to the node instead of the Nodes object. The corresponding save() operation is fine-grained, so the issue would be avoided completely.

Metadata

Metadata

Assignees

No one assigned

    Type

    No fields configured for Bug.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions