runOnAllAgents="true" added to jobs at GoCD startup #11868
Comments
Can you describe the pattern a bit more clearly here? I feel there is a lot of missing context. Was this on a server upgrade? A normal restart? For a single job as a one-off? For many different pipelines? Every time you restart? Multiple times? What commonality is there between the types of jobs this is happening to, if more than one? Have you seen this historically (if you're down in the history already and don't normally use this)? Possibly a dumb question, but are you confident that no-one manually changed the master config? Without any way of narrowing it down, patterns, or a method/theory for reproducing what you are seeing, there's unlikely much that can be done here.
I had this happen twice to me: on update from 23.1 -> 23.2 and from 23.2 -> 23.3. We have about 100 pipelines and it only happened on one in each update round.

23.1 -> 23.2, excerpt from the config git history. The Upgrader commit in config git added this:
23.2 -> 23.3
This time it did a bit more. Apart from adding runOnAllAgents="true" on one pipeline, it touched some more (mostly bool values).
This time the go-server.log yielded some more information:
Good questions! I was on holiday last week, let me ask the colleagues who were here during the restart, and we'll get back to you with more context.
We've seen this multiple times over the last few months. Looking back at the logs, here's another example of (what we believe is) GoCD doing this (note the timestamps, being within seconds of each other): commit eca98b446e4322b8eba1ab19ba2c9378803bc572
Author: anonymous <go-cd-dev@googlegroups.com>
Date: Sat Jul 22 14:07:10 2023 +0000
user:anonymous|timestamp:1690034830125|schema_version:139|go_edition:OpenSource|go_version:23.1.0 (16080-54a6971915ff8d402c9fea8cd2ceeb6e31c8cdc8)|md5:c3ca08a3776e3ccd6969b7e2417d7de3
diff --git a/cruise-config.xml b/cruise-config.xml
index 97b3f0694..c43f4def2 100644
--- a/cruise-config.xml
+++ b/cruise-config.xml
@@ -12102,7 +12102,7 @@ fi</arg>
<artifact type="test" src="po-feature-tests/target/logs/docker" dest="logs" />
</artifacts>
</job>
- <job name="compliance-feature-tests">
+ <job name="compliance-feature-tests" runOnAllAgents="true">
<tasks>
<fetchartifact artifactOrigin="gocd" srcfile="tag" stage="build" job="defaultJob" />
<exec command="/bin/bash">
commit 6256feea1bcf9bcb5b48ce087a560450e60238f1
Author: Upgrade <go-cd-dev@googlegroups.com>
Date: Sat Jul 22 14:07:05 2023 +0000
user:Upgrade|timestamp:1690034824889|schema_version:139|go_edition:OpenSource|go_version:23.1.0 (16080-54a6971915ff8d402c9fea8cd2ceeb6e31c8cdc8)|md5:95adbb5a400a6292b8fd216333a07e53
diff --git a/cruise-config.xml b/cruise-config.xml
index c43f4def2..97b3f0694 100644
--- a/cruise-config.xml
+++ b/cruise-config.xml
@@ -12102,7 +12102,7 @@ fi</arg>
<artifact type="test" src="po-feature-tests/target/logs/docker" dest="logs" />
</artifacts>
</job>
- <job name="compliance-feature-tests" runOnAllAgents="true">
+ <job name="compliance-feature-tests">
<tasks>
<fetchartifact artifactOrigin="gocd" srcfile="tag" stage="build" job="defaultJob" />
<exec command="/bin/bash">
commit 3d333bea4e45908713a238508128d70d42d2d4dc
Author: Filesystem <go-cd-dev@googlegroups.com>
Date: Sat Jul 22 14:07:04 2023 +0000
user:Filesystem|timestamp:1690034824645|schema_version:139|go_edition:OpenSource|go_version:23.1.0 (16080-54a6971915ff8d402c9fea8cd2ceeb6e31c8cdc8)|md5:c3ca08a3776e3ccd6969b7e2417d7de3
diff --git a/cruise-config.xml b/cruise-config.xml
index 97b3f0694..c43f4def2 100644
--- a/cruise-config.xml
+++ b/cruise-config.xml
@@ -12102,7 +12102,7 @@ fi</arg>
<artifact type="test" src="po-feature-tests/target/logs/docker" dest="logs" />
</artifacts>
</job>
- <job name="compliance-feature-tests">
+ <job name="compliance-feature-tests" runOnAllAgents="true">
<tasks>
<fetchartifact artifactOrigin="gocd" srcfile="tag" stage="build" job="defaultJob" />
<exec command="/bin/bash">

As we are running our GoCD server in Kubernetes, it's possible that it may restart at "unplanned times", but maybe there are even more patterns to be found (like @k-c-p so nicely described above).
Thanks, @k-c-p's log sharing is helpful. I don't think anything has changed anywhere in the config loading area that would make it any more likely than it ever was for something like this to happen (at least over the past couple of years I have been helping maintain GoCD), which is why I am interested in folks who might have a longer config history. Honestly, the whole config area is terrifying to me, and I avoid touching it 😇 I am aware of at least one other bug that seems somewhat similar to this (on restart, certain encrypted values get persisted unencrypted). Clearly the ...
Searched logs for
Here's one of them, with the following log lines:
Adding further logs before this (for context):
The change being
... is weird, but might point at a possible circumstance of the problem. What is perhaps happening is some strange race condition with the normal config validation at startup. Overall, we'd have to correlate this with what the logs say at that time to understand more; it's not particularly useful to just see the config history on its own. The strangest thing here to me is how ...
@MPV In your case, how far back does your config repository's history go? Years? Back through which GoCD version? If you do have such a longer history, and can see a clear place this ... My current working, but largely unvalidated, theory here is that there is some sort of thread-safety problem with the ancient type converter logic here that is leading to this bizarre behaviour (it is starting with a ...)
If that's the case, it's been that way for a very long time, and there's no obvious explanation of why it would have gotten any "worse", except perhaps some other reason for an increased chance of a race condition over time. But worth exploring.
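The "thread-unsafe type converter" theory above can be sketched in isolation. The following is a hypothetical illustration only, not GoCD's actual converter code: the class and method names are invented, but the pattern (a converter that stashes intermediate state in an instance field shared across threads) is the kind of thing that can make a boolean attribute deserialize to the wrong value under concurrent access, while a stateless converter cannot.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch, NOT GoCD's real code: this converter stashes the raw
// attribute value in an instance field before parsing it, so two threads
// converting different values through one shared instance can read each
// other's input ("false" can come back as true, and vice versa).
class StatefulBooleanConverter {
    private String pending; // shared mutable state: the bug

    Boolean convert(String raw) throws InterruptedException {
        pending = raw;                    // step 1: stash the input
        Thread.sleep(0, 1000);            // widen the race window for the demo
        return Boolean.valueOf(pending);  // step 2: parse whatever is stashed NOW
    }
}

// The conceptual fix: keep conversion stateless, so concurrent calls
// cannot interfere with each other.
class StatelessBooleanConverter {
    Boolean convert(String raw) {
        return Boolean.valueOf(raw);
    }
}

public class ConverterRaceDemo {
    public static void main(String[] args) throws Exception {
        StatelessBooleanConverter safe = new StatelessBooleanConverter();
        ExecutorService pool = Executors.newFixedThreadPool(8);
        List<Callable<Boolean>> tasks = new ArrayList<>();
        for (int i = 0; i < 10_000; i++) {
            tasks.add(() -> safe.convert("false"));
        }
        boolean anyFlipped = false;
        for (Future<Boolean> f : pool.invokeAll(tasks)) {
            if (f.get()) anyFlipped = true;
        }
        pool.shutdown();
        System.out.println(anyFlipped
            ? "stateless converter: a 'false' flipped to true!"
            : "stateless converter: all 10000 conversions stayed false");
    }
}
```

Running the same 10,000 concurrent conversions through the stateful variant would only flip values intermittently, which matches the "one random job per restart" symptom described in this thread.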
I saw occurrences of changes to this value in our GoCD git history from back in 2015, though I'm not at my computer at the moment so I can't verify whether it was flapping as early as that or whether those were just manual changes back then. If I remember correctly, it was more recently (the past year) that I saw it occurring. And mostly a single random pipeline/stage/job affected, not all of them (we have many hundreds).

We have run our GoCD in Kubernetes for quite a few years now, so there are mounted ConfigMaps, paths and remote disks for persistence, but I guess that's common.

More context: we have some hundred pipelines defined manually, then a few hundred pipelines defined as code/YAML. We also maintain fewer than fifty pipeline templates as code, which we keep upserting into GoCD (using the JSON API) whenever developers (or us in the platform team) make changes to them. The latter gives occasional quite hefty diffs in the config repo (as in one template being rearranged after/before another), but nothing which I assume would cause this (apart from making my troubleshooting/git history digging a bit messier).
And here are some logs from the issue on July 22nd that I showed git commits from in #11868 (comment):
For the issue seen at ...
Hey folks, I've closed the issue now on the basis that I can reliably replicate the specific bug that leads to the error deserializing the runOnAllAgents value, and have confirmed that it is a threading issue, which I've fixed.

What I haven't tried to do, or fully replicated, is how the value ends up as 'true' and then alternates back, nor specifically which threads are trying to do concurrent deserialisation at startup and whether that is expected - but I can see how it is conceptually possible (this is just a limitation of my own knowledge of the project). The other attributes flipping in inexplicable ways are likely to be related, but booleans will cause more obvious issues since they are non-nullable primitives.

Since this threading issue is essentially to do with the Java memory model across threads, some really unpredictable situations could arise. It may just be that further CPU and JVM optimisations over time have made this more likely to happen, as it's code largely untouched for 8-9 years. I suspect the bigger the config (with lots of non-config-repo pipelines), the likelier the problem is to manifest.

You probably don't need to keep digging or analysing this specific one, but I'll leave that to you to decide. I'll re-open if I discover more, but the real test will probably be in the real world with 23.4.0, I suspect.
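The point above about non-nullable primitive booleans and the Java memory model can be illustrated with a small safe-publication sketch (hypothetical names again, not GoCD code): if an object holding a primitive boolean field is published to another thread without synchronization, the reader may legally observe the field's default value (false) rather than what the constructor wrote. Declaring the field `final` and publishing via a `volatile` reference removes that possibility.

```java
// Hypothetical sketch, not GoCD code: safe publication of a primitive
// boolean field between threads under the Java memory model.
class JobConfig {
    // 'final' fields get a freeze guarantee: any thread that sees a
    // reference to the constructed object is guaranteed to see the
    // written value, never the primitive default (false).
    final boolean runOnAllAgents;

    JobConfig(boolean runOnAllAgents) {
        this.runOnAllAgents = runOnAllAgents;
    }
}

public class SafePublicationDemo {
    // 'volatile' makes the write of the reference itself visible to readers.
    static volatile JobConfig shared;

    public static void main(String[] args) throws Exception {
        Thread writer = new Thread(() -> shared = new JobConfig(true));
        writer.start();
        writer.join(); // join() also establishes happens-before for main
        // With final + volatile this is guaranteed to print true. With a
        // plain, non-final field and unsynchronized publication, a reader
        // racing the constructor could legally observe false instead.
        System.out.println("runOnAllAgents=" + shared.runOnAllAgents);
    }
}
```

This is why a primitive `boolean` degrades silently to `false` (or a stale value) under a data race, whereas a nullable `Boolean` left unset would more likely surface as a visible `null`/error.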
@chadlwilson Thanks for all your help in troubleshooting and eventually (hopefully) fixing this. 🙏💐🎁 |
Hi folks. Sorry for the delay, but wanted to add that this change is now out in GoCD 23.4.0. Fingers crossed that we've nailed this one! https://www.gocd.org/download/ |
I have just updated my GoCD instance from 23.3 to 23.4. Looks good: no "runOnAllAgents" setting appeared out of thin air :-)
Issue Type
Summary
GoCD surprisingly adds runOnAllAgents="true" to jobs.

Environment
👋 Let me know if you need this and I can share this individually, as it might contain sensitive information.
Basic environment details
GoCD server version: 23.1.0 (16080-54a6971915ff8d402c9fea8cd2ceeb6e31c8cdc8)
Java version: 17.0.6
OS: Linux 5.10.176+
Additional Environment Details
Steps to Reproduce
Expected Results
Actual Results
runOnAllAgents="true" is added to jobs.

Possible Fix
🤷
Log snippets
Code snippets/Screenshots
Here is an example of a commit GoCD makes:
Any other info