Re-running failed jobs leaves a warning with null exception in the logs #5587
When failed jobs are re-run from the jobs tab on the stage details page, a warning with a null exception and a large stack trace is written to the logs. This does not block job assignment, but it can flood the logs.
Basic environment details
Additional Environment Details
Steps to Reproduce
The logs should not be flooded
The logs are flooded
The above save() method is responsible for incrementing the stage counter (which indicates the number of times the stage has run). It then saves the stage information along with the counter, and also saves the job information.
The above building() method is responsible for setting the state of the stage to the StageState.Building enum value. The significance of the "if" condition is covered in a later section below.
Before a stage is saved, its state is changed to
Before 2015, the building() method only set the state of the stage to the StageState.Building constant. This was based on the assumption that the stage should not have a state (it should be NULL) before the stage is saved using the save() method above. But it was noticed that the stage sometimes had a state (a NOT NULL state, containing one of the values in
To debug when and why the stage contains a state before it is saved, an if condition was introduced in the building() method to log a warning message, along with a detailed stack trace, whenever the stage has a state before it is saved. Later it was found that during re-runs (of either a stage or a job), the state has a value before the save. So an additional !isRerun() check was added to the earlier if condition (see the building() method above), as the earlier assumption that the stage has no state before saving was no longer valid.
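The guard described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's actual code: the class name `StageSketch`, the `warnings` list standing in for a logger, the boolean `rerun` field standing in for the real isRerun() logic, and the enum values other than `Building` are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-in for the stage class discussed above.
class StageSketch {
    // Only Building is named in the discussion; the other values are assumed.
    enum StageState { Building, Passed, Failed }

    private StageState state;                          // null until a state is set
    private final boolean rerun;                       // stand-in for the real isRerun() logic
    final List<String> warnings = new ArrayList<>();   // stand-in for a logger

    StageSketch(StageState initialState, boolean rerun) {
        this.state = initialState;
        this.rerun = rerun;
    }

    boolean isRerun() {
        return rerun;
    }

    // Warn only when a state is already present and the run is not
    // recognised as a re-run, then mark the stage as Building.
    void building() {
        if (state != null && !isRerun()) {
            warnings.add("Stage already had state " + state + " before save");
        }
        state = StageState.Building;
    }

    StageState getState() {
        return state;
    }
}
```

In this sketch, a stage that already carries a state but is not recognised as a re-run is the only case that produces a warning; a fresh stage, or one flagged as a re-run, stays silent.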
At that time, it was still not clear when exactly the stage contains a state (especially before save). So the warning log message was kept intact (not removed) to catch any inconsistent state changes based on the detailed stack trace that was logged along with the message.
Based on the investigation, it is now clear that the stage does not contain any state (before saving) whenever the stage is run or re-run. However, that is not the case with jobs. Whenever a job is re-run, the stage always contains the state of its previous run before the save() method above is called.
Another observation: isRerun() is written with the assumption that the stage counter always reflects the number of times the stage has run, whether directly or because of a pipeline or job run or re-run. This is not true, because the counter value is incremented only at the end (in the save() method above). So if a stage that has run 3 times before is run again, the stage counter has a value of 4 only after the save() method completes, and this value of 4 is what gets saved in the DB (in a table called stages). Any code that depends on the counter value before the save sees a stale value. This is what happens in the isRerun() method: it is called before the save, and it relies on the counter value being > 1 to decide that a run is a re-run.
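The timing problem can be illustrated with a small sketch. It assumes only what is described above (the counter is incremented in save(), and isRerun() checks for a counter value > 1); the class name `CounterTimingSketch` and its fields are hypothetical and greatly simplified.

```java
// Hypothetical, simplified model of the counter timing described above.
class CounterTimingSketch {
    private int counter;   // number of completed save() calls so far

    CounterTimingSketch(int previousRuns) {
        this.counter = previousRuns;
    }

    // Mirrors the described check: a run counts as a re-run when counter > 1.
    boolean isRerun() {
        return counter > 1;
    }

    // The counter is incremented only here, at the end of the flow.
    void save() {
        counter++;
    }

    int getCounter() {
        return counter;
    }
}
```

With one previous run, the counter is still 1 when isRerun() is consulted before save(), so the second run is not recognised as a re-run; only after save() does the counter become 2. Likewise, a stage that has run 3 times reports a counter of 3 before save() and 4 after it.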
Before save() is called, why does a stage not have a state on its run (or re-run)?
This is because the spring
Before save() is called, why does a stage have a state when a job is run (or re-run)?
This is because on a job re-run when rails controller (
Given all of the above, the warning message can be removed completely without any regression impact.