
Rerun old stage failing without providing console output #6376

Closed
arvindsv opened this issue Jun 3, 2019 · 8 comments · Fixed by #6389


@arvindsv (Member) commented on Jun 3, 2019

[Raising a new issue based on a comment in #1996]

We are experiencing a similar issue -- we are trying to go back in our pipeline history and manually invoke the second stage of a previous pipeline run. We see in the go-server.log file that the stage is being assigned. However, after that the job is marked as failed. There is no console output in the job detail tab, nor can we find any corresponding error messages or stack traces in the go-server or go-agent log files.

This is what we see in our log file:

go-server.log:2019-05-31 16:15:13,113 INFO  [qtp1166807841-17] ScheduleService:230 - [Stage Schedule] Scheduling stage publish_x for pipeline myapp_release
go-server.log:2019-05-31 16:15:20,068 INFO  [101@MessageListener for WorkFinder] BuildAssignmentService:191 - [Agent Assignment] Assigned job [JobIdentifier[myapp_release, 84, 2.3.5.84, publish_x, 9, aws, 50810]] to agent [Agent [ip-10-100-14-193.mydomain, 10.100.14.193, 194988d9-53e3-4f57-9a21-e38c53fdbc9d]]

The other thing to note is that even though we see in the go-server.log file that the stage is being assigned, on the detail page for the failed job we see "AGENT: Not yet assigned (Unknown)".
There are two details that might be playing a role here and are worth sharing:

  • We have recently changed the underlying template for the pipeline, so the template that was active when the first stage of the original pipeline ran is no longer active. However, the names of the stages and jobs within the pipeline remained the same.

  • For one of the materials in the pipeline, the current definition points to a different git branch than it did at the time of the pipeline run whose second stage we are trying to rerun (illustrated below).
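
For illustration only (the URL and branch names here are hypothetical), the kind of drift described in the second point looks like this in the pipeline's material config; the historical run was created against the first definition, while the current config carries the second:

   <!-- Material definition at the time of the original pipeline run (hypothetical) -->
   <materials>
     <git url="https://example.com/myapp.git" branch="release-old" />
   </materials>

   <!-- Current material definition: same material, different branch (hypothetical) -->
   <materials>
     <git url="https://example.com/myapp.git" branch="release-new" />
   </materials>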

In our case, these are our environment details:

Basic environment details
  • Go Version: 19.4.0
  • JAVA Version: 1.8.0_101
  • OS: Linux 3.10.0-327.28.2.el7.x86_64

(We had noticed this problem on GoCD 19.3.0)

We're not sure how to proceed with further troubleshooting. Any help would be greatly appreciated.

Originally posted by @dlethin in #1996 (comment)

@arvindsv (Member, Author) commented on Jun 3, 2019

I suppose it's possible that something went wrong because the material was changed. We will need to reproduce it and see. Unfortunately, there is no guarantee of a rerun if the config changes too much, but I don't like that it showed no indication anywhere. It should be clear why something did not run, at least in the server logs.

@arvindsv (Member, Author) commented on Jun 3, 2019

/cc @dlethin - so that they are subscribed to this issue.

@ankitsri11 (Contributor) commented on Jun 3, 2019

@arvindsv @dlethin - I was able to replicate the issue after changing the branch name and re-running the old stage. The stage failed with no console output.

However, when I changed the branch back to the original (master in my case) and re-ran an older instance, it ran without any issue. Can you confirm whether that is also the case for you?

Here are the steps I followed:

  1. Created a pipeline with the configuration below (default master branch):
<pipeline name="DeployPipeline">
  <materials>
    <git url="https://github.com/some-repo" />
  </materials>
  <stage name="DeployStage">
    <jobs>
      <job name="Job">
        <tasks>
          <exec command="ls">
            <runif status="passed" />
          </exec>
        </tasks>
      </job>
    </jobs>
  </stage>
</pipeline>
  2. Ran it a few times.
  3. Changed the branch to the "feature" branch:
      <materials>
        <git url="https://github.com/some-repo" branch="feature" />
      </materials>
  4. Re-ran the old instance of the pipeline, and it failed without console output.

[Screenshot 2019-06-03 at 15:19:08: the re-run job failing with no console output]

Enabled debug logging by adding the following to logback-include.xml:

<?xml version="1.0" encoding="UTF-8"?>
<included>
  <logger name="com.thoughtworks.go" level="DEBUG" additivity="true">
    <appender-ref ref="${gocd.server.logback.root.appender:-FileAppender}" />
  </logger>
</included>
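
(For anyone following along: on Linux package installs this include file typically lives in the server's config directory, e.g. /etc/go/logback-include.xml; the exact location depends on how the server was installed.)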

Attaching the complete logs capturing the re-run activity:
tmp.txt

@arvindsv (Member, Author) commented on Jun 4, 2019

When a material is changed, the server might not be able to recreate the previous state of the materials (and it knows it). Maybe. However, the console logs need to provide more information. I'm marking this as a bug we need to look into, to fix the console logs, if nothing else.

Thank you for bringing it up, @dlethin.

And thank you, @ankitsri11, for reproducing it and all the extra information.

@arvindsv added this to the NextUp milestone on Jun 4, 2019

@arvindsv (Member, Author) commented on Jun 4, 2019

Also, there is an error in the logs related to the cache, and I think it makes sense to investigate it.

@dlethin commented on Jun 4, 2019

Thanks for chasing this down so quickly, @arvindsv. I'm assuming I should be able to get past this by temporarily changing the current pipeline material's branch definition, as you say (sketched below); I just need to bubble this up my stack of priorities to try this again. I will let you know if I run into problems. I agree that, at the very least, some error message would be helpful. That said, I would think it should be able to proceed using the git hash of the material from the prior upstream stage run, though admittedly I don't know the internal workings well enough to comment further.
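
A minimal sketch of that workaround, using the material config from the reproduction above (the repository URL is a placeholder): temporarily point the material back at the branch the historical run was created against, rerun the stage, then restore the current definition.

   <!-- Current definition: points at the new branch -->
   <materials>
     <git url="https://github.com/some-repo" branch="feature" />
   </materials>

   <!-- Temporary revert for the rerun: the branch the old run used (default is master) -->
   <materials>
     <git url="https://github.com/some-repo" />
   </materials>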

cheers.

@arvindsv (Member, Author) commented on Jun 4, 2019

I will let you know if I run into problems.

👍

I agree that, at the very least, some error message would be helpful. That said, I would think it should be able to proceed using the git hash of the material from the prior upstream stage run, though admittedly I don't know the internal workings well enough to comment further.

Yes, someone will need to understand the flow and see what can be done, once the bug that is hiding the error is fixed. I agree that, in an ideal world, it should try to rerun it, even if things have changed a bit.

@arvindsv (Member, Author) commented on Jun 21, 2019

#6389 is related to this.

@maheshp added this to In progress in 19.8.0 on Jul 25, 2019

@maheshp moved this from In progress to Done in 19.8.0 on Sep 5, 2019

@rajiesh moved this from Done to QA Done in 19.8.0 on Sep 5, 2019
