New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix for agent hang when process cannot be killed #4074
Fix for agent hang when process cannot be killed #4074
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@merlynomsft thank you so much for your work!
I artificially reproduced the issue to test these changes.
The agent is working properly and the issue is resolved.
Please note currently this behavior is opt-in. To do so, set the following environment variable on the agent: VSTSAGENT_CONTINUE_AFTER_CANCEL_PROCESSTREEKILL_ATTEMPT: true |
This PR fixes an issue where the agent will hang and fail to upload logs if the process executed cannot be killed during cancellation. The pipeline hangs until timeoutInMinutes + cancelTimeoutInMinutes
The log fails to upload and a blank log is shown in the log viewer UX:
Analysis
Killing process during cancellation is 'best effort' and may fail if, for some reason, the process cannot be killed. In that case, the agent waits forever in a loop for the process to exit. Fix is to exit the loop after attempting to kill the process.
Testing
This issue is not easy to reproduce, because it requires the agent to launch a process that cannot be killed
I simulated this condition by modifying ProcessInvoker to skip killing the process
**Knob / Enabling **
The behavior is disabled by default. This change is enabled by setting knob: VSTSAGENT_CONTINUE_AFTER_CANCEL_PROCESSTREEKILL_ATTEMPT
VSO 2009025