Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Self hosted agent - random task freeze, timing out #3468

Closed
roofiq opened this issue Jul 23, 2021 · 9 comments
Closed

Self hosted agent - random task freeze, timing out #3468

roofiq opened this issue Jul 23, 2021 · 9 comments

Comments

@roofiq
Copy link

roofiq commented Jul 23, 2021

Agent Version and Platform

Version of your agent? 2.188.4

OS of the machine running the agent? Windows

Azure DevOps Type and Version

dev.azure.com (formerly visualstudio.com)

What's not working?

We are running parallel jobs using our Agent Pool, that has multiple agents defined, that are on mostly 1 VM.
What is happening is that we experience randomly task freeze, which turns out into job timing out. This error happens even with simple powershell task like just Start-Sleep command.
I'm adding logs from worker and agent one.
It seems like it's failing when trying to process this kind of request:
[2021-07-22 12:29:52Z INFO JobServerQueue] Try to append 1 batches web console lines for record 'a55cc9bb-b139-5270-9f34-7219c10f9a19', success rate: 1/1.

Agent and Worker's Diagnostic Logs

Agent worker logs:
[2021-07-22 12:27:11Z INFO StepsRunner] Processing step: DisplayName='Remove temporary files', ContinueOnError=False, Enabled=True [2021-07-22 12:27:11Z INFO ExpressionManager] Evaluating: SucceededNode() [2021-07-22 12:27:11Z INFO ExpressionManager] Result: True [2021-07-22 12:27:11Z INFO StepsRunner] Starting the step. [2021-07-22 12:27:11Z INFO HostContext] Well known directory 'Bin': 'E:\Agent1\bin' [2021-07-22 12:27:11Z INFO HostContext] Well known directory 'Root': 'E:\Agent1' [2021-07-22 12:27:11Z INFO HostContext] Well known directory 'Work': 'E:\Agent1\_work' [2021-07-22 12:27:11Z INFO StepsRunner] Which: 'chcp' [2021-07-22 12:27:11Z INFO StepsRunner] Location: 'C:\windows\system32\chcp.COM' [2021-07-22 12:27:11Z INFO ProcessInvokerWrapper] Starting process: [2021-07-22 12:27:11Z INFO ProcessInvokerWrapper] File name: 'C:\windows\system32\chcp.COM' [2021-07-22 12:27:11Z INFO ProcessInvokerWrapper] Arguments: '65001' [2021-07-22 12:27:11Z INFO ProcessInvokerWrapper] Working directory: 'E:\Agent1\_work' [2021-07-22 12:27:11Z INFO ProcessInvokerWrapper] Require exit code zero: 'False' [2021-07-22 12:27:11Z INFO ProcessInvokerWrapper] Encoding web name: ; code page: '' [2021-07-22 12:27:11Z INFO ProcessInvokerWrapper] Force kill process on cancellation: 'False' [2021-07-22 12:27:11Z INFO ProcessInvokerWrapper] Redirected STDIN: 'False' [2021-07-22 12:27:11Z INFO ProcessInvokerWrapper] Persist current code page: 'True' [2021-07-22 12:27:11Z INFO ProcessInvokerWrapper] Keep redirected STDIN open: 'False' [2021-07-22 12:27:11Z INFO ProcessInvokerWrapper] High priority process: 'False' [2021-07-22 12:27:11Z INFO ProcessInvokerWrapper] OOM score adjustment is Linux-only. [2021-07-22 12:27:11Z INFO ProcessInvokerWrapper] Process started with process id 16304, waiting for process exit. [2021-07-22 12:27:12Z INFO ProcessInvokerWrapper] STDOUT/STDERR stream read finished. [2021-07-22 12:27:12Z INFO ProcessInvokerWrapper] STDOUT/STDERR stream read finished. [2021-07-22 12:27:12Z INFO ProcessInvokerWrapper] Exited process 16304 with exit code 0 [2021-07-22 12:27:12Z INFO ProcessInvokerWrapper] Finished process 16304 with exit code 0, and elapsed time 00:00:00.2168421. [2021-07-22 12:27:12Z INFO StepsRunner] Successfully returned to code page 65001 (UTF8) [2021-07-22 12:27:12Z INFO HostContext] Well known directory 'Bin': 'E:\Agent1\bin' [2021-07-22 12:27:12Z INFO HostContext] Well known directory 'Root': 'E:\Agent1' [2021-07-22 12:27:12Z INFO HostContext] Well known directory 'Work': 'E:\Agent1\_work' [2021-07-22 12:27:12Z INFO HostContext] Well known directory 'Tasks': 'E:\Agent1\_work\_tasks' [2021-07-22 12:27:12Z INFO HostContext] Well known directory 'Bin': 'E:\Agent1\bin' [2021-07-22 12:27:12Z INFO HostContext] Well known directory 'Root': 'E:\Agent1' [2021-07-22 12:27:12Z INFO HostContext] Well known directory 'Work': 'E:\Agent1\_work' [2021-07-22 12:27:12Z INFO HostContext] Well known directory 'TaskZips': 'E:\Agent1\_work\_taskzips' [2021-07-22 12:27:12Z INFO TaskManager] Loading task definition 'E:\Agent1\_work\_tasks\PowerShell_e213ff0f-5d5c-4791-802d-52ea3e7be1f1\2.186.0\task.json'. [2021-07-22 12:27:12Z INFO TaskRunner] Get handler data for target platform Windows [2021-07-22 12:27:12Z INFO TaskRunner] Handler data is of type Microsoft.VisualStudio.Services.Agent.Worker.PowerShell3HandlerData [2021-07-22 12:27:12Z INFO ExtensionManager] Getting extensions for interface: 'Microsoft.VisualStudio.Services.Agent.Worker.IJobExtension' [2021-07-22 12:27:12Z INFO ExtensionManager] Getting extensions for interface: 'Microsoft.VisualStudio.Services.Agent.Worker.Build.ISourceProvider' [2021-07-22 12:27:12Z INFO BuildJobExtension] The Default Path Root of Build JobExtension is repository.path: E:\Agent1\_work\700\s [2021-07-22 12:27:12Z INFO BuildJobExtension] After prefix Default Path Root provide by JobExtension: E:\Agent1\_work\700\s [2021-07-22 12:27:12Z INFO BuildJobExtension] Return absolute path after prefix DefaultPathRoot: E:\Agent1\_work\700\s [2021-07-22 12:27:12Z INFO TaskRunner] Build JobExtension resolved a rooted path:: E:\Agent1\_work\700\s [2021-07-22 12:27:12Z INFO ExtensionManager] Getting extensions for interface: 'Microsoft.VisualStudio.Services.Agent.Worker.IJobExtension' [2021-07-22 12:27:12Z INFO ExtensionManager] Getting extensions for interface: 'Microsoft.VisualStudio.Services.Agent.Worker.Build.ISourceProvider' [2021-07-22 12:27:12Z INFO BuildJobExtension] The Default Path Root of Build JobExtension is repository.path: E:\Agent1\_work\700\s [2021-07-22 12:27:12Z INFO BuildJobExtension] After prefix Default Path Root provide by JobExtension: E:\Agent1\_work\700\s [2021-07-22 12:27:12Z INFO BuildJobExtension] Return absolute path after prefix DefaultPathRoot: E:\Agent1\_work\700\s [2021-07-22 12:27:12Z INFO TaskRunner] Build JobExtension resolved a rooted path:: E:\Agent1\_work\700\s [2021-07-22 12:27:12Z INFO PowerShellExeUtil] Generation: '1' [2021-07-22 12:27:12Z INFO PowerShellExeUtil] Key name 'HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\PowerShell\1\PowerShellEngine', value name 'PowerShellVersion': '2.0' [2021-07-22 12:27:12Z INFO PowerShellExeUtil] Unsupported version. Skipping. [2021-07-22 12:27:12Z INFO PowerShellExeUtil] Generation: '3' [2021-07-22 12:27:12Z INFO PowerShellExeUtil] Key name 'HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\PowerShell\3\PowerShellEngine', value name 'PowerShellVersion': '5.1.17763.1' [2021-07-22 12:27:12Z INFO PowerShellExeUtil] Key name 'HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\PowerShell\3\PowerShellEngine', value name 'ApplicationBase': 'C:\Windows\System32\WindowsPowerShell\v1.0' [2021-07-22 12:27:12Z INFO ProcessInvokerWrapper] Starting process: [2021-07-22 12:27:12Z INFO ProcessInvokerWrapper] File name: 'C:\Windows\System32\WindowsPowerShell\v1.0\powershell.exe' [2021-07-22 12:27:12Z INFO ProcessInvokerWrapper] Arguments: '-NoLogo -Sta -NoProfile -NonInteractive -ExecutionPolicy Unrestricted -Command ". ([scriptblock]::Create('if ([Console]::InputEncoding -is [Text.UTF8Encoding] -and [Console]::InputEncoding.GetPreamble().Length -ne 0) { [Console]::InputEncoding = New-Object Text.UTF8Encoding $false } if (!$PSHOME) { $null = Get-Item -LiteralPath ''variable:PSHOME'' } else { Import-Module -Name ([System.IO.Path]::Combine($PSHOME, ''Modules\Microsoft.PowerShell.Management\Microsoft.PowerShell.Management.psd1'')) ; Import-Module -Name ([System.IO.Path]::Combine($PSHOME, ''Modules\Microsoft.PowerShell.Utility\Microsoft.PowerShell.Utility.psd1'')) }')) 2>&1 | ForEach-Object { Write-Verbose $_.Exception.Message -Verbose } ; Import-Module -Name 'E:\Agent1\_work\_tasks\PowerShell_e213ff0f-5d5c-4791-802d-52ea3e7be1f1\2.186.0\ps_modules\VstsTaskSdk\VstsTaskSdk.psd1' -ArgumentList @{ NonInteractive = $true } -ErrorAction Stop ; $VerbosePreference = 'Continue' ; $DebugPreference = 'Continue' ; Invoke-VstsTaskScript -ScriptBlock ([scriptblock]::Create('. ''E:\Agent1\_work\_tasks\PowerShell_e213ff0f-5d5c-4791-802d-52ea3e7be1f1\2.186.0\powershell.ps1'''))"' [2021-07-22 12:27:12Z INFO ProcessInvokerWrapper] Working directory: 'E:\Agent1\_work\_tasks\PowerShell_e213ff0f-5d5c-4791-802d-52ea3e7be1f1\2.186.0' [2021-07-22 12:27:12Z INFO ProcessInvokerWrapper] Require exit code zero: 'True' [2021-07-22 12:27:12Z INFO ProcessInvokerWrapper] Encoding web name: ; code page: '' [2021-07-22 12:27:12Z INFO ProcessInvokerWrapper] Force kill process on cancellation: 'False' [2021-07-22 12:27:12Z INFO ProcessInvokerWrapper] Redirected STDIN: 'False' [2021-07-22 12:27:12Z INFO ProcessInvokerWrapper] Persist current code page: 'True' [2021-07-22 12:27:12Z INFO ProcessInvokerWrapper] Keep redirected STDIN open: 'False' [2021-07-22 12:27:12Z INFO ProcessInvokerWrapper] High priority process: 'False' [2021-07-22 12:27:12Z INFO ProcessInvokerWrapper] OOM score adjustment is Linux-only. [2021-07-22 12:27:12Z INFO ProcessInvokerWrapper] Process started with process id 18484, waiting for process exit. [2021-07-22 12:27:12Z INFO JobServerQueue] Try to upload 1 log files or attachments, success rate: 1/1. [2021-07-22 12:27:17Z INFO JobServerQueue] Try to append 6 batches web console lines for record 'f08058a6-8ac2-5eb4-d632-928460b2e6de', success rate: 6/6. [2021-07-22 12:27:28Z INFO JobServerQueue] Try to append 6 batches web console lines for record 'f08058a6-8ac2-5eb4-d632-928460b2e6de', success rate: 6/6. [2021-07-22 12:27:39Z INFO JobServerQueue] Try to append 6 batches web console lines for record 'f08058a6-8ac2-5eb4-d632-928460b2e6de', success rate: 6/6. [2021-07-22 12:27:50Z INFO JobServerQueue] Try to append 6 batches web console lines for record 'f08058a6-8ac2-5eb4-d632-928460b2e6de', success rate: 6/6. [2021-07-22 12:28:01Z INFO JobServerQueue] Try to append 6 batches web console lines for record 'f08058a6-8ac2-5eb4-d632-928460b2e6de', success rate: 6/6. [2021-07-22 12:28:12Z INFO JobServerQueue] Try to append 6 batches web console lines for record 'f08058a6-8ac2-5eb4-d632-928460b2e6de', success rate: 6/6. [2021-07-22 12:28:23Z INFO JobServerQueue] Try to append 6 batches web console lines for record 'f08058a6-8ac2-5eb4-d632-928460b2e6de', success rate: 6/6. [2021-07-22 12:28:34Z INFO JobServerQueue] Try to append 6 batches web console lines for record 'f08058a6-8ac2-5eb4-d632-928460b2e6de', success rate: 6/6. [2021-07-22 12:28:45Z INFO JobServerQueue] Try to append 6 batches web console lines for record 'f08058a6-8ac2-5eb4-d632-928460b2e6de', success rate: 6/6. [2021-07-22 12:28:57Z INFO JobServerQueue] Try to append 6 batches web console lines for record 'f08058a6-8ac2-5eb4-d632-928460b2e6de', success rate: 6/6. [2021-07-22 12:29:08Z INFO JobServerQueue] Try to append 6 batches web console lines for record 'f08058a6-8ac2-5eb4-d632-928460b2e6de', success rate: 6/6. [2021-07-22 12:29:19Z INFO JobServerQueue] Try to append 6 batches web console lines for record 'f08058a6-8ac2-5eb4-d632-928460b2e6de', success rate: 6/6. [2021-07-22 12:29:30Z INFO JobServerQueue] Try to append 6 batches web console lines for record 'f08058a6-8ac2-5eb4-d632-928460b2e6de', success rate: 6/6. [2021-07-22 12:29:41Z INFO JobServerQueue] Try to append 6 batches web console lines for record 'f08058a6-8ac2-5eb4-d632-928460b2e6de', success rate: 6/6. [2021-07-22 12:29:52Z INFO JobServerQueue] Try to append 3 batches web console lines for record 'f08058a6-8ac2-5eb4-d632-928460b2e6de', success rate: 3/3. [2021-07-22 12:29:52Z INFO JobServerQueue] Try to append 1 batches web console lines for record '751088d8-a804-5db8-3c22-86bf4a110490', success rate: 1/1. [2021-07-22 12:29:52Z INFO JobServerQueue] Try to append 1 batches web console lines for record 'c3f5113a-d233-5370-538a-233bbdf3b8bb', success rate: 1/1. [2021-07-22 12:29:52Z INFO JobServerQueue] Try to append 1 batches web console lines for record 'a55cc9bb-b139-5270-9f34-7219c10f9a19', success rate: 1/1. [2021-07-22 13:25:46Z INFO Worker] Cancellation/Shutdown message received. [2021-07-22 13:25:46Z INFO ExpressionManager] Evaluating: SucceededNode() [2021-07-22 13:25:46Z INFO ExpressionManager] Result: False [2021-07-22 13:25:46Z INFO StepsRunner] Cancel current running step. [2021-07-22 13:25:46Z INFO ProcessInvokerWrapper] Sending CTRL_C to process 18484. [2021-07-22 13:25:46Z INFO ProcessInvokerWrapper] Successfully sent CTRL_C to process 18484. [2021-07-22 13:25:46Z INFO ProcessInvokerWrapper] Waiting for process exit or 7.5 seconds after CTRL_C signal fired. [2021-07-22 13:25:46Z INFO ProcessInvokerWrapper] Ignore Ctrl+C to current process. [2021-07-22 13:25:50Z INFO JobServerQueue] Try to append 1 batches web console lines for record 'a55cc9bb-b139-5270-9f34-7219c10f9a19', success rate: 1/1. [2021-07-22 13:25:54Z INFO ProcessInvokerWrapper] Process did not honor CTRL_C signal within 7.5 seconds. [2021-07-22 13:25:54Z INFO ProcessInvokerWrapper] Sending CTRL_BREAK to process 18484. [2021-07-22 13:25:54Z INFO ProcessInvokerWrapper] Successfully sent CTRL_BREAK to process 18484. [2021-07-22 13:25:54Z INFO ProcessInvokerWrapper] Waiting for process exit or 2.5 seconds after CTRL_BREAK signal fired. [2021-07-22 13:25:54Z INFO ProcessInvokerWrapper] Ignore Ctrl+Break to current process. [2021-07-22 13:25:56Z INFO ProcessInvokerWrapper] Process did not honor CTRL_BREAK signal within 2.5 seconds. [2021-07-22 13:25:56Z INFO ProcessInvokerWrapper] Kill entire process tree since both cancel and terminate signal has been ignored by the target process. [2021-07-22 13:25:56Z INFO ProcessInvokerWrapper] Scan all processes to find relationship between all processes. [2021-07-22 13:25:56Z INFO ProcessInvokerWrapper] Find all child processes of process '18484'. [2021-07-22 13:25:56Z INFO ProcessInvokerWrapper] Kill process '18484'. [2021-07-22 13:25:56Z INFO ProcessInvokerWrapper] STDOUT/STDERR stream read finished. [2021-07-22 13:25:56Z INFO ProcessInvokerWrapper] STDOUT/STDERR stream read finished. [2021-07-22 13:25:56Z INFO ProcessInvokerWrapper] Exited process 18484 with exit code -1 [2021-07-22 13:25:56Z INFO ProcessInvokerWrapper] Finished process 18484 with exit code -1, and elapsed time 00:58:44.7455484. [2021-07-22 13:25:56Z ERR StepsRunner] Caught cancellation exception from step: System.OperationCanceledException: The operation was canceled. at System.Threading.CancellationToken.ThrowOperationCanceledException() at Microsoft.VisualStudio.Services.Agent.Util.ProcessInvoker.ExecuteAsync(String workingDirectory, String fileName, String arguments, IDictionary2 environment, Boolean requireExitCodeZero, Encoding outputEncoding, Boolean killProcessOnCancel, InputQueue1 redirectStandardIn, Boolean inheritConsoleHandler, Boolean keepStandardInOpen, Boolean highPriorityProcess, CancellationToken cancellationToken) at Microsoft.VisualStudio.Services.Agent.ProcessInvokerWrapper.ExecuteAsync(String workingDirectory, String fileName, String arguments, IDictionary2 environment, Boolean requireExitCodeZero, Encoding outputEncoding, Boolean killProcessOnCancel, InputQueue1 redirectStandardIn, Boolean inheritConsoleHandler, Boolean keepStandardInOpen, Boolean highPriorityProcess, CancellationToken cancellationToken) at Microsoft.VisualStudio.Services.Agent.Worker.Handlers.DefaultStepHost.ExecuteAsync(String workingDirectory, String fileName, String arguments, IDictionary2 environment, Boolean requireExitCodeZero, Encoding outputEncoding, Boolean killProcessOnCancel, Boolean inheritConsoleHandler, CancellationToken cancellationToken)
at Microsoft.VisualStudio.Services.Agent.Worker.Handlers.PowerShell3Handler.RunAsync()
at Microsoft.VisualStudio.Services.Agent.Worker.TaskRunner.RunAsync()
at Microsoft.VisualStudio.Services.Agent.Worker.StepsRunner.RunStepAsync(IStep step, CancellationToken jobCancellationToken)
[2021-07-22 13:25:56Z INFO StepsRunner] Step result: Canceled
[2021-07-22 13:25:56Z INFO StepsRunner] No need for updating job result with current step result 'Canceled'.
[2021-07-22 13:25:56Z INFO StepsRunner] Current state: job state = 'Canceled'
[2021-07-22 13:25:56Z INFO StepsRunner] Processing step: DisplayName='ExtractFiles', ContinueOnError=False, Enabled=True
[2021-07-22 13:25:56Z INFO ExpressionManager] Evaluating: SucceededNode()
[2021-07-22 13:25:56Z INFO ExpressionManager] Result: False`

Agent logs from machine:
[2021-07-22 05:46:10Z INFO RSAEncryptedFileKeyManager] Loading RSA key parameters from file C:\Agent-1\.credentials_rsaparams [2021-07-22 05:46:11Z INFO VisualStudioServices] AAD Correlation ID for this token request: Unknown [2021-07-22 05:46:29Z INFO JobDispatcher] Successfully renew job request 827359, job is valid till 7/22/2021 5:56:29 AM [2021-07-22 05:47:30Z INFO JobDispatcher] Successfully renew job request 827359, job is valid till 7/22/2021 5:57:30 AM [2021-07-22 05:48:30Z INFO JobDispatcher] Successfully renew job request 827359, job is valid till 7/22/2021 5:58:30 AM [2021-07-22 05:49:30Z INFO JobDispatcher] Successfully renew job request 827359, job is valid till 7/22/2021 5:59:30 AM [2021-07-22 05:50:30Z INFO JobDispatcher] Successfully renew job request 827359, job is valid till 7/22/2021 6:00:30 AM [2021-07-22 05:51:30Z INFO JobDispatcher] Successfully renew job request 827359, job is valid till 7/22/2021 6:01:30 AM [2021-07-22 05:52:30Z INFO JobDispatcher] Successfully renew job request 827359, job is valid till 7/22/2021 6:02:30 AM [2021-07-22 05:53:30Z INFO JobDispatcher] Successfully renew job request 827359, job is valid till 7/22/2021 6:03:30 AM [2021-07-22 05:54:30Z INFO JobDispatcher] Successfully renew job request 827359, job is valid till 7/22/2021 6:04:30 AM [2021-07-22 05:55:30Z INFO JobDispatcher] Successfully renew job request 827359, job is valid till 7/22/2021 6:05:30 AM [2021-07-22 05:56:30Z INFO JobDispatcher] Successfully renew job request 827359, job is valid till 7/22/2021 6:06:30 AM [2021-07-22 05:57:30Z INFO JobDispatcher] Successfully renew job request 827359, job is valid till 7/22/2021 6:07:30 AM [2021-07-22 05:58:30Z INFO JobDispatcher] Successfully renew job request 827359, job is valid till 7/22/2021 6:08:30 AM [2021-07-22 05:59:30Z INFO JobDispatcher] Successfully renew job request 827359, job is valid till 7/22/2021 6:09:30 AM [2021-07-22 05:59:31Z INFO MessageListener] No message retrieved from session '6fc88fb9-783c-4266-b908-cc760af17052' within last 30 minutes. [2021-07-22 06:00:31Z INFO JobDispatcher] Successfully renew job request 827359, job is valid till 7/22/2021 6:10:30 AM [2021-07-22 06:01:31Z INFO JobDispatcher] Successfully renew job request 827359, job is valid till 7/22/2021 6:11:31 AM [2021-07-22 06:02:31Z INFO JobDispatcher] Successfully renew job request 827359, job is valid till 7/22/2021 6:12:31 AM [2021-07-22 06:03:31Z INFO JobDispatcher] Successfully renew job request 827359, job is valid till 7/22/2021 6:13:31 AM [2021-07-22 06:04:31Z INFO JobDispatcher] Successfully renew job request 827359, job is valid till 7/22/2021 6:14:31 AM [2021-07-22 06:05:31Z INFO JobDispatcher] Successfully renew job request 827359, job is valid till 7/22/2021 6:15:31 AM [2021-07-22 06:06:31Z INFO JobDispatcher] Successfully renew job request 827359, job is valid till 7/22/2021 6:16:31 AM [2021-07-22 06:07:31Z INFO JobDispatcher] Successfully renew job request 827359, job is valid till 7/22/2021 6:17:31 AM [2021-07-22 06:08:31Z INFO JobDispatcher] Successfully renew job request 827359, job is valid till 7/22/2021 6:18:31 AM [2021-07-22 06:09:31Z INFO JobDispatcher] Successfully renew job request 827359, job is valid till 7/22/2021 6:19:31 AM [2021-07-22 06:10:31Z INFO JobDispatcher] Successfully renew job request 827359, job is valid till 7/22/2021 6:20:31 AM [2021-07-22 06:11:31Z INFO JobDispatcher] Successfully renew job request 827359, job is valid till 7/22/2021 6:21:31 AM [2021-07-22 06:12:31Z INFO JobDispatcher] Successfully renew job request 827359, job is valid till 7/22/2021 6:22:31 AM [2021-07-22 06:13:31Z INFO JobDispatcher] Successfully renew job request 827359, job is valid till 7/22/2021 6:23:31 AM [2021-07-22 06:14:31Z INFO JobDispatcher] Successfully renew job request 827359, job is valid till 7/22/2021 6:24:31 AM [2021-07-22 06:15:32Z INFO JobDispatcher] Successfully renew job request 827359, job is valid till 7/22/2021 6:25:31 AM [2021-07-22 06:16:32Z INFO JobDispatcher] Successfully renew job request 827359, job is valid till 7/22/2021 6:26:32 AM [2021-07-22 06:17:32Z INFO JobDispatcher] Successfully renew job request 827359, job is valid till 7/22/2021 6:27:32 AM [2021-07-22 06:18:32Z INFO JobDispatcher] Successfully renew job request 827359, job is valid till 7/22/2021 6:28:32 AM [2021-07-22 06:19:32Z INFO JobDispatcher] Successfully renew job request 827359, job is valid till 7/22/2021 6:29:32 AM [2021-07-22 06:20:32Z INFO JobDispatcher] Successfully renew job request 827359, job is valid till 7/22/2021 6:30:32 AM [2021-07-22 06:21:32Z INFO JobDispatcher] Successfully renew job request 827359, job is valid till 7/22/2021 6:31:32 AM [2021-07-22 06:22:32Z INFO JobDispatcher] Successfully renew job request 827359, job is valid till 7/22/2021 6:32:32 AM [2021-07-22 06:23:32Z INFO JobDispatcher] Successfully renew job request 827359, job is valid till 7/22/2021 6:33:32 AM [2021-07-22 06:24:32Z INFO JobDispatcher] Successfully renew job request 827359, job is valid till 7/22/2021 6:34:32 AM [2021-07-22 06:25:33Z INFO JobDispatcher] Successfully renew job request 827359, job is valid till 7/22/2021 6:35:32 AM [2021-07-22 06:26:33Z INFO JobDispatcher] Successfully renew job request 827359, job is valid till 7/22/2021 6:36:33 AM [2021-07-22 06:27:33Z INFO JobDispatcher] Successfully renew job request 827359, job is valid till 7/22/2021 6:37:33 AM [2021-07-22 06:28:33Z INFO JobDispatcher] Successfully renew job request 827359, job is valid till 7/22/2021 6:38:33 AM [2021-07-22 06:29:29Z INFO RSAEncryptedFileKeyManager] Loading RSA key parameters from file C:\Agent-1\.credentials_rsaparams [2021-07-22 06:29:29Z INFO MessageListener] Message '35785' received from session '6fc88fb9-783c-4266-b908-cc760af17052'. [2021-07-22 06:29:29Z INFO JobDispatcher] Job cancellation request 1be9e02d-deaa-5c44-fbd8-851205c1ddc4 received, cancellation timeout 5 minutes. [2021-07-22 06:29:29Z INFO JobDispatcher] Send job cancellation message to worker for job 1be9e02d-deaa-5c44-fbd8-851205c1ddc4. [2021-07-22 06:29:33Z INFO JobDispatcher] Successfully renew job request 827359, job is valid till 7/22/2021 6:39:33 AM [2021-07-22 06:30:02Z INFO ProcessInvokerWrapper] STDOUT/STDERR stream read finished. [2021-07-22 06:30:02Z INFO ProcessInvokerWrapper] STDOUT/STDERR stream read finished. [2021-07-22 06:30:02Z INFO ProcessInvokerWrapper] Exited process 13836 with exit code 103 [2021-07-22 06:30:02Z INFO ProcessInvokerWrapper] Finished process 13836 with exit code 103, and elapsed time 01:00:33.2781299. [2021-07-22 06:30:02Z INFO JobDispatcher] finish job request for job 1be9e02d-deaa-5c44-fbd8-851205c1ddc4 with result: Canceled

Let me know if I should provide more info.

Thanks,
Rafał

@roofiq roofiq changed the title Self hosted agent - random task freeze, timing out (103 exit code) Self hosted agent - random task freeze, timing out Jul 23, 2021
@ghost ghost added bug question and removed triage labels Jul 23, 2021
@roofiq
Copy link
Author

roofiq commented Jul 27, 2021

Hello, bumping it, can someone take a look?
Below another log snippet, there's an 1 hour break between logging and agent terminated job.

[2021-07-26 07:59:17Z INFO JobServerQueue] Try to append 1 batches web console lines for record '99166eec-5a6e-5280-8b3e-2799acccd397', success rate: 1/1.
[2021-07-26 07:59:24Z INFO JobServerQueue] Stop aggressive process web console line queue.
[2021-07-26 08:59:06Z INFO Worker] Cancellation/Shutdown message received.
[2021-07-26 08:59:06Z INFO ExpressionManager] Evaluating: SucceededNode()
[2021-07-26 08:59:06Z INFO ExpressionManager] Result: False

And logs for the same build and same task that were successful:

[2021-07-26 07:38:29Z INFO JobServerQueue] Try to append 1 batches web console lines for record 'dbad12a4-c45f-52d2-a0af-7f2f0da32396', success rate: 1/1.
[2021-07-26 07:38:30Z INFO JobServerQueue] Stop aggressive process web console line queue.
[2021-07-26 07:38:33Z INFO HttpTrace] Trace System.Net.Http.HttpRequestOut.Start event:

What should we analyze to troubleshoot it? RAM usage, background processes?

@jok0
Copy link

jok0 commented Sep 23, 2021

Hello, bumping it, can someone take a look?
Below another log snippet, there's an 1 hour break between logging and agent terminated job.

[2021-07-26 07:59:17Z INFO JobServerQueue] Try to append 1 batches web console lines for record '99166eec-5a6e-5280-8b3e-2799acccd397', success rate: 1/1.
[2021-07-26 07:59:24Z INFO JobServerQueue] Stop aggressive process web console line queue.
[2021-07-26 08:59:06Z INFO Worker] Cancellation/Shutdown message received.
[2021-07-26 08:59:06Z INFO ExpressionManager] Evaluating: SucceededNode()
[2021-07-26 08:59:06Z INFO ExpressionManager] Result: False

And logs for the same build and same task that were successful:

[2021-07-26 07:38:29Z INFO JobServerQueue] Try to append 1 batches web console lines for record 'dbad12a4-c45f-52d2-a0af-7f2f0da32396', success rate: 1/1.
[2021-07-26 07:38:30Z INFO JobServerQueue] Stop aggressive process web console line queue.
[2021-07-26 07:38:33Z INFO HttpTrace] Trace System.Net.Http.HttpRequestOut.Start event:

What should we analyze to troubleshoot it? RAM usage, background processes?

I am seeing this issue too, but more the 'try to append 1 batches web' message will have hours delay between. Our 1 hour job can sometimes hang for 12hours!

Did you ever get to the bottom of this?

@stephajn
Copy link

Here's a bump for this issue as well with the newest version of the agent running on an Azure Windows Server 2022 Datacenter edition OS.

The agent intermittently hangs on the simplest of tasks like doing something with the IIS management task, or running a simple powershell script that should take one or two seconds but instead goes and goes and goes for ages before I cancel a release and then hit redeploy and THEN it succeeds.

This is only happening with agents running version 2.198.3. Other agents that are running an older version of the software are experiencing no problems at all.

@anatolybolshakov
Copy link
Contributor

Hi @stephajn could you please share agent logs to investigate for problematic runs? You can send them to my email - v-anbols@microsoft.com

@toddwalstad-eaton
Copy link

We ran into this same issue with a new deployment of ADO agents last month. It turned out to be our Virus scanner (Trend) causing the issues. I created a simple script of 100 powershell commands and it would lock up about every other run. As soon as we disabled the virus scanner, we could run it 20 times in a row without an issue. I have also seen issues with deployments while using Norton.

We ended up getting to use a different vendor for our AV solution (Crowdstrike) and haven't had any issues since.

@github-actions
Copy link

github-actions bot commented Sep 6, 2022

This issue has had no activity in 180 days. Please comment if it is not actually stale

@LeaCCC
Copy link

LeaCCC commented May 7, 2024

Has any fix been implemented to this issue. I'm having the same issue with self hosted agent version 3.225.0.
The task got stuck at "Try to append 1 batches web console lines for record"
I set the pipeline time out to be 5 minutes and that cancelled the process afterwards.

[2024-05-07 01:44:36Z INFO ProcessInvokerWrapper] ContinueAfterCancelProcessTreeKillAttempt: 'True'
[2024-05-07 01:44:37Z INFO ProcessInvokerWrapper] OOM score adjustment is Linux-only.
[2024-05-07 01:44:37Z INFO ProcessInvokerWrapper] Process started with process id 10500, waiting for process exit.
[2024-05-07 01:44:37Z INFO JobServerQueue] Try to upload 1 log files or attachments, success rate: 1/1.
[2024-05-07 01:44:46Z INFO JobServerQueue] Try to append 1 batches web console lines for record '771d55ec-34a7-5ddd-398e-3e8e57b4f3f0', success rate: 1/1.
[2024-05-07 01:44:46Z INFO JobServerQueue] Try to append 1 batches web console lines for record '6bc11752-9934-5965-4fc2-cd3b5ece4252', success rate: 1/1.
[2024-05-07 01:44:56Z INFO JobServerQueue] Try to append 1 batches web console lines for record '6bc11752-9934-5965-4fc2-cd3b5ece4252', success rate: 1/1.
[2024-05-07 01:48:57Z INFO Worker] Cancellation/Shutdown message received.
[2024-05-07 01:48:57Z INFO ExpressionManager] Evaluating: succeeded()
[2024-05-07 01:48:57Z INFO ExpressionManager] Result: False
[2024-05-07 01:48:57Z INFO StepsRunner] Cancel current running step.
[2024-05-07 01:48:57Z INFO ProcessInvokerWrapper] Sending CTRL_C to process 10500.

@Gt180Full
Copy link

Has any fix been implemented to this issue. I'm having the same issue with self hosted agent version 3.225.0. The task got stuck at "Try to append 1 batches web console lines for record" I set the pipeline time out to be 5 minutes and that cancelled the process afterwards.

[2024-05-07 01:44:36Z INFO ProcessInvokerWrapper] ContinueAfterCancelProcessTreeKillAttempt: 'True' [2024-05-07 01:44:37Z INFO ProcessInvokerWrapper] OOM score adjustment is Linux-only. [2024-05-07 01:44:37Z INFO ProcessInvokerWrapper] Process started with process id 10500, waiting for process exit. [2024-05-07 01:44:37Z INFO JobServerQueue] Try to upload 1 log files or attachments, success rate: 1/1. [2024-05-07 01:44:46Z INFO JobServerQueue] Try to append 1 batches web console lines for record '771d55ec-34a7-5ddd-398e-3e8e57b4f3f0', success rate: 1/1. [2024-05-07 01:44:46Z INFO JobServerQueue] Try to append 1 batches web console lines for record '6bc11752-9934-5965-4fc2-cd3b5ece4252', success rate: 1/1. [2024-05-07 01:44:56Z INFO JobServerQueue] Try to append 1 batches web console lines for record '6bc11752-9934-5965-4fc2-cd3b5ece4252', success rate: 1/1. [2024-05-07 01:48:57Z INFO Worker] Cancellation/Shutdown message received. [2024-05-07 01:48:57Z INFO ExpressionManager] Evaluating: succeeded() [2024-05-07 01:48:57Z INFO ExpressionManager] Result: False [2024-05-07 01:48:57Z INFO StepsRunner] Cancel current running step. [2024-05-07 01:48:57Z INFO ProcessInvokerWrapper] Sending CTRL_C to process 10500.

I was with the same problem, I downgraded the version of terraform to version 1.7.5 and returned to work with normal time.

@LeaCCC
Copy link

LeaCCC commented May 23, 2024

@Gt180Full thank you for your comment. We are not using Terraform and still have this issue. Since #3468 has been closed, I just logged a new bug here:
#4812

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

7 participants