[BUG] Windows Password never available on Jenkins after recent PRs #297

Closed
peterzhuamazon opened this issue May 29, 2023 · 17 comments
Labels
bug Something isn't working packer v2.8.0 windows

Comments

@peterzhuamazon
Member

After #289, #291, #292, #293, #294, #295, and #296, the Windows agents are stuck: they cannot finish booting and the password never becomes available.

EC2 (Amazon_ec2_cloud) - jenkinsAgentNode-Jenkins-Agent-Windows2019-X64-C524xlarge-Single-Host
Waiting for password to be available. Sleeping 10s.
Waiting for password to be available. Sleeping 10s.
Waiting for password to be available. Sleeping 10s.
Waiting for password to be available. Sleeping 10s.
Waiting for password to be available. Sleeping 10s.
Waiting for password to be available. Sleeping 10s.
Waiting for password to be available. Sleeping 10s.
Waiting for password to be available. Sleeping 10s.
Waiting for password to be available. Sleeping 10s.
Waiting for password to be available. Sleeping 10s.
Waiting for password to be available. Sleeping 10s.
Waiting for password to be available. Sleeping 10s.
Waiting for password to be available. Sleeping 10s.
Waiting for password to be available. Sleeping 10s.
Waiting for password to be available. Sleeping 10s.
Waiting for password to be available. Sleeping 10s.
Waiting for password to be available. Sleeping 10s.
Waiting for password to be available. Sleeping 10s.
Waiting for password to be available. Sleeping 10s.
Waiting for password to be available. Sleeping 10s.
Waiting for password to be available. Sleeping 10s.
Waiting for password to be available. Sleeping 10s.
Waiting for password to be available. Sleeping 10s.
Waiting for password to be available. Sleeping 10s.
Waiting for password to be available. Sleeping 10s.
Waiting for password to be available. Sleeping 10s.
Waiting for password to be available. Sleeping 10s.
Waiting for password to be available. Sleeping 10s.
Waiting for password to be available. Sleeping 10s.
Waiting for password to be available. Sleeping 10s.
Waiting for password to be available. Sleeping 10s.
Waiting for password to be available. Sleeping 10s.
Waiting for password to be available. Sleeping 10s.
Waiting for password to be available. Sleeping 10s.
Waiting for password to be available. Sleeping 10s.
Waiting for password to be available. Sleeping 10s.
Waiting for password to be available. Sleeping 10s.
Waiting for password to be available. Sleeping 10s.
Waiting for password to be available. Sleeping 10s.
Waiting for password to be available. Sleeping 10s.
Waiting for password to be available. Sleeping 10s.
Waiting for password to be available. Sleeping 10s.
Waiting for password to be available. Sleeping 10s.
Waiting for password to be available. Sleeping 10s.
Waiting for password to be available. Sleeping 10s.
Waiting for password to be available. Sleeping 10s.
Waiting for password to be available. Sleeping 10s.
Waiting for password to be available. Sleeping 10s.
Waiting for password to be available. Sleeping 10s.
Waiting for password to be available. Sleeping 10s.
Waiting for password to be available. Sleeping 10s.
Waiting for password to be available. Sleeping 10s.
Waiting for password to be available. Sleeping 10s.
Waiting for password to be available. Sleeping 10s.
Waiting for password to be available. Sleeping 10s.
Waiting for password to be available. Sleeping 10s.
Waiting for password to be available. Sleeping 10s.
Waiting for password to be available. Sleeping 10s.
Waiting for password to be available. Sleeping 10s.
Waiting for password to be available. Sleeping 10s.
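
For reading launch logs like the one above, here is a minimal Python sketch that adds up the sleep intervals per wait reason. The function name `summarize_waits` and the regular expression are illustrative assumptions based only on the two wait messages quoted in this thread; they are not part of Jenkins or the EC2 plugin. The total is a lower bound on wall-clock time, since it ignores the time spent on each check itself.

```python
import re

# Pattern derived from the wait lines quoted in this issue (an assumption,
# not an official log format): "Waiting for <what>. Sleeping <n>s."
WAIT_LINE = re.compile(r"Waiting for (?P<what>.+?)\. Sleeping (?P<secs>\d+)s\.")


def summarize_waits(log_text):
    """Return {wait reason: total seconds slept} for all wait lines in log_text."""
    totals = {}
    for line in log_text.splitlines():
        match = WAIT_LINE.search(line)
        if match:
            what = match.group("what")
            totals[what] = totals.get(what, 0) + int(match.group("secs"))
    return totals


if __name__ == "__main__":
    # The log above contains 60 identical password wait lines.
    sample = "\n".join(
        ["Waiting for password to be available. Sleeping 10s."] * 60
    )
    print(summarize_waits(sample))
```

For the 60 lines quoted above, this reports 600 seconds of sleeping while waiting for the password.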

@peterzhuamazon
Member Author

It seems to be able to connect here:

Connecting to (10.0.101.113) with WinRM as Administrator

I wonder why it is stuck right now.

image

@peterzhuamazon
Member Author

Right after the attempt to connect, it failed:

Waiting for WinRM to come up. Sleeping 10s.

HTTP ERROR 404 Not Found

URI: /manage/computer/EC2%20%28Amazon%5Fec2%5Fcloud%29%20%2D%20jenkinsAgentNode%2DJenkins%2DAgent%2DWindows2019%2DX64%2DC524xlarge%2DSingle%2DHost%20%28i%2D019b97c099c560adc%29/logText/progressiveHtml
404
Not Found
Stapler

@peterzhuamazon
Member Author

peterzhuamazon commented May 29, 2023

Manually removed the queue:

[hudson.model.Queue$BlockedItem:org.jenkinsci.plugins.workflow.job.WorkflowJob@<>[gradle-check]:6315, hudson.model.Queue$BlockedItem:org.jenkinsci.plugins.workflow.job.WorkflowJob@<>[gradle-check]:6330, hudson.model.Queue$BlockedItem:org.jenkinsci.plugins.workflow.job.WorkflowJob@<>[gradle-check]:6331, hudson.model.Queue$BlockedItem:org.jenkinsci.plugins.workflow.job.WorkflowJob@<>[gradle-check]:6332, hudson.model.Queue$BlockedItem:org.jenkinsci.plugins.workflow.job.WorkflowJob@<>[gradle-check]:6333, hudson.model.Queue$BlockedItem:org.jenkinsci.plugins.workflow.job.WorkflowJob@<>[gradle-check]:6334, hudson.model.Queue$BlockedItem:org.jenkinsci.plugins.workflow.job.WorkflowJob@<>[gradle-check]:6335, hudson.model.Queue$BlockedItem:org.jenkinsci.plugins.workflow.job.WorkflowJob@<>[gradle-check]:6337, hudson.model.Queue$BlockedItem:org.jenkinsci.plugins.workflow.job.WorkflowJob@<>[gradle-check]:6373, hudson.model.Queue$BlockedItem:org.jenkinsci.plugins.workflow.job.WorkflowJob@<>[gradle-check]:6374, hudson.model.Queue$BlockedItem:org.jenkinsci.plugins.workflow.job.WorkflowJob@<>[gradle-check]:6397, hudson.model.Queue$BlockedItem:org.jenkinsci.plugins.workflow.job.WorkflowJob@<>[gradle-check]:6402, hudson.model.Queue$BlockedItem:org.jenkinsci.plugins.workflow.job.WorkflowJob@<>[gradle-check]:6403, hudson.model.Queue$BlockedItem:org.jenkinsci.plugins.workflow.job.WorkflowJob@<>[gradle-check]:6404, hudson.model.Queue$BlockedItem:org.jenkinsci.plugins.workflow.job.WorkflowJob@<>[gradle-check]:6415, hudson.model.Queue$BlockedItem:org.jenkinsci.plugins.workflow.job.WorkflowJob@<>[gradle-check]:6419, hudson.model.Queue$BlockedItem:org.jenkinsci.plugins.workflow.job.WorkflowJob@<>[gradle-check]:6430, hudson.model.Queue$BlockedItem:org.jenkinsci.plugins.workflow.job.WorkflowJob@<>[gradle-check]:6431, hudson.model.Queue$BlockedItem:org.jenkinsci.plugins.workflow.job.WorkflowJob@<>[gradle-check]:6436, 
hudson.model.Queue$BlockedItem:org.jenkinsci.plugins.workflow.job.WorkflowJob@<>[gradle-check]:6441, hudson.model.Queue$BlockedItem:org.jenkinsci.plugins.workflow.job.WorkflowJob@<>[gradle-check]:6442, hudson.model.Queue$BlockedItem:org.jenkinsci.plugins.workflow.job.WorkflowJob@<>[gradle-check]:6443, hudson.model.Queue$BlockedItem:org.jenkinsci.plugins.workflow.job.WorkflowJob@<>[gradle-check]:6446, hudson.model.Queue$BlockedItem:org.jenkinsci.plugins.workflow.job.WorkflowJob@<>[gradle-check]:6447, hudson.model.Queue$BlockedItem:org.jenkinsci.plugins.workflow.job.WorkflowJob@<>[gradle-check]:6448, hudson.model.Queue$BlockedItem:org.jenkinsci.plugins.workflow.job.WorkflowJob@<>[gradle-check]:6449, hudson.model.Queue$BlockedItem:org.jenkinsci.plugins.workflow.job.WorkflowJob@<>[gradle-check]:6450, hudson.model.Queue$BlockedItem:org.jenkinsci.plugins.workflow.job.WorkflowJob@<>[gradle-check]:6451, hudson.model.Queue$BlockedItem:org.jenkinsci.plugins.workflow.job.WorkflowJob@<>[gradle-check]:6452, hudson.model.Queue$BlockedItem:org.jenkinsci.plugins.workflow.job.WorkflowJob@<>[gradle-check]:6453, hudson.model.Queue$BlockedItem:org.jenkinsci.plugins.workflow.job.WorkflowJob@<>[gradle-check]:6454, hudson.model.Queue$BlockedItem:org.jenkinsci.plugins.workflow.job.WorkflowJob@<>[gradle-check]:6455, hudson.model.Queue$BlockedItem:org.jenkinsci.plugins.workflow.job.WorkflowJob@<>[gradle-check]:6456, hudson.model.Queue$BlockedItem:org.jenkinsci.plugins.workflow.job.WorkflowJob@<>[gradle-check]:6461, hudson.model.Queue$BlockedItem:org.jenkinsci.plugins.workflow.job.WorkflowJob@<>[gradle-check]:6475, hudson.model.Queue$BlockedItem:org.jenkinsci.plugins.workflow.job.WorkflowJob@<>[gradle-check]:6493, hudson.model.Queue$BlockedItem:org.jenkinsci.plugins.workflow.job.WorkflowJob@<>[gradle-check]:6495, hudson.model.Queue$BlockedItem:org.jenkinsci.plugins.workflow.job.WorkflowJob@<>[gradle-check]:6499, 
hudson.model.Queue$BlockedItem:org.jenkinsci.plugins.workflow.job.WorkflowJob@<>[gradle-check]:6503, hudson.model.Queue$BlockedItem:org.jenkinsci.plugins.workflow.job.WorkflowJob@<>[gradle-check]:6533, hudson.model.Queue$BlockedItem:org.jenkinsci.plugins.workflow.job.WorkflowJob@<>[gradle-check]:6534, hudson.model.Queue$BlockedItem:org.jenkinsci.plugins.workflow.job.WorkflowJob@a<>[gradle-check]:6535]

Aborting build #16445
Aborting build #16440
Aborting build #16434
Aborting build #16432
Aborting build #16430
Aborting build #16429
Aborting build #16428
Aborting build #16427
Aborting build #16424
Aborting build #16422
Aborting build #16417
Aborting build #16416
Aborting build #16412
Aborting build #16406
Aborting build #16391
Aborting build #16378
Aborting build #16361
Aborting build #16357
Aborting build #16348
Aborting build #16343
All running builds of job 'gradle-check' aborted.

@peterzhuamazon
Member Author

Switched all the Windows agents on build.ci.opensearch.org back to the 2023/05/19 versions.

@peterzhuamazon
Member Author

peterzhuamazon commented May 29, 2023

Confirmed that the old AMI above connects to Jenkins:


Waiting for password to be available. Sleeping 10s.
Connecting to (10.0.103.87) with WinRM as Administrator
Waiting for WinRM to come up. Sleeping 10s.
Waiting for WinRM to come up. Sleeping 10s.
Waiting for WinRM to come up. Sleeping 10s.
Connected with WinRM.
Creating tmp directory if it does not exist
Executing init script

C:\Users\Administrator>echo
ECHO is on.
init script ran successfully
remoting.jar sent remotely. Bootstrapping it
Launching via WinRM:java  -jar C:\Windows\Temp\remoting.jar -workDir C:/Users/Administrator/jenkins

I will check the troubled 2023/05/25 AMI on my personal account.

@peterzhuamazon
Member Author

In my test, Windows runs normally without any connection issues, which is very weird.

@peterzhuamazon
Member Author

peterzhuamazon commented May 29, 2023

Possibly the 300-second connection timeout is not enough for the new AMI; I will test that tomorrow morning.
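
To illustrate why a fixed connection timeout can be too short when an image boots more slowly, here is a minimal Python sketch of a poll-until-deadline loop. It is not the EC2 plugin's implementation: `wait_until`, its parameters, and the simulated clock are hypothetical names used only for this example, and the 400-second readiness time is a made-up figure.

```python
import time


def wait_until(check, timeout_s, interval_s=10,
               clock=time.monotonic, sleep=time.sleep):
    """Poll check() every interval_s seconds until it returns True.

    Returns the number of attempts made. Raises TimeoutError when the
    next sleep would go past the timeout_s budget.
    """
    deadline = clock() + timeout_s
    attempts = 0
    while True:
        attempts += 1
        if check():
            return attempts
        if clock() + interval_s > deadline:
            raise TimeoutError(
                f"gave up after {attempts} attempts ({timeout_s}s budget)"
            )
        sleep(interval_s)


class FakeClock:
    """Simulated clock so the example runs instantly (hypothetical helper)."""

    def __init__(self):
        self.now = 0

    def __call__(self):
        return self.now

    def sleep(self, seconds):
        self.now += seconds


if __name__ == "__main__":
    for budget in (300, 1000):
        fake = FakeClock()
        # Assume the agent only becomes reachable 400 simulated seconds in.
        ready = lambda: fake.now >= 400
        try:
            n = wait_until(ready, budget, clock=fake, sleep=fake.sleep)
            print(f"budget {budget}s: connected after {n} attempts")
        except TimeoutError as err:
            print(f"budget {budget}s: {err}")
```

With the simulated clock, a 300-second budget expires before the agent is ready, while a larger budget lets the same loop succeed. The real fix still depends on how long the AMI actually takes to boot.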

@peterzhuamazon
Member Author

Seems like the loading time of the new AMI is longer even when manually logging in.

@peterzhuamazon
Member Author

peterzhuamazon commented May 29, 2023

It seems to be stuck on this for a while:
image

@peterzhuamazon
Member Author

It takes 3 times longer just to get to the Windows desktop.

@peterzhuamazon
Member Author

Seeing some errors:


ERROR: Timed out after 1007 seconds of waiting for winrm to be connected
com.amazonaws.AmazonClientException: Timed out after 1007 seconds of waiting for winrm to be connected
	at hudson.plugins.ec2.win.EC2WindowsLauncher.connectToWinRM(EC2WindowsLauncher.java:142)
	at hudson.plugins.ec2.win.EC2WindowsLauncher.launchScript(EC2WindowsLauncher.java:52)
	at hudson.plugins.ec2.EC2ComputerLauncher.launch(EC2ComputerLauncher.java:48)
	at hudson.slaves.SlaveComputer.lambda$_connect$0(SlaveComputer.java:298)
	at jenkins.util.ContextResettingExecutorService$2.call(ContextResettingExecutorService.java:46)
	at jenkins.security.ImpersonatingExecutorService$2.call(ImpersonatingExecutorService.java:80)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:829)

@peterzhuamazon
Member Author

peterzhuamazon commented May 30, 2023

After removing Cypress, the connection now completes on time.

Waiting for password to be available. Sleeping 10s.
Connecting to (<>) with WinRM as Administrator
Waiting for WinRM to come up. Sleeping 10s.
Waiting for WinRM to come up. Sleeping 10s.
Waiting for WinRM to come up. Sleeping 10s.
Connected with WinRM.
Creating tmp directory if it does not exist
Executing init script

C:\Users\Administrator>echo
ECHO is on.
init script ran successfully
remoting.jar sent remotely. Bootstrapping it
Launching via WinRM:java  -jar C:\Windows\Temp\remoting.jar -workDir C:/Users/Administrator/jenkins
<===[JENKINS REMOTING CAPACITY]===>Remoting version: 3107.v665000b_51092
Launcher: EC2WindowsLauncher
Communication Protocol: Standard in/out
This is a Windows agent

@peterzhuamazon
Member Author

Since Cypress can be cached by npm install during the ftrepo run, and does have a Windows native exe anyway, I will remove the Cypress installation from the script for now.

@peterzhuamazon
Member Author

Cypress is cached under %LOCALAPPDATA%\Cypress\Cache (\AppData\Local\Cypress\Cache) on Windows by default, which is different from ~/.cache/Cypress on *nix.
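
As a rough illustration of the platform difference, here is a Python sketch that computes where the Cypress binary cache would be expected. It assumes the defaults listed in the Cypress documentation (a `Cypress` folder under the per-user cache location, with `CYPRESS_CACHE_FOLDER` as an override) and ignores other overrides such as `XDG_CACHE_HOME`. The function `default_cypress_cache` is a hypothetical helper, not part of Cypress.

```python
import os
import sys
from pathlib import PurePosixPath, PureWindowsPath


def default_cypress_cache(platform=None, env=None):
    """Return the expected Cypress cache folder for a platform (a sketch).

    platform uses sys.platform values ("win32", "darwin", "linux").
    env is a mapping of environment variables; defaults to os.environ.
    """
    env = os.environ if env is None else env
    platform = sys.platform if platform is None else platform

    # An explicit override wins on every platform.
    override = env.get("CYPRESS_CACHE_FOLDER")
    if override:
        return override

    if platform.startswith("win"):
        base = env.get("LOCALAPPDATA", "%LOCALAPPDATA%")
        return str(PureWindowsPath(base) / "Cypress" / "Cache")

    home = PurePosixPath(env.get("HOME", "~"))
    if platform == "darwin":
        return str(home / "Library" / "Caches" / "Cypress")
    return str(home / ".cache" / "Cypress")


if __name__ == "__main__":
    print(default_cypress_cache("win32", {}))
    print(default_cypress_cache("linux", {"HOME": "/home/ci"}))
```

A provisioning script that pre-installs Cypress into the *nix-style location would therefore not warm the cache that a Windows agent actually reads.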

Seems related:

@peterzhuamazon
Member Author

Still not fully fixed: the old AMI takes about 60 × 10 seconds while the new one takes 90 × 10 seconds or so.
Even if it logs in, it still errors out due to the timeout.

Executing init script

C:\Users\Administrator>echo
ECHO is on.
init script ran successfully
remoting.jar sent remotely. Bootstrapping it
Launching via WinRM:java  -jar C:\Windows\Temp\remoting.jar -workDir C:/Users/Administrator/jenkins
<===[JENKINS REMOTING CAPACITY]===>Remoting version: 3107.v665000b_51092
Launcher: EC2WindowsLauncher
Communication Protocol: Standard in/out
This is a Windows agent

HTTP ERROR 404 Not Found

URI: /manage/computer/EC2%20%28Amazon%5Fec2%5Fcloud%29%20%2D%20jenkinsAgentNode%2DJenkins%2DAgent%2DWindows2019%2DX64%2DC54xlarge%2DSingle%2DHost%20%28i%2D009b281124464ab3a%29/logText/progressiveHtml
404
Not Found
Stapler

@peterzhuamazon
Member Author

It seems the delay is caused by adding scoop cache rm --all, for some reason. Removing that line significantly speeds up startup.

@peterzhuamazon
Member Author

I combined the above steps and increased the boot-up timeouts on the Windows agent. Both agents eventually connect after 16-20 min per boot, slightly longer than the previous boot time of less than 15 min.

Next time we can look into pure Windows Server Core, or even Nano Server, to improve boot time; we are currently using the base version.

peterzhuamazon added the v2.8.0 label and removed the untriaged label on May 31, 2023