[Azure] WinRM timeout with Windows 2016-Datacenter Marketplace Image #8658
Comments
Interesting. It was definitely not working for quite some time, but now I cannot reproduce this issue anymore. Even with the latest image, and with the images between November and today, it is working properly. I will reopen in case I start to see this issue again.
I can still reproduce the issue with the used image
From initial troubleshooting it looks to me like a certificate issue; trying to run
And when trying to connect using openssl to retrieve the certificate, I'm getting
Re-generating the self-signed certificate and reconfiguring WinRM causes Packer to immediately respond to the connection,
and from openssl -showcerts I'm getting a correct answer.
I see that Packer uses the Azure osProfile.windowsConfiguration.winRM value in the template to configure WinRM on the VM. So here I would assume that either there is an issue with creating the certificate on the Packer side before uploading it to the Azure Key Vault, or an issue with Azure that prevents the VM from configuring WinRM correctly using the values from the template; this may need more troubleshooting.
@AliAllomani Okay... Which region are you deploying to? A few weeks ago I thought this was an image-related issue, but I had no time to investigate further. Today I tried an older image and it started to work, so I opened this; but then I tried with the latest as well and it was also working. I don't know what is going on. Can you try with older versions as well? Also in WestUS2? Let's try to rule these out... I reopened this for now, but you are on your own, because it is now working for me...
@Dilergore I'm deploying to EU West. I also faced the timeout issue with the latest Windows 2019-Datacenter image, but I'm not sure if it's the same issue; I will do more tests on my side with different images.
@AliAllomani It was not happening for me with 2019. It usually takes some time to configure WinRM by default. Using a bigger machine, an SSD, and increasing the timeout usually works around this problem. My setup is: In my experience, even with this it sometimes takes longer than 5-6 minutes to configure and connect to it.
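For reference, the kind of builder settings being described here (bigger machine, premium/SSD storage, longer timeout) would look like this in an azure-arm JSON config. This is a sketch with illustrative values, not the commenter's actual setup:

```json
{
  "communicator": "winrm",
  "winrm_use_ssl": "true",
  "winrm_insecure": "true",
  "winrm_timeout": "30m",
  "vm_size": "Standard_DS3_v2",
  "managed_image_storage_account_type": "Premium_LRS"
}
```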
@Dilergore it seems intermittent. The common things I found out:
packer/builder/azure/arm/config.go Line 452 in af2c434
Occurrence tests done so far (all in EU West):
Replacing the listener test :
Just wanted to add that I also get intermittent WinRM timeouts. I've also increased the WinRM timeout to 1 hour and increased the VM size.
I've been having the same issues in US West 2 for the last couple of days: 2019-Datacenter builds are fine, but 2016-Datacenter and 2012-R2-Datacenter ones intermittently fail to connect via WinRM, with 2012-R2 being the most problematic. Builds are done using the smalldisk image, initially with D2sV3 vm_size and a 20-minute winrm_timeout. Increasing the VM size or timeout doesn't show any perceptible improvement.
I can fast-track this with Microsoft, but without the root cause... Also, it totally seems to be working for me (for now), so I cannot even continue testing on my own. If you can find out what the issue is, I am happy to engage support.
I just started running into this problem today. For the last two weeks I've been building images to test out an automated process using Packer and did not have any issues with WinRM. I'm running Packer on the Azure DevOps hosted agent.
From my test findings I'd assume that something is going wrong within the OS during the automatic WinRM SSL configuration by the Azure VM template. @Dilergore, I think there is currently no way in Packer to configure the builder VM to use non-SSL WinRM?
https://www.packer.io/docs/communicators/winrm.html#winrm-communicator-options Never tried it, though...
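For reference, the communicator options from that page would be set like this in a builder block. This is a sketch; whether the azure-arm builder actually honors `winrm_use_ssl` set to false is exactly the open question in this thread:

```json
{
  "communicator": "winrm",
  "winrm_use_ssl": "false",
  "winrm_port": "5985",
  "winrm_username": "packer"
}
```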
@Dilergore The available parameters define the method the communicator uses; however, on the builder side I see it's hardcoded: packer/builder/azure/common/template/template_builder.go Lines 90 to 99 in df031db
And today, just to muddy the water a bit... Yesterday evening's (1800 GMT-8) pipeline failed due to WinRM timeout on all three builds: 2012 R2, 2016, and 2019. This morning's run (0400) ran correctly. This is the first WinRM timeout I've seen using the 2019-Datacenter source. All three builds use smalldisk, DS3v2, and a 60m WinRM timeout. In addition, afternoon/evening builds have a much higher incidence of failure than early-morning ones.
We have a similar issue, but IMHO it doesn't depend on the particular Windows image; we think this is an issue with the Azure platform itself. In our case, a little workaround is to change the instance type from
Hi folks, thanks for keeping this thread up to date with your latest findings. I am looking into this issue on my end to see if there is any information that can help isolate what might be happening here. I too have observed that when using certain images, connecting via
This is possible, but hard to tell with the information in the logs. @Dilergore have you, or anyone on the thread, opened a support ticket with Azure around this particular issue? |
@nywilken I will open it during the weekend and will involve some people who can help us / can route the ticket inside Microsoft. If you want to contribute, please send me your email address privately. Thanks!
As noted in the Packer Documentation - Getting started/Build an image
Unfortunately, while this is true on AWS using the userdata script, I'm not sure how the Azure builder configures WinRM and whether it can perhaps run something similar. Also, take note that the Packer documentation - Communicator/WinRM still refers to
A successful workaround I am using on AWS is the SSH communicator on Windows 2016/2019, installing SSH via userdata following the installation instructions provided by the Microsoft documentation or Microsoft OpenSSH portable. Not sure how this would translate to Azure.
Hi all, are there any updates on this? I can't repro the issue for a couple of days now; probably some fix was rolled out?
@AlexeyKarpushin - I definitely still have the issue. It seems to have an Azure load component. I still get WinRM timeouts on all three platforms (2012R2, 2016, 2019). Failure rates on the three builds are 60% or more during weekday business hours, about 10-20% on weeknights and weekend days, and very rare on weekend overnights. In addition, the 2019 builds have a much lower failure rate than 2016 and 2012R2.
Getting this issue intermittently on Windows 2019 now as well. I suspect, as others have said, it may be something to do with the time of day: it seems to work in mornings/evenings but not in core hours.
Hi folks, sorry for the slow response here. I have not been able to reproduce this issue since Friday, although I do notice that 2016-Datacenter builds take longer than other OS versions to connect via WinRM. But I don't know why that is.
@Dilergore I don't have any new information to contribute so I'll refrain from reaching out privately. But thanks for offering to include me in the thread.
For folks who are still able to reproduce the issue when WinRM connectivity is timing out:
Same here; I was able to reproduce just now: telnet to 5986 works, telnet to 5985 does not work. 10:02:18 ==> azure-arm: Waiting for WinRM to become available...
Hi all, I've created a workaround which allows our Azure DevOps pipelines to run. It doesn't solve the problem, but it allows us to ignore it. I can't paste the whole code here, but I can give a short description; hopefully it will be useful. The main idea is to re-create the WinRM listener on the temp machine during the Packer build.
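A rough sketch of what such a listener re-creation step could look like on the temp VM. This is a hypothetical reconstruction, not the author's actual code; the region in the FQDN and the use of New-SelfSignedCertificate are assumptions:

```powershell
# Remove any existing WinRM listeners (assumption: we want a clean slate)
Remove-Item -Path WSMan:\localhost\Listener\* -Recurse -Force

# Create a fresh self-signed certificate for the VM's public FQDN.
# The region ("westeurope") is an assumption for the example.
$fqdn = "$env:COMPUTERNAME.westeurope.cloudapp.azure.com"
$cert = New-SelfSignedCertificate -DnsName $fqdn -CertStoreLocation Cert:\LocalMachine\My

# Re-create the HTTPS listener bound to the new certificate
New-Item -Path WSMan:\localhost\Listener -Transport HTTPS -Address * `
  -CertificateThumbPrint $cert.Thumbprint -Force

Restart-Service WinRM
```

Running something like this from a CustomScriptExtension or scheduled task early in the build would replace whatever certificate the ARM template provisioned.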
I hope the issue will be mitigated in the near future and this workaround will not be needed. Kind regards,
I still have issues with building Windows Server 2019 (not 2016).
region: west europe |
With the size you recommend, Packer has no issues creating VMs:
However, builds became much slower with standard disks...
Hi, I have this timeout issue, which is quite sporadic; sometimes I can't create an image for a whole day and it works the next day. Lately I've started to see an "ssh timeout" during Ubuntu image creation as well :( This is so frustrating.
@danielsollondon isn't this issue simply that the certificate name is wrong, as pointed out by @AliAllomani above? The certificate name expected and configured by Packer is machineName.cloudapp.net, vs. the actual machine name, which has the format machineName.region.cloudapp.azure.com. Thoughts?
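To illustrate the naming concern, here is a small self-contained openssl sketch that generates a throwaway self-signed certificate with the region-qualified CN and prints its subject. The VM name and region are made up for the example; a listener certificate whose CN doesn't match the FQDN the client connects to is the kind of mismatch being described:

```shell
# Generate a throwaway self-signed cert with the region-qualified CN
# (hypothetical name: packervm in westeurope).
openssl req -x509 -newkey rsa:2048 -nodes -days 1 \
  -keyout /tmp/winrm.key -out /tmp/winrm.crt \
  -subj "/CN=packervm.westeurope.cloudapp.azure.com"

# Print the subject; this is the name the WinRM client validates against.
openssl x509 -in /tmp/winrm.crt -noout -subject
```

The same `openssl x509 -noout -subject` inspection can be run against a cert retrieved with `openssl s_client -connect <host>:5986 -showcerts` to compare the CN with the host you are actually connecting to.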
When creating images using Marketplace images, everything works great.
When trying to use an image from a Shared Image Gallery with the same VM size, I am getting timeouts.
During the build I opened an RDP session and confirmed that the listener, the certificate, and the firewall rule are present. I have confirmed that port 5986 does respond. The image from the SIG is configured for CIS compliance, so the following settings are configured:
All other WinRM service settings are set to Not Configured. As far as I can tell, WinRM should be working on this SIG image, and I'm not sure if it's possibly related to this issue or if there is a seemingly unrelated policy setting causing problems. Any advice on this would be appreciated.
Hi folks, thanks for keeping this thread up to date with the latest findings and test results. The Packer team has been monitoring this issue closely to see if there is anything on the Packer side that can be changed to resolve it. In looking at the thread I see a possible cert domain change for self-signed certs and the change in
What I found is that by using Standard_D2_v2 I was always able to get a successful build regardless of the domain name used for the self-signed cert. I did find some Azure examples where the domain used for self-signed certs is With that being said, if you're still running into issues, please make sure you are trying builds with the recommended vm_sizes (#8658 (comment)). If builds still fail with the new VM sizes, please attach your build configuration and debug logs via a Gist in a new comment. If someone is using the same configuration as you, please just thumbs-up their config to help us determine the best set of configs to test against. Thanks!
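The recommended size, as it would appear in an azure-arm builder config. A sketch; the surrounding values are illustrative, not a tested configuration:

```json
{
  "vm_size": "Standard_D2_v2",
  "communicator": "winrm",
  "winrm_use_ssl": "true",
  "winrm_insecure": "true",
  "winrm_timeout": "30m"
}
```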
Hello @nfischer2, I suspect this may be another issue, possibly related to the custom image with some CIS settings enabled. We've seen WinRM issues related to CIS in the past (#6951). Have you had a chance to reach out to our mailing list or community forums for your issue? We have a bigger community that may have people who can answer your questions better than me. After reaching out to the community forums, if you suspect that you are running into a bug, please open a new issue and attach the related Packer debug logs.
Thanks for the reply. My issue was not related to this thread. After further troubleshooting I found that the registry setting
I wanted to provide an update in case anyone else is working with Packer and CIS Windows images.
@danielsollondon The solution to use Standard_D2_v2 worked for us. In case it is useful for resolving the bug, I would like to point out what I noted when it works vs. when it doesn't. When the WinRM connection works, the certificate in LocalMachine (imported from Key Vault) has a private key associated with it; when it does not work, there is no private key associated. So something is breaking while getting/importing this certificate from Key Vault. This is not an issue on the Packer side, as the certificate secret passed into the ARM template via Key Vault has both the private key and the public key.
Output from a VM that does not work
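The private-key observation above can be checked from an elevated PowerShell session on the temp VM with something like the following (a sketch, not from the original comment):

```powershell
# List certificates in the LocalMachine\My store and whether each has a
# private key; the Key Vault-imported WinRM certificate should show
# HasPrivateKey = True on a working VM.
Get-ChildItem Cert:\LocalMachine\My |
  Select-Object Subject, Thumbprint, HasPrivateKey
```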
The WinRM timeout with Windows Server 2016 is still happening. Is there an open ticket with Azure for this?
@pmozbert what VM size are you using? Since I moved to |
@amarkulis We had to move away from Packer and now use Azure custom script extension to configure Windows CIS images for Azure. |
Agreed, the CIS benchmarks can definitely complicate matters. |
I increased the VM size to Standard_DS_v2, which worked for a while several months ago, along with a 30m timeout, but now the timeouts are back.
Having the same level of build flakiness with |
I have built quite a few Windows 2016 images with:

```json
"image_publisher": "MicrosoftWindowsServer",
"image_offer": "WindowsServer",
"image_sku": "2016-Datacenter-smalldisk",
"image_version": "latest",
"communicator": "winrm",
"winrm_use_ssl": "true",
"winrm_insecure": "true",
"winrm_timeout": "30m",
"winrm_username": "packer",
"temp_compute_name": "swazpkr00",
"location": "WestEurope",
"vm_size": "Standard_D4s_v3",
"managed_image_storage_account_type": "Standard_LRS"
```
I am getting this issue as well with
Also seeing it with Windows 10 Pro RS4:
Hi, same issue with one of our client's WVD Win 10 multi-session deployments.
This issue has been automatically migrated to hashicorp/packer-plugin-azure#38 because it looks like an issue with that plugin. If you believe this is not an issue with the plugin, please reply to hashicorp/packer-plugin-azure#38. |
Hello, we are seeing the same issue here as well. First it was a 503 error, and we determined the listener was set to HTTPS even though the winrm_insecure parameter was set to true. We manually reset the listener to use HTTP only and are now seeing the following error: 2021/09/10 14:16:55 packer.exe plugin: [DEBUG] connecting to remote shell using WinRM
Please refer to the end of this thread to see other users reporting that this is not working.
MicrosoftDocs/azure-docs#31188
Issue:
Started: December, 2019.
Packer cannot connect with WinRM to machines provisioned from Windows 2016 (2016-Datacenter) Marketplace image in Azure.
Further details:
Increasing the WinRM timeout does not help. It seems the last working image is version "14393.3326.1911120150" (released 12th of Nov). It stopped working with "14393.3384.1912042333" (released 10th of Dec).
This issue is only impacting 2016-Datacenter. 2019 is working properly.
To get image details for a region:

```shell
az vm image list --location northeurope --offer WindowsServer --publisher MicrosoftWindowsServer --sku 2016-Datacenter --all
```
URL to the Last Working Image:
https://support.microsoft.com/en-us/help/4525236/windows-10-update-kb4525236
URL to the Image where something went wrong:
https://support.microsoft.com/en-us/help/4530689/windows-10-update-kb4530689
Notes:
This currently applies to North EU. I had no time to investigate other regions, but I believe the same images get distributed to every region.
I am opening a Microsoft case and planning to update the thread with the progress.