Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Packer provisioners not completing entirely #9757

Open
ghost opened this issue Aug 12, 2020 · 12 comments
Open

Packer provisioners not completing entirely #9757

ghost opened this issue Aug 12, 2020 · 12 comments

Comments

@ghost
Copy link

ghost commented Aug 12, 2020

When filing a bug, please include the following headings if possible. Any
example text in this template can be deleted.

Overview of the Issue

A paragraph or two about the issue you're experiencing.
Not sure if there is a current card/solution for this, but I am experiencing issues with my packer provisioners not completing/skipping to the next provisioner before completing. This doesnt happen during every run but it does happen frequently. My provisioners are a series of powershell scripts that execute at time of AMI bake in AWS. At times it will recognize the script and run the provisioner flawlessly and at other times completely skip. Example below of the skip:
image

In the example above install_dependencies.ps1 have a bunch of Write-Host messages that arent writing to log and after checking the server the services that are supposed to be installed are also missing on the ec2.

Reproduction Steps

Steps to reproduce this issue

  • Reproducing isnt always possible due to its sporadic success, but when it fails to provision a script, it is when it is being called from a packerfile.json from a provisioners list of scripts to run.

Packer version

From packer version

Simplified Packer Buildfile

If the file is longer than a few dozen lines, please include the URL to the
gist of the log or use the Github detailed
format

instead of posting it directly in the issue.

Operating system and Environment details

OS, Architecture, and any other information you can provide about the
environment.

  • A windows 2016 server

Log Fragments and crash.log files

Include appropriate log fragments. If the log is longer than a few dozen lines,
please include the URL to the gist of the log or
use the Github detailed format instead of posting it directly in the issue.

Set the env var PACKER_LOG=1 for maximum log detail.

@nywilken
Copy link
Contributor

Thanks for the bug report @travrat. We'll try to take a look as soon as we can.

In the meantime, is it possible to get a redacted log PACKER_LOG=1 packer build template.json and a sample of your template so that we can try to reproduce. If you could provide a sample of the script you are running that may help.

As for this being an open issue we have seen other issues, which may still be open, about the provisioner exiting too soon. That may be able to provide some helpful tips.

@SwampDragons
Copy link
Contributor

I wonder if this is a duplicate of #4916. Are you setting $LastExitCode in your script? If it exits without a failing code, Packer doesn't know it's failed.

@ghost
Copy link
Author

ghost commented Aug 13, 2020

@nywilken PACKER_LOG=1 has been added:

·[1;32m==> amazon-ebs: Pausing 15s before the next provisioner...·[0m
658 | ·[1;32m==> amazon-ebs: Provisioning with Powershell...·[0m
659 | ·[1;32m==> amazon-ebs: Provisioning with powershell script: ./iac/Packer/install_dependencies.ps1·[0m
660 | 2020/08/13 15:16:47 packer: 2020/08/13 15:16:47 Opening ./iac/Packer/install_dependencies.ps1 for reading
661 | 2020/08/13 15:16:47 packer: 2020/08/13 15:16:47 Uploading env vars to c:/Windows/Temp/packer-ps-env-vars-5f355747-8339-85de-2e00-ee0ddb53a5d4.ps1
662 | 2020/08/13 15:16:47 packer: 2020/08/13 15:16:47 [INFO] 76 bytes written for 'uploadData'
663 | 2020/08/13 15:16:47 [INFO] 76 bytes written for 'uploadData'
664 | 2020/08/13 15:16:47 packer: 2020/08/13 15:16:47 Uploading file to 'c:/Windows/Temp/packer-ps-env-vars-5f355747-8339-85de-2e00-ee0ddb53a5d4.ps1'
665 | 2020/08/13 15:16:48 packer: #< CLIXML
666 | 2020/08/13 15:16:48 packer: System.Management.Automation.PSCustomObjectSystem.Object1Preparing modules for first use.0-1-1Completed-1 #< CLIXML
667 | 2020/08/13 15:16:48 packer: 2020/08/13 15:16:48 [INFO] 1515 bytes written for 'uploadData'
668 | 2020/08/13 15:16:48 [INFO] 1515 bytes written for 'uploadData'
669 | 2020/08/13 15:16:48 packer: System.Management.Automation.PSCustomObjectSystem.Object1Preparing modules for first use.0-1-1Completed-1 2020/08/13 15:16:48 Uploading file to 'c:/Windows/Temp/script-5f355747-11c6-2a10-5190-0e0468e3ffc4.ps1'
670 | 2020/08/13 15:16:49 packer: #< CLIXML
671 | 2020/08/13 15:16:49 packer: System.Management.Automation.PSCustomObjectSystem.Object1Preparing modules for first use.0-1-1Completed-1 #< CLIXML
672 | 2020/08/13 15:16:49 packer: System.Management.Automation.PSCustomObjectSystem.Object1Preparing modules for first use.0-1-1Completed-1 2020/08/13 15:16:49 [INFO] starting remote command: powershell -executionpolicy bypass "& { if (Test-Path variable:global:ProgressPreference){set-variable -name variable:global:ProgressPreference -value 'SilentlyContinue'};. c:/Windows/Temp/packer-ps-env-vars-5f355747-8339-85de-2e00-ee0ddb53a5d4.ps1; &'c:/Windows/Temp/script-5f355747-11c6-2a10-5190-0e0468e3ffc4.ps1'; exit $LastExitCode }"
673 | ·[0;32m amazon-ebs: Grabbing the software from bucket...·[0m
674 | 2020/08/13 15:16:53 packer: 2020/08/13 15:16:53 [INFO] command 'powershell -executionpolicy bypass "& { if (Test-Path variable:global:ProgressPreference){set-variable -name variable:global:ProgressPreference -value 'SilentlyContinue'};. c:/Windows/Temp/packer-ps-env-vars-5f355747-8339-85de-2e00-ee0ddb53a5d4.ps1; &'c:/Windows/Temp/script-5f355747-11c6-2a10-5190-0e0468e3ffc4.ps1'; exit $LastExitCode }"' exited with code: 0
675 | 2020/08/13 15:16:53 packer: 2020/08/13 15:16:53 [INFO] RPC endpoint: Communicator ended with: 0
676 | 2020/08/13 15:16:53 [INFO] 0 bytes written for 'stderr'
677 | 2020/08/13 15:16:53 [INFO] 37 bytes written for 'stdout'
678 | 2020/08/13 15:16:53 [INFO] RPC client: Communicator ended with: 0
679 | 2020/08/13 15:16:53 [INFO] RPC endpoint: Communicator ended with: 0
680 | 2020/08/13 15:16:53 packer: 2020/08/13 15:16:53 [INFO] 37 bytes written for 'stdout'
681 | 2020/08/13 15:16:53 packer: 2020/08/13 15:16:53 [INFO] 0 bytes written for 'stderr'
682 | 2020/08/13 15:16:53 packer: 2020/08/13 15:16:53 [INFO] RPC client: Communicator ended with: 0
683 | 2020/08/13 15:16:53 [INFO] (telemetry) ending powershell
684 | 2020/08/13 15:16:53 [INFO] (telemetry) Starting provisioner file
·[1;32m==> amazon-ebs: Pausing 15s before the next provisioner...·[0m

Sample packer.json:

{
  "variables": {
    "build_version": "1.0.0",
    "aws_instance_type": "t2.medium",
    "aws_region": "us-east-1",
    "aws_source_ami": "",
    "aws_security_group_id": "sg-exampleBlah",
    "aws_subnet_id": "subnet-exampleBlah",
    "aws_vpc_id": "vpc-exampleBlah",
    "iam_instance_profile": "exampleBlah",
    "pipeline_exec_id": "",
    "pipeline_name": "exampleBlah-XXXXX"
  },

  "builders": [
    {
      "type": "amazon-ebs",
      "region": "exampleBlah-XXXXX",
      "source_ami": "exampleBlah-XXXXX",
      "instance_type": "exampleBlah-XXXXX",
      "temporary_key_pair_name": "exampleBlah-XXXXX",
      "communicator": "winrm",
      "winrm_username": "Administrator",
      "user_data_file": "./iac/Packer/packer_userdata.ps1",
      "winrm_use_ssl": true,
      "winrm_insecure": true,
      "winrm_port": XXXX,
      "windows_password_timeout": "20m",
      "iam_instance_profile": "{{user `iam_instance_profile`}}",
      "security_group_id": "{{user `aws_security_group_id`}}",
      "subnet_id": "{{user `aws_subnet_id`}}",
      "vpc_id": "{{user `aws_vpc_id`}}",
      "ami_name": "exampleBlah-XXXXX",
      "ami_description": "exampleBlah-XXXXX",
    }
  ],
  "provisioners": [
    {
      "type": "powershell",
      "pause_before": "15s",
      "script": "./iac/Packer/install_dependencies.ps1"
    },
    {
      "type": "file",
      "pause_before": "15s",
      "source": "./Ec2-AutoScallingGroup-SampleApp/",
      "destination": "C:\\inetpub\\wwwroot"
    },
    {
      "destination": "C:\\Program Files\\Amazon\\SSM\\Plugins\\awsCloudWatch\\AWS.EC2.Windows.CloudWatch.json",
      "source": "./iac/Packer/CloudWatch/AWS.EC2.Windows.CloudWatch.json",
      "type": "file",
      "pause_before": "15s"
    },
    {
      "type": "powershell",
      "pause_before": "120s",
      "script": "./iac/Packer/install_aws_services.ps1"
    }
  ],

  "post-processors": [
    {
      "type": "manifest",
      "output": "manifest.json",
      "strip_path": true
    }
  ]
}

install_dependencies.ps1 script is nothing fancy pretty simple:

#  Get sample app file from s3
try{
    Write-Host("Grabbing the software from bucket...")
    copy-s3object -bucketname packerSampleScript -keyprefix packerSampleScript1 -localfolder c:\packerSampleScript;
    Write-Host("Successfully copied software from bucket...")
}catch{
    write-host ("Failed to copy windows sample app file...")
}
get-childitem -Path c:\packerSampleScript

# Install sample software
try{
    Write-Host ("Installing sample app service...")
    $params = @{
        Name = "SampleApp"
        BinaryPathName = "c:\packerSampleScript\packerSampleScript.exe"
        DisplayName = "SampleApp"
        StartupType = "Automatic"
        Description = "This is the sample app service."
        }
    New-Service @params

}catch{
    Write-Host "Failed to create sample app service..."
}
# Start Sample App service
try{
    Write-Host "Starting sample app service..."
    Start-Service -Name SampleApp
    Get-Service -Name SampleApp
}catch{
    Write-Host "Failed to start Sample App service..."
}

@ghost ghost removed stage/waiting-reply labels Aug 13, 2020
@ghost
Copy link
Author

ghost commented Aug 13, 2020

@SwampDragons we are not

@SwampDragons
Copy link
Contributor

Okay, so that suggests to me this is a duplicate of #4916 and what's happening is that your script is failing, but not telling Packer there's an error, so Packer just says "exit status is 0, so all must be good" and moves on to run the next provisioner rather than returning an error and aborting the build.

The other possibility is that your install_dependencies.ps1 script is triggering a reboot which causes Packer to disconnect before the script has finished running. This is a known issue; winrm doesn't really handle disconnects and reconnects nicely.
We recommend using https://www.packer.io/docs/provisioners/windows-restart to handle reboots so Packer knows when it's rebooting and is able to wait for the reboot to complete. This means using a different provisioner between each reboot.

@ghost
Copy link
Author

ghost commented Aug 19, 2020

@SwampDragons thanks for the advice will present what I find, at the moment Im noticing after removing some winRM stops from the provisioner script taking place prior it isn't helping Im going to try some other things and let you know thanks

@ghost ghost removed the stage/waiting-reply label Aug 19, 2020
@tylernap
Copy link

I have been working with @travrat on this for a while and it looks like we do have a race condition between the instance userdata and the provisioners. In order to get the instance joined to our domain, we need to configure WinRM and then restart the service. I suspect that this would be why the script is suddenly stopping and continuing. We are currently looking at adding more time in the pause_before parameter to allow the userdata to properly do its thing.

In the meantime, is there a way for Packer to be a little more strict or nuanced on how it handles WinRM interruptions? Or is there anyway that Packer can find out the status of the userdata execution?

@ghost
Copy link
Author

ghost commented Aug 20, 2020

also @SwampDragons is there a builder we have to add to:

  "builders": [
    {
      "type": "amazon-ebs",
      "region": "exampleBlah-XXXXX",
      "source_ami": "exampleBlah-XXXXX",
      "instance_type": "exampleBlah-XXXXX",
      "temporary_key_pair_name": "exampleBlah-XXXXX",
      "communicator": "winrm",
      "winrm_username": "Administrator",
      "user_data_file": "./iac/Packer/packer_userdata.ps1",
      "winrm_use_ssl": true,
      "winrm_insecure": true,
      "winrm_port": XXXX,
      "windows_password_timeout": "20m",
      "iam_instance_profile": "{{user `iam_instance_profile`}}",
      "security_group_id": "{{user `aws_security_group_id`}}",
      "subnet_id": "{{user `aws_subnet_id`}}",
      "vpc_id": "{{user `aws_vpc_id`}}",
      "ami_name": "exampleBlah-XXXXX",
      "ami_description": "exampleBlah-XXXXX",
    }

that allows us to say if the $lastExitCode is 1 stop the build?
If I define $lastExitCode in my winRM script to determine failure or not what builder or identifier does Packer use to determine that it is a 1 or 0?

@SwampDragons
Copy link
Contributor

@travrat I believe that Inside your ps script, you need to call "exit $lastExitCode", and Packer will pick up the exit code from there. We have an option valid_exit_codes which you can use to set which codes will be considered successful. By default, exiting with anything other than a 0 will lead Packer to determine that the provisioning run was a failure.

@tylernap there isn't really any way for Packer to know the userdata has finished running. I'd recommend having your user data script create a file , e.g. finished.txt, and have the first part of your first provisioner check for that file and wait until it's present before continuing. Something like while (!(Test-Path "C:\finished.txt")) { Start-Sleep 10 }

Packer uses a third party library for its winrm connection, and it could probably use some improvements. I'm not sure when I'll be able to prioritize that work, though.

Probably a better choice than pause_before in the provisioner is the similar-but-different pause_before_connecting communicator option. Set this to a long enough wait so that it waits until after you domain join and this should allow you to refresh your winrm connection. There was a recent bug I fixed in that option so you may want to wait to try this until the 1.6.2 release later this week, or use the "nightly" release: https://github.com/hashicorp/packer/releases/tag/nightly

@jcarasick-geog8r
Copy link

I have the same (or similar) problem. I have a packer build file trying to provision a windows 2019 aws instance. If i use the "Administrator" account that comes with the EC2 as the winrm user and allow packer to determine the password, the provisioning executes in a serial fashion. If i create a local user in the Administrators group, and use that username/password in the winrm user and password, i get the provisioners that don't seem to complete (they seem to literally just stop executing or get cutoff), logs do not track any failure the process just stops. If i add in to the build file, "pause_before": "60m", for example in the following step, the previous provisioner will complete what seems to be successfully.

@kintarowins
Copy link

I'm running into a very similar problem and adding "windows-restart" provisioner AFTER my dependencies installation script seems to be helping. This was very annoying as it doesn't show you any messages or clues as to why it just skips straight to finishing. Took me almost a day of trial and error to finally found this thread.

@FlorianHeigl
Copy link

FlorianHeigl commented Sep 1, 2022 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants