Session Manager SSH hanging on shell provisioner #10584

Closed
artis3n opened this issue Feb 7, 2021 · 2 comments

artis3n commented Feb 7, 2021

Overview of the Issue

I am attempting to create an AMI using the amazon-ebs builder. My provisioners are an Ansible playbook, a shell provisioner that reboots the server, and a final shell provisioner that verifies everything is OK after the reboot. With ssh_interface set to session_manager, Packer hangs while trying to open a new SSH session for the final shell provisioner (the first two work fine). While Packer hangs locally, I can still start a new Session Manager session to the Packer builder instance through the AWS console.

If I change ssh_interface to public_ip, the AMI build completes in ~31 minutes. With session_manager, the hang occurs consistently at the same place.
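
For reference, the workaround described above amounts to a one-line change in the source block. This is a minimal sketch, not a standalone file: it assumes every other setting stays exactly as in the build file posted in this issue, and the elided settings are only indicated by a comment.

```hcl
source "amazon-ebs" "wiki" {
  # ... all other settings unchanged from the build file in this issue ...

  # Hangs when opening the SSH session for the final shell provisioner:
  # ssh_interface = "session_manager"

  # Build completes (about 31 minutes in my case):
  ssh_interface = "public_ip"
}
```

With public_ip, Packer connects over SSH to the instance's public address directly instead of tunnelling through Session Manager.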

This seems materially different from these existing issues with similar-sounding titles: #10424, #10508

Reproduction Steps

  1. Create the Packer file. If it matters, I am using the new HCL2 format.
  2. Run PACKER_LOG=1 packer build wiki.pkr.hcl
  3. Observe that the build hangs at the final shell provisioner:
==> Personal Wiki.amazon-ebs.wiki: Pausing 1m0s before the next provisioner...
==> Personal Wiki.amazon-ebs.wiki: Provisioning with shell script: /tmp/packer-shell694354012
2021/02/06 20:00:40 packer-provisioner-shell plugin: Opening /tmp/packer-shell694354012 for reading
2021/02/06 20:00:40 packer-provisioner-shell plugin: [INFO] 66 bytes written for 'uploadData'
2021/02/06 20:00:40 [INFO] 66 bytes written for 'uploadData'
2021/02/06 20:00:40 packer-builder-amazon-ebs plugin: [DEBUG] Opening new ssh session

Packer version

➜ packer version              
Packer v1.6.6

Simplified Packer Buildfile

Latest version of the file can be found here.
I receive the error with the file included below; it is reproduced here in case I materially change the linked file after creating this issue.

Packer HCL2 setup
source "amazon-ebs" "wiki" {
  access_key              = var.aws_access_key
  secret_key              = var.aws_secret_key
  ami_description         = "Gollum wiki hosted on AWS"
  ami_name                = "${var.ami_name}-${local.timestamp}"
  ami_virtualization_type = "hvm"
  iam_instance_profile    = var.iam_instance_profile
  instance_type           = var.instance_type[var.architecture]
  region                  = var.aws_region
  ssh_interface           = "session_manager"
  ssh_username            = var.ec2_username

  launch_block_device_mappings {
    delete_on_termination = true
    device_name           = "/dev/xvda"
    encrypted             = true
    kms_key_id            = var.kms_key_id_or_alias
    volume_size           = var.disk_size
    volume_type           = var.disk_type
    throughput            = var.disk_throughput
    iops                  = var.disk_iops
  }

  source_ami_filter {
    filters = {
      architecture        = var.architecture
      name                = "amzn2-ami-hvm*"
      root-device-type    = "ebs"
      virtualization-type = "hvm"
    }
    most_recent = true
    owners      = ["amazon"]
  }

  tags = {
    Base_AMI      = "{{ .SourceAMI }}"
    Base_AMI_Name = "{{ .SourceAMIName }}"
  }
}

build {
  sources = ["source.amazon-ebs.wiki"]
  name    = "Personal Wiki"

  provisioner "shell" {
    inline = [
      "while [ ! -f /var/lib/cloud/instance/boot-finished ]; do echo 'Waiting for cloud-init...'; sleep 1; done",
      "echo Beginning to build ${build.ID}",
      "echo Connected via SSM at '${build.User}@${build.Host}:${build.Port}'"
    ]
  }

  provisioner "shell" {
    inline = [
      "sudo yum update -y",
      "sudo yum install -y python3 python3-pip python3-wheel python3-setuptools coreutils shadow-utils yum-utils"
    ]
  }

  provisioner "ansible" {
    galaxy_file      = "packer/ansible/requirements.yml"
    host_alias       = "wiki"
    playbook_file    = "packer/ansible/main.yml"
    user             = var.ec2_username
    ansible_env_vars = ["ANSIBLE_VAULT_PASSWORD_FILE=${var.ansible_vault_pwd_file}"]
  }

  provisioner "shell" {
    inline            = ["sudo reboot"]
    expect_disconnect = true
  }

  provisioner "shell" {
    inline       = ["echo ${build.ID} rebooted, done provisioning"]
    pause_before = "1m"
  }

}

# "timestamp" template function replacement
locals { timestamp = regex_replace(timestamp(), "[- TZ:]", "") }

variable "ec2_username" {
  type        = string
  description = "The username of the default user on the EC2 instance."
  default     = "ec2-user"
}

variable "ami_name" {
  type        = string
  description = "The name of the AMI that gets generated."
  default     = "packer-gollum-wiki"
}

variable "architecture" {
  type        = string
  description = "The type of source AMI architecture: either x86_64 or arm64."
  default     = "arm64"
}

variable "aws_access_key" {
  type        = string
  description = "AWS_ACCESS_KEY_ID env var."
  default     = env("AWS_ACCESS_KEY_ID")
}

variable "aws_region" {
  type        = string
  description = "The AWS region to create the image in. Defaults to us-east-2."
  default     = "us-east-2"
}

variable "aws_secret_key" {
  type        = string
  description = "AWS_SECRET_ACCESS_KEY env var."
  default     = env("AWS_SECRET_ACCESS_KEY")
  sensitive   = true
}

variable "disk_size" {
  type        = number
  description = "The size of the EBS volume to create."
  default     = 15
}

variable "disk_type" {
  type        = string
  description = "The type of EBS volume to create. Defaults to gp3."
  default     = "gp3"
}

variable "disk_throughput" {
  type        = number
  description = "The MB/s of throughput for the EBS volume. For GP3 volumes, this defaults to 125."
  default     = 125
}

variable "disk_iops" {
  type        = number
  description = "The IOPS for the EBS volume. For GP3 volumes, this defaults to 3000."
  default     = 3000
}

variable "iam_instance_profile" {
  type        = string
  default     = "AmazonSSMRoleForInstancesQuickSetup"
  description = "IAM instance profile configured for AWS Session Manager. Defaults to the default AWS role for Session Manager."
}

variable "instance_type" {
  type        = map(string)
  description = "The type of EC2 instance to create. Defaults are set for x86_64 and arm64 architectures. Overwrite the one that you want by architecture."
  default = {
    "x86_64" : "t3.micro",
    "arm64" : "t4g.micro"
  }
}

variable "kms_key_id_or_alias" {
  type        = string
  description = "The KMS key ID or alias to encrypt the AMI with. Defaults to the default EBS key alias."
  default     = "alias/aws/ebs"
}

variable "ansible_vault_pwd_file" {
  type        = string
  description = "The relative or absolute path to the Ansible Vault password file."
  default     = env("ANSIBLE_VAULT_PASSWORD_FILE")
}

Operating system and Environment details

➜ cat /etc/lsb-release                 
DISTRIB_ID=Pop
DISTRIB_RELEASE=20.10
DISTRIB_CODENAME=groovy
DISTRIB_DESCRIPTION="Pop!_OS 20.10"

Pop!_OS is Ubuntu-based and should be equivalent to Ubuntu here.

Log Fragments and crash.log files

2021/02/06 19:59:40 [INFO] (telemetry) Starting provisioner shell
==> Personal Wiki.amazon-ebs.wiki: Pausing 1m0s before the next provisioner...
==> Personal Wiki.amazon-ebs.wiki: Provisioning with shell script: /tmp/packer-shell694354012
2021/02/06 20:00:40 packer-provisioner-shell plugin: Opening /tmp/packer-shell694354012 for reading
2021/02/06 20:00:40 packer-provisioner-shell plugin: [INFO] 66 bytes written for 'uploadData'
2021/02/06 20:00:40 [INFO] 66 bytes written for 'uploadData'
2021/02/06 20:00:40 packer-builder-amazon-ebs plugin: [DEBUG] Opening new ssh session

After I pressed Ctrl+C to end the command, I got the following logs. I am not sure whether this output is expected after a user interrupt or whether it contains anything useful.

2021/02/06 20:54:30 packer-provisioner-shell plugin: Received interrupt signal (count: 1). Ignoring.
Cancelling build after receiving interrupt
2021/02/06 20:54:30 packer-provisioner-shell-local plugin: Received interrupt signal (count: 1). Ignoring.
2021/02/06 20:54:30 Cancelling builder after context cancellation context canceled
    Personal Wiki.amazon-ebs.wiki: Terminate signal received, exiting.
2021/02/06 20:54:30 packer-provisioner-shell plugin: Received interrupt signal (count: 1). Ignoring.
2021/02/06 20:54:30 packer-builder-amazon-ebs plugin: Received interrupt signal (count: 1). Ignoring.
2021/02/06 20:54:30 packer-provisioner-ansible plugin: Received interrupt signal (count: 1). Ignoring.
2021/02/06 20:54:30 packer-provisioner-shell plugin: Received interrupt signal (count: 1). Ignoring.
==> Personal Wiki.amazon-ebs.wiki: Terminating the source AWS instance...
2021/02/06 20:54:30 packer-builder-amazon-ebs plugin: [ERROR] ssh session open error: 'ssh: unexpected packet in response to channel open: <nil>', attempting reconnect
2021/02/06 20:54:30 packer-builder-amazon-ebs plugin: [DEBUG] reconnecting to TCP connection for SSH
2021/02/06 20:54:30 packer-builder-amazon-ebs plugin: Cancelling provisioning due to context cancellation: context canceled
2021/02/06 20:54:30 packer-builder-amazon-ebs plugin: Cancelling hook after context cancellation context canceled
    Personal Wiki.amazon-ebs.wiki: Exiting session with sessionId: terraform-0a5c79bb8713db77e.
2021/02/06 20:54:30 Cancelling provisioner after context cancellation context canceled
2021/02/06 20:54:30 packer-builder-amazon-ebs plugin: [DEBUG] handshaking with SSH
    Personal Wiki.amazon-ebs.wiki: Cannot perform start session: write tcp 192.168.1.162:59110->52.95.19.43:443: write: broken pipe
2021/02/06 20:54:30 packer-provisioner-shell plugin: Retryable error: Error uploading script: ssh: handshake failed: read tcp 127.0.0.1:36674->127.0.0.1:8772: read: connection reset by peer
2021/02/06 20:54:30 [INFO] (telemetry) ending shell
==> Personal Wiki.amazon-ebs.wiki: Cleaning up any extra volumes...
==> Personal Wiki.amazon-ebs.wiki: No volumes to clean up, skipping
==> Personal Wiki.amazon-ebs.wiki: Deleting temporary security group...
==> Personal Wiki.amazon-ebs.wiki: Deleting temporary keypair...
2021/02/06 20:55:17 [INFO] (telemetry) ending 
==> Wait completed after 1 hour 22 minutes
2021/02/06 20:55:17 [INFO] (telemetry) Finalizing.

ghost commented Mar 29, 2021

This issue has been automatically migrated to hashicorp/packer-plugin-amazon#28 because it looks like an issue with that plugin. If you believe this is not an issue with the plugin, please reply to hashicorp/packer-plugin-amazon#28.

ghost commented Apr 29, 2021

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

ghost locked as resolved and limited conversation to collaborators Apr 29, 2021
This issue was closed.