
Remote Exec Failing Even After Successful Execution of the Script #18517

Closed
ponvino opened this issue Jul 23, 2018 · 10 comments
Labels
bug · provisioner/remote-exec · v0.11 (issues, primarily bugs, reported against v0.11 releases) · waiting for reproduction (unable to reproduce issue without further information)

Comments

@ponvino

ponvino commented Jul 23, 2018

Terraform Version

Terraform v0.11.7

* provider.null: version = "~> 1.0"
* provider.template: version = "~> 1.0"

Terraform Configuration Files

 resource "null_resource" "pr13_remote_exec_0" {
    count = "1"
    
    provisioner "file" {
        content      = "${element(data.template_file.pr13_template_0.*.rendered, count.index)}"
        destination  = "/tmp/remote-exec.sh"
        connection {
            type     = "ssh"
            user     = "povijayan"
            private_key = "${file("/Users/povijayan/.ssh/id_rsa")}"
            host     = "${element(var.hosts_0,count.index)}"
        }
    }

    provisioner "remote-exec" {
        inline = [
            "sudo mkdir -p /opt/test/remote-exec-scripts",
            "sudo cp -Rvf /tmp/remote-exec.sh /opt/test/remote-exec-scripts/",
            "sudo chmod +x /opt/test/remote-exec-scripts/remote-exec.sh",
            "sudo /opt/test/remote-exec-scripts/remote-exec.sh"
        ]
        connection {
            type     = "ssh"
            user     = "povijayan"
            private_key = "${file("/Users/povijayan/.ssh/id_rsa")}"
            host     = "${element(var.hosts_0,count.index)}"
            timeout = "30m"
        }
    }
}

Debug Output

Error: Error applying plan:

1 error(s) occurred:

  • null_resource.pr13_remote_exec_0: error executing "/tmp/terraform_1857918131.sh": wait: remote command exited without exit status or exit signal

Expected Behavior

Terraform's remote-exec provisioner should complete without errors, since the script it runs exits with proper exit codes.

Actual Behavior

The script we execute with the above configuration fails with the error above. When we check the script's execution log, it ran fine and ended with a proper exit code, yet remote-exec still reports a failure.

We can reproduce this with a simple script like the one below (though not consistently). We cannot understand the reason for this error. Please help.

#!/bin/bash
# Mirror all script output to /var/log/test.log and to syslog (tag "test");
# logger's -s copy of the messages is sent to the console.
exec > >(tee /var/log/test.log|logger -t test -s 2>/dev/console) 2>&1

sleep 10m

exit 0

Steps to Reproduce

  1. terraform init
  2. terraform apply
@ghost

ghost commented Aug 22, 2018

This could be due to general SSH keepalive logic (http://tldp.org/HOWTO/TCP-Keepalive-HOWTO/overview.html).

If there is no response from your bash script within a certain time, the connection is dropped by the keepalive logic. The script continues to run on the server, but you can no longer see the result.

To solve it, you can change the sshd_config file (/etc/ssh/sshd_config) on your server using the ClientAliveInterval and ClientAliveCountMax parameters.
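
For reference, the two directives go in /etc/ssh/sshd_config; the values below are illustrative, not required, and sshd must be restarted afterwards for the change to take effect:

ClientAliveInterval 120
ClientAliveCountMax 720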

My provider, DigitalOcean, supports cloud-init (https://cloudinit.readthedocs.io/en/latest/). I send cloud-config data in the user_data parameter (which works on DigitalOcean droplets) to rewrite the sshd_config file during cloud initialization. For example:

#cloud-config
write_files:
  - content: |
        ...
        ...
        ...
        ClientAliveInterval 120
        ClientAliveCountMax 720
    path: /etc/ssh/sshd_config

With these values, an idle client stays connected for 120 × 720 = 86,400 seconds (one day): the server sends a keepalive probe every 120 seconds and allows up to 720 of them to go unanswered before dropping the connection. I think cloud-init is the best way to solve this problem.

If your provider does not support this feature, you can achieve the same with provisioners. For example:

resource "<YOUR_PROVIDER_OR_NULL_RESOURCE>" "<RESOURCE_NAME>" {
    ...
    ...
    ...

    provisioner "file" {
        destination = "/etc/ssh/sshd_config"
        source      = "<YOUR_SSHD_CONFIG_FILE_PATH>"
    }

    provisioner "remote-exec" {
        inline = [
            "systemctl restart sshd", # This works on CentOS; if you use another OS, you may need a different command.
        ]
    }
}

resource "null_resource" "<RESOURCE_NAME>" {
    connection {
        ...
        ...
        ...
    }

    provisioner "remote-exec" {
        inline = [
             # These commands are just for testing; replace them with your own.
             "sleep 10m",
             "echo COMPLETED"
        ]
    }
}
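
A lighter-weight variant of the same idea, if you would rather not ship a whole sshd_config file, is to append just the two directives over remote-exec. This is only a sketch, assuming a systemd-based host, a user with passwordless sudo, and hypothetical placeholder values:

resource "null_resource" "sshd_keepalive" {
    connection {
        type        = "ssh"
        user        = "<YOUR_USER>"
        private_key = "${file("<YOUR_KEY_PATH>")}"
        host        = "<YOUR_HOST>"
    }

    provisioner "remote-exec" {
        inline = [
            # Append the keepalive directives (illustrative values), then restart sshd.
            "echo 'ClientAliveInterval 120' | sudo tee -a /etc/ssh/sshd_config",
            "echo 'ClientAliveCountMax 720' | sudo tee -a /etc/ssh/sshd_config",
            "sudo systemctl restart sshd",
        ]
    }
}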

I hope this solves your problem.

@hashibot hashibot added the v0.11 Issues (primarily bugs) reported against v0.11 releases label Aug 29, 2019
@lancerkind
Contributor

So I'm assuming I'd need to make the change on the server in a separate, short remote-exec first, to prepare for my complicated provisioning script, which will run in the next remote-exec session (a sketch of that two-step approach follows below).
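
A minimal sketch of that two-step approach in 0.11-style syntax (the resource names here are hypothetical, and the prep step reuses the sshd_config tweak from the comment above):

resource "null_resource" "prep_sshd" {
    connection {
        type        = "ssh"
        user        = "<YOUR_USER>"
        private_key = "${file("<YOUR_KEY_PATH>")}"
        host        = "<YOUR_HOST>"
    }

    # Short first session: raise the keepalive limits before the long step.
    provisioner "remote-exec" {
        inline = [
            "echo 'ClientAliveInterval 120' | sudo tee -a /etc/ssh/sshd_config",
            "echo 'ClientAliveCountMax 720' | sudo tee -a /etc/ssh/sshd_config",
            "sudo systemctl restart sshd",
        ]
    }
}

resource "null_resource" "long_provisioning" {
    # Runs only after the sshd tweak has been applied.
    depends_on = ["null_resource.prep_sshd"]

    connection {
        ...
    }

    provisioner "remote-exec" {
        inline = [
            "<YOUR_LONG_RUNNING_SCRIPT>"
        ]
    }
}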

@danieldreier
Contributor

If I'm understanding the history on this correctly, this appears to be due to the lack of SSH keepalive in 0.11.x. Keepalive support has since been added, so I don't think this should happen anymore. If anyone is still seeing this behavior, please leave a note here, ideally with a reproduction case on 0.13.x or the 0.14.0 pre-releases. Otherwise, I'll close this around the second 0.14 beta and consider it resolved.

@danieldreier danieldreier added the waiting for reproduction unable to reproduce issue without further information label Oct 14, 2020
@gsinghab2

I am still seeing this issue with RedHat 7.

@bit-factor

I do. Ubuntu 20.04.1 LTS (Focal Fossa)

@KesavanKing

I too face the issue.

@anupugalavat

Hello,

I am also facing the same issue on Ubuntu 18. Can you please look into it ASAP?

@marinomeneghel

marinomeneghel commented Jan 1, 2021

Also experiencing this issue on Ubuntu 20.04.
In my case, the cause seems to be a combination of docker run and docker exec as the last two commands in remote-exec. If I remove the docker exec command, the issue does not occur, which makes the keepalive behavior mentioned above sound like the culprit.

In case anyone still has issues after the keepalive changes, I noticed that explicitly exiting with status 0 at the end of the remote-exec block also works, though I'm not sure whether it might cause false positives in some cases:

  provisioner "remote-exec" {
    inline = [
      ....
      "exit 0"
    ]
  }

EDIT: @ersoyfilinte's solution of setting ClientAliveInterval and ClientAliveCountMax through cloud-config worked for me!

@danieldreier
Contributor

I want to apologize for the slow response time on this issue, and also let you know that I am bulk-closing all issues exclusively reported against Terraform 0.11.x, including this issue, because we are no longer investigating issues reported against Terraform 0.11.x. In most cases, when we try to reproduce issues reported against 0.11, we either can't reproduce them anymore, or the reporter has moved on, so we believe we can better support the Terraform user community by prioritizing more recent issues.

Terraform 0.12 has been available since May of 2019, and there are really significant benefits to adopting it. I know that migrating from 0.11 to versions past 0.12 can require a bit of effort, but it really is worth it, and the upgrade path is pretty well understood in the community by now. 0.14 is available and stable, and we are quickly approaching an 0.15 release.

We have made a significant effort in the last year to stay on top of bug reports; we have triaged almost all new bug reports within 1-2 weeks for 6+ months now. If you are still experiencing this problem, please submit a new bug report with a reproduction case that works on 0.14.x, link this old issue for context, and we will triage it.

@ghost

ghost commented Feb 27, 2021

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@ghost ghost locked as resolved and limited conversation to collaborators Feb 27, 2021