
Remote Exec Failing Even After Successful Execution of the Script #18517

Closed
ponvino opened this issue Jul 23, 2018 · 10 comments
Labels
bug · provisioner/remote-exec · v0.11 (issues, primarily bugs, reported against v0.11 releases) · waiting for reproduction (unable to reproduce issue without further information)

Comments

@ponvino

ponvino commented Jul 23, 2018

Terraform Version

Terraform v0.11.7

* provider.null: version = "~> 1.0"
* provider.template: version = "~> 1.0"

Terraform Configuration Files

 resource "null_resource" "pr13_remote_exec_0" {
    count = "1"
    
    provisioner "file" {
        content      = "${element(data.template_file.pr13_template_0.*.rendered, count.index)}"
        destination  = "/tmp/remote-exec.sh"
        connection {
            type     = "ssh"
            user     = "povijayan"
            private_key = "${file("/Users/povijayan/.ssh/id_rsa")}"
            host     = "${element(var.hosts_0,count.index)}"
        }
    }

    provisioner "remote-exec" {
        inline = [
            "sudo mkdir -p /opt/test/remote-exec-scripts",
            "sudo cp -Rvf /tmp/remote-exec.sh /opt/test/remote-exec-scripts/",
            "sudo chmod +x /opt/test/remote-exec-scripts/remote-exec.sh",
            "sudo /opt/test/remote-exec-scripts/remote-exec.sh"
        ]
        connection {
            type     = "ssh"
            user     = "povijayan"
            private_key = "${file("/Users/povijayan/.ssh/id_rsa")}"
            host     = "${element(var.hosts_0,count.index)}"
            timeout = "30m"
        }
    }
}

Debug Output

Error: Error applying plan:

1 error(s) occurred:

  • null_resource.pr13_remote_exec_0: error executing "/tmp/terraform_1857918131.sh": wait: remote command exited without exit status or exit signal

Expected Behavior

Terraform's remote-exec provisioner should complete without errors, since the script it runs exits with proper exit codes.

Actual Behavior

The script we execute with the above configuration fails with the error above. When we check the script's execution log, it ran fine and ended with a proper exit code, yet remote-exec still reports a failure.

We can reproduce this with a simple script like the one below (though not consistently). We cannot understand the reason for this error. Please help.

#!/bin/bash
# Mirror all script output to /var/log/test.log and to syslog (tag "test");
# logger's -s copy of the messages is sent to the console.
exec > >(tee /var/log/test.log|logger -t test -s 2>/dev/console) 2>&1

sleep 10m

exit 0

Steps to Reproduce

  1. terraform init
  2. terraform apply
@ghost

ghost commented Aug 22, 2018

This could be due to general SSH keepalive logic (http://tldp.org/HOWTO/TCP-Keepalive-HOWTO/overview.html).

If there is no response from your bash script within a certain time, the connection is dropped by the keepalive logic. The script continues to run on the server, but you can no longer see the result.

To solve it, you can change the sshd_config file (/etc/ssh/sshd_config) on your server using the ClientAliveInterval and ClientAliveCountMax parameters.
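
For reference, the two directives go in /etc/ssh/sshd_config; the values below are illustrative, not required, and sshd must be restarted afterwards for the change to take effect:

ClientAliveInterval 120
ClientAliveCountMax 720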

My provider, DigitalOcean, supports cloud-init (https://cloudinit.readthedocs.io/en/latest/). I send cloud-config data in the user_data parameter (which works on DigitalOcean droplets) to rewrite the sshd_config file during cloud initialization. For example:

#cloud-config
write_files:
  - content: |
        ...
        ...
        ...
        ClientAliveInterval 120
        ClientAliveCountMax 720
    path: /etc/ssh/sshd_config

With these values, an idle client stays connected for 120 × 720 = 86,400 seconds (one day): the server sends a keepalive probe every 120 seconds and allows up to 720 of them to go unanswered before dropping the connection. I think cloud-init is the best way to solve this problem.

If your provider does not support this feature, you can achieve the same with provisioners. For example:

resource "<YOUR_PROVIDER_OR_NULL_RESOURCE>" "<RESOURCE_NAME>" {
    ...
    ...
    ...

    provisioner "file" {
        destination = "/etc/ssh/sshd_config"
        source      = "<YOUR_SSHD_CONFIG_FILE_PATH>"
    }

    provisioner "remote-exec" {
        inline = [
            "systemctl restart sshd", # This works on CentOS; if you use another OS, you may need a different command.
        ]
    }
}

resource "null_resource" "<RESOURCE_NAME>" {
    connection {
        ...
        ...
        ...
    }

    provisioner "remote-exec" {
        inline = [
             # These commands are just for testing; replace them with your own.
             "sleep 10m",
             "echo COMPLETED"
        ]
    }
}
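
A lighter-weight variant of the same idea, if you would rather not ship a whole sshd_config file, is to append just the two directives over remote-exec. This is only a sketch, assuming a systemd-based host, a user with passwordless sudo, and hypothetical placeholder values:

resource "null_resource" "sshd_keepalive" {
    connection {
        type        = "ssh"
        user        = "<YOUR_USER>"
        private_key = "${file("<YOUR_KEY_PATH>")}"
        host        = "<YOUR_HOST>"
    }

    provisioner "remote-exec" {
        inline = [
            # Append the keepalive directives (illustrative values), then restart sshd.
            "echo 'ClientAliveInterval 120' | sudo tee -a /etc/ssh/sshd_config",
            "echo 'ClientAliveCountMax 720' | sudo tee -a /etc/ssh/sshd_config",
            "sudo systemctl restart sshd",
        ]
    }
}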

I hope this solves your problem.

@hashibot hashibot added the v0.11 Issues (primarily bugs) reported against v0.11 releases label Aug 29, 2019
@lancerkind
Contributor

So I'm assuming I'd need to make the change on the server in a separate, short remote-exec first, to prepare for my complicated provisioning script, which will run in the next remote-exec session (a sketch of that two-step approach follows below).
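
A minimal sketch of that two-step approach in 0.11-style syntax (the resource names here are hypothetical, and the prep step reuses the sshd_config tweak from the comment above):

resource "null_resource" "prep_sshd" {
    connection {
        type        = "ssh"
        user        = "<YOUR_USER>"
        private_key = "${file("<YOUR_KEY_PATH>")}"
        host        = "<YOUR_HOST>"
    }

    # Short first session: raise the keepalive limits before the long step.
    provisioner "remote-exec" {
        inline = [
            "echo 'ClientAliveInterval 120' | sudo tee -a /etc/ssh/sshd_config",
            "echo 'ClientAliveCountMax 720' | sudo tee -a /etc/ssh/sshd_config",
            "sudo systemctl restart sshd",
        ]
    }
}

resource "null_resource" "long_provisioning" {
    # Runs only after the sshd tweak has been applied.
    depends_on = ["null_resource.prep_sshd"]

    connection {
        ...
    }

    provisioner "remote-exec" {
        inline = [
            "<YOUR_LONG_RUNNING_SCRIPT>"
        ]
    }
}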

@danieldreier
Contributor

If I'm understanding the history on this correctly, this appears to be due to the lack of SSH keepalive in 0.11.x. Keepalive support has since been added, so I don't think this should happen anymore. If anyone is still seeing this behavior, please leave a note here, ideally with a reproduction case on 0.13.x or the 0.14.0 pre-releases. Otherwise, I'll close this around the second 0.14 beta and consider it resolved.

@danieldreier danieldreier added the waiting for reproduction unable to reproduce issue without further information label Oct 14, 2020
@gsinghab2

I am still seeing this issue with RedHat 7.

@bit-factor

I do. Ubuntu 20.04.1 LTS (Focal Fossa)

@KesavanKing

I too face the issue.

@anupugalavat

Hello,

I am also facing the same issue on Ubuntu 18. Can you please look into it ASAP?

@marinomeneghel

marinomeneghel commented Jan 1, 2021

Also experiencing this issue on Ubuntu 20.04.
In my case, the cause seems to be a combination of docker run and docker exec as the last two commands in remote-exec. If I remove the docker exec command, the issue does not occur, which makes the keepalive behavior mentioned above sound like the culprit.

In case anyone still has issues after the keepalive changes, I noticed that explicitly exiting with status 0 at the end of the remote-exec block also works, though I'm not sure whether it might cause false positives in some cases:

  provisioner "remote-exec" {
    inline = [
      ....
      "exit 0"
    ]
  }

EDIT: @ersoyfilinte's solution of setting ClientAliveInterval and ClientAliveCountMax through cloud-config worked for me!

@danieldreier
Contributor

I want to apologize for the slow response time on this issue, and also let you know that I am bulk-closing all issues exclusively reported against Terraform 0.11.x, including this issue, because we are no longer investigating issues reported against Terraform 0.11.x. In most cases, when we try to reproduce issues reported against 0.11, we either can't reproduce them anymore, or the reporter has moved on, so we believe we can better support the Terraform user community by prioritizing more recent issues.

Terraform 0.12 has been available since May of 2019, and there are really significant benefits to adopting it. I know that migrating from 0.11 to versions past 0.12 can require a bit of effort, but it really is worth it, and the upgrade path is pretty well understood in the community by now. 0.14 is available and stable, and we are quickly approaching an 0.15 release.

We have made a significant effort in the last year to stay on top of bug reports; we have triaged almost all new bug reports within 1-2 weeks for 6+ months now. If you are still experiencing this problem, please submit a new bug report with a reproduction case that works on 0.14.x, link this old issue for context, and we will triage it.

@ghost

ghost commented Feb 27, 2021

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@ghost ghost locked as resolved and limited conversation to collaborators Feb 27, 2021