Provider Install Failure when using Azure CloudShell's CloudDrive #17115

Closed
WodansSon opened this issue Jan 16, 2018 · 7 comments

@WodansSon

WodansSon commented Jan 16, 2018

Terraform Version

terraform v0.11.0

Configuration File

# Configure the Azure Provider
provider "azurerm" { }

Crash Output

jeffrey@Azure:~/clouddrive$ terraform init
 
Initializing provider plugins...
- Checking for available provider plugins on https://releases.hashicorp.com...
- Downloading plugin for provider "azurerm" (1.1.0)...
 
Error installing provider "azurerm": chmod .terraform/plugins/linux_amd64/terraform-provider-azurerm_v1.0.1_x4: operation not permitted.

Expected Behavior

Provider should have installed without error.

Actual Behavior

Installing provider error occurred.

Steps to Reproduce

  1. terraform init

Additional Context

This issue happens when using the Azure CloudShell's CloudDrive. The CloudDrive is an SMB-mounted volume where chmod is not supported, which causes this error when installing any provider on the CloudShell instance. This appears to be an issue in the hashicorp/go-getter code base, which I believe could be fixed by either:

  • Checking the drive type/permissions on the resource before issuing the CHMOD command.
  • Displaying a more descriptive error message of what the actual issue is, meaning that the resource is an SMB drive which doesn't support file system modifications.
@apparentlymart
Contributor

Hi @jeffreyCline! Sorry for this annoying error and thanks for reporting it.

I just reproduced this in Cloud Shell when running in a directory under /usr/team/clouddrive.

Terraform here is trying to apply the executable mode to the plugin binary so it can be executed as a child process of Terraform. Indeed, it works when run in the home directory of the Cloud Shell user, creating a binary with the executable flag set.

I see that the mount settings for /usr/team/clouddrive include the following relevant settings:

uid=0,noforceuid,gid=0,noforcegid,file_mode=0777,dir_mode=0777,nounix,mapposix

The file_mode setting here does seem like it'd allow the plugin binaries to execute if they were installed without setting the executable mode, and indeed it seems like that works if I copy my already-initialized work tree into the clouddrive directory:

team@Azure:~/clouddrive$ cp -r ~/terraform-config .
team@Azure:~/clouddrive$ cd terraform-config/
team@Azure:~/clouddrive/terraform-config$ ls -lah .terraform/plugins/linux_amd64/
total 12M
drwxrwxrwx 2 root root   0 Jan 17 01:03 .
drwxrwxrwx 2 root root   0 Jan 17 01:03 ..
-rwxrwxrwx 1 root root  80 Jan 17 01:03 lock.json
-rwxrwxrwx 1 root root 12M Jan 17 01:03 terraform-provider-null_v1.0.0_x4

So with all of this said, a possible fix here would be to test whether the extracted file already has the executable mode set before trying to set it. This is a little tricky because on a different system the forced mode might be something more ambiguous, like 0755, which has an executable bit set but leaves two of them unset. I think we'll need to experiment a bit to see what makes the most sense here.
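
A minimal Go sketch of such a pre-check follows. It shows two possible readings of "already executable", since which one is appropriate is exactly the open question; both function names are hypothetical and neither comes from go-getter.

```go
package main

import (
	"fmt"
	"io/fs"
)

// anyExecutable reports whether at least one execute bit
// (user, group or other) is set in the permission bits.
func anyExecutable(mode fs.FileMode) bool {
	return mode.Perm()&0o111 != 0
}

// allExecutable reports whether all three execute bits are set.
func allExecutable(mode fs.FileMode) bool {
	return mode.Perm()&0o111 == 0o111
}

func main() {
	// Example modes chosen for illustration only.
	for _, m := range []fs.FileMode{0o777, 0o744, 0o644} {
		fmt.Printf("%v any=%t all=%t\n", m, anyExecutable(m), allExecutable(m))
	}
}
```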

An extra tricky detail here is that it looks like this mode is coming from the archive metadata, and thus go-getter doesn't really "know" that the goal is just to set the executable bit, and is instead trying to apply verbatim the mode given in the archive. We could investigate here a more complex behavior where it checks if the current mode is at least as permissive as the requested mode, but we need to tread carefully here because go-getter is also used by other programs such as Nomad, and they may have different needs. (Nomad in particular uses it to download artifacts into a job filesystem, where exact compliance with the requested mode is generally desirable.)
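
The "at least as permissive" comparison could be sketched in Go as below. This is an assumption about how such a check might be written, not go-getter's behavior; ensureMode and atLeastAsPermissive are hypothetical names.

```go
package main

import (
	"fmt"
	"io/fs"
	"os"
)

// atLeastAsPermissive reports whether every permission bit
// requested in want is also set in have.
func atLeastAsPermissive(have, want fs.FileMode) bool {
	return have.Perm()&want.Perm() == want.Perm()
}

// ensureMode is a hypothetical helper: it calls chmod only when the file's
// current mode does not already grant everything the archive asked for.
func ensureMode(path string, want fs.FileMode) error {
	info, err := os.Stat(path)
	if err != nil {
		return err
	}
	if atLeastAsPermissive(info.Mode(), want) {
		return nil // nothing to do, so chmod is never attempted
	}
	return os.Chmod(path, want)
}

func main() {
	fmt.Println(atLeastAsPermissive(0o777, 0o755))
	fmt.Println(atLeastAsPermissive(0o644, 0o755))
}
```

A caller that needs exact compliance with the requested mode, as described for Nomad above, would not want this relaxed comparison, so it would have to be opt-in.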

@WodansSon
Author

WodansSon commented Jan 18, 2018

@apparentlymart Another possible idea for a fix would be to move the check up into the Terraform code base. That way we wouldn't have to worry about regressions in the go-getter project due to code changes. We could add some additional validation checks in the get.go Get function when it calls i.install and an error is returned.

When an error is returned we could do the below additional checks:

  • Verify the provider was actually downloaded in to the directory
  • Verify the executable bit has been set on the provider file or is at least as permissive as the requested mode
  • Update the error message to a more descriptive error message depending on what error is returned from the install call

If all of the checks pass, we could safely ignore the error and continue execution; otherwise we would return PluginMeta{}, err as in the current behavior.

Would you by any chance have an ETA for this fix? It is a blocking issue for many of our core CloudShell scenarios.

@apparentlymart
Contributor

I notice the reference here from Azure/terraform-azurerm-network#10 which raises the fact that this is a more general problem than just plugin installation: anything done with go-getter (which also includes module installation) would run into this issue, because go-getter expects that (when it's running on Linux, at least) the underlying filesystem will support Unix-style modes on directory entries.

go-getter's goal here, then, is to make the on-disk mode match the mode given in the archive, or fail if it cannot accurately represent the archive contents. (One might also wonder, for example, what happens if a plugin or module archive contains a file whose name is valid within usual Unix naming but invalid on Windows, such as containing a question mark.)

Since I think any fix here must support both provider and module installation (since both are key features of Terraform that Cloud Shell users will presumably want), it seems like the only reasonable path here would be to make that chmod failure be silently ignored rather than an error, and then have Terraform check after the fact whether an installed plugin binary is marked as executable.

For modules, this would mean that any module containing executable files (e.g. a script intended to be run using the external data source or a local-exec provisioner) may fail to execute if the working directory is on a filesystem that doesn't allow the executable bit to be set, but it's unlikely the user would get that far because the plugin installer would have already failed, having detected that some provider cannot be run.

This breaking change to go-getter feels a little more palatable since it will continue to work the same in all non-error cases. In situations where we would previously have returned that explicit chmod error, instead in most cases there will be some downstream behavior when the calling application tries to do something with the extracted files, which is worse UX (now the user must make the mental leap that they are working on a filesystem that doesn't support chmod, rather than it being explicit) but not too hard for the calling application to either compensate for or document.

I'm not able to give an ETA on fixing this at the moment since we need to first discuss the above proposal with the teams working on other go-getter-calling applications, but if you raise it via the usual internal communication channels then I'm sure we can prioritize this relative to other Azure-specific work.

@metacpp

metacpp commented Feb 9, 2018

@apparentlymart any update on this issue? Customers have asked us about it again and again.

@WodansSon
Author

@apparentlymart I don't think there is anything for the HashiCorp engineers to do here. I think the root cause of this issue is the way the CloudShell team is mounting the cifs share. I looked at their bash file and they do not pass the noperm argument. I will work with the CloudShell team to add this argument to the mount call, which should solve this issue.
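
As a quick way to confirm whether a given mount carries that option, a minimal Go sketch is shown below. hasMountOption is a hypothetical helper, and the option string is the one quoted earlier in this thread; how the string is obtained (for example from the mount table) is left out.

```go
package main

import (
	"fmt"
	"strings"
)

// hasMountOption reports whether a comma-separated mount option string
// contains the given option as an exact entry.
func hasMountOption(options, name string) bool {
	for _, opt := range strings.Split(options, ",") {
		if strings.TrimSpace(opt) == name {
			return true
		}
	}
	return false
}

func main() {
	// Options quoted earlier in this thread for the clouddrive mount.
	opts := "uid=0,noforceuid,gid=0,noforcegid,file_mode=0777,dir_mode=0777,nounix,mapposix"
	fmt.Println("noperm present:", hasMountOption(opts, "noperm"))
}
```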

@apparentlymart
Contributor

Thanks for following up, @jeffreyCline!

Given that outcome, I'm going to close this issue for now. It does sound like noperm would do the trick here.

@ghost

ghost commented Apr 4, 2020

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@ghost ghost locked and limited conversation to collaborators Apr 4, 2020