Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Terraform can't access private providers from TFC organizations with underscores in their names #32694

Open
nfagerlund opened this issue Feb 15, 2023 · 12 comments · May be fixed by hashicorp/terraform-registry-address#20

Comments

@nfagerlund
Copy link
Collaborator

nfagerlund commented Feb 15, 2023

Terraform Version

1.4.0-beta2

Terraform Configuration Files

terraform {
  required_providers {
    fakerancher = {
      source  = "app.staging.terraform.io/example_corp/fakerancher"
      version = ">= 0.0.1"
    }
  }

  cloud {
    hostname = "app.staging.terraform.io"
    organization = "example_corp"
    workspaces {
      name = "beta-jam"
    }
  }
}

Debug Output

(Omitting, since I already found the code in question.)

Expected Behavior

Terraform should be able to download and use providers published in any valid organization namespace, from the public registry or from TFC/TFE's private registry.

Actual Behavior

If a TFC organization has underscores in its name (edit: or doubled hyphens --), Terraform cannot access any providers it publishes:

Initializing Terraform Cloud...
There are some problems with the configuration, described below.

The Terraform configuration must be valid before initialization so that
Terraform can determine which modules and providers need to be installed.
╷
│ Error: Invalid provider namespace
│
│   on main.tf line 4, in terraform:
│    4:       source  = "app.staging.terraform.io/example_corp/fakerancher"
│
│ Invalid provider namespace "" in source "app.staging.terraform.io/example_corp/fakerancher": must contain only
│ letters, digits, and dashes, and may not use leading or trailing dashes"
╵

(Note also the malformed error message, with an empty string in place of the provided namespace.)

I've tracked this down to tfaddr.ParseProviderPart, in the hashicorp/terraform-registry-address package. It uses idna hostname parsing rules to parse namespace and name segments, which I'm pretty sure is invalid given the semantics of those segments; in addition to excluding valid org names, it also flattens the case of the address, which can result in self-inflicted 404s for namespaces that include capital letters.

It should instead match the behavior of our own registries, which basically comes down to [0-9A-Za-z][\w\-]*[0-9A-Za-z].

Steps to Reproduce

  1. terraform init (that's all)

Additional Context

No response

References

No response

@nfagerlund nfagerlund added bug new new issue not yet triaged labels Feb 15, 2023
@nfagerlund
Copy link
Collaborator Author

WIP on a fix for this, but getting the issue filed before I move much further.

@jbardin
Copy link
Member

jbardin commented Feb 16, 2023

@nfagerlund, IIRC that limitation was intentional, because the namespace was required to be a valid GitHub organization name, which also disallows underscore (At least one reason is because the organization must also be a valid hostname for GitHub pages). There may be other considerations I'm not remembering, but this was at least intentional, though the empty error string is definitely a bug.

@nfagerlund
Copy link
Collaborator Author

nfagerlund commented Feb 16, 2023

@jbardin Ah, that clears up some of the mystery of why we did this. Thank you!

However, that "matches GitHub username" restriction isn't applicable to Terraform Cloud, which uses the TFC org name (underscores and all) as the registry namespace for private providers.

I think this still stands as a bug; having a secret "surprise" restriction on TFC org names (where we don't prevent you from doing it but we punish you for it by making your registry inaccessible) isn't really the move. And loosening the restriction in the name validator seems like a more pragmatic solution than retroactively outlawing underscores and renaming everyone's organizations.

@jbardin jbardin added upstream and removed new new issue not yet triaged labels Feb 16, 2023
@apparentlymart
Copy link
Member

It was more like "we need to find a reasonable baseline for what is allowed that fits well with a variety of systems" and GitHub was one source of consideration (due to how the public registry works, since this protocol was originally designed for that) but not the only consideration.

I believe the specific rule we implemented here was to reuse the rule for segments within a hostname because we already needed the logic for validating and normalizing those for the hostname portion anyway, and it has good support for non-latin scripts that folks outside the US might need to use, and it seemed like a reasonable thing to reuse because it had had lots of standards org scrutiny.

It's unfortunate that Terraform Cloud and Enterprise went in a different direction here. Clearly we should talk to each other more. 😖

I don't see any way around this other than to loosen it either, since I don't really see how we can retroactively make a "valid" organization name invalid, but that probably means we're going to need to copy a bunch of code out of the external IDNA library we use so that we can implement "the IDNA spec except for this one special exception".

@apparentlymart
Copy link
Member

Note that we cannot retroactively make this case-sensitive because that will be a breaking change for any registry addresses that are not on case-sensitive systems. These identifiers are a key part of how Terraform tracks provider identity between configuration, plan, and state so we cannot change the meaning of any address that is already valid; we can only make new addresses valid that were not valid before.

@nfagerlund
Copy link
Collaborator Author

@apparentlymart OK, point taken about the case-collapsing; Radek raised a similar concern on my registry-address PR. 👍🏼 (And on further thought, TFC can work around that for org names if it needs to.)

A belated realization: underscores aren't the only problem, the IDNA restriction on double-hyphens is also a no-go for TFC. On production, there's a mid-double-digit number of real organizations with -- in their names.

Do we really want to keep applying all the really complex stuff the IDNA library does to these path segments? My understanding was that those transformations were specifically intended only for use on hostnames, and that the path portion of a URL (or URL-like address) was specific to the semantics of the target system. Doing exotic transformations like canonicalization of alternate combining forms doesn't seem generalizable to the way any arbitrary service might handle its path routing... but maybe I'm misunderstanding something!

@apparentlymart
Copy link
Member

apparentlymart commented Feb 16, 2023

We need to be able to define what it means for two addresses to be "equal" in a way that isn't US-centric. The IDNA processing rules are a predefined collection of Unicode algorithms that happen to do that well for identifiers like hostnames; technically it's nameprep we are relying on here, but the IDNA library is the easiest way to use it in Go.

We don't necessarily need to use the particular bundle of algorithms that the IDNA spec requires, but we do still need to make sure both that we don't change the meanings of any addresses that are already valid (making them non-equal when they weren't before or vice-versa) and we need to make sure we don't punish folks whose languages use non-ASCII characters.

I don't think we should try to reinvent this ourselves because the Unicode folks already put a lot of time and energy into figuring this stuff out. I think the safest option is to loosen a few specific rules that we forgot to handle in Terraform Cloud but otherwise keep using the IDNA definition of "equal".

From what you've learned so far it seems like the smallest change is:

  • Allow underscores.
  • Allow dashes in any position and in any combination (including at the start, end, and multiple consecutive dashes).

It looks like we can get there by using a custom IDNA profile that has both "CheckHyphens" and "StrictDomainName" disabled.

We should also probably consider changing Terraform Cloud so it won't allow creating any new organizations whose names would compare equal to an existing one, though of course if some already exist then it is too late for them now.

(Note that Terraform CLI does the normalization before making any requests to a registry API, so a registry is only required to respond to already-normalized forms, not to do that normalization dynamically on incoming requests. Doing this normalization is what avoids every implementation having to reimplement the same normalization on the server, and ensures that there's only one correct way to encode each unicode character in the wire protocol.)

@qwerty1979bg
Copy link

A related problem (shall I open a new GH issue or it can be tracked here?) is having uppercase character(s) in the “provider” part of the module name. Terraform CLI does not accept it, but you can happily publish such module in Terraform Cloud and I couldn't find this restriction stated anywhere in the documentation.

For example, this code fails to init, because of the uppercase W in Whatever:

module "Some-Module" {
  source  = "app.terraform.io/test_org/Some-Module/Whatever"
  version = "0.0.1"
}
│ Module "Some-Module" (declared at main.tf line 1) has invalid source address "app.terraform.io/test_org/Some-Module/Whatever":
│ invalid target system "Whatever": must be between one and 64 ASCII letters or digits.
│ 
│ Terraform assumed that you intended a module registry source address because you also set the argument "version", which applies only to
│ registry modules.

Omitting the "version" argument, gives a slightly different, but not much helpful error:

│ Error: Invalid module source address
│ 
│ Module "Some-Module" (declared at main.tf line 1) has invalid source address "app.terraform.io/ticket_82774/Some-Module/Whatever":
│ Terraform cannot detect a supported external module source type for app.terraform.io/ticket_82774/Some-Module/Whatever.

@nfagerlund
Copy link
Collaborator Author

@qwerty1979bg Oh oof, that's a good catch. Fwiw, module and provider source addresses go through totally separate code paths in terraform (and the go module it defers to).

@jbardin
Copy link
Member

jbardin commented Feb 16, 2023

Yeah, that is a different codepath which is even more restrictive for similar reasons. The “system” segment must also be a valid provider local name.

@apparentlymart
Copy link
Member

Let's talk about the module address syntax in a separate issue. The constraints on the "target system" portion are far tighter than anything else in these addresses for annoying historical reasons, so I think worth separating that discussion so we can work through those tradeoffs and figure out what ought to change to make that consistent.

@laurenolivia
Copy link
Contributor

Hello! Jumping in for a second. It sounds like the two paths forward are:

  1. (smallest change) loosen a few specific rules that we forgot to handle in Terraform Cloud but otherwise keep using the IDNA definition of "equal"
  2. consider changing Terraform Cloud so it won't allow creating any new organizations whose names would compare equal to an existing one

Thank you @apparentlymart for the feedback.

This may actually be a product-level conversation, so the CLI Team will begin those conversations.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
5 participants