Replace Official Registry #258

Closed
1 task done
cube2222 opened this issue Sep 4, 2023 · 33 comments
Labels: accepted (This issue has been accepted for implementation.), needs-community-input

Comments

@cube2222
Contributor

cube2222 commented Sep 4, 2023

Due to the Public Terraform Registry Terms of Service change, OpenTF will not use it as the default registry.

However, we still want OpenTF to be a drop-in replacement, so all references to providers and modules should work as-is. This is esp. important because people can use modules which themselves reference other modules, and those indirect references are hard to change.

This means we need to replace the official registry in OpenTF.

It's worth noting that the registry is mostly a redirector to GitHub, and is heavily coupled with GitHub concepts like Releases. Only HashiCorp provider binaries (not modules) are hosted directly from the registry.

UPDATE 2023-10-12

Alright, now that we have the alpha release behind us and a working alpha registry, it's time to start the discussion around the stable registry design! This is a design that we'll implement to make OpenTofu production-ready, and will be included in the OpenTofu 1.6 stable release.

Below you can find a list of requirements for the stable registry design. I welcome you to discuss the requirements, as well as submit RFCs for designs that satisfy those requirements. Keep in mind that we'll need to stay backwards-compatible in the future with any changes we make to OpenTofu as part of this.

Any proposed RFCs should generally follow the RFC issue template, and should specifically and explicitly explain how they address each of the requirements listed below. Using the RFC issue template for that might be clunky (due to the large size), so feel free to use the blank issue form, just keep in mind that it should still generally follow the RFC issue template.

The deadline for submitting and discussing RFCs is the 27th of October (Friday), after which the technical steering committee will vote on the RFC to go with. However, please submit your RFCs earlier than the deadline, so that there's ample time to discuss them.

Requirements for Stable Registry Design

  • Functional
    • It must be a drop-in replacement for resolving providers and modules. That is
      • existing versions of these have to be available;
      • new versions have to be picked up without action on the provider/module author’s side.
    • It must make available a way for authors to submit/revoke/update their public keys, in a way compatible with existing provider signatures.
    • Handling of public keys and signatures must provide security gains equal to or above those of the HashiCorp registry.
      • To be more specific, in the case of the HashiCorp registry, capturing the session of an organization GitHub admin would let you add a new key and publish malicious artifacts through GitHub actions.
    • Even though we can start by hosting artifacts (providers) in GitHub releases, the solution should allow swapping out the artifact download URL to a 3rd party on a provider-by-provider basis in a provider-author-controlled way (as in, it shouldn’t require the end user to change config on their side).
      • E.g. if we see the aws provider is downloaded a lot, and GitHub release artifacts for whatever reason cannot be used anymore for it, we need to be able to switch to a CDN or other download location, without any action required from the CLI user.
    • Needs to redirect hashicorp-namespaced providers to the opentofu namespace.
    • Must allow for warnings to be attached to provider version metadata. E.g. the registry currently serves a custom warning when a user tries to fetch the long-deprecated terraform provider. See this issue for more details.
    • It must support a single “identity” running thousands of concurrent executions of Tofu (that list and fetch providers and modules) without getting rate-limited.
      • An identity could be an IP address, a logged-in user (if a login-gate is involved), a company, etc.
    • Should provide provider and module discovery and documentation via a human-readable web UI, similar to the legacy Terraform registry (this must be part of the design, but can be implemented after the stable release).
      • Worth noting that it seems like the legacy Terraform registry hosts json files with metadata to provide required information for each provider/module, and then renders it client-side based on the json file for the currently-viewed provider/module.
  • Non-Functional
    • We’ll be expected to provide high availability (multi-region failover should be viable) and high security guarantees.
    • We’d like to minimize the amount of maintenance work required to keep the registry up.
    • Implementation of the registry must be fully open-source.
  • Strong nice-to-haves
    • Download stats for each provider/module.
      • This is very useful for discoverability purposes.
    • Implements the V1 registry protocol for providers and modules.
      • This means it will work with legacy Terraform and will not require CLI-side changes in OpenTofu.
    • It should be easy to host mirrors of this registry.
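
Two of the functional requirements above (redirecting hashicorp-namespaced providers to the opentofu namespace, and attaching warnings to provider metadata) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the table names, the `warnings` key, and the warning text are hypothetical and not taken from any actual registry implementation.

```python
# Hypothetical lookup tables; a real registry would keep this data elsewhere.
NAMESPACE_REDIRECTS = {"hashicorp": "opentofu"}
PROVIDER_WARNINGS = {
    # Placeholder text, not the warning served by any real registry.
    ("opentofu", "terraform"): [
        "Placeholder warning for the long-deprecated terraform provider.",
    ],
}


def resolve_provider(namespace: str, provider_type: str) -> dict:
    """Apply the namespace redirect, then attach warnings for the result."""
    lowered = namespace.lower()
    effective = NAMESPACE_REDIRECTS.get(lowered, lowered)
    return {
        "namespace": effective,
        "type": provider_type,
        "warnings": PROVIDER_WARNINGS.get((effective, provider_type), []),
    }


print(resolve_provider("hashicorp", "aws"))
print(resolve_provider("hashicorp", "terraform"))
```

Doing the redirect on the registry side keeps existing configurations that reference `hashicorp/...` working without changes on the user's side.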

UPDATE 2023-11-03

Important news!

The technical steering committee convened yesterday and has chosen the Homebrew-like registry design as the one to go with and implement.

There will be a wider summary of yesterday's meeting, but the reasoning in favour of this design was, in short:

  • Has the least maintenance burden, so that the Core OpenTofu team can focus on the CLI, and not maintaining a service.
  • Maximizes availability due to being based on static file hosting.
  • The git repository-based approach maximizes transparency, which is in line with the goals and ideals of the OpenTofu project.
  • Decoupling between the mission critical task of resolving artifacts and that of serving documentation.
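
To make the "static file hosting" point concrete, here is a rough sketch of how a provider version listing could be pre-generated as a plain file, so that any static host or CDN can serve it. The directory layout and the shape of the input are assumptions for illustration only, not the layout of the chosen design, and the JSON is trimmed to the `versions`/`platforms` fields of the v1 provider protocol response.

```python
import json
from pathlib import Path


def write_versions_file(root, namespace, provider_type, releases):
    """Write a provider 'versions' response as a plain file under root.

    releases maps a version string to a list of (os, arch) tuples.
    """
    doc = {
        "versions": [
            {
                "version": version,
                "platforms": [{"os": o, "arch": a} for o, a in platforms],
            }
            # Sorted lexically for stable output; not a semantic version sort.
            for version, platforms in sorted(releases.items())
        ]
    }
    # Assumed layout mirroring the v1 URL path, so no server logic is needed.
    path = Path(root) / "v1" / "providers" / namespace / provider_type / "versions"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(doc, indent=2))
    return path
```

Calling `write_versions_file("public", "opentofu", "aws", {"1.0.0": [("linux", "amd64")]})` would produce `public/v1/providers/opentofu/aws/versions`, which can then be uploaded as-is.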

Huge thanks to everybody who has submitted a design, it's very appreciated!

I will now be closing all design RFCs other than the chosen one.

Subtasks:

@cube2222
Contributor Author

cube2222 commented Sep 4, 2023

For the alpha release we're planning to replace the registry with our own, hosted on a domain like registry.opentf.org.

That work has two parts. First, we need a registry that's a GitHub redirector. @Yantrio owns this work and is writing a Lambda fronted by a heavily caching API Gateway, to avoid making too many calls to the GitHub API. Second, we also need to mirror HashiCorp's providers, which aren't hosted in GitHub releases. This work is owned by @tomasmik and @mbialon, who are setting up forks of those providers that will track the source without changes, other than creating usable artifacts in releases. Those mirrors will then be used by the registry.
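
As an illustration of the redirector-plus-cache idea (not the actual Lambda code), a minimal sketch could look like the following. The repository and asset naming (`terraform-provider-<type>`, `<repo>_<version>_<os>_<arch>.zip`, `v`-prefixed tags) is assumed from common provider release conventions, and `fetch_release_tags` is a hypothetical stand-in for a real GitHub API call.

```python
from functools import lru_cache

upstream_calls = []  # records every simulated GitHub API call


def fetch_release_tags(namespace: str, repo: str) -> tuple:
    """Hypothetical stand-in for a GitHub API request; returns made-up tags."""
    upstream_calls.append((namespace, repo))
    return ("v1.0.0", "v1.1.0")


@lru_cache(maxsize=4096)
def cached_release_tags(namespace: str, repo: str) -> tuple:
    """Serve repeated lookups from memory instead of calling GitHub again."""
    return fetch_release_tags(namespace, repo)


def download_redirect(namespace, provider_type, version, os_name, arch):
    """Return the GitHub release asset URL to redirect to, or None."""
    repo = f"terraform-provider-{provider_type}"
    if f"v{version}" not in cached_release_tags(namespace, repo):
        return None
    asset = f"{repo}_{version}_{os_name}_{arch}.zip"
    return (
        f"https://github.com/{namespace}/{repo}"
        f"/releases/download/v{version}/{asset}"
    )
```

However many download requests arrive for the same provider, only the first one reaches the upstream function. A real deployment would also need cache expiry so that new releases are picked up.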

On the OpenTF side, we need to fork the terraform-registry-address repository and replace the default registry URL with ours. Ideally, the default registry URL should just be configurable by the user.

All in all, this should provide a fully drop-in replacement for the current registry, with all references to modules and providers working as-is. This is good enough for the alpha release.

We will reevaluate and consult the community before implementing the final solution for the stable release.

@cube2222
Contributor Author

cube2222 commented Sep 4, 2023

See opentofu/roadmap#24 for previous roadmap repository issue.

@cube2222
Contributor Author

cube2222 commented Sep 4, 2023

This issue is now open for discussion and for proposing possible solutions other than the one we're currently working on for the alpha.

Once we have a satisfying solution proposal in place, we'll go through the public RFC process with it.

@avnerenv0

Could we possibly continue/fork the work of an already existing registry like https://github.com/outsideris/citizen or https://github.com/MatthewJohn/terrareg ?

It seems like a great deal of work has already been done there, and I think in the long term we should aim at a "feature rich" repository, as it will support growth and expansion of the OTF eco-system.

@cube2222
Contributor Author

cube2222 commented Sep 5, 2023

> It seems like a great deal of work has already been done there, and I think in the long term we should aim at a "feature rich" repository, as it will support growth and expansion of the OTF eco-system.

I think it's an open question what features we'd aim to support, and we'll have to balance usability with maintainability here.

So the question is what would we like the registry to do, in the end. I can see a few things:

  • The obvious one, so being a drop-in replacement without any prior community interaction needed.
  • We want to show documentation. For this we should be able to proxy and render docs from GitHub, in a similar manner to https://pkg.go.dev
  • We want to manage signatures, which will plausibly require people to log in with their GitHub credentials.

E.g. both of the projects you mentioned support uploading modules / providers, which is a feature not even supported in the official registry (which only uses stuff from GitHub). If we add features like this, we'd have to carefully consider how that would interact with the existing "dynamically translated" providers and modules, and I'm not sure it's worth the added complexity and developer time (in terms of development and maintenance) that we could be spending elsewhere.

Generally, I do think we should aim to limit features here to the bare minimum that's required for a good OpenTF usage experience. Ideally, we'd ultimately end up with a more decentralized approach, taking advantage of existing registries, like via the OCI Registry approach.

Note: terrareg is AGPL-licensed.

@jamengual

When replacing the Registry please take this into consideration.

hashicorp/terraform#31134

It is pretty easy to give the ability to change the hostname for people.

@Yantrio
Member

Yantrio commented Sep 5, 2023

Thanks for mentioning this @jamengual

To help us maintain a clear separation between opentf and hashicorp's offerings, we're asking that people describe issues that are in other repositories rather than linking those directly. Would you mind giving us a brief overview or description of this issue please?

@jamengual

jamengual commented Sep 5, 2023

Use-cases

Make registry.opentf.org (I do not know the current URL) a configurable parameter instead of a constant, so that an internally hosted registry can be used for modules and submodules.

When using a module like so :

module "alb" {
  source = "cloudposse/alb/aws"
}

the source URL basically translates to :

source = "https://registry.opentf.org/cloudposse/alb/aws"

If the constant mentioned in L24 were configurable, it would be possible to serve the .well-known/terraform.json with the URL of the module registry and an index pointing to an internal repo.
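
For readers unfamiliar with the discovery step, here is a small sketch of how a client could turn the `.well-known/terraform.json` document into a module endpoint. It assumes the document carries a `modules.v1` key whose value ends with a slash and may be relative or absolute; the function names and the example document are hypothetical.

```python
from urllib.parse import urljoin


def modules_base_url(host: str, discovery_doc: dict) -> str:
    """Resolve the modules.v1 entry relative to the discovery URL."""
    discovery_url = f"https://{host}/.well-known/terraform.json"
    return urljoin(discovery_url, discovery_doc["modules.v1"])


def module_versions_url(host, discovery_doc, namespace, name, system):
    """Build the URL that lists the available versions of one module."""
    base = modules_base_url(host, discovery_doc)
    return urljoin(base, f"{namespace}/{name}/{system}/versions")


# Example discovery document an internal registry might serve (assumed).
doc = {"modules.v1": "/v1/modules/"}
print(module_versions_url("pepe.myrepo.com", doc, "cloudposse", "alb", "aws"))
```

An internal host only needs to serve this one small JSON file to point clients at its own module index.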

Right now the registry URL is configurable, BUT there is a problem with registry modules that use the short notation, i.e. source = "cloudposse/alb/aws". If a root module calls other submodules using the short notation, the root module can be pulled from the internally configured registry URL with something like source = "pepe.myrepo.com/cloudposse/alb/aws", but the submodule will still use the short notation pointing to the default registry, so the internally hosted index will not be used for it.
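
The behaviour described here can be sketched as follows. This is a simplified model, not the real address parser: it ignores `//subdir` suffixes and version constraints, the function name is hypothetical, and `cloudposse/label/null` is used only as an example of a nested short-notation reference.

```python
DEFAULT_REGISTRY_HOST = "registry.opentf.org"  # hostname used in the comment


def resolve_module_source(source: str, default_host: str = DEFAULT_REGISTRY_HOST):
    """Split a registry module source into (host, namespace, name, system)."""
    parts = source.split("/")
    if len(parts) == 3:
        # Short notation: no hostname, so the default registry is implied.
        return (default_host, *parts)
    if len(parts) == 4:
        return tuple(parts)
    raise ValueError(f"not a registry module address: {source!r}")


# The root module is pinned to the internal registry by the user...
print(resolve_module_source("pepe.myrepo.com/cloudposse/alb/aws"))
# ...but a nested short-notation reference still falls back to the default.
print(resolve_module_source("cloudposse/label/null"))
# With a configurable default, the nested reference follows it too.
print(resolve_module_source("cloudposse/label/null", default_host="pepe.myrepo.com"))
```

The third call shows why a configurable default host solves the nested-module case without editing the third-party module's source.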

@Yantrio
Member

Yantrio commented Sep 5, 2023

@jamengual I believe that this is a perfectly reasonable request and we'll be sure to include the discussion of this into the RFC for the registry.

I don't see any reason this couldn't be accomplished.

@Magnitus-

Magnitus- commented Sep 6, 2023

It would be nice if the github redirector implementation would have a reusable core that is not coupled with a specific cloud solution (ex: lambda in this case) so that those of us wishing to implement our own internal mirror registries wouldn't have to reinvent the wheel (it seems like a nice opportunity to do centralized work on a well maintained common core).

@cube2222
Contributor Author

cube2222 commented Sep 6, 2023

@Magnitus- I wonder if for internal registries it wouldn't make sense to use something like https://github.com/outsideris/citizen, since that also handles hosting your artifacts (as opposed to this one, which will fully rely on GitHub)?

Anyway, for now we'll probably do it in a very unclean way, to get it working for the alpha as soon as possible, but if we decide via RFC that this is actually the approach we want to go with long-term, then we'll definitely be cleaning up the codebase and considerations such as this one can then be taken into account.

@Magnitus-

Magnitus- commented Sep 6, 2023

> @Magnitus- I wonder if for internal registries it wouldn't make sense to use something like https://github.com/outsideris/citizen, since that also handles hosting your artifacts (as opposed to this one, which will fully rely on GitHub)?
>
> Anyway, for now we'll probably do it in a very unclean way, to get it working for the alpha as soon as possible, but if we decide via RFC that this is actually the approach we want to go with long-term, then we'll definitely be cleaning up the codebase and considerations such as this one can then be taken into account.

Something we (my employer, especially the team I work in) need to be mindful about is the trustworthiness of our dependencies. Ideally, the things we use come from well established entities and if not, we need to spend time auditing the source code (and its dependencies) and make sure that whatever we pull can't easily be changed upstream by an individual (either them or their hacked account).

I can of course audit and lock an individual-dependent dependency on our end and I do that sometimes, but it is extra work and that has to be replicated by anyone in a situation like mine. It's nice to have that work done under the umbrella of a well established community when possible, especially since they need it anyway.

For the dependency on Github, either it pulls on Github (either directly from the release or else clones and does processing), it pulls from the existing provider registry or someone pushes the artifact. A flexible solution might want to support all cases, but pulling the artifacts from github releases is not a bad start (I like that suggestion quite a bit for starters).

I will have to ponder about what I really want some more, but should you feel inclined to publish your "unclean" solution, there will be an interest on my end in looking at it and suggesting (via RFCs) and doing improvements to make it more generic.

@dex4er

dex4er commented Sep 7, 2023

If OpenTF supports (will support) OCI registries then I don't mind if the default registry is HashiCorp's Terraform repository or an OCI registry. All I expect is that using source = "namespace/provider" or source = "namespace/name/module" I'll still fetch my providers and modules. Wouldn't replacing this proprietary registry API with something standard be a quick win?

@cube2222
Contributor Author

cube2222 commented Sep 7, 2023

@dex4er The main problem with replacing the registry really is keeping all the providers and modules available, so that it's as seamless a transition as possible.

With the official registry being OCI, we'd need people to port their providers and modules, which is fairly unrealistic, esp. for historical versions. Our current approach would make sure that all of this is still available.

Additionally, the OCI approach would be brand new functionality and ideally we'd first introduce it as an experimental feature, and only once we're sure it's working fine and the design really is sound and proven, stabilize it. In other words, we don't want to rush it by making it a blocker for the new public registry (and thus, the stable 1.6 release of OpenTF).

@dex4er

dex4er commented Sep 7, 2023

Ok, this is a really complex problem. Gathering all artifacts is tricky. As far as I know, HashiCorp now forbids the use of their registry by non-Terraform tools. Is it legal to use the original Terraform to download all provider binaries (terraform providers mirror)? I think so. So it might be really easy to download all 3492 providers that way. Especially if you host a platform that runs Terraform, you have some binaries downloaded already, or at least you should know which providers are really important, and focus only on top 20%.

Later you can use direct links to binary releases for some providers: many vendors serve a complete set of binaries on GitHub or GitLab.

Modules are even more trivial: these are just git repos with tags: you don't need to use Terraform binary to fetch them: you just need to map the module name to git repo URL, eventually git clone it and save it as an artifact.
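
A rough sketch of that mapping is below. It assumes the module lives on GitHub under the public registry's `terraform-<PROVIDER>-<NAME>` repository naming convention and that release tags may or may not carry a `v` prefix (hence the `tag_prefix` parameter); the function names are hypothetical, and the command is only built, not run.

```python
def module_repo_url(namespace: str, name: str, system: str) -> str:
    """Map a module address to its (assumed) GitHub repository URL."""
    return f"https://github.com/{namespace}/terraform-{system}-{name}"


def clone_command(namespace, name, system, version, tag_prefix="v"):
    """Build a shallow git clone command for one tagged module version."""
    return [
        "git", "clone", "--depth", "1",
        "--branch", f"{tag_prefix}{version}",
        module_repo_url(namespace, name, system),
    ]


print(module_repo_url("cloudposse", "alb", "aws"))
print(" ".join(clone_command("cloudposse", "alb", "aws", "1.0.0")))
```

The cloned tree could then be archived and stored as the artifact for that version.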

And finally, there is a question: how to serve all these artifacts? My point is that you don't need to use any complex software for artifacts (JFrog Artifactory? cloudsmith? Gitlab?) If it will be a registry for a community, it might be some registry easy to browse, download, and mirror. OCI is a natural choice: easy to use, easy to serve, easy to mirror.

@cube2222
Contributor Author

cube2222 commented Sep 7, 2023

As described here #258 (comment), all providers and modules other than HashiCorp's are hosted on GitHub (so basically like you're saying) and the registry is just a redirector for that, so that's what we're initially doing + some special casing for those not available on GitHub.

Downloading everything from everywhere to rehost somewhere else is not trivial, esp. as new versions of providers and modules come out. Which is why we want to serve them from the place they're already at.

> and focus only on top 20%

We're aiming for a 100% drop-in replacement experience.

@dex4er

dex4er commented Sep 7, 2023

Ok, got it: then only 1% of providers are problematic: HashiCorp's only. And you want to avoid mirroring anything if sending redirects to the original place is good enough.

@krsmanovic

This open source tool might be useful as initial scaffolding layer: https://github.com/terrariumcloud/terrarium

It is a gRPC-based solution with DynamoDB and S3 supported as backends.

@MatthewJohn

MatthewJohn commented Sep 12, 2023

> Could we possibly continue/fork the work of an already existing registry like https://github.com/outsideris/citizen or https://github.com/MatthewJohn/terrareg ?

@avnerenv0 Hey, Work on Terrareg is still ongoing :) If you ever want anything in particular implemented, I'm happy to help out - and happy to accept any contributions :)

> E.g. both of the projects you mentioned support uploading modules / providers

I'm not sure if this is meant to mean that they only support uploading modules, but if you wanted something with baked in github support, Git-based use is the main "go-to" method for Terrareg, which serves traffic straight from the git provider.

So the support for GitHub could be relatively simple, since, once configured, the "github" provider can easily take the module "Namespace" as a GitHub username/org and the module as the repository name (maybe with some suffix?). E.g. https://github.com/matthewjohn/terraform-module-ecs-aws can be configured to translate to:

  • Namespace: matthewjohn
  • Module: ecs
  • Provider: aws

We support OpenID Connect and SAML for authentication, so it wouldn't be a stretch to support GitHub authentication and support automated permissions to the user's namespace and the orgs that they're associated with. We've tried to include quite a few customisations to suit authentication mechanisms, RBAC etc. and, as I say, happy to investigate implementing any others ;)

There is also work on the backlog (that I'm keen to work on, if there's a use for it) to add support as a provider registry as well.

> Note: terrareg is AGPL-licensed.

@cube2222 Terrareg is no longer AGPL licensed - it's GNU General Public License v3.0 :) This was changed last year - if there's somewhere that still describes it as AGPL, please let me know :)

> We want to show documentation. For this we should be able to proxy and render docs from GitHub, in a similar manner to https://pkg.go.dev/

As well as providing information about the module, submodules etc, we currently pre-render several pages by default (README, LICENSE and changelog), but this is customisable.

To some degree, I should say that Terrareg was built to serve medium size traffic (we use it with ~100 modules, ~3K downloads a day), so not taxing by any means - but can certainly do some testing/investigation to serve "public" traffic, I'd imagine - though disabling analytic collection would certainly decrease load.

Anyways, I'm not trying to push in any direction - just wanted to clear up the licensing and let you know that I'm here, if you did have any questions, or did want a demo or want any features to make it more usable (anything I can do to help out) etc. :)

Many thanks

Edit: For those that are interested and haven't seen Terrareg, I've quickly stuck up a demo with a couple of modules: https://registry-demo.mattsbit.co.uk/

Edit Edit: Sorry, I've been told that the CONTRIBUTING does point out the old licensing and that we might want to change (which we did), so will update - apologies for the confusion!

@alrs
Contributor

alrs commented Sep 12, 2023

I'll note that Terrareg is written in Python, Citizen is written in Javascript.

cube2222 added the accepted label Sep 14, 2023
@x4e-jonas

+1 for an easy to mirror solution.

The mirror command can only download single binaries but neither multiple versions nor platforms. You'd have to loop over all .terraform.lock.hcl and -platform=os_arch combinations to mirror providers or parse various JSON files. AFAIK there is no index of all plugins available, so the mirror will fail whenever you add a new dependency in your project.
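
The loop described above could be sketched roughly like this. It only builds the invocations rather than running them, assumes each project directory is identified by its `.terraform.lock.hcl`, and uses hypothetical helper names; each command would need to be run with the project directory as its working directory.

```python
from itertools import product
from pathlib import Path


def find_locked_projects(root):
    """Return directories under root that contain a .terraform.lock.hcl."""
    return sorted(p.parent for p in Path(root).rglob(".terraform.lock.hcl"))


def mirror_invocations(root, platforms, target_dir, binary="terraform"):
    """Build (working_dir, argv) pairs; nothing is executed here."""
    return [
        (
            project,
            [binary, "providers", "mirror", f"-platform={platform}", str(target_dir)],
        )
        for project, platform in product(find_locked_projects(root), platforms)
    ]


for cwd, argv in mirror_invocations(".", ["linux_amd64", "darwin_arm64"], "/srv/mirror"):
    print(cwd, " ".join(argv))
```

The number of invocations grows as projects times platforms, and every new dependency requires a re-run, which is the maintenance cost the comment is pointing at.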

The registry should provide a simple method to mirror and sync. This means, the mirror must not depend on vendor specific commands or extensive parsing of various files. Enabling directory index and serving correct timestamps would already simplify things a lot.

When rebuilding providers from source, it's even more complex to build and wrap the checksums in the various JSON and (non-deterministic) ZIP files. It might also be worth thinking about the rebuilding, signing and packaging process.

NixOS for example already provides their own packages for plugins. A patch is required to find them in the global directory. It's not clear to me how they deal with the different checksums in .terraform.lock.hcl though.

@Yantrio
Member

Yantrio commented Sep 20, 2023

Hey all 👋 I'm eager to not detract from the current conversation around long term solutions but I just wanted to raise the fact that we have recently made our temporary registry solution public here: https://github.com/opentofu/registry 🎉

Feel free to provide any feedback on the repo or in here :D

Edit: Please note that we are not currently accepting contributions in this repo yet. Please hold off with any PRs for the immediate future.

Thanks!

@cube2222
Contributor Author

The issue has been updated with new info and now includes a list of requirements for the stable registry design. We welcome you to discuss these and/or submit RFCs for it!

@Yantrio
Member

Yantrio commented Oct 13, 2023

I've opened up an RFC for what my personal vision would be for what I believe to be a core component of the registry : [RFC] Crawler Strategy for OpenTofu Registry #722

@cube2222
Contributor Author

Update: We've removed implementing the v2 registry protocol from the "nice to have"'s list. It's what the registry currently uses to render the UI.

Originally, we thought that implementing it would help us more easily adapt the language server and editor plugins to OpenTofu, as they use that API. However, we did some digging, and it seems like those plugins do a dump of the schema of official and partner providers and embed them into the language server binary. We're not particularly keen on that approach, and we'd prefer to just use the local OpenTofu CLI to fetch the schema of installed providers.

@cube2222
Contributor Author

Another update:

  • Ease of hosting mirrors has been added as a strong nice to have.
  • There's a clarification that even though docs/search must be part of the design, it doesn't have to be implemented as part of the stable release.

@dgreisen

Software supply chain security is hard. Please consider integrating something like tuf that has already solved these problems for a bunch of other ecosystems (e.g. PyPI, docker).

Their work on gittuf may be of particular interest for OpenTofu. I realize there are hard backwards-compatibility requirements, but the NYU team behind TUF is incredibly helpful and I'm sure would be happy to work with you to make sure your implementation is both backwards-compatible and maximally secure.

@cube2222
Contributor Author

@dgreisen It's definitely something to look at for inspiration and further development.

In the beginning the goal is to have something that's at least as secure as the HashiCorp registry, and delivering that as soon as possible, so that the community can start using OpenTofu in production as soon as possible. Perfect is the enemy of good, as they say.

Once that is done, we're very happy to dive deeper into improving the security even more. One avenue we're planning to explore are keyless signatures and the https://www.sigstore.dev ecosystem. I'm not yet sure how that ties in with tuf, but will definitely do some reading around that.

If you have any concrete ideas though, you're welcome to submit an RFC, too!

@cube2222
Contributor Author

cube2222 commented Nov 3, 2023

The technical steering committee met yesterday and I've updated the top-level issue with a new update.

@mering

mering commented Nov 3, 2023

Thanks for the update!

In case anyone else is wondering where to find the "top-level issue", it is the issue body/description of this issue (first text block, so scroll all the way up).

@jamengual

Use-cases

Make registry.opentf.org (I do not know the current URL) a configurable parameter instead of a constant, so that an internally hosted registry can be used for modules and submodules.

When using a module like so :

module "alb" {
  source = "cloudposse/alb/aws"
}

the source URL basically translates to :

source = "https://registry.opentf.org/cloudposse/alb/aws"

If the constant mentioned in L24 were configurable, it would be possible to serve the .well-known/terraform.json with the URL of the module registry and an index pointing to an internal repo.

Right now the registry URL is configurable, BUT there is a problem with registry modules that use the short notation, i.e. source = "cloudposse/alb/aws". If a root module calls other submodules using the short notation, the root module can be pulled from the internally configured registry URL with something like source = "pepe.myrepo.com/cloudposse/alb/aws", but the submodule will still use the short notation pointing to the default registry, so the internally hosted index will not be used for it.

I think this is not on the updates of the current issue unless I read it all wrong.

@cube2222
Contributor Author

https://github.com/opentofu/registry/milestone/2 🎉

@sap147

sap147 commented Dec 26, 2023

Does the OpenTofu private module registry also provide a GUI?
