
Add support to Cargo for alternative registries #2141

Merged
merged 25 commits into from Sep 29, 2017

Conversation

@carols10cents
Member

carols10cents commented Sep 6, 2017

Rendered

Tracking issue

This RFC builds on previous work done in RFC 2006. The biggest difference is that this RFC includes a specification for the index format that any registry will need to conform to. Another difference is that this RFC proposes configuring registry locations once, in a .cargo/config, rather than multiple times in each project, both to avoid duplication and to discourage including credentials in each project.

@natboehm and @shepmaster also worked on this RFC :)

# Rationale and Alternatives
[alternatives]: #alternatives

A [previous RFC](https://github.com/rust-lang/rfcs/pull/2006) proposed having the registry

@sfackler

sfackler Sep 6, 2017

Member

The Java ecosystem has gone the other direction. Gradle requires that you specify all of your upstream repositories in your build.gradle, and Maven supports both configuration in the project itself and at the user level.

It seems kind of messy for the dev setup instructions to go from "clone the repo" to "clone the repo, add these registries to your ~/.cargo/config, and make sure the names agree across all of the projects you're working on".

When Cargo searches for a .cargo/config, does it stop at the first one it finds or continue looking and union all of them? One nice option could be to go the union route so you could check a .cargo/config into the repo with the right registry configurations.

@sedrik

sedrik Sep 6, 2017

One of the points for having the .cargo/config outside of the repository is to avoid checking authentication information into the code-base. From my view this would be a way to support private registries for closed source projects and the common use case is most likely that you will have one internal registry and use crates.io for all publicly available code.

Maybe there could be a cargo add-registry command for the future that can be used to setup any third party registry that is to be used.

@sfackler

sfackler Sep 6, 2017

Member

Registry authentication information is already stored in a separate file than Cargo.toml and .cargo/config - I don't know why anything would be different here.

@carols10cents

carols10cents Sep 12, 2017

Author Member

@sfackler:

> When Cargo searches for a .cargo/config, does it stop at the first one it finds or continue looking and union all of them? One nice option could be to go the union route so you could check a .cargo/config into the repo with the right registry configurations.

It continues looking and unifies all of them. I just made a PR to cargo's docs to make this more readily apparent.
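As a rough illustration of the "unifies all of them" behaviour, here is a minimal Rust sketch that merges registry tables from several config files. It assumes that a definition in a directory nearer to the project takes precedence over one in an ancestor; the `Registries` alias and `merge_registries` function are hypothetical names for this sketch, not part of Cargo.

```rust
use std::collections::HashMap;

/// Registry name -> index URL, as read from one `.cargo/config` file.
/// (Hypothetical alias for this sketch.)
type Registries = HashMap<String, String>;

/// Merge registry tables from several config files.
///
/// `configs` is ordered from the nearest directory to the farthest ancestor,
/// so the first definition of a name wins and later files only fill gaps.
fn merge_registries(configs: &[Registries]) -> Registries {
    let mut merged = Registries::new();
    for config in configs {
        for (name, index) in config {
            // Keep the value from the nearer file if the name is already set.
            merged.entry(name.clone()).or_insert_with(|| index.clone());
        }
    }
    merged
}
```

Under that assumption, a `.cargo/config` checked into the repository can declare the registries a project needs while a user-level file still contributes any names the project does not define.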

@sedrik:

> Maybe there could be a cargo add-registry command for the future that can be used to setup any third party registry that is to be used.

That sounds like a great idea! I'll add a note about that :)

@sfackler

> Registry authentication information is already stored in a separate file than Cargo.toml and .cargo/config - I don't know why anything would be different here.

You're right that usernames and passwords should probably go in .cargo/credentials instead of .cargo/config; I'll make that change. Right now, only the token to authenticate to a registry's API is stored in .cargo/credentials, so this RFC will be adding the ability to specify a username and password to enable access to either a registry index or an API.

@sfackler

sfackler Sep 13, 2017

Member

Cool, as long as it's something that can be checked into the repo and doesn't totally suppress user-level configuration I'm on board.

@sedrik

sedrik commented Sep 6, 2017

Thanks for proposing this support. This is a blocker for any kind of Rust adoption at my employer (sadly, it does not guarantee that we will adopt Rust).

Has there been a discussion about supporting organizations and private repositories in crates.io similar to how npmjs does it?


```toml
[dependencies]
secret-crate = { version = "1.0", registry = "my-registry" }
```

@sfackler

sfackler Sep 6, 2017

Member

It'd be nice to support a short form of this for convenience:

"my-registry/secret-crate" = "1.0"

@est31

est31 Sep 7, 2017

Contributor

@sfackler could we have that syntax reserved for crate namespacing?

Instead I'd propose: [dependencies.my-registry] secret-crate = "1.0".

@sfackler

sfackler Sep 7, 2017

Member

What would this mean in that setup?

[dependencies.foobar]
version = "1.0"

Is it a crate called "version" at 1.0 in the "foobar" registry or a crate called "foobar" in the default registry at version 1.0?

@est31

est31 Sep 7, 2017

Contributor

Oh dumb me, that syntax already has a meaning... What about this then:

[registry.my-registry.dependencies]
secret-crate="1.0"

@est31

est31 Sep 10, 2017

Contributor

Another alternative (loosely based on how URLs work):

"//my-registry/secret-crate" = "1.0"

```toml
[registry.$choose-a-name]
index = "https://username:password@my-intranet:8080/index"
```

@sfackler

sfackler Sep 6, 2017

Member

It seems like credentials should live separately. We've recently moved the crates.io publish token out of .cargo/config.

@withoutboats

Contributor

withoutboats commented Sep 6, 2017

Awesome RFC @carols10cents (et al!).

I have a branch of cargo which I believe implements this, though I haven't tested it thoroughly. The only pertinent difference I'm aware of is in the format for declaring a new registry. What I went with was:

  • I called the table registries instead of registry; I believe the registry name is already used in .cargo/config for something else (possibly deprecated? I don't recall at the moment).
  • I supported both just making the registry a key to a URL and having an object with an index member like you propose here.

e.g:

[registries]
foobar = "https://github.com/foobar-co/foobar-index"

[registries.bazquux]
index = "https://github.com/bazquux-org/bazquux-index"
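The two accepted spellings above (a bare URL string, or a table with an `index` member) can be modelled as one type with two variants. This is only a sketch of the idea; `RegistryDecl` and its variant names are hypothetical and are not taken from the branch being described.

```rust
/// The two spellings of a registry declaration described in the comment.
/// (Hypothetical type for illustration.)
#[derive(Debug, Clone, PartialEq)]
enum RegistryDecl {
    /// Short form: `foobar = "https://..."`
    Url(String),
    /// Long form: `[registries.bazquux]` with an `index` key
    Table { index: String },
}

impl RegistryDecl {
    /// Both forms resolve to the same thing: the URL of the registry index.
    fn index(&self) -> &str {
        match self {
            RegistryDecl::Url(url) => url.as_str(),
            RegistryDecl::Table { index } => index.as_str(),
        }
    }
}
```

Keeping the long form as a table leaves room for further per-registry keys later without breaking the short form.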

Another possible format choice would be to instead support a syntax like this in the toml, instead of having a registry key in the dependency object itself:

[registry.foobar-co.dependencies]
# all the dependencies in this table come from foobar-co

My branch doesn't implement that, but it's worth considering, since it makes it easier to add more dependencies from that alternate registry.

it is possible to have a local crates.io server which crates can be pushed to, while still making
use of the public crates.io server.

We would also like to support the use of crates.io mirrors. These differ from alternative

@sfackler

sfackler Sep 7, 2017

Member

How would mirrors work in this setup? We'd need some way to say that a registry "acts as" https://github.com/rust-lang/crates.io-index, right?

@carols10cents

carols10cents Sep 12, 2017

Author Member

Oops. I forgot to put in details about mirrors. I'm starting to think that could be separate from this RFC-- we already support source replacement but I want to extend it so you can list multiple mirrors and cargo will automatically fall back if one is inaccessible. That's starting to feel separate, so I'm going to take this paragraph about mirrors out.

@Erik-S

Erik-S commented Sep 7, 2017

This is a feature that's important for corporate use, so I'll chime in with my experience.

Storing the passwords in the working directory is generally a bad idea, because in corporate environments the working directory is often on a share drive with fairly open permissions. (Even allowing them to be stored there isn't ideal, because someone will make a mistake.)

The best solution is to put username and password into the OS-specific keystore (GNOME Keyring / Windows Credentials Management / Apple Keychain).

If I read #3978 correctly, Cargo access tokens are already stored in ~/.cargo/credentials. Putting passwords there wouldn't be ideal, but would be much better than the working directory or main Cargo config file.

Storing the username and password as part of the URL is very inflexible. In the future, we may want to support Kerberos/SAML/LDAP/etc logon, so storing the USER/PASSWORD/AUTH_TYPE as separate fields is a good idea.

Ideally the user name would not be in the same file as the registry-name to URL mapping, so the mapping file can be checked in and it will "just work" within a company LAN.
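One way to picture the separation suggested here is to keep the name-to-URL mapping and the credentials in two independent tables, with the authentication kind as its own field. The following Rust sketch assumes exactly that; `Auth`, `RegistrySettings`, and `lookup` are hypothetical names, and the two variants shown are only examples of the auth types that might be supported.

```rust
use std::collections::HashMap;

/// How to authenticate to a registry. The variants are illustrative only.
#[derive(Debug, Clone, PartialEq)]
enum Auth {
    Token(String),
    Basic { user: String, password: String },
}

/// Hypothetical container for the two kinds of settings:
/// - `indexes`: registry name -> index URL (safe to check into a repo)
/// - `credentials`: registry name -> auth details (kept in a user-level file)
struct RegistrySettings {
    indexes: HashMap<String, String>,
    credentials: HashMap<String, Auth>,
}

impl RegistrySettings {
    /// Look up a registry by name. Credentials are optional, so a registry
    /// that needs no authentication still resolves.
    fn lookup(&self, name: &str) -> Option<(&str, Option<&Auth>)> {
        let index = self.indexes.get(name)?;
        Some((index.as_str(), self.credentials.get(name)))
    }
}
```

Because the URL never embeds a user name or password, the shareable table can be identical for everyone on the same network.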

@bbatha

bbatha commented Sep 11, 2017

I mentioned this on #2006 and rust-lang/cargo#4208 but I want to make sure that it doesn't get lost in the shuffle. I want to make sure that when new registries are specified that it is possible to specify the full hostname and root path for the registry and not just the host name. For multiple repository hosting solutions like nexus and artifactory it needs to be possible to specify a path as well. For instance, artifactory hosts npm repos at https://host.company.com/api/npm/private-repo so you can host multiple repo types and multiple repos for the same language. Specifying just the host should have a good default but it should be overridable.

@withoutboats

Contributor

withoutboats commented Sep 11, 2017

@bbatha You specify the url of the registry index, which is required to contain the url of the backing store. It is not possible to specify just the hostname.

For example, crates.io would be declared:

[registry.crates-io]
index = "https://github.com/rust-lang/crates.io-index"

- `name`: the name of the crate
- `vers`: the version of the crate this row is describing
- `deps`: a list of all dependencies of this crate
- `cksum`: a checksum of this version's files

@alexcrichton

alexcrichton Sep 11, 2017

Member

s/this version's files/the tarball downloaded/

{
"name": "serde",
"req": "^1.0",
"registry": "https://crates.io",

@alexcrichton

alexcrichton Sep 11, 2017

Member

Should this, like allowed-registries above, specify the index rather than this URL?

@carols10cents

carols10cents Sep 12, 2017

Author Member

Yeah, probably.
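To make the index fields discussed in these review comments concrete, here is a small Rust sketch of one index entry. It models only the keys quoted in this thread (`name`, `vers`, `deps`, `cksum`, and per-dependency `name`, `req`, `registry`) and follows the suggestion that `registry` hold the index URL. The struct names and the hand-rolled `to_json_line` are hypothetical; the serializer does no string escaping and is not how Cargo or crates.io produce these files.

```rust
/// One dependency row inside an index entry (keys taken from the excerpt).
struct IndexDep {
    name: String,
    req: String,
    /// URL of the registry index the dependency comes from.
    registry: String,
}

/// One published version of a crate, as described by the field list above.
struct IndexEntry {
    name: String,
    vers: String,
    deps: Vec<IndexDep>,
    /// Checksum of the downloaded tarball.
    cksum: String,
}

impl IndexEntry {
    /// Render the entry as a single JSON line.
    /// Assumes no field contains characters that need JSON escaping.
    fn to_json_line(&self) -> String {
        let deps: Vec<String> = self
            .deps
            .iter()
            .map(|d| {
                format!(
                    r#"{{"name":"{}","req":"{}","registry":"{}"}}"#,
                    d.name, d.req, d.registry
                )
            })
            .collect();
        format!(
            r#"{{"name":"{}","vers":"{}","deps":[{}],"cksum":"{}"}}"#,
            self.name,
            self.vers,
            deps.join(","),
            self.cksum
        )
    }
}
```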

specifying the list of registries that are allowed with `cargo publish`.

```
publish-registries = ["my-registry"]
```

@alexcrichton

alexcrichton Sep 11, 2017

Member

Cargo currently has a publish = false key for totally disallowing publishing, I wonder if we could perhaps overload it?

publish = true # default, publish to crates.io
publish = false # don't publish this anywhere
publish = [] # don't publish this anywhere
publish = ["https://some-other-registry.com"] # publish somewhere other than crates.io

@carols10cents

carols10cents Sep 12, 2017

Author Member

I thought about that, do TOML/serde support different types like that???

@sfackler

sfackler Sep 13, 2017

Member

Yeah - being able to do that kind of thing was one of the main advantages of serde over rustc-serialize. A simple way of doing it is via the "untagged" enum representation: https://serde.rs/enum-representations.html
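The overloaded `publish` key proposed above maps naturally onto a two-variant enum. The sketch below is dependency-free, so it leaves out deserialization; with serde, an enum of this shape would typically carry the `#[serde(untagged)]` attribute so that either a boolean or an array is accepted. The names `Publish`, `allows`, and `DEFAULT_REGISTRY` are hypothetical, and treating `true` as "crates.io only" follows the annotation in the comment above rather than any confirmed Cargo behaviour.

```rust
/// Hypothetical name used in this sketch for the default registry (crates.io).
const DEFAULT_REGISTRY: &str = "crates-io";

/// The `publish` key as either a boolean or a list of registry names.
#[derive(Debug, Clone, PartialEq)]
enum Publish {
    Flag(bool),
    Registries(Vec<String>),
}

impl Publish {
    /// Whether publishing to `registry` is permitted.
    fn allows(&self, registry: &str) -> bool {
        match self {
            // `publish = true`: assumed to mean the default registry only.
            Publish::Flag(true) => registry == DEFAULT_REGISTRY,
            // `publish = false`: never publish.
            Publish::Flag(false) => false,
            // `publish = [...]`: only the listed registries; an empty list
            // therefore permits nothing.
            Publish::Registries(list) => list.iter().any(|r| r == registry),
        }
    }
}
```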

@carols10cents

carols10cents Sep 15, 2017

Author Member

TIL!

@alexcrichton

Member

alexcrichton commented Sep 11, 2017

In the detailed design section there's a note of related issues:

In order to make working with multiple registries more convenient, we would also like to support

Just to be clear, though, this RFC isn't specifically proposing solutions to these? Are they possible future extensions?

(I'd be fine adding solutions for them to this RFC, I think they may be all relatively trivially fixable)

@carols10cents

Member Author

carols10cents commented Sep 12, 2017

@sedrik

> Has there been a discussion about supporting organizations and private repositories in crates.io similar to how npmjs does it?

crates.io is likely to remain open source only, but stay tuned :)

@carols10cents

Member Author

carols10cents commented Sep 12, 2017

@withoutboats

> I have a branch of cargo which I believe implements this, though I haven't tested it thoroughly. The only pertinent difference I'm aware of is in the format for declaring a new registry. What I went with was:

Awww I was close!!! I like what you've implemented though, I'm going to update this to go with yours :)

@carols10cents

Member Author

carols10cents commented Sep 15, 2017

@Ericson2314

> I'd also like to see a self registry, meaning "access this dependency of mine the same way as you accessed me":
>
> For remote crates, that means the same registry / combination of registries in which the current crate is resolved will be used to resolve the dep.
> For local crates, will also be resolved from the current workspace.
>
> The self registry is important so that the crates in alternate registries don't need to know their own name: a huge pain if stuff needs to be moved around later.

I think moving a crate and its dependencies to a different location is already supported by the separation of registry names from registry locations. If I have this crate:

[package]
name = "big-co-api"
version = "0.5.0"
publish = ["big-co"]

[dependencies]
hyper = "0.11" # from crates.io
custom-protocol = { version = "0.1.0", registry = "big-co" } # this crate does not have any dependencies

And this .cargo/config:

[registries.big-co]
index = "http://big-co.com/index"

And then BigCo decides to use artifactory instead of an internally hosted registry, in order to move these crates, the devs at BigCo who publish these crates will need to update their .cargo/config to say:

[registries.big-co]
index = "http://big-co.artifactory.com/index"

Then publish all their crates again, starting with the leaves in the dependency graph (so in this case, custom-protocol first and then big-co-api). But each crate doesn’t need to be updated.

Am I missing something?
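The republishing order described in this comment (leaves of the dependency graph first) is a post-order walk of the graph. Below is a minimal Rust sketch under the assumption that the graph has no cycles and contains only the crates that live in the registry being moved; `publish_order` and `visit` are hypothetical helper names.

```rust
use std::collections::{HashMap, HashSet};

/// Depth-first visit that records a crate only after all of its dependencies.
fn visit<'a>(
    krate: &'a str,
    deps: &HashMap<&'a str, Vec<&'a str>>,
    seen: &mut HashSet<&'a str>,
    order: &mut Vec<&'a str>,
) {
    if !seen.insert(krate) {
        return; // already handled
    }
    if let Some(children) = deps.get(krate) {
        for &child in children {
            visit(child, deps, seen, order);
        }
    }
    order.push(krate);
}

/// Return the crates in an order where each one comes after its dependencies.
/// `deps` maps a crate to the crates it depends on within the same registry.
fn publish_order<'a>(deps: &HashMap<&'a str, Vec<&'a str>>) -> Vec<&'a str> {
    let mut seen = HashSet::new();
    let mut order = Vec::new();
    // Sort the starting points so the result is deterministic.
    let mut roots: Vec<&'a str> = deps.keys().copied().collect();
    roots.sort();
    for root in roots {
        visit(root, deps, &mut seen, &mut order);
    }
    order
}
```

For the example in this comment, `hyper` is left out of the map because it comes from crates.io and does not need republishing.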


> Could the RFC mention how this fits along side vendoring and mirrors? Perhaps it would be nice to discuss those a bit, so we can be sure everything looks nice together and the division of labor is clear. I straight-up forget how those work, so the interaction isn't clear to me off-hand.

I've added a note with links to those features' documentation and specified that this RFC does not need or propose changes to them.


> Also, can publish-registries be stripped on publishing? Consumers of packages don't need it so one is just giving them noise---or worse, leaking the names of private registries if public and private are whitelisted.

I don't think the name of a private registry is sensitive information?

@carols10cents

Member Author

carols10cents commented Sep 15, 2017

@rfcbot fcp merge

@rfcbot

rfcbot commented Sep 15, 2017

Team member @carols10cents has proposed to merge this. The next step is review by the rest of the tagged teams:

No concerns currently listed.

Once these reviewers reach consensus, this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for info about what commands tagged team members can give me.

Currently, the knowledge of how to create a file in the registry index format is spread between
Cargo and crates.io. This RFC proposes the addition of a Cargo command that would generate this
file locally for the current crate so that it can be added to the git repository using a mechanism
other than a server running crates.io's codebase.

@matklad

matklad Sep 16, 2017

Member

Hm, for this use-case, we'll also need a way to make a .crate file manually. This is already handled by cargo package. Then perhaps cargo package could create both a .crate tarball and the .json index metadata?

@carols10cents

carols10cents Sep 18, 2017

Author Member

I could see us rolling the metadata into package eventually, yeah. I think we should try having them separate at first, cargo already has enough things tangled up with each other that could be independent ;)

@matklad

Member

matklad commented Sep 16, 2017

@rfcbot reviewed

@rfcbot

rfcbot commented Sep 18, 2017

🔔 This is now entering its final comment period, as per the review above. 🔔

@tomwhoiscontrary

tomwhoiscontrary commented Sep 20, 2017

This seems pretty cool. I really like the idea of putting the registry names in the checked-in project, and the name-to-address mapping in the environment!

However, if i work for a paranoid company that wants all crate downloads to come from an internal registry, and i want to build some random project i've cloned off Github, can i do that?

That is, if i have a project whose Cargo.toml contains this:

[dependencies]
byteorder = "1.0.0"

Can i force Cargo to go to repo.initech.com rather than crates.io to get it?

I got the impression on reading the RFC that i wouldn't be able to do that. AIUI, the only way to get a crate to come from a specific registry is to say so in the dependency declaration. I would have to say:

[dependencies]
byteorder = { version = "1.0.0", registry = "initech-internal" }

Happily, i don't work for such a paranoid company, so i can get public crates from crates.io and internal crates from some internal registry. But in the past, i have worked for companies where this would not have flown. So, if this isn't currently possible, could we have it? Perhaps we could define a name for crates.io ("default", "crates-io", "pub", whatever), and say that will be used by default. Then i could get those crates from my internal registry by redefining the address that name maps to.
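The idea in the last paragraph (give crates.io a well-known name and let the environment redefine what that name points to) can be sketched as a simple lookup. This is only an illustration of the proposal as worded here, not of anything Cargo implements; `resolve_index`, `DEFAULT_REGISTRY`, and the URLs used with it are hypothetical.

```rust
use std::collections::HashMap;

/// Hypothetical reserved name for the default registry.
const DEFAULT_REGISTRY: &str = "crates-io";

/// Resolve the index URL for a dependency.
///
/// A dependency without an explicit `registry` key falls back to the default
/// name, so remapping that one name redirects every such dependency.
fn resolve_index<'a>(
    dep_registry: Option<&str>,
    registries: &'a HashMap<String, String>,
) -> Option<&'a str> {
    let name = dep_registry.unwrap_or(DEFAULT_REGISTRY);
    registries.get(name).map(String::as_str)
}
```

With this shape, a plain `byteorder = "1.0.0"` entry would resolve through whatever URL the environment assigns to the default name, with no change to the project's Cargo.toml.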


A valid registry index meets the following criteria:

- The registry index is stored in a git repository so that Cargo can efficiently fetch incremental

@tomwhoiscontrary

tomwhoiscontrary Sep 20, 2017

(1) Can the address of the index be a file: URL, or a plain file path? That could be really useful for setting up a local repository. I've done this a few times in the Java world. It could also be useful in reptilian corporate environments where it's easy to put something on a shared drive, but much harder to stand up a server.

(2) Could we allow plain HTTP as well as Git? I could imagine writing a little registry server (20-30 lines of Java!) to serve up my team's internal crates. We only have a few, and don't update them often, so downloading the whole index wouldn't take long. Whereas writing or setting up a Git server would be quite a headache.

@carols10cents

carols10cents Sep 20, 2017

Author Member

> (1) Can the address of the index be a file: URL, or a plain file path?

Yes indeed, this is how I publish to a local instance of crates.io when developing, actually.

> (2) Could we allow plain HTTP as well as Git? I could imagine writing a little registry server (20-30 lines of Java!) to serve up my team's internal crates. We only have a few, and don't update them often, so downloading the whole index wouldn't take long. Whereas writing or setting up a Git server would be quite a headache.

For now, we're going to stay with git; being able to send only the delta of changes rather than the whole change is a huge win. While you might only have a few crates to start with, you might have more later, or just more versions of those few crates.

Git includes straightforward ways to run a server; if it's within your firewall and unauthenticated, it's not bad at all.

@tomwhoiscontrary

tomwhoiscontrary Sep 28, 2017

Please accept my belated thanks for your response! The point of writing a server would be to proxy to our existing infrastructure, so being able to run a Git server doesn't really help. But being able to use local indices addresses most of the internal use cases i can imagine, so that shouldn't matter.

@sfackler

Member

sfackler commented Sep 20, 2017

@tomwhoiscontrary repository mirrors were originally discussed a bit in this RFC but have since been pulled out. There'll presumably be a follow-up RFC to deal with that use case.

@carols10cents

Member Author

carols10cents commented Sep 20, 2017

@tomwhoiscontrary

> Can i force Cargo to go to repo.initech.com rather than crates.io to get it?

Cargo already supports source replacement, so you are able to do this today! 🎉

What isn't supported yet is being able to list multiple mirrors and automatically fall back to whichever is available. Running a mirror, whether pre-emptively caching everything on crates.io or only caching what's requested, is also not simple right now. As @sfackler noted, neither of these concerns is especially related to the changes in this RFC.

@rfcbot

rfcbot commented Sep 28, 2017

The final comment period is now complete.

@aturon aturon merged commit 206318a into rust-lang:master Sep 29, 2017

@aturon

Member

aturon commented Sep 29, 2017

This RFC has been merged! Tracking issue.

Thanks @carols10cents, @natboehm and @shepmaster!

@carols10cents carols10cents deleted the integer32llc:alternative-registries branch Oct 2, 2017

@Centril Centril added the A-registry label Nov 23, 2018

@przygienda

przygienda commented Feb 21, 2019

While doing some work on this, my observations:

  • For historical reasons, an "empty string" now means crates.io as the registry. This is risky, since it prevents third-party registries from stating in "allowed-registries" whether crates.io is allowed, unless they add an empty string to it, which looks strange ...
  • Are crate IDs unique across registries? I don't think that's possible. Given that, "copying" across registries will become intractable because names can clash.
@przygienda

przygienda commented Feb 21, 2019

Writing code & struggling to see what the schema would look like if multiple registries are involved, most importantly:

  • What is the primary key of a registry? It can't be the Git URL, since that moves, nor the SHA of the first commit; that's too risky.

Suggestion: add to config.json a registry-id, something like a UUID, so that mirrors of the same registry and a registry that has moved can be recognized ...
