Add support to Cargo for alternative registries #2141

Merged: 25 commits merged on Sep 29, 2017. The diff below shows the changes from 10 of these commits.

Commits:
- `29419e7` initial template for alternative registries (natboehm, Aug 31, 2017)
- `8d838be` Reference level design of the registry index format (carols10cents, Aug 31, 2017)
- `9a26559` initial template for alternative registries (natboehm, Aug 31, 2017)
- `2b94a91` preliminary summary and motivation (natboehm, Aug 31, 2017)
- `1bc07ac` merge conflict (natboehm, Aug 31, 2017)
- `5d23150` notes on defining registries (natboehm, Sep 1, 2017)
- `614e92a` suggestion on registry specification (shepmaster, Sep 1, 2017)
- `87cb28e` More detail on registry format (carols10cents, Aug 31, 2017)
- `3e50ff2` Fleshing out the rest of the RFC (carols10cents, Sep 6, 2017)
- `96b45ff` Fill in metadata (carols10cents, Sep 6, 2017)
- `3fd5009` Add note about a possible enhancement of command (carols10cents, Sep 12, 2017)
- `d775222` Change from credentials being in .cargo/config to be in .cargo/creden… (carols10cents, Sep 12, 2017)
- `01f739f` Switch the .cargo/config format to what boats has implemented (carols10cents, Sep 12, 2017)
- `3001e1e` Remove mirrors since that's seeming like a separate concern (carols10cents, Sep 12, 2017)
- `0773b29` Move related issues to end of reference-level explanation (carols10cents, Sep 12, 2017)
- `d2e5bf5` Clarify that enhancement to specify API would be a full URL (carols10cents, Sep 12, 2017)
- `92d8e68` Fix toml (carols10cents, Sep 15, 2017)
- `25fd149` Use existing key for the list of publishable registries (carols10cents, Sep 15, 2017)
- `2086856` Add a note about env var support for .cargo/credentials (carols10cents, Sep 15, 2017)
- `89e6c02` Specify the checksum algorithm (carols10cents, Sep 15, 2017)
- `18ddf78` Possibly support URLs w/o authentication directly (carols10cents, Sep 15, 2017)
- `c570bd9` Add links to documentation of related features and note they aren't c… (carols10cents, Sep 15, 2017)
- `8495202` Remove username/pw from a place I missed (carols10cents, Sep 15, 2017)
- `8d0bd58` Missed a spot where api location should be index location for now (carols10cents, Sep 15, 2017)
- `cbae4a3` RFC 2141: Add support to Cargo for alternative registries (aturon, Sep 29, 2017)
`text/0000-alternative-registries.md`: 315 changes (315 additions, 0 deletions)
@@ -0,0 +1,315 @@
- Feature Name: cargo_alternative_registries
- Start Date: 2017-09-06
- RFC PR: (leave this empty)
- Rust Issue: (leave this empty)

# Summary
[summary]: #summary

This RFC proposes adding support for alternative crates.io servers to be used alongside the
public crates.io server. This would allow users to publish crates to their own private instance of
crates.io while still being able to use the public instance of crates.io.

# Motivation
[motivation]: #motivation

Cargo currently supports getting crates from a public server, which works well for open source
projects using Rust but is problematic for closed source code. A workaround for this is to use Git
repositories to specify the packages, but that means that the helpful versioning and
discoverability that Cargo and crates.io provide are lost. We would like to change this so that it
is possible to have a local crates.io server to which crates can be pushed, while still making use
of the public crates.io server.

We would also like to support the use of crates.io mirrors. These differ from alternative
registries in that a mirror completely replicates the functionality and content of crates.io. A
mirror would be useful if we ever need a fallback for when crates.io goes down, or in areas of the
world where crates.io is blocked.

> **Review comment (Member):** How would mirrors work in this setup? We'd need some way to say that a registry "acts as" https://github.com/rust-lang/crates.io-index, right?
>
> **Reply (Member Author):** Oops. I forgot to put in details about mirrors. I'm starting to think that could be separate from this RFC; we already support source replacement, but I want to extend it so you can list multiple mirrors and Cargo will automatically fall back if one is inaccessible. That's starting to feel separate, so I'm going to take this paragraph about mirrors out.

# Guide-level explanation
[guide-level-explanation]: #guide-level-explanation

## Registry definition specification
[registry-definition-specification]: #registry-definition-specification

We need a way to define what registries are valid for Cargo to pull from and publish to. For this
purpose, we propose that users would be able to define multiple registries in a [`.cargo/config`
file](http://doc.crates.io/config.html). This allows the user to specify the locations of
registries in one place, in a parent directory of all projects, rather than needing to configure
the registry location within each project's `Cargo.toml`. Once a registry has been configured with
a name, each `Cargo.toml` can use the registry name to refer to that registry.

Another benefit of using `.cargo/config` is that these files are not typically checked in to the
projects' source control. The registries might have credentials associated with them, which should
not be checked in. Separating the URLs and the use of the URLs in this way encourages good security
practices of not checking in credentials.

In order to tell Cargo about a registry other than crates.io, you can specify and name it in a
`.cargo/config` as follows:

```toml
[registry.$choose-a-name]
index = "https://username:password@my-intranet:8080/index"
```

> **Review comment (Member):** It seems like credentials should live separately. We've recently moved the crates.io publish token out of `.cargo/config`.

Instead of `$choose-a-name`, place the name you'd like to use to refer to this registry in your
`Cargo.toml` files. The `index` key should contain the location of the registry index for this
registry; the index format is specified in the [Registry Index Format Specification
section][registry-index-format-specification].

### CI

Because this system discourages checking in the registry configuration, the registry configuration
won't be immediately available to continuous integration systems like TravisCI. However, Cargo
currently supports configuring any key in `.cargo/config` using environment variables instead:

> Cargo can also be configured through environment variables in addition to the TOML syntax above.
> For each configuration key above of the form `foo.bar` the environment variable `CARGO_FOO_BAR`
> can also be used to define the value. For example the build.jobs key can also be defined by
> `CARGO_BUILD_JOBS`.

To configure TravisCI to use an alternate registry named `my-registry`, for example, you can use
[Travis' encrypted environment variables feature](https://docs.travis-ci.com/user/environment-variables/#Defining-encrypted-variables-in-.travis.yml) to set:

`CARGO_REGISTRY_MY_REGISTRY_INDEX=https://username:password@my-intranet:8080/index`

> **Review comment (Member):** Does it also allow environment variables for `.cargo/credentials`?
>
> **Reply (Member Author):** I don't believe there is environment variable support for `.cargo/credentials` at this time, but I'll list that under related issues.
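
The environment-variable naming rule quoted above can be sketched in Rust. This is a minimal illustration, not Cargo's implementation: the function name `config_key_to_env_var` is hypothetical, and it assumes that dashes in a key (as in `my-registry`) become underscores, which is what the `CARGO_REGISTRY_MY_REGISTRY_INDEX` example above implies.

```rust
/// Hypothetical helper: maps a dotted `.cargo/config` key to the name of the
/// environment variable that can override it.
/// Assumption: both `.` and `-` become `_`, and letters are uppercased.
fn config_key_to_env_var(key: &str) -> String {
    let mut var = String::from("CARGO_");
    for c in key.chars() {
        match c {
            '.' | '-' => var.push('_'),
            other => var.extend(other.to_uppercase()),
        }
    }
    var
}

fn main() {
    // The example from the quoted documentation.
    println!("{}", config_key_to_env_var("build.jobs"));
    // The registry index key for a registry named `my-registry`.
    println!("{}", config_key_to_env_var("registry.my-registry.index"));
}
```
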

## Using a dependency from another registry

*Note: this syntax will initially be implemented as an [unstable cargo
feature](https://github.com/rust-lang/cargo/pull/4433) available in nightly cargo only and
stabilized as it becomes ready.*

Once you've configured a registry (with a name, for example, `my-registry`) in `.cargo/config`, you
can specify that a dependency comes from an alternate registry by using the `registry` key:

```toml
[dependencies]
secret-crate = { version = "1.0", registry = "my-registry" }
```

> **Review comment (Member):** It'd be nice to support a short form of this for convenience:
>
>     "my-registry/secret-crate" = "1.0"
>
> **Review comment (Member):** @sfackler could we have that syntax reserved for crate namespacing? Instead I'd propose:
>
>     [dependencies.my-registry]
>     secret-crate = "1.0"
>
> **Review comment (Member):** What would this mean in that setup?
>
>     [dependencies.foobar]
>     version = "1.0"
>
> Is it a crate called "version" at 1.0 in the "foobar" registry or a crate called "foobar" in the default registry at version 1.0?
>
> **Review comment (Member):** Oh dumb me, that syntax already has a meaning... What about this then:
>
>     [registry.my-registry.dependencies]
>     secret-crate = "1.0"
>
> **Review comment (Member):** Another alternative (loosely based on how URLs work):
>
>     "//my-registry/secret-crate" = "1.0"
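
To make the relationship between the two files concrete, the lookup Cargo would perform can be sketched in Rust: the `registry` value in `Cargo.toml` is only a name, and the index location comes from the matching `.cargo/config` entry. Everything here is illustrative: `DependencySpec` and `resolve_index` are hypothetical names, parsing of the TOML files is omitted, and a dependency without a `registry` key is assumed to come from crates.io, as it does today.

```rust
use std::collections::HashMap;

/// Location of the crates.io registry index, used when no `registry` key is given.
const CRATES_IO_INDEX: &str = "https://github.com/rust-lang/crates.io-index";

/// Hypothetical, already-parsed form of one `[dependencies]` entry.
struct DependencySpec<'a> {
    name: &'a str,
    /// The value of the `registry` key, if present.
    registry: Option<&'a str>,
}

/// Maps a dependency to the index it should be fetched from, using the
/// registry names defined in `.cargo/config` (name -> index location).
fn resolve_index<'a>(
    dep: &DependencySpec,
    registries: &'a HashMap<String, String>,
) -> Result<&'a str, String> {
    match dep.registry {
        None => Ok(CRATES_IO_INDEX),
        Some(name) => registries.get(name).map(String::as_str).ok_or_else(|| {
            format!(
                "dependency `{}` uses registry `{}`, which is not defined in .cargo/config",
                dep.name, name
            )
        }),
    }
}

fn main() {
    // Equivalent of a `[registry.my-registry]` section with an `index` key.
    let mut registries = HashMap::new();
    registries.insert(
        "my-registry".to_string(),
        "https://my-intranet:8080/index".to_string(),
    );

    let dep = DependencySpec { name: "secret-crate", registry: Some("my-registry") };
    match resolve_index(&dep, &registries) {
        Ok(index) => println!("{} comes from {}", dep.name, index),
        Err(e) => eprintln!("error: {}", e),
    }
}
```

An unknown registry name is reported as an error rather than silently falling back to crates.io, which keeps a typo from pulling a same-named public crate.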

## Publishing to another registry; preventing unwanted publishes

In order to specify that a crate should only be published to a particular set of registries, list
the allowed registries under the `publish-registries` key in the `[package]` section; `cargo publish`
will then only publish to those registries.

```toml
[package]
# ... other package keys ...
publish-registries = ["my-registry"]
```

> **Review comment (Member):** Cargo currently has a `publish = false` key for totally disallowing publishing. I wonder if we could perhaps overload it?
>
>     publish = true  # default, publish to crates.io
>     publish = false # don't publish this anywhere
>     publish = []    # don't publish this anywhere
>     publish = ["https://some-other-registry.com"] # publish somewhere other than crates.io
>
> **Reply (Member Author):** I thought about that; do TOML/serde support different types like that?
>
> **Review comment (Member):** Yeah, being able to do that kind of thing was one of the main advantages of serde over rustc-serialize. A simple way of doing it is via the "untagged" enum representation: https://serde.rs/enum-representations.html
>
> **Review comment (Member):** You can even find this in Cargo today!
>
> **Reply (Member Author):** TIL!

If you run `cargo publish` without specifying an `--index` argument pointing to an allowed
registry, the command will fail. This prevents accidental publishes of private crates to crates.io,
for example.
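
The publish-time rule described above can be sketched as a small Rust check. It is a hypothetical illustration: `check_publish` is not a Cargo function, the target registry is assumed to have already been resolved from `--index` to a configured registry name, and a crate with no `publish-registries` key is assumed to be publishable as it is today.

```rust
/// Hypothetical check mirroring the proposed `publish-registries` rule.
///
/// * `allowed`: the `publish-registries` list, or None if the key is absent.
/// * `target`: the registry being published to, assumed to be already resolved
///   from `--index` to a configured registry name (None means crates.io).
fn check_publish(allowed: Option<&[&str]>, target: Option<&str>) -> Result<(), String> {
    match (allowed, target) {
        // Assumption: without the key, publishing works as it does today.
        (None, _) => Ok(()),
        (Some(list), Some(name)) if list.contains(&name) => Ok(()),
        (Some(_), Some(name)) => Err(format!(
            "registry `{}` is not listed in `publish-registries`",
            name
        )),
        // No `--index` means crates.io, so an accidental public publish is refused.
        (Some(_), None) => Err(
            "this crate may only be published to the registries in `publish-registries`"
                .to_string(),
        ),
    }
}

fn main() {
    let allowed = ["my-registry"];
    println!("{:?}", check_publish(Some(&allowed[..]), Some("my-registry")));
    println!("{:?}", check_publish(Some(&allowed[..]), None));
}
```
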

## Running a minimal registry

The simplest registry that Cargo can use consists of:

- A registry index in the format specified in the [Registry index format specification
section][registry-index-format-specification], which contains a pointer to:
  - A location containing the `.crate` files for the crates in the registry.

## Running a fully-featured registry

This RFC does not attempt to standardize or specify any of crates.io's APIs, but it should be
possible to take crates.io's codebase and run it along with a registry index in order to provide
crates.io's functionality as an alternate registry.

## Crates.io

Because crates.io's purpose is to be a reliable host for open source crates, crates that have
dependencies from registries other than crates.io will be rejected at publish time. Crates.io
cannot make availability guarantees about alternate registries, so, much like git dependencies
today, publishing crates with dependencies from other registries won't be allowed.

In crates.io's codebase, we will add a configuration option that specifies a list of approved
alternate registry locations that dependencies may use. For private registries run using
crates.io's code, this will likely include the private registry itself plus crates.io, so that
private crates are allowed to depend on open source crates. Any crates with dependencies from
registries not specified in this configuration option will be rejected at publish time.

# Reference-level explanation
[reference-level-explanation]: #reference-level-explanation

## Related issues

In order to make working with multiple registries more convenient, we would also like to support:

- [Being able to specify the API host rather than the index
location](https://github.com/rust-lang/cargo/issues/4208), so that, for example, you could
specify `https://crates.io` rather than `https://github.com/rust-lang/crates.io-index`. We do not
want to *require* specifying the API host, since some registries will choose not to have an API
host at all and only supply an index and a location for crate files. This would require the API
to have a way to tell Cargo where the associated registry index is located.
- [Being able to save multiple tokens in
`.cargo/credentials`](https://github.com/rust-lang/cargo/issues/3365), one per registry, so that
people publishing to multiple registries don't need to log in over and over or specify tokens on
every publish.
- Being able to specify `--registry registry-name` for all Cargo commands that currently take
`--index`.
- Being able to use a dependency under a different name. Alternate registries that are not mirrors
should be allowed to have crates with the same name as crates in any other registry, including
crates.io. In order to allow a crate to depend on both, say, the `http` crate from crates.io and
the `http` crate from a private registry, at least one will need to be renamed when listed as a
dependency in `Cargo.toml`. [RFC
2126](https://github.com/aturon/rfcs/blob/path-clarity/text/0000-path-clarity.md#basic-changes)
proposes this change as follows:

> Cargo will provide a new crate key for aliasing dependencies, so that e.g. users who want to
> use the `rand` crate but call it `random` instead can now write `random = { version = "0.3",
> crate = "rand" }`.

## Registry index format specification
[registry-index-format-specification]: #registry-index-format-specification

Cargo needs to be able to get a registry index containing metadata for all crates and their
dependencies available from an alternate registry in order to perform offline version resolution.
The registry index for crates.io is available at
[https://github.com/rust-lang/crates.io-index](https://github.com/rust-lang/crates.io-index), and
this section aims to specify the format of this registry index so that other registries can provide
their own registry index that Cargo will understand.

This is version 1 of the registry index format specification. There may be other versions of the
specification someday. Along with a new specification version will be a plan for supporting
registries using the older specification and a migration plan for registries to upgrade the
specification version their index is using.

A valid registry index meets the following criteria:

- The registry index is stored in a git repository so that Cargo can efficiently fetch incremental
updates to the index.

> **Review comment:** (1) Can the address of the index be a `file:` URL, or a plain file path? That could be really useful for setting up a local repository. I've done this a few times in the Java world. It could also be useful in reptilian corporate environments where it's easy to put something on a shared drive, but much harder to stand up a server.
>
> (2) Could we allow plain HTTP as well as Git? I could imagine writing a little registry server (20-30 lines of Java!) to serve up my team's internal crates. We only have a few, and don't update them often, so downloading the whole index wouldn't take long. Whereas writing or setting up a Git server would be quite a headache.
>
> **Reply (Member Author):**
>
> > (1) Can the address of the index be a file: URL, or a plain file path?
>
> Yes indeed, this is how I publish to a local instance of crates.io when developing, actually.
>
> > (2) Could we allow plain HTTP as well as Git?
>
> For now, we're going to stay with git; being able to send only the delta of changes rather than the whole change is a huge win. While you might only have a few crates to start with, you might have more later, or just more versions of those few crates.
>
> Git includes straightforward ways to run a server; if it's within your firewall and unauthenticated, it's not bad at all.
>
> **Follow-up:** Please accept my belated thanks for your response! The point of writing a server would be to proxy to our existing infrastructure, so being able to run a Git server doesn't really help. But being able to use local indices addresses most of the internal use cases I can imagine, so that shouldn't matter.

- There will be a file at the top level named `config.json`. This file will be a valid JSON object
with the following keys:

```json
{
"dl": "https://my-crates-server.com/api/v1/crates",
"api": "https://my-crates-server.com/",
"allowed-registries": ["https://crates.io", "https://my-other-crates-server.com"]
}
```

The `dl` key is required and specifies where Cargo can download the tarballs containing the
source files of the crates listed in the registry.

The `api` key is optional and specifies where Cargo can find the API server that provides the
same API functionality that crates.io does today, such as publishing and searching. Without the
`api` key, these features will not be available. This RFC is not attempting to standardize
crates.io's API in any way, although that could be a future enhancement.

The `allowed-registries` key is optional and specifies the other registries that crates in this
index are allowed to have dependencies on. The default is an empty list, meaning crates may only
depend on other crates in the current registry. This is currently the case for crates.io and will
remain so going forward. Alternate registries will probably want to add crates.io to this list.
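
The three `config.json` keys above could be modelled with a Rust struct like the following. This is a sketch under stated assumptions: `IndexConfig` and its methods are hypothetical, JSON parsing is left out, and a dependency with no `registry` is treated as coming from the current registry, which is how this RFC defines a missing `registry` key in index entries.

```rust
/// Hypothetical in-memory form of an index's `config.json` (JSON parsing omitted).
struct IndexConfig {
    /// Required: where Cargo downloads `.crate` tarballs from.
    dl: String,
    /// Optional: API server for publishing, searching, and so on.
    api: Option<String>,
    /// Optional in the JSON; an absent key is represented as an empty list.
    allowed_registries: Vec<String>,
}

impl IndexConfig {
    /// Without an `api` key, API features such as publishing are unavailable.
    fn has_api(&self) -> bool {
        self.api.is_some()
    }

    /// Whether a crate in this index may depend on a crate from `dep_registry`.
    /// None means the dependency lives in this same registry, which is always allowed.
    fn allows_dependency_on(&self, dep_registry: Option<&str>) -> bool {
        match dep_registry {
            None => true,
            Some(other) => self.allowed_registries.iter().any(|r| r == other),
        }
    }
}

fn main() {
    let config = IndexConfig {
        dl: "https://my-crates-server.com/api/v1/crates".to_string(),
        api: Some("https://my-crates-server.com/".to_string()),
        allowed_registries: vec!["https://crates.io".to_string()],
    };
    println!("tarballs from {}, API available: {}", config.dl, config.has_api());
    println!("may depend on crates.io: {}", config.allows_dependency_on(Some("https://crates.io")));
    println!("may depend on elsewhere: {}", config.allows_dependency_on(Some("https://example.com")));
}
```

Representing the missing `allowed-registries` key as an empty `Vec` keeps the default ("only this registry") in one place instead of spreading `Option` checks through the code.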

- There will be a number of directories in the git repository.
- `1/` - holds files for all crates whose names have one character.
- `2/` - holds files for all crates whose names have two characters.
- `3/a/` etc - for all crates whose names have three characters, their files will be in a
directory named `3`, then in a subdirectory named with the first character of their name. For
example, a file for a crate named `syn` would be found in `3/s/`.
- `aa/aa/` etc - for all crates whose names have four or more characters, their
files will be in a directory named with the first and second characters of
their name, then in a subdirectory named with the third and fourth characters
of their name. For example, a file for a crate named `sample` would be
found in `sa/mp/`.

- For each crate in the registry, there will be a file with the name of that crate in the directory
structure as specified above. The file will contain metadata about each version of the crate,
with one version per line. Each line will be valid JSON with, minimally, the keys as shown. More
keys may be added, but Cargo may ignore them. The contents of one line are pretty-printed here
for readability.

```json
{
  "name": "my_serde",
  "vers": "1.0.11",
  "deps": [
    {
      "name": "serde",
      "req": "^1.0",
      "registry": "https://crates.io",
      "features": [],
      "optional": true,
      "default_features": true,
      "target": null,
      "kind": "normal"
    }
  ],
  "cksum": "f7726f29ddf9731b17ff113c461e362c381d9d69433f79de4f3dd572488823e9",
  "features": {
    "default": ["std"],
    "derive": ["serde_derive"],
    "std": []
  },
  "yanked": false
}
```

> **Review comment (Member), on the dependency's `registry` key:** Should this, like `allowed-registries` above, specify the index rather than this URL?
>
> **Reply (Member Author):** Yeah, probably.

The top-level keys for a crate are:

- `name`: the name of the crate
- `vers`: the version of the crate this row is describing
- `deps`: a list of all dependencies of this crate
- `cksum`: a checksum of this version's files
- `features`: a map from each feature name defined by this crate to the features and optional
dependencies it enables
- `yanked`: whether or not this version has been yanked

> **Review comment (Member), on `cksum`:** s/this version's files/the tarball downloaded/

Within the `deps` list, each dependency should be listed as an item in the `deps` array with the
following keys:

- `name`: the name of the dependency
- `req`: the semver version requirement string on this dependency
- `registry`: **New to this RFC: the registry from which this crate is available**
- `features`: the list of features of this dependency that the parent crate enables
- `optional`: whether this dependency is optional or not
- `default_features`: whether the parent uses the default features of this dependency or not
- `target`: on which target this dependency is needed
- `kind`: can be `normal`, `build`, or `dev` to be a regular dependency, a build-time
dependency, or a development dependency
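
The `deps` keys listed above map naturally onto a Rust type. The sketch below is illustrative only: `IndexDep`, `DepKind`, and `registry_or` are hypothetical names, only a subset of the keys is modelled, JSON parsing is omitted, and a missing `registry` falls back to the current registry as this RFC specifies.

```rust
use std::str::FromStr;

/// The three dependency kinds an index entry can name.
#[derive(Debug, PartialEq)]
enum DepKind {
    Normal,
    Build,
    Dev,
}

impl FromStr for DepKind {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "normal" => Ok(DepKind::Normal),
            "build" => Ok(DepKind::Build),
            "dev" => Ok(DepKind::Dev),
            other => Err(format!("unknown dependency kind `{}`", other)),
        }
    }
}

/// A subset of the keys of one element of the `deps` array.
struct IndexDep {
    name: String,
    req: String,
    /// None when the key is absent from the index line.
    registry: Option<String>,
    optional: bool,
    kind: DepKind,
}

impl IndexDep {
    /// The registry to fetch this dependency from; an absent `registry`
    /// means the registry whose index contains this entry.
    fn registry_or<'a>(&'a self, current: &'a str) -> &'a str {
        self.registry.as_deref().unwrap_or(current)
    }
}

fn main() {
    let dep = IndexDep {
        name: "serde".to_string(),
        req: "^1.0".to_string(),
        registry: Some("https://crates.io".to_string()),
        optional: true,
        kind: "normal".parse().expect("valid kind"),
    };
    println!(
        "{} {} (optional: {}, kind: {:?}) from {}",
        dep.name,
        dep.req,
        dep.optional,
        dep.kind,
        dep.registry_or("https://my-intranet:8080/index")
    );
}
```
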

If a dependency's registry is not specified, Cargo will assume the dependency can be located in the
current registry. Because the index records the registry of each dependency, Cargo will have the
information it needs to locate and fetch crate files from every registry involved, using only their
indices and without needing to involve an API server.
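
The directory layout described earlier in this section can be expressed as a function from crate name to index file path. This is an illustrative sketch, not Cargo's code: `index_path` is a hypothetical name, three-character names are placed under `3/` plus their first character (the layout crates.io's index uses), and names are lowercased, which crates.io's index also does but which this RFC does not state.

```rust
/// Hypothetical helper: returns the path of a crate's file within the index,
/// or None for names this sketch does not handle (empty or non-ASCII).
fn index_path(crate_name: &str) -> Option<String> {
    if crate_name.is_empty() || !crate_name.is_ascii() {
        return None;
    }
    // Assumption: lowercase paths, as in crates.io's index.
    let name = crate_name.to_ascii_lowercase();
    let path = match name.len() {
        1 => format!("1/{}", name),
        2 => format!("2/{}", name),
        // Three-character names get a subdirectory named after their first character.
        3 => format!("3/{}/{}", &name[..1], name),
        // Four or more: first two characters, then third and fourth characters.
        _ => format!("{}/{}/{}", &name[..2], &name[2..4], name),
    };
    Some(path)
}

fn main() {
    for name in ["a", "io", "syn", "sample"] {
        println!("{:>6} -> {}", name, index_path(name).unwrap());
    }
}
```
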

## New command: `cargo generate-index-metadata`

Currently, the knowledge of how to create a file in the registry index format is spread between
Cargo and crates.io. This RFC proposes the addition of a Cargo command that would generate this
file locally for the current crate so that it can be added to the git repository using a mechanism
other than a server running crates.io's codebase.
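
As a rough illustration of what the proposed command would produce, the Rust sketch below builds one index line for a crate that has no dependencies and no features. It is hypothetical: `index_line` and `escape_json` are not part of Cargo, the checksum is taken as an input rather than computed, and only the minimal keys from the index format above are emitted, with hand-rolled JSON escaping to keep the example free of external crates.

```rust
/// Escapes the characters JSON requires to be escaped inside a string.
fn escape_json(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            // Control characters must be written as \u00XX escapes.
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Hypothetical sketch: one index line (a single JSON object) for a crate
/// version with no dependencies and no features, not yet yanked.
fn index_line(name: &str, vers: &str, cksum: &str) -> String {
    format!(
        r#"{{"name":"{}","vers":"{}","deps":[],"cksum":"{}","features":{{}},"yanked":false}}"#,
        escape_json(name),
        escape_json(vers),
        escape_json(cksum)
    )
}

fn main() {
    let line = index_line(
        "my_serde",
        "1.0.11",
        "f7726f29ddf9731b17ff113c461e362c381d9d69433f79de4f3dd572488823e9",
    );
    // Appending this line to the crate's file in the index publishes the version.
    println!("{}", line);
}
```
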

> **Review comment (Member):** Hm, for this use-case, we'll also need a way to make a `.crate` file manually. This is already handled by `cargo package`. Then perhaps `cargo package` could create both a `.crate` tarball and a `.json` index metadata file?
>
> **Reply (Member Author):** I could see us rolling the metadata into `package` eventually, yeah. I think we should try having them separate at first; Cargo already has enough things tangled up with each other that could be independent ;)


# Drawbacks
[drawbacks]: #drawbacks

Supporting alternative registries, and having multiple public registries, could fracture the
ecosystem. However, we feel that supporting private registries, and the Rust adoption that could
enable, outweighs the potential downsides of having multiple public registries.

# Rationale and Alternatives
[alternatives]: #alternatives

A [previous RFC](https://github.com/rust-lang/rfcs/pull/2006) proposed having the registry
information completely defined within `Cargo.toml` rather than using `.cargo/config`. This requires
repeating the same information multiple times for multiple projects, and encourages checking in
credentials that might be needed to access the registries. That RFC also didn't specify the format
for the registry index, which needs to be shared among all registries.

> **Review comment (Member):** The Java ecosystem has gone the other direction. Gradle requires that you specify all of your upstream repositories in your `build.gradle`, and Maven supports both configuration in the project itself and at the user level.
>
> It seems kind of messy for the dev setup instructions to go from "clone the repo" to "clone the repo, add these registries to your `~/.cargo/config`, and make sure the names agree across all of the projects you're working on".
>
> When Cargo searches for a `.cargo/config`, does it stop at the first one it finds or continue looking and union all of them? One nice option could be to go the union route so you could check a `.cargo/config` into the repo with the right registry configurations.
>
> **Review comment:** One of the points of having the `.cargo/config` outside of the repository is to avoid checking authentication information into the code base. From my view, this would be a way to support private registries for closed source projects, and the common use case is most likely that you will have one internal registry and use crates.io for all publicly available code.
>
> Maybe there could be a `cargo add-registry` command in the future that can be used to set up any third-party registry that is to be used.
>
> **Review comment (Member):** Registry authentication information is already stored in a separate file from `Cargo.toml` and `.cargo/config`; I don't know why anything would be different here.
>
> **Reply (Member Author):**
>
> @sfackler:
>
> > When Cargo searches for a .cargo/config, does it stop at the first one it finds or continue looking and union all of them?
>
> It continues looking and unifies all of them. I just made a PR to Cargo's docs to make this more readily apparent.
>
> @sedrik:
>
> > Maybe there could be a cargo add-registry command for the future that can be used to setup any third party registry that is to be used.
>
> That sounds like a great idea! I'll add a note about that :)
>
> @sfackler:
>
> > Registry authentication information is already stored in a separate file than Cargo.toml and .cargo/config - I don't know why anything would be different here.
>
> You're right that usernames and passwords should probably go in `.cargo/credentials` instead of `.cargo/config`; I'll make that change. Right now, only the token to authenticate to a registry's API is stored in `.cargo/credentials`, so this RFC will be adding the ability to specify a username and password to enable access to either a registry index or an API.
>
> **Review comment (Member):** Cool, as long as it's something that can be checked into the repo and doesn't totally suppress user-level configuration, I'm on board.

An alternative design could be to support specifying the registry URL in either `.cargo/config` or
`Cargo.toml`. This has the downsides of creating more choices for the user and potentially
encouraging poor practices such as checking credentials into a project's source control. The
implementation of this feature would also be more complex. The upside would be supporting
configuration in ways that would be more convenient in various situations.

# Unresolved questions
[unresolved]: #unresolved-questions

- Are the names of everything what we want?
  - `cargo generate-index-metadata`?
  - `registry = "my-registry"`?
  - `publish-registries = []`?