Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

What is the significance of "specification" of index format/API? #4574

Closed
withoutboats opened this issue Oct 4, 2017 · 21 comments
Closed

What is the significance of "specification" of index format/API? #4574

withoutboats opened this issue Oct 4, 2017 · 21 comments
Labels
A-interacts-with-crates.io Area: interaction with registries A-registries Area: registries C-question Category: question T-cargo Team: Cargo

Comments

@withoutboats
Copy link
Contributor

In the alt registries RFC, @carols10cents specified the format that a registry index takes. I remember that we made a conscious choice to specify the index format, but not to specify the API that crates.io uses for publishing, yanking, etc.

I'm unclear, however, what it really means to specify these things. It seems to be about a level of service we provide and a level of expectation we put on maintainers of alt registries, but we haven't really nailed down what that means.

Here are some things I think this could maybe mean:

  • Does this mean we will maintain accurate documentation of the parts of the interface we've specified (but not necessarily other parts)?
  • Does this mean we require that all registries conform to the specified parts, but not the unspecified parts (i.e. if this were an IETF spec we would be using MUST for the index, but only SHOULD or MAY for the api parts).
  • Does this mean we should not implement features that depend on the "unspecified" parts of the interface (e.g. we should not make it easier to cargo publish to alternative registries, since that requires them to support the unspecified API)?
  • Or rather, restating the previous bullet, specifying a part of the interface is a necessary prerequisite toward generalizing a workflow that is built on top of that interface?

Probably, there are other significances this distinction could have to people that haven't occurred to me.

@withoutboats withoutboats added A-interacts-with-crates.io Area: interaction with registries C-question Category: question T-cargo Team: Cargo A-registries Area: registries labels Oct 4, 2017
@carols10cents
Copy link
Member

I'm going to think more on these questions but in the meantime I wanted to tag in @wycats for his thoughts on this.

@alexcrichton
Copy link
Member

To me it does seem like the high-order bit is getting cargo build working rather than publication/browsing of packages, that seems to be the real enabler. In that sense it makes sense to me to start off by specifying the index (which cargo build downloads and consumes) without the API (which is very rarely used only by cargo publish)

@withoutboats
Copy link
Contributor Author

withoutboats commented Oct 4, 2017

I had overlooked before that rust-lang/rfcs#2141 specified that the publish key in a Cargo.toml could take a list of registries that cargo publish will publish this to. @cswindle has implemented this in #4568.

I have some concerns about this:

First, the highest priority issue - cargo currently stores a single auth token in the config as registry.token. The RFC doesn't mention anything about this, and #4568 would have us send that auth token to every registry. We definitely can't have users sharing a single auth token among all registries they publish to.

But this raises a sort of deeper question. Clearly not all registries will use GitHub as the auth mechanism; how much do we expect them to share? The request made by cargo publish contains a number of details, some of them very particular:

  • It communicated with the server using a single HTTP PUT request to the path api/v1/crates/new (appended to the URL in the index's api member).
  • It sets an Authorization header, currently with that single token received from GitHub.
  • The body it sends has a very particular format:
    • A JSON blob containing package metadata preceeded by a little endian u32 of the length of that blob.
    • A tarball of the package preceded by a little endian u32 of the length of the tarball.
  • It expects to receive a JSON blob in response, from which it looks for warnings from crates.io, listing the invalid categories and badges.

I could imagine alternative registries wanting to make very different choices around this, especially around how they handle authentication & authorization. And if the servers and robust in how they handle the metadata JSON (e.g. not being extensible to new fields), we could have frustrating problems with breaking peoples' registries in the future as we extend the features of cargo.

What I especially don't want is a situation where you interact with 3 or 4 alternative registries, and half of them support cargo publish and half of them don't, because then I'll have to look up every time whether or not this registry supports cargo publish. Right now, I feel a lot more comfortable with saying that cargo publish et al only speak to the default registry, which is crates.io, and expecting that out of band tools will be built for mutating other registries (with our support, e.g. the metadata and index stuff).

@alexcrichton
Copy link
Member

Oh I was sort of expecting that for now a custom registry is just running literally the crates.io software with different configuration or perhaps a patch or two, in which case we wouldn't have to worry too too much about the implementation details. You're definitely right though in that we'd have to worry about the publish key here especially wrt tokens. The good news though is that it's all unstable!

My thinking is that if we enable publishing to different registries then we'd probably change the format of the credentials file in a backwards-compatible fashion to be able to store tokens for multiple registries.

@cswindle
Copy link
Contributor

cswindle commented Oct 4, 2017

Regarding the issue with sending a single authentication token, you would need to override this as part of the publish, until the related issue of saving multiple tokens in .cargo/credentials is solved (which is referenced in the RFC).

The last point I agree is potentially an issue, but the key thing that the publish key provides is ensuring that we don't publish to a server that we don't want to (which I think is required to stop private crates getting out onto crates.io as there is no easy way to remove them once they have been published).

@withoutboats
Copy link
Contributor Author

withoutboats commented Oct 4, 2017

Oh I was sort of expecting that for now a custom registry is just running literally the crates.io software with different configuration or perhaps a patch or two, in which case we wouldn't have to worry too too much about the implementation details.

See to me this seems like overkill for a private registry - and also ties them to using GitHub for auth. I could imagine tooling existing to maintain a private registry just based on a hosted git repo and an Amazon S3 bucket, using the out of the box auth mechanisms of those two systems, with no HTTP API of its own to speak of.

This would not use cargo publish but some other tool, probably with a similar CLI but doing a totally different transaction under the hood.

@cswindle
Copy link
Contributor

cswindle commented Oct 4, 2017

We went the route of running a private crates.io, although we made some changes to crates.io to allow integration with gitlab OAuth, this way we have a nice way to search through private crates in the same way that we can for public crates.

@withoutboats
Copy link
Contributor Author

withoutboats commented Oct 5, 2017

@cswindle I think that's a viable choice, but my concern is that we're going to lock people into requiring that for their tooling to work how they expect it to.


To try to frame my concern, I'd like to think in terms of a "minimum viable registry." What is the least a registry must support to be considered a "conformant" alternative registry?

Then, I'd like to only extend workflows to suppot alternative registries if that minimum registry would satisfice that workflow. If we want to extend a workflow to alternative registries that requires more, the minimum viable registry needs to include that support.

What I don't want is a mix of "opportunistic" workflows, each of which require different levels of support from the registry, and so each of which may or may not be supported by any particular registry.

Lastly, I want to consider what support to alternative registries we're committing to providing. That is, how extensively will we document/spec that "minimum registry" requirement, vs things beyond that? I don't want you to have to clone crates.io or read the source to create a registry that supports a workflow we consider "supported for alternate registries."

@cswindle
Copy link
Contributor

cswindle commented Oct 5, 2017

I acknowledge that concern, but if people do have the option of using publish to push the crates to an alternate registry (which they do as they can run their own crates.io), then I really think that we need a method of saying what registries the crate is permitted to be pushed to, otherwise we run the risk of code ending up on crates.io, or the wrong alternate registry, when it shouldn't.

@withoutboats
Copy link
Contributor Author

withoutboats commented Oct 6, 2017

So I talked today to André Arko, who is a maintainer of bundler, and he had an interesting proposal.

We could add a key to the config.json in the index called commands, which is an array of strings. For example:

{
    "api": "https://crates.io",
    "crates": "https://crates.io/api/crates",
    "commands": ["publish", "yank", "owner"] // etc
}

If a user attempts to use such a command on an alternative registry, if that registry's index does not claim to support the command, instead of attempting and barfing, cargo could report that the command is not supported by that registry. This way, registries with less capabilities don't just throw random errors, they report that they don't support those commands.

Additionally, we assume that the default registry supports all commands, and we assume that all alternative registries support fetching & building.

@cswindle
Copy link
Contributor

cswindle commented Oct 6, 2017

Interesting idea, although wouldn’t this still require that you define what constitutes each command, although I guess it only requires the parameters to be defined for each. A neat feature that you can do once you have this is add a version field, so if a new version of publish is added you could add publish-2, which would the be a nice way to allow breaking changes to the interfaces. Might be nice to include a version in the commands from the start to make it more obvious that it is a versioned interface.

Regarding default registry supporting all by default, I think I would prefer that we have a single code path by ensuring that crates.io returns a sensible list, that way if someone does not use crates.io as the default they can still control the features.

@withoutboats
Copy link
Contributor Author

wouldn’t this still require that you define what constitutes each command

I think this is a separable question - given that cargo fails politely when a registry doesn't support a workflow, I feel more comfortable allowing cargo to make requests to other registries without providing as high a level of support as we do for the index (e.g. we might not thoroughly document the API).

Regarding default registry supporting all by default, I think I would prefer that we have a single code path by ensuring that crates.io returns a sensible list, that way if someone does not use crates.io as the default they can still control the features.

Yea I was thinking about the fact that cargo today assumes the default registry (its only registry) supports all commands, but there's no reason we can't change that as long as we add this field to the crates.io index first.

@withoutboats
Copy link
Contributor Author

withoutboats commented Oct 6, 2017

If we wanted to be more thorough in the index, we could use a format like this:

{
    "api": "https://crates.io", // deprecated, only to support old cargos
    "crates": "https://crates.io/api/crates",
    "commands": {
        "publish": {
             "api": "https://crates.io/api/crates/new",
             "version": "1"
        },
        "yank": {
             "api": "https://crates.io/api/crates/yank",
             "version": "1"
        },
        // etc
    }
}

@cswindle
Copy link
Contributor

cswindle commented Oct 6, 2017

I like that approach, seems to me to be nice and flexible, while still being intuitive.

@withoutboats
Copy link
Contributor Author

withoutboats commented Oct 9, 2017

We talked about this in the cargo meeting today, and we decided that a format along the lines of my most recent comment would be good. As part of the alternative-registries feature, we will add a check for the api for each command through one of these objects. If the object doesn't exist, we will report that that registry doesn't support the command.

We also agreed that we won't particularly go out of our way to document how any of these APIs work at the present time. Only the registry format (including this new commands object) and the dl endpoint will have maintained, hosted documentation, since they are a part of the minimum requirements for a registry.

@cswindle
Copy link
Contributor

@withoutboats, is it worth also including on the publish what nightly cargo-features the registry supports, this way the admin of the registry is able to allow unstable features to be pushed to allow for testing of new features.

@withoutboats
Copy link
Contributor Author

@cswindle To be clear, you mean that the registry will accept publishes using those features?

Is the check in crates.io or cargo right now? We could just put the check in crates.io and let other registries set their own policies about publishing nightly features without looping the cargo client into it.

@cswindle
Copy link
Contributor

cswindle commented Nov 1, 2017

@withoutboats. currently we do not send the cargo-features list to crates.io, however I am not sure if we start sending that in cargo how crates.io will respond.

@alexcrichton, do you think that it is valid to send the cargo features in the publish so that owners of a registry are able to specify that they accept unstable features for trying out features.

@alexcrichton
Copy link
Member

Hm I don't really have an answer here. I'm struggling to envision what nightly features on publish would be and how they interact with registries...

@stale
Copy link

stale bot commented Sep 19, 2018

As there hasn't been any activity here in over 6 months I've marked this as stale and if no further activity happens for 7 days I will close it.

I'm a bot so this may be in error! If this issue should remain open, could someone (the author, a team member, or any interested party) please comment to that effect?

The team would be especially grateful if such a comment included details such as:

  • Is this still relevant?
  • If so, what is blocking it?
  • Is it known what could be done to help move this forward?

Thank you for contributing!

(The cargo team is currently evaluating the use of Stale bot, and using #6035 as the tracking issue to gather feedback.)

If you're reading this comment from the distant future, fear not if this was closed automatically. If you believe it's still an issue please leave a comment and a team member can reopen this issue. Opening a new issue is also acceptable!

@stale stale bot added the stale label Sep 19, 2018
@stale
Copy link

stale bot commented Oct 19, 2018

As I didn't see any updates in 30 days I'm going to close this. Please see the previous comment for more information!

@stale stale bot closed this as completed Oct 19, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-interacts-with-crates.io Area: interaction with registries A-registries Area: registries C-question Category: question T-cargo Team: Cargo
Projects
None yet
Development

No branches or pull requests

4 participants