Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Accept hyphen in crate name in place of underscore #2775

Open
dtolnay opened this Issue Jun 6, 2016 · 19 comments

Comments

Projects
None yet
10 participants
@dtolnay
Copy link
Member

dtolnay commented Jun 6, 2016

Crates.io currently accepts hyphens for crates that use underscores, both in the web interface and in the API.

Cargo does not, but should.

error: no matching package named `serde-codegen` found (required by `testing`)
location searched: registry https://github.com/rust-lang/crates.io-index
version required: *
@alexcrichton

This comment has been minimized.

Copy link
Member

alexcrichton commented Jun 7, 2016

Cargo is somewhat agnostic between - and _, but it doesn't consider the two characters equivalent. Crates decide whether they want - or _ to begin with and then they must be referenced through that name, disallowing usage of the other.

I'd personally prefer to not accept both serde-codegen and serde_codegen as it can be confusing to see two different values which mean the same thing in practice from time to time.

@dtolnay

This comment has been minimized.

Copy link
Member Author

dtolnay commented Jun 7, 2016

Makes sense, and I don't have a strong preference myself. I saw this comment from @steveklabnik and figured this would be a step toward tooling that reflects the conventions we would like developers to use.

Is there a rationale for crates.io vs cargo behaving differently from each other?

I'd personally prefer to not accept both serde-codegen and serde_codegen as it can be confusing to see two different values which mean the same thing in practice from time to time.

Fair, but I would rather see serde-codegen and serde_codegen and not need to care, vs see library-a and library_b and need to remember which one to use in each case. "They mean the same thing" is a thing you learn once, while library-a and library_b is a thing that will bite you for as long as you use Rust.

@alexcrichton

This comment has been minimized.

Copy link
Member

alexcrichton commented Jun 8, 2016

I don't really know why crates.io is agnostic, it wasn't originally and I think that was a patch added after the fact, would have to track that down.

Yeah it's easier to not have to remember, but to me it's more of a downside as it's disguising what's actually happening under the hood.

@dtolnay

This comment has been minimized.

Copy link
Member Author

dtolnay commented Apr 25, 2017

The crates.io change was rust-lang/crates.io@89bc5dd.

@lfairy

This comment has been minimized.

Copy link
Contributor

lfairy commented May 13, 2017

I don't really know why crates.io is agnostic, it wasn't originally and I think that was a patch added after the fact, would have to track that down.

The current behavior of crates.io is defined by RFC 940:

Right now, crates.io compares package names case-insensitively. This means, for example, you cannot upload a new package named RUSTC-SERIALIZE because rustc-serialize already exists.

Under this proposal, we will extend this logic to identify - and _ as well.

@withoutboats

This comment has been minimized.

Copy link
Contributor

withoutboats commented Sep 6, 2017

I'd like to re-open this issue, I think this is a bug and we should fix it. (Maybe we can talk about it in a cargo meeting).

I'd personally prefer to not accept both serde-codegen and serde_codegen as it can be confusing to see two different values which mean the same thing in practice from time to time.

At first glance, this makes sense, but I think it doesn't hold as much water when you consider:

  • In practice, the name almost always appears in your Cargo.toml exactly once, so its not like they'll mean the same thing in the same file, just that some crates will write them differently.
  • If the name contains - (which I believe is prefered), both names will already appear, one in the source and one in the manifest, because - is not valid in source.

In contrast, users who accidentally add serde-json always know what they meant, and because of the conversion users are taught that - and _ are interchangeable in package names. In practice, this seems to be a frustrating wart of cargo which is not adding any clarity for people & possibly even confusing them (if they don't guess that the reason adding error_chain failed was that its actually called error-chain).

@dtolnay dtolnay reopened this Sep 6, 2017

@dtolnay

This comment has been minimized.

Copy link
Member Author

dtolnay commented Sep 6, 2017

Reopened. I have also come to feel much more strongly about this in the past year.

@alexcrichton

This comment has been minimized.

Copy link
Member

alexcrichton commented Sep 6, 2017

I also agree this is worthwhile to fix.

Implementation-wise this won't be easy though, I think, as it'll require changes to the crate index. The changes in Cargo itself after that, though, are likely nominal.

@withoutboats

This comment has been minimized.

Copy link
Contributor

withoutboats commented Sep 6, 2017

@alexcrichton That would be a problem, why isn't it sufficient to change how we match the dependency name against the index file to be neutral to underscores (e.g. retry if there's no file with the characters swapped)?

(It'd be better if the index were normalized, but that seems quite challenging).

@alexcrichton

This comment has been minimized.

Copy link
Member

alexcrichton commented Sep 6, 2017

Right now we don't load the entire index in-memory and we currently also don't try to browse the entire index, rather given a crate we drill into exactly which file it's supposed to be. If we have - and _ normalization we'd have a set of filenames that would be the plausible right one, and we'd in theory have to try to check all of them. That's ok on first builds, but we'd need to ensure that if you've got a lock file that this fallback behavior doesn't happen a lot, as it could add up time-wise I think.

@withoutboats

This comment has been minimized.

Copy link
Contributor

withoutboats commented Sep 6, 2017

Makes sense, so I think what we should do:

  • If the index file for the given ident doesn't exist, fall back by substituting underscores/hyphens in the name
  • When generating the lock file, be sure to generate it with the index name, not the name in the toml

Is a divergence between the name in the lock and the toml going to be a problem?

EDIT: Also I'd like to write the PR for this to get more acquainted with cargo's codebase :)

@alexcrichton

This comment has been minimized.

Copy link
Member

alexcrichton commented Sep 6, 2017

Yeah that sounds like it could work!

I think we'll have to maek sure that a Dependency::name isn't compared to a PackageId::name, although we can perhaps either assume that doesn't happen or otherwise use separate types there if necessary. Sounds plausible at least!

@Eh2406

This comment has been minimized.

Copy link
Contributor

Eh2406 commented Jul 5, 2018

If the index file for the given ident doesn't exist, fall back by substituting underscores/hyphens in the name

Is there a better way of doing this then brute force? With out changing the index in ways that brack older cargos / exiting projects?

@SimonSapin

This comment has been minimized.

Copy link
Contributor

SimonSapin commented Jul 5, 2018

Brute force take an exponential O(2^n) time, but that’s not really a problem when n is almost never greater than two.

bors added a commit that referenced this issue Jul 16, 2018

Auto merge of #5691 - Eh2406:hyphen-underscore, r=alexcrichton
Make index lookup robust to _ vs -, but don't let the user get it wrong.

This does a brute force search thru combinations of hyphen and underscores to allow queries of crates to pass the wrong one.

This is a small first step of fixing #2775

Where is best to add test?
@dhardy

This comment has been minimized.

Copy link

dhardy commented Aug 4, 2018

Why not go further and normalise all names to use hyphens -? The first step would be a new version of the query which only returns/finds normalised names; the second step would be a Cargo update to normalise then use the new version. The third step (a bit later perhaps) would be to only show the normalised names on crates.io.

@Eh2406

This comment has been minimized.

Copy link
Contributor

Eh2406 commented Aug 12, 2018

Because the index wuld need to have both names so that pre-normalise cargo and post-normalise cargo can find it, and that makes for 2 sources of truth.

@dhardy

This comment has been minimized.

Copy link

dhardy commented Aug 13, 2018

No it wouldn't if deployed via a new version of Cargo. Unfortunately this would not be backwards compatible (i.e. old versions of Cargo would require correct - vs _; new versions could accept either).

@mqudsi

This comment has been minimized.

Copy link

mqudsi commented Sep 29, 2018

A much nicer solution would have been to restrict crate names to a whitelist of characters that contains only one of those two symbols from day 1. Alas, time travel is not really an option.

@danielwaterworth

This comment has been minimized.

Copy link

danielwaterworth commented Oct 19, 2018

One thing that could be done to lessen the problem over time would be to stop crates.io allowing new crates with underscores (assuming hyphens are preferred).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.