Accept hyphen in crate name in place of underscore #2775

Open
dtolnay opened this issue Jun 6, 2016 · 38 comments
Labels
A-crate-dependencies Area: [dependencies] of any kind A-interacts-with-crates.io Area: interaction with registries C-feature-request Category: proposal for a feature. Before PR, ping rust-lang/cargo if this is not `Feature accepted` S-triage Status: This issue is waiting on initial triage.

Comments

@dtolnay
Member

dtolnay commented Jun 6, 2016

Crates.io currently accepts hyphens for crates that use underscores, both in the web interface and in the API.

Cargo does not, but should.

error: no matching package named `serde-codegen` found (required by `testing`)
location searched: registry https://github.com/rust-lang/crates.io-index
version required: *
@alexcrichton
Member

Cargo is somewhat agnostic between - and _, but it doesn't consider the two characters equivalent. Crates decide whether they want - or _ to begin with and then they must be referenced through that name, disallowing usage of the other.

I'd personally prefer to not accept both serde-codegen and serde_codegen as it can be confusing to see two different values which mean the same thing in practice from time to time.

@dtolnay
Member Author

dtolnay commented Jun 7, 2016

Makes sense, and I don't have a strong preference myself. I saw this comment from @steveklabnik and figured this would be a step toward tooling that reflects the conventions we would like developers to use.

Is there a rationale for crates.io vs cargo behaving differently from each other?

I'd personally prefer to not accept both serde-codegen and serde_codegen as it can be confusing to see two different values which mean the same thing in practice from time to time.

Fair, but I would rather see serde-codegen and serde_codegen and not need to care, vs see library-a and library_b and need to remember which one to use in each case. "They mean the same thing" is a thing you learn once, while library-a and library_b is a thing that will bite you for as long as you use Rust.

@alexcrichton
Member

I don't really know why crates.io is agnostic, it wasn't originally and I think that was a patch added after the fact, would have to track that down.

Yeah it's easier to not have to remember, but to me it's more of a downside as it's disguising what's actually happening under the hood.

@dtolnay dtolnay closed this as completed Jun 8, 2016
@dtolnay
Member Author

dtolnay commented Apr 25, 2017

The crates.io change was rust-lang/crates.io@89bc5dd.

@lambda-fairy
Contributor

I don't really know why crates.io is agnostic, it wasn't originally and I think that was a patch added after the fact, would have to track that down.

The current behavior of crates.io is defined by RFC 940:

Right now, crates.io compares package names case-insensitively. This means, for example, you cannot upload a new package named RUSTC-SERIALIZE because rustc-serialize already exists.

Under this proposal, we will extend this logic to identify - and _ as well.
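
A minimal sketch of the comparison rule quoted above, assuming names are ASCII. The helper names `canonical_name` and `names_collide` are hypothetical and are not taken from the crates.io source; they only illustrate treating case and `-`/`_` as insignificant when checking whether a name is already taken.

```rust
/// Hypothetical helper: a canonical form used only for comparing names.
/// ASCII letters are lowercased and every `-` becomes `_`.
fn canonical_name(name: &str) -> String {
    name.chars()
        .map(|c| match c {
            '-' => '_',
            other => other.to_ascii_lowercase(),
        })
        .collect()
}

/// Two names collide when their canonical forms are identical.
fn names_collide(a: &str, b: &str) -> bool {
    canonical_name(a) == canonical_name(b)
}
```

Under this rule, `names_collide("RUSTC-SERIALIZE", "rustc-serialize")` and `names_collide("serde-codegen", "serde_codegen")` are both true, so the second name in each pair could not be published.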

@withoutboats
Contributor

I'd like to re-open this issue, I think this is a bug and we should fix it. (Maybe we can talk about it in a cargo meeting).

I'd personally prefer to not accept both serde-codegen and serde_codegen as it can be confusing to see two different values which mean the same thing in practice from time to time.

At first glance, this makes sense, but I think it doesn't hold as much water when you consider:

  • In practice, the name almost always appears in your Cargo.toml exactly once, so it's not like they'll mean the same thing in the same file, just that some crates will write them differently.
  • If the name contains - (which I believe is preferred), both names will already appear, one in the source and one in the manifest, because - is not valid in source.

In contrast, users who accidentally add serde-json always know what they meant, and because of the conversion users are taught that - and _ are interchangeable in package names. In practice, this seems to be a frustrating wart of cargo which is not adding any clarity for people & possibly even confusing them (if they don't guess that the reason adding error_chain failed was that it's actually called error-chain).

@dtolnay dtolnay reopened this Sep 6, 2017
@dtolnay
Member Author

dtolnay commented Sep 6, 2017

Reopened. I have also come to feel much more strongly about this in the past year.

@alexcrichton
Member

I also agree this is worthwhile to fix.

Implementation-wise this won't be easy though, I think, as it'll require changes to the crate index. The changes in Cargo itself after that, though, are likely nominal.

@withoutboats
Contributor

withoutboats commented Sep 6, 2017

@alexcrichton That would be a problem, why isn't it sufficient to change how we match the dependency name against the index file to be neutral to underscores (e.g. retry if there's no file with the characters swapped)?

(It'd be better if the index were normalized, but that seems quite challenging).

@alexcrichton
Member

Right now we don't load the entire index in-memory and we currently also don't try to browse the entire index, rather given a crate we drill into exactly which file it's supposed to be. If we have - and _ normalization we'd have a set of filenames that would be the plausible right one, and we'd in theory have to try to check all of them. That's ok on first builds, but we'd need to ensure that if you've got a lock file that this fallback behavior doesn't happen a lot, as it could add up time-wise I think.

@withoutboats
Contributor

withoutboats commented Sep 6, 2017

Makes sense, so I think what we should do:

  • If the index file for the given ident doesn't exist, fall back by substituting underscores/hyphens in the name
  • When generating the lock file, be sure to generate it with the index name, not the name in the toml

Is a divergence between the name in the lock and the toml going to be a problem?

EDIT: Also I'd like to write the PR for this to get more acquainted with cargo's codebase :)
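
The two steps proposed above can be sketched roughly as follows. This is an illustration only: the index is modelled as an in-memory `HashSet` of published names (Cargo's real index is a tree of files), `resolve_name` is a hypothetical function, and the fallback tries just the all-hyphen and all-underscore spellings rather than every mixed spelling.

```rust
use std::collections::HashSet;

/// Hypothetical lookup: returns the name as spelled in the index, which is
/// the spelling a lock file would record, or `None` if nothing matches.
fn resolve_name(index: &HashSet<String>, requested: &str) -> Option<String> {
    // Fast path: the manifest spelling matches the index exactly.
    if index.contains(requested) {
        return Some(requested.to_string());
    }
    // Fallback: retry with the separators substituted uniformly.
    // (Assumption of this sketch: mixed spellings are not tried.)
    let all_hyphens = requested.replace('_', "-");
    let all_underscores = requested.replace('-', "_");
    vec![all_hyphens, all_underscores]
        .into_iter()
        .find(|candidate| index.contains(candidate.as_str()))
}
```

Because the function returns the index spelling rather than the requested one, a manifest that says `error_chain` would still resolve to `error-chain`, and later lookups driven by the lock file would take the fast path.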

@alexcrichton
Member

Yeah that sounds like it could work!

I think we'll have to make sure that a Dependency::name isn't compared to a PackageId::name, although we can perhaps either assume that doesn't happen or otherwise use separate types there if necessary. Sounds plausible at least!

@carols10cents carols10cents added A-crate-dependencies Area: [dependencies] of any kind A-interacts-with-crates.io Area: interaction with registries C-feature-request Category: proposal for a feature. Before PR, ping rust-lang/cargo if this is not `Feature accepted` labels Sep 25, 2017
@Eh2406
Contributor

Eh2406 commented Jul 5, 2018

If the index file for the given ident doesn't exist, fall back by substituting underscores/hyphens in the name

Is there a better way of doing this than brute force? Without changing the index in ways that break older cargos / existing projects?

@SimonSapin
Contributor

Brute force takes exponential O(2^n) time, but that’s not really a problem when n is almost never greater than two.
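
To make the 2^n figure concrete, here is a minimal sketch that enumerates every spelling of a name by choosing `-` or `_` independently at each separator. The function `separator_variants` is hypothetical and is not Cargo's implementation; it only shows why a name with n separators has 2^n candidate spellings.

```rust
/// Hypothetical helper: all spellings of `name` with `-` or `_` chosen
/// independently at each separator position (2^n results for n separators).
fn separator_variants(name: &str) -> Vec<String> {
    // Byte offsets of every separator in the name.
    let positions: Vec<usize> = name
        .char_indices()
        .filter(|&(_, c)| c == '-' || c == '_')
        .map(|(i, _)| i)
        .collect();
    let n = positions.len();
    assert!(n < 32, "too many separators for this sketch");

    let mut variants = Vec::with_capacity(1usize << n);
    // Each bit of `mask` selects the separator used at one position.
    for mask in 0u32..(1u32 << n) {
        let mut bytes = name.as_bytes().to_vec();
        for (bit, &pos) in positions.iter().enumerate() {
            bytes[pos] = if ((mask >> bit) & 1) == 1 { b'_' } else { b'-' };
        }
        // Only ASCII bytes were replaced by ASCII bytes, so UTF-8 stays valid.
        variants.push(String::from_utf8(bytes).expect("ASCII swap keeps UTF-8 valid"));
    }
    variants
}
```

For a typical name such as `serde-json` this yields only two candidates, which is the point being made about n rarely exceeding two.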

bors added a commit that referenced this issue Jul 16, 2018
Make index lookup robust to _ vs -, but don't let the user get it wrong.

This does a brute force search through combinations of hyphens and underscores to allow queries of crates to pass the wrong one.

This is a small first step of fixing #2775

Where is best to add test?
@dhardy

dhardy commented Aug 4, 2018

Why not go further and normalise all names to use hyphens -? The first step would be a new version of the query which only returns/finds normalised names; the second step would be a Cargo update to normalise then use the new version. The third step (a bit later perhaps) would be to only show the normalised names on crates.io.

@Eh2406
Contributor

Eh2406 commented Aug 12, 2018

Because the index would need to have both names so that pre-normalise cargo and post-normalise cargo can find it, and that makes for 2 sources of truth.

@dhardy

dhardy commented Aug 13, 2018

No it wouldn't if deployed via a new version of Cargo. Unfortunately this would not be backwards compatible (i.e. old versions of Cargo would require correct - vs _; new versions could accept either).

@mqudsi

mqudsi commented Sep 29, 2018

A much nicer solution would have been to restrict crate names to a whitelist of characters that contains only one of those two symbols from day 1. Alas, time travel is not really an option.

@withoutboats
Contributor

withoutboats commented Nov 20, 2019

You can't do use serde-transcode;. I'm sure there's a workaround but it's an annoying paper cut that I even have to think about it.

The "workaround" is incredibly simple: you can do use serde_transcode;, all hyphens are transformed to underscores within Rust source code. If we don't have diagnostics that recommend this when you type use serde-transcode we should.
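
As a small illustration of that rule, the mapping from package name to the identifier used in source can be sketched like this. `crate_ident` is a hypothetical helper, and the sketch assumes the package does not set a custom library name in its manifest.

```rust
/// Hypothetical helper: the identifier written in Rust source for a package,
/// obtained by replacing every `-` in the package name with `_`.
/// Assumes the package has not overridden its library name.
fn crate_ident(package_name: &str) -> String {
    package_name.replace('-', "_")
}

// With a dependency on the package `serde-transcode`, source code would use:
//     use serde_transcode;
```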

@SOF3

SOF3 commented Nov 21, 2019

Since crates.io already does not accept two packages with just - and _ different, why can't we make every place that accepts a package name to silently convert - to _ (or the other way round) internally? It is perfectly feasible that the preference of -/_ of the package author does not affect how users use it (just like whether the package author uses tabs or spaces doesn't affect whether users use tabs or spaces). This can be done with some algorithm similar to how case-insensitive systems handle the cases.

TL;DR: Why do we have to care if ignoring doesn't lead to problems?

In fact, it is even feasible to force change everything existing to - (or to _) and just silently (or with a warning) convert them when new crates are published.

BTW, I'm a bit confused. Are we talking about crate names or package names, or are they the same thing?

@withoutboats
Contributor

TL;DR: Why do we have to care if ignoring doesn't lead to problems?

That is the goal, and we've solved the problem in both rustc and crates.io but not in cargo, which is why this issue is open.

@Eh2406
Contributor

Eh2406 commented Nov 21, 2019

So why is this hard in cargo?

One complication is Alternative Registries, the existing RFC allows registries to have packages that only differ by - vs _. crates.io does not allow this, but other registries can do what they want. If we want to continue to support the RFC, then Cargo needs to keep the names as is and change all equality checks to equivalency checks. Tracking down every time we use eq or hash on a type that contains a name and finding a way to make it equivalency... it is going to be hard. (Also a deep well of corner cases.)

Even if we decide to break such (niche) uses of Alternative Registries, we will want a grace period. Some time where it will build with the wrong -/_ but give you a warning that older cargos won't know what you meant. I think this leads to the same implementation problems.

If someone has a way to make this work, I am open to helping make it happen.
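
For readers unfamiliar with what "change all equality checks to equivalency checks" would involve, here is a minimal sketch of one possible shape: a wrapper type that keeps the name as written but compares and hashes it with `-` and `_` treated as the same character. `LooseName` is a hypothetical type and does not exist in Cargo; the difficulty described above is that every type containing a name would need this treatment.

```rust
use std::hash::{Hash, Hasher};

/// Hypothetical wrapper: stores the original spelling, but `==` and hashing
/// ignore the difference between `-` and `_`.
#[derive(Debug, Clone)]
struct LooseName(String);

impl LooseName {
    /// Bytes of the name with every `-` mapped to `_`.
    fn normalized_bytes(&self) -> impl Iterator<Item = u8> + '_ {
        self.0.bytes().map(|b| if b == b'-' { b'_' } else { b })
    }
}

impl PartialEq for LooseName {
    fn eq(&self, other: &Self) -> bool {
        self.normalized_bytes().eq(other.normalized_bytes())
    }
}

impl Eq for LooseName {}

impl Hash for LooseName {
    fn hash<H: Hasher>(&self, state: &mut H) {
        // Hash the normalized bytes so that equal names hash equally.
        for b in self.normalized_bytes() {
            state.write_u8(b);
        }
        // Terminator byte, so the end of the name is marked in the hash input.
        state.write_u8(0xff);
    }
}
```

With this in place, a `HashSet<LooseName>` containing `serde-json` would report that it also contains `serde_json`, while the stored value keeps its original spelling.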

@hskang9

hskang9 commented Nov 23, 2019

@Eh2406 I found the reason why - / _ is all considered as an underscore here in crates.io. The sql function canon_crate_name replaces the hyphen with an underscore. If the replace function is removed, crates.io will allow registries to have packages that only differ by - vs _.

@carols10cents
Member

crates.io will allow registries to have packages that only differ by - vs _.

I don't think we want crates.io to allow packages that only differ by - vs _ so I've closed the associated PR; let me know if I've misunderstood.

@Manishearth
Member

I've posted https://internals.rust-lang.org/t/pre-rfc-unify-dashes-and-underscores-on-crates-io/13216/13. It's becoming more clear to me that this doesn't need an RFC, but I've posted the pre RFC anyway.

I don't actually think the exponential growth for names with a large number of separators is a problem. In those cases we can traverse the index trie, looking for both separators whenever there is one, and splitting the search if both exist. Note that this will not cause exponential growth unless there actually are crates with that combination of separators: it's unlikely that foo-bar-baz-quux-1 and foo-bar-baz_quux-2 and foo-bar_baz-quux-3 and so on all exist (and if they do it's probably an automated publish in violation of crates.io policy). In other words, it's only possible to engineer an exponentially bad situation here on purpose, it's not really possible as an accident.
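
A rough sketch of the traversal described above, under the assumption that the index can be viewed as a character trie. `NameTrie` is a hypothetical in-memory structure, not Cargo's on-disk index layout. At each separator the walk follows `-` and `_`, but only along branches that actually exist, so the search only splits where published names really diverge.

```rust
use std::collections::BTreeMap;

/// Hypothetical in-memory trie over published names.
#[derive(Default)]
struct NameTrie {
    children: BTreeMap<char, NameTrie>,
    is_name: bool,
}

impl NameTrie {
    fn insert(&mut self, name: &str) {
        let mut node = self;
        for c in name.chars() {
            node = node.children.entry(c).or_default();
        }
        node.is_name = true;
    }

    /// Every stored name that equals `query` up to `-`/`_` swaps.
    fn find_loose(&self, query: &str) -> Vec<String> {
        let chars: Vec<char> = query.chars().collect();
        let mut found = Vec::new();
        self.walk(&chars, &mut String::new(), &mut found);
        found
    }

    fn walk(&self, rest: &[char], prefix: &mut String, found: &mut Vec<String>) {
        match rest.split_first() {
            None => {
                if self.is_name {
                    found.push(prefix.clone());
                }
            }
            Some((&c, tail)) => {
                // At a separator, consider both spellings; otherwise only `c`.
                let candidates = if c == '-' || c == '_' { vec!['-', '_'] } else { vec![c] };
                for next in candidates {
                    // Branches that do not exist in the trie cost nothing further.
                    if let Some(child) = self.children.get(&next) {
                        prefix.push(next);
                        child.walk(tail, prefix, found);
                        prefix.pop();
                    }
                }
            }
        }
    }
}
```

On a registry that forbids names differing only by separator, `find_loose` would return at most one match for any query.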

@morkeltry

Strongly agree! This is the sort of magic renaming nonsense that CSS has to do because they used - everywhere. I think Cargo should issue a warning for new crates that use - in the name.

I've actually wasted 10 minutes on this now because serde-transcode has a hyphen in it and I haven't worked out how to reference that from Rust code yet. You can't do use serde-transcode;. I'm sure there's a workaround but it's an annoying paper cut that I even have to think about it.

10 minutes?
I am new to rust and have spent the last two hours working out how to, in my lib.rs, import a crate that somebody has named with a hyphen.
I think they named it that way because there's already one with the underscore name and so it is definitely a good way to distinguish between the two versions.

@SOF3

SOF3 commented Jan 22, 2021

Strongly agree! This is the sort of magic renaming nonsense that CSS has to do because they used - everywhere. I think Cargo should issue a warning for new crates that use - in the name.
I've actually wasted 10 minutes on this now because serde-transcode has a hyphen in it and I haven't worked out how to reference that from Rust code yet. You can't do use serde-transcode;. I'm sure there's a workaround but it's an annoying paper cut that I even have to think about it.

10 minutes?
I am new to rust and have spent the last two hours working out how to, in my lib.rs, import a crate that somebody has named with a hyphen.
I think they named it that way because there's already one with the underscore name and so it is definitely a good way to distinguish between the two versions.

there isn't one with the underscore name. crates.io does not accept an underscore name if there is already one with the hyphen name.

@LuminousPath

If this helps the discussion, I'm a new user to Rust as well and coming from other languages, the ability for a package name and the source code referring to that package name within the same context to differ without being explicit is baffling (at least until you google it and end up in a thread like this one).

Moving around in a large project where rust might exist alongside many languages, having to remember that particular quirk about Rust libraries is definitely going to be exhausting in the long term.

@Pzixel

Pzixel commented Jan 13, 2022

Since there are so many popular crates with - in their names (since it looks nicer and is easier to read), isn't it an option to relax naming rules a bit and allow them in Rust code in imports? Okay, maybe it's not a valid identifier name, but it could be a valid crate name.

On the other hand, it would forbid using it in a context like let a = 10 - serde-json::from_str::<i32>("15").unwrap(), so this might look awkward. But people could always manually rename it like use serde-json as serde_json if they wish.

@SOF3

SOF3 commented Jan 14, 2022

Aren't we heading to an overkill here? Perhaps package name conversion to underscore is not something you would realize immediately, but it would be the responsibility of the package developer anyway. Packages can always specify lib.name in Cargo.toml to set their own crate name anyway, but how many newbies to rust would actually pay attention to the Cargo.toml of the dependencies they use? I remember getting really confused why futures-preview is imported as futures instead of futures_preview when I just started; if anything, that is way more confusing than having serde-json imported as serde_json, which everyone would immediately understand intuitively (even though they may be doubtful why it is written like that).

I don't know what the original rationale for allowing underscores was, but if we were to start all over, it might be more reasonable to have cargo only take the substring before the first hyphen as the lib crate name (hence the futures-preview case). But now that we are already at this (and it is almost idiomatic to suffix binding crates with -sys), collating hyphens as the equivalent of underscores would be the most sensible thing to do because that's how everyone understands it.

@SOF3

SOF3 commented Jan 14, 2022

By the way, should we rename this issue to package name instead of crate name?

@Kimundi
Member

Kimundi commented May 12, 2024

Just for reference, Python Package names can also differ in _, -, and are in the same equivalence class based on a normalization scheme: https://packaging.python.org/en/latest/specifications/name-normalization/#name-normalization

In fact, they take this quite a bit further: Any arbitrary sequence of _, - or . is considered to be the same, so for example foo.-_-.bar and foo-bar refer to the same package. I'm not saying we should do that, just that there is precedent for even more complicated schemes than what is proposed here.
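
For comparison, the Python scheme linked above can be sketched in Rust as follows: lowercase the name and collapse every run of `-`, `_`, or `.` into a single `-`. `normalize_python_name` is a hypothetical name for this sketch, and it assumes ASCII input.

```rust
/// Sketch of the normalization in the linked Python packaging spec:
/// lowercase, with each run of `-`, `_`, or `.` collapsed to one `-`.
fn normalize_python_name(name: &str) -> String {
    let mut out = String::with_capacity(name.len());
    let mut in_run = false;
    for c in name.chars() {
        if matches!(c, '-' | '_' | '.') {
            // Emit a single `-` for the whole run of separators.
            if !in_run {
                out.push('-');
                in_run = true;
            }
        } else {
            out.push(c.to_ascii_lowercase());
            in_run = false;
        }
    }
    out
}
```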

@Timmmm
Contributor

Timmmm commented May 12, 2024

I would think "Python does this" is pretty strong evidence that it's a terrible idea, given how awful Python's packaging story is!

@carols10cents
Member

I would think "Python does this" is pretty strong evidence that it's a terrible idea, given how awful Python's packaging story is!

Hey now, we don't language bash here. It's all tradeoffs.
