Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: Rename some of the standard collections for consistency and clarity #580

Merged
merged 6 commits into from
Feb 18, 2015

Conversation

CloudiDust
Copy link
Contributor

This comment in /r/programming is the motivation of this RFC.

Rendered View.

@CloudiDust
Copy link
Contributor Author

cc @pcwalton @aturon @gankro.

@Valloric
Copy link

I fully support the idea of unobfuscating the type/function names in the Rust library. Every single engineer I've introduced to Rust has pointed out poor naming in the library as a flaw (Err being particularly egregious). I've seen the names called much worse: from "sophomoric", "farcical" to... comments I won't repeat.

@Valloric
Copy link

Note that the "I don't want to type long names" argument has been defeated with code completion decades ago. Every decent editor has it (some with non-semantic completion engines, but still), including non-IDE ones like Vim, Emacs & Sublime Text.

"I don't want to type long names" is an issue of tooling that either exists or can be easily built; the libraries shouldn't suffer for it. Let's not forget that code is read far more often than it is written.

@Gankra
Copy link
Contributor

Gankra commented Jan 13, 2015

I would also argue for all the names to be "full". Vector, DoublyLinkedList, etc. But I also just really don't care at all.

@Valloric
Copy link

Here's some data: I just polled 5 engineers I work with. The question was "What do you think a collection class named 'DList' stands for?"

I got the following responses:

  • "a dynamic list"
  • "a data list"
  • "a destructor list"
  • "a doubly linked list"

Only one engineer thought of doubly linked list, and they had to think about it for a minute or so (making the name an obvious mental speed-bump). I asked them how confident they were that was what "DList" stood for, and they said they were 80% confident.

This is the reason why people are complaining about the bad names. The names should be self-evident and the user shouldn't have to remember what abbreviation the author of the type went with. A "doubly linked list" could be mapped to many abbreviated names.

I'll note I work with very smart people.

@theemathas
Copy link

I strongly believe that BTreeMap and BTreeSet should also be renamed to BinTreeMap and BinTreeSet too, if you want to be consistent in renaming BinaryHeap.

As for the other names, I am not so sure about them. However, I prefer renaming Bitv and BitvSet to BitVec and BitMap (I found the name BitvSet confusing).

For DList, I find the current name weird (I mentally parse it as "download ist something"). However, I could not find a desirable alternative, so I can't really complain about it.

@sinistersnare
Copy link

@theemathas note that BTree is an actual data structure called a BTree, same with BTreeMap.

@sinistersnare
Copy link

I once asked if we should expand all the names, Vector, DoublyLinkedList, BitVector, BitVectorSet, but the argument was made the name length is inversely proportional to usage. so Vecs are used a lot so have a short name now.

I would be for expanding all of the names, but I am not sure the core team would be behind it.

@CloudiDust
Copy link
Contributor Author

The main problem with expending all names is it is a large scale breaking change.

The main problem with not doing so is, well, if the referenced reddit comments and the comments here are any indication, then many do not like abbreviated names, and for good reasons.

No matter what abbreviation rules to use, there will be some inconsistencies. (Particularly, there is no abbreviation for DoublyLinkedList that clearly indicates what it is.)

The reason I propose to rename BinaryHeap is mainly to achieve some consistency while avoid doing a large scale breaking change. But this may be sweeping the real problem under the rug.

If in this case, breaking the world is acceptable, then I am for expending all the names too.

AFAIK, those prefer shorter names can always do aliasing, and it is easy to provide a deprecation period because of aliasing support, as we are not changing the collection methods here, only the names.

@CloudiDust CloudiDust changed the title RFC: Rename BinaryHeap to BinHeap RFC: Rename (maybe one of) the standard collections for consistency Jan 14, 2015
@CloudiDust
Copy link
Contributor Author

A quick and dirty search reveals that, strings vec and bitv appear in the names of some conversion functions in the standard library. Renaming Vec, Bitv and related types may or may not require renaming those functions.

(Personally, I think it is fine to still have abbreviations in function names even if the types get renamed. The important thing is that the full names appear at least once in the function signatures, and they will.)

@CloudiDust
Copy link
Contributor Author

@theemathas, according to Wikipedia:

A bit array (also known as bitmap, bitset, bit string, or bit vector) is an array data structure that compactly stores bits.

So it seems that BitvSet should not be renamed to BitMap.

@bluss
Copy link
Member

bluss commented Jan 14, 2015

I'd prefer:

Bitv -> BitVector
BitvSet -> BitSet

and, because Ring Buffers always are fixed size when the term is used elsewhere

RingBuf -> ArrayDeque

I also prefer that Vec stays short. I would really enjoy if we gave up everything about misusing the word "vector", but that fight is impossible to win..

@lifthrasiir
Copy link
Contributor

@CloudiDust

So it seems that BitvSet should not be renamed to BitMap.

I don't think so, since the Map here doesn't match the current usage of Map in Rust (associated key-value container). BitSet indeed can work though (naturally, it is a Set of smallish integers).

@aturon aturon self-assigned this Jan 14, 2015
@jfager
Copy link

jfager commented Jan 14, 2015

@Valloric While I'd like to see some of these names changed as well, the argument that short names are bad b/c some developers who don't use the language throw a hissy fit and call Rust names over them is lame. Developers are petty assholes, they're going to find something to complain about regardless.

@bluss Rust's RingBuf isn't fixed-size. Surprised me too.

@petrochenkov
Copy link
Contributor

C. Rename ... also Bitv to BitVec, BitvSet to BitVecSet

FWIW, Bitv always looked like a very strange abbreviation for me.
Obviously, I'm not an English language specialist, but do people really abbreviate like that?

@bluss
Copy link
Member

bluss commented Jan 14, 2015

@jfager that was my point and why it should be renamed.

@mahkoh
Copy link
Contributor

mahkoh commented Jan 14, 2015

Here's some data: I just polled 5 engineers I work with. The question was "What do you think a collection class named 'DList' stands for?"

How about you ask them what 'a stands for?

@aturon
Copy link
Member

aturon commented Jan 14, 2015

@CloudiDust Thanks for the RFC! I definitely understand the original motivation about the seeming inconsistency among collections names.

I think that the RFC and discussion is making clear that we need to have a more formal, rather than folklore, guideline about abbreviations in names.

@Valloric It would be more helpful to discuss the relative merits of various strategies here, rather than focusing on the names that your fellow developers call Rust's libraries. (FWIW, I'm also quite surprised that Err is considered a particularly egregious example, because it seems highly unlikely to be misunderstood.)

In general, I would ask everyone to consider that as with so many other things, there are tradeoffs here. Many of us have various habits/cultural backgrounds that makes one choice seem "obvious" but I hope that we can examine these biases and lay out some rationale.

In particular, ease of writing/typing the names is not the only, or even primary benefit of short names, as far as I'm concerned.

Here's how I personally see the tradeoffs.

Long names can make it somewhat easier for a newcomer to "guess" the meaning of snippets of code. The DList versus DoublyLinkedList example gets at this. However, this line of thinking only goes so far. In general, it's not possible or desirable to stuff all of the relevant details about a data structure into its name; LinkedList or even List are also reasonable "long form" names, and in other cases we've made calculated decisions about which aspects to focus on in the name. Regardless of these choices, you can only get so far making guesses about what unknown types do based on their names.

Short names, on the other hand, make it more feasible to work "at scale" with things like generics and method chaining. To exaggerate -- but only slightly:

AtomicallyReferenceCounted<MutuallyExclusive<DoublyLinkedList<Integer>>>

    // versus

Arc<Mutex<DList<int>>>

Iterator chaining offers similar examples.

To me, the readability benefits of the shorter names here are enormous; I find it's much easier to see at a glance the structure of the stack of types because each name is short and easy to parse, and the names don't obscure the structure. The recent choice to introduce the mpsc module is another example. The full name would be:

use standard_library::synchronization::multiple_producer_single_consumer;

    // versus

use std::sync::mpsc;

In my experience, the willingness to abbreviate often makes it feasible for us to incorporate more information into names, and makes it easier to parse that information at a glance.

People sometimes argue that longer names are easier to remember, but I think that's wrong and/or missing the point. For one, as @Valloric argues, full length names are tolerable in part because most people use auto-complete, which obviates much of the need for memorization. But a deeper point is that there is no way to provide useful, rigid guidelines on exactly what types should be called even if abbreviations were ruled out entirely. I mentioned one example above -- how do you decide how much to say about the implementation in a type? But there are also synonyms and other wording choices. At the end of the day, when you program against an API, you learn its names. (And it's possible to argue that short names like Arc are actually easier to remember.)

So I think there are reasonable arguments on both sides. And while some people complain about Rust's historical choices here, I've also heard plenty of people express appreciation for its avoidance of naming verbosity.

(One final point keeping in mind is that this choice of short names has long been a part of Rust, to the point that it affects keywords -- impl being a particularly compelling example.)

@nikomatsakis
Copy link
Contributor

I'm going to speak up for short names here. @aturon did a great job giving a robust, factual defense, I'm going to give a more personal one: I think that it's important not to pick long, "enterprise"-style names for everything. Now, this doesn't mean we should use super-short abbreviations everywhere: Rust certainly has a history of very short abbreviations that we have generally moved away from. But we've been moving towards "mid-length names". Memorable names with punch. Obviously these is not a hard-and-fast rule that one can follow, but I think it's important, and it's part of Rust's character. I would be very loathe to lose it.

Also, speaking personally, while I find overly short names can be very opaque I find very long names equally hard to read. There's too much there and the "pattern recognition" part of my brain shuts off and the "actually read text" mode kicks in, which is much slower. (Ok, I am not a neuroscientist, and I am making things up about biology, but I think you know what I mean.) As far as I'm concerned the best of all worlds is when we can pick a short, one-word name that highlights the most interesting aspect of a type. Sometimes this even involves creating semi-arbitrary distinctions (e.g., List vs Vector), but I think that's ok, as long as those distinctions are used consistently.

I don't expect everyone to have the same tastes as I do. I'm sure other people find longer names easier to read, for whatever reason. But I don't want readers to get the impression that this is a universal taste.

@arthurprs
Copy link

Consistency is great but longgg names are even worse, so what about abbreviating the least important parts when the full name has 3+ compound names.
DList (Double Linked List) -> DLinkedList

@Valloric
Copy link

@aturon

Thank you for providing a detailed rationale!

I of course agree that names that are too long and unwieldy are bad, and some abbreviations are fine. For instance, I think Arc is a fine name. Some of the examples provided against longer names are IMO strawmen:

AtomicallyReferenceCounted<MutuallyExclusive<DoublyLinkedList<Integer>>>

Nobody is suggesting that. I am suggesting Arc<Mutex<DoublyLinkedList<int>>>, which is IMO superior to something like Arc<Mutex<DList<int>>>. You also can't convince me that you use such long type names (with DList or othewise) often in your code without aliasing the name to something shorter. Even with DList, that name is too long.

What I've noticed in Rust type and function names is a very frequent pattern of unnecessary abbreviations that do little more than confuse people. Err is a bad name because now I have to remember what did the library author choose as the "more appropriate" spelling of the word Error. The speed bump is there for 2 characters? That's it? Is this really the appropriate tradeoff? One of the recent RFC's had something like cvt_path instead of convert_path too, which is just plain inexcusable.

Similar with Vec. Is Vector really that bad? I keep saying that English is the default and the author needs to have a damn good reason why an abbreviation is better.

To sum up, "I don't know what this name stands for" is not the primary concern (although it's a big one), it's "I want to use a DoublyLinkedList, now what name did they pick in the library? DLList? DList? DLinkedList? Crap, I have to go look and check."

Every time you shorten the name from what you'd pronounce it as, you're saying that the pros of the shorter name outweigh the cons. I'm sorry, but you can't convince me that BitVector is so inappropriately long shortening it to Bitv outweighs the above costs to readability & usability.

In other words, readability is so massively valuable in any codebase beyond Hello World that any changes that negatively impact it need to have something seriously great going for it.

Erring on the side of code readability is almost always the right choice.

@Valloric
Copy link

To preempt questions like "whey is this abbreviation fine and that other one isn't:"

  • Arc is better than AtomicallyReferenceCounted. The full name is comically too long and unwieldy, so it obviously needs to be shortened regardless of how frequently used the type is. Arc can be easily pronounced and is symmetrical to Rc, which is a widely-recognized abbreviation for reference counting (much like HTTP is widely understood and used over Hypertext Transfer Protocol).
  • Mutex is better than MutuallyExclusive because it too is a widely familiar name.
  • int (which Rust doesn't have anymore, but lets pretend) is better than Integer for the same reason, familiarity.
  • Vector is better than Vec. It's a type frequently used which suggests that it should be short, but Vector is pretty short already and shortening it more IMO doesn't outweigh the cost; the benefit gained is minimal as well. (I'll grant that this one might be a bit debatable, since the type is used everywhere.)
  • BitVector is not a frequently used type so any arguments that it should be shorter are highly unconvincing. Bitv is obviously sub-par. Same goes for BitvSet.
  • RingBuf is both not super-frequently used and is only marginally shorter than RingBuffer. Shortening it seriously isn't worth the cost.

Etc.

@Valloric
Copy link

For the Vector vs Vec debate, List is both a full name and very short. LinkedList and DoublyLinkedList would have a nice symmetry with it. I honestly don't know why Rust went with the name from the C++ standard library; every time I've taught C++ to a newcomer I had to say "a vector is a list" since the name needs explanation.

@ghost
Copy link

ghost commented Jan 14, 2015

I'm a big fan of short names for the pervasive things: Vec, Str, BTreeMap -> Map, BTreeSet -> Set, Option -> Opt, Result -> Res, IoResult -> IO.

There's no real value in keeping long descriptive names for these. It's the Standard Library(tm), you're expected to get familiar with them either way. Yes it somewhat hurts grep, but that's an argument for better tooling, not a worse library. Link a name guide right above the search box and everyone lives happily ever after.

@CloudiDust
Copy link
Contributor Author

@lifthrasiir, I think we are in agreement? (There was a "not" in my sentence.)

And yes, I now think BitSet can be a better name than the other alternatives. Though Wikipedia points out BitVector, BitSet, BitMap mean the same thing, we can see them as different "views" of the same thing. BitvSet has a set-like interface, so BitSet is a fine name.

@bluss
Copy link
Member

bluss commented Feb 13, 2015

Great. I would prefer just Deque though, but you all want to reserve that for a trait. I don't think that reservation is worth it.

And various other adjustments.
@CloudiDust CloudiDust changed the title RFC: Rename (maybe one of) the standard collections for consistency RFC: Rename some of the standard collections for consistency and clarity Feb 13, 2015
@CloudiDust
Copy link
Contributor Author

@aturon, done.

@bluss, even if we were not to reserve Deque for a trait, it still completely hides implementation details if used as a type name, so it isn't consistent with names like HashSet or BTreeMap.

@tbu-
Copy link
Contributor

tbu- commented Feb 13, 2015

I think a change from RingBuf to VecDeque would be unfortunate, RingBuf is a concept known to most people programming in C, while I have never heard of a VecDeque and would have no idea what it means. When I wanted to have a ring buffer, I specifically searched the docs for ringbuffer and immediately found it. VecDeque would be bad for discoverability, so I don't think it should be renamed.

@soulseekah
Copy link

@tbu- but it's not really a ring buffer, is it?

@tbu-
Copy link
Contributor

tbu- commented Feb 13, 2015

@soulseekah It's a growable RingBuf, I have seen such an implementation in C, and you use it similar to how you'd use a non-growable RingBuf.

@soulseekah
Copy link

@tbu- right, a dynamically expandable circular buffer, but I think when talking about a plain circular buffer it's expected to be of a fixed size. So RingBuf does not seem to, in most cases, address the need for the traditional ring buffer. Thus, RingBuf should be left to house the fixed-size implementation, either alongside the growable implementation or standalone.

And by "alongside" I mean adding fixed-size functionality to RingBuf and keep the name as is, the problem with RingBuf currently is that it doesn't offer traditional fixed-size circular buffering as expected by many. I don't see how we can favour the growable implementation over the non-growable one (which doesn't even exist yet, but is the traditional one).

But I agree that VecDeque doesn't describe the reading circularity that RingBuf currently has. How about RingDeque, perhaps? RingVecDeque?

@tbu-
Copy link
Contributor

tbu- commented Feb 13, 2015

So RingBuf does not seem to, in most cases, address the need for the traditional ring buffer.

No, a dynamically growable ring buffer addresses all the use cases of a fixed-size ring buffer, just like a Vec also fits in all use cases of an array.

@soulseekah
Copy link

@tbu- Only if you can grow it manually and explicitly, but RingBuf in Rust grows automatically and will eventually fill up all memory. There are many use cases where you need fixed-sized buffers that simply discard the tail instead of growing indefinitely. Makes no sense.

@tbu-
Copy link
Contributor

tbu- commented Feb 13, 2015

That's akin to the other collections in Rust.

Makes no sense.

Please try to keep the discussion a discussion, saying that your discussion partner's arguments make no sense does not belong there.

@soulseekah
Copy link

@tbu- I'm not saying it's not impossible to make the dynamically growable one behave just like a fixed-size one, it's what people are doing in their own implementations, wrapping around RingBuf and checking length before deciding whether to pop before a push. But that's what the standard implementation is supposed to be able to do out of the box. So perhaps my issue with RingBuf is not in the naming, but in the lack of fixed-size functionality.

@tbu-
Copy link
Contributor

tbu- commented Feb 13, 2015

It's relatively easy to make a fixed-size hash map, vector, ring buffer out of a dynamically sized one, the other way around however, is impossible to do. There's multiple things you can do when running out of storage capacity, dropping old elements, returning an error when trying to insert new ones, dropping elements based on some metrics, panicking, but you can only select one when you implement a static sized collection. Making a dynamically sized collection allows for implementing any of these strategies yourself based on the original collection.

But all that is actually not part of the naming discussion, this is part of a design choice the Rust collections have made, namely "grow on missing capacity".

@soulseekah
Copy link

It's relatively easy to make a fixed-size hash map, vector, ring buffer out of a dynamically sized one, the other way around however, is impossible to do.

That makes sense. Thanks for the discussion.

@reem
Copy link

reem commented Feb 13, 2015

fwiw mio has a constant-sized ringbuf implemented here https://github.com/carllerche/mio/blob/master/src/buf/ring.rs

@theemathas
Copy link

What about DynamicRingDeque or DynRingDeque or just RingDeque?

Also note that "Buf"/"Buffer" implies a queue, not a deque.

@aturon
Copy link
Member

aturon commented Feb 18, 2015

Thanks, @CloudiDust, for your continuing work to push for clarity and consistency. This RFC nicely cleans up a few of the collection names (which somehow escaped collections reform) while maintaining Rust's overall approach to naming. The RFC has been merged. Kudos!

Tracking issue

@CloudiDust
Copy link
Contributor Author

@theemathas, sorry I missed and didn't reply earlier.

I think VecDeque, or "a deque implemented with a vector" is more in line with names like HashSet, BTreeMap etc.

Also, at least for me, it is not the Ring part, but the Buf(fer) part that hints that ring buffers are implemented with arrays/vectors, so it is reasonable to assume that RingDeques are implemented with circular doubly linked lists.

As you pointed out, Buf(fer) implies a queue, not a deque, which is another good point against RingBuf(fer).

@aturon, thanks for the merge.

aturon added a commit to aturon/rust that referenced this pull request Feb 18, 2015
This commit implements RFC 580 by renaming:

* DList -> LinkedList
* Bitv -> BitVec
* BitvSet -> BitSet
* RingBuf -> VecDeque

More details are in [the
RFC](rust-lang/rfcs#580)

[breaking-change]
Gankra pushed a commit to Gankra/rust that referenced this pull request Feb 18, 2015
This commit implements RFC 580 by renaming:

* DList -> LinkedList
* Bitv -> BitVec
* BitvSet -> BitSet
* RingBuf -> VecDeque

More details are in [the
RFC](rust-lang/rfcs#580)

[breaking-change]
mlalic added a commit to mlalic/solicit that referenced this pull request Feb 20, 2015
@Centril Centril added A-collections Proposals about collection APIs A-convention Proposals relating to documentation conventions. labels Nov 23, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-collections Proposals about collection APIs A-convention Proposals relating to documentation conventions.
Projects
None yet
Development

Successfully merging this pull request may close these issues.