Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

libcollections: Add a Multiset data structure. #15623

Closed
wants to merge 13 commits into from

Conversation

nham
Copy link
Contributor

@nham nham commented Jul 11, 2014

This is not finished, but I want some feedback before I go too far on my own. This defines two new traits, Multiset and MutableMultiset. I currently have a partial implementation of these traits, TreeMultiset, based on TreeMap, and I plan to add another one based on HashMap. I have tried to make these traits and the TreeMultiset implementation closely correspond to the Set traits and TreeSet.

The TreeMultiset implementation is missing a few things, mostly some trait implementations. I also plan on adding a multiset sum operation, which is distinct from multiset union (see this Wikipedia page for reference).

One thing I want to add to the Multiset trait is a to_set method that will return the underlying set of the multiset, which would just be all the distinct values in the multiset with multiple occurrences removed. However, this currently doesn't seem possible, as noted in #8154.

Another thing I've wondered about: the Multiset trait could inherit the Set trait, but the way I've written it, MutableMultiset could not inherit MutableSet. One way this could be changed is to inherit MutableSet and use insert as a method to insert one occurrence (and remove as a method to remove one occurrence) and then add trait methods for inserting multiple occurrence, perhaps called insert_multiple/remove_multiple or insert_many/remove_many. I'm just not sure if this makes sense.

nham added 2 commits July 12, 2014 10:25
… take iterator parameter.

This allows them to be used for both TreeSet and TreeMultiset.
@Gankra
Copy link
Contributor

Gankra commented Jul 12, 2014

I don't think MutableMultiset should inherit MutableSet. Given a Set I expect the following to work:

set.insert(1); 
set.insert(1);
set.remove(&1);
assert!(!set.contains(&1));

And that's just not how a multiset works.


/// Add one occurrence of `value` to the multiset. Return true if the value
/// was not already present in the multiset.
fn insert_one(&mut self, value: T) -> bool {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not also have remove_one?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inadvertent omission. I've just added it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is definitely a minor point, but it seems like insert_one/remove_one are probably the more common methods for someone to want. Would it be better to have insert/remove be insert_one/remove_one, and then have insert_many/remove_many? At very least, this would bring the multiset interface superficially closer to the set interface.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you're right. Most of the inserts/removes will probably be one at a time. I'll change this.

Add remove_one() method to MutableMultiset. Implement Show and Default for TreeMultiset. Add tests for count() and the Show implementation.

impl<T: Ord> Collection for TreeMultiset<T> {
#[inline]
fn len(&self) -> uint { self.map.len() }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should len just return the number of keys, or the sum of the counts? I imagine it would be fairly easy and desirable to expose methods for both.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for catching this. I almost certainly copied it unthinkingly from the TreeSet implementation. I'm going to change len() to return the sum of the counts. Since the number of distinct values is probably a useful method to have, I'll add a method for that (though I'm currently not sure whether to add it to the trait or just to each implementation)

…evious version returned the number of distinct values.
@Gankra
Copy link
Contributor

Gankra commented Jul 14, 2014

Should the possibility of splitting treemap.rs into multiple files be discussed here? It's getting a bit unwieldy, and could easily be separated into treemap, treeset, and treemultiset.

@treeman treeman mentioned this pull request Jul 14, 2014
@nham
Copy link
Contributor Author

nham commented Jul 15, 2014

With this patch, treemap.rs is currently at 2502 lines. By comparison, hashmap.rs is currently at 2520, so there is precedent for files this large. But perhaps it makes the most sense to put TreeSet and TreeMultiset into separate modules. I'm not sure here.

@alexcrichton
Copy link
Member

This does seem to play nicely into the Set and MutableSet traits in that insert and remove operating on one element would be convenient. It may provide surprising results like @gankro pointed out, but I think that the symmetry would be more valuable.

For now, though, I'm not sure if adding new collections traits is the best option. The current traits are sort of ad-hoc and seem to be lacking an overall design in terms of interactions with one another and extensions into the future. Perhaps these methods could be inherent methods for now pending a redesign in the future? I don't think anyone is taking a generic Set or MutableSet today, so I don't think too much functionality will be lost.

nham added 3 commits July 16, 2014 14:59
The collections crate is still in flux, so we are holding off on
deciding the design of this trait.
@nham
Copy link
Contributor Author

nham commented Jul 16, 2014

@alexcrichton Yeah, that makes sense. I've removed the traits.

@nham
Copy link
Contributor Author

nham commented Jul 22, 2014

This should be ready for further review. TreeMultiset now implements all the traits that TreeSet implements and provides all the same methods as well, with the exception of move_iter (see below). I've also implemented the sum method for performing multiset sums.

For move_iter, we have a bit of an issue since the current design does not store multiple copies but simply a counter. In order to properly implement this I need to either change it to store multiple copies or only implement for cloneable values.

I should also point out that it's unclear to me how this could support both insert_many and a generic move_iter at the same time. If you're just giving a number for the number of occurrences an item should have, multiple values aren't available to be moved out by move_iter. So you either have to give up insert_many or make move_iter work only for Cloneable items.

let mut mset = TreeMultiset::new();
mset.extend(iter);
mset
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason that from_iter isn't #[inline] but extend is?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason is that I copied and pasted function signatures from TreeSet and its from_iter method is not inlined! Same goes for much of the other methods.

@Gankra
Copy link
Contributor

Gankra commented Jul 29, 2014

Part of me wonders if a "bulk" iterator that yields (item, count) would be desirable, but... so many iterators...

}

/// Return true if the multiset has no elements in common with `other`.
pub fn is_disjoint(&self, other: &TreeMultiset<T>) -> bool {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While this implementation is elegant, would it not be radically more efficient to iterate the underlying TreeMaps, and compare counts? As an extreme case if I make two Multisets with 10000000 and 1000001 instances of x, this code will count all the way to 1000001 before terminating, where it could just do a single integer comparison.

Edit: I think it would be more readable, too.

Edit2: This comment was actually directed at is_subset, but there exists a similar very-inefficient input for this function.

self.count -= 1;
self.current
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the difference between these implementations? Only the underlying iterators?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, if I remember correctly that's the only difference. (same as for SetItems and RevSetItems)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then you may write a single impl<'a, T, I: Iterator<(&'a T, &'a uint)>> Iterator<&'a T> for MultisetItems<'a, T, I> {, correct?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the only question is whether that makes the item too unwieldy for use. The same could be said for SetItems and RevSetItems (unifying them and parameterizing the iterator)

@nham
Copy link
Contributor Author

nham commented Aug 9, 2014

There are some improvements that can be made to this, as @gankro and @pczarn have pointed out, but I'm no longer convinced that using TreeMap<T, uint> should be how TreeMultiset is implemented. There may still be a use for such a data structure, but perhaps it should be called a "CountedSet" or something. I'm currently working on a new implementation based on TreeMultimap, which will just be an AA tree that allows duplicate keys to be inserted. I'm closing this for now.

@nham nham closed this Aug 9, 2014
@Gankra
Copy link
Contributor

Gankra commented Aug 9, 2014

@nham presumably you were dissatisfied with all colliding instances taking on the same concrete value?

@nham
Copy link
Contributor Author

nham commented Aug 9, 2014

@gankro Yeah, only storing a count instead of multiple instances means that move_iter() cannot be implemented. I think TreeMultiset should store each instance individually, which I believe (but am not certain) is how C++'s std::multiset works.

@Gankra
Copy link
Contributor

Gankra commented Aug 9, 2014

@nham: presumably TreeMultiMap<K,V> should just be a layer over TreeMap<K,Vec<V>>, then? And then TreeMultiSet<T> is just TreeMap<&T,Vec<T>>?

Edit: <>'s

@nham
Copy link
Contributor Author

nham commented Aug 9, 2014

@gankro That is one way to go (TreeMultimap<K, V> = TreeMap<K, Vec<V>>, roughly), but I was thinking of using a slight variation on the current AA tree implementation where duplicate keys are allowed (so the left subtree contains values less than or equal to the current node, and when we try to insert a key that already exists, don't replace the old value, but actually insert a new node). I need to do some testing to see which version is better.

@Gankra
Copy link
Contributor

Gankra commented Aug 9, 2014

@nham: avoiding more nodes seems like it'd be a big performance win in several ways (less frequent heap allocs). Although... the more I think about it the more I'm uncertain if my TreeMultiSet impl makes sense (can you get the ref out of the Vec correctly?). A duplicate key tree would be saner there, for sure.

Duplicate keys also has distinct semantics wrt popping nodes. If your keys have destructors, then TreeMap<K, Vec<V>> is very different from DuplicateTreeMap<K,V>. It would have to be a TreeMap<&K, Vec<(K,V)>> to have the same external-facing effects. Although if you assume keys are all worthless rubbish, then TreeMap<K, Vec<V>> will have serious space savings by only storing one key.

Not sure which is the desired behaviour!

@nham
Copy link
Contributor Author

nham commented Aug 9, 2014

@gankro: Do I understand correctly that in your TreeMultimap, the first (key, value) gets inserted as (&key, vec!((key, value))). Then subsequent pairs (k, v) where the k equals key get pushed to the vector? Won't that prevent us from obtaining a mutable reference to the vector, since it is partially borrowed immutably?

@Gankra
Copy link
Contributor

Gankra commented Aug 9, 2014

@nham: I believe you are correct. My implementation wouldn't work. You'd need a wrapper around a ptr, I guess.

bors added a commit to rust-lang-ci/rust that referenced this pull request Sep 18, 2023
fix: Don't skip closure captures after let-else

As I understand that `return` was left there by accident. It caused capture analysis to skip the rest of the block after a let-else, and then missed captures caused incorrect results in borrowck, closure hints, layout calculation, etc.

Fixes rust-lang#15623

I didn't understand why I using the example from rust-lang#15623 as-is doesn't work - I don't get the warnings unless I remove the `call_me()` call, even on the same commit as my own RA version which does show those warnings.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants