Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Features request: Mutably lookup multiple distinct values #151

Closed
HeroicKatora opened this issue Apr 8, 2020 · 7 comments · Fixed by #239
Closed

Features request: Mutably lookup multiple distinct values #151

HeroicKatora opened this issue Apr 8, 2020 · 7 comments · Fixed by #239

Comments

@HeroicKatora
Copy link

In the current interface looking up a single key will borrow the entire map and in particular a mutable lookup requires a unique borrow of the entire map. This is very safe but also overly restrictive. In comparison, slices (as a very primitive key-value-map) permit this same operation with a split_mut method which for example allows defining an entirely safe method for acquiring elements at two different indices at once:

fn at2<T>(slice: &mut [T], a: usize, b: usize) -> (&mut T, &mut T) {
    let (_, slice) = slice.split_at_mut(cmp::min(a, b));
    let (head, tail) = slice.split_at_mut(cmp::max(a, b) - cmp::min(a, b));
}

If we try to replace the slice with a HashMap, for example as a sparse data structure, however we hit a brickwall. There is a bad workaround: Use an iterator over all mutable reference, filter it by the two elements we are interested in and return them. This is both inefficient, unergonomic, and brittle.

fn at2<T>(map: &mut HashMap<usize, T>, a: usize, b: usize) -> (&mut T, &mut T) {
    let mut values: HashMap<_, _> = map.iter_mut().collect();
    (values.remove(&a).unwrap(), values.remove(&b).unwrap())
}

This is embarrisingly expensive, cloning the entire map and recalculating all hashes, while the existence of the safe workaround should also make it clear that there is no formal issue preventing the simultaneous use of multiple values (the backing memory is allocated similar to a vector) while querying keys.

A few remarks on a first draft

I'm not quite sure how the interface should look like. It could be some simple interface returning raw pointers to elements which requires the caller to implement correct safety and uniqueness guarantees.

There is also a more ambitious solution. Do note that raw::RawTable has the interfaces needed to build a less expensive safe abstraction but it is too raw. It is both very complex in that almost everything is unsafe and not exposed in the standard library 🙂

Let's think about how the overhead of the workaround could be lessened by using internal interface. Such a virtual clone would only need to support removal, and no insertion. It would borrow most data structures from the original map. By not allowing insertion, we would not have to recalculate the hashes or do rehashing. It also need not actually remove elements but only mark them borrowed, which can be even cheaper than cloning and modifying the control bytes. This marking could even use a small-vec optimization if only few elements have already been borrowed.

@Amanieu
Copy link
Member

Amanieu commented Apr 9, 2020

Here is a variant of at2 that avoids cloning the map:

fn at2<K, V>(map: &mut HashMap<K, V>, a: &K, b: &K, cb: impl FnOnce(&mut V, &mut V)) {
    // Remove values from map
    let mut va = map.remove(a).unwrap();
    let mut vb = map.remove(b).unwrap();

    // Pass mutable references to callback
    cb(&mut va, &mut vb);

    // Re-insert modified values into map
    map.insert(a, va);
    map.insert(b, vb);
}

There are obvious improvements to make on this (e.g. use a drop guard for reinsertion, report errors nicely, support return values from the callback, etc) but it should be a good start.

@HeroicKatora
Copy link
Author

HeroicKatora commented Apr 9, 2020

It's not a true replacement because the callback must be, as written in the function signature, able to take keys and values of arbitrarily short lifetime and not only those of the map itself. That is a huge restriction in my point of view for the general case.

@Amanieu
Copy link
Member

Amanieu commented Apr 9, 2020

I think that a proper solution would probably have to wait until we have support for const generics. This would allow us to design an API like this:

impl<K, V> HashMap<K, V> {
    fn get_mut_multi<const N: usize>(&mut self, keys: [&K; N]) -> [Option<&mut V>; N] { ... }
}

@HeroicKatora
Copy link
Author

I don't think it would be much worse with an iterator based design. When querying a key twice it would yield a None the second time. Is there precedent for incorrect implementations of Borrow, Eq causing panics? If so then that may be an alternative. Otherwise, it would probably hurt usability as it would be the caller's responsibility to ensure that keys are unequal which they may not always be able or willing to do.

impl<K, V> HashMap<K, V> {
    fn get_multi_mut(&mut self, keys: &[K]) -> MultiIterMut<'_, K, V> { ... }
}

impl<'val> Iterator for MultiIterMut<'val, K, V> {
    type Item = Option<&'val mut V>;
}

@Amanieu
Copy link
Member

Amanieu commented Apr 9, 2020

Unfortunately that won't work. Eq is allowed to "lie" with only safe code, so we need to manually ensure that we never return two mutable references to the same value. This means that we need to keep track of all the references we have given out so far in a Vec. There is no way around this.

@HeroicKatora
Copy link
Author

HeroicKatora commented Apr 9, 2020

Yes, I realize that. How would the array based design avoid this in any case? I don't think it's much different from querying a key after it has been removed. Also, tracking which reference have been given out is a way to make it work, so I don't see how 'that won't work'.

@Amanieu
Copy link
Member

Amanieu commented Apr 9, 2020

OK so you can do this with a Vec, however the main downside is that it forces you to allocate memory on every lookup, which is going to hurt performance. The array-based version on the other hand will allow the compiler to optimize most overhead away.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants