Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

A design for single-pass borrowed insert_if_not_there #93

Closed
CAD97 opened this issue May 16, 2019 · 5 comments
Closed

A design for single-pass borrowed insert_if_not_there #93

CAD97 opened this issue May 16, 2019 · 5 comments

Comments

@CAD97
Copy link

CAD97 commented May 16, 2019

I have an IndexSet<String> that I want to be able to insert or get index from &str without allocating needlessly if the value is already in the set.

This is actually possible with one small library addition: entry_index. It would return an entry for the value at the index up to and including .len(), thus allowing insertion of the value there. It wouldn't even need to be unsafe: having two keys that compare equal in the map is only very bad for domain correctness, but not for memory correctness if I understand correctly.

It could then be used for inserting if not present with something like the following:

set.get(&key)
    .unwrap_or_else(|| set.entry_index(set.len)
        .insert(key.to_owned())
@cuviper
Copy link
Member

cuviper commented May 16, 2019

In parity with the std sets, there isn't an entry API at all yet. How about this:

set.get(key).unwrap_or_else(|| {
    let (i, _) = set.insert_full(key.to_owned());
    set.get_index(i).unwrap()
})

The unwrap is unfortunate, but might just get optimized away in the vector bounds check.

@CAD97
Copy link
Author

CAD97 commented May 16, 2019

This does need to do a double lookup, and is what I've got written right now. (As insert needs to scan for a hash equivalence, which while ~O(1), is not free.) The entry idea is to avoid the double lookup, and would apply equally to maps (I just wanted it on a set). Unfortunately some other issues mean that design isn't tenable anyway.

@cuviper
Copy link
Member

cuviper commented May 16, 2019

Right, so it seems like you really want a borrowing version of entry on maps, and then mirror these on sets. That functionality is not really specific to indexmap's design, so I would love to see that API hammered out in std first, and then we can mirror it.

I believe raw_entry_mut is the current plan:
https://doc.rust-lang.org/nightly/std/collections/struct.HashMap.html#method.raw_entry_mut

@cuviper
Copy link
Member

cuviper commented May 16, 2019

Something you may not have realized -- your entry_index(_).insert(_) would also count as a second lookup, because entries have two records: the indexed vector and the hash table mapping to indexes. So looking up the index is cheap, but inserting the value would also need to do the raw hash insert.

You can functionally think of IndexMap like this:

struct IndexMap<K, V> {
    map: HashMap<K, usize>,
    vec: Vec<(K, V)>,
}

... but optimized to not duplicate key storage.

@CAD97 CAD97 closed this as completed May 16, 2019
@cuviper
Copy link
Member

cuviper commented May 16, 2019

See rust-lang/rust#60894 -- I think get_or_insert_with will do what you want. We can replicate that for IndexSet when it's stable.

@cuviper cuviper mentioned this issue Aug 16, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants