Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Missing a good way to insert into a HashMap while panicking on duplicates #3092

Closed
m-ou-se opened this issue Mar 4, 2021 · 18 comments · Fixed by rust-lang/rust#82764
Closed
Labels
A-collections Proposals about collection APIs T-libs-api Relevant to the library API team, which will review and decide on the RFC.

Comments

@m-ou-se
Copy link
Member

m-ou-se commented Mar 4, 2021

HashMap::insert(key, val) returns Some(old_val) as the 'error' if the key was already in the map. In many cases it's assumed no duplicate keys are ever added, but still asserted by panicking otherwise.

We experimented with map.insert(key, val).unwrap_none() (rust-lang/rust#62633), but decided that that's not the kind of method we'd like to have on Options.

Using an assert macro has some downsides:

assert!(map.insert(key, val).is_none()); // 1
assert_eq!(map.insert(key, val), None); // 2
assert_match!(map.insert(key, val), None); // 3

Not everyone feels comfortable with assertions having side effects.
1 doesn't show any context in the panic message. (Though unwrap_none also didn't show the key or the new value. Only the old value.) 2 would only work if the value type implements PartialEq. 3 uses a macro that doesn't exist (yet) in std.


Updating the HashMap interface might be a good solution:

insert always succeeds because it replaces the old value if it exists. One could argue that insert() is never the right method for panicking on duplicates, since already handles that case by replacing the value, only allowing you to panic after that already happened.

A try_insert method could instead return an Result::Err indicating the duplicate key error, which could also report the full context (key, old value, new value) in the panic message.

map.try_insert(key, val).unwrap();

Alternatively (or additionally), the Entry interface could be updated to have a .unwrap_vacant() method to assume the obtained entry is vacant, which can then be used to insert the value. This can report the key and the old value in the panic message.

map.entry(key).unwrap_vacant().insert(val);
@m-ou-se m-ou-se added T-libs-api Relevant to the library API team, which will review and decide on the RFC. A-collections Proposals about collection APIs labels Mar 4, 2021
@Amanieu
Copy link
Member

Amanieu commented Mar 4, 2021

I like try_insert: it clearly indicates that the intended behavior is to not replace any existing entries if there is a conflict. I expect that this is actually a very common case which is not very ergonomic to handle via entry since you need to match on VacantEntry.

@m-ou-se
Copy link
Member Author

m-ou-se commented Mar 4, 2021

I'd expect it to be something like this:

    /// Tries to insert a key-value pair into the map, and returns
    /// a mutable reference to the value in the entry.
    ///
    /// If the map already had this key present, nothing is updated, and
    /// an error containing the occupied entry and the value is returned.
    ///
    /// # Examples
    ///
    /// Basic usage:
    ///
    /// ```
    /// use std::collections::BTreeMap;
    ///
    /// let mut map = BTreeMap::new();
    /// assert_eq!(map.try_insert(37, "a").unwrap(), "a");
    ///
    /// let err = map.try_insert(37, "b").unwrap_err();
    /// assert_eq!(err.entry.key(), 37);
    /// assert_eq!(err.entry.get(), "a");
    /// assert_eq!(err.value, "b");
    /// ```
    pub fn try_insert(&mut self, key: K, value: V) -> Result<&mut V, OccupiedError<'_, K, V>>;

Working on an implementation now to try this out.

@BurntSushi
Copy link
Member

In principle I like try_insert as well. I think my only concern is that I think it is slightly divergent from our existing pattern. That is, try_foo and foo I believe usually behave exactly the same with one difference: foo panics where try_foo returns an error. But in this case, the behavior is actually different. insert doesn't panic at all and allows overwriting the key, which is different from try_insert which doesn't overwrite the key.

@m-ou-se
Copy link
Member Author

m-ou-se commented Mar 4, 2021

@BurntSushi Yes. insert() overwriting by default is unfortunate.

Taking a look at all the different insert methods, we have some that return a bool, some that always replace and return the old value, some that always replace and return a &mut to the newly inserted value, some that return a reference to both the key and the value, and some that return nothing at all.

Unfortunately we can't change or rename insert(). But I don't think the situation is that bad. We already have try_thing/thing pairs where the second one always succeeds: try_from/from, try_fold/fold, although for these the signature is different. Overwriting as 'always succeeds' is a bit worse, but I don't think this is a deal breaker.

@BurntSushi
Copy link
Member

BurntSushi commented Mar 4, 2021

If the try_foo/foo pairs don't have a rigorous pattern, and it looks like they don't, then I think that weakens my concern. I don't necessarily think it's a deal breaker either, but it's good to be aware of it. For example, in theory, we could introduce, say, insert_no_duplicate/try_insert_no_duplicate (its awful name to be bikeshed) as an alternative. But I think that cure may be worse than the disease.

@m-ou-se
Copy link
Member Author

m-ou-se commented Mar 4, 2021

Before, with .insert(10, "world").unwrap_none():
thread 'main' panicked at 'called `Option::unwrap_none()` on a `Some` value: "hello"', src/main.rs:8:29

After, with .try_insert(10, "world").unwrap():
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: OccupiedError { key: 10, old_value: "hello", new_value: "world" }', src/main.rs:6:33

@m-ou-se
Copy link
Member Author

m-ou-se commented Mar 4, 2021

Implementation: rust-lang/rust#82764

@programmerjake
Copy link
Member

programmerjake commented Mar 4, 2021

possible alternative names: insert_if_vacant, insert_no_replace, insert_only, insert_keep

@kennytm
Copy link
Member

kennytm commented Mar 4, 2021

🌈 Bikeshed! 🌈

We could take MySQL and SQLite's terminology and use insert_or_ignore/insert_ignore.

Redis called this operation "SETNX" (Set if Not eXists), but I don't like negativity in the name 🙃.

C++ calls this try_emplace (thus similar to try_insert).

Java calls this putIfAbsent (thus similar to insert_if_vacant).

Or maybe just call this .occupy()? (not obvious enough?)

map.occupy(key, val).unwrap();

@sollyucko
Copy link

sollyucko commented Mar 4, 2021

A few more alternatives: (try_ +) insert_if_/replace_ + empty/vacant/missing/not_exists, insert_new, setdefault (Python)

@m-ou-se
Copy link
Member Author

m-ou-se commented Mar 4, 2021

🌈 Bikeshed! 🌈

try_insert_if_this_key_wasn_apostrophe_t_already_there_or_otherwise_return_an_error_also_i_like_my_bikesheds_to_be_pink().

@Amanieu
Copy link
Member

Amanieu commented Mar 4, 2021

I still feel that try_insert is still the best choice out of all of these.

@Bathtor
Copy link

Bathtor commented Mar 4, 2021

Since we are in the unfortunate situation that the function called insert actually does "insert or update", I would be in favour of clearly differentiating the variant that errors on existing entries in the function name, so people can see what it does without having to read the docs.
Without knowing what try_insert does, I'd assume—following the current logic—that it tries to insert or update and returns a Result instead of an Option containing the old value on Ok or an error message if it could neither insert nor update.

There are some great names mentioned in the posts above to pick from. My personal favourites are insert_if_vacant and insert_if_absent.

@tanriol
Copy link

tanriol commented Mar 4, 2021

Would it be technically possible to have the insert method have different signatures in different editions (be an alias for different methods, possibly)? Is it something that cannot be done at all or something that should not be done because of too much churn?

@BurntSushi
Copy link
Member

@tanriol Different editions still need to use the same standard library. And yes, even if we could do it, we shouldn't.

@Ekleog
Copy link

Ekleog commented Mar 4, 2021

Before anything, I'm not at all against try_insert (the concept of returning a Result instead of an Option with unintuitive semantics appeals to me).

However, for the top-post-described problem, I think the currently-possible HashMap::insert(set, val).ok_or(()).unwrap_err() alternative would also work and hasn't been suggested yet? Though it's semantically even worse, as it creates a Result where the success condition is Err, I think it may be a reasonable way of doing things.

@Amanieu
Copy link
Member

Amanieu commented Mar 4, 2021

@Ekleog HashMap::insert(set, val).ok_or(()).unwrap_err() will still modify the HashMap, while HashMap::try_insert(set, val).unwrap() leaves it unmodified. The semantics are subtly different and I would argue that the second is "more correct".

@Ekleog
Copy link

Ekleog commented Mar 4, 2021

Totally agreed, I'm only talking about the problem described in the top-post of replacing .unwrap_none() — if it were just for this problem, I'm not sure introducing a second slightly-redundant function would be great for the language as a whole. (Cases where the HashMap will still stay readable after a panic involving an &mut to it are… I think very rare? [not sure that's even possible but haven't thought about it very long either, and it's probably not the core of the discussion])

But I'm pretty sure there are other problems that are solved by try_insert (if only the problem of “Missing a good way to insert into a HashMap while ignoring on duplicates,” I think?) (edit: .entry().or_insert() works for this case, so let's say the problem of “insert's semantics are weird”) — so I'm mostly saying this so that the full trade-off is clearly spelled out in the discussion :)

JohnTitor added a commit to JohnTitor/rust that referenced this issue Mar 5, 2021
Add {BTreeMap,HashMap}::try_insert

`{BTreeMap,HashMap}::insert(key, new_val)` returns `Some(old_val)` if the key was already in the map. It's often useful to assert no duplicate values are inserted.

We experimented with `map.insert(key, val).unwrap_none()` (rust-lang#62633), but decided that that's not the kind of method we'd like to have on `Option`s.

`insert` always succeeds because it replaces the old value if it exists. One could argue that `insert()` is never the right method for panicking on duplicates, since already handles that case by replacing the value, only allowing you to panic after that already happened.

This PR adds a `try_insert` method that instead returns a `Result::Err` when the key already exists. This error contains both the `OccupiedEntry` and the value that was supposed to be inserted. This means that unwrapping that result gives more context:
```rust
    map.insert(10, "world").unwrap_none();
    // thread 'main' panicked at 'called `Option::unwrap_none()` on a `Some` value: "hello"', src/main.rs:8:29
```

```rust
    map.try_insert(10, "world").unwrap();
    // thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value:
    // OccupiedError { key: 10, old_value: "hello", new_value: "world" }', src/main.rs:6:33
```

It also allows handling the failure in any other way, as you have full access to the `OccupiedEntry` and the value.

`try_insert` returns a reference to the value in case of success, making it an alternative to `.entry(key).or_insert(value)`.

r? `@Amanieu`

Fixes rust-lang/rfcs#3092
m-ou-se added a commit to m-ou-se/rust that referenced this issue Mar 5, 2021
Add {BTreeMap,HashMap}::try_insert

`{BTreeMap,HashMap}::insert(key, new_val)` returns `Some(old_val)` if the key was already in the map. It's often useful to assert no duplicate values are inserted.

We experimented with `map.insert(key, val).unwrap_none()` (rust-lang#62633), but decided that that's not the kind of method we'd like to have on `Option`s.

`insert` always succeeds because it replaces the old value if it exists. One could argue that `insert()` is never the right method for panicking on duplicates, since already handles that case by replacing the value, only allowing you to panic after that already happened.

This PR adds a `try_insert` method that instead returns a `Result::Err` when the key already exists. This error contains both the `OccupiedEntry` and the value that was supposed to be inserted. This means that unwrapping that result gives more context:
```rust
    map.insert(10, "world").unwrap_none();
    // thread 'main' panicked at 'called `Option::unwrap_none()` on a `Some` value: "hello"', src/main.rs:8:29
```

```rust
    map.try_insert(10, "world").unwrap();
    // thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value:
    // OccupiedError { key: 10, old_value: "hello", new_value: "world" }', src/main.rs:6:33
```

It also allows handling the failure in any other way, as you have full access to the `OccupiedEntry` and the value.

`try_insert` returns a reference to the value in case of success, making it an alternative to `.entry(key).or_insert(value)`.

r? ``@Amanieu``

Fixes rust-lang/rfcs#3092
eggyal pushed a commit to eggyal/copse that referenced this issue Jan 9, 2023
Add {BTreeMap,HashMap}::try_insert

`{BTreeMap,HashMap}::insert(key, new_val)` returns `Some(old_val)` if the key was already in the map. It's often useful to assert no duplicate values are inserted.

We experimented with `map.insert(key, val).unwrap_none()` (rust-lang/rust#62633), but decided that that's not the kind of method we'd like to have on `Option`s.

`insert` always succeeds because it replaces the old value if it exists. One could argue that `insert()` is never the right method for panicking on duplicates, since already handles that case by replacing the value, only allowing you to panic after that already happened.

This PR adds a `try_insert` method that instead returns a `Result::Err` when the key already exists. This error contains both the `OccupiedEntry` and the value that was supposed to be inserted. This means that unwrapping that result gives more context:
```rust
    map.insert(10, "world").unwrap_none();
    // thread 'main' panicked at 'called `Option::unwrap_none()` on a `Some` value: "hello"', src/main.rs:8:29
```

```rust
    map.try_insert(10, "world").unwrap();
    // thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value:
    // OccupiedError { key: 10, old_value: "hello", new_value: "world" }', src/main.rs:6:33
```

It also allows handling the failure in any other way, as you have full access to the `OccupiedEntry` and the value.

`try_insert` returns a reference to the value in case of success, making it an alternative to `.entry(key).or_insert(value)`.

r? ```@Amanieu```

Fixes rust-lang/rfcs#3092
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-collections Proposals about collection APIs T-libs-api Relevant to the library API team, which will review and decide on the RFC.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

9 participants