[stdlib] Implement `Dict.popitem()` method #2794

msaelices · 2024-05-22T22:53:56Z

Implement Dict.popitem() method
Added changelog entry

Signed-off-by: Manuel Saelices <msaelices@gmail.com>

JoeLoser · 2024-05-23T00:55:04Z

msaelices · 2024-05-23T20:41:53Z

Closed in favor of: #2701

JoeLoser · 2024-05-29T22:30:10Z

Since Dict.popitem got reverted in b9453e9, want to re-open this and we can see how it does with regard to flakiness?

I'd recommend adding a changelog entry and updating the raise to be more descriptive like raise "KeyError: popitem(): dictionary is empty" from https://github.com/modularml/mojo/pull/2701/files#diff-861767ee9c0157172e9901c202431c23b1609089ab926bdf524ee3e722b209bfR828.

Signed-off-by: Manuel Saelices <msaelices@gmail.com>

msaelices · 2024-05-29T22:39:47Z

Since Dict.popitem got reverted in b9453e9, want to re-open this and we can see how it does with regard to flakiness?

I'd recommend adding a changelog entry and updating the raise to be more descriptive like raise "KeyError: popitem(): dictionary is empty" from https://github.com/modularml/mojo/pull/2701/files#diff-861767ee9c0157172e9901c202431c23b1609089ab926bdf524ee3e722b209bfR828.

Re-opened and synced from nightly. Also improved the raise sentence.

Signed-off-by: Manuel Saelices <msaelices@gmail.com>

docs/changelog.md

JoeLoser

Looks good — just one minor wording suggestion on the changelog entry, and then I'm happy to import and merge this.

jayzhan211 · 2024-05-29T23:49:49Z

stdlib/src/collections/dict.mojo

+        """
+        if self.size == 0:
+            raise "KeyError: popitem(): dictionary is empty"
+        var entry = self._entries.pop(0)


Shouldn't we follow the default behaviour like python in LIFO order?

You are completely right. Fixed here: msaelices@a034f7b

jayzhan211 · 2024-05-30T00:01:02Z

Can we try another approach that is not using reverse(dict.item).
We can iterate in the forward pass, the get the last item. If this kind of approach has no panic issue, then we can know the issue is because of reverse(dict) but not pop. And, we can also have the default behaviour that is consistent as python's FILO

What do you think @JoeLoser @msaelices ?

JoeLoser · 2024-05-30T00:06:57Z

Can we try another approach that is not using reverse(dict.item). We can iterate in the forward pass, the get the last item. If this kind of approach has no panic issue, then we can know the issue is because of reverse(dict) but not pop. And, we can also have the default behaviour that is consistent as python's FILO

What do you think @JoeLoser @msaelices ?

SGTM!

Signed-off-by: Manuel Saelices <msaelices@gmail.com>

Co-authored-by: Joe Loser <joeloser@fastmail.com> Signed-off-by: Manuel Saelices <msaelices@gmail.com>

msaelices · 2024-05-30T05:07:45Z

Can we try another approach that is not using reverse(dict.item). We can iterate in the forward pass, the get the last item. If this kind of approach has no panic issue, then we can know the issue is because of reverse(dict) but not pop. And, we can also have the default behaviour that is consistent as python's FILO

What do you think @JoeLoser @msaelices ?

Why do we need to iterate? To get the FILO behaviour we can just do self.entries.pop(self.size - 1)

Signed-off-by: Manuel Saelices <msaelices@gmail.com>

jayzhan211 · 2024-05-30T13:48:15Z

stdlib/src/collections/dict.mojo

+        """
+        if self.size == 0:
+            raise "KeyError: popitem(): dictionary is empty"
+        var entry = self._entries.pop(self.size - 1)


fn _insert(inout self, owned entry: DictEntry[K, V]): self._maybe_resize() var found: Bool var slot: Int var index: Int found, slot, index = self._find_index(entry.hash, entry.key) self._entries[index] = entry^ if not found: self._set_index(slot, index) self.size += 1 self._n_entries += 1

I think index in the _entries is computed by hash function, which mean the element index is not ordered. You pop the item based on the index of self.size - 1, which may not always be the last item we expect.

You can try to pop and insert the dict to mix the dictionary, I think it is possible to get the unexpected result.

iterator including reversed iterator both check if the index in entries has item or not

var opt_entry_ref = self.src[]._entries.__get_ref(self.index

@always_inline fn __next__(inout self) -> Self.ref_type: while True: @parameter if forward: debug_assert( self.index < self.src[]._reserved, "dict iter bounds" ) else: debug_assert(self.index >= 0, "dict iter bounds") var opt_entry_ref = self.src[]._entries.__get_ref(self.index) if opt_entry_ref[]: @parameter if forward: self.index += 1 else: self.index -= 1 self.seen += 1 return opt_entry_ref[].value()[] @parameter if forward: self.index += 1 else: self.index -= 1

I'm not sure of it. Please review the _DictEntryIter, I'm deleted non-relevant fragments for this question:

@value struct _DictEntryIter[ ... # omitted forward: Bool = True, ]: """Iterator over immutable DictEntry references. Parameters: ... # omitted forward: The iteration direction. `False` is backwards. """ ... # omitted var index: Int var src: Reference[Dict[K, V], dict_mutability, dict_lifetime] @always_inline fn __next__(inout self) -> Self.ref_type: while True: ... # omitted var opt_entry_ref = self.src[]._entries.__get_ref(self.index) if opt_entry_ref[]: @parameter if forward: self.index += 1 else: self.index -= 1 return opt_entry_ref[].value()[] @parameter if forward: self.index += 1 else: self.index -= 1 ... # omitted

When we call self.items() in the Dict, a _DictEntryIter obj is returned. If you call reversed(self.items()), the same obj is created with the forward param set to False

So, as you can see in the code above, the index is an integer that is being incremented or decremented while looping over the _entries, calling to the dict._entries.__get_ref(index).

This code calls the self._entries.pop(self.size - 1) being self._entries a List object, which works with integers indexes.

So, I think that the code is right. I am not 100% sure as I'm just landing on the Mojo's world. I will complete the current tests to make sure it works as expected

Forget about it. I think you are right. I will refactor the code.

@jayzhan211 fixed here: msaelices@c3112cd

In the end, I finished implementing a very similar approach to yours, as the tests raised fatal errors if we added a new key after calling to popitem(). I think it's because what you said, the _entries list is bigger than the amount of entries with holes on it when we remove items from the dict.

Now, with the current implementation, I'm not sure if we will find the same issue as in your PR code.

@jayzhan211 FYI: It seems @gabrieldemarmiesse has found the cause of the flakiness in your code which it was NOT caused by your code but a boundary bug in the reversed() logic. See #2896

Now I guess we can re-introduce your code or also this one. Let's @JoeLoser decide.

@jayzhan211 Just in case, I've added you to the changelog entry: msaelices@bd0b912

@jayzhan211 FYI: It seems @gabrieldemarmiesse has found the cause of the flakiness in your code which it was NOT caused by your code but a boundary bug in the reversed() logic. See #2896

Now I guess we can re-introduce your code or also this one. Let's @JoeLoser decide.

I don't mind, at least the code looks good to me

@jayzhan211

Fatal errors were detected if we add a new key after calling popitem() Thanks to @jayzhan211 for the heads-up Signed-off-by: Manuel Saelices <msaelices@gmail.com>

Signed-off-by: Manuel Saelices <msaelices@gmail.com>

jayzhan211

👍

@parameter

…(Dict.items())` (#40974) [External] [stdlib] Fix UB in `reversed(Dict.values())` and `reversed(Dict.items())` Finally found the culprit in the flakyness that plagued us since a few week in the `test_reversed.mojo`. ### The actual bug: When iterating over a list in reverse order, we should start at `len(my_list) - 1` not at `len(my_list)`. That triggered an out of bounds access and thus was undefined behavior. ### The effect on our CI As you know, we have been seeing flakyness lately. It was documented a number of times and always related to `reverse`: * #2866 (comment) * #2369 ### Why was it passing sometimes? This is because there were `Optional[...]` in the List. Thus if the flag of the `Optional` says that no element is present, it's just skipped (the dict doesn't have an entry at this index). So the list of the Dict would often look like this: `["a", "b", "c", "d"] None` but the last `None` is actually memory that we don't have access to. Sometimes it's then skipped in the iteration making the tests pass. Sometimes it would cause segfaults because the test dict worked with strings. Sometimes we would get `wrong variant type` since we don't know what happens to the memory between None check and access. ### Why wasn't it found earlier? First of all, our Dict implementation is too complexe for what it does and thus is very good at hiding bugs. Well we did have `debug_assert` before getting the element of the `List`, but this `debug_assert` looked like this in the dict iterator: ```mojo @parameter if forward: debug_assert( self.index < self.src[]._reserved, "dict iter bounds" ) else: debug_assert(self.index >= 0, "dict iter bounds") ``` So one bound was checked when reading in one direction and the other bound was checked in the other direction. A better `debug_assert` would have been ```mojo debug_assert(0 <= self.index < self.src[]._reserved, "dict iter bounds") ``` When I worked on my PR #2718 the condition `self.index < self.src[]._reserved` didn't trigger anything since it was in the wrong branch, it was never executed. Also before, `__get_ref` didn't have any bounds checks, even when assertions were enabled. A recent commit 8d0870e adds `unsafe_get()` in List and make `__get_ref` use it. It also adds `debug_assert` to `unsafe_get()`, which means that now `__get_ref` has bounds checks if run with assertions enabled. This allowed me to catch the out of bounds access when updating #2718 making the fail deterministic and debuggable. Since we have this, the `debug_assert` in `dict.mojo` isn't necessary anymore. ### Consequences on ongoing work: * This fix have been also added to #2718 * The PR #2701 that we did with @jayzhan211 was actually correct. It was just using `reverse(Dict.items())` which was buggy at the time. After the fix is merged, we can re-revert this PR. * #2794 is not necessary anymore since the implementation by @jayzhan211 was correct. * The real cause of #2866 was found, the issue has already been closed though. * #2369 can be closed for good. * #2832 can be closed for good. ### Closing thoughts * We really need to run the unit tests with assertions enabled and add assertions whenever necessary * The dict implementation is a bit too complicated. For example, `self._reserved` is the length of the internal list. There is no need to store the length of the list twice. Let's drop this variable and use `len(self._entries)` instead. I guess this is a relic of the time when `List` wasn't completely flushed out. If had done so, it would have been ovious that we can't do `my_list.__get_ref(len(my_list))` * Iterating manually over a list like this is bug-prone. The implementation we have especially is, since ```mojo @parameter if forward: self.index += 1 else: self.index -= 1 ``` is done twice in the code, it should only be done once. While there is no bug, code duplication and complexity hides bugs. * We should iterate over the list with a list iterator, not with a custom-made iterator. This will remove a lot of code in the `dict.mojo`. Co-authored-by: Gabriel de Marmiesse <gabriel.demarmiesse@datadoghq.com> Closes #2896 MODULAR_ORIG_COMMIT_REV_ID: b65009dc51f1e3027f91b5b61a5b7003cb022b87

@parameter

…(Dict.items())` (#40974) [External] [stdlib] Fix UB in `reversed(Dict.values())` and `reversed(Dict.items())` Finally found the culprit in the flakyness that plagued us since a few week in the `test_reversed.mojo`. ### The actual bug: When iterating over a list in reverse order, we should start at `len(my_list) - 1` not at `len(my_list)`. That triggered an out of bounds access and thus was undefined behavior. ### The effect on our CI As you know, we have been seeing flakyness lately. It was documented a number of times and always related to `reverse`: * #2866 (comment) * #2369 ### Why was it passing sometimes? This is because there were `Optional[...]` in the List. Thus if the flag of the `Optional` says that no element is present, it's just skipped (the dict doesn't have an entry at this index). So the list of the Dict would often look like this: `["a", "b", "c", "d"] None` but the last `None` is actually memory that we don't have access to. Sometimes it's then skipped in the iteration making the tests pass. Sometimes it would cause segfaults because the test dict worked with strings. Sometimes we would get `wrong variant type` since we don't know what happens to the memory between None check and access. ### Why wasn't it found earlier? First of all, our Dict implementation is too complexe for what it does and thus is very good at hiding bugs. Well we did have `debug_assert` before getting the element of the `List`, but this `debug_assert` looked like this in the dict iterator: ```mojo @parameter if forward: debug_assert( self.index < self.src[]._reserved, "dict iter bounds" ) else: debug_assert(self.index >= 0, "dict iter bounds") ``` So one bound was checked when reading in one direction and the other bound was checked in the other direction. A better `debug_assert` would have been ```mojo debug_assert(0 <= self.index < self.src[]._reserved, "dict iter bounds") ``` When I worked on my PR #2718 the condition `self.index < self.src[]._reserved` didn't trigger anything since it was in the wrong branch, it was never executed. Also before, `__get_ref` didn't have any bounds checks, even when assertions were enabled. A recent commit 8d0870e adds `unsafe_get()` in List and make `__get_ref` use it. It also adds `debug_assert` to `unsafe_get()`, which means that now `__get_ref` has bounds checks if run with assertions enabled. This allowed me to catch the out of bounds access when updating #2718 making the fail deterministic and debuggable. Since we have this, the `debug_assert` in `dict.mojo` isn't necessary anymore. ### Consequences on ongoing work: * This fix have been also added to #2718 * The PR #2701 that we did with @jayzhan211 was actually correct. It was just using `reverse(Dict.items())` which was buggy at the time. After the fix is merged, we can re-revert this PR. * #2794 is not necessary anymore since the implementation by @jayzhan211 was correct. * The real cause of #2866 was found, the issue has already been closed though. * #2369 can be closed for good. * #2832 can be closed for good. ### Closing thoughts * We really need to run the unit tests with assertions enabled and add assertions whenever necessary * The dict implementation is a bit too complicated. For example, `self._reserved` is the length of the internal list. There is no need to store the length of the list twice. Let's drop this variable and use `len(self._entries)` instead. I guess this is a relic of the time when `List` wasn't completely flushed out. If had done so, it would have been ovious that we can't do `my_list.__get_ref(len(my_list))` * Iterating manually over a list like this is bug-prone. The implementation we have especially is, since ```mojo @parameter if forward: self.index += 1 else: self.index -= 1 ``` is done twice in the code, it should only be done once. While there is no bug, code duplication and complexity hides bugs. * We should iterate over the list with a list iterator, not with a custom-made iterator. This will remove a lot of code in the `dict.mojo`. Co-authored-by: Gabriel de Marmiesse <gabriel.demarmiesse@datadoghq.com> Closes #2896 MODULAR_ORIG_COMMIT_REV_ID: b65009dc51f1e3027f91b5b61a5b7003cb022b87

gabrieldemarmiesse · 2024-06-10T11:47:28Z

@msaelices The PR looks good. But since #2896 was merged into nightly and fixed the flakyness exposed by #2701 , the popitem() implementation of @jayzhan211 was added back to nightly. Maybe we can then close this pull request?

msaelices · 2024-06-10T12:04:44Z

@msaelices The PR looks good. But since #2896 was merged into nightly and fixed the flakyness exposed by #2701 , the popitem() implementation of @jayzhan211 was added back to nightly. Maybe we can then close this pull request?

Sure. Closed.

[stdlib] Implement Dict.popitem() method

d4526e9

Signed-off-by: Manuel Saelices <msaelices@gmail.com>

msaelices requested a review from a team as a code owner May 22, 2024 22:53

msaelices changed the title ~~[stdlib] Implement Dict.popitem() method~~ [stdlib] Implement Dict.popitem() method May 22, 2024

msaelices changed the title ~~[stdlib] Implement Dict.popitem() method~~ [stdlib] Implement Dict.popitem() and Dict.setdefault() methods (WIP) May 22, 2024

msaelices mentioned this pull request May 23, 2024

[stdlib] Support Dict popitem #2701

Closed

msaelices closed this May 23, 2024

msaelices mentioned this pull request May 29, 2024

[BUG] Flaky test, test dict popteim ...get: wrong variant type #2866

Closed

msaelices changed the title ~~[stdlib] Implement Dict.popitem() and Dict.setdefault() methods (WIP)~~ [stdlib] Implement Dict.popitem() method May 29, 2024

msaelices reopened this May 29, 2024

msaelices added 3 commits May 30, 2024 00:34

Merge branch 'nightly' into dict-popitem

cae5a0f

Signed-off-by: Manuel Saelices <msaelices@gmail.com>

[stdlib] Better error message if Dict.popitem() is empty

cb3ad6d

Signed-off-by: Manuel Saelices <msaelices@gmail.com>

[stdlib] Remove duplicated Dict.__iter__() method

4709457

Signed-off-by: Manuel Saelices <msaelices@gmail.com>

[stdlib] Changelog entry for Dict.popitem()

2e45b91

Signed-off-by: Manuel Saelices <msaelices@gmail.com>

msaelices requested a review from a team as a code owner May 29, 2024 22:50

JoeLoser reviewed May 29, 2024

View reviewed changes

docs/changelog.md Outdated Show resolved Hide resolved

JoeLoser approved these changes May 29, 2024

View reviewed changes

JoeLoser self-assigned this May 29, 2024

jayzhan211 reviewed May 29, 2024

View reviewed changes

msaelices and others added 2 commits May 30, 2024 07:01

[stdlib] Honours the Python dict.popitem() LIFO ordering

a034f7b

Signed-off-by: Manuel Saelices <msaelices@gmail.com>

Minor wording change on changelog's Dict.popitem() entry

dfb1f15

Co-authored-by: Joe Loser <joeloser@fastmail.com> Signed-off-by: Manuel Saelices <msaelices@gmail.com>

[stdlib] Fix changelog.md line width

e53654a

Signed-off-by: Manuel Saelices <msaelices@gmail.com>

jayzhan211 reviewed May 30, 2024

View reviewed changes

Merge branch 'nightly' into dict-popitem

7000a03

gabrieldemarmiesse mentioned this pull request May 30, 2024

[stdlib] Fix UB in reversed(Dict.values()) and reversed(Dict.items()) #2896

Closed

msaelices added 3 commits May 31, 2024 00:14

[stdlib] Refactor Dict.popitem() code to prevent fatal errors.

c3112cd

Fatal errors were detected if we add a new key after calling popitem() Thanks to @jayzhan211 for the heads-up Signed-off-by: Manuel Saelices <msaelices@gmail.com>

[stdlib] Add @jayzhan211 to the changelog's entry attribution

bd0b912

Signed-off-by: Manuel Saelices <msaelices@gmail.com>

[stdlib] Remove testing code

d15c72d

Signed-off-by: Manuel Saelices <msaelices@gmail.com>

msaelices requested review from JoeLoser and jayzhan211 May 30, 2024 22:50

jayzhan211 approved these changes May 30, 2024

View reviewed changes

msaelices closed this Jun 10, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[stdlib] Implement `Dict.popitem()` method #2794

[stdlib] Implement `Dict.popitem()` method #2794

msaelices commented May 22, 2024 •

edited

Loading

JoeLoser commented May 23, 2024

msaelices commented May 23, 2024

JoeLoser commented May 29, 2024 •

edited

Loading

msaelices commented May 29, 2024

JoeLoser left a comment

jayzhan211 May 29, 2024

msaelices May 30, 2024

jayzhan211 commented May 30, 2024 •

edited

Loading

JoeLoser commented May 30, 2024

msaelices commented May 30, 2024

jayzhan211 May 30, 2024 •

edited

Loading

jayzhan211 May 30, 2024 •

edited

Loading

msaelices May 30, 2024

msaelices May 30, 2024

msaelices May 30, 2024

msaelices May 30, 2024 •

edited

Loading

msaelices May 30, 2024

jayzhan211 May 30, 2024

jayzhan211 left a comment

gabrieldemarmiesse commented Jun 10, 2024

msaelices commented Jun 10, 2024

[stdlib] Implement Dict.popitem() method #2794

[stdlib] Implement Dict.popitem() method #2794

Conversation

msaelices commented May 22, 2024 • edited Loading

JoeLoser commented May 23, 2024

msaelices commented May 23, 2024

JoeLoser commented May 29, 2024 • edited Loading

msaelices commented May 29, 2024

JoeLoser left a comment

Choose a reason for hiding this comment

jayzhan211 May 29, 2024

Choose a reason for hiding this comment

msaelices May 30, 2024

Choose a reason for hiding this comment

jayzhan211 commented May 30, 2024 • edited Loading

JoeLoser commented May 30, 2024

msaelices commented May 30, 2024

jayzhan211 May 30, 2024 • edited Loading

Choose a reason for hiding this comment

jayzhan211 May 30, 2024 • edited Loading

Choose a reason for hiding this comment

msaelices May 30, 2024

Choose a reason for hiding this comment

msaelices May 30, 2024

Choose a reason for hiding this comment

msaelices May 30, 2024

Choose a reason for hiding this comment

msaelices May 30, 2024 • edited Loading

Choose a reason for hiding this comment

msaelices May 30, 2024

Choose a reason for hiding this comment

jayzhan211 May 30, 2024

Choose a reason for hiding this comment

jayzhan211 left a comment

Choose a reason for hiding this comment

gabrieldemarmiesse commented Jun 10, 2024

msaelices commented Jun 10, 2024

[stdlib] Implement `Dict.popitem()` method #2794

[stdlib] Implement `Dict.popitem()` method #2794

msaelices commented May 22, 2024 •

edited

Loading

JoeLoser commented May 29, 2024 •

edited

Loading

jayzhan211 commented May 30, 2024 •

edited

Loading

jayzhan211 May 30, 2024 •

edited

Loading

jayzhan211 May 30, 2024 •

edited

Loading

msaelices May 30, 2024 •

edited

Loading