Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Why IndexedSet not update the index of items? #240

Closed
Urahara opened this issue Feb 10, 2020 · 11 comments
Closed

Why IndexedSet not update the index of items? #240

Urahara opened this issue Feb 10, 2020 · 11 comments

Comments

@Urahara
Copy link

Urahara commented Feb 10, 2020

What is the motivation of not updating the current index of items when they are removed? It's this a bug? Because i starting using IndexedSet to pop items from a list that i have and i start get index of range errors.

boltons/boltons/setutils.py

Lines 201 to 209 in bf8a659

def remove(self, item):
"remove(item) -> remove item from the set, raises if not present"
try:
didx = self.item_index_map.pop(item)
except KeyError:
raise KeyError(item)
self.item_list[didx] = _MISSING
self._add_dead(didx)
self._cull()

@mahmoud
Copy link
Owner

mahmoud commented Feb 10, 2020

Hi Urahara, Can you paste the stack trace, and maybe a reproduction of the problem?

I never rule out the possibility of bugs, but IndexedSet updates the index in the code you've linked.

@Urahara
Copy link
Author

Urahara commented Feb 10, 2020

@mahmoud Of course! Here we go:

from boltons.setutils import IndexedSet

original_list = []

for i in range(0, 8):
    original_list.append(i)

indexed_list = IndexedSet()

for i in original_list:
    key = "key_{}".format(i)
    indexed_list.add(key)

for i in original_list:
    key = "key_{}".format(i)
    index = indexed_list.index(key)
    indexed_list.pop(index)

Exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/boltons/setutils.py", line 422, in index
    return self.item_index_map[val]
KeyError: 'key_2'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/urahara/tests.py", line 18, in <module>
    index = indexed_list.index(key)
  File "/usr/local/lib/python3.6/dist-packages/boltons/setutils.py", line 425, in index
    raise ValueError('%r is not in %s' % (val, cn))
ValueError: 'key_2' is not in IndexedSet

Process finished with exit code 1

For what i see this kind of error begin ocurrs after len of set is >= 8.

I alter pop method to:

    def pop(self, index=None):
        "pop(index) -> remove the item at a given index (-1 by default)"
        item_index_map = self.item_index_map
        len_self = len(item_index_map)
        if index is None or index == -1 or index == len_self - 1:
            ret = self.item_list.pop()
            del item_index_map[ret]
        else:
            real_index = self._get_real_index(index)
            ret = self.item_list[real_index]
            self.item_list.pop(real_index)
            del item_index_map[ret]
            for idx, item in enumerate(self.item_list): # recalculating index
                item_index_map[item] = idx
            #self._add_dead(real_index)
        self._cull()
        return ret

And the error go away and code works like expected, but i need to know the reason of the existence of dead_indices, _get_real_indexand_cull` to make a proper correction.

@mahmoud
Copy link
Owner

mahmoud commented Feb 10, 2020

Ah, I had understood your original report to be about .remove() not .index().

The short answer is that for performance reasons, IndexedSet is designed to avoid changing underlying datastructures on every operation. There's a dictionary for O(1) presence check and a list for O(1) index access. The frequently-called methods insert sentinel gravestones and ._cull() takes care of resizes.

There may be an issue in keeping these in sync with the "list" portion of the API (index/pop come from the list tradition, add/remove/etc. come from the set tradition, if that makes sense.)

If you'd like to provide a PR which maintains the amortized behavior, I'd definitely consider and appreciate it. :)

@Urahara
Copy link
Author

Urahara commented Feb 10, 2020

At first i was using .index() then i move out to .remove() but i still facing the same issue above, sorry for misunderstood.

@mahmoud Do you have any suggestion how can i implement this without compromise the perfomance?

@mahmoud
Copy link
Owner

mahmoud commented Mar 30, 2020

I went ahead and did it, by adding an internal method _get_apparent_index() to complement _get_real_index. Your reproduction code above lives on in a modified state in one of IndexedSet's tests.

As far as performance, the amortized culling mechanisms throughout the rest of the API maintain the basic time complexity, so I didn't get too fancy. If you run into performance troubles for the core APIs, we can perhaps optimize further. Thanks again for the excellent report!

@Urahara
Copy link
Author

Urahara commented Mar 30, 2020

@mahmoud Hi! Thanks, i upgraded my module and i find another bug, now i get negative indexes and exceptions too:

from boltons.setutils import IndexedSet

original_list = []

for i in range(0, 10):
    original_list.append(i)

indexed_list = IndexedSet()

for i in original_list:
    key = "key_{}".format(i)
    indexed_list.add(key)

for i in original_list:
    if i > 2:
        key = "key_{}".format(i)
        index = indexed_list.index(key)
        print(index)
        indexed_list.pop(index)

Result:

3
-3
4
-4
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/boltons/setutils.py", line 434, in index
    return self._get_apparent_index(self.item_index_map[val])
KeyError: 'key_7'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/Urahara/Projects/tests.py", line 17, in <module>
    index = indexed_list.index(key)
  File "/usr/local/lib/python3.6/dist-packages/boltons/setutils.py", line 437, in index
    raise ValueError('%r is not in %s' % (val, cn))
ValueError: 'key_7' is not in IndexedSet

This fix is only when i pop items in order?

@mahmoud
Copy link
Owner

mahmoud commented Mar 30, 2020

Oh my goodness, thanks for circling back and rechecking! I added more tests (including that one), and caught/fixed the bug. If you want to test on master, it's there now.

@Urahara
Copy link
Author

Urahara commented Mar 30, 2020

I get the same error. :( I get the file from master and try run locally.

Also: i found another user case:

from setutils import IndexedSet

original_list = []

for i in range(32):
    original_list.append(i)

indexed_list = IndexedSet()

for i in original_list:
    key = "key_{}".format(i)
    indexed_list.add(key)

for i in original_list:
    if i % 3:
        key = "key_{}".format(i)
        index = indexed_list.index(key)
        print(index)
        indexed_list.pop(index)

Thanks for quickly feedback!

@mahmoud
Copy link
Owner

mahmoud commented Mar 30, 2020

Ah, yes, I saw the problem with this one pretty quick. Test added and new fix pushed. Keep those tests coming, I say! :)

@Urahara
Copy link
Author

Urahara commented Mar 31, 2020

Now i think is 100%, thanks again man! Great work! :)

@mahmoud
Copy link
Owner

mahmoud commented Mar 31, 2020

Wouldn't be 100% without your diligence, huge thanks! I'll get this released ASAP.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants