Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

negative index inconsistency, simple pop function #1

Merged
merged 2 commits into from Feb 6, 2016

Conversation

zsrinivas
Copy link
Contributor

accessing with negative index is possible but popping with negative index is giving wrong results

import xheap
rmh = xheap.RemovalHeap()
for x in xrange(10):
    rmh.push(x)
print rmh.pop(-1)

that will print 8, instead of 9

and _pop function can be simplified by leaving all those conditions/cases to _siftup which it is expected to do.

super(RemovalHeap, self).__setitem__(key, value)
self._indexes[value] = key
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

only assign self._indexes[value] = key after the super(RemovalHeap, self).__setitem__(key, value) has completed without any exception.

@srkunze
Copy link
Owner

srkunze commented Feb 2, 2016

Thanks; just great!

Your pull request basically consists of three items.

  1. making pop smaller and faster << wow
  2. prevent setting _indexes although __setitem__ might raise an exception << thanks
  3. negative index handling << good catch again; thanks

About 3) I am not quite happy with the idea that people can mess around with the heap. The index parameter of pop has been introduced in a former design (you might find evidence in the history of that) but during splitting things up into separate classes, I removed it (at least in my mind) from Heap and OrderHeap. Now I see that there are still traces of them (thanks for that as well).

My target design will include a parameterless pop (just eject the smallest item) and for RemovalHeap and XHeap a private _pop(self, index) only supposed to be used for remove(self, item).

How do you feel about this?

Apart from that, I am very happy to pull your changes.

@srkunze
Copy link
Owner

srkunze commented Feb 2, 2016

Oh wait. I've run the testsuite and it gave me the following errors:

Error
Traceback (most recent call last):
  File "src/xheap/test_xheap.py", line 193, in test_pop
    popped_item = heap.pop()
  File "src/xheap/xheap.py", line 122, in pop
    return_item = self._pop(index)
  File "src/xheap/xheap.py", line 129, in _pop
    _siftup(self, min(index, len(self) - 1))
  File "/usr/lib/python3.4/heapq.py", line 290, in _siftup
    newitem = heap[pos]
IndexError: list index out of range


Error
Traceback (most recent call last):
  File "src/xheap/test_xheap.py", line 206, in test_remove
    heap.check()
  File "src/xheap/xheap.py", line 157, in check
    super(RemovalHeap, self).check()
  File "src/xheap/xheap.py", line 52, in check
    self.check_invariant()
  File "src/xheap/xheap.py", line 58, in check_invariant
    raise InvalidHeapError('heap invariant (heap[{parent_index}] <= heap[{index}]) violated: {parent} !<= {item}'.format(parent=self[parent_index], parent_index=parent_index, item=self[index], index=index))
xheap.InvalidHeapError: heap invariant (heap[6] <= heap[14]) violated: L !<= G


Error
Traceback (most recent call last):
  File "src/xheap/test_xheap.py", line 275, in test_pop
    popped_item = heap.pop()
  File "src/xheap/xheap.py", line 217, in pop
    return_item = self._pop(index)[1]
  File "src/xheap/xheap.py", line 225, in _pop
    _siftup(self, min(index, len(self) - 1))
  File "/usr/lib/python3.4/heapq.py", line 290, in _siftup
    newitem = heap[pos]
IndexError: list index out of range


Error
Traceback (most recent call last):
  File "src/xheap/test_xheap.py", line 287, in test_remove
    heap.remove(c)
  File "src/xheap/xheap.py", line 208, in remove
    self.pop(index)
  File "src/xheap/xheap.py", line 217, in pop
    return_item = self._pop(index)[1]
  File "src/xheap/xheap.py", line 225, in _pop
    _siftup(self, min(index, len(self) - 1))
  File "/usr/lib/python3.4/heapq.py", line 290, in _siftup
    newitem = heap[pos]
IndexError: list index out of range

@zsrinivas
Copy link
Contributor Author

Yeah it is a good idea not to use index param in pop. after all private methods are completely accessible in python. but if you want to show that as a feature then may be it is better to make it an external function.

sorry about the Error's in the code i forgot to _siftdown first because _siftup(index) never considers about parents of index.
actually I ran the tests before this PR with python xheap_test.py thinking that you have already included unittest.main() in the file. :)

@srkunze
Copy link
Owner

srkunze commented Feb 3, 2016

Good, then I will remove the index from pop().

No problem. :) Do you really think its efficient to do both siftdown and siftup? When looking at the code in heapq, it seems that it's either siftdown or siftup not both. Doing both seems inefficient.

Oh, I didn't know one could do that. I run the tests using PyCharm. :)

else:
return_value = self[index]
self[index] = last_item
if self[(index - 1) >> 1] < last_item:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in your code in case of self[(index - 1) >> 1] < last_item evaluates to True the _siftup is executed, in my code the the _siftdown operation exits as soon as it find that condition.

In [13]: from heapq import _siftdown

In [14]: _siftdown??
Signature: _siftdown(heap, startpos, pos)
Source:
def _siftdown(heap, startpos, pos):
    newitem = heap[pos]
    # Follow the path to the root, moving parents down until finding a place
    # newitem fits.
    while pos > startpos:
        parentpos = (pos - 1) >> 1
        parent = heap[parentpos]
###################HERE############# 
#this condition evaluates to false and it breaks out
        if cmp_lt(newitem, parent):   
            heap[pos] = parent
            pos = parentpos
            continue
        break
    heap[pos] = newitem
File:      /usr/lib/python2.7/heapq.py
Type:      function

similarly in other case _siftup breaks out in the first iteration. these lines are simply doing redundant checks which are already performed by the _siftup and _siftdown.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand this for _siftdown. But _siftup does not check at all. It pulls ALL items up without breaking out as it compares not the new item but siblings:

def _siftup(heap, pos):
    endpos = len(heap)
    startpos = pos
    newitem = heap[pos]
    # Bubble up the smaller child until hitting a leaf.
    childpos = 2*pos + 1    # leftmost child position
    while childpos < endpos:
        # Set childpos to index of smaller child.
        rightpos = childpos + 1
        if rightpos < endpos and not heap[childpos] < heap[rightpos]:
            childpos = rightpos
        # Move the smaller child up.
        heap[pos] = heap[childpos]
        pos = childpos
        childpos = 2*pos + 1
    # The leaf at pos is empty now.  Put newitem there, and bubble it up
    # to its final resting place (by sifting its parents down).
    heap[pos] = newitem
    _siftdown(heap, startpos, pos)

If you say that's not significant (due to some math or so), that's alright. Just asking.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wow, i never bothered to see the _siftup code of heapq module. i really have no idea why there not breaking when both of children's values are valid w.r.t heap[pos]
let me ask about it in the python-list and come back here.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright. It seems we cannot get better than this. I merge this now. Thanks. :)

srkunze added a commit that referenced this pull request Feb 6, 2016
negative index inconsistency, simple pop function
@srkunze srkunze merged commit 316e41e into srkunze:master Feb 6, 2016
@srkunze
Copy link
Owner

srkunze commented Feb 6, 2016

Btw. this is another benchmark showing the probability of how many siftups and siftdowns need to be done (same script like the one on the mailing list):

('cmp ', 4304304)
('up ', 948690)
('down', 51283)

('cmp ', 4302243)
('up ', 948592)
('down', 51380)

('cmp ', 4299502)
('up ', 948911)
('down', 51063)

5% siftdown
95% siftup

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants