New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add a set slice function. #5
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this change. I didn't know about __setslice__
. According to the docs (https://docs.python.org/2/library/operator.html#operator.setitem), this method is deprecated.
Perhaps we could add this functionality to __setitem__
?
I'd also like to see a unit test for this.
Apologies for brevity, am in a mobile device.
While working on django-haystack/django-haystack#1117 I noticed a related change: def __delitem__(self, index):
if isinstance(index, slice):
for i in range(*index.indices(self.size)):
self.elements.pop(i, None)
else:
self.elements.pop(index, None) |
The sparse list implementation avoids allocating the result cache elements until something triggers loading the actual content. This has a slight memory increase for very small querysets – roughly a custom class + dict() vs. a list() – but the actual result data will usually be much larger and the difference is cancelled out by the time the result cache is at a thousand items. This branch is currently broken using the released version of sparse_list – see johnsyweb/python_sparse_list#5
The sparse list implementation avoids allocating the result cache elements until something triggers loading the actual content. This has a slight memory increase for very small querysets – roughly a custom class + dict() vs. a list() – but the actual result data will usually be much larger and the difference is cancelled out by the time the result cache is at a thousand items. This branch is currently broken using the released version of sparse_list – see johnsyweb/python_sparse_list#5
The sparse list implementation avoids allocating the result cache elements until something triggers loading the actual content. This has a slight memory increase for very small querysets – roughly a custom class + dict() vs. a list() – but the actual result data will usually be much larger and the difference is cancelled out by the time the result cache is at a thousand items. This branch is currently broken using the released version of sparse_list – see johnsyweb/python_sparse_list#5
@johnsyweb Sure – that was the minimum needed to get it usable for the django-haystack queryset caching. I'll open a separate pull request & clean it up. Re: EAFP, I used .pop in that patch to avoid having two duplicate try/except blocks – not entirely happy with having the two closely related blocks but it's not exactly a ton of code either way and it's probably better to be fast than trying something more clever. |
* Update __delitem__ to handle slices * Remove __delslice__ which is deprecated on Python 2, where the runtime will create a slice instance and pass it to __delitem__ if __delslice__ does not exist. * Use xrange instead of range to avoid running out of memory on Python 2 when passed a slice like [1:] which will allocate sys.maxint - 1 elements * Update the test code to use the same check as the main sparse_list See johnsyweb#5
…ecated. Handle a number of corner cases, such as adding past the end and adding from the middle past the end. Also, add a test re same.
Made the suggested changes. There are also a few corner cases that we now cover, specifically setting a slice that runs past the length of the list. I used the standard python list structure as a reference. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for adding these tests, they make the intention much clearer.
# Like a normal list, when we assign past the end, it appends. | ||
sl[100:] = [4, 5] | ||
self.assertEquals([1, 2, None, None, 4, 5], sl) | ||
self.assertEquals(len(sl), 6) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find this behaviour surprising.
Given the case:
>>> from sparse_list import SparseList
>>> sl = SparseList(0)
>>> sl
[]
>>> sl[100] = 'test'
>>> len(sl)
101
>>> sl
[None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, test]
I would expect sl[100:] =
to start inserting at index 100.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In v0.6 I included this change: a53a984
I hope this works for you.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree it's strange behavior, but I was just trying to copy the implementation of the normal Python list
. I suggest that we keep sparse_list
as close to list
as possible, so it can be used as a drop-in replacement.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This makes sense, thanks @speedplane.
I can't find this functionality documented anywhere but both CPython and PyPy behave this way.
>>> l = [None, None, None, None]
>>> l[100:] = [4, 5]
>>> l
[None, None, None, None, 4, 5]
I would have expected an IndexError
from list
here because the way this is implemented means that an item can't be retrieved from the specified location:
>>>> l = []
>>>> l[100:] = ['somevalue']
>>>> l[100]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
IndexError: list index out of range
In 828c8bb#diff-e1add0b36b652c44614a561f3bdf4532R121 I made the decision that setting an element past the end of a SparseList
would behave differently to a list
, since we set arbitrary elements without having to allocate large blocks of memory. And all of the preceding elements have been given a default (which we don't have with list
).
As I've implemented this sl[x]
will return the first value allocated by a call to sl[x:] =
.
If this breaks people's use of SparseLIst
, then I'd certainly entertain a PR to change the behaviour to match that of list
. Then I'd have to question whether assigning to a single index past the end of the SparseLIst
should raise IndexError
after all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
http://stackoverflow.com/q/40558529/78845 is proving enlightening...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It transpires that Guido van Rossum "can't quite remember why" he implemented it this way: https://code.activestate.com/lists/python-dev/132599/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ha, sometimes things are the way they are just because they've always been that way.
self.assertEquals(len(sl), 6) | ||
# Now assign from the middle, past the end. | ||
sl[5:] = [6, 7, 8] | ||
self.assertEquals([1, 2, None, None, 4, 6, 7, 8], sl) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find this less surprising, but this test relies on previous state.
Let's break this this into separate tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I split these tests here: a9a90c6
@@ -58,6 +72,10 @@ def __delitem__(self, item): | |||
except KeyError: | |||
pass | |||
|
|||
def __setslice__(self, start, stop, vals): | |||
# __setslice__ is deprecated, but kept here for backwards compatability. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice. Thanks.
if isinstance(index, slice): | ||
vals_idx = 0 | ||
# Inserting past the end appends. | ||
if index.start > len(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why call len()
here rather than access self.size
?
The sparse list implementation avoids allocating the result cache elements until something triggers loading the actual content. This has a slight memory increase for very small querysets – roughly a custom class + dict() vs. a list() – but the actual result data will usually be much larger and the difference is cancelled out by the time the result cache is at a thousand items. This branch is currently broken using the released version of sparse_list – see johnsyweb/python_sparse_list#5
Adds a pretty straightforward
__setslice__
function.