Add a set slice function. #5

speedplane · 2016-11-08T15:50:02Z

Adds a pretty straightforward __setslice__ function.

johnsyweb

Thanks for this change. I didn't know about __setslice__. According to the docs (https://docs.python.org/2/library/operator.html#operator.setitem), this method is deprecated.

Perhaps we could add this functionality to __setitem__?

I'd also like to see a unit test for this.

Apologies for brevity, am in a mobile device.

acdha · 2016-11-08T23:57:46Z

While working on django-haystack/django-haystack#1117 I noticed a related change: __delitem__ doesn't handle slices, which my code was triggering. Should I send a separate pull request for this or should we merge it into this?

def __delitem__(self, index):
    if isinstance(index, slice):
        for i in range(*index.indices(self.size)):
            self.elements.pop(i, None)
    else:
        self.elements.pop(index, None)

The sparse list implementation avoids allocating the result cache elements until something triggers loading the actual content. This has a slight memory increase for very small querysets – roughly a custom class + dict() vs. a list() – but the actual result data will usually be much larger and the difference is cancelled out by the time the result cache is at a thousand items. This branch is currently broken using the released version of sparse_list – see johnsyweb/python_sparse_list#5

johnsyweb · 2016-11-09T06:56:21Z

@acdha : Thanks for this. I think that should be a separate PR.

Can you please include a unit test?

Also: prefer the EAFP approach.

acdha · 2016-11-09T14:36:02Z

@johnsyweb Sure – that was the minimum needed to get it usable for the django-haystack queryset caching. I'll open a separate pull request & clean it up.

Re: EAFP, I used .pop in that patch to avoid having two duplicate try/except blocks – not entirely happy with having the two closely related blocks but it's not exactly a ton of code either way and it's probably better to be fast than trying something more clever.

* Update __delitem__ to handle slices * Remove __delslice__ which is deprecated on Python 2, where the runtime will create a slice instance and pass it to __delitem__ if __delslice__ does not exist. * Use xrange instead of range to avoid running out of memory on Python 2 when passed a slice like [1:] which will allocate sys.maxint - 1 elements * Update the test code to use the same check as the main sparse_list See johnsyweb#5

…ecated. Handle a number of corner cases, such as adding past the end and adding from the middle past the end. Also, add a test re same.

speedplane · 2016-11-09T23:51:00Z

Made the suggested changes. There are also a few corner cases that we now cover, specifically setting a slice that runs past the length of the list. I used the standard python list structure as a reference.

johnsyweb

Thanks for adding these tests, they make the intention much clearer.

johnsyweb · 2016-11-11T09:19:47Z

t_sparse_list.py

+        # Like a normal list, when we assign past the end, it appends.
+        sl[100:] = [4, 5]
+        self.assertEquals([1, 2, None, None, 4, 5], sl)
+        self.assertEquals(len(sl), 6)


I find this behaviour surprising.

Given the case:

>>> from sparse_list import SparseList >>> sl = SparseList(0) >>> sl [] >>> sl[100] = 'test' >>> len(sl) 101 >>> sl [None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, test]

I would expect sl[100:] = to start inserting at index 100.

In v0.6 I included this change: a53a984

I hope this works for you.

I agree it's strange behavior, but I was just trying to copy the implementation of the normal Python list. I suggest that we keep sparse_list as close to list as possible, so it can be used as a drop-in replacement.

This makes sense, thanks @speedplane.

I can't find this functionality documented anywhere but both CPython and PyPy behave this way.

>>> l = [None, None, None, None] >>> l[100:] = [4, 5] >>> l [None, None, None, None, 4, 5]

I would have expected an IndexError from list here because the way this is implemented means that an item can't be retrieved from the specified location:

>>>> l = [] >>>> l[100:] = ['somevalue'] >>>> l[100] Traceback (most recent call last): File "<stdin>", line 1, in <module> IndexError: list index out of range

In 828c8bb#diff-e1add0b36b652c44614a561f3bdf4532R121 I made the decision that setting an element past the end of a SparseListwould behave differently to a list, since we set arbitrary elements without having to allocate large blocks of memory. And all of the preceding elements have been given a default (which we don't have with list).

As I've implemented this sl[x] will return the first value allocated by a call to sl[x:] =.

If this breaks people's use of SparseLIst, then I'd certainly entertain a PR to change the behaviour to match that of list. Then I'd have to question whether assigning to a single index past the end of the SparseLIst should raise IndexError after all.

http://stackoverflow.com/q/40558529/78845 is proving enlightening...

It transpires that Guido van Rossum "can't quite remember why" he implemented it this way: https://code.activestate.com/lists/python-dev/132599/

Ha, sometimes things are the way they are just because they've always been that way.

johnsyweb · 2016-11-11T09:22:00Z

t_sparse_list.py

+        self.assertEquals(len(sl), 6)
+        # Now assign from the middle, past the end.
+        sl[5:] = [6, 7, 8]
+        self.assertEquals([1, 2, None, None, 4, 6, 7, 8], sl)


I find this less surprising, but this test relies on previous state.

Let's break this this into separate tests.

I split these tests here: a9a90c6

johnsyweb · 2016-11-11T09:22:19Z

sparse_list.py

@@ -58,6 +72,10 @@ def __delitem__(self, item):
        except KeyError:
            pass

+    def __setslice__(self, start, stop, vals):
+        # __setslice__ is deprecated, but kept here for backwards compatability.


Nice. Thanks.

johnsyweb · 2016-11-11T09:23:48Z

sparse_list.py

+        if isinstance(index, slice):
+            vals_idx = 0
+            # Inserting past the end appends.
+            if index.start > len(self):


Why call len() here rather than access self.size?

The sparse list implementation avoids allocating the result cache elements until something triggers loading the actual content. This has a slight memory increase for very small querysets – roughly a custom class + dict() vs. a list() – but the actual result data will usually be much larger and the difference is cancelled out by the time the result cache is at a thousand items. This branch is currently broken using the released version of sparse_list – see johnsyweb/python_sparse_list#5

Add a set slice function.

981a1b9

johnsyweb requested changes Nov 8, 2016

View reviewed changes

acdha mentioned this pull request Nov 9, 2016

WIP: Use sparse_list to save memory (see #1117) django-haystack/django-haystack#1451

Open

acdha mentioned this pull request Nov 9, 2016

Bad Assumption in SearchQuerySet._fill_cache Leads to Performance Degradation django-haystack/django-haystack#1117

Open

acdha mentioned this pull request Nov 9, 2016

Update __delitem__ implementation #7

Merged

speedplane added 2 commits November 9, 2016 18:43

Re-implement setslice using the setitem operator, as setslice is depr…

d3a95a5

…ecated. Handle a number of corner cases, such as adding past the end and adding from the middle past the end. Also, add a test re same.

Use range here, not xrange.

ad4e76e

johnsyweb requested changes Nov 11, 2016

View reviewed changes

johnsyweb mentioned this pull request Nov 11, 2016

Implement slicing #8

Merged

johnsyweb merged commit ad4e76e into johnsyweb:master Nov 11, 2016

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add a set slice function. #5

Add a set slice function. #5

speedplane commented Nov 8, 2016

johnsyweb left a comment

acdha commented Nov 8, 2016 •

edited

johnsyweb commented Nov 9, 2016 •

edited

acdha commented Nov 9, 2016

speedplane commented Nov 9, 2016

johnsyweb left a comment

johnsyweb Nov 11, 2016

johnsyweb Nov 11, 2016

speedplane Nov 11, 2016

johnsyweb Nov 11, 2016 •

edited

johnsyweb Nov 12, 2016

johnsyweb Dec 4, 2016

speedplane Dec 5, 2016

johnsyweb Nov 11, 2016

johnsyweb Nov 11, 2016

johnsyweb Nov 11, 2016

johnsyweb Nov 11, 2016

Add a set slice function. #5

Add a set slice function. #5

Conversation

speedplane commented Nov 8, 2016

johnsyweb left a comment

Choose a reason for hiding this comment

acdha commented Nov 8, 2016 • edited

johnsyweb commented Nov 9, 2016 • edited

acdha commented Nov 9, 2016

speedplane commented Nov 9, 2016

johnsyweb left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

johnsyweb Nov 11, 2016 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

acdha commented Nov 8, 2016 •

edited

johnsyweb commented Nov 9, 2016 •

edited

johnsyweb Nov 11, 2016 •

edited