Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a set slice function. #5

Merged
merged 3 commits into from Nov 11, 2016
Merged

Conversation

speedplane
Copy link

Adds a pretty straightforward __setslice__ function.

Copy link
Owner

@johnsyweb johnsyweb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this change. I didn't know about __setslice__. According to the docs (https://docs.python.org/2/library/operator.html#operator.setitem), this method is deprecated.

Perhaps we could add this functionality to __setitem__?

I'd also like to see a unit test for this.

Apologies for brevity, am in a mobile device.

@acdha
Copy link
Contributor

acdha commented Nov 8, 2016

While working on django-haystack/django-haystack#1117 I noticed a related change: __delitem__ doesn't handle slices, which my code was triggering. Should I send a separate pull request for this or should we merge it into this?

def __delitem__(self, index):
    if isinstance(index, slice):
        for i in range(*index.indices(self.size)):
            self.elements.pop(i, None)
    else:
        self.elements.pop(index, None)

acdha added a commit to acdha/django-haystack that referenced this pull request Nov 9, 2016
The sparse list implementation avoids allocating the result cache
elements until something triggers loading the actual content.

This has a slight memory increase for very small querysets – roughly a
custom class + dict() vs. a list() – but the actual result data will
usually be much larger and the difference is cancelled out by the time
the result cache is at a thousand items.

This branch is currently broken using the released version of
sparse_list – see johnsyweb/python_sparse_list#5
acdha added a commit to acdha/django-haystack that referenced this pull request Nov 9, 2016
The sparse list implementation avoids allocating the result cache
elements until something triggers loading the actual content.

This has a slight memory increase for very small querysets – roughly a
custom class + dict() vs. a list() – but the actual result data will
usually be much larger and the difference is cancelled out by the time
the result cache is at a thousand items.

This branch is currently broken using the released version of
sparse_list – see johnsyweb/python_sparse_list#5
acdha added a commit to acdha/django-haystack that referenced this pull request Nov 9, 2016
The sparse list implementation avoids allocating the result cache
elements until something triggers loading the actual content.

This has a slight memory increase for very small querysets – roughly a
custom class + dict() vs. a list() – but the actual result data will
usually be much larger and the difference is cancelled out by the time
the result cache is at a thousand items.

This branch is currently broken using the released version of
sparse_list – see johnsyweb/python_sparse_list#5
@johnsyweb
Copy link
Owner

johnsyweb commented Nov 9, 2016

@acdha : Thanks for this. I think that should be a separate PR.

Can you please include a unit test?

Also: prefer the EAFP approach.

@acdha
Copy link
Contributor

acdha commented Nov 9, 2016

@johnsyweb Sure – that was the minimum needed to get it usable for the django-haystack queryset caching. I'll open a separate pull request & clean it up.

Re: EAFP, I used .pop in that patch to avoid having two duplicate try/except blocks – not entirely happy with having the two closely related blocks but it's not exactly a ton of code either way and it's probably better to be fast than trying something more clever.

acdha added a commit to acdha/python_sparse_list that referenced this pull request Nov 9, 2016
* Update __delitem__ to handle slices
* Remove __delslice__ which is deprecated on Python 2, where
  the runtime will create a slice instance and pass it to
  __delitem__ if __delslice__ does not exist.
* Use xrange instead of range to avoid running out of memory
  on Python 2 when passed a slice like [1:] which will
  allocate sys.maxint - 1 elements
* Update the test code to use the same check as the
  main sparse_list

See johnsyweb#5
speedplane added 2 commits November 9, 2016 18:43
…ecated. Handle a number of corner cases, such as adding past the end and adding from the middle past the end. Also, add a test re same.
@speedplane
Copy link
Author

Made the suggested changes. There are also a few corner cases that we now cover, specifically setting a slice that runs past the length of the list. I used the standard python list structure as a reference.

Copy link
Owner

@johnsyweb johnsyweb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding these tests, they make the intention much clearer.

# Like a normal list, when we assign past the end, it appends.
sl[100:] = [4, 5]
self.assertEquals([1, 2, None, None, 4, 5], sl)
self.assertEquals(len(sl), 6)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find this behaviour surprising.

Given the case:

>>> from sparse_list import SparseList
>>> sl = SparseList(0)
>>> sl
[]
>>> sl[100] = 'test'
>>> len(sl)
101
>>> sl
[None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, test]

I would expect sl[100:] = to start inserting at index 100.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In v0.6 I included this change: a53a984

I hope this works for you.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree it's strange behavior, but I was just trying to copy the implementation of the normal Python list. I suggest that we keep sparse_list as close to list as possible, so it can be used as a drop-in replacement.

Copy link
Owner

@johnsyweb johnsyweb Nov 11, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes sense, thanks @speedplane.

I can't find this functionality documented anywhere but both CPython and PyPy behave this way.

>>> l = [None, None, None, None]
>>> l[100:] = [4, 5]
>>> l
[None, None, None, None, 4, 5]

I would have expected an IndexError from list here because the way this is implemented means that an item can't be retrieved from the specified location:

>>>> l = []
>>>> l[100:] = ['somevalue']
>>>> l[100]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: list index out of range

In 828c8bb#diff-e1add0b36b652c44614a561f3bdf4532R121 I made the decision that setting an element past the end of a SparseListwould behave differently to a list, since we set arbitrary elements without having to allocate large blocks of memory. And all of the preceding elements have been given a default (which we don't have with list).

As I've implemented this sl[x] will return the first value allocated by a call to sl[x:] =.

If this breaks people's use of SparseLIst, then I'd certainly entertain a PR to change the behaviour to match that of list. Then I'd have to question whether assigning to a single index past the end of the SparseLIst should raise IndexError after all.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

http://stackoverflow.com/q/40558529/78845 is proving enlightening...

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It transpires that Guido van Rossum "can't quite remember why" he implemented it this way: https://code.activestate.com/lists/python-dev/132599/

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ha, sometimes things are the way they are just because they've always been that way.

self.assertEquals(len(sl), 6)
# Now assign from the middle, past the end.
sl[5:] = [6, 7, 8]
self.assertEquals([1, 2, None, None, 4, 6, 7, 8], sl)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find this less surprising, but this test relies on previous state.

Let's break this this into separate tests.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I split these tests here: a9a90c6

@@ -58,6 +72,10 @@ def __delitem__(self, item):
except KeyError:
pass

def __setslice__(self, start, stop, vals):
# __setslice__ is deprecated, but kept here for backwards compatability.
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice. Thanks.

if isinstance(index, slice):
vals_idx = 0
# Inserting past the end appends.
if index.start > len(self):
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why call len() here rather than access self.size?

@johnsyweb johnsyweb mentioned this pull request Nov 11, 2016
@johnsyweb johnsyweb merged commit ad4e76e into johnsyweb:master Nov 11, 2016
wetneb pushed a commit to dissemin/django-haystack that referenced this pull request Jul 29, 2019
The sparse list implementation avoids allocating the result cache
elements until something triggers loading the actual content.

This has a slight memory increase for very small querysets – roughly a
custom class + dict() vs. a list() – but the actual result data will
usually be much larger and the difference is cancelled out by the time
the result cache is at a thousand items.

This branch is currently broken using the released version of
sparse_list – see johnsyweb/python_sparse_list#5
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants