
Fix update() ordering to be more consistent with add() ordering #159

Merged

Conversation

@bamartin125 (Contributor)

This pull request fixes #158 by prepending the existing elements to the new ones in update(), so that ties are ordered the same way as with the add() method.

I have run the included tox tests and benchmarks offline and saw no noticeable difference in performance in the benchmark plots.
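
For context, here is a simplified, self-contained sketch of the idea (the variable names are illustrative; the real update() works on the internal sublists and also has a small-update path; see the diff later in this thread for the final code):

from itertools import chain

existing_lists = [[(0, 'a'), (0, 'b')]]  # stand-in for the internal sublists, already ordered
new_values = sorted([(0, 'c'), (0, 'd')], key=lambda pair: pair[0])

# Before: the new values come first, so the stable sort leaves them
# *ahead* of existing elements with equal keys -- the opposite of add().
before = list(new_values)
before.extend(chain.from_iterable(existing_lists))
before.sort(key=lambda pair: pair[0])
print(before)  # [(0, 'c'), (0, 'd'), (0, 'a'), (0, 'b')]

# After: the existing elements are prepended, so equal-key ties resolve
# the same way as repeated add() calls (new elements land at the end).
after = list(new_values)
after[0:0] = chain.from_iterable(existing_lists)
after.sort(key=lambda pair: pair[0])
print(after)  # [(0, 'a'), (0, 'b'), (0, 'c'), (0, 'd')]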

@grantjenks (Owner)

I'm not sure this is correct. I think you would need to reverse the order of the inserted values to update(). Can you write a test that adds each element individually and compares with calling update()?

Also, the two test cases currently given look the same to me. Test cases in those files should use the "modulo" and "negate" keys respectively.

@bamartin125 (Contributor, Author)

> I'm not sure this is correct. I think you would need to reverse the order of the inserted values to update(). Can you write a test that adds each element individually and compares with calling update()?

I don't think I need to reverse the order of the values passed to update(). I can write the test you suggest and show that no reversal is required given my change.

> Also, the two test cases currently given look the same to me. Test cases in those files should use the "modulo" and "negate" keys respectively.

That is a great suggestion, and it will help keep the tests consistent. One thing to mention up front: the modulo and negate key functions are applied to the first element of each tuple, and since that first element is always equal, the key never changes the ordering. Equal keys are exactly what is needed to reliably expose this effect.
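
Roughly what I have in mind, as a sketch (not the exact test code in the PR; the names here are illustrative):

from sortedcontainers import SortedKeyList

def modulo(val):
    # Key on the first tuple element only; it is always 0, so every key
    # ties and insertion order is the only thing distinguishing elements.
    return val[0] % 10

first = [(0, 'old', num) for num in range(5)]
second = [(0, 'new', num) for num in range(5)]

via_add = SortedKeyList(first, key=modulo)
via_update = SortedKeyList(first, key=modulo)

for val in second:
    via_add.add(val)       # equal keys insert after the existing elements
via_update.update(second)  # should order ties the same way as add()

assert list(via_add) == list(via_update)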

@grantjenks (Owner)

The modulo test shouldn’t need to use a tuple. Can you try something like this:

import sortedcontainers as sc

def modulo(num):
    return num % 10

# Every value maps to key 0, so only insertion order breaks the ties.
sl = sc.SortedKeyList((num * 10 for num in range(10)), key=modulo)
print(sl)
for num in range(0, 1000, 100):
    sl.add(num)  # equal keys are inserted after the existing elements
print(sl)
sl.update(list(range(0, 10000, 1000)))
print(sl)  # does update() order the ties the same way add() does?

@grantjenks (Owner)

Having played with it in the REPL for a bit, I now see the surprising behavior :)

Can you also share your benchmarking (just a snippet from IPython is fine)? It's typically tricky to measure performance accurately. If we had a million values, I would assume inserting them at the beginning is slower than extending them to the end.

@bamartin125 (Contributor, Author)

I had originally used this project's own benchmarking tools to verify that I didn't see any differences, but here are some IPython snippets as you asked. I figured it would be best to have the SortedList populated with some data before calling update().

The first snippet is using the current release of sortedcontainers:

In [1]: import sortedcontainers; from sortedcontainers import SortedList; from itertools import repeat

In [2]: %%timeit -n 3 -r 7 data = list(zip(repeat(0), range(int(1e6)))); sl = SortedList(data, key=lambda x: x[0]);
   ...: sl.update(data)
   ...: 
   ...: 
633 ms ± 2.47 ms per loop (mean ± std. dev. of 7 runs, 3 loops each)

In [3]: %%timeit -n 3 -r 7 data = list(zip(repeat(0), range(int(1e3)))); sl = SortedList(data, key=lambda x: x[0]);
    ...: sl.update(data)
    ...: 
    ...: 
590 µs ± 58 µs per loop (mean ± std. dev. of 7 runs, 3 loops each)

In [4]: %%timeit -n 3 -r 7 data = list(zip(repeat(0), range(int(10)))); sl = SortedList(data, key=lambda x: x[0]);
    ...: sl.update(data)
    ...: 
    ...: 
9.46 µs ± 872 ns per loop (mean ± std. dev. of 7 runs, 3 loops each)

In [5]: print(sortedcontainers.__file__)
/home/bmartin/tmp/py38env/lib/python3.8/site-packages/sortedcontainers/__init__.py

The second snippet is with my change:

In [1]: import sortedcontainers; from sortedcontainers import SortedList; from itertools import repeat

In [2]: %%timeit -n 3 -r 7 data = list(zip(repeat(0), range(int(1e6)))); sl = SortedList(data, key=lambda x: x[0]);
   ...: sl.update(data)
   ...: 
   ...: 
681 ms ± 5.68 ms per loop (mean ± std. dev. of 7 runs, 3 loops each)

In [3]: %%timeit -n 3 -r 7 data = list(zip(repeat(0), range(int(1e3)))); sl = SortedList(data, key=lambda x: x[0]);
   ...: sl.update(data)
   ...: 
   ...: 
718 µs ± 101 µs per loop (mean ± std. dev. of 7 runs, 3 loops each)

In [4]: %%timeit -n 3 -r 7 data = list(zip(repeat(0), range(int(10)))); sl = SortedList(data, key=lambda x: x[0]);
    ...: sl.update(data)
    ...: 
    ...: 
17.6 µs ± 4.12 µs per loop (mean ± std. dev. of 7 runs, 3 loops each)

In [5]: print(sortedcontainers.__file__)
/home/bmartin/tmp/python-sortedcontainers-bam/sortedcontainers/__init__.py

So there is certainly some performance hit. The relative slowdown gets worse as the number of elements being inserted drops, which suggests a roughly fixed per-call overhead: for large updates it is much smaller than the time the update() work itself takes, but for small updates it dominates.
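
For concreteness, here are the slowdown ratios implied by the timeit means quoted above:

# Slowdown of the changed update() vs. the current release, computed
# from the mean timings quoted above (in seconds).
timings = [("1e6", 633e-3, 681e-3), ("1e3", 590e-6, 718e-6), ("10", 9.46e-6, 17.6e-6)]
for n, before, after in timings:
    print(f"n={n}: {after / before:.2f}x")
# n=1e6: 1.08x, n=1e3: 1.22x, n=10: 1.86x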

@grantjenks (Owner)

I was able to decrease the performance hit by using reduce. Here are my measurements:

In [1]: from itertools import chain                                                                           

In [2]: from functools import reduce                                                                          

In [3]: from operator import iadd                                                                             

In [4]: values = list(range(1_000_000))                                                                       

In [5]: _lists = [list(range(offset, offset + 1_000)) for offset in range(0, 1_000_000, 1_000)]               

In [6]: %%timeit -n 1 -r 15 test_values = values.copy(); test_lists = _lists.copy() 
   ...: test_values.extend(chain.from_iterable(test_lists)) 
   ...:  
   ...:                                                                                                       
12.7 ms ± 697 µs per loop (mean ± std. dev. of 15 runs, 1 loop each)

In [7]: %%timeit -n 1 -r 15 test_values = values.copy(); test_lists = _lists.copy() 
   ...: test_lists.append(values) 
   ...: test_values = reduce(iadd, test_lists, []) 
   ...:  
   ...:                                                                                                       
17.9 ms ± 1.47 ms per loop (mean ± std. dev. of 15 runs, 1 loop each)

In [8]: %%timeit -n 1 -r 15 test_values = values.copy(); test_lists = _lists.copy() 
   ...: test_values[0:0] = chain.from_iterable(test_lists) 
   ...:  
   ...:                                                                                                       
33.9 ms ± 3.53 ms per loop (mean ± std. dev. of 15 runs, 1 loop each)

The ~41% slowdown (17.9 ms vs. 12.7 ms) is still pretty bad, but not nearly as bad as the 167% from inserting at the front (33.9 ms).

I suspect the cause is that chain.from_iterable provides no size hint, so the list cannot preallocate for the growth.

Can you update the code to use reduce? Here's my code:

modified   sortedcontainers/sortedlist.py
@@ -339,7 +339,8 @@ class SortedList(MutableSequence):
 
         if _maxes:
             if len(values) * 4 >= self._len:
-                values.extend(chain.from_iterable(_lists))
+                _lists.append(values)
+                values = reduce(iadd, _lists, [])
                 values.sort()
                 self._clear()
             else:
@@ -1878,7 +1879,8 @@ class SortedKeyList(SortedList):
 
         if _maxes:
             if len(values) * 4 >= self._len:
-                values.extend(chain.from_iterable(_lists))
+                _lists.append(values)
+                values = reduce(iadd, _lists, [])
                 values.sort(key=self._key)
                 self._clear()
             else:
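
For anyone following along, a minimal illustration of what reduce(iadd, ...) does here, and why the empty-list initializer matters:

from functools import reduce
from operator import iadd

lists = [[1, 2], [3, 4], [5]]
flat = reduce(iadd, lists, [])  # [] += [1, 2]; then += [3, 4]; then += [5]
print(flat)  # [1, 2, 3, 4, 5]

# The fresh [] accumulator is important: reduce(iadd, lists) without it
# would mutate lists[0] in place. Each += also knows the length of its
# right-hand list, so the result can resize efficiently, unlike when
# extending from a generator such as chain.from_iterable.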

@bamartin125 (Contributor, Author)

What you've got there looks awesome! I'll get your changes pushed and do one last cursory check.

@bamartin125 (Contributor, Author)

The benchmark plots look good on my end. I have also rerun the same IPython tests with the new changes:

In [1]: from sortedcontainers import SortedList; from itertools import repeat

In [2]: %%timeit -n 3 -r 7 data = list(zip(repeat(0), range(int(1e6)))); sl = SortedList(data, key=lambda x: x[0]);
   ...: sl.update(data)
   ...: 
   ...: 
668 ms ± 12.6 ms per loop (mean ± std. dev. of 7 runs, 3 loops each)

In [3]: %%timeit -n 3 -r 7 data = list(zip(repeat(0), range(int(1e3)))); sl = SortedList(data, key=lambda x: x[0]);
   ...: sl.update(data)
   ...: 
   ...: 
641 µs ± 180 µs per loop (mean ± std. dev. of 7 runs, 3 loops each)

In [4]: %%timeit -n 3 -r 7 data = list(zip(repeat(0), range(int(10)))); sl = SortedList(data, key=lambda x: x[0]);
   ...: sl.update(data)
   ...: 
   ...: 
9.34 µs ± 773 ns per loop (mean ± std. dev. of 7 runs, 3 loops each)

The timings are now 668 ms, 641 µs, and 9.34 µs respectively. That is an improvement over the previous iteration of the code (681 ms, 718 µs, 17.6 µs), and it compares nicely to the original timings from before any changes (633 ms, 590 µs, 9.46 µs).

I feel good about this after @grantjenks's latest suggested code change. Thanks!

Any other unanswered questions about this, @grantjenks?

@grantjenks grantjenks merged commit 7dc426c into grantjenks:master Nov 4, 2020
@grantjenks (Owner)

I'll try to push a new version this week.

If I forget, please ping me.

Successfully merging this pull request may close: SortedList/SortedKeyList update/add insertion order inconsistency (#158)