Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bpo-33234 Improve list() pre-sizing for inputs with known lengths #6493

Closed
wants to merge 1 commit into from

Conversation

pablogsal
Copy link
Member

@pablogsal pablogsal commented Apr 17, 2018

The list() constructor isn't taking full advantage of known input
lengths or length hints. This commit makes the constructor
pre-size and not over-allocate when the input size is known or
can be reasonably estimated.

A new test is added to test_list.py and a test needed to be modified
in test_descr.py as it assumed that there is only one call to
__length_hint__() in the list constructor.

https://bugs.python.org/issue33234

@@ -157,5 +157,12 @@ class L(list): pass
with self.assertRaises(TypeError):
(3,) + L([1,2])

def test_preallocation(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be marked as a CPython only test.

Copy link
Member Author

@pablogsal pablogsal Apr 17, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rhettinger Amended in 7602879

The list() constructor isn't taking full advantage of known input
lengths or length hints. This commit makes the constructor
pre-size and not over-allocate when the input size is known or
can be reasonably estimated.

A new test is added to test_list.py and a test needed to be modified
in test_descr.py as it assumed that there is only one call to
__length_hint__() in the list constructor.
@@ -2649,6 +2675,13 @@ list___init___impl(PyListObject *self, PyObject *iterable)
(void)_list_clear(self);
}
if (iterable != NULL) {
Py_ssize_t iter_len = PyObject_LengthHint(iterable, 0);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

list_extend() uses PyObject_LengthHint(iterable, 8): we may also use 8 here.

@pablogsal
Copy link
Member Author

pablogsal commented Sep 14, 2018

I have collected more metrics. This are the L1 and L2 cache misses and related information:

THIS PR
Performance counter stats for './python -c for _ in range(10000): list([0]*10000)' (200 runs):

      75916384      cache-references                                              ( +-  2.52% )
        659899      cache-misses              #    0.869 % of all cache refs      ( +-  3.55% )
    4242376466      cycles                                                        ( +-  0.84% )
    5316694198      instructions              #    1.25  insn per cycle           ( +-  0.02% )
    1086705824      branches                                                      ( +-  0.01% )
         71006      faults                                                        ( +-  0.00% )
             0      migrations

   1.708574749 seconds time elapsed                                          ( +-  0.86% )

MASTER
Performance counter stats for './python -c for _ in range(10000): list([0]*10000)' (200 runs):

      91939960      cache-references                                              ( +-  1.98% )
        730659      cache-misses              #    0.795 % of all cache refs      ( +-  2.53% )
    4595555189      cycles                                                        ( +-  0.95% )
    5383346835      instructions              #    1.17  insn per cycle           ( +-  0.02% )
    1097849443      branches                                                      ( +-  0.02% )
         91006      faults                                                        ( +-  0.00% )
             0      migrations

   1.851745158 seconds time elapsed                                          ( +-  0.96% )

As you can see, this shows a great improvement in cache efficiency and branch predictions. As there is some concern regarding what happens with the overhead of calling __len_hint__ I have collected more metrics in that regard:

root@ubuntu-s-2vcpu-4gb-sfo2-01:~/cpython# ./python -m perf timeit "list(iter([0]*11))" -o old.json
root@ubuntu-s-2vcpu-4gb-sfo2-01:~/cpython# ./python -m perf timeit "list(iter([0]*10000))" -o old_big.json
root@ubuntu-s-2vcpu-4gb-sfo2-01:~/cpython# ./python -m perf timeit "list(iter([0]*11))" -o new.json
root@ubuntu-s-2vcpu-4gb-sfo2-01:~/cpython# ./python -m perf timeit "list(iter([0]*10000))" -o new_big.json

And these are the results:

root@ubuntu-s-2vcpu-4gb-sfo2-01:~/cpython# ./python -m perf compare_to old.json new.json -vv
Mean +- std dev: [old] 1.55 us +- 0.11 us -> [new] 1.66 us +- 0.11 us: 1.07x slower (+7%)
Significant (t=-5.31)
root@ubuntu-s-2vcpu-4gb-sfo2-01:~/cpython# ./python -m perf compare_to old_big.json 
new_big.json
Mean +- std dev: [old_big] 206 us +- 18 us -> [new_big] 155 us +- 11 us: 1.33x faster (-25%)

As you can see the only case when this patch is slow is for very small iterables without __len__, but we are talking about 7% difference (0.11us) vs the chunked linear improvement that you get for the preallocation.

@pablogsal
Copy link
Member Author

CC: @rhettinger

@vstinner
Copy link
Member

I'm not sure about this change because of the possible slowdown for short datasets. Raymond and me proposed to add a slot for length hint. I would suggest to work in two steps:

  • Optimize list allocation only using the length if it's available (not the length hint)
  • Later add a slot for length hint and use it for list allocation

@pablogsal
Copy link
Member Author

Agreed. Will open a new PR with the version that only uses lenght if it is available.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants