Skip to content

[mypyc] Pre-allocate space in list comprehensions #10295

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 15 commits into from
Apr 27, 2021

Conversation

97littleleaf11
Copy link
Collaborator

@97littleleaf11 97littleleaf11 commented Apr 8, 2021

Description

Closes mypyc/mypyc#473

  • Reconstruct tuple_creation_from_generator_helper to preallocate_space_helper
  • Rewrite new_tuple_with_length. Now we need only to pass in length.
  • Add new primitive for list initialization.
  • Support faster list comprehension in some simple situations.

Test Plan

@97littleleaf11
Copy link
Collaborator Author

Encounter segment fault locally when i want to import module compiled by this branch

@JukkaL
Copy link
Collaborator

JukkaL commented Apr 12, 2021

Can you share the example which is failing for you? I can see if I can spot the issue.

@97littleleaf11
Copy link
Collaborator Author

97littleleaf11 commented Apr 13, 2021

Here is my local test file tmp.py:

def f(x: str) -> str:
    return x

def test() -> None:
    source = ["a", "b", "c"]
    a = list(f(x) for x in source)
    print(a)

test()

compile and run:

python -c "import tmp"
[1]    58318 segmentation fault  python -c "import tmp"

If I comment line 96 - 101 of specialize.pyof this branch, it could work normally.

@JelleZijlstra
Copy link
Member

Did you try to find the cause in gdb?

@JukkaL
Copy link
Collaborator

JukkaL commented Apr 13, 2021

Hmm we should probably document how to debug C-level crashes on different platforms. It's pretty easy on Linux but I'm actually not sure what's the best way to do it on macOS or Windows. I'll create an issue.

@97littleleaf11
Copy link
Collaborator Author

97littleleaf11 commented Apr 15, 2021

I am using lldb on Mac. Then I got this

* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
    frame #0: 0x00000001011e2c44 tmp4.cpython-37m-darwin.so`CPyDef_test at list_ops.c:77:9 [opt]
   74               }
   75           }
   76           // PyList_SET_ITEM doesn't decref the old element, so we do
-> 77           Py_DECREF(PyList_GET_ITEM(list, n));
   78           // N.B: Steals reference
   79           PyList_SET_ITEM(list, n, value);
   80           return true;

Since it's a newly created list, obviously there is no element in it. Thus Py_DECREF failed.
Maybe we could change this to Py_XDECREF?

@97littleleaf11 97littleleaf11 marked this pull request as ready for review April 15, 2021 11:01
@JelleZijlstra
Copy link
Member

I was wondering whether this could lead to reading from uninitialized memory, but https://docs.python.org/3/c-api/list.html#c.PyList_New says the list is initialized to NULL.

Changing this code to XDECREF would incur a small performance hit for the additional check. If that matters, you could write separate setitem functions for existing lists (can use DECREF) and new lists (don't need to do anything).

@97littleleaf11
Copy link
Collaborator Author

97littleleaf11 commented Apr 15, 2021

@JelleZijlstra Thanks for your advice! I will add a new primitive later. Btw, should I check list size every time or just add items without checking?

@JelleZijlstra
Copy link
Member

To be clear I'm not sure the performance hit is big enough to be worth the additional complexity. You could try some benchmarks, or maybe someone else reading here has a stronger intuition.

If you created a new list and you're filling it up, you shouldn't need to do bounds checking.

Copy link
Collaborator

@JukkaL JukkaL left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried this with a microbenchmark that does lots of list comprehensions it was 50% faster, which is great!

Looks good overall. Left various minor comments.

Copy link
Collaborator

@JukkaL JukkaL left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the updates! Some more minor comments, but this should be almost ready to merge.

@97littleleaf11
Copy link
Collaborator Author

@JukkaL Thanks for detailed suggestions!

There remains only one unsolved. It seems that list(...) would call to_list op in list_ops.py. Could you please give me more hints about how to solve it?

@97littleleaf11 97littleleaf11 requested a review from JukkaL April 22, 2021 18:29
@JukkaL
Copy link
Collaborator

JukkaL commented Apr 23, 2021

Could you please give me more hints about how to solve it?

Add a builtins.list specialization in mypyc.irbuild.specialize, perhaps? Or, more specifically, add a new case to the existing specialization where there is only one argument that is a generator, similar to what we do for builtins.tuple and others.)

Copy link
Collaborator

@JukkaL JukkaL left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good now! I like how this gives a significant speedup to list comprehensions, which are very common operations.

@JukkaL JukkaL merged commit 5f05bf5 into python:master Apr 27, 2021
@97littleleaf11 97littleleaf11 deleted the preallocate-space branch April 30, 2021 13:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Pre-allocate space in list comprehensions
3 participants