Improve performance of InputSplitter #10898
Up for discussion: should this be backported to 5.x? It's a worthwhile fix if people use very long inputs, but there is some risk that unknown third-party packages are broken by the API changes. |
I should note that I don't habitually write 500-line cells; I only noticed this because I'm doing some autogeneration work!
can you make |
Looks good @takluyver. A suggestion on a comment (to confirm my understanding) and a small question. |
# indentation level, in order to provide auto-indent facilities.
indent_spaces = 0
# A cache for calculating the current indentation.
# If the first value matches self.source, the second value is an integer
willingc
Nov 14, 2017
Member
Perhaps:
# A cache for storing the current indentation
# The first value stores the most recently processed source input
# The second value is the number of spaces for the current indentation
# If self.source matches the first value, the second value is a valid current indentation.
# If self.source does not match the first value (the cached source), the indentation
# must be recalculated.
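The caching scheme described in the suggested comment can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the class name `IndentCache`, the attribute `_indent_cache`, and the simplified `_compute_indent` rule are all hypothetical stand-ins.

```python
class IndentCache:
    """Hypothetical stand-in for the cached-indentation idea; not IPython's code."""

    def __init__(self):
        self.source = ""
        # (source the value was computed from, indentation in spaces)
        self._indent_cache = (None, 0)
        # Counter used only to show how often the slow path runs.
        self.recalculations = 0

    def _compute_indent(self, source):
        # Simplified placeholder rule (an assumption for this sketch):
        # leading spaces of the last line, plus 4 if it ends with a colon.
        self.recalculations += 1
        lines = source.splitlines()
        if not lines:
            return 0
        last = lines[-1]
        indent = len(last) - len(last.lstrip(" "))
        if last.rstrip().endswith(":"):
            indent += 4
        return indent

    def get_indent_spaces(self):
        cached_source, cached_indent = self._indent_cache
        if cached_source == self.source:
            # The cache was computed from the current source, so it is valid.
            return cached_indent
        # The source has changed since the cache was filled: recalculate.
        indent = self._compute_indent(self.source)
        self._indent_cache = (self.source, indent)
        return indent
```

Calling `get_indent_spaces()` repeatedly without changing `source` takes the fast path; only a change to `source` triggers a recalculation.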
if not self.within_python_line:
line = self.assemble_logical_lines.push(line)
if line is None:
return _accumulating('acc logical line')
takluyver
Nov 15, 2017
Author
Member
It indicates that the thing that swallowed this line without returning something is the logical line accumulator. These values are only used for debugging, which thankfully we haven't had to do very often.
Thanks @takluyver |
Let's try that ! Thanks @takluyver ! |
8c7ab85
into
ipython:master
What do we think about backporting this to 5.x? I think I lean slightly towards doing so, but I don't have a strong preference. |
Let's see how hard it is. @meeseeksdev backport to 5.x |
There seems to be a conflict; please backport manually.
The conflict seems a bit long:
If you're motivated to backport, feel free to do so. I'm willing to have the "Needs manual backport" handled by people who actually care. That is to say, we, as a core team, do not spend time backporting manually. That's consistent with our timeline.
OK, I've marked this as 6.3. It was like that for years already, so I doubt many people are hitting it. |
It might be a neat feature for meeseeks if it could display the conflict - or possibly a summary of it if it's big. |
Yeah, I thought about that; the secret command is
@takluyver I ran into this issue today, and this fixed it. Thank you! Now it doesn't take forever to load the UI for a project I am working on! |
This fixes some embarrassingly bad performance. @rpep pointed out to me that a 500-line cell he had in a notebook took tens of seconds even to start executing, e.g. to show a syntax error.
I tracked this down to our input processing in IPython.core.inputsplitter. We are doing large amounts of unnecessary work by calculating at each line processed:
We still need the ability to work out both of these things, but we don't need them for each line of a block passed to inputsplitter, and we don't need them at all when transforming code prior to execution.
This code avoids much of this unnecessary work, taking the processing time on the 500-line sample from ~40 seconds to ~0.2 seconds on my computer. The problematic code is roughly O(N^2) in the number of lines, so it's probably only noticeable with very long cells, but with a 40-line sample I measured an improvement from 0.05 s to 0.004 s.
I have changed a couple of methods and attributes on InputSplitter to do this. The automatic API docs do include the module, but I don't really consider it public API, and that code is already convoluted enough that I don't want to clutter it up with compatibility stuff and make it even harder to understand. I checked that ipykernel and rlipython are not using the pieces I changed.
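To illustrate why per-line recalculation becomes roughly quadratic, here is a small sketch that counts work instead of measuring time. The functions `scan`, `push_all_eager`, and `push_all_lazy` are hypothetical and only model the pattern described above; they are not part of IPython.

```python
def scan(source):
    """Stand-in for a whole-source analysis; returns its cost in lines visited."""
    return len(source.splitlines())


def push_all_eager(lines):
    """Re-analyse the accumulated source after every pushed line."""
    source, work = "", 0
    for line in lines:
        source += line + "\n"
        work += scan(source)  # cost grows with everything pushed so far
    return work


def push_all_lazy(lines):
    """Accumulate all lines first, then analyse once when the result is needed."""
    source = "".join(line + "\n" for line in lines)
    return scan(source)


lines = ["x = 1"] * 500
eager = push_all_eager(lines)  # 1 + 2 + ... + 500 line visits
lazy = push_all_lazy(lines)    # 500 line visits
print(eager, lazy)
```

For N lines the eager variant visits N(N+1)/2 lines in total, while the lazy variant visits N, which is the kind of gap that only becomes noticeable for very long cells.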