Improve performance of InputSplitter #10898
Up for discussion: should this be backported to 5.x? It's a worthwhile fix if people use very long inputs, but there is some risk that unknown third-party packages are broken by the API changes.
I should note that I don't habitually write 500 line cells; I only noticed this because I'm doing some autogeneration work!
can you make |
Looks good @takluyver. A suggestion on a comment (to confirm my understanding) and a small question.
IPython/core/inputsplitter.py
Outdated
# indentation level, in order to provide auto-indent facilities.
indent_spaces = 0
# A cache for calculating the current indentation.
# If the first value matches self.source, the second value is an integer
Perhaps:
# A cache for storing the current indentation
# The first value stores the most recently processed source input
# The second value is the number of spaces for the current indentation
# If self.source matches the first value, the second value is a valid current indentation.
# If self.source does not match the first value (the cached source), the indentation
# must be recalculated.
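The caching scheme this comment describes can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the class `IndentTracker`, the attribute `_indent_cache`, the counter `calculations`, and the naive `_compute_indent` rule are all hypothetical names and logic introduced only to show the "compare the cached source, reuse or recalculate" pattern.

```python
class IndentTracker:
    """Hypothetical stand-in for illustration; not IPython's InputSplitter."""

    def __init__(self):
        self.source = ""
        # (cached source, indentation in spaces); None never matches a str source
        self._indent_cache = (None, 0)
        # Counts how often the indentation is actually recalculated
        self.calculations = 0

    def _compute_indent(self, source):
        # Deliberately naive rule (an assumption for this sketch): take the
        # indentation of the last non-blank line, plus 4 if it ends with ':'.
        self.calculations += 1
        lines = [ln for ln in source.splitlines() if ln.strip()]
        if not lines:
            return 0
        last = lines[-1]
        indent = len(last) - len(last.lstrip(" "))
        if last.rstrip().endswith(":"):
            indent += 4
        return indent

    def get_indent_spaces(self):
        cached_source, cached_indent = self._indent_cache
        if cached_source == self.source:
            # The cache is valid: the source has not changed since it was filled
            return cached_indent
        # The source changed, so recalculate and refresh the cache
        indent = self._compute_indent(self.source)
        self._indent_cache = (self.source, indent)
        return indent


tracker = IndentTracker()
tracker.source = "if x:\n"
first = tracker.get_indent_spaces()   # recalculates
second = tracker.get_indent_spaces()  # served from the cache
```

Keeping the source alongside the value means the cache needs no explicit invalidation: any change to `self.source` makes the comparison fail and triggers a recalculation on the next request.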
if not self.within_python_line:
    line = self.assemble_logical_lines.push(line)
    if line is None:
        return _accumulating('acc logical line')
@takluyver Is 'acc logical line' just representing that the line has been assembled?
It indicates that the thing that swallowed this line without returning something is the logical line accumulator. These values are only used for debugging, which thankfully we haven't had to do very often.
Thanks @takluyver
Let's try that! Thanks @takluyver!
What do we think about backporting this to 5.x? I think I lean slightly towards doing so, but I don't have a strong preference.
Let's see how hard it is. @meeseeksdev backport to 5.x
There seems to be a conflict; please backport manually
The conflict seems a bit long:
If you're motivated to backport, feel free to do so. I'm willing to have the "Needs manual backport" handled by people who actually care. That is to say, we, as a core team, do not spend time backporting manually. That's consistent with our timeline.
OK, I've marked this as 6.3. It was like that for years already, so I doubt many people are hitting it.
It might be a neat feature for meeseeks if it could display the conflict, or possibly a summary of it if it's big.
Yeah, I thought about that, the secret command is
@takluyver I ran into this issue today, and this fixed it. Thank you! Now it doesn't take forever to load the UI for a project I am working on!
This fixes some embarrassingly bad performance. @rpep pointed out to me that a 500-line cell he had in a notebook took tens of seconds even to start executing, e.g. to show a syntax error.
I tracked this down to our input processing in `IPython.core.inputsplitter`. We are doing large amounts of unnecessary work by calculating at each line processed:
We still need the ability to work out both of these things, but we don't need them for each line of a block passed to inputsplitter, and we don't need them at all when transforming code prior to execution.
This code avoids much of this unnecessary work, taking the processing time on the 500-line sample from ~40 seconds to ~0.2 seconds on my computer. The problematic code is roughly `O(N^2)` in the number of lines, so it's probably only noticeable with very long cells, but with a 40 line sample I measured an improvement from 0.05 s to 0.004 s.
I have changed a couple of methods and attributes on `InputSplitter` to do this. The automatic API docs do include the module, but I don't really consider it public API, and that code is already convoluted enough that I don't want to clutter it up with compatibility stuff and make it even harder to understand. I checked that ipykernel and rlipython are not using the pieces I changed.
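To illustrate why per-line recalculation is roughly quadratic, here is a toy model. It is a sketch under stated assumptions, not the inputsplitter code: `analyse`, `push_lines_eagerly`, and `push_lines_lazily` are hypothetical names, and `analyse` simply reports how many lines it had to scan as a stand-in for any whole-buffer computation.

```python
def analyse(source):
    """Stand-in for a whole-buffer analysis; returns the number of lines scanned."""
    return len(source.splitlines())


def push_lines_eagerly(lines):
    """Re-analyse the whole accumulated buffer after every pushed line."""
    buffer = []
    work = 0
    for line in lines:
        buffer.append(line)
        work += analyse("\n".join(buffer))
    return work


def push_lines_lazily(lines):
    """Accumulate all lines first and analyse the buffer once at the end."""
    buffer = list(lines)
    return analyse("\n".join(buffer))


cell = ["x = 1"] * 500
eager_work = push_lines_eagerly(cell)  # 1 + 2 + ... + 500 lines scanned
lazy_work = push_lines_lazily(cell)    # 500 lines scanned
```

In this model the eager version scans N(N+1)/2 lines for an N-line cell while the lazy version scans N, which is the same shape of difference as deferring the expensive calculations until they are actually needed.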