New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve performance of InputSplitter #10898

Merged
merged 5 commits into from Nov 15, 2017

Conversation

Projects
None yet
5 participants
@takluyver
Member

takluyver commented Nov 14, 2017

This fixes some embarrassingly bad performance. @rpep pointed out to me that a 500-line cell he had in a notebook took tens of seconds even to start executing, e.g. to show a syntax error.

I tracked this down to our input processing in IPython.core.inputsplitter. We are doing large amounts of unnecessary work by calculating at each line processed:

  • The indentation for the next line, which involves tokenisation. This was >90% of the time taken.
  • Whether the input is complete, which involves byte-compilation.

We still need the ability to work out both of these things, but we don't need them for each line of a block passed to inputsplitter, and we don't need them at all when transforming code prior to execution.

This code avoids much of this unnecessary work, taking the processing time on the 500-line sample from ~40 seconds to ~0.2 seconds on my computer. The problematic code is roughly O(N^2) in the number of lines, so it's probably only noticeable with very long cells, but with a 40 line sample I measured an improvement from 0.05 s to 0.004 s.

I have changed a couple of methods and attributes on InputSplitter to do this. The automatic API docs do include the module, but I don't really consider it public API, and that code is already convoluted enough that I don't want to clutter it up with compatibility stuff and make it even harder to understand. I checked that ipykernel and rlipython are not using the pieces I changed.

@takluyver

This comment has been minimized.

Member

takluyver commented Nov 14, 2017

Up for discussion: should this be backported to 5.x? It's a worthwhile fix if people use very long inputs, but there is some risk that unknown third-party packages are broken by the API changes.

@rpep

This comment has been minimized.

rpep commented Nov 14, 2017

I should not that I don't habitually write 500 line cells; I only noticed this because I'm doing some autogeneration work!

@Carreau

This comment has been minimized.

Member

Carreau commented Nov 14, 2017

can you make indent_space a property (read_only) to change less the API if you are concern ?

@willingc

Looks good @takluyver. A suggestion on a comment (to confirm my understanding) and a small question.

# indentation level, in order to provide auto-indent facilities.
indent_spaces = 0
# A cache for calculating the current indentation.
# If the first value matches self.source, the second value is an integer

This comment has been minimized.

@willingc

willingc Nov 14, 2017

Member

Perhaps:

# A cache for storing the current indentation
# The first value stores the most recently processed source input
# The second value is the number of spaces for the current indentation 
# If self.source matches the first value, the second value is a valid current indentation.
# If self.source does not match the first value (the cached source), the indentation
# must be recalculated.
if not self.within_python_line:
line = self.assemble_logical_lines.push(line)
if line is None:
return _accumulating('acc logical line')
return _accumulating('acc logical line')

This comment has been minimized.

@willingc

willingc Nov 14, 2017

Member

@takluyver Is 'acc logical line' just representing that the line has been assembled?

This comment has been minimized.

@takluyver

takluyver Nov 15, 2017

Member

It indicates that the thing that swallowed this line without returning something is the logical line accumulator. These values are only used for debugging, which thankfully we haven't had to do very often.

@takluyver

This comment has been minimized.

Member

takluyver commented Nov 15, 2017

Thanks @Carreau @willingc - I have updated this following your suggestions.

@willingc

Thanks @takluyver

@Carreau

This comment has been minimized.

Member

Carreau commented Nov 15, 2017

Let's try that ! Thanks @takluyver !

@Carreau Carreau merged commit 8c7ab85 into ipython:master Nov 15, 2017

4 checks passed

codecov/patch 96.42% of diff hit (target 0%)
Details
codecov/project Absolute coverage decreased by -<.01% but relative coverage increased by +29.18% compared to 43994e7
Details
continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
@takluyver

This comment has been minimized.

Member

takluyver commented Nov 15, 2017

What do we think about backporting this to 5.x? I think I lean slightly towards doing so, but I don't have a strong preference.

@Carreau Carreau added this to the 5.6 milestone Nov 15, 2017

@Carreau

This comment has been minimized.

Member

Carreau commented Nov 15, 2017

Let's see how hard it is.

@meeseeksdev backport to 5.x

@meeseeksdev

This comment has been minimized.

Contributor

meeseeksdev bot commented Nov 15, 2017

There seem to be a conflict, please backport manually

@Carreau

This comment has been minimized.

Member

Carreau commented Nov 15, 2017

The conflict seem a bit long:

diff --cc IPython/core/inputsplitter.py
index ac14747d6,337b27d7a..000000000
--- a/IPython/core/inputsplitter.py
+++ b/IPython/core/inputsplitter.py
@@@ -307,7 -414,6 +306,10 @@@ class InputSplitter(object)
          if source.endswith('\\\n'):
              return False
  
++<<<<<<< HEAD
 +        self._update_indent(lines)
++=======
++>>>>>>> 8c7ab85c0... Merge pull request #10898 from takluyver/inputsplitter-indents
          try:
              with warnings.catch_warnings():
                  warnings.simplefilter('error', SyntaxWarning)
@@@ -383,55 -489,20 +385,72 @@@
          # General fallback - accept more code
          return True
  
++<<<<<<< HEAD
 +    #------------------------------------------------------------------------
 +    # Private interface
 +    #------------------------------------------------------------------------
 +
 +    def _find_indent(self, line):
 +        """Compute the new indentation level for a single line.
 +
 +        Parameters
 +        ----------
 +        line : str
 +          A single new line of non-whitespace, non-comment Python input.
 +
 +        Returns
 +        -------
 +        indent_spaces : int
 +          New value for the indent level (it may be equal to self.indent_spaces
 +        if indentation doesn't change.
 +
 +        full_dedent : boolean
 +          Whether the new line causes a full flush-left dedent.
 +        """
 +        indent_spaces = self.indent_spaces
 +        full_dedent = self._full_dedent
 +
 +        inisp = num_ini_spaces(line)
 +        if inisp < indent_spaces:
 +            indent_spaces = inisp
 +            if indent_spaces <= 0:
 +                #print 'Full dedent in text',self.source # dbg
 +                full_dedent = True
 +
 +        if line.rstrip()[-1] == ':':
 +            indent_spaces += 4
 +        elif dedent_re.match(line):
 +            indent_spaces -= 4
 +            if indent_spaces <= 0:
 +                full_dedent = True
 +
 +        # Safety
 +        if indent_spaces < 0:
 +            indent_spaces = 0
 +            #print 'safety' # dbg
 +
 +        return indent_spaces, full_dedent
 +
 +    def _update_indent(self, lines):
 +        for line in remove_comments(lines).splitlines():
 +            if line and not line.isspace():
 +                self.indent_spaces, self._full_dedent = self._find_indent(line)
++=======
+     def get_indent_spaces(self):
+         sourcefor, n = self._indent_spaces_cache
+         if sourcefor == self.source:
+             return n
+ 
+         # self.source always has a trailing newline
+         n = find_next_indent(self.source[:-1])
+         self._indent_spaces_cache = (self.source, n)
+         return n
+ 
+     # Backwards compatibility. I think all code that used .indent_spaces was
+     # inside IPython, but we can leave this here until IPython 7 in case any
+     # other modules are using it. -TK, November 2017
+     indent_spaces = property(get_indent_spaces)
++>>>>>>> 8c7ab85c0... Merge pull request #10898 from takluyver/inputsplitter-indents
  
      def _store(self, lines, buffer=None, store='source'):
          """Store one or more lines of input.

If you're motivated to backport feel free to do so. I'm willing to have the "Needs manual backport" handle by people who actually care. That is to say, we, as a core team do not spend time backporting manually. That's consistent with our timeline.

@takluyver takluyver modified the milestones: 5.6, 6.3 Nov 15, 2017

@takluyver

This comment has been minimized.

Member

takluyver commented Nov 15, 2017

OK, I've marked this as 6.3. It was like that for years already, so I doubt many people are hitting it.

@takluyver takluyver deleted the takluyver:inputsplitter-indents branch Nov 15, 2017

@takluyver

This comment has been minimized.

Member

takluyver commented Nov 15, 2017

It might be a neat feature for meeseeks if it could display the conflict - or possibly a summary of it if it's big.

@Carreau

This comment has been minimized.

Member

Carreau commented Nov 15, 2017

Yeah, I though about that, the secret command is git diff --diff-filter=U after the fail cherry-pick. I might do that at some point.

@codypiersall

This comment has been minimized.

codypiersall commented Jan 29, 2018

@takluyver I ran into this issue today, and this fixed it. Thank you! Now it doesn't take forever to load the UI for a project I am working on!

@takluyver takluyver referenced this pull request Mar 1, 2018

Closed

infinite loop #11026

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment