Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

profile auto-completion #44

Open
mcepl opened this issue Dec 22, 2013 · 2 comments
Open

profile auto-completion #44

mcepl opened this issue Dec 22, 2013 · 2 comments

Comments

@mcepl
Copy link
Contributor

mcepl commented Dec 22, 2013

By @JulianEberius
The main changes I made to Rope concern auto-completion. It is very, very slow on large code bases (just try Rope’s completions on Rope’s code itself). I used Python’s profile module to track where the time is spent on a single completion request, and it was mainly file system related stuff that is called a lot of times for a single request. I added a lot of caching to my branch of Rope, but I didn’t fully check whether I introduced subtle bugs or changed semantics this way. It improved latency by several orders of magnitude, which was important as Sublime Text does „as-you-type“ autocompletion, so the operation needs to be performed in milliseconds (on almost every keystroke) even on large code bases.

Please see my commit on SublimePythonIDE for details, if you’re interested on tackling this problem (JulianEberius/SublimePythonIDE@85c490cd64). Sadly the commit intermingles the caching with a second, minor, issue: case-insensitive completions. Some people prefer to just type, and not to worry about lower/upper case if the completion engine can return the right thing anyway. As I said, this is minor, compared to the performance issues.

To tackle the problems systematically, it’s probably best if you to some profiling of your own, I just used cProfile and gprof2dot to make graphs… If the completion performance is improved I would love to just add Rope as a submodule, instead of maintaining my own fork.

@mcepl
Copy link
Contributor Author

mcepl commented Dec 22, 2013

Interesting ... patch just with -p3 cleanly applies and doesn't break any tests ... https://travis-ci.org/mcepl/rope/builds/15833656
So, now we need to split speed-up improvements from the case insensitive matching (which should be a separate functionality and probably optional).
Also, http://stackoverflow.com/questions/582336/how-can-you-profile-a-python-script and http://stefaanlippens.net/python_profiling_with_pstats_interactive_mode look like great resources on Python profiling.

@aligrudi
Copy link
Member

aligrudi commented Dec 22, 2013

In general, I think supporting case insensitive completions is a good
idea. On the other hand, rope's performance problems requires a more
detailed examination to understand why the existing caching mechanisms
fail. When does rope's performance issues become noticeable? That
should depend on the number of files in the project, I guess.

The main known performance issue in rope, besides the automatic Static
Object Analysis which I suggested disabling in rope mainline in the
mailing list a few times before, is "from largemodule import *". When
concluded data (more details below) get invalidated, these names are
dropped and are recalculated again when needed. I wonder if the
slowness you observe is due to the same reason.

My comments follow:

diff --git a/lib/python2/rope/base/project.py b/lib/python2/rope/base/project.py
index 97d2dd3..c6f2b59 100644
--- a/lib/python2/rope/base/project.py
+++ b/lib/python2/rope/base/project.py
@@ -18,6 +18,7 @@ class _Project(object):
         self.prefs = prefs.Prefs()
         self.data_files = _DataFiles(self)
-    @utils.memoize
   def get_resource(self, resource_name):
       """Get a resource in a project.

Constructing File and Folder objects seems rather cheap; does
this really help?

@@ -360,7 +361,7 @@ class _DataFiles(object):
             path += '.gz'
         return self.project.get_file(path)

## 

+@utils.cached(1000)
 def _realpath(path):
     """Return the real path of `path`

I am worried that the overhead of searching the previous return
values could be actually greater than the overhead of python's
realpath/abspath/expanduser. Do you remember which of these
three functions is the slowest? Can't we optimize that?

diff --git a/lib/python2/rope/base/pycore.py b/lib/python2/rope/base/pycore.py
index 32056a0..3aff2ff 100644
--- a/lib/python2/rope/base/pycore.py
+++ b/lib/python2/rope/base/pycore.py
@@ -183,6 +183,7 @@ class PyCore(object):
     #    packages, that is most of the time
     #  - We need a separate resource observer; `self.observer`
     #    does not get notified about module and folder creations
-    @utils.memoize
   def get_source_folders(self):
       """Returns project source folders"""
       if self.project.root is None:

The main problem with saving the return value of this function is that
it would become invalid after creating folders in the project.

-    @utils.memoize
   def _find_source_folders(self, folder):
       for resource in folder.get_folders():
           if self._is_package(resource):

This does not seem necessary and adding the decorator for
get_source_folders() should be adequate.

diff --git a/lib/python2/rope/base/pynames.py b/lib/python2/rope/base/pynames.py
index 79bba15..82ee2cc 100644
--- a/lib/python2/rope/base/pynames.py
+++ b/lib/python2/rope/base/pynames.py
@@ -101,6 +101,7 @@ class ImportedModule(PyName):
         self.level = level
         self.resource = resource
         self.pymodule = _get_concluded_data(self.importing_module)
-        self.cached_pyobject = None

Actually self.pymodule does the caching itself; it uses _ConcludedData
objects. These objects should be reused unless a change to the
project (like modifying a module) possibly makes the data invalid; for
instance, if a module is changed, imports in other modules may become
invalid. pycore.py manages such changes using project observers. The
logic for managing changes in pycore.py was much more involved once,
but was simplified because no significant improvement was observed
then.

So I think it is important to find out if ImportedModule._get_module()
calls resource_to_pyobjects() unnecessarily and, if so, why?

diff --git a/lib/python2/rope/base/utils.py b/lib/python2/rope/base/utils.py
index e35ecbf..4e433b8 100644
--- a/lib/python2/rope/base/utils.py
+++ b/lib/python2/rope/base/utils.py
@@ -76,3 +75,48 @@ class _Cached(object):
         if len(self.cache) > self.count:
             del self.cache[0]
         return result
+
+class memoize(object):
-    """cache the return value of a method

This makes me worried for two reasons:

  • Some return values become invalid and should be dropped;
    that is the reason for the existence of project.observers
    and concluded_data.
  • There is no limit on the saved objects.
diff --git a/lib/python2/rope/contrib/codeassist.py b/lib/python2/rope/contrib/codeassist.py
index 37433c2..4b02c89 100644
--- a/lib/python2/rope/contrib/codeassist.py
+++ b/lib/python2/rope/contrib/codeassist.py
@@ -4,14 +4,14 @@ import warnings
 def code_assist(project, source_code, offset, resource=None,
-                templates=None, maxfixes=1, later_locals=True):
-                templates=None, maxfixes=1, later_locals=True, case_sensitive=False):
   """Return python code completions as a list of `CodeAssistProposal`\s

Supporting case insensitive completions seems interesting...

 class _PythonCodeAssist(object):

 def __init__(self, project, source_code, offset, resource=None,
-                 maxfixes=1, later_locals=True):
-                 maxfixes=1, later_locals=True, case_sensitive=False):
       self.maxfixes = maxfixes
       self.later_locals = later_locals
-        self.case_sensitive = case_sensitive
-        self.startswith = _startswith if case_sensitive else _case_insensitive_startswith
       self.word_finder = worder.Worder(source_code, True)
       self.expression, self.starting, self.offset = \
           self.word_finder.get_splitted_primary_before(offset)
  @@ -315,7 +325,7 @@ class _PythonCodeAssist(object):
   def _matching_keywords(self, starting):
       result = []
       for kw in self.keywords:
-            if kw.startswith(starting):
-            if self.startswith(kw, starting):

I suggest calling startswith() and tolower().startswith() directly
here depending on the value of self.case_sensitive and removing
self.startswith.

Ali

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants