
Get a new collection table when original in-buffer text is deleted/altered? #59

Closed
jdtsmith opened this issue Sep 7, 2021 · 16 comments

jdtsmith (Collaborator) commented Sep 7, 2021

One of the interesting differences between corfu and company is that corfu keeps the collection table function around for a long time. This is very useful for "slow" caching completion backends like LSP, for which asking the server for new completions is expensive. But it does lead to some odd behavior. For example, in eglot (an Emacs LSP client) with Python:

```
import os
os.li[M-Tab]
```

leads to 4 candidates.

[screenshot: completion popup showing 4 candidates]

Deleting the i with corfu popped up leaves the same 4 candidates, since the LSP completion table quite reasonably has cached its results:

[screenshot: the same 4 candidates after deleting the i]

In the meantime a fresh:

```
os.l[M-Tab]
```

returns 10 results:

[screenshot: completion popup showing 10 candidates]

Company asks for a new completion table on each and every character event. I actually prefer the corfu approach, because it allows completion systems to use their caches effectively. But it also seems reasonable that once the original text before point is altered or deleted, a fresh completion table gets requested.

minad (Owner) commented Sep 7, 2021

Requesting a new completion table every time does not seem correct to me. What is the point of this whole machinery then? A while ago we talked about the APIs not being respected and the backends being tested only with Company. That's the reason why things are the way they are. I have no intention to work around this. It should be possible to repair this issue in the backends.

minad closed this as completed Sep 7, 2021
jdtsmith (Collaborator, Author) commented Sep 7, 2021

I definitely agree that requesting a new completion table on each character is inefficient. This is actually fairly noticeable with a slow LSP backend. An advantage of corfu worth advertising! But the difference in behavior above is obviously surprising.

The problem is that an LSP client can't necessarily know in advance whether the server would preserve the same set of candidates, or whether the cache should be invalidated. It's easy to test whether the original completion text is a prefix of the current text passed to the CAPF collection function, but not all servers do only prefix completion. Maybe that shouldn't matter. It does seem fair to invalidate the cache aggressively if the new text is shorter than the original.

minad (Owner) commented Sep 7, 2021

It does seem fair to invalidate the cache aggressively if the new text is shorter than the original.

I see it like this: the capf mechanism is used to obtain the completion table (which is not a cache, but a function that can recompute candidates!). From that point on the completion style/completion table has full control over the completion process. I also don't understand the problem here, since the lsp client should request the new set of candidates on every keypress (by definition of the lsp protocol), independently of whether the completion table is re-obtained by the frontend.

Furthermore, the prefix used in the beginning to initiate completion should not matter at all for the later filtering. The context should only matter for the capf to determine whether completion is possible at the given point. If you complete in elisp buffers, Corfu also works as expected. I consider the elisp backend to be a reference backend. So it is purely a matter of the backend getting this right.

dgutov commented Sep 7, 2021

According to the lsp-mode folks, VS Code caches completion results (and uses the cache when more characters are typed). So if our LSP clients do no caching at all, we'll be at a disadvantage in performance.

It does seem fair to invalidate the cache aggressively if the new text is shorter than the original.

I can agree with that.

minad (Owner) commented Sep 7, 2021

It does seem fair to invalidate the cache aggressively if the new text is shorter than the original.

I can agree with that.

Then do it in the backend (the lsp client). Why should the frontend (the UI, corfu or company) decide for the backend when the "cache" should be invalidated? And once again: the completion table is not a cache.

jdtsmith (Collaborator, Author) commented Sep 7, 2021

My understanding is that caching is used (as in completion-table-with-cache) so you don't have to revisit the server/perform an expensive lookup for all the variants (try/test/all/boundaries/etc.). I guess some servers/backends got used to the company behavior of dumping their completion table function and spinning up a new one on character events. Maybe this isn't surprising. I haven't found any guidance in the programmed completion docs or elsewhere that would indicate how long a completion table function will be "kept around". Is it just up to the UI? Can you keep the same collection lambda alive for several hours and expect it to continue to "do the right thing"? I once asked emacs-devel for clarification on how long a completion lambda is kept alive and was told "as long as it is needed"...
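For reference, completion-table-with-cache wraps an expensive candidate function and calls it only when the input string changes, so the try/test/all variants on the same input reuse the last result. A minimal sketch (the function and variable names here are illustrative, not from eglot):

```elisp
;; Sketch of caching with `completion-table-with-cache'.
;; All `my/' names are hypothetical.
(defvar my/fetch-count 0)

(defun my/fetch-candidates (prefix)
  "Pretend to ask an expensive backend for completions of PREFIX."
  (setq my/fetch-count (1+ my/fetch-count))
  (list (concat prefix "stat") (concat prefix "nk") (concat prefix "stdir")))

;; The wrapped table remembers the last (input . result) pair, so
;; repeated queries with the same input do not re-run the function.
(defvar my/table (completion-table-with-cache #'my/fetch-candidates))

;; (all-completions "li" my/table) ; runs `my/fetch-candidates'
;; (try-completion "li" my/table)  ; same input: served from the cache
```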

But I actually prefer not invalidating the cache on each key event, but only when the original text from which completion was launched is altered. This issue collides head-on with completion styles. Orderless for example is a rather radical style (allowing spaces!), which no backend server like LSP will soon implement. So letting orderless filter through a cached list is quite reasonable in that context. In fact otherwise the completion style would be rendered inert.

I've proposed a simple strategy to the eglot author for invalidating the cached-proxies cache:

```elisp
(lambda (probe pred action)
  (unless (string-empty-p probe)
    (if last-probe
        (unless (string-prefix-p last-probe probe)
          (setq cached-proxies :none
                last-probe probe))
      (setq last-probe probe)))
  ...
```

minad (Owner) commented Sep 7, 2021

@jdtsmith Yes, the strategy you proposed for Eglot goes in the right direction. Instead of comparing probe and last-probe one could also compare the string within the buffer, since this is what lsp does. At this point completion styles clash with the lsp protocol as you point out.

One more argument why Corfu should not be changed: Corfu relies on completion-in-region-mode, which is started after the capf has returned a completion table. This table is stored in completion-in-region--data, and this data is never altered during completion as long as completion-in-region-mode is active. I argue that the design used by Corfu is valid in the sense that it corresponds to the intended original design of the completion machinery.

minad (Owner) commented Sep 7, 2021

@jdtsmith

This issue collides head-on with completion styles. Orderless for example is a rather radical style (allowing spaces!), which no backend server like LSP will soon implement. So letting orderless filter through a cached list is quite reasonable in that context. In fact otherwise the completion style would be rendered inert.

I thought before about the conflict between lsp and completion styles (#5). This problem is hard to resolve. By definition of the lsp protocol the client should not perform filtering; the filtering should be done completely on the server. So in order to use lsp+corfu correctly you should use a completion style which passes the candidates through unfiltered. Orderless etc. will not work correctly.

jdtsmith (Collaborator, Author) commented Sep 7, 2021

one could also compare the string within the buffer, since this is what lsp does

Thanks for your thoughts. I'm not entirely clear how "within the buffer" comparisons would look. BTW, I added the test for an empty probe because pcm completion, for example, requests all-completions with an empty string (!).

By definition of the lsp protocol the client should not perform filtering, the filtering should be done completely on the server.

I don't know the LSP protocol well at all, but there was vigorous discussion on this very point. I didn't follow everything, but client-side filtering did not seem at all forbidden (and VS Code does it, for example). What is clear is that one complex completion system (CAPF) should naturally be expected to have a significant impedance mismatch with another quite different one (LSP).

minad (Owner) commented Sep 7, 2021

I'm not entirely clear how "within the buffer" comparisons would look?

You look at the input string in the buffer and compare it with the string which was present the last time the lsp server was asked for candidates. For example:

  1. abc TAB -> request candidates from the lsp server, store abc as the last string
  2. d -> compare abcd with abc, do not re-request, but filter the candidates (for example with flex or orderless) given the input string abcd
  3. DEL -> compare abc with abc, do not re-request
  4. DEL -> compare ab with abc, re-request candidates from the lsp server since the prefix is now shorter, store ab as the last string
  5. e -> now one could introduce another rule: when the number of candidates returned for ab equals the maximum number of candidates the server returns, re-request the candidates for abe; lsp servers usually limit the number of candidates they return.

But all this logic should be part of the completion backend, e.g., Lsp-mode or Eglot.
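The decision procedure in the steps above can be sketched as a small helper. All names here are hypothetical, not taken from eglot or lsp-mode; the real logic would live in the lsp client:

```elisp
;; Sketch of the re-request decision from steps 1-4 above.
(defvar my/last-input nil
  "Input string for which the server was last queried.")

(defun my/re-request-p (input)
  "Return non-nil if INPUT requires asking the lsp server again.
Typing further characters extends the last input, so the cached
candidates can be filtered locally; deleting past the last input
invalidates them."
  (cond
   ((null my/last-input)                  ; first completion request
    (setq my/last-input input)
    t)
   ((string-prefix-p my/last-input input) ; steps 2 and 3: filter locally
    nil)
   (t                                     ; step 4: prefix shrank, re-request
    (setq my/last-input input)
    t)))

;; Step 5 (re-request when the server truncated its candidate list)
;; would additionally compare the result count against the server limit.
```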

dgutov commented Sep 7, 2021

Then do it in the backend (the lsp client).

I tend to agree.

Why should the frontend (the ui, corfu or company) decide for the backend on when the "cache" should be invalidated? And once again - the completion table is not a cache.

This is basically the reason why company "dumps its completion table" often: if the backend keeps its caches internally (rather than storing the cache in the lexical context returned with the completion table), discarding the completion function shouldn't hurt.

We did end up caching it later (with some rather strict cache key) to help performance of certain completion functions, such as the ones using the aforementioned completion-table-with-cache. And a "completion table" can also be a list of strings, because c-a-p-f is so heavy on backward compatibility; discarding those lists of strings often is generally a good idea.

minad (Owner) commented Sep 7, 2021

This is basically the reason why company is "dumping their completion table" often: if the backend contains the caches (rather than using lexical context returned with the completion table to store the cache), discarding the completion function shouldn't hurt.

The backend should be allowed to cache within the lexical context of the completion table. I would even argue that this is the correct approach. Note that reference backends like the elisp backend work perfectly with the approach of never dumping the completion table. This is what is implemented by Corfu. Note that my goals for Corfu have been different than the ones from Company. I am not striving for maximal compatibility with backends, which do not follow the spec. The goal was rather to develop a UI which is a minimal layer over the existing machinery.

@jdtsmith Regarding the buffer-string comparison - there is one further complication. There are frontends which transfer the completion table from the buffer where we are completing to the minibuffer, e.g. selectrum-completion-in-region or consult-completion-in-region. For these frontends the minibuffer input is purely used for filtering since the input string in the actual buffer never changes during the completion. I have not yet found a good solution for this. Either (1) the lsp backend detects that the completion table is recalled from the minibuffer and compares the minibuffer input string instead of the string from the buffer, or (2) the consult-completion-in-region frontend copies the minibuffer string to the actual buffer and calls the table from the actual buffer and undoes the input afterwards. Approach (2) would probably be the more compliant approach.

dgutov commented Sep 7, 2021

Regarding steps 1-5 above:

I think that's the approach lsp-mode is using already, modulo bugs. With the added complication that a server can declare a set of completions to be "incomplete", thus prohibiting reuse (e.g. step 2 should re-request).
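That "incomplete" marker is the isIncomplete field of an LSP CompletionList. A minimal sketch of the check, assuming the server response has been parsed into a plist (the function name and plist shape are assumptions for illustration):

```elisp
;; Sketch: decide whether a cached LSP CompletionList may be reused.
;; `isIncomplete' is defined by the LSP spec; `my/completion-list-reusable-p'
;; is a hypothetical helper, not an eglot or lsp-mode function.
(defun my/completion-list-reusable-p (completion-list)
  "Return non-nil if COMPLETION-LIST may be filtered client-side.
When the server marked the list as incomplete, the client must
re-request it on further keypresses instead of reusing it."
  (not (plist-get completion-list :isIncomplete)))
```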

minad (Owner) commented Sep 7, 2021

@dgutov See #41 and emacs-lsp/lsp-mode#2970. lsp-mode is unfortunately not yet compliant. But the client-server communication might be as I described, which would be good. (EDIT: Since you said modulo bugs, it is probably exactly as you wrote ;))

jdtsmith (Collaborator, Author) commented Sep 7, 2021

a server can declare a set of completions to be "incomplete", thus prohibiting reuse (e.g. step 2 should re-request).

That's very interesting, and would be another obvious reason to dump the cache. I'm not sure if eglot parses a "completions incomplete" flag from the LSP server.

If you are both interested, I've created PR joaotavora/eglot#737 with my simple cache-invalidation idea. I'd be glad for your input on the best simple, sensible & server-agnostic approach (assuming such a thing exists).

jdtsmith (Collaborator, Author) commented Sep 7, 2021

To summarize the current situation :) —

  • The author of eglot firmly believes that it is the UI's job to notice when its table is no longer valid (e.g. when os.li has been diminished to os.l), and then request a new one, as the company UI (approximately) does.
  • The author of corfu maintains that the backend's completion table should notice for itself whether it is still valid, and take its own corrective action, as the elisp backend does, and as the design of completion-in-region seems to indicate is the correct behavior.
