Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Incompatible with the Python shell #79

Closed
astoff opened this issue Sep 8, 2021 · 69 comments
Closed

Incompatible with the Python shell #79

astoff opened this issue Sep 8, 2021 · 69 comments

Comments

@astoff
Copy link

astoff commented Sep 8, 2021

Orderless (but also the flex and substring styles) doesn't work with the Python shell completion. It's probably an issue there, but I wanted to ask if there is anything obviously wrong with this pattern:

(defun python-shell-completion-at-point (&optional process)
  "Function for `completion-at-point-functions' in `inferior-python-mode'.
Optional argument PROCESS forces completions to be retrieved
using that one instead of current buffer's process."
  (setq process (or process (get-buffer-process (current-buffer))))
  (let* ((start {some computation that seems correct})
         (end (point))
         (completion-fn {a function which, when called with
                           (buffer-substring start end) as argument,
                           returns a list of possible completions
                           starting at start}))
    (list start end (completion-table-dynamic completion-fn))))
@minad
Copy link
Collaborator

minad commented Sep 8, 2021

Does it work with basic? It seems the dynamic completion table is fundamentally incompatible with style like orderless, partial-completion, initials, substring and flex since these styles pass the empty string as prefix argument to the completion table. So this works with none of the advanced completion styles, as you already mentioned. If the dynamic python completion-fn relies on this prefix string to compute the completions it cannot work. completion-table-dynamic is an ill defined API and should be avoided.

I should add, there are many Orderless issues where the issue with dynamic completion tables has already been discussed, see #20, #24, #45, #67. One possible workaround for orderless would be to treat the first token specially, see #49. However this would be a deviation from "orderless" style. If I recall correctly, @oantolin is not fond of this idea, and I am neither. In particular since orderless is not doing worse than styles like substring. My recommendation is to combine orderless with the basic completion style, to get such legacy dynamic tables to work correctly.

(setq completion-styles '(basic orderless))

Ideally you only set this style for the filtering used by Corfu, Company or consult-completion-in-region using some advices.

@minad
Copy link
Collaborator

minad commented Sep 8, 2021

If completion-table-dynamic would be avoided by the python shell completion function or if it would not rely on the given input string and accept the empty input string, the computation of the possible completion candidates would be far less efficient since all possible strings would have to be generated which are then post-filtered using the regexps generated by orderless. By using the prefix input string, the python shell can cull the set of possible completion candidates. So there is no good general solution for this problem - either generate all candidates (inefficiently) and work with advanced styles like orderless or only work with basic.

@astoff
Copy link
Author

astoff commented Sep 8, 2021 via email

@minad
Copy link
Collaborator

minad commented Sep 8, 2021

Assuming this is the case, what would be the canonical approach here?
Change completion-fn to ignore its string argument, or replace
completion-table-dynamic by something else? (I am not into the ins
and outs of completion.)

There is no good canonical solution as I mentioned before. The function should not ignore the string argument. It should return all possible completion strings if the string argument is empty, which is not efficient. If the string is non-empty, all possible completions with this prefix should be returned, such that the completion result is compatible with the basic completion style. I assume in the case of an empty string, the function returns no candidates, which leads to problems here.

I am afraid, the canonical solution is to not use orderless, but use basic 🤷

@astoff
Copy link
Author

astoff commented Sep 8, 2021

If completion-table-dynamic would be avoided by the python shell completion function or if it would not rely on the given input string and accept the empty input string, the computation of the possible completion candidates would be far less efficient

I don't follow your reasoning. The way completion in the REPL works is, the REPL sends the contents of the current line to the Python inferior and receives a pre-filtered list of completions. This might or might not be efficient, but it's inescapable.

So I already have the precomputed candidates. I just need to pass it along in a way that pleases Emacs's completion system.

@minad
Copy link
Collaborator

minad commented Sep 8, 2021

I don't know how the Python package works internally. All I am saying is that the orderless completion style always passes the empty string as input to the dynamic completion function. This is not compatible with a completion function which relies on this input string to compute the candidates. In contrast, the basic completion style passes the input string directly to the completion function.

Try this with Vertico:

(defun test-dynamic-table ()
  (completing-read "> "
                   (completion-table-dynamic
                    (lambda (input)
                      (message "Input: %S" input)
                      '("first" "second" "third")))))

(let ((completion-styles '(basic)))
  (test-dynamic-table))
(let ((completion-styles '(substring)))
  (test-dynamic-table))
(let ((completion-styles '(orderless)))
  (test-dynamic-table))

EDIT: If you look at python-shell-completion-get-completions, it indeed relies on the input. The input is send to the background Python process. This is what leads to problems here, since the input string is empty if orderless is used. Then the backend does not generate candidates? This whole issue seems to be a flaw in how the completion API is designed. The input string serves a dual purpose, it is used for filtering by prefix and it can be used by completion tables to compute candidates. However since it is used for prefix filtering, the string must necessarily be empty for styles which do not want such filtering. This will then break the candidate generation. If one would redesign the API from scratch one should separate this dual usage of the input string and maybe rather pass two strings as input to the completion. But for now the canonical solution is to use the basic completion style, since this one is the only style which uses prefix filtering. I hope this makes sense now?

@oantolin
Copy link
Owner

oantolin commented Sep 8, 2021

I agree with everything @minad said, but want to add that while I don't really like the idea of treating the first component differently I'm starting to think it might be worth doing. I'd do something like: if the first component happens to compile to a regexp r that satisfies (equal r (regexp-quote r)), then call the completion table with it (and filter using the rest of the regexps obtained from the user input). I'd have to rename the package to mostly-orderless, though. 😛

@minad
Copy link
Collaborator

minad commented Sep 8, 2021

Why not add a configuration variable orderless-use-prefix which allows to opt-in to the special prefix treatment? Then the user can define a mostly-orderless completion style with orderless-define-completion-style. 😄 I would only use this special style for completion-at-point. I wouldn't want it in most of the minibuffer completion scenarios.

@minad
Copy link
Collaborator

minad commented Sep 8, 2021

Instead of relying on the criterion (equal r (regexp-quote r)) you could always treat the first token as literal if orderless-use-prefix is true. That would be simpler and more predictable.

@oantolin
Copy link
Owner

oantolin commented Sep 8, 2021

I guess. How about this idea: if any of the components compiles to an anchored regexp of the form ^r with (equal r (regexp-quote r)), then take the first such component and pass it to the completion table? No configuration options.

@oantolin
Copy link
Owner

oantolin commented Sep 8, 2021

You'd need an explicit ^ to request the behavior, but at least it makes sense.

@oantolin
Copy link
Owner

oantolin commented Sep 8, 2021

Ah, but that probably wouldn't work for completion-at-point, would it? People don't want to type the ^ and even if they did, in their buffer, for most syntax tables it wouldn't be part of the symbol being completed.

@minad
Copy link
Collaborator

minad commented Sep 8, 2021

I guess. How about this idea: if any of the components compiles to an anchored regexp of the form ^r with (equal r (regexp-quote r)), then take the first such component and pass it to the completion table? No configuration options.

I think this is not such a good idea, or at least it is an orthogonal idea. It would work as an additional optimization (if you check for ^ and check that the rest of the string is a literal), but it would not resolve the scenario we are talking about here.

The issue here relies on the prefix of the input to be used as input to the completion table, since you trigger completion at point with TAB after having entered a certain prefix.

@minad
Copy link
Collaborator

minad commented Sep 8, 2021

Ah, but that probably wouldn't work for completion-at-point, would it? People don't want to type the ^ and even if they did, in their buffer, for most syntax tables it wouldn't be part of the symbol being completed.

Yes, we wrote at the same time. It is still a good idea as optimization. However having such ^regexps is a rare case so it is probably not be worth the complication.

@oantolin
Copy link
Owner

oantolin commented Sep 8, 2021

The configuration variable is viable, I'm just trying to explore other options a bit to see if we can't come up with something even better. How about doing the ^ optimization and then people who want to treat the first component specially just use a style dispatcher that adds a ^ to the first component?

@oantolin
Copy link
Owner

oantolin commented Sep 8, 2021

I could even add such a style dispatcher to orderless so people don't have to write the 2 line function.

@minad
Copy link
Collaborator

minad commented Sep 8, 2021

I always forget how powerful the style dispatchers are. They can also dispatch by index, right? Then this is clearly the better solution. I don't feel strongly about to the optimization - I think there is no need to add this.

@astoff
Copy link
Author

astoff commented Sep 8, 2021

@minad I think you are overestimating my understanding of the completion system :-)

If you look at my (pseudo)code at the top of this issue, isn't it true that completion-fn will only ever be called with (buffer-substring start end) as argument? So I could as well replace completion-fn by (let ((s (buffer-substring start end))) (lambda (_ignored) (funcall completion-fn s), and this is perfectly compatible with every completion style. Oder?

@minad
Copy link
Collaborator

minad commented Sep 8, 2021

If you look at my (pseudo)code at the top of this issue, isn't it true that completion-fn will only ever be called with (buffer-substring start end) as argument?

No, this is not the case. The function completion-all-completions is called with the input as determined by the completion UI, initially this is the string between start and end. But then the completion style can take this input and do whatever it wants to do with it, e.g., like Orderless cut it into pieces and turn it into regexps. The string is not passed through to the completion-fn.

So I could as well replace completion-fn by (let ((s (buffer-substring start end))) (lambda (_ignored) (funcall completion-fn s), and this is perfectly compatible with every completion style. Oder?

This would work with the downside that the list of completions will not be dynamically updated when you delete characters from the initial input string between start and end. Besides that it will probably work well.

However the solution @oantolin proposed with the style dispatcher is clearly better. Then this special style dispatcher should be used to define a mostly-orderless completion style, which can then be used only for completion at point.

@astoff
Copy link
Author

astoff commented Sep 8, 2021

Okay, I think I get it now. Then I'm unsure whether or not to submit a patch to Emacs.

Try this with Vertico:

Out of curiosity, why is the completion table called with a full candidate sometimes when I turn Vertico off? Just typing "f<tab>" gives

Input: ""
Input: "first"
Input: ""
Input: "first" [2 times]

@minad
Copy link
Collaborator

minad commented Sep 8, 2021

Then I'm unsure whether or not to submit a patch to Emacs.

There is nothing wrong here.

Out of curiosity, why is the completion table called with a full candidate sometimes when I turn Vertico off?

For some of the actions of the programmable completion function (see the documentation of programmable completion or completing-read), the completion style is bypassed and the actual input string is passed to the completion function. This happens for example for the metadata or boundaries action. But for the t action (query for all the candidates) the completion style takes over, leading to the aforementioned passing of the empty string. The completing read API is pretty incoherent 🤷

@astoff
Copy link
Author

astoff commented Sep 8, 2021

The completing read API is pretty incoherent

Sorry for the incoherent questions!

@oantolin
Copy link
Owner

oantolin commented Sep 8, 2021

I always forget how powerful the style dispatchers are. They can also dispatch by index, right? Then this is clearly the better solution. I don't feel strongly about to the optimization - I think there is no need to add this.

Without the optimization having a style dispatcher prepend a ^ to the first component wouldn't actually fix the problem we're discussing.

@oantolin
Copy link
Owner

oantolin commented Sep 8, 2021

The optimization could also be simplified by only checking the first component for a ^.

@oantolin
Copy link
Owner

oantolin commented Sep 8, 2021

The completing read API is pretty incoherent

Sorry for the incoherent questions!

It's not your fault at all @astoff, the API is pretty hard to understand. To write orderless I actually had to read a good chunk of minibuffer.el, docs weren't enough for me.

@astoff
Copy link
Author

astoff commented Sep 8, 2021

If completion-table-dynamic would be avoided by the python shell completion function

Actually, completion-table-dynamic could easily be avoided in the Python shell situation, by eagerly calling the completion-fn. Doing so would violate the following recommendation in the docstring of completion-at-point-functions:

NOTE: These functions should be cheap to run since they’re sometimes
run from ‘post-command-hook’; and they should ideally only choose
which kind of completion table to use, and not pre-filter it based
on the current text between START and END (e.g., they should not
obey ‘completion-styles’).

But in this case there seems to be no point in returning a dynamic completion table (which is fast to create) when querying it is potentially slow — or am I missing something? (The eager thing would have the same mild drawbacks of ignoring the string argument to the completion table, but nevermind.)

@minad
Copy link
Collaborator

minad commented Sep 8, 2021

It is not a good idea - as described in the documentation. The returned completion table would not work for foo TAB DEL, since then the returned completions would all start with foo instead of fo. Furthermore the eager completion would not be cheap. But these downsides are not that severe. With Company it even works in all scenarios, since Company reruns the Capf on every keypress. Corfu in contrast would be affected by this, since Corfu relies on capfs to respect the NOTE of the documentation.

@minad
Copy link
Collaborator

minad commented Sep 8, 2021

Coincidentally, I had a longer discussion about this with the company maintainer @dgutov. Dmitry argues that recalling the capf every time is the safer and more robust approach in order to also support completion tables which do not respect the NOTE. He is right about this. However there are other technical reasons why Corfu is not recalling the capf every time (Corfu relies on the completion-in-region infrastructure, which does not allow this), so Corfu only fully supports capfs which respect the NOTE. My opinion is that the NOTE should be formulated in a more binding way - this should actually be the contract which capfs should follow and it is not some optional constraint the capf author can follow or not. The situation is a bit of a dilemma - the completion-in-region infrastructure indicates that capfs are required to respect the NOTE, but the NOTE says that the contract should only be followed "ideally".

@astoff
Copy link
Author

astoff commented Sep 8, 2021

Now I realize that if you supply python.el's dynamic completion table to Vertico or Corfu, you could be doing a network transaction on every key press (I'm running the Python interpreter over Tramp a lot lately). Even without an SSH connection in between, each keypress would trigger some rather flimsy comint communication.

In fact, I've occasionaly seen some garbage dumped in the shell buffer while Corfuing. Clearly it was caused by the comint input hooks getting out of sync with the inferior process.

Now, if I patched python.el to compute the completions eagerly, you would have to contact the inferior process only once per completion-in-region session. You might freeze for a little bit after pressing TAB for the first time, but after that it would be fast to do further filtering. And one could use flex or orderless, at the expense of foo TAB DEL.

So, in fact, following the NOTE seems pretty detrimental to Corfu and Vertico, at least in this rather particular case of comint-based completion.

@dgutov
Copy link

dgutov commented Sep 8, 2021

Right.

@jdtsmith
Copy link

jdtsmith commented Sep 8, 2021

So, we have 3 possible implied contracts, from UI to CAPF:

  1. "I will call you once and only once per completion-in-region. It is up to you to monitor for buffer changes and update the results you give me accordingly."
  2. "I will call you rarely during a completion-in-region. If I call you again, it's because the buffer has changed and I expect that your collection table is no longer valid."
  3. "I will call you whenever and however I want to. If you have internal state you'd like to save, don't keep it in your collection table."

Oh boy.

@dgutov
Copy link

dgutov commented Sep 8, 2021

FWIW, the scheme I described implies 3, not 2. Though it would usually work with any of 1, 2, 3.

oantolin added a commit that referenced this issue Sep 9, 2021
If orderless finds that some component of the pattern compiles to a
regexp of the form ^literal it will now take the first such regexp and
add the literal to the prefix which which it calls all-completions.

That's a slight lie: it actually looks for regexps of the form
\(?:^literal\) since that is what the pattern compiler actually
produces.

With this change people can now add a style dispatcher that adds a ^
to the first component to use with certain completion-at-point
functions that can really take advantage of knowing a prefix to return
fewer completions. Some even (incorrectly) refuse to return all
possible completions for an empty string!

See the discussion in #79 and the issues linked therein.
@oantolin
Copy link
Owner

oantolin commented Sep 9, 2021

I implemented the ^literal optimization discussed above. We should probably document a style-dispatcher that adds a ^ to the first component automatically and how to use it for completion-at-point.

@astoff
Copy link
Author

astoff commented Sep 9, 2021

@dgutov, @jakanakaevangeli What do you think of the following cache invalidation criterium: in a pre-command hook, check if this command is self-insert-command and the syntax class is word or symbol. If not, then kill the cache.

I've tested the Python shell completion with company, and it is pretty laggy if the inferior process is running remotely, since each key press triggers a network request. So this might actually be enough justification to change it.

And then orderless will also work OOTB, as a side effect. Please just don't tell anyone about my hidden agenda.

@jakanakaevangeli
Copy link

jakanakaevangeli commented Sep 9, 2021 via email

@astoff
Copy link
Author

astoff commented Sep 9, 2021

This is not a self-insert-command and cache will not get invalidated.

This is not a self-insert-command, so the cache will be invalidated.

It will also be invalidated if the user pastes an "x". This is was a false positive, but nevermind.

@jakanakaevangeli
Copy link

jakanakaevangeli commented Sep 9, 2021 via email

@minad
Copy link
Collaborator

minad commented Sep 9, 2021

@oantolin

I implemented the ^literal optimization discussed above. We should probably document a style-dispatcher that adds a ^ to the first component automatically and how to use it for completion-at-point.

Is there also a possibility to handle quoted regexps? A style dispatcher could regexp-quote the first word and prepend it with ^.

@dgutov
Copy link

dgutov commented Sep 9, 2021

What do you think of the following cache invalidation criterium: in a pre-command hook, check if this command is self-insert-command and the syntax class is word or symbol. If not, then kill the cache.

It can work, but since the caching would be implemented inside the completion table, I think for it to concern itself with user input and particular invoked commands, is pretty dirty. But you can try and see, IDK.

@jdtsmith
Copy link

jdtsmith commented Sep 9, 2021

@astoff Why not save and examine the probe text passed into the table (when it is non-empty), updating the cache when the saved probe is not a prefix of the current probe (as I suggested for eglot here)?

I'm experimenting with it for my ipython mode, and it requires moving cache out of a lexical to a global, due to all the other lambda's that the CAPF creates (like docsig), as well as to handle contract flavor 3 (ala company). To know when to "start anew" with flavor 3, we can use your idea of completion-in-region-mode-hook, dumping the entire global cache there as the mode is disabled. I think that will then address contract flavors 1, 2, and 3 equally (it's unfortunate that all UI's don't adhere to one of those!).

@jakanakaevangeli
Copy link

jakanakaevangeli commented Sep 9, 2021 via email

@minad
Copy link
Collaborator

minad commented Sep 9, 2021

@jakanakaevangeli Great, I was looking before if regexp-unquote exists in Emacs. Thanks for providing it! @oantolin What about using it to make the prefix optimization more general, such that quoted regexps are also supported?

@oantolin
Copy link
Owner

oantolin commented Sep 9, 2021

Sure, why not, @minad? But I'm not sure it's necessary. Thanks for the regexp-unquote function, @jakanakaevangeli!

@minad
Copy link
Collaborator

minad commented Sep 9, 2021

@oantolin this allows a simpler dispatcher which just takes the first word, quotes it and prepends it with ^. In the current implementation, this will only work for non-regexp characters. But sure, it is an edge case.

@oantolin
Copy link
Owner

oantolin commented Sep 9, 2021

@oantolin this allows a simpler dispatcher which just takes the first word, quotes it and prepends it with ^. In the current implementation, this will only work for non-regexp characters. But sure, it is an edge case.

The dispatcher I had in mind was simpler still: it just prepended a ^ and hoped for the best (the hoping part takes up 0 lines of code). 😛 But we should definitely make it work in more cases if it's not too hard.

@astoff
Copy link
Author

astoff commented Sep 9, 2021

Okay, there are two topics in parallel now. We should start using mailing lists, like the pros.

I'll reply to @jdtsmith's last comment here, but maybe we should continue discussing the performance/caching issue at https://debbugs.gnu.org/cgi/bugreport.cgi?bug=50459

My criterium is pretty simple and has no false negatives (i.e., it never misses invalidating the cache when needed). But now I realize that there are a few false positives that can be mildly problematic. E.g., when the user presses TAB.

I think (a slight refinement of) your suggestion makes sense, specially for reasonably dumb comint-based capf. In Eglot, where the completion is more context-sensitive, there can be weird edge cases too. Imagine, in Lisp, that I try to complete "(x" (a function starting with x), then delete the parenthesis and try to complete "x" (a variable starting with x). Adding a marker as additional safeguard doesn't help either. But this doesn't seem a very common thing to happen anyway.

@jdtsmith
Copy link

jdtsmith commented Sep 9, 2021

Yeah, that's an issue; you could also be crazy and put your region inside a string in mid-completion (change os.li to "os.li) in which case the entire completion becomes invalid. I guess to guard the probe-prefix invalid trick you could add "line is unchanged before the current completion region", which I believe @dgutov suggested. But note that some UI's (like corfu and company) actually kill the completion if you try to go back and edit before the completion text. So a non-issue for them.

In terms of invalidation to handle all the cases 1,2,3, it gets complicated. I'm thinking it may be necessary to accept all the "frequent table dumping" of contract flavor 3 as false positives. Just kill the cache and start over (unnecessarily, sometimes, but oh well). For flavors 1 & 2, table dumps are true positive invalidations. Flavor 1 requires additional in media res invalidations too.

@astoff
Copy link
Author

astoff commented Sep 9, 2021

Okay, I tested @jdtsmith's idea for the cache invalidation and it seems to work well. I submitted a patch to the Emacs bug linked above, in case anyone wants to take a look.

@astoff
Copy link
Author

astoff commented Sep 12, 2021

It now occurred to me that in the ^literal case, it would be possible to support completion of a common prefix substring. The effect wouldn't be too different from (setq completion-styles '(basic orderless)), so I don't know if it's something worth doing.

In any case, I'll close the issue, because now the Python completion table does caching (which is kind of important when the shell is remote) and therefore works with orderless as a side effect.

@minad
Copy link
Collaborator

minad commented Sep 12, 2021

@astoff there are two reasons to still do the prefix special casing:

  1. It allows optimized filtering, since prefix filtering is slightly faster than regexp filtering.
  2. It allows orderless filtering for input strings like "prefix word2 word3" and tables like completion-table-dynamic, which do not work with basic+orderless.

You are right that basic+orderless is almost an equivalent and sufficient solution. basic+orderless however has an unexpected effect, it may happen that if you enter foo you get fewer matches than for foo foo.

Note that @oantolin already implemented the optimization - I think it is good to have it. My only nitpick is that I would like to see a more robust implementation (#81).

@astoff
Copy link
Author

astoff commented Sep 12, 2021

@minad Sure, I understand that part — I've learned some things in this long discussion :-)

But I was referring to something else, namely the common prefix insertion feature (b<tab> becomes ba if the candidates are foo, bar and baz).

My point was that while this feature doesn't make sense in the regular orderless style, it could be implemented for the “mostly orderless” style, since changing the query from ^b to ^ba doesn't change the list of matching candidates.

@minad
Copy link
Collaborator

minad commented Sep 12, 2021

@astoff Okay, I missed that you singled out the TAB completion. Indeed it would be possible to support this. I think it is worth doing this if the mostly-orderless style aims to be a full substitute for basic+orderless. Supporting some kind of substring expansion has also been requested before in #32.

@oantolin
Copy link
Owner

oantolin commented Sep 12, 2021

Oh, good point @astoff! We could have TAB do something useful now. (Well, I'm a little skeptical that it actually is useful, but the point is it could do something. 😛)

minad added a commit to minad/corfu that referenced this issue Oct 6, 2021
* This code was introduced to ease the usage of Orderless, where the input must
not necessarily occur at the beginning. My recommendation is to use an Orderless
style dispatcher instead, where the first word is prefixed with ^ for Corfu.
Doing this even results in a performance win, since Orderless compiles ^words in
a smart way such that prefix filtering takes place inside the completion table.
See oantolin/orderless#79 and
oantolin/orderless#81.

See the Orderless style dispatcher from the Consult wiki https://github.com/minad/consult/wiki:

  (defun +orderless-dispatch (pattern index _total)
    (cond
     ;; Treat first component as prefix. This is useful for Corfu completion-in-region.
     ((and completion-in-region-mode (= index 0))
      `(orderless-regexp . ,(concat "^" (regexp-quote pattern))))
     ...))

* If you use the basic completion style, the input is already used for prefix
filtering and this change has no effect.

* If you use another completion style which even sorts the candidates itself,
e.g., flex, this change is advantageous since it gives the completion style more
control.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

7 participants