-
Notifications
You must be signed in to change notification settings - Fork 200
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Wrong handling of codepoint offsets #361
Comments
I think more info is needed. What server are you running? What are the values of the variables |
I don't think behavior should be dependent on server or values of these variables, because it is clear from the spec:
here is the info you asked though:
|
Workaround for clangd specifically: it sounds like eglot is specifying codepoint offsets, so you can pass This still needs to be fixed, as eglot doesn't pass that flag to clangd by default, and other servers will rely on the standard utf-16 encoding. |
I agree, but many servers (and clients) don't follow the spec so Eglot has been pragmatic about this in the past. So @nemethf is asking you what values do you have for the variables If you don't get it, then it's a bug. You may also argue for using those functions as the default. Eventually we will have to. But they make things run slower, and last time I checked it was intrinsic to the protocol (@MaskRay had some insight on this, I think). |
(Wouldn't it be good if the server and the client could agree to deviate from the protocol specification and and use a different encoding?) |
No. The protocol seems to suck in this regard, but it's the protocol. But you can change the protocol. Anyway, do we have any idea of which servers/clients actually abide by the LSP protocol by default. So far I only count |
So our perspective is as clangd implementers. We'd like clangd to work robustly with eglot by default, as we can't get all our users to tinker with their config. We follow the spec, and also support other options, there's not much more we can do. If you really want to keep codepoints as the default, maybe you could tell clangd to use them? (Flag or negotiation)
https://docs.google.com/spreadsheets/d/168jSz68po0R09lO0xFK4OmDsQukLzSPCXqB6-728PXQ/edit#gid=0
Clangd supports that as an LSP extension: https://clangd.github.io/extensions.html#utf-8-offsets |
@sam-mccall, that's a nicely designed extension! Have you proposed for inclusion in the standard? (That issue is way too long to read every comment carefully.) @joaotavora, I think I can easily add support for this extension in eglot-x. It would automatically set the variables On the other hand, the default server for c-mode and c++-mode is ccls, so eglot does not start automatically clangd. As a result, currently users need to set |
Or maybe we can use a special class for |
* eglot.el (eglot-current-column-function): Set to eglot-lsp-abiding-column. (eglot-move-to-column-function): Set to eglot-move-to-lsp-abiding-column.
There a few missing there, but it seems the tide is turning... @sam-mccall @kadircet can you test the latest commit and confirm Eglot is working correctly out-of-the-box with clangd? |
@kadircet @sam-mccall ping? Can you test 9fa4561? |
@kadircet @sam-mccall ping again? Or ping anyone that can confirm out-of-the-box Eglot works correctly with out-of-the-box clangd using that Eglot commit 9fa4561. |
I can confirm that when using No need to set |
sorry for the late reply, I've also tested 9fa4561 and it seems to be working with clangd, without any extra command line flags. |
Then, because I'm very busy right now, I invite anyone with push access to bring in 9fa4561 into master. |
It's now landed on master. I've added an entry to NEWS.md. Please, check it when you have time. |
Good call. I'm sure it's fine. We can do NEWS reviews from time to time as well... |
I reverted the change on the master branch and reopened this issue, because the tests fail. I think From the spec: |
I wrote a potential fix and made some measurements. The script is in a new branch. Results are at below, they show that eglot-move-to-column consequently fast and does not depend on the circumstances. It is even faster than move-to-column. The fix makes the slow eglot-move-to-lsp-abiding-column even slower. I haven't tested the fix profoundly, but if it's correct, I expect it to keep Eglot responsive. However, I hope someone can come up with a faster solution. Please, try to test it. This is the new function: (defun fix-1 (column)
"Move to COLUMN abiding by the LSP spec."
(save-restriction
(narrow-to-region (line-beginning-position) (line-end-position))
(goto-char (point-max))
(setq column (min column
(/ (- (length (encode-coding-region 1 (point) 'utf-16 t))
2)
2)))
(cl-loop
initially (move-to-column column)
for diff = (- column
(/ (- (length (encode-coding-region 1 (point) 'utf-16 t))
2)
2))
until (zerop diff)
do (condition-case nil
(forward-char (/ (if (> diff 0) (1+ diff) (1- diff)) 2))
(end-of-buffer))))) And these are the results (total elapsed time for execution, the number of garbage collections that ran, and the time taken by garbage collection):
|
Disclaimer: I'm not familiar with emacs lisp, neither the language nor the performance characteristics. I worked on the encoding support in clangd. It looks like that function loops over prefixes of the line and encodes the prefix inside the loop. That's going to be quadratic time which explains why it's slow on long lines. You're going to want to iterate character-by-character instead, in pseudopython (sorry, I really don't know any lisp):
For u16len you could use |
@nemethf the functions that I and @mkcms wrote were analysed extensively at the time. I'm not saying there can't be bugs: something is indeed wrong, but since it is such a localized function, I'm pretty sure it's possible to come up with a short serverless snippet that demonstrates beyond doubt the bug. By skimming your #361 (comment) I can't be sure. Please try to spend the time to make a recipe like "In the most recent master, within a buffer with these contents, calling foo returns x and provokes z instead of returning y and provoking w" if you can. |
Or, if you prefer, write a unit test for this. |
Are we absolutely sure that the reason that we're getting failed tests is not that the servers in the Travis testing environment are not fully LSP-UTF-16 compliant? That would explain the failures. EDIT: |
My guess is this is what’s happening. However, I have one case where the function is not working properly, and was mentioned briefly by me in #397 and by @nemethf here: what happens
what should happen:Eglot moves cleanly to I think this is what causes my test to choke on |
I am not saying that shouldn't be fixed. It may need to be fixed. I will certainly consider the rest of your analysis, but first I would like you to answer my question, so I can make sense of this. In which versions of clangd do you see the behaviour? Do you see this behaviour on clangd-8 at all? EDIT: previously I wrote that "I may need to be fixed" instead of "it may need to be fixed". It may look like a typo, but both things are obviously true :-) |
I haven't encountered this being a problem on By not seeing this behaviour I mean the server hasn't given me the
haha! |
Thank you!
Nor am I, but the at least the mystery that was baffling me is cleared, or less opaque at least.
and has it been placing diagnostics on the correct column, for example? Or rather, have you witnessed, first hand, 1 or more cases where it placed them on the wrong columns? |
I have not witnessed this yet, no |
This, however, is a sad small attempt to implement @sam-mccall pseudocode. It stops at line end and moves to correct column, just decrementing as we go. Not sure if faster or anything, but wanted to have a go at this :) (defun code-point ()
(interactive)
(or (get-char-property (point) 'untranslated-utf-8)
(encode-char (char-after) 'ucs)))
(defun move (column)
(interactive)
(beginning-of-line)
(let ((u16pos column))
(cl-loop for i below column do
(if (equalp (line-end-position) (point)) (return))
(and (> u16pos 0)
(if (> (code-point) 65536)
(progn (setf u16pos (- u16pos 2))
(forward-char 1))
(progn (setf u16pos (1- u16pos))
(forward-char 1)))))
u16pos))
|
@theothornhill why just not make the existing diff --git a/eglot.el b/eglot.el
index e952b91..7799191 100644
--- a/eglot.el
+++ b/eglot.el
@@ -1050,15 +1050,20 @@ be set to `eglot-move-to-lsp-abiding-column', and
(defun eglot-move-to-lsp-abiding-column (column)
"Move to COLUMN abiding by the LSP spec."
- (cl-loop
- initially (move-to-column column)
- with lbp = (line-beginning-position)
- for diff = (- column
- (/ (- (length (encode-coding-region lbp (point) 'utf-16 t))
- 2)
- 2))
- until (zerop diff)
- do (forward-char (/ (if (> diff 0) (1+ diff) (1- diff)) 2))))
+ (save-restriction
+ (cl-loop
+ with lbp = (line-beginning-position)
+ initially
+ (narrow-to-region lbp (line-end-position))
+ (move-to-column column)
+ for diff = (- column
+ (/ (- (length (encode-coding-region lbp (point) 'utf-16 t))
+ 2)
+ 2))
+ until (zerop diff)
+ do (condition-case eob-err
+ (forward-char (/ (if (> diff 0) (1+ diff) (1- diff)) 2))
+ (end-of-buffer (cl-return eob-err))))))
(defun eglot--lsp-position-to-point (pos-plist &optional marker)
"Convert LSP position POS-PLIST to Emacs point. There are of course other ways to implement this resiliency, namely checking whether |
If I recall correctly, the relative speed advantage we got from the version of EDIT: For clarity, this means I have reasons to disagree with @sam-mccall's assertion that we "want to iterate character-by-character". We can do it like that, and it was attempted but bisecting was found to be faster. |
I did read these, and they were why I tried to have a go at this. it seems like every iteration uses Sams version is An added benefit is that we don't create If my point here isn't moot - it may very well be . maybe @nemethf can try some variant of this function in his benchmarks? (defun code-point ()
(interactive)
(encode-char (char-after) 'ucs))
(defun move (column)
(interactive)
(beginning-of-line)
(let ((u16pos column))
(cl-loop for i below column do
(if (equalp (line-end-position) (point)) (return))
(and (> u16pos 0)
(if (> (code-point) 65536)
(progn (setf u16pos (- u16pos 2))
(forward-char 1))
(progn (setf u16pos (1- u16pos))
(forward-char 1)))))
u16pos)) And please forgive me my layman's complexity analysis :-) |
The basic-diagnostics test fails because pyls sends the following notification to a buffer of 23 characters:
This notification is perfectly in line with the specification. The new benchmarks show that the patch João proposed doesn't really increase the execution time, whereas Theodor's solution is quite slow. I guess this is because encode-coding-region is in C, so it's cheap to run. However, isn't it possible for João's patch to "overshoot" the end of the line during the binary search and stop at the end of the line when the target column is, say, (1- (line-end-position))? I also see a potential problem with the usage of
|
I don't have any reason to believe you're wrong, and I don't understand @nemethf's benchmarks super-well, but they seem to indicate your O(n) version is slower for some reason. There are a lot of tests there in #125, though it was so specific that me and @mkcms put them in a branch. If you can come up with a more correct and faster version, please do! |
In the meantime, @nemeth, if I understand correctly, my fixed version worked and has identical performance to the version in master, right? If so, I request that you either fix it to avoid the condition-case (if with it you can make it simpler, that is), or let it be as it is. Then commit it, and push it to a side branch. Let Travis run on that branch. If there's a mistake, it is probably because of non-compliant pyls, so let-bind the If at any moment you discover another problem, post here.
I can sort of see the right to left thing, but it could be a question of later using And thanks a lot, both of you, for all this work. |
I just read this: It is possible, of course, when the server is non-compliant and is sending a bogus column for those line contents. That's precisely the reason I made the function more resilient to these "attacks". |
Ensure conformance with the this part of the specification: "if the character value is greater than the line length it defaults back to the line length." * eglot.el: (eglot-move-to-lsp-abiding-column): Don't move beyond line-end.
* eglot.el (eglot-current-column-function): Set to eglot-lsp-abiding-column. (eglot-move-to-column-function): Set to eglot-move-to-lsp-abiding-column. * NEWS.md: Log the change here as well. Co-authored-by: João Távora <joaotavora@gmail.com>
Ensure conformance with the this part of the specification: "if the character value is greater than the line length it defaults back to the line length." * eglot.el: (eglot-move-to-lsp-abiding-column): Don't move beyond line-end.
* eglot.el (eglot-current-column-function): Set to eglot-lsp-abiding-column. (eglot-move-to-column-function): Set to eglot-move-to-lsp-abiding-column. * NEWS.md: Log the change here as well. Co-authored-by: João Távora <joaotavora@gmail.com>
Complexity analysis is useful, but for small n an O(n) algorithm can be much slower than an O(n^n) algorithm. I think the binary search method is faster for the practical cases because
Yes, the difference was marginal.
Without further optimization, I pushed it to fix-361-attempt-2.
My question was more like it is possible to "overshoot" under normal circumstances, when the server is LSP conformant, but there are strange non-ascii characters, prettify-symbols-mode is active, etc? |
Notice that from your "short" to your "long" version, the difference between the samples you took for the two algorithms increased, not decreased. However, regarding complexity, what you say holds true in general, and if that reasoning applies here, it's easy to test: just benchmark both algorithms with a ridiculously long line. If your algorithm indeed holds an asymptotic advantage as you propose, then it should become evident sooner or later. Otherwise you made an error somewhere in your complexity analysis. Regardless, we are dealing with real code and real systems, so while I am of course curious to know the results of the above experiment, I still think that in the meantime we have a clear winner, at least for "normal" length lines (that the function was originally tested against).
That is very difficult to know, i.e. it is very difficult to prove a negative. It is much easier for you or anyone to show a single case of misbehaviour (perhaps writing tests), than for me to prove that no possible unforeseen interaction can ever make it misbehave. All I can say is that I designed the function to play along with compliant LSP servers: if a server mentions a column in a line that indeed has that column, it should be safe to travel less than that column's width and re-measure the distance travelled. If the server is lying, then indeed we might overshoot (and that's why I added the |
Ensure conformance with the this part of the specification: "if the character value is greater than the line length it defaults back to the line length." * eglot.el: (eglot-move-to-lsp-abiding-column): Don't move beyond line-end.
…olumns * eglot.el (eglot-current-column-function): Set to eglot-lsp-abiding-column. (eglot-move-to-column-function): Set to eglot-move-to-lsp-abiding-column.
…olumns * eglot.el (eglot-current-column-function): Set to eglot-lsp-abiding-column. (eglot-move-to-column-function): Set to eglot-move-to-lsp-abiding-column. * NEWS.md: Log the change here as well. Co-authored-by: João Távora <joaotavora@gmail.com>
…olumns * eglot.el (eglot-current-column-function): Set to eglot-lsp-abiding-column. (eglot-move-to-column-function): Set to eglot-move-to-lsp-abiding-column.
…olumns * eglot.el (eglot-current-column-function): Set to eglot-lsp-abiding-column. (eglot-move-to-column-function): Set to eglot-move-to-lsp-abiding-column. * NEWS.md: Log the change here as well. Co-authored-by: João Távora <joaotavora@gmail.com>
* eglot.el (eglot-current-column-function): Set to eglot-lsp-abiding-column. (eglot-move-to-column-function): Set to eglot-move-to-lsp-abiding-column. #361: joaotavora/eglot#361
Ensure conformance with the this part of the specification: "if the character value is greater than the line length it defaults back to the line length." * eglot.el: (eglot-move-to-lsp-abiding-column): Don't move beyond line-end. (#361: joaotavora/eglot#361
* eglot.el (eglot-current-column-function): Set to eglot-lsp-abiding-column. (eglot-move-to-column-function): Set to eglot-move-to-lsp-abiding-column. * NEWS.md: Log the change here as well. Co-authored-by: João Távora <joaotavora@gmail.com> #361: joaotavora/eglot#361
* eglot.el (eglot-current-column-function): Set to eglot-lsp-abiding-column. (eglot-move-to-column-function): Set to eglot-move-to-lsp-abiding-column. GitHub-reference: fix joaotavora/eglot#361
Ensure conformance with the this part of the specification: "if the character value is greater than the line length it defaults back to the line length." * eglot.el: (eglot-move-to-lsp-abiding-column): Don't move beyond line-end. GitHub-reference: joaotavora/eglot#361
* eglot.el (eglot-current-column-function): Set to eglot-lsp-abiding-column. (eglot-move-to-column-function): Set to eglot-move-to-lsp-abiding-column. * NEWS.md: Log the change here as well. Co-authored-by: João Távora <joaotavora@gmail.com> GitHub-reference: fix joaotavora/eglot#361
When modifying a line like this:
if you try to append some text after closing quote ("), eglot tells LSP servers that there was a change at offset 52, which is wrong as those emojis are encoded with 4 utf-8 bytes, therefore each results in 2 codepoints. the correct offset should've been 72.
This results in synchronisation problems regarding the file contents between editor and lsp servers.
The text was updated successfully, but these errors were encountered: