Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Wrong handling of codepoint offsets #361

Closed
kadircet opened this issue Nov 28, 2019 · 40 comments
Closed

Wrong handling of codepoint offsets #361

kadircet opened this issue Nov 28, 2019 · 40 comments

Comments

@kadircet
Copy link

When modifying a line like this:

  const char write_data[] = u8"🚂🚃🚄🚅🚆🚈🚇🚈🚉🚊🚋🚌🚎🚝🚞🚟🚠🚡🛤🛲";

if you try to append some text after closing quote ("), eglot tells LSP servers that there was a change at offset 52, which is wrong as those emojis are encoded with 4 utf-8 bytes, therefore each results in 2 codepoints. the correct offset should've been 72.

This results in synchronisation problems regarding the file contents between editor and lsp servers.

@nemethf
Copy link
Collaborator

nemethf commented Nov 28, 2019

I think more info is needed. What server are you running? What are the values of the variables eglot-current-column-function and eglot-move-to-column-function? What happens if you flip the values of those variables after reading their documentations?

@kadircet
Copy link
Author

I don't think behavior should be dependent on server or values of these variables, because it is clear from the spec:

The current protocol is tailored for textual documents whose content can be represented as a string. There is currently no support for binary documents. A position inside a document (see Position definition below) is expressed as a zero-based line and character offset. The offsets are based on a UTF-16 string representation. So a string of the form a𐐀b the character offset of the character a is 0, the character offset of 𐐀 is 1 and the character offset of b is 3 since 𐐀 is represented using two code units in UTF-16. To ensure that both client and server split the string into the same line representation the protocol specifies the following end-of-line sequences: ‘\n’, ‘\r\n’ and ‘\r’.

here is the info you asked though:

  • i am using eglot with clangd (trunk cdfdabbb99c32fb34283333c77f59cc62a51904c), and
  • it is vanilla eglot without any modifications to those variables.

@sam-mccall
Copy link

Workaround for clangd specifically: it sounds like eglot is specifying codepoint offsets, so you can pass clangd -offset-encoding=utf-32 to get that nonstandard behavior.

This still needs to be fixed, as eglot doesn't pass that flag to clangd by default, and other servers will rely on the standard utf-16 encoding.

@joaotavora
Copy link
Owner

joaotavora commented Nov 28, 2019

I don't think behavior should be dependent on server or values of these variables, because it is clear from the spec:

I agree, but many servers (and clients) don't follow the spec so Eglot has been pragmatic about this in the past. So @nemethf is asking you what values do you have for the variables eglot-move-to-column-function and eglot-current-column-function. If you set them to #'eglot-move-to-lsp-abiding-column and #'eglot-lsp-abiding-column ,respectively, you should get the behaviour you want.

If you don't get it, then it's a bug.

You may also argue for using those functions as the default. Eventually we will have to. But they make things run slower, and last time I checked it was intrinsic to the protocol (@MaskRay had some insight on this, I think).

@nemethf
Copy link
Collaborator

nemethf commented Nov 28, 2019

(Wouldn't it be good if the server and the client could agree to deviate from the protocol specification and and use a different encoding?)

@joaotavora
Copy link
Owner

(Wouldn't it be good if the server and the client could agree to deviate from the protocol specification and and use a different encoding?)

No. The protocol seems to suck in this regard, but it's the protocol. But you can change the protocol.

Anyway, do we have any idea of which servers/clients actually abide by the LSP protocol by default. So far I only count clangd, but surely there are more (perhaps more than last time we counted...).

@sam-mccall
Copy link

sam-mccall commented Nov 28, 2019

So our perspective is as clangd implementers. We'd like clangd to work robustly with eglot by default, as we can't get all our users to tinker with their config.

We follow the spec, and also support other options, there's not much more we can do. If you really want to keep codepoints as the default, maybe you could tell clangd to use them? (Flag or negotiation)

Anyway, do we have any idea of which servers/clients actually abide by the LSP protocol by default.

https://docs.google.com/spreadsheets/d/168jSz68po0R09lO0xFK4OmDsQukLzSPCXqB6-728PXQ/edit#gid=0
(From microsoft/language-server-protocol#376)

(Wouldn't it be good if the server and the client could agree to deviate from the protocol specification and and use a different encoding?)

Clangd supports that as an LSP extension: https://clangd.github.io/extensions.html#utf-8-offsets
It's backwards compatible (safe to send to any server). You can request any encoding, clangd supports utf-8, utf-16, utf-32 (eglot default, I think)

@nemethf
Copy link
Collaborator

nemethf commented Nov 28, 2019

@sam-mccall, that's a nicely designed extension! Have you proposed for inclusion in the standard? (That issue is way too long to read every comment carefully.)

@joaotavora, I think I can easily add support for this extension in eglot-x. It would automatically set the variables eglot-move-to-column-function and eglot-current-column-function.

On the other hand, the default server for c-mode and c++-mode is ccls, so eglot does not start automatically clangd. As a result, currently users need to set eglot-server-program manually, so they could set it to clangd -offset-encoding=utf-32.

@joaotavora
Copy link
Owner

@joaotavora, I think I can easily add support for this extension in eglot-x. It would automatically set the variables eglot-move-to-column-function and eglot-current-column-function.

Or maybe we can use a special class for eglot-server-programs that sets these things as the default for clangd sessions. Or just change the global defaults and let other servers complain, after all they're the ones in the wrong, in principle.

joaotavora added a commit that referenced this issue Nov 29, 2019
* eglot.el (eglot-current-column-function): Set to
eglot-lsp-abiding-column.
(eglot-move-to-column-function): Set to
eglot-move-to-lsp-abiding-column.
@joaotavora
Copy link
Owner

https://docs.google.com/spreadsheets/d/168jSz68po0R09lO0xFK4OmDsQukLzSPCXqB6-728PXQ/edit#gid=0
(From microsoft/language-server-protocol#376)

There a few missing there, but it seems the tide is turning... @sam-mccall @kadircet can you test the latest commit and confirm Eglot is working correctly out-of-the-box with clangd?

@joaotavora
Copy link
Owner

@kadircet @sam-mccall ping? Can you test 9fa4561?

@joaotavora
Copy link
Owner

@kadircet @sam-mccall ping again? Or ping anyone that can confirm out-of-the-box Eglot works correctly with out-of-the-box clangd using that Eglot commit 9fa4561.

@theothornhill
Copy link
Collaborator

theothornhill commented Jan 4, 2020

I can confirm that when using C-u M-x eglot Using clangd-8 and these functions set accordingly eglot reports code offsets correctly. #397 adds tests for this.

No need to set —code-offsets

@kadircet
Copy link
Author

kadircet commented Jan 7, 2020

sorry for the late reply, I've also tested 9fa4561 and it seems to be working with clangd, without any extra command line flags.

@joaotavora
Copy link
Owner

Then, because I'm very busy right now, I invite anyone with push access to bring in 9fa4561 into master.

@nemethf nemethf closed this as completed in 633979e Jan 8, 2020
@nemethf
Copy link
Collaborator

nemethf commented Jan 8, 2020

It's now landed on master. I've added an entry to NEWS.md. Please, check it when you have time.

@joaotavora
Copy link
Owner

added an entry to NEWS.md. Please, check it when you have time.

Good call. I'm sure it's fine. We can do NEWS reviews from time to time as well...

@nemethf nemethf reopened this Jan 8, 2020
@nemethf
Copy link
Collaborator

nemethf commented Jan 8, 2020

I reverted the change on the master branch and reopened this issue, because the tests fail. I think eglot-move-to-lsp-abiding-column is buggy. If the column number received from the server is grater than the line length then the function should jump to the end of line. I think it does not. (Also it uses move-to-column, which turned out to have issues in case of whitespace-mode. But that might not be a problem here.)

From the spec:
"if the character value is greater than the line length it defaults back to the line length."

@nemethf
Copy link
Collaborator

nemethf commented Jan 10, 2020

I wrote a potential fix and made some measurements. The script is in a new branch. Results are at below, they show that eglot-move-to-column consequently fast and does not depend on the circumstances. It is even faster than move-to-column. The fix makes the slow eglot-move-to-lsp-abiding-column even slower.

I haven't tested the fix profoundly, but if it's correct, I expect it to keep Eglot responsive. However, I hope someone can come up with a faster solution. Please, try to test it.

This is the new function:

(defun fix-1 (column)
  "Move to COLUMN abiding by the LSP spec."
  (save-restriction
    (narrow-to-region (line-beginning-position) (line-end-position))
    (goto-char (point-max))
    (setq column (min column
                      (/ (- (length (encode-coding-region 1 (point) 'utf-16 t))
                            2)
                         2)))
    (cl-loop
     initially (move-to-column column)
     for diff = (- column
                   (/ (- (length (encode-coding-region 1 (point) 'utf-16 t))
                         2)
                      2))
     until (zerop diff)
     do (condition-case nil
            (forward-char (/ (if (> diff 0) (1+ diff) (1- diff)) 2))
          (end-of-buffer)))))

And these are the results (total elapsed time for execution, the number of garbage collections that ran, and the time taken by garbage collection):

test case fn result
short ascii eglot-move-to-column 0.042223 0 0
short ascii move-to-column 0.090825 4 0
short ascii eglot-move-to-lsp-abiding-column 0.203311 13 0
short ascii fix-1 0.354881 24 0
short non-ascii eglot-move-to-column 0.042645 0 0
short non-ascii move-to-column 0.414419 4 0
short non-ascii eglot-move-to-lsp-abiding-column 1.251451 82 0
short non-ascii fix-1 1.597782 108 0
long ascii front eglot-move-to-column 0.042150 0 0
long ascii front move-to-column 0.317773 4 0
long ascii front eglot-move-to-lsp-abiding-column 0.463316 17 0
long ascii front fix-1 1.381337 125 0
long ascii end eglot-move-to-column 0.042232 0 0
long ascii end move-to-column 1.949140 4 0
long ascii end eglot-move-to-lsp-abiding-column 2.803400 106 0
long ascii end fix-1 3.726439 214 1
long non-ascii front eglot-move-to-column 0.044116 0 0
long non-ascii front move-to-column 0.531376 4 0
long non-ascii front eglot-move-to-lsp-abiding-column 0.691837 17 0
long non-ascii front fix-1 2.078963 152 0
long non-ascii end eglot-move-to-column 0.047536 0 0
long non-ascii end move-to-column 2.290028 4 0
long non-ascii end eglot-move-to-lsp-abiding-column 3.396392 106 0
long non-ascii end fix-1 4.389540 241 1

@sam-mccall
Copy link

Disclaimer: I'm not familiar with emacs lisp, neither the language nor the performance characteristics. I worked on the encoding support in clangd.

It looks like that function loops over prefixes of the line and encodes the prefix inside the loop. That's going to be quadratic time which explains why it's slow on long lines.

You're going to want to iterate character-by-character instead, in pseudopython (sorry, I really don't know any lisp):

def charOffset(line, u16target):
  u16pos = 0
  for i in [0, len(line)):
    if u16pos >= u16target
      return i
    u16pos = u16pos + u16len(line[i])
  return len(line)

For u16len you could use encode-coding-region, though if it's slow even on short strings there may be something faster. If a "column" is a unicode codepoint in emacs then it's trivial (1 if codepoint < 0x10000, else 2). If a "column" may a cluster of codepoints but you can get them cheaply, then you can implement u16len easily without actually decoding and allocating strings.

@joaotavora
Copy link
Owner

@nemethf the functions that I and @mkcms wrote were analysed extensively at the time. I'm not saying there can't be bugs: something is indeed wrong, but since it is such a localized function, I'm pretty sure it's possible to come up with a short serverless snippet that demonstrates beyond doubt the bug. By skimming your #361 (comment) I can't be sure. Please try to spend the time to make a recipe like "In the most recent master, within a buffer with these contents, calling foo returns x and provokes z instead of returning y and provoking w" if you can.

@joaotavora
Copy link
Owner

Or, if you prefer, write a unit test for this.

@joaotavora
Copy link
Owner

joaotavora commented Jan 10, 2020

Are we absolutely sure that the reason that we're getting failed tests is not that the servers in the Travis testing environment are not fully LSP-UTF-16 compliant? That would explain the failures.

EDIT:
We are making the informed decision here of switching to full LSP compliance. We didn't make that decision before because it is incompatible with some servers -- that's why we had those options in the first place. So if my suspicions are correct (I haven't analysed deeply) then we just need to locally use the old versions in those tests, until Travis CI gets a new pyls or something.

@theothornhill
Copy link
Collaborator

Are we absolutely sure that the reason that we're getting failed tests is not that the servers in the Travis testing environment are not fully LSP-UTF-16 compliant? That would explain the failures.

My guess is this is what’s happening. However, I have one case where the function is not working properly, and was mentioned briefly by me in #397 and by @nemethf here:

what happens

  1. Use a buffer with contents described on the first post of this issue

  2. (eglot-move-to-lsp-abiding-column 75)

  3. Emacs reports end-of-buffer

what should happen:

Eglot moves cleanly to end-of-line

I think this is what causes my test to choke on clangd-7, since it is reporting a column number much higher.

@joaotavora
Copy link
Owner

joaotavora commented Jan 10, 2020

What I am describing here is the fact that it functions incorrectly when server would give the function a value higher than length of line.

I am not saying that shouldn't be fixed. It may need to be fixed. I will certainly consider the rest of your analysis, but first I would like you to answer my question, so I can make sense of this. In which versions of clangd do you see the behaviour? Do you see this behaviour on clangd-8 at all?

EDIT: previously I wrote that "I may need to be fixed" instead of "it may need to be fixed". It may look like a typo, but both things are obviously true :-)

@theothornhill
Copy link
Collaborator

theothornhill commented Jan 10, 2020

Do you see this behaviour on clangd-8 at all?

I haven't encountered this being a problem on clangd-8, no, but I have on clangd-7. I am not totally confident that this proves the function is safe on newer servers. Not because I am skeptic, but don't feel my analysis on this matter has been thorough enough.

By not seeing this behaviour I mean the server hasn't given me the end-of-buffer error. The possibility to enter a high number into the function and getting this error does not depend on any server at all

but both things are obviously true :-)

haha!

@joaotavora
Copy link
Owner

I haven't encountered this being a problem on clangd-8, no, but I have on clangd-7.

Thank you!

am not totally confident that this proves the function is safe on newer servers.

Nor am I, but the at least the mystery that was baffling me is cleared, or less opaque at least.

By not seeing this behaviour I mean the server hasn't given me the end-of-buffer error.

and has it been placing diagnostics on the correct column, for example? Or rather, have you witnessed, first hand, 1 or more cases where it placed them on the wrong columns?

@theothornhill
Copy link
Collaborator

Or rather, have you witnessed, first hand, 1 or more cases where it placed them on the wrong columns?

I have not witnessed this yet, no

@theothornhill
Copy link
Collaborator

theothornhill commented Jan 10, 2020

This, however, is a sad small attempt to implement @sam-mccall pseudocode. It stops at line end and moves to correct column, just decrementing as we go. Not sure if faster or anything, but wanted to have a go at this :)

(defun code-point ()
  (interactive)
  (or (get-char-property (point) 'untranslated-utf-8)
      (encode-char (char-after) 'ucs)))

(defun move (column)
  (interactive)
  (beginning-of-line)
  (let ((u16pos column))
    (cl-loop for i below column do
             (if (equalp (line-end-position) (point)) (return))
             (and (> u16pos 0)
                  (if (> (code-point) 65536)
                      (progn (setf u16pos (- u16pos 2))
                             (forward-char 1))
                    (progn (setf u16pos (1- u16pos))
                           (forward-char 1)))))
    u16pos))

@joaotavora
Copy link
Owner

@theothornhill why just not make the existing eglot-move-to-lsp-abiding-column be a little more resilient when passed bogus columns by non-UTF16-compliant LSP servers? I.e. why not this 100% untested thing?

diff --git a/eglot.el b/eglot.el
index e952b91..7799191 100644
--- a/eglot.el
+++ b/eglot.el
@@ -1050,15 +1050,20 @@ be set to `eglot-move-to-lsp-abiding-column', and
 
 (defun eglot-move-to-lsp-abiding-column (column)
   "Move to COLUMN abiding by the LSP spec."
-  (cl-loop
-   initially (move-to-column column)
-   with lbp = (line-beginning-position)
-   for diff = (- column
-                 (/ (- (length (encode-coding-region lbp (point) 'utf-16 t))
-                       2)
-                    2))
-   until (zerop diff)
-   do (forward-char (/ (if (> diff 0) (1+ diff) (1- diff)) 2))))
+  (save-restriction
+    (cl-loop
+     with lbp = (line-beginning-position)
+     initially
+     (narrow-to-region lbp (line-end-position))
+     (move-to-column column)
+     for diff = (- column
+                   (/ (- (length (encode-coding-region lbp (point) 'utf-16 t))
+                         2)
+                      2))
+     until (zerop diff)
+     do (condition-case eob-err
+            (forward-char (/ (if (> diff 0) (1+ diff) (1- diff)) 2))
+          (end-of-buffer (cl-return eob-err))))))
 
 (defun eglot--lsp-position-to-point (pos-plist &optional marker)
   "Convert LSP position POS-PLIST to Emacs point.

There are of course other ways to implement this resiliency, namely checking whether forward-char argument would overflow the line end.

@joaotavora
Copy link
Owner

joaotavora commented Jan 11, 2020

If I recall correctly, the relative speed advantage we got from the version of eglot-move-to-lsp-abiding-column that I programmed some time ago had to do with the fact that it does a binary search, and so checks fewer buffer positions: it shouldn't have to check every position from 0 upto column: it takes a guess and then adjusts until it finds it is in the correct position. See #124 and #125, particularly this #125 (comment) and the following ones.

EDIT: For clarity, this means I have reasons to disagree with @sam-mccall's assertion that we "want to iterate character-by-character". We can do it like that, and it was attempted but bisecting was found to be faster.

@theothornhill
Copy link
Collaborator

See #124 and #125, particularly this #125 (comment) and the following ones.

I did read these, and they were why I tried to have a go at this. it seems like every iteration uses encode-coding-region. This function, as far as i can read from its source, has a running time of at least O(n) (It has to read the string), and this happen from n times to log n times. (Depending on version - newest is fastest of course). I believe this makes the current function O(nlogn). Please correct me if I'm wrong here.

Sams version is O(n), because it only scans once, at the most to EOL.

An added benefit is that we don't create log n utf-16-strings, take their length, which also seems expensive, and not related to moving point.

If my point here isn't moot - it may very well be . maybe @nemethf can try some variant of this function in his benchmarks?

(defun code-point ()
  (interactive)
  (encode-char (char-after) 'ucs))

(defun move (column)
  (interactive)
  (beginning-of-line)
  (let ((u16pos column))
    (cl-loop for i below column do
             (if (equalp (line-end-position) (point)) (return))
             (and (> u16pos 0)
                  (if (> (code-point) 65536)
                      (progn (setf u16pos (- u16pos 2))
                             (forward-char 1))
                    (progn (setf u16pos (1- u16pos))
                           (forward-char 1)))))
    u16pos))

And please forgive me my layman's complexity analysis :-)

@nemethf
Copy link
Collaborator

nemethf commented Jan 11, 2020

The basic-diagnostics test fails because pyls sends the following notification to a buffer of 23 characters:

server-notification Sat Jan 11 12:30:22 2020:
(:params
 (:uri "file:///home/nemethf/src/eglot/py-basic-diagnostics/main.py" :diagnostics
       [(:source "pyflakes" :range
                 (:start
                  (:line 0 :character 13)
                  :end
                  (:line 0 :character 37))
                 :message "invalid syntax" :severity 1)])
 :jsonrpc "2.0" :method "textDocument/publishDiagnostics")

This notification is perfectly in line with the specification.

The new benchmarks show that the patch João proposed doesn't really increase the execution time, whereas Theodor's solution is quite slow. I guess this is because encode-coding-region is in C, so it's cheap to run. However, isn't it possible for João's patch to "overshoot" the end of the line during the binary search and stop at the end of the line when the target column is, say, (1- (line-end-position))?

I also see a potential problem with the usage of forward-char, according to the help: "Depending on the bidirectional context, the movement may be to the right or to the left on the screen." Can an Arabic comment in a source code have a right-to-left context?

test case fn result
short ascii eglot-move-to-column 0.042893 0 0
short ascii move-to-column 0.091771 4 0
short ascii eglot-move-to-lsp-abiding-column 0.220460 13 0
short ascii fix-1 0.380529 24 0
short ascii joaos-fix 0.246880 13 0
short ascii theodors-move 0.228392 0 0
short non-ascii eglot-move-to-column 0.043326 0 0
short non-ascii move-to-column 0.414419 4 0
short non-ascii eglot-move-to-lsp-abiding-column 1.286369 82 0
short non-ascii fix-1 1.607467 108 0
short non-ascii joaos-fix 1.382975 82 0
short non-ascii theodors-move 2.960463 0 0
long ascii front eglot-move-to-column 0.044885 0 0
long ascii front move-to-column 0.317910 4 0
long ascii front eglot-move-to-lsp-abiding-column 0.478651 17 0
long ascii front fix-1 1.458539 125 0
long ascii front joaos-fix 0.504161 17 0
long ascii front theodors-move 0.840904 0 0
long ascii end eglot-move-to-column 0.041956 0 0
long ascii end move-to-column 1.965042 4 0
long ascii end eglot-move-to-lsp-abiding-column 2.849772 106 0
long ascii end fix-1 3.776413 214 1
long ascii end joaos-fix 2.878306 106 0
long ascii end theodors-move 15.581334 0 0
long non-ascii front eglot-move-to-column 0.043761 0 0
long non-ascii front move-to-column 0.513725 4 0
long non-ascii front eglot-move-to-lsp-abiding-column 0.672835 17 0
long non-ascii front fix-1 1.937640 152 0
long non-ascii front joaos-fix 0.707602 17 0
long non-ascii front theodors-move 0.889215 0 0
long non-ascii end eglot-move-to-column 0.043063 0 0
long non-ascii end move-to-column 1.987880 4 0
long non-ascii end eglot-move-to-lsp-abiding-column 2.924919 106 0
long non-ascii end fix-1 4.167515 241 1
long non-ascii end joaos-fix 2.956717 106 0
long non-ascii end theodors-move 15.707058 0 0

@joaotavora
Copy link
Owner

I did read these, and they were why I tried to have a go at this. it seems like every iteration uses encode-coding-region. This function, as far as i can read from its source, has a running time of at least O(n) (It has to read the string), and this happen from n times to log n times. (Depending on version - newest is fastest of course). I believe this makes the current function O(nlogn). Please correct me if I'm wrong here.

I don't have any reason to believe you're wrong, and n log(n) seems to ring a bell here. It's been more than a year and I dont' remember all the details, but I remember having being convinced that we had to use encode-coding-region because that's the only function in Emacs that "knows" how to encode in all circumstances, circumstances that I'm not 100% familiar with and which will certainly include some corner cases. So, given that assumption, which I'm happy to see successfully disputed and defeated (probably depending on your knowledge of UTF-16 things, which is surely greater than mine), the cheapest version I came up with algorithmically was the binary search.

I don't understand @nemethf's benchmarks super-well, but they seem to indicate your O(n) version is slower for some reason.

There are a lot of tests there in #125, though it was so specific that me and @mkcms put them in a branch. If you can come up with a more correct and faster version, please do!

@joaotavora
Copy link
Owner

In the meantime, @nemeth, if I understand correctly, my fixed version worked and has identical performance to the version in master, right?

If so, I request that you either fix it to avoid the condition-case (if with it you can make it simpler, that is), or let it be as it is. Then commit it, and push it to a side branch. Let Travis run on that branch. If there's a mistake, it is probably because of non-compliant pyls, so let-bind the eglot.*column pair to the old version for those tests only. Do this is two pretty commits so we can keep track of the history. Post the resulting side-branch here and we'll take a final decision.

If at any moment you discover another problem, post here.

I also see a potential problem with the usage of forward-char, according to the help: "Depending on the bidirectional context, the movement may be to the right or to the left on the screen." Can an Arabic comment in a source code have a right-to-left context?

I can sort of see the right to left thing, but it could be a question of later using backward-char instead of forward-char. Or maybe forward-char already has that idea built in (who says "forward" is left-to-right?). Anyway, let's forget about that for now, I don't think it changes the fundamental nature of the problem, it's just a symmetry quirk or something. Again, happy to be contradicted, but let's not let it hold this up while the contradicting happens :-)

And thanks a lot, both of you, for all this work.

@joaotavora
Copy link
Owner

However, isn't it possible for João's patch to "overshoot" the end of the line during the binary search and stop at the end of the line when the target column is, say, (1- (line-end-position))?

I just read this: It is possible, of course, when the server is non-compliant and is sending a bogus column for those line contents. That's precisely the reason I made the function more resilient to these "attacks".

nemethf pushed a commit that referenced this issue Jan 19, 2020
Ensure conformance with the this part of the specification: "if the
character value is greater than the line length it defaults back to
the line length."

* eglot.el: (eglot-move-to-lsp-abiding-column): Don't move beyond
line-end.
nemethf added a commit that referenced this issue Jan 19, 2020
* eglot.el (eglot-current-column-function): Set to
eglot-lsp-abiding-column.
(eglot-move-to-column-function): Set to
eglot-move-to-lsp-abiding-column.

* NEWS.md: Log the change here as well.

Co-authored-by: João Távora <joaotavora@gmail.com>
nemethf pushed a commit that referenced this issue Jan 19, 2020
Ensure conformance with the this part of the specification: "if the
character value is greater than the line length it defaults back to
the line length."

* eglot.el: (eglot-move-to-lsp-abiding-column): Don't move beyond
line-end.
nemethf added a commit that referenced this issue Jan 19, 2020
* eglot.el (eglot-current-column-function): Set to
eglot-lsp-abiding-column.
(eglot-move-to-column-function): Set to
eglot-move-to-lsp-abiding-column.

* NEWS.md: Log the change here as well.

Co-authored-by: João Távora <joaotavora@gmail.com>
@nemethf
Copy link
Collaborator

nemethf commented Jan 19, 2020

your O(n) version is slower for some reason.

Complexity analysis is useful, but for small n an O(n) algorithm can be much slower than an O(n^n) algorithm. I think the binary search method is faster for the practical cases because encode-coding-region is in C, and calculating the size of one character in lisp is approximately as fast as calculating it for a region in C.

my fixed version worked and has identical performance to the version in master, right?

Yes, the difference was marginal.

If so, I request that you either fix it to avoid the condition-case (if with it you can make it simpler, that is), or let it be as it is. Then commit it, and push it to a side branch.

Without further optimization, I pushed it to fix-361-attempt-2.

However, isn't it possible for João's patch to "overshoot" the end of the line during the binary search and stop at the end of the line when the target column is, say, (1- (line-end-position))?

I just read this: It is possible, of course, when the server is non-compliant and is sending a bogus column for those line contents. That's precisely the reason I made the function more resilient to these "attacks".

My question was more like it is possible to "overshoot" under normal circumstances, when the server is LSP conformant, but there are strange non-ascii characters, prettify-symbols-mode is active, etc?

@joaotavora
Copy link
Owner

Complexity analysis is useful, but for small n an O(n) algorithm can be much slower than an O(n^n) algorithm. I think the binary search method is faster for the practical cases because encode-coding-region is in C, and calculating the size of one character in lisp is approximately as fast as calculating it for a region in C.

Notice that from your "short" to your "long" version, the difference between the samples you took for the two algorithms increased, not decreased. However, regarding complexity, what you say holds true in general, and if that reasoning applies here, it's easy to test: just benchmark both algorithms with a ridiculously long line. If your algorithm indeed holds an asymptotic advantage as you propose, then it should become evident sooner or later. Otherwise you made an error somewhere in your complexity analysis.

Regardless, we are dealing with real code and real systems, so while I am of course curious to know the results of the above experiment, I still think that in the meantime we have a clear winner, at least for "normal" length lines (that the function was originally tested against).

My question was more like it is possible to "overshoot" under normal circumstances, when the server is LSP conformant, but there are strange non-ascii characters, prettify-symbols-mode is active, etc?

That is very difficult to know, i.e. it is very difficult to prove a negative. It is much easier for you or anyone to show a single case of misbehaviour (perhaps writing tests), than for me to prove that no possible unforeseen interaction can ever make it misbehave. All I can say is that I designed the function to play along with compliant LSP servers: if a server mentions a column in a line that indeed has that column, it should be safe to travel less than that column's width and re-measure the distance travelled. If the server is lying, then indeed we might overshoot (and that's why I added the condition-case). But if the server is lying then it isn't a compliant LSP server.

joaotavora added a commit that referenced this issue Apr 16, 2020
Ensure conformance with the this part of the specification: "if the
character value is greater than the line length it defaults back to
the line length."

* eglot.el: (eglot-move-to-lsp-abiding-column): Don't move beyond
line-end.
bhankas pushed a commit to bhankas/emacs that referenced this issue Sep 18, 2022
…olumns

* eglot.el (eglot-current-column-function): Set to
eglot-lsp-abiding-column.
(eglot-move-to-column-function): Set to
eglot-move-to-lsp-abiding-column.
bhankas pushed a commit to bhankas/emacs that referenced this issue Sep 18, 2022
…olumns

* eglot.el (eglot-current-column-function): Set to
eglot-lsp-abiding-column.
(eglot-move-to-column-function): Set to
eglot-move-to-lsp-abiding-column.

* NEWS.md: Log the change here as well.

Co-authored-by: João Távora <joaotavora@gmail.com>
bhankas pushed a commit to bhankas/emacs that referenced this issue Sep 19, 2022
…olumns

* eglot.el (eglot-current-column-function): Set to
eglot-lsp-abiding-column.
(eglot-move-to-column-function): Set to
eglot-move-to-lsp-abiding-column.
bhankas pushed a commit to bhankas/emacs that referenced this issue Sep 19, 2022
…olumns

* eglot.el (eglot-current-column-function): Set to
eglot-lsp-abiding-column.
(eglot-move-to-column-function): Set to
eglot-move-to-lsp-abiding-column.

* NEWS.md: Log the change here as well.

Co-authored-by: João Távora <joaotavora@gmail.com>
bhankas pushed a commit to bhankas/emacs that referenced this issue Sep 19, 2022
* eglot.el (eglot-current-column-function): Set to
eglot-lsp-abiding-column.
(eglot-move-to-column-function): Set to
eglot-move-to-lsp-abiding-column.

#361: joaotavora/eglot#361
bhankas pushed a commit to bhankas/emacs that referenced this issue Sep 19, 2022
Ensure conformance with the this part of the specification: "if the
character value is greater than the line length it defaults back to
the line length."

* eglot.el: (eglot-move-to-lsp-abiding-column): Don't move beyond
line-end.

(#361: joaotavora/eglot#361
bhankas pushed a commit to bhankas/emacs that referenced this issue Sep 19, 2022
* eglot.el (eglot-current-column-function): Set to
eglot-lsp-abiding-column.
(eglot-move-to-column-function): Set to
eglot-move-to-lsp-abiding-column.

* NEWS.md: Log the change here as well.

Co-authored-by: João Távora <joaotavora@gmail.com>

#361: joaotavora/eglot#361
jollaitbot pushed a commit to sailfishos-mirror/emacs that referenced this issue Oct 12, 2022
* eglot.el (eglot-current-column-function): Set to
eglot-lsp-abiding-column.
(eglot-move-to-column-function): Set to
eglot-move-to-lsp-abiding-column.

GitHub-reference: fix joaotavora/eglot#361
jollaitbot pushed a commit to sailfishos-mirror/emacs that referenced this issue Oct 12, 2022
Ensure conformance with the this part of the specification: "if the
character value is greater than the line length it defaults back to
the line length."

* eglot.el: (eglot-move-to-lsp-abiding-column): Don't move beyond
line-end.

GitHub-reference: joaotavora/eglot#361
jollaitbot pushed a commit to sailfishos-mirror/emacs that referenced this issue Oct 12, 2022
* eglot.el (eglot-current-column-function): Set to
eglot-lsp-abiding-column.
(eglot-move-to-column-function): Set to
eglot-move-to-lsp-abiding-column.

* NEWS.md: Log the change here as well.

Co-authored-by: João Távora <joaotavora@gmail.com>
GitHub-reference: fix joaotavora/eglot#361
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants