Wrong handling of codepoint offsets #361

kadircet · 2019-11-28T09:16:19Z

When modifying a line like this:

  const char write_data[] = u8"🚂🚃🚄🚅🚆🚈🚇🚈🚉🚊🚋🚌🚎🚝🚞🚟🚠🚡🛤🛲";

if you try to append some text after closing quote ("), eglot tells LSP servers that there was a change at offset 52, which is wrong as those emojis are encoded with 4 utf-8 bytes, therefore each results in 2 codepoints. the correct offset should've been 72.

This results in synchronisation problems regarding the file contents between editor and lsp servers.

nemethf · 2019-11-28T10:33:59Z

I think more info is needed. What server are you running? What are the values of the variables eglot-current-column-function and eglot-move-to-column-function? What happens if you flip the values of those variables after reading their documentations?

kadircet · 2019-11-28T11:04:15Z

I don't think behavior should be dependent on server or values of these variables, because it is clear from the spec:

The current protocol is tailored for textual documents whose content can be represented as a string. There is currently no support for binary documents. A position inside a document (see Position definition below) is expressed as a zero-based line and character offset. The offsets are based on a UTF-16 string representation. So a string of the form a𐐀b the character offset of the character a is 0, the character offset of 𐐀 is 1 and the character offset of b is 3 since 𐐀 is represented using two code units in UTF-16. To ensure that both client and server split the string into the same line representation the protocol specifies the following end-of-line sequences: ‘\n’, ‘\r\n’ and ‘\r’.

here is the info you asked though:

i am using eglot with clangd (trunk cdfdabbb99c32fb34283333c77f59cc62a51904c), and
it is vanilla eglot without any modifications to those variables.

sam-mccall · 2019-11-28T12:26:25Z

Workaround for clangd specifically: it sounds like eglot is specifying codepoint offsets, so you can pass clangd -offset-encoding=utf-32 to get that nonstandard behavior.

This still needs to be fixed, as eglot doesn't pass that flag to clangd by default, and other servers will rely on the standard utf-16 encoding.

joaotavora · 2019-11-28T12:34:15Z

I don't think behavior should be dependent on server or values of these variables, because it is clear from the spec:

I agree, but many servers (and clients) don't follow the spec so Eglot has been pragmatic about this in the past. So @nemethf is asking you what values do you have for the variables eglot-move-to-column-function and eglot-current-column-function. If you set them to #'eglot-move-to-lsp-abiding-column and #'eglot-lsp-abiding-column ,respectively, you should get the behaviour you want.

If you don't get it, then it's a bug.

You may also argue for using those functions as the default. Eventually we will have to. But they make things run slower, and last time I checked it was intrinsic to the protocol (@MaskRay had some insight on this, I think).

nemethf · 2019-11-28T13:11:34Z

(Wouldn't it be good if the server and the client could agree to deviate from the protocol specification and and use a different encoding?)

joaotavora · 2019-11-28T14:31:15Z

(Wouldn't it be good if the server and the client could agree to deviate from the protocol specification and and use a different encoding?)

No. The protocol seems to suck in this regard, but it's the protocol. But you can change the protocol.

Anyway, do we have any idea of which servers/clients actually abide by the LSP protocol by default. So far I only count clangd, but surely there are more (perhaps more than last time we counted...).

sam-mccall · 2019-11-28T19:45:51Z

So our perspective is as clangd implementers. We'd like clangd to work robustly with eglot by default, as we can't get all our users to tinker with their config.

We follow the spec, and also support other options, there's not much more we can do. If you really want to keep codepoints as the default, maybe you could tell clangd to use them? (Flag or negotiation)

Anyway, do we have any idea of which servers/clients actually abide by the LSP protocol by default.

https://docs.google.com/spreadsheets/d/168jSz68po0R09lO0xFK4OmDsQukLzSPCXqB6-728PXQ/edit#gid=0
(From microsoft/language-server-protocol#376)

(Wouldn't it be good if the server and the client could agree to deviate from the protocol specification and and use a different encoding?)

Clangd supports that as an LSP extension: https://clangd.github.io/extensions.html#utf-8-offsets
It's backwards compatible (safe to send to any server). You can request any encoding, clangd supports utf-8, utf-16, utf-32 (eglot default, I think)

nemethf · 2019-11-28T20:23:46Z

@sam-mccall, that's a nicely designed extension! Have you proposed for inclusion in the standard? (That issue is way too long to read every comment carefully.)

@joaotavora, I think I can easily add support for this extension in eglot-x. It would automatically set the variables eglot-move-to-column-function and eglot-current-column-function.

On the other hand, the default server for c-mode and c++-mode is ccls, so eglot does not start automatically clangd. As a result, currently users need to set eglot-server-program manually, so they could set it to clangd -offset-encoding=utf-32.

joaotavora · 2019-11-28T21:34:02Z

@joaotavora, I think I can easily add support for this extension in eglot-x. It would automatically set the variables eglot-move-to-column-function and eglot-current-column-function.

Or maybe we can use a special class for eglot-server-programs that sets these things as the default for clangd sessions. Or just change the global defaults and let other servers complain, after all they're the ones in the wrong, in principle.

* eglot.el (eglot-current-column-function): Set to eglot-lsp-abiding-column. (eglot-move-to-column-function): Set to eglot-move-to-lsp-abiding-column.

joaotavora · 2019-11-29T23:44:43Z

https://docs.google.com/spreadsheets/d/168jSz68po0R09lO0xFK4OmDsQukLzSPCXqB6-728PXQ/edit#gid=0
(From microsoft/language-server-protocol#376)

There a few missing there, but it seems the tide is turning... @sam-mccall @kadircet can you test the latest commit and confirm Eglot is working correctly out-of-the-box with clangd?

joaotavora · 2019-12-08T22:09:07Z

@kadircet @sam-mccall ping? Can you test 9fa4561?

joaotavora · 2020-01-03T21:46:54Z

@kadircet @sam-mccall ping again? Or ping anyone that can confirm out-of-the-box Eglot works correctly with out-of-the-box clangd using that Eglot commit 9fa4561.

theothornhill · 2020-01-04T15:32:16Z

I can confirm that when using C-u M-x eglot Using clangd-8 and these functions set accordingly eglot reports code offsets correctly. #397 adds tests for this.

No need to set —code-offsets

kadircet · 2020-01-07T10:25:25Z

sorry for the late reply, I've also tested 9fa4561 and it seems to be working with clangd, without any extra command line flags.

joaotavora · 2020-01-07T10:51:26Z

Then, because I'm very busy right now, I invite anyone with push access to bring in 9fa4561 into master.

nemethf · 2020-01-08T16:04:52Z

It's now landed on master. I've added an entry to NEWS.md. Please, check it when you have time.

joaotavora · 2020-01-08T16:17:34Z

added an entry to NEWS.md. Please, check it when you have time.

Good call. I'm sure it's fine. We can do NEWS reviews from time to time as well...

nemethf · 2020-01-08T17:10:42Z

I reverted the change on the master branch and reopened this issue, because the tests fail. I think eglot-move-to-lsp-abiding-column is buggy. If the column number received from the server is grater than the line length then the function should jump to the end of line. I think it does not. (Also it uses move-to-column, which turned out to have issues in case of whitespace-mode. But that might not be a problem here.)

From the spec:
"if the character value is greater than the line length it defaults back to the line length."

nemethf · 2020-01-10T17:48:25Z

I wrote a potential fix and made some measurements. The script is in a new branch. Results are at below, they show that eglot-move-to-column consequently fast and does not depend on the circumstances. It is even faster than move-to-column. The fix makes the slow eglot-move-to-lsp-abiding-column even slower.

I haven't tested the fix profoundly, but if it's correct, I expect it to keep Eglot responsive. However, I hope someone can come up with a faster solution. Please, try to test it.

This is the new function:

(defun fix-1 (column)
  "Move to COLUMN abiding by the LSP spec."
  (save-restriction
    (narrow-to-region (line-beginning-position) (line-end-position))
    (goto-char (point-max))
    (setq column (min column
                      (/ (- (length (encode-coding-region 1 (point) 'utf-16 t))
                            2)
                         2)))
    (cl-loop
     initially (move-to-column column)
     for diff = (- column
                   (/ (- (length (encode-coding-region 1 (point) 'utf-16 t))
                         2)
                      2))
     until (zerop diff)
     do (condition-case nil
            (forward-char (/ (if (> diff 0) (1+ diff) (1- diff)) 2))
          (end-of-buffer)))))

And these are the results (total elapsed time for execution, the number of garbage collections that ran, and the time taken by garbage collection):

test case	fn	result
short ascii	eglot-move-to-column	0.042223 0 0
short ascii	move-to-column	0.090825 4 0
short ascii	eglot-move-to-lsp-abiding-column	0.203311 13 0
short ascii	fix-1	0.354881 24 0
short non-ascii	eglot-move-to-column	0.042645 0 0
short non-ascii	move-to-column	0.414419 4 0
short non-ascii	eglot-move-to-lsp-abiding-column	1.251451 82 0
short non-ascii	fix-1	1.597782 108 0
long ascii front	eglot-move-to-column	0.042150 0 0
long ascii front	move-to-column	0.317773 4 0
long ascii front	eglot-move-to-lsp-abiding-column	0.463316 17 0
long ascii front	fix-1	1.381337 125 0
long ascii end	eglot-move-to-column	0.042232 0 0
long ascii end	move-to-column	1.949140 4 0
long ascii end	eglot-move-to-lsp-abiding-column	2.803400 106 0
long ascii end	fix-1	3.726439 214 1
long non-ascii front	eglot-move-to-column	0.044116 0 0
long non-ascii front	move-to-column	0.531376 4 0
long non-ascii front	eglot-move-to-lsp-abiding-column	0.691837 17 0
long non-ascii front	fix-1	2.078963 152 0
long non-ascii end	eglot-move-to-column	0.047536 0 0
long non-ascii end	move-to-column	2.290028 4 0
long non-ascii end	eglot-move-to-lsp-abiding-column	3.396392 106 0
long non-ascii end	fix-1	4.389540 241 1

sam-mccall · 2020-01-10T21:19:58Z

Disclaimer: I'm not familiar with emacs lisp, neither the language nor the performance characteristics. I worked on the encoding support in clangd.

It looks like that function loops over prefixes of the line and encodes the prefix inside the loop. That's going to be quadratic time which explains why it's slow on long lines.

You're going to want to iterate character-by-character instead, in pseudopython (sorry, I really don't know any lisp):

def charOffset(line, u16target):
  u16pos = 0
  for i in [0, len(line)):
    if u16pos >= u16target
      return i
    u16pos = u16pos + u16len(line[i])
  return len(line)

For u16len you could use encode-coding-region, though if it's slow even on short strings there may be something faster. If a "column" is a unicode codepoint in emacs then it's trivial (1 if codepoint < 0x10000, else 2). If a "column" may a cluster of codepoints but you can get them cheaply, then you can implement u16len easily without actually decoding and allocating strings.

joaotavora · 2020-01-10T21:24:37Z

@nemethf the functions that I and @mkcms wrote were analysed extensively at the time. I'm not saying there can't be bugs: something is indeed wrong, but since it is such a localized function, I'm pretty sure it's possible to come up with a short serverless snippet that demonstrates beyond doubt the bug. By skimming your #361 (comment) I can't be sure. Please try to spend the time to make a recipe like "In the most recent master, within a buffer with these contents, calling foo returns x and provokes z instead of returning y and provoking w" if you can.

joaotavora · 2020-01-10T21:25:33Z

Or, if you prefer, write a unit test for this.

joaotavora · 2020-01-10T21:30:16Z

Are we absolutely sure that the reason that we're getting failed tests is not that the servers in the Travis testing environment are not fully LSP-UTF-16 compliant? That would explain the failures.

EDIT:
We are making the informed decision here of switching to full LSP compliance. We didn't make that decision before because it is incompatible with some servers -- that's why we had those options in the first place. So if my suspicions are correct (I haven't analysed deeply) then we just need to locally use the old versions in those tests, until Travis CI gets a new pyls or something.

theothornhill · 2020-01-10T21:50:35Z

Are we absolutely sure that the reason that we're getting failed tests is not that the servers in the Travis testing environment are not fully LSP-UTF-16 compliant? That would explain the failures.

My guess is this is what’s happening. However, I have one case where the function is not working properly, and was mentioned briefly by me in #397 and by @nemethf here:

what happens

Use a buffer with contents described on the first post of this issue
(eglot-move-to-lsp-abiding-column 75)
Emacs reports end-of-buffer

what should happen:

Eglot moves cleanly to end-of-line

I think this is what causes my test to choke on clangd-7, since it is reporting a column number much higher.

joaotavora · 2020-01-10T22:57:35Z

What I am describing here is the fact that it functions incorrectly when server would give the function a value higher than length of line.

I am not saying that shouldn't be fixed. It may need to be fixed. I will certainly consider the rest of your analysis, but first I would like you to answer my question, so I can make sense of this. In which versions of clangd do you see the behaviour? Do you see this behaviour on clangd-8 at all?

EDIT: previously I wrote that "I may need to be fixed" instead of "it may need to be fixed". It may look like a typo, but both things are obviously true :-)

theothornhill · 2020-01-10T23:10:20Z

Do you see this behaviour on clangd-8 at all?

I haven't encountered this being a problem on clangd-8, no, but I have on clangd-7. I am not totally confident that this proves the function is safe on newer servers. Not because I am skeptic, but don't feel my analysis on this matter has been thorough enough.

By not seeing this behaviour I mean the server hasn't given me the end-of-buffer error. The possibility to enter a high number into the function and getting this error does not depend on any server at all

but both things are obviously true :-)

haha!

joaotavora · 2020-01-10T23:13:40Z

I haven't encountered this being a problem on clangd-8, no, but I have on clangd-7.

Thank you!

am not totally confident that this proves the function is safe on newer servers.

Nor am I, but the at least the mystery that was baffling me is cleared, or less opaque at least.

By not seeing this behaviour I mean the server hasn't given me the end-of-buffer error.

and has it been placing diagnostics on the correct column, for example? Or rather, have you witnessed, first hand, 1 or more cases where it placed them on the wrong columns?

theothornhill · 2020-01-10T23:22:50Z

Or rather, have you witnessed, first hand, 1 or more cases where it placed them on the wrong columns?

I have not witnessed this yet, no

theothornhill · 2020-01-10T23:41:37Z

This, however, is a sad small attempt to implement @sam-mccall pseudocode. It stops at line end and moves to correct column, just decrementing as we go. Not sure if faster or anything, but wanted to have a go at this :)

(defun code-point ()
  (interactive)
  (or (get-char-property (point) 'untranslated-utf-8)
      (encode-char (char-after) 'ucs)))

(defun move (column)
  (interactive)
  (beginning-of-line)
  (let ((u16pos column))
    (cl-loop for i below column do
             (if (equalp (line-end-position) (point)) (return))
             (and (> u16pos 0)
                  (if (> (code-point) 65536)
                      (progn (setf u16pos (- u16pos 2))
                             (forward-char 1))
                    (progn (setf u16pos (1- u16pos))
                           (forward-char 1)))))
    u16pos))

joaotavora · 2020-01-11T00:27:09Z

@theothornhill why just not make the existing eglot-move-to-lsp-abiding-column be a little more resilient when passed bogus columns by non-UTF16-compliant LSP servers? I.e. why not this 100% untested thing?

diff --git a/eglot.el b/eglot.el
index e952b91..7799191 100644
--- a/eglot.el
+++ b/eglot.el
@@ -1050,15 +1050,20 @@ be set to `eglot-move-to-lsp-abiding-column', and
 
 (defun eglot-move-to-lsp-abiding-column (column)
   "Move to COLUMN abiding by the LSP spec."
-  (cl-loop
-   initially (move-to-column column)
-   with lbp = (line-beginning-position)
-   for diff = (- column
-                 (/ (- (length (encode-coding-region lbp (point) 'utf-16 t))
-                       2)
-                    2))
-   until (zerop diff)
-   do (forward-char (/ (if (> diff 0) (1+ diff) (1- diff)) 2))))
+  (save-restriction
+    (cl-loop
+     with lbp = (line-beginning-position)
+     initially
+     (narrow-to-region lbp (line-end-position))
+     (move-to-column column)
+     for diff = (- column
+                   (/ (- (length (encode-coding-region lbp (point) 'utf-16 t))
+                         2)
+                      2))
+     until (zerop diff)
+     do (condition-case eob-err
+            (forward-char (/ (if (> diff 0) (1+ diff) (1- diff)) 2))
+          (end-of-buffer (cl-return eob-err))))))
 
 (defun eglot--lsp-position-to-point (pos-plist &optional marker)
   "Convert LSP position POS-PLIST to Emacs point.

There are of course other ways to implement this resiliency, namely checking whether forward-char argument would overflow the line end.

joaotavora · 2020-01-11T00:32:50Z

If I recall correctly, the relative speed advantage we got from the version of eglot-move-to-lsp-abiding-column that I programmed some time ago had to do with the fact that it does a binary search, and so checks fewer buffer positions: it shouldn't have to check every position from 0 upto column: it takes a guess and then adjusts until it finds it is in the correct position. See #124 and #125, particularly this #125 (comment) and the following ones.

EDIT: For clarity, this means I have reasons to disagree with @sam-mccall's assertion that we "want to iterate character-by-character". We can do it like that, and it was attempted but bisecting was found to be faster.

theothornhill · 2020-01-11T11:05:06Z

See #124 and #125, particularly this #125 (comment) and the following ones.

I did read these, and they were why I tried to have a go at this. it seems like every iteration uses encode-coding-region. This function, as far as i can read from its source, has a running time of at least O(n) (It has to read the string), and this happen from n times to log n times. (Depending on version - newest is fastest of course). I believe this makes the current function O(nlogn). Please correct me if I'm wrong here.

Sams version is O(n), because it only scans once, at the most to EOL.

An added benefit is that we don't create log n utf-16-strings, take their length, which also seems expensive, and not related to moving point.

If my point here isn't moot - it may very well be . maybe @nemethf can try some variant of this function in his benchmarks?

(defun code-point ()
  (interactive)
  (encode-char (char-after) 'ucs))

(defun move (column)
  (interactive)
  (beginning-of-line)
  (let ((u16pos column))
    (cl-loop for i below column do
             (if (equalp (line-end-position) (point)) (return))
             (and (> u16pos 0)
                  (if (> (code-point) 65536)
                      (progn (setf u16pos (- u16pos 2))
                             (forward-char 1))
                    (progn (setf u16pos (1- u16pos))
                           (forward-char 1)))))
    u16pos))

And please forgive me my layman's complexity analysis :-)

nemethf · 2020-01-11T12:09:32Z

The basic-diagnostics test fails because pyls sends the following notification to a buffer of 23 characters:

server-notification Sat Jan 11 12:30:22 2020:
(:params
 (:uri "file:///home/nemethf/src/eglot/py-basic-diagnostics/main.py" :diagnostics
       [(:source "pyflakes" :range
                 (:start
                  (:line 0 :character 13)
                  :end
                  (:line 0 :character 37))
                 :message "invalid syntax" :severity 1)])
 :jsonrpc "2.0" :method "textDocument/publishDiagnostics")

This notification is perfectly in line with the specification.

The new benchmarks show that the patch João proposed doesn't really increase the execution time, whereas Theodor's solution is quite slow. I guess this is because encode-coding-region is in C, so it's cheap to run. However, isn't it possible for João's patch to "overshoot" the end of the line during the binary search and stop at the end of the line when the target column is, say, (1- (line-end-position))?

I also see a potential problem with the usage of forward-char, according to the help: "Depending on the bidirectional context, the movement may be to the right or to the left on the screen." Can an Arabic comment in a source code have a right-to-left context?

test case	fn	result
short ascii	eglot-move-to-column	0.042893 0 0
short ascii	move-to-column	0.091771 4 0
short ascii	eglot-move-to-lsp-abiding-column	0.220460 13 0
short ascii	fix-1	0.380529 24 0
short ascii	joaos-fix	0.246880 13 0
short ascii	theodors-move	0.228392 0 0
short non-ascii	eglot-move-to-column	0.043326 0 0
short non-ascii	move-to-column	0.414419 4 0
short non-ascii	eglot-move-to-lsp-abiding-column	1.286369 82 0
short non-ascii	fix-1	1.607467 108 0
short non-ascii	joaos-fix	1.382975 82 0
short non-ascii	theodors-move	2.960463 0 0
long ascii front	eglot-move-to-column	0.044885 0 0
long ascii front	move-to-column	0.317910 4 0
long ascii front	eglot-move-to-lsp-abiding-column	0.478651 17 0
long ascii front	fix-1	1.458539 125 0
long ascii front	joaos-fix	0.504161 17 0
long ascii front	theodors-move	0.840904 0 0
long ascii end	eglot-move-to-column	0.041956 0 0
long ascii end	move-to-column	1.965042 4 0
long ascii end	eglot-move-to-lsp-abiding-column	2.849772 106 0
long ascii end	fix-1	3.776413 214 1
long ascii end	joaos-fix	2.878306 106 0
long ascii end	theodors-move	15.581334 0 0
long non-ascii front	eglot-move-to-column	0.043761 0 0
long non-ascii front	move-to-column	0.513725 4 0
long non-ascii front	eglot-move-to-lsp-abiding-column	0.672835 17 0
long non-ascii front	fix-1	1.937640 152 0
long non-ascii front	joaos-fix	0.707602 17 0
long non-ascii front	theodors-move	0.889215 0 0
long non-ascii end	eglot-move-to-column	0.043063 0 0
long non-ascii end	move-to-column	1.987880 4 0
long non-ascii end	eglot-move-to-lsp-abiding-column	2.924919 106 0
long non-ascii end	fix-1	4.167515 241 1
long non-ascii end	joaos-fix	2.956717 106 0
long non-ascii end	theodors-move	15.707058 0 0

joaotavora · 2020-01-11T12:34:29Z

I did read these, and they were why I tried to have a go at this. it seems like every iteration uses encode-coding-region. This function, as far as i can read from its source, has a running time of at least O(n) (It has to read the string), and this happen from n times to log n times. (Depending on version - newest is fastest of course). I believe this makes the current function O(nlogn). Please correct me if I'm wrong here.

I don't have any reason to believe you're wrong, and n log(n) seems to ring a bell here. It's been more than a year and I dont' remember all the details, but I remember having being convinced that we had to use encode-coding-region because that's the only function in Emacs that "knows" how to encode in all circumstances, circumstances that I'm not 100% familiar with and which will certainly include some corner cases. So, given that assumption, which I'm happy to see successfully disputed and defeated (probably depending on your knowledge of UTF-16 things, which is surely greater than mine), the cheapest version I came up with algorithmically was the binary search.

I don't understand @nemethf's benchmarks super-well, but they seem to indicate your O(n) version is slower for some reason.

There are a lot of tests there in #125, though it was so specific that me and @mkcms put them in a branch. If you can come up with a more correct and faster version, please do!

joaotavora · 2020-01-11T12:42:42Z

In the meantime, @nemeth, if I understand correctly, my fixed version worked and has identical performance to the version in master, right?

If so, I request that you either fix it to avoid the condition-case (if with it you can make it simpler, that is), or let it be as it is. Then commit it, and push it to a side branch. Let Travis run on that branch. If there's a mistake, it is probably because of non-compliant pyls, so let-bind the eglot.*column pair to the old version for those tests only. Do this is two pretty commits so we can keep track of the history. Post the resulting side-branch here and we'll take a final decision.

If at any moment you discover another problem, post here.

I also see a potential problem with the usage of forward-char, according to the help: "Depending on the bidirectional context, the movement may be to the right or to the left on the screen." Can an Arabic comment in a source code have a right-to-left context?

I can sort of see the right to left thing, but it could be a question of later using backward-char instead of forward-char. Or maybe forward-char already has that idea built in (who says "forward" is left-to-right?). Anyway, let's forget about that for now, I don't think it changes the fundamental nature of the problem, it's just a symmetry quirk or something. Again, happy to be contradicted, but let's not let it hold this up while the contradicting happens :-)

And thanks a lot, both of you, for all this work.

joaotavora · 2020-01-11T12:46:23Z

However, isn't it possible for João's patch to "overshoot" the end of the line during the binary search and stop at the end of the line when the target column is, say, (1- (line-end-position))?

I just read this: It is possible, of course, when the server is non-compliant and is sending a bogus column for those line contents. That's precisely the reason I made the function more resilient to these "attacks".

Ensure conformance with the this part of the specification: "if the character value is greater than the line length it defaults back to the line length." * eglot.el: (eglot-move-to-lsp-abiding-column): Don't move beyond line-end.

* eglot.el (eglot-current-column-function): Set to eglot-lsp-abiding-column. (eglot-move-to-column-function): Set to eglot-move-to-lsp-abiding-column. * NEWS.md: Log the change here as well. Co-authored-by: João Távora <joaotavora@gmail.com>

Ensure conformance with the this part of the specification: "if the character value is greater than the line length it defaults back to the line length." * eglot.el: (eglot-move-to-lsp-abiding-column): Don't move beyond line-end.

* eglot.el (eglot-current-column-function): Set to eglot-lsp-abiding-column. (eglot-move-to-column-function): Set to eglot-move-to-lsp-abiding-column. * NEWS.md: Log the change here as well. Co-authored-by: João Távora <joaotavora@gmail.com>

nemethf · 2020-01-19T10:50:43Z

your O(n) version is slower for some reason.

Complexity analysis is useful, but for small n an O(n) algorithm can be much slower than an O(n^n) algorithm. I think the binary search method is faster for the practical cases because encode-coding-region is in C, and calculating the size of one character in lisp is approximately as fast as calculating it for a region in C.

my fixed version worked and has identical performance to the version in master, right?

Yes, the difference was marginal.

If so, I request that you either fix it to avoid the condition-case (if with it you can make it simpler, that is), or let it be as it is. Then commit it, and push it to a side branch.

Without further optimization, I pushed it to fix-361-attempt-2.

However, isn't it possible for João's patch to "overshoot" the end of the line during the binary search and stop at the end of the line when the target column is, say, (1- (line-end-position))?

I just read this: It is possible, of course, when the server is non-compliant and is sending a bogus column for those line contents. That's precisely the reason I made the function more resilient to these "attacks".

My question was more like it is possible to "overshoot" under normal circumstances, when the server is LSP conformant, but there are strange non-ascii characters, prettify-symbols-mode is active, etc?

joaotavora · 2020-01-19T23:12:02Z

Complexity analysis is useful, but for small n an O(n) algorithm can be much slower than an O(n^n) algorithm. I think the binary search method is faster for the practical cases because encode-coding-region is in C, and calculating the size of one character in lisp is approximately as fast as calculating it for a region in C.

Notice that from your "short" to your "long" version, the difference between the samples you took for the two algorithms increased, not decreased. However, regarding complexity, what you say holds true in general, and if that reasoning applies here, it's easy to test: just benchmark both algorithms with a ridiculously long line. If your algorithm indeed holds an asymptotic advantage as you propose, then it should become evident sooner or later. Otherwise you made an error somewhere in your complexity analysis.

Regardless, we are dealing with real code and real systems, so while I am of course curious to know the results of the above experiment, I still think that in the meantime we have a clear winner, at least for "normal" length lines (that the function was originally tested against).

My question was more like it is possible to "overshoot" under normal circumstances, when the server is LSP conformant, but there are strange non-ascii characters, prettify-symbols-mode is active, etc?

That is very difficult to know, i.e. it is very difficult to prove a negative. It is much easier for you or anyone to show a single case of misbehaviour (perhaps writing tests), than for me to prove that no possible unforeseen interaction can ever make it misbehave. All I can say is that I designed the function to play along with compliant LSP servers: if a server mentions a column in a line that indeed has that column, it should be safe to travel less than that column's width and re-measure the distance travelled. If the server is lying, then indeed we might overshoot (and that's why I added the condition-case). But if the server is lying then it isn't a compliant LSP server.

Ensure conformance with the this part of the specification: "if the character value is greater than the line length it defaults back to the line length." * eglot.el: (eglot-move-to-lsp-abiding-column): Don't move beyond line-end.

…olumns * eglot.el (eglot-current-column-function): Set to eglot-lsp-abiding-column. (eglot-move-to-column-function): Set to eglot-move-to-lsp-abiding-column.

…olumns * eglot.el (eglot-current-column-function): Set to eglot-lsp-abiding-column. (eglot-move-to-column-function): Set to eglot-move-to-lsp-abiding-column. * NEWS.md: Log the change here as well. Co-authored-by: João Távora <joaotavora@gmail.com>

…olumns * eglot.el (eglot-current-column-function): Set to eglot-lsp-abiding-column. (eglot-move-to-column-function): Set to eglot-move-to-lsp-abiding-column.

…olumns * eglot.el (eglot-current-column-function): Set to eglot-lsp-abiding-column. (eglot-move-to-column-function): Set to eglot-move-to-lsp-abiding-column. * NEWS.md: Log the change here as well. Co-authored-by: João Távora <joaotavora@gmail.com>

* eglot.el (eglot-current-column-function): Set to eglot-lsp-abiding-column. (eglot-move-to-column-function): Set to eglot-move-to-lsp-abiding-column. #361: joaotavora/eglot#361

Ensure conformance with the this part of the specification: "if the character value is greater than the line length it defaults back to the line length." * eglot.el: (eglot-move-to-lsp-abiding-column): Don't move beyond line-end. (#361: joaotavora/eglot#361

* eglot.el (eglot-current-column-function): Set to eglot-lsp-abiding-column. (eglot-move-to-column-function): Set to eglot-move-to-lsp-abiding-column. * NEWS.md: Log the change here as well. Co-authored-by: João Távora <joaotavora@gmail.com> #361: joaotavora/eglot#361

* eglot.el (eglot-current-column-function): Set to eglot-lsp-abiding-column. (eglot-move-to-column-function): Set to eglot-move-to-lsp-abiding-column. GitHub-reference: fix joaotavora/eglot#361

Ensure conformance with the this part of the specification: "if the character value is greater than the line length it defaults back to the line length." * eglot.el: (eglot-move-to-lsp-abiding-column): Don't move beyond line-end. GitHub-reference: joaotavora/eglot#361

* eglot.el (eglot-current-column-function): Set to eglot-lsp-abiding-column. (eglot-move-to-column-function): Set to eglot-move-to-lsp-abiding-column. * NEWS.md: Log the change here as well. Co-authored-by: João Távora <joaotavora@gmail.com> GitHub-reference: fix joaotavora/eglot#361

joaotavora added a commit that referenced this issue Nov 29, 2019

Fix #361: abide by LSP when reporting and moving to columns

9fa4561

* eglot.el (eglot-current-column-function): Set to eglot-lsp-abiding-column. (eglot-move-to-column-function): Set to eglot-move-to-lsp-abiding-column.

joaotavora added bug tentative fix labels Nov 29, 2019

joaotavora mentioned this issue Jan 3, 2020

Use eglot-lsp-abiding-column #397

Closed

nemethf closed this as completed in 633979e Jan 8, 2020

nemethf reopened this Jan 8, 2020

joaotavora closed this as completed in cacfd24 Apr 16, 2020

Wrong handling of codepoint offsets #361

Wrong handling of codepoint offsets #361

Comments

kadircet commented Nov 28, 2019

nemethf commented Nov 28, 2019

kadircet commented Nov 28, 2019

sam-mccall commented Nov 28, 2019

joaotavora commented Nov 28, 2019 • edited Loading

nemethf commented Nov 28, 2019

joaotavora commented Nov 28, 2019

sam-mccall commented Nov 28, 2019 • edited Loading

nemethf commented Nov 28, 2019

joaotavora commented Nov 28, 2019

joaotavora commented Nov 29, 2019

joaotavora commented Dec 8, 2019

joaotavora commented Jan 3, 2020

theothornhill commented Jan 4, 2020 • edited Loading

kadircet commented Jan 7, 2020

joaotavora commented Jan 7, 2020

nemethf commented Jan 8, 2020

joaotavora commented Jan 8, 2020

nemethf commented Jan 8, 2020

nemethf commented Jan 10, 2020

sam-mccall commented Jan 10, 2020

joaotavora commented Jan 10, 2020

joaotavora commented Jan 10, 2020

joaotavora commented Jan 10, 2020 • edited Loading

theothornhill commented Jan 10, 2020

what happens

what should happen:

joaotavora commented Jan 10, 2020 • edited Loading

theothornhill commented Jan 10, 2020 • edited Loading

joaotavora commented Jan 10, 2020

theothornhill commented Jan 10, 2020

theothornhill commented Jan 10, 2020 • edited Loading

joaotavora commented Jan 11, 2020

joaotavora commented Jan 11, 2020 • edited Loading

theothornhill commented Jan 11, 2020

nemethf commented Jan 11, 2020

joaotavora commented Jan 11, 2020

joaotavora commented Jan 11, 2020

joaotavora commented Jan 11, 2020

nemethf commented Jan 19, 2020

joaotavora commented Jan 19, 2020

joaotavora commented Nov 28, 2019 •

edited

Loading

sam-mccall commented Nov 28, 2019 •

edited

Loading

theothornhill commented Jan 4, 2020 •

edited

Loading

joaotavora commented Jan 10, 2020 •

edited

Loading

joaotavora commented Jan 10, 2020 •

edited

Loading

theothornhill commented Jan 10, 2020 •

edited

Loading

theothornhill commented Jan 10, 2020 •

edited

Loading

joaotavora commented Jan 11, 2020 •

edited

Loading