Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ANSI-like escaped chars are shown if Korean variable names are hovered #740

Closed
xnuk opened this issue Apr 3, 2024 · 3 comments
Closed
Labels
external dependency Something need to be fixed upstream

Comments

@xnuk
Copy link

xnuk commented Apr 3, 2024

  • rust-analyzer 1.79.0-nightly (c9f8f34 2024-03-27)
  • kak-lsp 16.0.0-snapshot (377a1a0)
    • kak-lsp 16.0.0 also reproducible
  • Kakoune v2023.08.05
  • on Arch Linux x64
// main.rs

fn main() {
    let 된장 = 1;
}
zz.mp4

I named it as 된장 because I ate 된장찌개 today

krobelus added a commit to krobelus/rust-analyzer that referenced this issue Apr 3, 2024
On this file,

```rust
fn main() {
    let 된장 = 1;
}
```

when using `"positionEncodings":["utf-16"]` I get an "unused variable" diagnostic on the variable
name (codepoint offset range `8..10`). So far so good.

When using `positionEncodings":["utf-8"]`, I expect to get the equivalent range in bytes (LSP:
"Character offsets count UTF-8 code units (e.g bytes)."), which is `8..14`, because both
characters are 3 bytes in UTF-8.  However I actually get `10..14`.

Looks like this is because we accidentally treat a 1-based index as an offset value: when
converting from our internal char-indices to LSP byte offsets, we look at one character to many.
This causes wrong results if the extra character is a multi-byte one, such as when computing
the start coordinate of 된장.

Fix that by actually passing an offset. While at it, fix the variable name of the line number,
which is not an offset (yet).

Originally reported at kakoune-lsp/kakoune-lsp#740
@krobelus
Copy link
Member

krobelus commented Apr 3, 2024

proposed upstream fix is at rust-lang/rust-analyzer#17003

if you need a workaround, you can use our offset_encoding config value (needs below fix unfortunately):

From a54e5f9c3297e2882e8dced8e78a49a85e026f29 Mon Sep 17 00:00:00 2001
From: Johannes Altmanninger <aclopte@gmail.com>
Date: Wed, 3 Apr 2024 14:57:20 +0200
Subject: [PATCH 1/2] If offset_encoding is explicitly specified, always use
 that

It looks like when we ask rust-analyzer that we prefer utf-16 but
tell it that we also support utf-8, it will ignore our preference and
always use utf-8. I think this violates LSP; I'm not sure if we want to
work around it like below (because that might make user errors worse)
---
 src/capabilities.rs | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/src/capabilities.rs b/src/capabilities.rs
index b19cb77..9e72f24 100644
--- a/src/capabilities.rs
+++ b/src/capabilities.rs
@@ -362,17 +362,17 @@ pub fn initialize(meta: EditorMeta, ctx: &mut Context) {
                                 }),
                                 stale_request_support: None,
                                 position_encodings: Some(match preferred_offset_encoding {
-                                    None | Some(OffsetEncoding::Utf8) => {
+                                    None => {
                                         vec![
                                             PositionEncodingKind::UTF8,
                                             PositionEncodingKind::UTF16,
                                         ]
                                     }
+                                    Some(OffsetEncoding::Utf8) => {
+                                        vec![PositionEncodingKind::UTF8]
+                                    }
                                     Some(OffsetEncoding::Utf16) => {
-                                        vec![
-                                            PositionEncodingKind::UTF16,
-                                            PositionEncodingKind::UTF8,
-                                        ]
+                                        vec![PositionEncodingKind::UTF16]
                                     }
                                 }),
                             }),
-- 
2.44.0.413.gd6fd04375f


From 9e3600961c690f52e5a5219c8b93472ec4c093a1 Mon Sep 17 00:00:00 2001
From: Johannes Altmanninger <aclopte@gmail.com>
Date: Wed, 3 Apr 2024 14:59:41 +0200
Subject: [PATCH 2/2] Use utf-16 position encoding for rust-analyzer

---
 kak-lsp.toml | 1 +
 1 file changed, 1 insertion(+)

diff --git a/kak-lsp.toml b/kak-lsp.toml
index f194628..e2fabd3 100644
--- a/kak-lsp.toml
+++ b/kak-lsp.toml
@@ -369,6 +369,7 @@ command = "ocamllsp"
 filetypes = ["rust"]
 roots = ["Cargo.toml"]
 command = "rust-analyzer"
+offset_encoding = "utf-16"
 # command = "sh"
 # args = [
 #     "-c",
-- 
2.44.0.413.gd6fd04375f

@xnuk
Copy link
Author

xnuk commented Apr 3, 2024

Thanks! Should I close this issue then?

@krobelus
Copy link
Member

krobelus commented Apr 3, 2024

yeah I think so

@krobelus krobelus closed this as completed Apr 3, 2024
@krobelus krobelus added the external dependency Something need to be fixed upstream label Apr 3, 2024
bors added a commit to rust-lang/rust-analyzer that referenced this issue Apr 3, 2024
Fix off-by-one error converting to LSP UTF8 offsets with multi-byte char

On this file,

```rust
fn main() {
    let 된장 = 1;
}
```

when using `"positionEncodings":["utf-16"]` I get an "unused variable" diagnostic on the variable
name (codepoint offset range `8..10`). So far so good.

When using `positionEncodings":["utf-8"]`, I expect to get the equivalent range in bytes (LSP:
"Character offsets count UTF-8 code units (e.g bytes)."), which is `8..14`, because both
characters are 3 bytes in UTF-8.  However I actually get `10..14`.

Looks like this is because we accidentally treat a 1-based index as an offset value: when
converting from our internal char-indices to LSP byte offsets, we look at one character to many.
This causes wrong results if the extra character is a multi-byte one, such as when computing
the start coordinate of 된장.

Fix that by actually passing an offset. While at it, fix the variable name of the line number,
which is not an offset (yet).

Originally reported at kakoune-lsp/kakoune-lsp#740
lnicola pushed a commit to lnicola/rust that referenced this issue Apr 7, 2024
On this file,

```rust
fn main() {
    let 된장 = 1;
}
```

when using `"positionEncodings":["utf-16"]` I get an "unused variable" diagnostic on the variable
name (codepoint offset range `8..10`). So far so good.

When using `positionEncodings":["utf-8"]`, I expect to get the equivalent range in bytes (LSP:
"Character offsets count UTF-8 code units (e.g bytes)."), which is `8..14`, because both
characters are 3 bytes in UTF-8.  However I actually get `10..14`.

Looks like this is because we accidentally treat a 1-based index as an offset value: when
converting from our internal char-indices to LSP byte offsets, we look at one character to many.
This causes wrong results if the extra character is a multi-byte one, such as when computing
the start coordinate of 된장.

Fix that by actually passing an offset. While at it, fix the variable name of the line number,
which is not an offset (yet).

Originally reported at kakoune-lsp/kakoune-lsp#740
lnicola pushed a commit to lnicola/rust that referenced this issue Apr 7, 2024
…Veykril

Fix off-by-one error converting to LSP UTF8 offsets with multi-byte char

On this file,

```rust
fn main() {
    let 된장 = 1;
}
```

when using `"positionEncodings":["utf-16"]` I get an "unused variable" diagnostic on the variable
name (codepoint offset range `8..10`). So far so good.

When using `positionEncodings":["utf-8"]`, I expect to get the equivalent range in bytes (LSP:
"Character offsets count UTF-8 code units (e.g bytes)."), which is `8..14`, because both
characters are 3 bytes in UTF-8.  However I actually get `10..14`.

Looks like this is because we accidentally treat a 1-based index as an offset value: when
converting from our internal char-indices to LSP byte offsets, we look at one character to many.
This causes wrong results if the extra character is a multi-byte one, such as when computing
the start coordinate of 된장.

Fix that by actually passing an offset. While at it, fix the variable name of the line number,
which is not an offset (yet).

Originally reported at kakoune-lsp/kakoune-lsp#740
HKalbasi pushed a commit to HKalbasi/rust-analyzer that referenced this issue Apr 13, 2024
On this file,

```rust
fn main() {
    let 된장 = 1;
}
```

when using `"positionEncodings":["utf-16"]` I get an "unused variable" diagnostic on the variable
name (codepoint offset range `8..10`). So far so good.

When using `positionEncodings":["utf-8"]`, I expect to get the equivalent range in bytes (LSP:
"Character offsets count UTF-8 code units (e.g bytes)."), which is `8..14`, because both
characters are 3 bytes in UTF-8.  However I actually get `10..14`.

Looks like this is because we accidentally treat a 1-based index as an offset value: when
converting from our internal char-indices to LSP byte offsets, we look at one character to many.
This causes wrong results if the extra character is a multi-byte one, such as when computing
the start coordinate of 된장.

Fix that by actually passing an offset. While at it, fix the variable name of the line number,
which is not an offset (yet).

Originally reported at kakoune-lsp/kakoune-lsp#740
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
external dependency Something need to be fixed upstream
Projects
None yet
Development

No branches or pull requests

2 participants