[question] Line/column information #17

suhr · 2019-05-04T15:27:48Z

It seems like SyntaxNode only has a TextRange, which contains only information about absolute offsets. But how do you handle line/column ranges (necessary for printing errors)?

matklad · 2019-05-04T15:35:51Z

In rust-analyzer, we maintain a separate index to translate UTF-8 offsets into (invalid) UTF-16 line/column positions, as required by LSP:

https://github.com/rust-analyzer/rust-analyzer/blob/fcdb387f0d7e76f325a858e4463efd5d7ed3efc3/crates/ra_ide_api/src/line_index.rs
https://github.com/rust-analyzer/rust-analyzer/blob/fcdb387f0d7e76f325a858e4463efd5d7ed3efc3/crates/ra_ide_api/src/db.rs#L61-L68
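To illustrate the idea of a separate index, here is a minimal sketch. It is not rust-analyzer's actual `LineIndex`; the type and the `line_col_utf16` method are hypothetical names for this example, and it assumes `\n` line endings and that the offset lies on a character boundary. The index stores only the byte offset at which each line starts, then derives a UTF-16 column on demand.

```rust
/// Hypothetical, simplified line index (not rust-analyzer's real type).
struct LineIndex {
    /// Byte offset at which each line starts; the first entry is always 0.
    line_starts: Vec<usize>,
}

impl LineIndex {
    /// Scan the text once and record where every line begins.
    fn new(text: &str) -> Self {
        let mut line_starts = vec![0];
        for (i, b) in text.bytes().enumerate() {
            if b == b'\n' {
                line_starts.push(i + 1);
            }
        }
        LineIndex { line_starts }
    }

    /// Translate a UTF-8 byte offset into a zero-based (line, column) pair,
    /// with the column measured in UTF-16 code units.
    /// Assumption: `offset` is on a char boundary of `text`.
    fn line_col_utf16(&self, text: &str, offset: usize) -> (usize, usize) {
        // Binary search for the last line start that is <= offset.
        let line = self.line_starts.partition_point(|&start| start <= offset) - 1;
        let line_start = self.line_starts[line];
        // Sum the UTF-16 length of every char between line start and offset.
        let col: usize = text[line_start..offset]
            .chars()
            .map(char::len_utf16)
            .sum();
        (line, col)
    }
}
```

With the text `"a\n😀b\nc"`, the byte offset of `b` is 6, and the sketch reports line 1, column 2, because the emoji occupies two UTF-16 code units. The syntax tree itself never needs to know about any of this.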

suhr · 2019-05-04T15:44:45Z

A separate index sounds somewhat inconvenient. By the way, why UTF-16?

matklad · 2019-05-04T15:50:10Z

> By the way, why UTF-16?

LSP requires UTF-16

This is actually one of the main reasons why a separate index makes sense: there's no universal definition of line/column. For some editors it is UTF-16 code units (VS Code), for some it is Unicode characters (Emacs), and I bet for others it could be grapheme clusters as well.
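A small sketch of how these definitions diverge for the same position, using only the standard library. The function name `columns` is made up for this example, and grapheme clusters are left out because counting them needs an external crate.

```rust
/// Column of the byte `offset` within `line`, under three conventions.
/// Returns (UTF-8 bytes, UTF-16 code units, Unicode scalar values).
/// Assumption: `offset` is on a char boundary of `line`.
fn columns(line: &str, offset: usize) -> (usize, usize, usize) {
    let prefix = &line[..offset];
    let utf8 = prefix.len();
    let utf16 = prefix.encode_utf16().count();
    let scalars = prefix.chars().count();
    (utf8, utf16, scalars)
}
```

For the line `"a😀b"`, the character `b` sits at byte offset 5, so `columns("a😀b", 5)` gives three different answers for one position: 5 bytes, 3 UTF-16 code units, and 2 characters.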

Determining the column number is not as simple as performing byte arithmetic, because certain characters have different widths. Even if we only accepted ASCII, control characters aren't visible to the user.

This uses the unicode-width crate as an alternative to POSIX wcwidth, to determine (hopefully) the number of fixed-width cells that a unicode character will take up on a terminal. For example, control characters are zero-width, while an emoji is likely double-width. See test cases for more information on that.

There is also the unicode-segmentation crate, which can handle extended grapheme clusters and such, but (a) we'll be outputting the line to the terminal and (b) there's no guarantee that the user's editor displays grapheme clusters as a single column. LSP measures in UTF-16, apparently. I use both Emacs and Vim from a terminal, so unicode-width applies to me. There's too much variation to try to solve that right now.

The columns can be considered a visual span---this gives us enough information to draw line annotations, which will happen soon.

Here are some useful links:

- https://hsivonen.fi/string-length/
- https://unicode.org/reports/tr29/
- rust-analyzer/rowan#17
- https://www.reddit.com/r/rust/comments/gpw2ra/how_is_the_rust_compiler_able_to_tell_the_visible/

DEV-10935

shilangyu · 2023-05-12T11:51:46Z

FYI: thanks to @azdavis line-index has been extracted to a crate, so you can use it directly.

suhr closed this as completed May 10, 2019

christoph-heiss mentioned this issue Nov 16, 2022

Analyzer: Provide line and column information along with offsets cybertec-postgresql/poc-plpgsql-analyzer#36

Closed
