[question] Line/column information #17

Closed
suhr opened this issue May 4, 2019 · 4 comments

Comments

@suhr

suhr commented May 4, 2019

It seems like SyntaxNode only has a TextRange, which contains only information about absolute offsets. How do you handle line/column ranges (necessary for printing errors)?

@matklad
Member

matklad commented May 4, 2019

@suhr
Author

suhr commented May 4, 2019

A separate index sounds somewhat inconvenient. By the way, why UTF-16?

@matklad
Member

matklad commented May 4, 2019

> By the way, why UTF-16?

LSP requires UTF-16

This is actually one of the main reasons why a separate index makes sense: there's no universal definition of line/column. For some editors it is UTF-16 code units (VS Code), for some it is Unicode characters (Emacs), and I bet for others it could be grapheme clusters as well.
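
To make the "separate index" idea concrete, here is a minimal sketch of one way it could look. It is not rowan's or rust-analyzer's actual API: `LineIndex`, `LineCol`, and `ColumnUnit` are hypothetical names made up for illustration. The index stores only the byte offset at which each line starts, and the column is computed on demand in whichever unit the consumer wants.

```rust
/// Hypothetical helper (not rowan's API): maps absolute byte offsets
/// to line/column pairs.
struct LineIndex {
    /// Byte offset at which each line starts; the first entry is always 0.
    line_starts: Vec<usize>,
}

/// Zero-based line and column.
#[derive(Debug, PartialEq, Eq)]
struct LineCol {
    line: usize,
    col: usize,
}

/// The unit in which a column is measured.
#[allow(dead_code)]
enum ColumnUnit {
    Utf8Bytes,
    Utf16CodeUnits,
    Chars,
}

impl LineIndex {
    fn new(text: &str) -> Self {
        let mut line_starts = vec![0];
        for (i, b) in text.bytes().enumerate() {
            if b == b'\n' {
                // The next line starts right after the newline byte.
                line_starts.push(i + 1);
            }
        }
        LineIndex { line_starts }
    }

    /// `offset` must lie on a char boundary of `text`, otherwise slicing panics.
    fn line_col(&self, text: &str, offset: usize, unit: ColumnUnit) -> LineCol {
        // Count the line starts that are <= offset; the last of them is our line.
        let line = self.line_starts.partition_point(|&start| start <= offset) - 1;
        let prefix = &text[self.line_starts[line]..offset];
        let col = match unit {
            ColumnUnit::Utf8Bytes => prefix.len(),
            ColumnUnit::Utf16CodeUnits => prefix.chars().map(char::len_utf16).sum::<usize>(),
            ColumnUnit::Chars => prefix.chars().count(),
        };
        LineCol { line, col }
    }
}
```

For the text `"a😀b\nçd"`, the `b` sits at byte offset 5. Its column is 5 in UTF-8 bytes, 3 in UTF-16 code units, and 2 in Unicode characters, which is why the unit cannot be baked into the syntax tree.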

@suhr suhr closed this as completed May 10, 2019
mikegerwitz pushed a commit to lovullo/tame that referenced this issue Nov 11, 2022
Determining the column number is not as simple as performing byte
arithmetic, because certain characters have different widths.  Even if we
only accepted ASCII, control characters aren't visible to the user.

This uses the unicode-width crate as an alternative to POSIX wcwidth, to
determine (hopefully) the number of fixed-width cells that a Unicode
character will take up on a terminal.  For example, control characters are
zero-width, while an emoji is likely double-width.  See test cases for more
information on that.

There is also the unicode-segmentation crate, which can handle extended
grapheme clusters and such, but (a) we'll be outputting the line to the
terminal and (b) there's no guarantee that the user's editor displays
grapheme clusters as a single column.  LSP measures in UTF-16,
apparently.  I use both Emacs and Vim from a terminal, so unicode-width
applies to me.  There's too much variation to try to solve that right now.

The columns can be considered a visual span---this gives us enough
information to draw line annotations, which will happen soon.

Here are some useful links:

  - https://hsivonen.fi/string-length/
  - https://unicode.org/reports/tr29/
  - rust-analyzer/rowan#17
  - https://www.reddit.com/r/rust/comments/gpw2ra/how_is_the_rust_compiler_able_to_tell_the_visible/

DEV-10935
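
As a rough illustration of the visual-column idea described in the commit message above, the sketch below sums per-character cell widths instead of counting bytes. It does not use the unicode-width crate: `approx_cell_width` is a hypothetical, deliberately crude stand-in with a handful of hard-coded ranges, and `visual_column` is likewise a made-up name. A real implementation would rely on a proper width table.

```rust
/// Crude, hypothetical stand-in for a real width table such as the one in
/// the unicode-width crate. It only knows about control characters and a few
/// wide ranges; combining marks and many other cases are NOT handled.
fn approx_cell_width(c: char) -> usize {
    let cp = c as u32;
    if c.is_control() {
        0 // control characters are not visible
    } else if (0x1100..=0x115F).contains(&cp)   // Hangul Jamo
        || (0x2E80..=0xA4CF).contains(&cp)      // CJK blocks (approximate)
        || (0xAC00..=0xD7A3).contains(&cp)      // Hangul syllables
        || (0xF900..=0xFAFF).contains(&cp)      // CJK compatibility ideographs
        || (0xFF00..=0xFF60).contains(&cp)      // fullwidth forms
        || (0x1F300..=0x1FAFF).contains(&cp)    // most emoji (approximate)
    {
        2 // likely double-width on a terminal
    } else {
        1
    }
}

/// Zero-based visual column of `byte_offset` within a single line.
/// `byte_offset` must lie on a char boundary, otherwise slicing panics.
fn visual_column(line: &str, byte_offset: usize) -> usize {
    line[..byte_offset].chars().map(approx_cell_width).sum()
}
```

With this approximation, the `x` in `"世界x"` is at byte offset 6 but visual column 4, and the `b` in `"a\u{7}b"` is at byte offset 2 but visual column 1.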
@shilangyu

FYI: thanks to @azdavis line-index has been extracted to a crate, so you can use it directly.
