Add UTF-8 string manipulation functions to the lua stdlib #14281

Open
ronisbr opened this issue Apr 3, 2021 · 15 comments
Labels
encoding enhancement feature request unicode 💩 (multibyte) unicode characters
Projects

Comments

@ronisbr

ronisbr commented Apr 3, 2021

Since Neovim targets Lua 5.1, we do not have a standard library for handling UTF-8 strings (the utf8 module only arrived in Lua 5.3). This can be a problem when writing Lua scripts that manipulate buffers containing UTF-8 text.
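For illustration, this is roughly the kind of helper that is missing: Lua 5.3's utf8.codes walks a byte string and yields each codepoint together with its 1-based byte position. The sketch below re-creates that logic in Python (the name utf8_codes is borrowed from Lua for readability; this is not Neovim code). As a simplification, it validates lead and continuation bytes but does not reject overlong encodings or surrogates.

```python
from typing import Iterator, Tuple


def utf8_codes(data: bytes) -> Iterator[Tuple[int, int]]:
    """Yield (1-based byte position, codepoint) pairs, like Lua 5.3's utf8.codes.

    Simplified decoder: checks lead/continuation bytes and truncation,
    but does not reject overlong forms or surrogate codepoints.
    """
    i = 0
    while i < len(data):
        lead = data[i]
        if lead < 0x80:               # 0xxxxxxx: ASCII, 1 byte
            n, cp = 1, lead
        elif lead >> 5 == 0b110:      # 110xxxxx: 2-byte sequence
            n, cp = 2, lead & 0x1F
        elif lead >> 4 == 0b1110:     # 1110xxxx: 3-byte sequence
            n, cp = 3, lead & 0x0F
        elif lead >> 3 == 0b11110:    # 11110xxx: 4-byte sequence
            n, cp = 4, lead & 0x07
        else:
            raise ValueError(f"invalid lead byte at position {i + 1}")
        if i + n > len(data):
            raise ValueError(f"truncated sequence at position {i + 1}")
        for k in range(1, n):
            cont = data[i + k]
            if cont >> 6 != 0b10:     # continuation bytes must look like 10xxxxxx
                raise ValueError(f"invalid continuation byte at position {i + k + 1}")
            cp = (cp << 6) | (cont & 0x3F)
        yield i + 1, cp
        i += n


text = "aé😀".encode("utf-8")              # 7 bytes in total
print(list(utf8_codes(text)))              # [(1, 97), (2, 233), (4, 128512)]
print(sum(1 for _ in utf8_codes(text)))    # 3 codepoints
```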

@ronisbr ronisbr added the enhancement feature request label Apr 3, 2021
@bfredl
Member

bfredl commented Apr 3, 2021

We should expose select functions from mbyte.c. In particular, it would be good for scripts to have direct access to Neovim's specific notion of a grapheme cluster (which is a bit limited, but might be upgraded later).

@bfredl bfredl added encoding unicode 💩 (multibyte) unicode characters labels Apr 3, 2021
@ronisbr
Author

ronisbr commented Apr 3, 2021

Nice! In the meantime, we can always clone and use, for example, https://github.com/Stepets/utf8.lua

@ronisbr
Author

ronisbr commented Apr 3, 2021

Btw, it would be very nice to have access to the functions in mbyte.c, especially those that convert between a UTF-8 character and its display width.
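As a rough illustration of what such a conversion involves, the Python sketch below estimates how many screen cells a string occupies using the standard unicodedata module: nonspacing and enclosing marks take no cell, East Asian wide/fullwidth characters take two, and everything else takes one. This is an assumed heuristic, not Neovim's algorithm; control characters are not special-cased, and Neovim's own answer also depends on options such as 'ambiwidth', which this sketch ignores by treating ambiguous-width characters as a single cell.

```python
import unicodedata


def cell_width(s: str) -> int:
    """Estimate display width in screen cells (heuristic, not Neovim's algorithm)."""
    width = 0
    for ch in s:
        if unicodedata.category(ch) in ("Mn", "Me"):
            continue  # nonspacing/enclosing marks draw on the preceding cell
        if unicodedata.east_asian_width(ch) in ("W", "F"):
            width += 2  # wide or fullwidth, e.g. CJK ideographs
        else:
            width += 1  # everything else, including ambiguous ("A"), as one cell
    return width


print(cell_width("abc"))       # 3
print(cell_width("日本"))      # 4
print(cell_width("e\u0301"))   # 1: "e" plus a combining acute accent
```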

@clason
Member

clason commented Apr 15, 2021

On a related note (assuming this library includes offset calculations etc.), it would be useful to be able to compute those directly for files that are not loaded (without the current bufload step, which may trigger side effects).

Julian added a commit to Julian/lean.nvim that referenced this issue Jun 21, 2021
Also implement this via the Neovim API rather than normal mode.

I had a hell of a time before I realized the LSP server was speaking characters
but nvim_buf_set_text speaks bytes.

See neovim/neovim#14281 for the upstream issue that'd probably help.

Refs: #57
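The mismatch described in this commit is between LSP positions, whose character offsets count UTF-16 code units by default, and Neovim buffer APIs such as nvim_buf_set_text, which take byte offsets. A minimal Python sketch of the conversion follows; the function name utf16_col_to_byte_col is hypothetical, and the Neovim-side helper for this job is vim.str_byteindex (listed further down in this thread).

```python
def utf16_col_to_byte_col(line: str, utf16_col: int) -> int:
    """Convert an LSP-style UTF-16 column to a UTF-8 byte column (both 0-based).

    Hypothetical helper for illustration. A column pointing into the middle of
    a surrogate pair is rounded up past that character; a column beyond the end
    of the line is clamped to the line's byte length.
    """
    units = 0
    byte_col = 0
    for ch in line:
        if units >= utf16_col:
            break
        units += 2 if ord(ch) > 0xFFFF else 1  # astral chars need a surrogate pair
        byte_col += len(ch.encode("utf-8"))
    return byte_col


line = "añ😀b"
print(utf16_col_to_byte_col(line, 2))  # 3: "a" (1 byte) + "ñ" (2 bytes)
print(utf16_col_to_byte_col(line, 4))  # 7: the emoji is 2 UTF-16 units but 4 bytes
```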
@clason clason added this to To do in Neovim API Jul 16, 2021
@mjlbach
Contributor

mjlbach commented Oct 30, 2021

@bfredl What do you think remains here with vim.str_utf_{pos,start,end} now in core?
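Roughly, vim.str_utf_pos reports the 1-based byte positions where each codepoint of a string starts, while vim.str_utf_start and vim.str_utf_end help locate the first or last byte of the codepoint that contains a given byte index. All of this rests on one property of UTF-8: continuation bytes always match 10xxxxxx. The Python sketch below mirrors that idea; it is not Neovim's implementation, and the function names are hypothetical.

```python
from typing import List


def utf_start_positions(data: bytes) -> List[int]:
    """1-based byte positions where each codepoint starts (cf. vim.str_utf_pos)."""
    # A byte starts a codepoint unless it is a continuation byte (10xxxxxx).
    return [i + 1 for i, b in enumerate(data) if (b & 0xC0) != 0x80]


def codepoint_start(data: bytes, index: int) -> int:
    """Walk back from a 1-based byte index to the first byte of its codepoint."""
    i = index - 1                                # convert to 0-based
    while i > 0 and (data[i] & 0xC0) == 0x80:    # skip continuation bytes
        i -= 1
    return i + 1


data = "aé日".encode("utf-8")       # bytes: 61 | C3 A9 | E6 97 A5
print(utf_start_positions(data))    # [1, 2, 4]
print(codepoint_start(data, 6))     # 4: byte 6 is inside "日"
```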

@bfredl
Member

bfredl commented Oct 31, 2021

We have added functions for codepoints. For some use cases, functions for composing characters (i.e. Vim's idea of grapheme clusters, the smallest editable unit) might be needed.
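To make the codepoint vs. composing-character distinction concrete: "é" can be stored as "e" followed by U+0301 COMBINING ACUTE ACCENT, which is two codepoints but a single editable unit. The Python sketch below groups each base character with the combining marks that follow it. It is a simplified approximation (it only looks at the Unicode "Mark" categories) and is neither Vim's actual composing-character logic nor full grapheme segmentation as defined in UAX #29.

```python
import unicodedata
from typing import List


def split_composed(s: str) -> List[str]:
    """Group each base character with its following combining marks (simplified)."""
    units: List[str] = []
    for ch in s:
        # Mn/Mc/Me are the "Mark" categories; attach them to the previous unit.
        if units and unicodedata.category(ch).startswith("M"):
            units[-1] += ch
        else:
            units.append(ch)
    return units


s = "e\u0301a"                    # "éa" with a decomposed é
print(len(s))                     # 3 codepoints
print(len(split_composed(s)))     # 2 units: "e\u0301" and "a"
```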

@clason
Member

clason commented Nov 23, 2021

Related to the original request (not so much the current plan): https://github.com/uga-rosa/utf8.nvim

ajitid added a commit to ajitid/dotfiles-2023 that referenced this issue Jul 2, 2022
resources:
vim.str_utf_pos // utf 8

vim.fn.getline

vim.fn.setcursorcharpos
vim.fn.cursor

vim.str_utfindex

vim.str_utfindex // [32, 16]
vim.str_byteindex

clangd/clangd#3
neovim/neovim#14542
https://github.com/neovim/neovim/pull/16252/files

neovim/neovim#14281
ajitid added a commit to ajitid/dotfiles-2023 that referenced this issue Jul 2, 2022
resources:
vim.str_utf_pos // utf 8

vim.fn.getline

vim.fn.setcursorcharpos
vim.fn.cursor

vim.str_utfindex

vim.str_utfindex // [32, 16]
vim.str_byteindex

clangd/clangd#3
neovim/neovim#14542
https://github.com/neovim/neovim/pull/16252/files

neovim/neovim#14281


offsetEncoding is off-spec, and using it might break in the future;
see neovim/neovim#17049 (comment)
and hrsh7th/nvim-cmp#726 (comment)
microsoft/language-server-protocol@f9c85d5
@ronisbr
Author

ronisbr commented Oct 20, 2023

Hi! Are there any plans to expose those functions anytime soon?

@clason
Member

clason commented Oct 20, 2023

Are you volunteering to work on it?

@ronisbr
Author

ronisbr commented Oct 20, 2023

Unfortunately, I cannot. I am just asking if there is any update, since I do not follow Neovim development closely... Is there anything wrong with that?

The lack of UTF-8 support is the reason this bug was opened: Vonr/align.nvim#15

That's why I decided to ping this issue.

For all my use cases, I bundled https://github.com/uga-rosa/utf8.nvim

@ronisbr
Author

ronisbr commented Oct 20, 2023

On second thought, is there anything really wrong with bundling the code from https://github.com/uga-rosa/utf8.nvim inside Neovim? It could be used as-is, since its API is very similar to the utf8 library in Lua 5.3. At a later stage, we could replace those functions with low-level ones written in C.

If that approach is acceptable, I think I could help.

@clason
Member

clason commented Oct 20, 2023

No, we will not simply bundle an external plugin. As bfredl explained, we need an API that exposes Neovim's own string handling capabilities. This is the only way of making sure that these functions play nicely with other Neovim APIs. And that requires thought and effort -- as much as other things that are higher priority currently.

@ronisbr
Author

ronisbr commented Oct 20, 2023

Ok, so the answer to my question "Are there any plans to expose those functions anytime soon?" would be just "No, unfortunately, there are not".

@clason
Member

clason commented Oct 20, 2023

I mean, this does not simply happen by itself, despite the suggestive impersonal phrasing. It will happen when someone who personally cares about this topic invests the time and effort to do it -- same as any open source development.

@ronisbr
Author

ronisbr commented Oct 20, 2023

What is your point? I pretty much know how open-source development works. I did not demand anything from anyone. I just asked if anyone has any plans to close this issue. You really do not need to be harsh over one simple question.

Development

No branches or pull requests

4 participants