Handle UTF-16 code unit offsets in file changes API #24
This pulls in new dependencies and modifies public API.
Thanks for the PR!
Since we embed the index base in the Span type, we could also embed the 'text atom' there as well. Should we do that?
I think we should. It seems like it would be easy to go wrong otherwise. Also, the SpanAtom seems more like a property of the span or length or whatever, rather than of the change, so I think it would be good to tie those together.
The tricky bit is that conversion between those requires a source buffer: to change from 0-based to 1-based indexing we only need to add or subtract 1, but to translate a column offset we need the actual string of characters for which the range is defined.
We don't need to provide conversion methods as part of the type, they can be in the VFS or wherever.
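To illustrate why the column conversion needs the buffer while the index-base conversion does not, here is a minimal sketch of a free-standing conversion function. The name `utf16_col_to_byte_offset` and its signature are hypothetical and are not taken from this PR or from the VFS API; it only assumes the text of a single line is available as a `&str`.

```rust
/// Hypothetical helper: converts a column expressed in UTF-16 code units
/// into a byte offset into `line`.
///
/// Returns `None` if the column lies past the end of the line or would
/// split a surrogate pair (i.e. land in the middle of one `char`).
fn utf16_col_to_byte_offset(line: &str, utf16_col: usize) -> Option<usize> {
    // Running count of UTF-16 code units seen before the current char.
    let mut units = 0;
    for (byte_idx, ch) in line.char_indices() {
        if units == utf16_col {
            return Some(byte_idx);
        }
        if units > utf16_col {
            // The requested column falls inside the previous char.
            return None;
        }
        // 1 for chars in the Basic Multilingual Plane, 2 otherwise.
        units += ch.len_utf16();
    }
    // The column may also point exactly at the end of the line.
    if units == utf16_col {
        Some(line.len())
    } else {
        None
    }
}
```

Without `line`, there is no way to know how many code units each character occupies, which is the reason the conversion cannot be a pure arithmetic adjustment like the 0-based to 1-based one.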
Should I come up with more tests? The previously crashing UTF-16 test case now works, and RLS using this patched vfs version doesn't crash and successfully reformats the test case from rust-lang/rls#1104.
No, this looks fine.
Done! However, this seems like an ergonomics loss: while this binds the range length and the span under a single type, it introduces more type-level machinery and makes the actual types more verbose. We also can't now process a series of changes with different span atoms. Maybe it'd make more sense to implement an enum ChangeText with those two USV/UTF-16 variants instead? This wouldn't infect outer types and the corresponding function signatures with another generic parameter. @nrc what do you think?
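A rough sketch of what the enum approach could look like, to make the trade-off concrete. Everything here is an assumption for illustration: the variant fields, the single-line simplification, and the helpers `usv_to_byte`, `utf16_to_byte` and `apply_change` are hypothetical and are not the code from either commit.

```rust
/// Hypothetical change type: the unit of `start` and `len` is carried by
/// the variant, so no extra generic parameter leaks into outer types.
#[derive(Debug, Clone, PartialEq)]
enum ChangeText {
    /// Offsets counted in Unicode scalar values (`char`s).
    Usv { start: usize, len: usize, text: String },
    /// Offsets counted in UTF-16 code units (as in the LSP spec).
    Utf16 { start: usize, len: usize, text: String },
}

/// Byte offset of the `col`-th scalar value; `col == char count` maps to the end.
fn usv_to_byte(line: &str, col: usize) -> Option<usize> {
    line.char_indices()
        .map(|(i, _)| i)
        .chain(std::iter::once(line.len()))
        .nth(col)
}

/// Byte offset of the `col`-th UTF-16 code unit; `None` if out of range
/// or in the middle of a surrogate pair.
fn utf16_to_byte(line: &str, col: usize) -> Option<usize> {
    let mut units = 0;
    for (i, ch) in line.char_indices() {
        if units == col {
            return Some(i);
        }
        if units > col {
            return None;
        }
        units += ch.len_utf16();
    }
    if units == col { Some(line.len()) } else { None }
}

/// Applies one change to a single line, whichever unit it uses.
fn apply_change(line: &str, change: &ChangeText) -> Option<String> {
    let (start, end, text) = match change {
        ChangeText::Usv { start, len, text } => {
            (usv_to_byte(line, *start)?, usv_to_byte(line, start + len)?, text)
        }
        ChangeText::Utf16 { start, len, text } => {
            (utf16_to_byte(line, *start)?, utf16_to_byte(line, start + len)?, text)
        }
    };
    let mut out = String::with_capacity(line.len() + text.len());
    out.push_str(&line[..start]);
    out.push_str(text);
    out.push_str(&line[end..]);
    Some(out)
}
```

Because both variants share one type, a `Vec<ChangeText>` can hold a mixed series of USV and UTF-16 changes, which the type-parameter version cannot express without boxing or a wrapper.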
This sounds like a good compromise to me
62ee1b9 to 3fde048
Pushed a commit with the second approach. Does this look good now? EDIT: For comparison, the previous commit with the type parameter is at Xanewok@62ee1b9.
Looks good apart from the little nit about byte_in_str
Thanks!
This adds the ability to change text with ranges defined in UTF-16 code units (per LSP spec, needed for the RLS).
Since we embed the index base in the Span type, we could also embed the 'text atom' there as well. Should we do that?
The tricky bit is that conversion between those requires a source buffer: to change from 0-based to 1-based indexing we only need to add or subtract 1, but to translate a column offset we need the actual string of characters for which the range is defined.
Since we need the source buffer, I decided that it'd be best to implement that API on VFS directly, because it operates on the buffer directly.
Should I come up with more tests? The previously crashing UTF-16 test case now works, and RLS using this patched vfs version doesn't crash and successfully reformats the test case from rust-lang/rls#1104.