feat: add custom commands for piping and unpiping text #515

polvalente · 2021-03-16T06:24:54Z

This PR aims to bring the functionality developed here to ElixirLS as a custom language server command, as suggested by @axelson.

~~As of opening of this PR, this is still a work in progress so implementation details can be discussed~~

axelson

I'm excited for this change. I think it will be a helpful refactoring to support.

In terms of code structure can you extract a "pure" functional module from ElixirLS.LanguageServer.Providers.ExecuteCommand.ManipulatePipes to handle the core piping and unpiping logic? I guess this might result in an even longer name which is a bit awkward but is probably fine for now. The separate module will make adding more test cases easier because they'll require less setup.

Also I believe it should be possible to implement this without a regex scan, which should help the resiliency of this code. Maybe @lukaszsamson has some specific ideas on that front.

polvalente · 2021-03-17T19:49:03Z

I'm excited for this change. I think it will be a helpful refactoring to support.

In terms of code structure can you extract a "pure" functional module from ElixirLS.LanguageServer.Providers.ExecuteCommand.ManipulatePipes to handle the core piping and unpiping logic? I guess this might result in an even longer name which is a bit awkward but is probably fine for now. The separate module will make adding more test cases easier because they'll require less setup.

Also I believe it should be possible to implement this without a regex scan, which should help the resiliency of this code. Maybe @lukaszsamson has some specific ideas on that front.

Sure, I'll split the module further. Should I leave that as a submodule of the command?

yashin5 · 2021-03-17T19:54:12Z

apps/language_server/lib/language_server/providers/execute_command/manipulate_pipes.ex

+        fn current_char, remaining_text, current_line, current_col, acc ->
+          if current_line == line and current_col == col do
+            {:ok, function_call, call_range} =
+              get_function_call(line, col, acc.walked_text, current_char, remaining_text)
+
+            {remaining_text,
+             %{
+               acc
+               | walked_text: acc.walked_text <> current_char,
+                 function_call: function_call,
+                 range: call_range
+             }}
+          else
+            {remaining_text, %{acc | walked_text: acc.walked_text <> current_char}}
+          end


Suggested change

fn current_char, remaining_text, current_line, current_col, acc ->

if current_line == line and current_col == col do

{:ok, function_call, call_range} =

get_function_call(line, col, acc.walked_text, current_char, remaining_text)

{remaining_text,

%{

acc

| walked_text: acc.walked_text <> current_char,

function_call: function_call,

range: call_range

}}

else

{remaining_text, %{acc | walked_text: acc.walked_text <> current_char}}

end

fn current_char, remaining_text, ^line, ^col, acc ->

{:ok, function_call, call_range} =

get_function_call(line, col, acc.walked_text, current_char, remaining_text)

{remaining_text,

%{

acc

| walked_text: acc.walked_text <> current_char,

function_call: function_call,

range: call_range

}}

current_char, remaining_text, _current_line, _current_col, acc ->

{remaining_text, %{acc | walked_text: acc.walked_text <> current_char}}

end

polvalente · 2021-03-18T03:10:59Z

@axelson I've split the module into one which handles the AST formatting and another which handles the string parsing and the command interface. Should I also move the string parsing to a separate module? IMO it's good to have a kind of "integration test" so I wouldn't remove any tests anyway.

I've also managed to refactor the code so Regex.scan could be removed

polvalente · 2021-03-18T03:14:21Z

apps/language_server/test/providers/execute_command/manipulate_pipes/ast_test.exs

@@ -0,0 +1,159 @@
+defmodule ElixirLS.LanguageServer.Providers.ExecuteCommand.ManipulatePipes.ASTTest do


Most of these test I imported from my original repo, so if any of these tests feel redundant, feel free to make suggestions for me to remove them!

polvalente · 2021-03-18T03:17:24Z

Finally, there are some tests failing in earlier versions of Elixir but still yielding results that make sense. Should I include an assert X or Y in those, or maybe make the assertion depend on the Elixir version somehow?

axelson

Finally, there are some tests failing in earlier versions of Elixir but still yielding results that make sense. Should I include an assert X or Y in those, or maybe make the assertion depend on the Elixir version somehow?

I put some details in a comment, but I think we should change the expected result based on the elixir version

Running out of time for this review, but I wanted to send what I have. Will take another look in the future.

apps/language_server/test/providers/execute_command/manipulate_pipes/ast_test.exs

apps/language_server/lib/language_server/providers/execute_command/manipulate_pipes.ex

apps/language_server/.tool-versions

apps/language_server/lib/language_server/providers/execute_command/manipulate_pipes.ex

Co-authored-by: Łukasz Samson <lukaszsamson@gmail.com>

apps/language_server/lib/language_server/providers/execute_command/manipulate_pipes/ast.ex

apps/language_server/test/providers/execute_command/manipulate_pipes/ast_test.exs

apps/language_server/test/providers/execute_command/manipulate_pipes_test.exs

…te/elixir-ls into feat/add-custom-piping-commands

lukaszsamson · 2021-03-21T06:33:34Z

apps/language_server/lib/language_server/providers/execute_command/manipulate_pipes.ex

+    # line and col are assumed to be 0-indexed
+    source_file = Server.get_source_file(state, uri)
+
+    {:ok, %{edited_text: edited_text, edit_range: edit_range}} =


Since you are using the column parameter to get the correct ranges you need to convert it from UTF16 to UTF8 index. We do this e.g. in

elixir-ls/apps/language_server/lib/language_server/source_file.ex

Line 199 in 794ed49

beginning_utf8 =

@lukaszsamson I've applied all requested changes. I've looked at this code, but I don't completely follow how to apply it to my PR. Could you perhaps add a suggestion to where I should add this?

I tried adding this to the beggining of the pipeline, but the cursor stopped pointing at the correct character and thus the code broke

I've added a test that breaks on 2-byte characters for the "from_pipe" command, so there's that.
However, I'm still kinda lost on how to deal with the initial offsetting and how to convert the output ranges correctly so I can work with purely utf-16 binaries or something like that

One thing I looked into is that ElixirSense's walk_text uses String.next_grapheme to recurse through the binary, and that ends up dealing with multi-byte characters.

iex(8)> "olá\r\nç\nç" |> String.graphemes ["o", "l", "á", "\r\n", "ç", "\n", "ç"]

I think that if I solve the issue on my custom recursions the code will yield the expected ranges. Any thoughts on this?

Yes that part of LSP/Elixir integration is really messy. To give you an example suppose you have range

{"line": 1, character: 2} to {line: 6, character: 5}

Character indices are codepoints in UTF16 encoding but elixir String are in UTF8 (and graphemes are a different beast as you noticed)
To get the source slice range right you need to

split it into lines

convert first (1) and last (6) line into UTF16

take slice [2 * 16..] from first line and convert it back into UTF8

append lines 2-5 (no need to convert here)

take slice [0..5 * 16] from last line and convert it back into UTF8

But let's leave it for another PR. There are more places in the codebase where it's not addressed properly.

@lukaszsamson I am curious about this necessity...

It seems to me that upon starting, the LS server changes the encoding of the user process to latin (which is a subset of UTF16 if I am not mistaken). It "should" adapt properly, should it not?

Sorry for the noise here :)

@victorolinasc
The reason we switch to latin1 is to use binary mode on stdio instead of the default character mode (which is charset dependant). We just send and receive bytes on the wire. The LSP protocol messages are UTF8 encoded as the spec requires but thats on another layer (see https://microsoft.github.io/language-server-protocol/specifications/specification-current/#baseProtocol). The indices used in LSP requests and responses use character counts in UTF16 (see https://microsoft.github.io/language-server-protocol/specifications/specification-current/#textDocuments) but the files themselves are not transferred over LSP and read directly by language server. Thus we end up with UTF8 encoded elixir binaries that are indexed by UTF16 character counts. Everything works fine as long as we stay inside ASCII range.

Some other relevant links:

feature request for LSP to support UTF-8: Change character units from UTF-16 code unit to Unicode codepoint microsoft/language-server-protocol#376

UTF-8 extension supported by rust-analyzer and clangd: https://clangd.llvm.org/extensions.html#utf-8-offsets

apps/language_server/test/providers/execute_command/manipulate_pipes_test.exs

…ipulate_pipes_test.exs Co-authored-by: Łukasz Samson <lukaszsamson@gmail.com>

apps/language_server/lib/language_server/providers/execute_command/manipulate_pipes/ast.ex

lukaszsamson · 2021-03-26T10:35:09Z

apps/language_server/lib/language_server/providers/execute_command/manipulate_pipes.ex

+    do: <<?\r::utf16, ?\n::utf16, acc::bitstring>>
+
+  defp do_get_pipe_call(<<0, c::utf8, _::bitstring>>, {acc, true, _})
+       when c in [?\t, ?\v, ?\r, ?\n, ?\s],


is anyone still using vertical tabs?

I've never encountered one in the wild haha

lukaszsamson · 2021-03-26T10:46:34Z

Great job @polvalente, it's going to be a nice addition.

polvalente · 2021-03-26T11:31:05Z

Great job @polvalente, it's going to be a nice addition.

Thank you! I learned a lot with this PR and the reviews were great!

axelson

Looks good 🙌!

Excited to try this out ❤️

Now it would be good to add this command to vscode-elixir-ls (PR will probably look similar to elixir-lsp/vscode-elixir-ls#176)

polvalente · 2021-04-01T18:08:35Z

Looks good !

Excited to try this out

Now it would be good to add this command to vscode-elixir-ls (PR will probably look similar to elixir-lsp/vscode-elixir-ls#176)

@axelson If no one tackles this before me, I might be able to open the pull request later in the weekend!

polvalente added 3 commits March 16, 2021 03:21

feat: add basic to_pipe command core

f0b402b

refactor: rework text range finding algorithm

79d9b4c

feat: make to_pipe/1 work correctly

311335c

axelson reviewed Mar 17, 2021

View reviewed changes

yashin5 reviewed Mar 17, 2021

View reviewed changes

polvalente added 4 commits March 17, 2021 22:42

refactor: move core ast code to separate module

f2030b4

feat: add from_pipe core implementation

9a04b1b

feat: working implementation of from_pipe

28e2338

refactor: remove regex scan

cf091df

polvalente marked this pull request as ready for review March 18, 2021 03:09

polvalente requested a review from axelson March 18, 2021 03:09

polvalente commented Mar 18, 2021

View reviewed changes

axelson reviewed Mar 18, 2021

View reviewed changes

polvalente added 3 commits March 18, 2021 22:44

chore: comply with code review

9810618

test: make tests pass in older versions

1452b1c

fix: make test use correct elixir syntax

3ac9e8c