Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add custom commands for piping and unpiping text #515

Merged
merged 26 commits into from
Apr 1, 2021

Conversation

polvalente
Copy link
Contributor

@polvalente polvalente commented Mar 16, 2021

This PR aims to bring the functionality developed here to ElixirLS as a custom language server command, as suggested by @axelson.

As of opening of this PR, this is still a work in progress so implementation details can be discussed

Copy link
Member

@axelson axelson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm excited for this change. I think it will be a helpful refactoring to support.

In terms of code structure can you extract a "pure" functional module from ElixirLS.LanguageServer.Providers.ExecuteCommand.ManipulatePipes to handle the core piping and unpiping logic? I guess this might result in an even longer name which is a bit awkward but is probably fine for now. The separate module will make adding more test cases easier because they'll require less setup.

Also I believe it should be possible to implement this without a regex scan, which should help the resiliency of this code. Maybe @lukaszsamson has some specific ideas on that front.

@polvalente
Copy link
Contributor Author

I'm excited for this change. I think it will be a helpful refactoring to support.

In terms of code structure can you extract a "pure" functional module from ElixirLS.LanguageServer.Providers.ExecuteCommand.ManipulatePipes to handle the core piping and unpiping logic? I guess this might result in an even longer name which is a bit awkward but is probably fine for now. The separate module will make adding more test cases easier because they'll require less setup.

Also I believe it should be possible to implement this without a regex scan, which should help the resiliency of this code. Maybe @lukaszsamson has some specific ideas on that front.

Sure, I'll split the module further. Should I leave that as a submodule of the command?

Comment on lines 58 to 72
fn current_char, remaining_text, current_line, current_col, acc ->
if current_line == line and current_col == col do
{:ok, function_call, call_range} =
get_function_call(line, col, acc.walked_text, current_char, remaining_text)

{remaining_text,
%{
acc
| walked_text: acc.walked_text <> current_char,
function_call: function_call,
range: call_range
}}
else
{remaining_text, %{acc | walked_text: acc.walked_text <> current_char}}
end
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
fn current_char, remaining_text, current_line, current_col, acc ->
if current_line == line and current_col == col do
{:ok, function_call, call_range} =
get_function_call(line, col, acc.walked_text, current_char, remaining_text)
{remaining_text,
%{
acc
| walked_text: acc.walked_text <> current_char,
function_call: function_call,
range: call_range
}}
else
{remaining_text, %{acc | walked_text: acc.walked_text <> current_char}}
end
fn current_char, remaining_text, ^line, ^col, acc ->
{:ok, function_call, call_range} =
get_function_call(line, col, acc.walked_text, current_char, remaining_text)
{remaining_text,
%{
acc
| walked_text: acc.walked_text <> current_char,
function_call: function_call,
range: call_range
}}
current_char, remaining_text, _current_line, _current_col, acc ->
{remaining_text, %{acc | walked_text: acc.walked_text <> current_char}}
end

@polvalente polvalente marked this pull request as ready for review March 18, 2021 03:09
@polvalente polvalente requested a review from axelson March 18, 2021 03:09
@polvalente
Copy link
Contributor Author

@axelson I've split the module into one which handles the AST formatting and another which handles the string parsing and the command interface. Should I also move the string parsing to a separate module? IMO it's good to have a kind of "integration test" so I wouldn't remove any tests anyway.

I've also managed to refactor the code so Regex.scan could be removed

@@ -0,0 +1,159 @@
defmodule ElixirLS.LanguageServer.Providers.ExecuteCommand.ManipulatePipes.ASTTest do
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most of these test I imported from my original repo, so if any of these tests feel redundant, feel free to make suggestions for me to remove them!

@polvalente
Copy link
Contributor Author

Finally, there are some tests failing in earlier versions of Elixir but still yielding results that make sense. Should I include an assert X or Y in those, or maybe make the assertion depend on the Elixir version somehow?

Copy link
Member

@axelson axelson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Finally, there are some tests failing in earlier versions of Elixir but still yielding results that make sense. Should I include an assert X or Y in those, or maybe make the assertion depend on the Elixir version somehow?

I put some details in a comment, but I think we should change the expected result based on the elixir version

Running out of time for this review, but I wanted to send what I have. Will take another look in the future.

Co-authored-by: Łukasz Samson <lukaszsamson@gmail.com>
# line and col are assumed to be 0-indexed
source_file = Server.get_source_file(state, uri)

{:ok, %{edited_text: edited_text, edit_range: edit_range}} =
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since you are using the column parameter to get the correct ranges you need to convert it from UTF16 to UTF8 index. We do this e.g. in

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lukaszsamson I've applied all requested changes. I've looked at this code, but I don't completely follow how to apply it to my PR. Could you perhaps add a suggestion to where I should add this?

I tried adding this to the beggining of the pipeline, but the cursor stopped pointing at the correct character and thus the code broke

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added a test that breaks on 2-byte characters for the "from_pipe" command, so there's that.
However, I'm still kinda lost on how to deal with the initial offsetting and how to convert the output ranges correctly so I can work with purely utf-16 binaries or something like that

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thing I looked into is that ElixirSense's walk_text uses String.next_grapheme to recurse through the binary, and that ends up dealing with multi-byte characters.

iex(8)> "olá\r\nç\nç" |> String.graphemes                                                                      
["o", "l", "á", "\r\n", "ç", "\n", "ç"]

I think that if I solve the issue on my custom recursions the code will yield the expected ranges. Any thoughts on this?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes that part of LSP/Elixir integration is really messy. To give you an example suppose you have range

{"line": 1, character: 2} to {line: 6, character: 5}

Character indices are codepoints in UTF16 encoding but elixir String are in UTF8 (and graphemes are a different beast as you noticed)
To get the source slice range right you need to

  1. split it into lines
  2. convert first (1) and last (6) line into UTF16
  3. take slice [2 * 16..] from first line and convert it back into UTF8
  4. append lines 2-5 (no need to convert here)
  5. take slice [0..5 * 16] from last line and convert it back into UTF8

But let's leave it for another PR. There are more places in the codebase where it's not addressed properly.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lukaszsamson I am curious about this necessity...

It seems to me that upon starting, the LS server changes the encoding of the user process to latin (which is a subset of UTF16 if I am not mistaken). It "should" adapt properly, should it not?

Sorry for the noise here :)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@victorolinasc
The reason we switch to latin1 is to use binary mode on stdio instead of the default character mode (which is charset dependant). We just send and receive bytes on the wire. The LSP protocol messages are UTF8 encoded as the spec requires but thats on another layer (see https://microsoft.github.io/language-server-protocol/specifications/specification-current/#baseProtocol). The indices used in LSP requests and responses use character counts in UTF16 (see https://microsoft.github.io/language-server-protocol/specifications/specification-current/#textDocuments) but the files themselves are not transferred over LSP and read directly by language server. Thus we end up with UTF8 encoded elixir binaries that are indexed by UTF16 character counts. Everything works fine as long as we stay inside ASCII range.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some other relevant links:

do: <<?\r::utf16, ?\n::utf16, acc::bitstring>>

defp do_get_pipe_call(<<0, c::utf8, _::bitstring>>, {acc, true, _})
when c in [?\t, ?\v, ?\r, ?\n, ?\s],
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is anyone still using vertical tabs?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've never encountered one in the wild haha

@lukaszsamson
Copy link
Collaborator

Great job @polvalente, it's going to be a nice addition.

@polvalente
Copy link
Contributor Author

polvalente commented Mar 26, 2021

Great job @polvalente, it's going to be a nice addition.

Thank you! I learned a lot with this PR and the reviews were great!

@axelson axelson mentioned this pull request Apr 1, 2021
Copy link
Member

@axelson axelson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good 🙌!

Excited to try this out ❤️

Now it would be good to add this command to vscode-elixir-ls (PR will probably look similar to elixir-lsp/vscode-elixir-ls#176)

@axelson axelson merged commit eb969ef into elixir-lsp:master Apr 1, 2021
@polvalente polvalente deleted the feat/add-custom-piping-commands branch April 1, 2021 18:07
@polvalente
Copy link
Contributor Author

Looks good !

Excited to try this out

Now it would be good to add this command to vscode-elixir-ls (PR will probably look similar to elixir-lsp/vscode-elixir-ls#176)

@axelson If no one tackles this before me, I might be able to open the pull request later in the weekend!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants