Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Do not trim whitespace when part of strings or regexes #198164

Merged
merged 7 commits into from Mar 16, 2024

Conversation

333fred
Copy link
Contributor

@333fred 333fred commented Nov 14, 2023

Fixes #195010. If the token to be removed is part of a string or a regex, then we do not want to remove the character, as it is very likely semantically important.

Before I mark this as ready to review, I need some help with how to approach testing this. Presumably, tests need to go in trimTrailingWhitespaceCommand.test.ts, but I'm going to need a language in the unit test suite with standard token types. Is there an example of this that isn't creating an entire language provider just for the test? Every example of testTextModel.createTextModel that I can see that has a language makes an entire custom one just for the test; if this is the only approach, I could use someone to point out what things I'll need to mock to test this. I would also like to add some integration tests, particularly around whitespace after the end a string, to ensure that it works with real tokenization as well.

The other open question I have is whether this behavior should be on by default, or if we need to update the trim whitespace option to have a "trim except in strings and regexes" option, as this behavior may not be desired by some?

Fixes #195010. If the token to be removed is part of a string or a regex, then we do not want to remove the character, as it is very likely semantically important.
@333fred
Copy link
Contributor Author

333fred commented Nov 14, 2023

@hediet, since you're the assignee on #195010, would you be able to comment on my questions above?

@coreyward
Copy link

Not sure how viable this is, but I actually prefer trailing whitespace in multiline strings to be trimmed most of the time (in JS), but in some cases it's not desirable (specifically for tests). Perhaps there is a way to alter this behavior just for test files?

@333fred
Copy link
Contributor Author

333fred commented Nov 15, 2023

@hediet or @alexdima, thoughts on my questions? Given the comment from @coreyward on just this draft PR, I'm definitely leaning towards "needs an option switch". Personally, I do not want whitespace trimming in a multiline string, ever, in any file, regardless of whether it's a test file or not. It could be a regex component, or part of a piece of markdown where trailing matters.

@hediet
Copy link
Member

hediet commented Nov 17, 2023

I think this should be behind an opt-in flag.
Also, I think when this flag is enabled, trimming whitespace should wait on the tokenization to complete.

1. Add a new setting to control whether to trim whitespace in multiline strings/regexes. This setting is then piped through to everywhere that is calling trim whitespace, which is file save editing, the trim whitespace command, and notebooks.
2. Force line tokenization to complete when the setting is enabled so that trim is accurate.
@333fred
Copy link
Contributor Author

333fred commented Nov 17, 2023

@hediet I think I've plumbed through a setting for this correctly, but I could still use some pointers on how to test this appropriately.

@333fred
Copy link
Contributor Author

333fred commented Nov 27, 2023

Talked with @hediet offline. With the end of year coming up there isn't a lot of bandwidth to look at a change like this at the moment, given the potential costs of tokenization for the file. I'll ping again in January.

…or a given line and do not format if there are none.
@hediet
Copy link
Member

hediet commented Jan 8, 2024

For testing, please see existing unit tests for tokenization. Basically you should create a fake language with a static tokenizer where you set tokens manually. Then look for tests that test TrimTrailingWhitespaceCommand (or other commands) and try to run the command on the text model with the fake language.

@alexdima alexdima marked this pull request as ready for review March 15, 2024 22:29
Copy link
Member

@alexdima alexdima left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

@alexdima alexdima added this to the March 2024 milestone Mar 15, 2024
@alexdima alexdima enabled auto-merge (squash) March 15, 2024 22:30
@alexdima alexdima merged commit c7578a3 into microsoft:main Mar 16, 2024
6 checks passed
chen-ky pushed a commit to chen-ky/vscode that referenced this pull request Mar 18, 2024
)

* Do not trim whitespace when part of strings or regexes

Fixes microsoft#195010. If the token to be removed is part of a string or a regex, then we do not want to remove the character, as it is very likely semantically important.

* Address initial feedback:

1. Add a new setting to control whether to trim whitespace in multiline strings/regexes. This setting is then piped through to everywhere that is calling trim whitespace, which is file save editing, the trim whitespace command, and notebooks.
2. Force line tokenization to complete when the setting is enabled so that trim is accurate.

* Don't force tokenization; instead, check to see if there are tokens for a given line and do not format if there are none.

* Look for syntactical tokens

* Fix compilation errors

* Add a test

---------

Co-authored-by: Alex Dima <alexdima@microsoft.com>
@333fred
Copy link
Contributor Author

333fred commented Mar 18, 2024

@alexdima thanks for picking this up! Really appreciate it, I've been very busy with other things and was unlikely to get back to this for another few months.

@333fred 333fred deleted the trimwhitespace-strings-and-regexes branch March 18, 2024 18:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Trim Trailing Whitespace shouldn't affect multiline strings
5 participants