New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Do not trim whitespace when part of strings or regexes #198164
Do not trim whitespace when part of strings or regexes #198164
Conversation
Fixes #195010. If the token to be removed is part of a string or a regex, then we do not want to remove the character, as it is very likely semantically important.
Not sure how viable this is, but I actually prefer trailing whitespace in multiline strings to be trimmed most of the time (in JS), but in some cases it's not desirable (specifically for tests). Perhaps there is a way to alter this behavior just for test files? |
@hediet or @alexdima, thoughts on my questions? Given the comment from @coreyward on just this draft PR, I'm definitely leaning towards "needs an option switch". Personally, I do not want whitespace trimming in a multiline string, ever, in any file, regardless of whether it's a test file or not. It could be a regex component, or part of a piece of markdown where trailing |
I think this should be behind an opt-in flag. |
1. Add a new setting to control whether to trim whitespace in multiline strings/regexes. This setting is then piped through to everywhere that is calling trim whitespace, which is file save editing, the trim whitespace command, and notebooks. 2. Force line tokenization to complete when the setting is enabled so that trim is accurate.
@hediet I think I've plumbed through a setting for this correctly, but I could still use some pointers on how to test this appropriately. |
Talked with @hediet offline. With the end of year coming up there isn't a lot of bandwidth to look at a change like this at the moment, given the potential costs of tokenization for the file. I'll ping again in January. |
…or a given line and do not format if there are none.
For testing, please see existing unit tests for tokenization. Basically you should create a fake language with a static tokenizer where you set tokens manually. Then look for tests that test TrimTrailingWhitespaceCommand (or other commands) and try to run the command on the text model with the fake language. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you!
) * Do not trim whitespace when part of strings or regexes Fixes microsoft#195010. If the token to be removed is part of a string or a regex, then we do not want to remove the character, as it is very likely semantically important. * Address initial feedback: 1. Add a new setting to control whether to trim whitespace in multiline strings/regexes. This setting is then piped through to everywhere that is calling trim whitespace, which is file save editing, the trim whitespace command, and notebooks. 2. Force line tokenization to complete when the setting is enabled so that trim is accurate. * Don't force tokenization; instead, check to see if there are tokens for a given line and do not format if there are none. * Look for syntactical tokens * Fix compilation errors * Add a test --------- Co-authored-by: Alex Dima <alexdima@microsoft.com>
@alexdima thanks for picking this up! Really appreciate it, I've been very busy with other things and was unlikely to get back to this for another few months. |
Fixes #195010. If the token to be removed is part of a string or a regex, then we do not want to remove the character, as it is very likely semantically important.
Before I mark this as ready to review, I need some help with how to approach testing this. Presumably, tests need to go in
trimTrailingWhitespaceCommand.test.ts
, but I'm going to need a language in the unit test suite with standard token types. Is there an example of this that isn't creating an entire language provider just for the test? Every example oftestTextModel.createTextModel
that I can see that has a language makes an entire custom one just for the test; if this is the only approach, I could use someone to point out what things I'll need to mock to test this. I would also like to add some integration tests, particularly around whitespace after the end a string, to ensure that it works with real tokenization as well.The other open question I have is whether this behavior should be on by default, or if we need to update the trim whitespace option to have a "trim except in strings and regexes" option, as this behavior may not be desired by some?