Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix duplicated code format #126183

Merged
merged 2 commits into from Jun 23, 2021
Merged

Conversation

qchateau
Copy link
Contributor

When formatting multiple ranges, there may be duplicated TextEdits.
Make sure to dedupe them to avoid encountering the error
"Overlapping ranges are not allowed!"

@jrieken
Copy link
Member

jrieken commented Jun 15, 2021

When formatting multiple ranges, there may be duplicated TextEdits.

Who says that? Isn't that a bug of the formatter and we shouldn't cover up?

@jrieken jrieken added the info-needed Issue requires more information from poster label Jun 15, 2021
@qchateau
Copy link
Contributor Author

The formatting provider is given the ranges one by one, so it process each one without any knowledge of the list of ranges. In some cases, it makes sense for the formatter to provide an edit spans further than the input range to avoid awkward formatting.
Problems arise when there 2 of the input ranges are close to each other, and when the resulting edit for each one includes the other. That is totally fine as long as both edits are identical.

See this example, some basic C++, formatting provider is clangd:

#include <vector>

int main(int, char**)
{
    std::vector<int> v = {
                1, // A
            2, // B
                 3, // C
    };
    return 0;
}

I'll use the comments A, B and C to reference lines.
For reference, clangd wants to format like this (not lines A, B and C changed):

#include <vector>

int main(int, char**)
{
    std::vector<int> v = {
        1, // A
        2, // B
        3, // C
    };
    return 0;
}

Let's say I use git and I modified lines A and C, I use "Format modified lines". VSCode will ask clangd to format 2 ranges independently: one is line A, and the second is line C. For both, clangd will format lines A, B and C because it considers that these 3 lines form a single block that should be indented consistently. This results in 2 identical edits. Both what clangd and the user expect is pretty trivial, yet vscode will reject the formatting.

Basically you have to make a choice, either you:

  • do nothing
    • users (me) cannot format the document at all, and are spammed with pop-ups
  • require formatters to be strict about range
    • users will be mad because they end up with an awkward formatting, which may be even worse than no formatting at all
  • allow and merge strictly identical edits
    • users are happy and will send you love

Note that I draw a line as to what should be allowed: edits must be strictly identical. It's not up to the IDE to fix an inconsistent formatter.

@jrieken
Copy link
Member

jrieken commented Jun 16, 2021

users are happy and will send you love

That's a goal we all strive for but I think you meant "I am happy" because cases where edits overlap but aren't identical aren't covered. Different alternatives:

  1. Add new API like DocumentRange_S_FormattingEditProvider which is always called with an array of ranges to format. This would be the best solution because all the control and power is with extensions. However, it will be a new API and it is unlikely that extensions adopt this and we still stuck with how to use the other, single range, API

  2. A different idea is this: Check for what ranges the returned edits overlap, then discard those edits, merge the ranges into a single, bigger range, and ask for edits again. In your sample above that would mean:

    1. make format request for line A, make format request for line C
    2. detect overlap between the edits for A and C
    3. merge the ranges into a single range A,B,C and ask again for edits
    4. repeat until edits are free of overlap or when only a single range is left

I believe that would work and is a more complete and pragmatic approach to the problem

@jrieken jrieken added formatting Source formatter issues and removed info-needed Issue requires more information from poster labels Jun 16, 2021
@qchateau
Copy link
Contributor Author

I like number 2., I'll implement that when I find some time

@qchateau
Copy link
Contributor Author

Here's my take on this

model.getFormattingOptions(),
cts.token
);
return (await workerService.computeMoreMinimalEdits(model.uri, rawEdits)) || [];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

computeMoreMinimalEdits could be costly to compute for very large documents (at least the prettier formatter always replaces the entire document).

Maybe you should check if the token is still alive after the provideDocumentRangeFormattingEdits call?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added token check and called computeMoreMinimalEdits only at the end

@jrieken jrieken added this to the June 2021 milestone Jun 21, 2021
Comment on lines +207 to +208
if (hasIntersectingEdit(rawEditsList[i], rawEditsList[j])) {
// Merge ranges i and j into a single range, recompute the associated edits
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there are 1000 modified ranges and the formatter reports 1000 edits for each format range request (which at least prettier does sometimes even for simple format range requests), you run 1000^4 operations here.

I think it should be fine if you just test if any two ranges touch the same line. Then you could use a set of touched line numbers and get an algorithm than runs in O(line numbers).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could also compute more minimal edits when you are done. There should be a guarantee that computing more minimal edits doesn't out grow the range (and therefore doesn't create overlap). Usually, extensions compute very few edits and making them more minimal often creates many, many edits

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I now compute minimal edits only at the end.
I also added a quick exit to the hasIntersectingEdit function.

I think it should be fine if you just test if any two ranges touch the same line. Then you could use a set of touched line numbers and get an algorithm than runs in O(line numbers).

It's not that easy, not only do I need to know that 2 edits intersect, but I also need to know which 2 ranges are at the original of these, and the list of all edits produced by these 2 ranges so I can merged them.
So I need to iterate all combination of ranges in any case. That's already O(n^2). Then I need to know if the edits intersect. That's O(m^2). I guess the overall complexity of O(n^2*m^2) is what you refer to as O(n^4).
I don't think I can do much better while retaining correctness and completeness, at least not at the cost of an overly complicated algorithm (usefulness vs maintainability).

Now that minimal edits are only computed at the end, I expect the problem size to stay rather small, which means complexity should not be an issue that much.
If you're worried I still have a few tricks:

  1. Quick exit on hasIntersectingEdits, brings down O(m^2) down to O(m) in most cases (not really O then, agreed), worst case is not affected, but we improve most use cases while retaining correctness. (already implemented)
  2. Assume only consecutive ranges can produce intersecting edits: although not guaranteed, I don't see any case where a sane formatter would not respect this. That would bring O(n^2) down to O(n) at the cost of potentially not merging 2 ranges...which would be just as good as what we're doing before this PR
  3. Skip this algorithm if the problem size is too big. Format range merging would still work as long as there are not too many edits. Again, strictly not worse than the current state of things. The problem with this is finding the correct threshold.

Let me know what you think, or if you have any other ideas

Copy link
Member

@hediet hediet Jun 22, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess the overall complexity of O(n^2*m^2) is what you refer to as O(n^4).

Yes, if the number of edits equals the number of modified lines, we have n = m. Can you try out your algorithm on checker.ts when you edit roughly 10% of lines? Can you compare the time with how long it takes without this change (though I guess just formatting a single range in that file with prettier might be already slow)?

Quick exit on hasIntersectingEdits, brings down O(m^2) down to O(m) in most cases

That is a good idea!

Skip this algorithm if the problem size is too big. Format range merging would still work as long as there are not too many edits.

I think we should do this if it becomes slow for really large files.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An easy trick would be to count of often we have spliced and merged ranges and simply stop in case a threshold has been reached. Tho, I would suggest we wait for that bug to be reported. I am happy with the changes and we can still improve it later.

@jrieken jrieken merged commit 7a06a53 into microsoft:main Jun 23, 2021
@hediet
Copy link
Member

hediet commented Jun 23, 2021

@qchateau Thanks for contributing this PR! ❤️ It also fixes an issue of the prettier extension when formatting only modified lines.

@jrieken
Copy link
Member

jrieken commented Jun 23, 2021

Yes, many thanks to @qchateau. You creating the PR was a lucky coincidence because @hediet and me had brainstormed about this exact problem a day earlier.

@qchateau
Copy link
Contributor Author

Nice to see we all have the same problems :D

@github-actions github-actions bot locked and limited conversation to collaborators Aug 7, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
formatting Source formatter issues
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants