Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[task] Provide facility (lsp-get-errors-stats-for path-or-dir) which returns in constant time #2178

Closed
yyoncho opened this issue Sep 14, 2020 · 16 comments · Fixed by #2410
Closed
Assignees

Comments

@yyoncho
Copy link
Member

yyoncho commented Sep 14, 2020

We need that for lsp-treemacs-errors-list, modeline, lsp-headerline, treemacs in tree error display. We should be able to answer one more query - give me all files under PATH which have warnings.

This could be achieved by building pre-calculated results when diagnostics are received from the server. The simplest solution seems to be to have a hash-table with path as a key and diagnostics stats as a value. When new diagnostics is received we whether errors/warnings/infos are increased/decreased compared to the current value and then applying that to the cache.

We should be able to answer one more query - give me all files under PATH which have warnings. This means that maybe it will be better to have a tree to store the errors.

@leungbk
Copy link
Member

leungbk commented Sep 15, 2020

Are diagnostics sent only when we're accessing a particular file? If module A depends on module B, and we delete some exported function (used by A) in module B without introducing any new errors in module B, then we won't necessarily know about module A's new errors until we access A.

@yyoncho
Copy link
Member Author

yyoncho commented Sep 15, 2020

Are diagnostics sent only when we're accessing a particular file?

Most of the servers send messages for all files in the project. I think that what you are describing might happen in ccls/clangd but we have to live with it (I believe that ccls/clangd users are expecting it).

@leungbk
Copy link
Member

leungbk commented Nov 1, 2020

We need that for lsp-treemacs-errors-list

The runtime of the current implementation of lsp-treemacs-errors-list appears to depend linearly on the number of files reported in the publishDiagnostics notification. Wouldn't it be slower to have to maintain a cache like what you describe?

Given the following project structure:

my-project
--> src/
----> file1.c (*)
----> file2.c
----> subdir/
------> nested_file.c (*)

where (*) denotes files for which we've received new diagnostics.

The most obvious way of updating the cache would be to update all the ancestor directories for each (*) file. For most nodes, we can abort the ancestor-wise navigation quickly since the diagnostics won't have changed. The height of the file tree is typically a small constant, but it's still greater than 1. In the best case, this is still slower than what lsp-treemacs-errors-list (linear in the number of files for which we are receiving diagnostics) is doing now. And in the worst case, a single change could force updates in diagnostics for multiple files.

We could alternatively do a topological sort to prevent the ancestor nodes from being traversed more often than needed, though I wouldn't expect this to be much faster, if at all, since project file trees are typically not very deep.

I can see how lsp-headerline would make use of this. What other end-user features would this cache enable?

@yyoncho
Copy link
Member Author

yyoncho commented Nov 1, 2020

The runtime of the current implementation of lsp-treemacs-errors-list appears
to depend linearly on the number of files reported in the publishDiagnostics
notification. Wouldn't it be slower to have to maintain a cache like what you
describe?

Why do you think it will be slow?

In the best case, this is still slower than what lsp-treemacs-errors-list (linear in
the number of files for which we are receiving diagnostics) is doing now.

lsp-treemacs-errors-list is linear to the number of files with errors. Now consider that we have 400 files with errors. That means that we will have to recalculate the tree each time we receive a diagnostics change. Also, the implementation of lsp-treemacs-errors-list is slow, because it has slow operations due to the fact that it works on much more elements and has a lot of string operations.

On the other hand, the proposed implementation in bug description has only fast operations or operations working on relatively small set of elements.

I can see how lsp-headerline would make use of this. What other end-user
features would this cache enable?

Headerline displays path segments, we may underline them depending on whether they have errors/warnings/infos.

We could alternatively do a topological sort to prevent the ancestor nodes from
being traversed more often than needed, though I wouldn't expect this to be much
faster, if at all, since project file trees are typically not very deep.

this will most likely result in speedup but again - for modeline/headerline we need implementation which is close to being constant. Even for much simpler caclulations like the one we do for lsp-diagnostics-modeline-mode we had to perform caching otherwise it is too slow.

@leungbk
Copy link
Member

leungbk commented Nov 1, 2020

Why do you think it will be slow?

I explained my reasoning above.

After receiving a series of publishDiagnostics notifications, I suggested that we could either

a) for each diagnostic, navigate parent-wise in the tree defined by the cache if there is a change in the number of diagnostics, or skip parent-wise navigation if the number of diagnostics for that file is unchanged.

In the best case, when no file has its diagnostics change, this still takes time linear in the number of files for which we have received diagnostics. In the worst case, we receive diagnostics for a bunch of files and, for each of these files, need to update parents until reaching the root.

Of course, I may be assuming something incorrect about the server behavior.

An alternative approach is

b) topologically sort the file tree so that we don't have to traverse ancestor nodes more often than needed, which we have to do with a). I don't know if this is necessarily better in practice given the shallow file trees.

@yyoncho
Copy link
Member Author

yyoncho commented Nov 1, 2020

Hm, I am not sure we are on the same page. The proposed algorithm is:

  1. Diagnostics received, e. g. 1 error 2 warnings 0 info for file /foo/bar/baz
  2. We count them and represent them as a vector like this: [1 2 0 0]
  3. We calculate the difference compared them to the vector from the previous report, lets assume it is [3 1 0 0]
  4. The vector we have will now be [-1 1 0 0] (the difference from the previous state).
  5. We update the cache under key "/foo/bar/baz" by adding the vector from 4(it can be a replacement for the leaves)
  6. We update the cache under key "/foo/bar/" by adding the vector from 4
  7. Ditto for "/foo/" and "/"

I would say that this will be faster than json deserialization.

@yyoncho
Copy link
Member Author

yyoncho commented Nov 1, 2020

Just want to mention that ATM the work that lsp-treemacs-errors-list do is not N but it is MxN where M is the number of updates. This is true for each of the components for which we calculate the diagnostics count. At the same time the proposed solution should be relatively cheap

@leungbk
Copy link
Member

leungbk commented Nov 1, 2020

Hm, I am not sure we are on the same page. The proposed algorithm is:

It sounds like we have been in agreement on what the algorithm is. But is this not still proportional to the number of files that have just received new diagnostic info (but not necessarily different from before) from the server? Even for those files for which the change in diagnostic info is [0 0 0 0], we still have to count the reported errors for that file to make that conclusion.

I would say that this will be faster than json deserialization.

We always have to parse json to determine the error counts coming in from the server. Is there some other json-handling step that caching would allow us to bypass?

Just want to mention that ATM the work that lsp-treemacs-errors-list do is not N but it is MxN where M is the number of updates.

Is this because the :render-action gets called once for each error in each file?

@yyoncho
Copy link
Member Author

yyoncho commented Nov 1, 2020

Even for those files for which the change in diagnostic info is [0 0 0 0], we still have to count the reported errors for that file to make that conclusion

Yes. Iterating a list + increasing numbers is a fast operation so that should be fine. Also, this operation will be insignificant compared to json parsing.

We always have to parse json to determine the error counts coming in from the server. Is there some other json-handling step that caching would allow us to bypass?

My point is that if something is (much) faster than json parsing then we should be fine regarding the performance.

Is this because the :render-action gets called once for each error in each file?

This is caused by the fact that we re-render the whole tree each time we receive diagnostic(there is a debounce but still...). Also, for the headerline it is not fine at all to perform those calculations on the fly.

@leungbk
Copy link
Member

leungbk commented Nov 1, 2020

My point is that if something is (much) faster than json parsing then we should be fine regarding the performance.

Yes, but don't we have to parse the incoming diagnostics json (to count the errors) whether or not we pursue the caching approach? Or is there some other parsing that we can skip as a result of caching?

@yyoncho
Copy link
Member Author

yyoncho commented Nov 1, 2020

Yes, but don't we have to parse the incoming diagnostics json (to count the errors) whether or not we pursue the caching approach?

Yes, we will parse it. What I am saying is that if parsing takes 10ms then it is fine to perform an additional operation that takes 1ms. The goal is to avoid recounting all diags over and over again when we receive a new one.

@leungbk
Copy link
Member

leungbk commented Nov 1, 2020

Yes, we will parse it. What I am saying is that if parsing takes 10ms then it is fine to perform an additional operation that takes 1ms. The goal is to avoid recounting all diags over and over again when we receive a new one.

My point is that it appears we have to do an equal amount of json parsing either way: even for files that were unchanged since the previous publishDiagnostics notification, we have to parse the json before we determine its delta to be [0 0 0 0] and conclude that we can skip the parent-wise traversal. We would end up repeating this parsing-and-maybe-parent-wise-traversal for all the files for which we've received new (but not necessarily different) diagnostics, since servers send diagnostics for all files.

@yyoncho
Copy link
Member Author

yyoncho commented Nov 1, 2020

Let's benchmark if the proposed algorithm is fast enough.

@yyoncho
Copy link
Member Author

yyoncho commented Dec 1, 2020

@leungbk I am seeing that you have assigned this bug to you, are you working on that?

@leungbk leungbk removed their assignment Dec 1, 2020
@leungbk
Copy link
Member

leungbk commented Dec 1, 2020

Sorry, I haven't been working on it much; I had a look at Treemacs' API a few weeks ago, and then started wondering about the internals, including how Treemacs nodes can be cached to avoid re-rendering. It seems to me that the caching that lsp-mode would have to do would be implemented as you've stated in this thread, but without simultaneously rewriting the relevant parts of lsp-treemacs, we'd end up seeing a drop in performance.

I was planning on looking into it more this weekend, but if someone else wants to work on it, that's fine.

@yyoncho
Copy link
Member Author

yyoncho commented Dec 1, 2020

No worries, I just wanted to know if the issue is available.

@yyoncho yyoncho self-assigned this Dec 13, 2020
yyoncho added a commit to yyoncho/lsp-mode that referenced this issue Dec 15, 2020
Fixes emacs-lsp#2178

- implemented `(lsp-diagnostics-stats-for path)` method as part of emacs-lsp#2178

- Performed replacement of `format` with `concat` and caching on a few places in
`lsp-headerline.el` to speedup the code(we need that since it is on the hot path).
yyoncho added a commit to yyoncho/lsp-mode that referenced this issue Dec 15, 2020
Fixes emacs-lsp#2178

- implemented `(lsp-diagnostics-stats-for path)` method as part of emacs-lsp#2178

- Performed replacement of `format` with `concat` and caching on a few places in
`lsp-headerline.el` to speedup the code(we need that since it is on the hot path).

- as per discussion in emacs-lsp#2178 I performed tests and they show that we handle 1000
diagnostics per file within 1ms. We could still optimize if we cache the stats
per file per workspace to avoid recounting them when updating `lsp-diagnostics-stats`.
yyoncho added a commit to yyoncho/lsp-mode that referenced this issue Dec 15, 2020
Fixes emacs-lsp#2178

- implemented `(lsp-diagnostics-stats-for path)` method as part of emacs-lsp#2178

- Performed replacement of `format` with `concat` and caching on a few places in
`lsp-headerline.el` to speedup the code(we need that since it is on the hot path).

- as per discussion in emacs-lsp#2178 I performed tests and they show that we handle 1000
diagnostics per file within 1ms. We could still optimize if we cache the stats
per file per workspace to avoid recounting them when updating `lsp-diagnostics-stats`.
yyoncho added a commit to yyoncho/lsp-mode that referenced this issue Dec 15, 2020
Fixes emacs-lsp#2178

- implemented `(lsp-diagnostics-stats-for path)` method as part of emacs-lsp#2178

- Performed replacement of `format` with `concat` and caching on a few places in
`lsp-headerline.el` to speedup the code(we need that since it is on the hot path).

- as per discussion in emacs-lsp#2178 I performed tests and they show that we handle 1000
diagnostics per file within 1ms. We could still optimize if we cache the stats
per file per workspace to avoid recounting them when updating `lsp-diagnostics-stats`.
yyoncho added a commit to yyoncho/lsp-mode that referenced this issue Dec 15, 2020
Fixes emacs-lsp#2178

- implemented `(lsp-diagnostics-stats-for path)` method as part of emacs-lsp#2178

- Performed replacement of `format` with `concat` and caching on a few places in
`lsp-headerline.el` to speedup the code(we need that since it is on the hot path).

- as per discussion in emacs-lsp#2178 I performed tests and they show that we handle 1000
diagnostics per file within 1ms. We could still optimize if we cache the stats
per file per workspace to avoid recounting them when updating `lsp-diagnostics-stats`.
yyoncho added a commit to yyoncho/lsp-mode that referenced this issue Dec 15, 2020
Fixes emacs-lsp#2178

- implemented `(lsp-diagnostics-stats-for path)` method as part of emacs-lsp#2178

- Performed replacement of `format` with `concat` and caching on a few places in
`lsp-headerline.el` to speedup the code(we need that since it is on the hot path).

- as per discussion in emacs-lsp#2178 I performed tests and they show that we handle 1000
diagnostics per file within 1ms. We could still optimize if we cache the stats
per file per workspace to avoid recounting them when updating `lsp-diagnostics-stats`.
yyoncho added a commit to yyoncho/lsp-mode that referenced this issue Dec 15, 2020
Fixes emacs-lsp#2178

- implemented `(lsp-diagnostics-stats-for path)` method as part of emacs-lsp#2178

- Performed replacement of `format` with `concat` and caching on a few places in
`lsp-headerline.el` to speedup the code(we need that since it is on the hot path).

- as per discussion in emacs-lsp#2178 I performed tests and they show that we handle 1000
diagnostics per file within 1ms. We could still optimize if we cache the stats
per file per workspace to avoid recounting them when updating `lsp-diagnostics-stats`.
yyoncho added a commit to yyoncho/lsp-mode that referenced this issue Dec 15, 2020
Fixes emacs-lsp#2178

- implemented `(lsp-diagnostics-stats-for path)` method as part of emacs-lsp#2178

- Performed replacement of `format` with `concat` and caching on a few places in
`lsp-headerline.el` to speedup the code(we need that since it is on the hot path).

- as per discussion in emacs-lsp#2178 I performed tests and they show that we handle 1000
diagnostics per file within 1ms. We could still optimize if we cache the stats
per file per workspace to avoid recounting them when updating `lsp-diagnostics-stats`.
yyoncho added a commit to yyoncho/lsp-mode that referenced this issue Dec 18, 2020
Fixes emacs-lsp#2178

- implemented `(lsp-diagnostics-stats-for path)` method as part of emacs-lsp#2178

- Performed replacement of `format` with `concat` and caching on a few places in
`lsp-headerline.el` to speedup the code(we need that since it is on the hot path).

- as per discussion in emacs-lsp#2178 I performed tests and they show that we handle 1000
diagnostics per file within 1ms. We could still optimize if we cache the stats
per file per workspace to avoid recounting them when updating `lsp-diagnostics-stats`.
yyoncho added a commit that referenced this issue Dec 19, 2020
Fixes #2178

- implemented `(lsp-diagnostics-stats-for path)` method as part of #2178

- Performed replacement of `format` with `concat` and caching on a few places in
`lsp-headerline.el` to speedup the code(we need that since it is on the hot path).

- as per discussion in #2178 I performed tests and they show that we handle 1000
diagnostics per file within 1ms. We could still optimize if we cache the stats
per file per workspace to avoid recounting them when updating `lsp-diagnostics-stats`.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants