Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Watched file globbing trouble #602

Closed
danp opened this issue Jan 26, 2021 · 26 comments
Closed

Watched file globbing trouble #602

danp opened this issue Jan 26, 2021 · 26 comments
Labels
emacs-bug Something to be solved mostly in Emacs

Comments

@danp
Copy link

danp commented Jan 26, 2021

eglot seems to be having the same issue with watched file globbing that I reported for lsp-mode in emacs-lsp/lsp-mode#1979.

A summary is that gopls uses patterns such as **/*.{go,mod,sum} which I don't thinkfile-expand-wildcards supports.

The repro for emacs-lsp/lsp-mode#1979 seems to apply for eglot as well: I can't change between the main and change branches of https://github.com/danp/go-40456 and have errors due to changed function/method signatures and such go away.

@joaotavora
Copy link
Owner

I think that eshell function you paste there is a good candidate to somehow plug instead of file-expand-wildcards, maybe? Or maybe you could open an Emacs bug to expand file-expand-wildcards 's capabilities.

@joaotavora joaotavora added the emacs-bug Something to be solved mostly in Emacs label Jan 26, 2021
@danp
Copy link
Author

danp commented Jan 26, 2021

Thanks for eglot, by the way!

To clarify: eshell-glob-regexp was what lsp-mode was using when I opened that bug and my comparison there was to show how its rules differed from what the LSP spec requires.

If eglot were to use eshell-glob-regexp what are you thinking that would look like?

I could open bugs against eshell-glob-regexp or file-expand-wildcards if you think that would help but I do wonder if it makes sense to force/change them to support the LSP spec. Especially if the LSP spec could change at some point in this regard.

lsp-mode ended up implementing lsp-glob-to-regexps and related functions.

@leungbk
Copy link
Contributor

leungbk commented Jan 28, 2021

eshell-glob-regexp does not handle the globstar pattern; eshell-glob-entries handles it instead, presumably for performance reasons:

> (eshell-glob-regexp "/**/*.go")
"\\`/.*.*/.*\\.go\\'"

So it might be better to have some glob-to-regexp machinery specifically for Eglot.

The lsp-mode glob code is simply a port of the same code in VSCode. A near-correct implementation of the globbing requires some knowledge of the grammar (specifically, priority of certain meta-characters) that is not specified in the protocol, knowledge that I could only glean from reading the VSCode source and that I had forgotten until just now when I looked at my commit again.

Since I was the one who did the port in lsp-mode, I'm not sure if that forces me to recuse from making a related PR here (since the FSF doesn't own the copyright assignment to VSCode's source, and since my mind has been adulterated from having read it). If it doesn't, then I could submit a PR within the next few days. Handling "sane" globs (with balanced brackets, etc.) for sane file names (specifically, file names that don't have special characters like brackets or commas) should be easy, given the (possibly disqualifying for me, given how I acquired it) knowledge of meta-character priorities.

@joaotavora
Copy link
Owner

joaotavora commented Jan 28, 2021

Thanks @leungbk I had a look at the lsp port. Iit's quite... big, let's put it like that :-) I wonder if some grammar-parsing engine in Emacs wouldn't cut out on all those lines. I don't see a need to convert to regexp, for example. We just need a function that is given a glob string and list of pathname components and tells if the second "matches" the former.

@joaotavora
Copy link
Owner

Anyway, can you post a direct link to MS's glob spec here?

@leungbk
Copy link
Contributor

leungbk commented Jan 28, 2021

export interface FileSystemWatcher {
	/**
	 * The  glob pattern to watch.
	 *
	 * Glob patterns can have the following syntax:
	 * - `*` to match one or more characters in a path segment
	 * - `?` to match on one character in a path segment
	 * - `**` to match any number of path segments, including none
	 * - `{}` to group conditions (e.g. `**​/*.{ts,js}` matches all TypeScript
	 *   and JavaScript files)
	 * - `[]` to declare a range of characters to match in a path segment
	 *   (e.g., `example.[0-9]` to match on `example.0`, `example.1`, …)
	 * - `[!...]` to negate a range of characters to match in a path segment
	 *   (e.g., `example.[!0-9]` to match on `example.a`, `example.b`, but not
	 *   `example.0`)
	 */
	globPattern: string;

	/**
	 * The kind of events of interest. If omitted it defaults
	 * to WatchKind.Create | WatchKind.Change | WatchKind.Delete
	 * which is 7.
	 */
	kind?: uinteger;
}

Simply using recursive-descent parsing with knowledge of the priorities of the meta-characters would lead to a reasonably short glob-to-regexp function(s). All the other stuff I added was there to handle corner cases that would probably never arise in settings where the user has sane file names and the server is behaving.

@joaotavora
Copy link
Owner

user has sane file names and the server is behaving.

Right. But it's not the user that has the file names, it's the file system. :-) And if the server is misbehaving, it's the server's fault. I think a good old state machine (with backtracking, I suppose) is what's needed here. I don't have knowledge of parsers handy, but there's tons of literature on it, and Lisp seems like a good language for a terse implementation (that should probably migrate to Emacs).

@joaotavora
Copy link
Owner

joaotavora commented Jan 28, 2021

Thinking about this a bit more, we need to implement a compiler of glob expressions into an elisp functions, not into regexps. I suppose we can even byte-compile the resulting lambda at runtime to speed up the matching.

joaotavora added a commit that referenced this issue Jan 31, 2021
* eglot-tests.el (eglot--glob-match): New test.

* eglot.el (eglot--wildcard-to-regexp): Delete.
(eglot-register-capability): Rework.
(eglot--glob-parse, eglot--glob-compile, eglot--glob-emit-self)
(eglot--glob-emit-**, eglot--glob-emit-*, eglot--glob-emit-?)
(eglot--glob-emit-{}, eglot--glob-emit-range)
(eglot--directories-recursively): New helpers.
@joaotavora
Copy link
Owner

So, I hacked a 50~ sloc LSP glob-to-elisp compiler, with byte-compilation support. It passes the existing RLS watch-files tests, but I might be missing some. I put it in a branch scratch/issue-602-glob-heroics. I suppose you guys @leungbk or @danp would be interesting in testing this?

@leungbk
Copy link
Contributor

leungbk commented Jan 31, 2021

Thanks for looking into this @joaotavora. Here are some tests taken from VSCode that should pass, but don't.

(ert-deftest my/eglot--glob-test ()
  (should (eglot--glob-match "**/.*" ".git"))
  (should (eglot--glob-match ".?" ".o"))
  (should (eglot--glob-match "**/.*" ".hidden.txt"))

  (should (eglot--glob-match "**/.*" "path/.git"))
  (should (eglot--glob-match "**/.*" "path/.hidden.txt"))

  (should (eglot--glob-match "**/node_modules/**" "node_modules"))
  (should (eglot--glob-match "**/node_modules/**" "node_modules/"))

  (should (eglot--glob-match "{foo,bar}/**" "foo"))
  (should (eglot--glob-match "{foo,bar}/**" "bar"))
  (should (eglot--glob-match "{foo,bar}/**" "foo/test"))
  (should (eglot--glob-match "{foo,bar}/**" "bar/test"))

  (should (eglot--glob-match "{**/*.d.ts,**/*.js}" "/testing/foo.js"))
  (should (eglot--glob-match "{**/*.d.ts,**/*.js}" "testing/foo.d.ts"))
  (should (eglot--glob-match "{**/*.d.ts,**/*.js,foo.[0-9]}" "foo.5"))

  (should (eglot--glob-match "some/**/*" "some/foo.js"))
  (should (eglot--glob-match "some/**/*" "some/folder/foo.js"))

  (should (eglot--glob-match "prefix/{**/*.d.ts,**/*.js,foo.[0-9]}" "prefix/foo.8")))

It seems that braced groups should be able to contain / characters as well as meta-characters. And it seems that depending on its location in the glob pattern, globstar as it is currently implemented here doesn't always consume zero path segments.

@joaotavora
Copy link
Owner

Thanks for these tests

Most of them seem simple. I'll see what can be done. I'm mostly interested in actual globs returned by actual servers.

(should (eglot--glob-match "{/*.d.ts,/.js}" "/testing/foo.js"))
(should (eglot--glob-match "{**/
.d.ts,/*.js}" "testing/foo.d.ts"))
(should (eglot--glob-match "{
/.d.ts,**/.js,foo.[0-9]}" "foo.5"))

I can't derive from the spec that this is allowed, which is basically globs inside globs. Not that it's stupidly difficult to implement, but are also globs inside globs inside globs inside globs supported? Amateur stuff, this LSP spec.

joaotavora added a commit that referenced this issue Jan 31, 2021
* eglot.el (eglot--glob-parse): Fix some grammar.
(eglot--glob-compile): Assert eobp as final state.
(eglot--glob-emit-**, eglot--glob-emit-*): Backtracking means
actually backtracking.

* eglot-tests.el (eglot--glob-test): Add some tests by Brian Leung.
@joaotavora
Copy link
Owner

I fixed most of the tests. I'm not sure about tests like this.

(should (eglot--glob-match "/node_modules/" "node_modules"))

In my view, this should match "node_modules/" but not "node_modules". The / is a literal here. I could change that, but should I? The spec is ambiguous.

Also, I didn't implement nested blobs (though it isn't super-hard). What servers are working with nested blobs?

joaotavora added a commit that referenced this issue Jan 31, 2021
* eglot-tests.el (eglot--glob-match): New test.

* eglot.el (eglot--wildcard-to-regexp): Delete.
(eglot-register-capability): Rework.
(eglot--glob-parse, eglot--glob-compile, eglot--glob-emit-self)
(eglot--glob-emit-**, eglot--glob-emit-*, eglot--glob-emit-?)
(eglot--glob-emit-{}, eglot--glob-emit-range)
(eglot--directories-recursively): New helpers.
joaotavora added a commit that referenced this issue Jan 31, 2021
* eglot.el (eglot--glob-parse): Fix some grammar.
(eglot--glob-compile): Assert eobp as final state.
(eglot--glob-emit-**, eglot--glob-emit-*): Backtracking means
actually backtracking.

* eglot-tests.el (eglot--glob-test): Add some tests by Brian Leung.
@danp
Copy link
Author

danp commented Jan 31, 2021

I think that branch fixes my repro at https://github.com/danp/go-40456, thanks!

Re what to support or not, I don't have a strong opinion but it might reduce the chance of later surprise if any patterns VSCode supports (in its tests, etc) are supported by eglot.

@leungbk
Copy link
Contributor

leungbk commented Feb 1, 2021

I fixed most of the tests. I'm not sure about tests like this.

(should (eglot--glob-match "/node_modules/" "node_modules"))

In my view, this should match "node_modules/" but not "node_modules". The / is a literal here. I could change that, but should I? The spec is ambiguous.

I don't feel strongly about it either.

While I don't see any harm in using the tests to inform the implementation, I'm not sure if it's appropriate to include the tests since I simply stole them from VSCode? I'm not very familiar with the specifics of the FSF's rules though.

Finally,

(defun eglot--glob-emit-self (text self next)
  `(,self () (re-search-forward ,(concat "\\=" text)) (,next)))

I mentioned this in the corresponding commit, but I think the above would be better as

(defun eglot--glob-emit-self (text self next)
  `(,self () (re-search-forward ,(concat "\\=" (regexp-quote text))) (,next)))

since the . character holds no special meaning in globs, and thus must be escaped when used as part of a regexp.

@joaotavora
Copy link
Owner

I simply stole them from VSCode? I'm not very familiar with the specifics of the FSF's rules though.

I'm not either. Tests cases aren't the program though, they're expectations about the inputs and outputs of said program.

since the . character holds no special meaning in globs, and thus must be escaped when used as part of a regexp.

Yes. Can you come up with a test that shows this?

@joaotavora
Copy link
Owner

Re what to support or not, I don't have a strong opinion but it might reduce the chance of later surprise if any patterns VSCode supports (in its tests, etc) are supported by eglot.

Eglot is built against the LSP not against VSCode. Unfortunately, the M$ folks do mix one and the other, but at least let's pretend they don't.

@leungbk
Copy link
Contributor

leungbk commented Feb 1, 2021

Yes. Can you come up with a test that shows this?

---
 eglot-tests.el | 7 ++++---
 eglot.el       | 2 +-
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/eglot-tests.el b/eglot-tests.el
index fab560e..be192ae 100644
--- a/eglot-tests.el
+++ b/eglot-tests.el
@@ -1071,9 +1071,10 @@ will assume it exists."
   (should-not (eglot--glob-match "example.[0-9]" "example.a"))
   (should (eglot--glob-match "**/bar/" "foo/bar/"))
 
-  ;; Some tests provided by Brian Leung
+  ;; Some tests from VSCode
   (should (eglot--glob-match "**/.*" ".git"))
   (should (eglot--glob-match ".?" ".o"))
+  (should-not (eglot--glob-match "foo.hs" "fooxhs"))
   (should (eglot--glob-match "**/.*" ".hidden.txt"))
   (should (eglot--glob-match "**/.*" "path/.git"))
   (should (eglot--glob-match "**/.*" "path/.hidden.txt"))
@@ -1083,14 +1084,14 @@ will assume it exists."
   (should (eglot--glob-match "some/**/*" "some/foo.js"))
   (should (eglot--glob-match "some/**/*" "some/folder/foo.js"))
 
-  ;; These tests also by Brian.  We could support them but I'm not
+  ;; These tests also from VSCode.  We could support them but I'm not
   ;; sure we should.
   ;;
   ;; (should (eglot--glob-match "**/node_modules/**" "node_modules"))
   ;; (should (eglot--glob-match "{foo,bar}/**" "foo"))
   ;; (should (eglot--glob-match "{foo,bar}/**" "bar"))
 
-  ;; Also by Brian.  We don't support nested blobs.  We could, but do
+  ;; Also from VSCode.  We don't support nested blobs.  We could, but do
   ;; we care?
   ;;
   ;; (should (eglot--glob-match "{**/*.d.ts,**/*.js}" "/testing/foo.js"))
diff --git a/eglot.el b/eglot.el
index 8a3314c..d05d7e1 100644
--- a/eglot.el
+++ b/eglot.el
@@ -2701,7 +2701,7 @@ If PREDICATE, return boolean predicate, else erroring function."
     (if byte-compile (byte-compile form) form)))
 
 (defun eglot--glob-emit-self (text self next)
-  `(,self () (re-search-forward ,(concat "\\=" text)) (,next)))
+  `(,self () (re-search-forward ,(concat "\\=" (regexp-quote text))) (,next)))
 
 (defun eglot--glob-emit-** (_ self next)
   `(,self () (or (ignore-errors (save-excursion (,next)))
-- 
2.30.0

joaotavora added a commit that referenced this issue Feb 1, 2021
with-temp-buffer was taking a lot of time, presumably because it kills
the buffer.  Since emacs is single-threaded, we can safely reuse a
single buffer.

* eglot.el (eglot--glob-parse): Simplify grammar.
(eglot--glob-compile): Don't with-temp-buffer.
@joaotavora
Copy link
Owner

I did a little testing and optimization and the glob matcher is now 2x faster. It also seems to be around 2x faster than the lsp-mode convert-to-regexp code referenced here

@leungbk
Copy link
Contributor

leungbk commented Feb 2, 2021

@joaotavora I think Eglot's glob grammar needs to be adjusted to permit globs within braces. Here's what I got from using gopls within the direnv repo:

[server-request] (id:3) Tue Feb  2 12:05:57 2021:
(:jsonrpc "2.0" :method "client/registerCapability" :params
          (:registrations
           [(:id "workspace/didChangeWatchedFiles-0" :method "workspace/didChangeWatchedFiles" :registerOptions
                 (:watchers
                  [(:globPattern "**/*.{go,mod,sum}" :kind 7)
                   (:globPattern "/home/brian/mygit/direnv/**/*.{go,mod,sum}" :kind 7)
                   (:globPattern "{/home/brian/mygit/direnv/gzenv,/home/brian/mygit/direnv/script,/home/brian/mygit/direnv/script/issue-command,/home/brian/mygit/direnv/script/release-changelog,/home/brian/mygit/direnv/sri,/home/brian/mygit/direnv/xdg}" :kind 7)]))])
          :id 3)
[client-reply] (id:3) ERROR Tue Feb  2 12:05:57 2021:
(:jsonrpc "2.0" :id 3 :error
          (:code -32603 :message "Internal error"))

The last of the three glob patterns isn't presently handled correctly:

(eglot--glob-parse "{/home/brian/mygit/direnv/gzenv,/home/brian/mygit/direnv/script,/home/brian/mygit/direnv/script/issue-command,/home/brian/mygit/direnv/script/release-changelog,/home/brian/mygit/direnv/sri,/home/brian/mygit/direnv/xdg}" )

yields *** Eval error *** Glob ’{/home/brian/mygit/direnv/gzenv,/home/brian/mygit/direnv/script,/home/brian/mygit/direnv/script/issue-command,/home/brian/mygit/direnv/script/release-changelog,/home/brian/mygit/direnv/sri,/home/brian/mygit/direnv/xdg}’ invalid at 1

@joaotavora
Copy link
Owner

joaotavora commented Feb 2, 2021

There aren't really globs inside the braces, they are all literals. That's a one-char change:

diff --git a/eglot.el b/eglot.el
index 9944b18..90d973c 100644
--- a/eglot.el
+++ b/eglot.el
@@ -2671,7 +2671,7 @@ at point.  With prefix argument, prompt for ACTION-KIND."
      with grammar = '((:**      "\\*\\*/?"              eglot--glob-emit-**)
                       (:*       "\\*"                   eglot--glob-emit-*)
                       (:?       "\\?"                   eglot--glob-emit-?)
-                      (:{}      "{[^][/*{}]+}"          eglot--glob-emit-{})
+                      (:{}      "{[^][*{}]+}"           eglot--glob-emit-{})
                       (:range   "\\[\\^?[^][/,*{}]+\\]" eglot--glob-emit-range)
                       (:literal "[^][,*?{}]+"           eglot--glob-emit-self))
      until (eobp)

This will compile your example into:

(lambda
  (string)
  (with-current-buffer
      (get-buffer-create " *eglot-glob-matcher*")
    (erase-buffer)
    (save-excursion
      (insert string))
    (cl-labels
        ((state-2238 nil
                     (or
                      (re-search-forward "\\=/home/brian/mygit/direnv/gzenv" nil t)
                      (re-search-forward "\\=/home/brian/mygit/direnv/script" nil t)
                      (re-search-forward "\\=/home/brian/mygit/direnv/script/issue-command" nil t)
                      (re-search-forward "\\=/home/brian/mygit/direnv/script/release-changelog" nil t)
                      (re-search-forward "\\=/home/brian/mygit/direnv/sri" nil t)
                      (re-search-forward "\\=/home/brian/mygit/direnv/xdg" nil t)
                      (error "Failed matching any of %s"
                             '("/home/brian/mygit/direnv/gzenv" "/home/brian/mygit/direnv/script" "/home/brian/mygit/direnv/script/issue-command" "/home/brian/mygit/direnv/script/release-changelog" "/home/brian/mygit/direnv/sri" "/home/brian/mygit/direnv/xdg")))
                     (eobp)))
      (or
       (state-2238)
       (error "Glob done but more unmatched text: '%s'"
              (buffer-substring
               (point)
               (point-max)))))))

Handling nested globs is also moderately easy, but I'd skip on that complexity if I could.

joaotavora added a commit that referenced this issue Feb 3, 2021
Alternative groups {} don't bork on forward slash.

* eglot.el (eglot--glob-parse): Tweak {} grammar.
@leungbk
Copy link
Contributor

leungbk commented Feb 3, 2021

There aren't really globs inside the braces, they are all literals. That's a one-char change:

You'd have to make a somewhat longer change here as well, to reflect the fact that we are not allowed to delete the } in {/home/alice}.

I also noticed that while eglot--directories-recursively returns directories with terminating slashes (/home/), the braced pattern I posted above doesn't have terminating slashes in any of the comma-separated forms (they all read like /home).

Do you think it's worth patching Eglot for these two cases (1. literal patterns within braces, and 2. having non-slash-terminated patterns match the slash-terminated directories returned by eglot--directories-recursively)?

joaotavora added a commit that referenced this issue Feb 3, 2021
Instead of massaging the globPattern to match directories instead of
files, which is fragile, gather the list of directoris to watch by
matching the globPattern against every file recursively (except hidden
files and dirs).

This is still not 100% correct, but should do the right thing is most
cases.  Notably, if the correct dirs are being watched, the glob
pattern is matched against all existing and new files in those
directories, which does include hidden files.

* eglot.el (eglot-register-capability): match file globs against
files only.
(eglot--files-recursively): Rename from eglot--directories-recursively.
@joaotavora
Copy link
Owner

You'd have to make a somewhat longer change here as well, to reflect the fact that we are not allowed to delete the } in {/home/alice}.

I rewrote that, it was fragile anyway. There's a different way to find the directories to place the FS watcher on now. I hope you can test it.

Do you think it's worth patching Eglot for these two cases (1. literal patterns within braces, and 2. having non-slash-terminated patterns match the slash-terminated directories returned by eglot--directories-recursively)?

Hopefully my tweaks fixed both these cases, but let me know if you find more problems.

Thanks!

@leungbk
Copy link
Contributor

leungbk commented Feb 4, 2021

Thanks for looking into this. I noticed one issue unrelated to the grammar.

This line can yield infinite recursion when symlinks are arranged in a certain way:

eglot/eglot.el

Line 2728 in 398b81e

if (file-name-directory f) append (eglot--files-recursively f)

Not sure if you want to prevent symlink-expansion in all cases, or if it's better to allow symlink-expansion and maintain a set of seen entries in the cl-loop to avoid visiting something more than once. Sometimes build targets can generate symlinks in a project to weird places like the Guix/Nix store that may not necessarily be of interest to Eglot.

@joaotavora
Copy link
Owner

This line can yield infinite recursion when symlinks are arranged in a certain way:

Maybe yes. My next project is working on improving this so that the glob can be integrated in eglot--files-recursively. Instead of generating a zillion files and testing each one to see which one matches, I'd like to somehow figure out directories that i don't need to descend to. If you have some thoughts about this, I'd like to hear them.

Then I'll look into the symlink business. I haven't seen a project with looping symlinks for a while though. Actually, I don't think I've ever seen one. It's possible to do, I guess, but kind of a bad idea.

@joaotavora
Copy link
Owner

This line can yield infinite recursion when symlinks are arranged in a certain way:

Maybe yes.

Actuallly, to be precise, it won't "yield infinite recursion", it'll just stack overflow, which won't freeze or crash your Emacs.

@leungbk
Copy link
Contributor

leungbk commented Feb 5, 2021

Then I'll look into the symlink business. I haven't seen a project with looping symlinks for a while though. Actually, I don't think I've ever seen one. It's possible to do, I guess, but kind of a bad idea.

As a matter of project architecture I agree it's a bad idea, but the project I used to test the file-watching (direnv) uses a looping symlink solely as a test fixture, which caused a stack overflow in the recursive file-gathering when opening any file in the project.

joaotavora added a commit that referenced this issue Apr 10, 2021
In #602, not only a new glob processing system was implemented, but
also a new, more correct, way to look for directories that might hold
files matched by one of these globs.

Answering this question is important because the file watchers for
'workspace/didChangeWatchedFiles' are placed on a per-directory basis.
Previously, a glob such as /foo/**/bar/*.el would fail to produce
practical file-watching effects because /foo/**/bar/ isn't really a
directory.

However, answering this question is also expensive, as the globs sent
by the LSP server are meant to match files, not directories.  The only
way is to list all files under the project's root directory and test
each glob on each one.  If it matches at least one file, that file's
directory is meant to be watched.

We suspect that in #645 and #633 we are falling victim to LSP server
who serve a tremendous unoptimized number of globs, one for each file.
So instead of sending just '/foo/**/bar/*.el' they send
'/foo/**/bar/quux.el', '/foo/**/bar/quuz.el', etc...  which would
tremendeously slow down the process.  But this is only a suspicion.

This commit tries some simple optimizations: if a directory is known
to be watch-worthy becasue one of its files matched a single glob, no
more files under that directory are tried.  This should help somewhat.

Also fixed a bug in 'eglot--files-recursively', though I suspect that
doesn't make that much of a difference.

* eglot.el (eglot--directories-matched-by-globs): New helper.
(eglot--files-recursively): Fix bug.
bhankas pushed a commit to bhankas/emacs that referenced this issue Sep 18, 2022
Thanks to Brian Leung and Dan Peterson for testing and helping me spot
bugs.

* eglot-tests.el (eglot--glob-match): New test.

* eglot.el (eglot--wildcard-to-regexp): Delete.
(eglot-register-capability): Rework.
(eglot--glob-parse, eglot--glob-compile, eglot--glob-emit-self)
(eglot--glob-emit-**, eglot--glob-emit-*, eglot--glob-emit-?)
(eglot--glob-emit-{}, eglot--glob-emit-range)
(eglot--directories-recursively): New helpers.
bhankas pushed a commit to bhankas/emacs that referenced this issue Sep 18, 2022
with-temp-buffer was taking a lot of time, presumably because it kills
the buffer.  Since emacs is single-threaded, we can safely reuse a
single buffer.

* eglot.el (eglot--glob-parse): Simplify grammar.
(eglot--glob-compile): Don't with-temp-buffer.
bhankas pushed a commit to bhankas/emacs that referenced this issue Sep 18, 2022
Alternative groups {} don't bork on forward slash.

* eglot.el (eglot--glob-parse): Tweak {} grammar.
bhankas pushed a commit to bhankas/emacs that referenced this issue Sep 18, 2022
…geWatchedFiles

Instead of massaging the globPattern to match directories instead of
files, which is fragile, gather the list of directoris to watch by
matching the globPattern against every file recursively (except hidden
files and dirs).

This is still not 100% correct, but should do the right thing is most
cases.  Notably, if the correct dirs are being watched, the glob
pattern is matched against all existing and new files in those
directories, which does include hidden files.

* eglot.el (eglot-register-capability): match file globs against
files only.
(eglot--files-recursively): Rename from eglot--directories-recursively.
bhankas pushed a commit to bhankas/emacs that referenced this issue Sep 18, 2022
…correspondence

In joaotavora/eglot#602, not only a new glob processing system was implemented, but
also a new, more correct, way to look for directories that might hold
files matched by one of these globs.

Answering this question is important because the file watchers for
'workspace/didChangeWatchedFiles' are placed on a per-directory basis.
Previously, a glob such as /foo/**/bar/*.el would fail to produce
practical file-watching effects because /foo/**/bar/ isn't really a
directory.

However, answering this question is also expensive, as the globs sent
by the LSP server are meant to match files, not directories.  The only
way is to list all files under the project's root directory and test
each glob on each one.  If it matches at least one file, that file's
directory is meant to be watched.

We suspect that in joaotavora/eglot#645 and joaotavora/eglot#633 we are falling victim to LSP server
who serve a tremendous unoptimized number of globs, one for each file.
So instead of sending just '/foo/**/bar/*.el' they send
'/foo/**/bar/quux.el', '/foo/**/bar/quuz.el', etc...  which would
tremendeously slow down the process.  But this is only a suspicion.

This commit tries some simple optimizations: if a directory is known
to be watch-worthy becasue one of its files matched a single glob, no
more files under that directory are tried.  This should help somewhat.

Also fixed a bug in 'eglot--files-recursively', though I suspect that
doesn't make that much of a difference.

* eglot.el (eglot--directories-matched-by-globs): New helper.
(eglot--files-recursively): Fix bug.
bhankas pushed a commit to bhankas/emacs that referenced this issue Sep 19, 2022
Thanks to Brian Leung and Dan Peterson for testing and helping me spot
bugs.

* eglot-tests.el (eglot--glob-match): New test.

* eglot.el (eglot--wildcard-to-regexp): Delete.
(eglot-register-capability): Rework.
(eglot--glob-parse, eglot--glob-compile, eglot--glob-emit-self)
(eglot--glob-emit-**, eglot--glob-emit-*, eglot--glob-emit-?)
(eglot--glob-emit-{}, eglot--glob-emit-range)
(eglot--directories-recursively): New helpers.
bhankas pushed a commit to bhankas/emacs that referenced this issue Sep 19, 2022
with-temp-buffer was taking a lot of time, presumably because it kills
the buffer.  Since emacs is single-threaded, we can safely reuse a
single buffer.

* eglot.el (eglot--glob-parse): Simplify grammar.
(eglot--glob-compile): Don't with-temp-buffer.
bhankas pushed a commit to bhankas/emacs that referenced this issue Sep 19, 2022
Alternative groups {} don't bork on forward slash.

* eglot.el (eglot--glob-parse): Tweak {} grammar.
bhankas pushed a commit to bhankas/emacs that referenced this issue Sep 19, 2022
…geWatchedFiles

Instead of massaging the globPattern to match directories instead of
files, which is fragile, gather the list of directoris to watch by
matching the globPattern against every file recursively (except hidden
files and dirs).

This is still not 100% correct, but should do the right thing is most
cases.  Notably, if the correct dirs are being watched, the glob
pattern is matched against all existing and new files in those
directories, which does include hidden files.

* eglot.el (eglot-register-capability): match file globs against
files only.
(eglot--files-recursively): Rename from eglot--directories-recursively.
bhankas pushed a commit to bhankas/emacs that referenced this issue Sep 19, 2022
…correspondence

In joaotavora/eglot#602, not only a new glob processing system was implemented, but
also a new, more correct, way to look for directories that might hold
files matched by one of these globs.

Answering this question is important because the file watchers for
'workspace/didChangeWatchedFiles' are placed on a per-directory basis.
Previously, a glob such as /foo/**/bar/*.el would fail to produce
practical file-watching effects because /foo/**/bar/ isn't really a
directory.

However, answering this question is also expensive, as the globs sent
by the LSP server are meant to match files, not directories.  The only
way is to list all files under the project's root directory and test
each glob on each one.  If it matches at least one file, that file's
directory is meant to be watched.

We suspect that in joaotavora/eglot#645 and joaotavora/eglot#633 we are falling victim to LSP server
who serve a tremendous unoptimized number of globs, one for each file.
So instead of sending just '/foo/**/bar/*.el' they send
'/foo/**/bar/quux.el', '/foo/**/bar/quuz.el', etc...  which would
tremendeously slow down the process.  But this is only a suspicion.

This commit tries some simple optimizations: if a directory is known
to be watch-worthy becasue one of its files matched a single glob, no
more files under that directory are tried.  This should help somewhat.

Also fixed a bug in 'eglot--files-recursively', though I suspect that
doesn't make that much of a difference.

* eglot.el (eglot--directories-matched-by-globs): New helper.
(eglot--files-recursively): Fix bug.
bhankas pushed a commit to bhankas/emacs that referenced this issue Sep 19, 2022
Thanks to Brian Leung and Dan Peterson for testing and helping me spot
bugs.

* eglot-tests.el (eglot--glob-match): New test.

* eglot.el (eglot--wildcard-to-regexp): Delete.
(eglot-register-capability): Rework.
(eglot--glob-parse, eglot--glob-compile, eglot--glob-emit-self)
(eglot--glob-emit-**, eglot--glob-emit-*, eglot--glob-emit-?)
(eglot--glob-emit-{}, eglot--glob-emit-range)
(eglot--directories-recursively): New helpers.

#602: joaotavora/eglot#602
bhankas pushed a commit to bhankas/emacs that referenced this issue Sep 19, 2022
with-temp-buffer was taking a lot of time, presumably because it kills
the buffer.  Since emacs is single-threaded, we can safely reuse a
single buffer.

* eglot.el (eglot--glob-parse): Simplify grammar.
(eglot--glob-compile): Don't with-temp-buffer.

#602: joaotavora/eglot#602
bhankas pushed a commit to bhankas/emacs that referenced this issue Sep 19, 2022
Alternative groups {} don't bork on forward slash.

* eglot.el (eglot--glob-parse): Tweak {} grammar.

#602: joaotavora/eglot#602
bhankas pushed a commit to bhankas/emacs that referenced this issue Sep 19, 2022
Instead of massaging the globPattern to match directories instead of
files, which is fragile, gather the list of directoris to watch by
matching the globPattern against every file recursively (except hidden
files and dirs).

This is still not 100% correct, but should do the right thing is most
cases.  Notably, if the correct dirs are being watched, the glob
pattern is matched against all existing and new files in those
directories, which does include hidden files.

* eglot.el (eglot-register-capability): match file globs against
files only.
(eglot--files-recursively): Rename from eglot--directories-recursively.

#602: joaotavora/eglot#602
bhankas pushed a commit to bhankas/emacs that referenced this issue Sep 19, 2022
In #602, not only a new glob processing system was implemented, but
also a new, more correct, way to look for directories that might hold
files matched by one of these globs.

Answering this question is important because the file watchers for
'workspace/didChangeWatchedFiles' are placed on a per-directory basis.
Previously, a glob such as /foo/**/bar/*.el would fail to produce
practical file-watching effects because /foo/**/bar/ isn't really a
directory.

However, answering this question is also expensive, as the globs sent
by the LSP server are meant to match files, not directories.  The only
way is to list all files under the project's root directory and test
each glob on each one.  If it matches at least one file, that file's
directory is meant to be watched.

We suspect that in #645 and #633 we are falling victim to LSP server
who serve a tremendous unoptimized number of globs, one for each file.
So instead of sending just '/foo/**/bar/*.el' they send
'/foo/**/bar/quux.el', '/foo/**/bar/quuz.el', etc...  which would
tremendeously slow down the process.  But this is only a suspicion.

This commit tries some simple optimizations: if a directory is known
to be watch-worthy becasue one of its files matched a single glob, no
more files under that directory are tried.  This should help somewhat.

Also fixed a bug in 'eglot--files-recursively', though I suspect that
doesn't make that much of a difference.

* eglot.el (eglot--directories-matched-by-globs): New helper.
(eglot--files-recursively): Fix bug.

#645: joaotavora/eglot#645
#602: joaotavora/eglot#602
#645: joaotavora/eglot#645
#633: joaotavora/eglot#633
jollaitbot pushed a commit to sailfishos-mirror/emacs that referenced this issue Oct 12, 2022
Thanks to Brian Leung and Dan Peterson for testing and helping me spot
bugs.

* eglot-tests.el (eglot--glob-match): New test.

* eglot.el (eglot--wildcard-to-regexp): Delete.
(eglot-register-capability): Rework.
(eglot--glob-parse, eglot--glob-compile, eglot--glob-emit-self)
(eglot--glob-emit-**, eglot--glob-emit-*, eglot--glob-emit-?)
(eglot--glob-emit-{}, eglot--glob-emit-range)
(eglot--directories-recursively): New helpers.

GitHub-reference: fix joaotavora/eglot#602
jollaitbot pushed a commit to sailfishos-mirror/emacs that referenced this issue Oct 12, 2022
with-temp-buffer was taking a lot of time, presumably because it kills
the buffer.  Since emacs is single-threaded, we can safely reuse a
single buffer.

* eglot.el (eglot--glob-parse): Simplify grammar.
(eglot--glob-compile): Don't with-temp-buffer.

GitHub-reference: per joaotavora/eglot#602
jollaitbot pushed a commit to sailfishos-mirror/emacs that referenced this issue Oct 12, 2022
Alternative groups {} don't bork on forward slash.

* eglot.el (eglot--glob-parse): Tweak {} grammar.

GitHub-reference: per joaotavora/eglot#602
jollaitbot pushed a commit to sailfishos-mirror/emacs that referenced this issue Oct 12, 2022
Instead of massaging the globPattern to match directories instead of
files, which is fragile, gather the list of directoris to watch by
matching the globPattern against every file recursively (except hidden
files and dirs).

This is still not 100% correct, but should do the right thing is most
cases.  Notably, if the correct dirs are being watched, the glob
pattern is matched against all existing and new files in those
directories, which does include hidden files.

* eglot.el (eglot-register-capability): match file globs against
files only.
(eglot--files-recursively): Rename from eglot--directories-recursively.

GitHub-reference: per joaotavora/eglot#602
jollaitbot pushed a commit to sailfishos-mirror/emacs that referenced this issue Oct 12, 2022
In joaotavora/eglot#602, not only a new glob processing system was implemented, but
also a new, more correct, way to look for directories that might hold
files matched by one of these globs.

Answering this question is important because the file watchers for
'workspace/didChangeWatchedFiles' are placed on a per-directory basis.
Previously, a glob such as /foo/**/bar/*.el would fail to produce
practical file-watching effects because /foo/**/bar/ isn't really a
directory.

However, answering this question is also expensive, as the globs sent
by the LSP server are meant to match files, not directories.  The only
way is to list all files under the project's root directory and test
each glob on each one.  If it matches at least one file, that file's
directory is meant to be watched.

We suspect that in joaotavora/eglot#645 and joaotavora/eglot#633 we are falling victim to LSP server
who serve a tremendous unoptimized number of globs, one for each file.
So instead of sending just '/foo/**/bar/*.el' they send
'/foo/**/bar/quux.el', '/foo/**/bar/quuz.el', etc...  which would
tremendeously slow down the process.  But this is only a suspicion.

This commit tries some simple optimizations: if a directory is known
to be watch-worthy becasue one of its files matched a single glob, no
more files under that directory are tried.  This should help somewhat.

Also fixed a bug in 'eglot--files-recursively', though I suspect that
doesn't make that much of a difference.

* eglot.el (eglot--directories-matched-by-globs): New helper.
(eglot--files-recursively): Fix bug.

GitHub-reference: per joaotavora/eglot#645
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
emacs-bug Something to be solved mostly in Emacs
Projects
None yet
Development

No branches or pull requests

3 participants