Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Malformed background commands hang Emacs #1110

Closed
mmaechler opened this issue Feb 5, 2021 · 62 comments
Closed

Malformed background commands hang Emacs #1110

mmaechler opened this issue Feb 5, 2021 · 62 comments
Labels
Milestone

Comments

@mmaechler
Copy link
Member

This is very new; just from today, git pulling and restarting emacs.
emacs -f R to just start an *R* inferior buffer, no R script for the moment (yes, atypical for me).

This even "kills" the mouse - emacs-menu interaction: If I click on one of the menus, nothing happens .... well about 5 minutes later when I've given up on that emacs, the pull down menu suddenly opens (on top of firefox in this case which has been my 'active window' by then).

I strongly suspect the culprit is some ESS background process -- happening accidentally not in the background -- which takes much longer than for average people: I have ca 20'000 R packages in my .libPaths() libraries (all of CRAN (apart from a few dozen I can't easily install), plus a 100 Bioconductore, ..).
If ESS gets the silly idea that it should prepare completion for all of these namespaces and does so not really in the background .. this is explains which I'm seeing.

Yes, I've also changed my startup back to my previous default of not having tracebug active.
But that is improbably as cause, or .. ???

@lionel-
Copy link
Member

lionel- commented Feb 5, 2021

There is no background processes in ESS, only background tasks that are hidden from the user.

Can you please provide debugging info? I've added M-x ess-elisp-trace-mode so we can get an idea of what ESS is doing during hangs.

Also if you suspect this is because of your .libPaths(), can you please confirm it by setting an empty libpath?

Yes, I've also changed my startup back to my previous default of not having tracebug active.
But that is improbably as cause, or .. ???

I don't know but can you please at least try it without tracebug to confirm this?

@mmaechler
Copy link
Member Author

Even emacs key-strokes have no effect.... apart from C-g C-g C-g which still leaves that strange mode. Note that also the "blinking rectangle" which shows the "cursor" in the active buffer disappears. M-x ess-lisp-trace-mode shows some idle-timer before that horrible state starts.
... and that was exactly the behavior I saw: If I wake it up (by repeated C-g) I can type in the *R* buffer and everything looks fine... as soon as I start thinking for a second instead of typing away
the inferior buffer goes into this freeze state, where not even keystrokes or emacs menu actions (which are key strokes for emacs anyway) have any effect.

@mmaechler
Copy link
Member Author

emacs -f R -f ess-elisp-trace-mode &

and then saving the *trace-output* before it gets large, gives something like

======================================================================
1 -> (ess--idle-timer-function)
1 <- ess--idle-timer-function: nil
======================================================================
1 -> (ess-r-package-auto-activate)
1 <- ess-r-package-auto-activate: nil
======================================================================
1 -> (ess-r-package-auto-activate)
1 <- ess-r-package-auto-activate: nil
======================================================================
1 -> (ess-debug-minor-mode -1)
1 <- ess-debug-minor-mode: nil
======================================================================
1 -> (ess-process-live-p)
1 <- ess-process-live-p: (run open listen connect stop)
======================================================================
1 -> (ess-process-get on-error-action)
1 <- ess-process-get: nil
======================================================================
1 -> (ess--idle-timer-function)
1 <- ess--idle-timer-function: nil
======================================================================
1 -> (ess--idle-timer-function)
1 <- ess--idle-timer-function: nil
======================================================================
1 -> (ess-debug-minor-mode -1)
1 <- ess-debug-minor-mode: nil
======================================================================
1 -> (ess-process-live-p)
1 <- ess-process-live-p: (run open listen connect stop)
======================================================================
1 -> (ess-process-get on-error-action)
1 <- ess-process-get: nil
======================================================================
1 -> (ess-r-package-auto-activate)
1 <- ess-r-package-auto-activate: nil

A longer version I try to attach
trace-output_longer.txt

@mmaechler
Copy link
Member Author

mmaechler commented Feb 5, 2021

I don't know but can you please at least try it without tracebug to confirm this?

sure; the above was already with default ess-use-tracebug (t).

@mmaechler
Copy link
Member Author

mmaechler commented Feb 5, 2021

There is no background processes in ESS, only background tasks that are hidden from the user.

Well, yes, I don't know the emacs terminology. So it is such a background task which eats away all emacs resources, even keystroke actions are delayed for many minutes in my case, just 3 x C-g gets me out ... but only as long as I keep busy ... and as soon as I see

=====================================================================
1 -> (ess--idle-timer-function)
1 <- ess--idle-timer-function: nil

in the trace-output buffer, it's "frozen" again (to be thawed by C-g C-g C-g) ...

@lionel-
Copy link
Member

lionel- commented Feb 5, 2021

This is strange. You can normally safely ignore the ESS timer, it runs every second and so this is normal output. You can also ignore lines like ess--get-mode-line-indicator which are triggered at a rapid pace in normal behaviour.

Just to confirm, what do you have in ess-idle-timer-functions? This is the hook that is run by the timer.

just 3 x C-g gets me out ... but only as long as I keep busy

3x C-g is indicative of an inhibit-quit. I've just added one here today: https://github.com/emacs-ess/ESS/blob/89c0e127/lisp/ess-inf.el#L1329. However I don't see anything that could block in here. Maybe you could checkout revision a9ee60c prior to that PR to see if you observe any different behaviour.

sure; the above was already with default ess-use-tracebug (t).

Oh then without tracebug. It would be helpful to determine if this config change causes the problem or if it's the recently merged PR.

@lionel-
Copy link
Member

lionel- commented Feb 5, 2021

Thank you for the trace. This might be relevant:

| 2 -> (ess-wait-for-process #<process R> nil nil nil 30)
| | 3 -> (inferior-ess-ordinary-filter #<process R> "Error: unexpected symbol in \".ess.command(local({list_templates <- function(user_dirs) {    user_dirs <- c(user_dirs, unlist(lapply(user_dirs, list.dirs, recursive = FALSE)))    list_from_dir\"
> ")
| | | 4 -> (ess--if-verbose-write-process-state #<process R> "Error: unexpected symbol in \".ess.command(local({list_templates <- function(user_dirs) {    user_dirs <- c(user_dirs, unlist(lapply(user_dirs, list.dirs, recursive = FALSE)))    list_from_dir\"
> " "ordinary-filter")
| | | | 5 -> (ess-if-verbose-write "
ordinary-filter:
    --> busy:t busy-end:t sec-prompt:nil interruptable:nil <--
    --> running-async:nil callback:nil suppress-next-output:nil <--
    --> dbg-active:nil is-recover:nil <--
    --> string:Error: unexpected symbol in \".ess.command(local({l .... st.dirs, recursive = FALSE)))    list_from_dir\"
> 
")
| | | | 5 <- ess-if-verbose-write: nil
| | | 4 <- ess--if-verbose-write-process-state: nil
| | | 4 -> (inferior-ess--set-status-sentinel #<process R> #<buffer  *ess-command-output*> "ess-output-delimiter4")
| | | | 5 -> (inferior-ess--sentinel-start-re "ess-output-delimiter4")
| | | | 5 <- inferior-ess--sentinel-start-re: "^\\(ess-output-delimiter4-START$\\)"
| | | 4 <- inferior-ess--set-status-sentinel: nil
| | | 4 -> (inferior-ess-run-callback #<process R> "Error: unexpected symbol in \".ess.command(local({list_templates <- function(user_dirs) {    user_dirs <- c(user_dirs, unlist(lapply(user_dirs, list.dirs, recursive = FALSE)))    list_from_dir\"
> ")
| | | 4 <- inferior-ess-run-callback: nil
| | 3 <- inferior-ess-ordinary-filter: nil
| | 3 -> (ess--idle-timer-function)
| | 3 <- ess--idle-timer-function: nil
| | 3 -> (inferior-ess-fontify-region 1 1854 nil)
| | | 4 -> (ess-r--find-fl-keyword 1854)
| | | | 5 -> (ess-backward-up-list)
| | | | | 6 -> (ess-up-list -1)
| | | | | 6 <- ess-up-list: nil
| | | | 5 <- ess-backward-up-list: nil
| | | | 5 -> (ess-backward-up-list)
| | | | | 6 -> (ess-up-list -1)
| | | | | 6 <- ess-up-list: nil
| | | | 5 <- ess-backward-up-list: nil
| | | | 5 -> (ess-backward-up-list)
| | | | | 6 -> (ess-up-list -1)
| | | | | 6 <- ess-up-list: nil
| | | | 5 <- ess-backward-up-list: nil
| | | 4 <- ess-r--find-fl-keyword: nil
| | 3 <- inferior-ess-fontify-region: (jit-lock-bounds 1 . 1854)
| 2 <- ess-wait-for-process: !non-local\ exit!
| 2 -> (inferior-ess--sentinel-start-re "ess-output-delimiter4")
| 2 <- inferior-ess--sentinel-start-re: "^\\(ess-output-delimiter4-START$\\)"
| 2 -> (ess--interrupt #<process R>)
| 2 <- ess--interrupt: nil
1 <- ess-command: !non-local\ exit!
======================================================================
1 -> (ess-debug-minor-mode -1)
1 <- ess-debug-minor-mode: nil
======================================================================
1 -> (ess-process-live-p)
1 <- ess-process-live-p: (run open listen connect stop)
======================================================================
1 -> (ess-process-get on-error-action)
1 <- ess-process-get: nil
======================================================================
1 -> (inferior-ess-ordinary-filter #<process R> "
> ")
| 2 -> (ess--if-verbose-write-process-state #<process R> "
> " "ordinary-filter")
| | 3 -> (ess-if-verbose-write "
ordinary-filter:
    --> busy:t busy-end:t sec-prompt:nil interruptable:nil <--
    --> running-async:nil callback:nil suppress-next-output:nil <--
    --> dbg-active:nil is-recover:nil <--
    --> string:
> 
")
| | 3 <- ess-if-verbose-write: nil
| 2 <- ess--if-verbose-write-process-state: nil
| 2 -> (inferior-ess--set-status #<process R> "
> ")
| 2 <- inferior-ess--set-status: 1
| 2 -> (inferior-ess-run-callback #<process R> "
> ")
| 2 <- inferior-ess-run-callback: nil
| 2 -> (ess--command-proc-restore #<process R> ((old-pb . #<buffer *R*>) (old-pf . inferior-ess-tracebug-output-filter) (old-pm . 3123)))
| 2 <- ess--command-proc-restore: #<marker at 3123 in *R*>
1 <- inferior-ess-ordinary-filter: #<marker at 3123 in *R*>

Especially this part which suggests a parse error:

| | | 4 -> (inferior-ess-run-callback #<process R> "Error: unexpected symbol in \".ess.command(local({list_templates <- function(user_dirs) {    user_dirs <- c(user_dirs, unlist(lapply(user_dirs, list.dirs, recursive = FALSE)))    list_from_dir\"
> ")

It's trying to evaluate a long background command which starts like this:

local({
list_templates <-
 function(user_dirs) {
    user_dirs <- c(user_dirs, unlist(lapply(user_dirs, list.dirs, recursive = FALSE)))
...

I'm not familiar with this command, do you know what it is about? I see this which might be relevant to your heavy libpath:

    packages <- sort(unlist(lapply(.libPaths(), dir)))

That said, we're now supposed to handle hanging ESS-commands gracefully, I'm disappointed we're not there yet :(

@lionel-
Copy link
Member

lionel- commented Feb 5, 2021

3x C-g is indicative of an inhibit-quit

Oh of course anything in a process filter is behind an inhibit quit so that might be from the filter.

@lionel-
Copy link
Member

lionel- commented Feb 5, 2021

Analysing your trace, it looks like:

  1. The ess-command is failing to parse (it'd be great if you could figure out what that background command is, it needs to be fixed).

  2. Because it doesn't parse, .ess.command() is never run, so we're waiting undefinitely for the output delimiter. This either triggers a timeout or you trigger a quit with C-g. We used to timeout early on these malformed commands but we recently bumped the timeout to 30s after seeing undesired timeouts with slow-responding remotes.

  3. To protect against this case where the output delimiter is not properly set, we detect the starting delimiter in the process output. If not found, we disable the output delimiter so that the busy-detection mechanism only looks for a prompt and not for a closing delimiter. This is not very clear in the trace so I've just added this action to the dribble log: 963fd97.

  4. The filter is called again, the prompt is detected, the process is marked non-busy and the process buffer is restored to its state prior to the background command.

So it looks like everything is working as expected here, except that you would see a long freeze (that should be interruptible with a single C-g) since the command does not parse. The cause may be elsewhere.

@mmaechler
Copy link
Member Author

Oh then without tracebug. It would be helpful to determine if this config change causes the problem or if it's the recently merged PR.

misunderstanding: You asked me on #973 to see if that issue (ess-use-tracebug nil ==> extra [Enter] generated) is solved now, so I switched back to my previous setting 'nil'. Then I got this issue (but also after updating all packages, notably poly-R ..), git pulling ess, make, etc... so now I have tried both with 'nil' and with 't' and the background task killing interactivity phenomenon is there in both cases..

@lionel-
Copy link
Member

lionel- commented Feb 5, 2021

Found the command: https://github.com/polymode/poly-R/blob/51ffeb6ec45dd44eafa4d22ad2d6150cc4b248fc/poly-R.el#L109

@lionel-
Copy link
Member

lionel- commented Feb 5, 2021

Thank you for confirming the tracebug filter is not the cause. Can you please checkout this commit and try again? a9ee60c

To determine whether today's PR is at fault.

@mmaechler
Copy link
Member Author

or it is my update of poly-R after many months of never looking at the emacs package updates ?
I'll try the above commit...

@mmaechler
Copy link
Member Author

I did git checkout a9ee60caef8bda8ec69fa8caee8290bf6e2b464f (and make ) and
emacs -f R ... but it is the same bad situation.
So I assume it must be the "bad" poly-R code rather than a very recent ESS change?

@lionel-
Copy link
Member

lionel- commented Feb 5, 2021

or it is my update of poly-R after many months of never looking at the emacs package updates ?

Oh that makes sense, I think we're making good progress :)

That bit might be registering a regularly executing command: https://github.com/polymode/poly-R/blob/51ffeb6ec45dd44eafa4d22ad2d6150cc4b248fc/poly-R.el#L217-L224

Since it doesn't parse and since we've bumped the timeout to 30s this explains the hangs as soon as you stop interacting with Emacs.

So I'm going to find a solution for these malformed commands because they are too dangerous. I think I'll send the string to .ess.command() and parse it from there. Then we'll be able to send an informative error instantaneously.

Still one unexplained observation: having to C-g 3 times. This means there is an inhibit-quit somewhere. It's not in ess-command since I've removed it in #1094. It may not be in ESS.

@lionel- lionel- added the process label Feb 5, 2021
@lionel- lionel- added this to the next milestone Feb 5, 2021
@mmaechler
Copy link
Member Author

just on one of your notes: single C-g does not have any effect; it's really the triple one that's needed currently.
Also, that is not a workaround really; I basically can't work with R with current ESS (unless I start to change my R setup ... but that really was not what I would get into at the moment)

@mmaechler
Copy link
Member Author

Still one unexplained observation: having to C-g 3 times. This means there is an inhibit-quit somewhere. It's not in ess-command since I've removed it in #1094. It may not be in ESS.

Ok, only read that now. I will grep around a bit in my emacs setup.

@lionel-
Copy link
Member

lionel- commented Feb 5, 2021

oh it's more likely in the background invokation code, maybe easy-menu? I'm not familiar with this package.

@lionel- lionel- changed the title background evaluation after startup is "killing me" Malformed background commands hang Emacs Feb 5, 2021
@mmaechler
Copy link
Member Author

So, in my *R* buffer and globally inhibit-quit is always nil.
Grepping (recursively!) in my emacs setup (but not the installed packages), the only inhibit-quit I find is in ESS (and some old lisp files which I know are not loaded), notably even in (ess-command ..) there's a part surrounded by (let ((inhibit-quit t)) ....) -- so it's in there where things hang ??

@lionel-
Copy link
Member

lionel- commented Feb 6, 2021

This particular inhibit-quit in ess-command is necessary to avoid leaving the process in an inconsistent state. I've made sure it returns very quickly so that's not the cause of the hang.

@mmaechler
Copy link
Member Author

This particular inhibit-quit in ess-command is necessary to avoid leaving the process in an inconsistent state.

I have thought so

I've made sure it returns very quickly so that's not the cause of the hang.

I did not suggest so, but that the hang would happen in code called from that part of (ess-command ..) ... but maybe that's what you meant.

As a user the difference is not really that large between one C-g and three C-g ...
more important for me would be a simple workaround to stop the hang ... something less extreme than complete uninstalling of the poly-R package.

@vspinu
Copy link
Member

vspinu commented Feb 6, 2021

poly-R hasn't changed in a year. That command which picks Rmd templates for the Rmarkdown menu hasn't changed in 2 years. It should be triggered only when the menu is opened, and the reason it is slow for Martin is because of the network drive and loads of packages. @mmaechler how was the menu working for you before recent ESS changes? Did you see the hang? Did you have to C-g to unhang? Did you see any output in R buffer? And finally, did you have to C-c C-c in the R buffer?

@lionel- you are moving way to fast with the modifications of ess-command. I don't have the breadth to fully follow and understand your experimentation but very quick changes like that is nothing else than playing with fire IMO. That's critical code and I think changing how it works should be carefully considered.

@mmaechler
Copy link
Member Author

I never saw such hangs, I automatically browse some of the menus quite regularly ... so for some reasons the old version of poly-R I had .. could well be even older or was not triggered as easily.
To do something that may hang emacs as soon as people have larger spaces should be optional and off by default in my view, and in any case should run completely in the background under all circumstances.
Note that it hangs for me also when the 20'000 packages are on my fast local disk.

@vspinu
Copy link
Member

vspinu commented Feb 7, 2021

@mmaechler that would mean you had 2years old poly-R, which is pretty unlikely no?

To make sure that it's indeed poly-R, would you mind pulling an older ESS (before ess-command changes), say git checkout 3b38b783 && make clean and see if it still hangs? Thanks!

@lionel-
Copy link
Member

lionel- commented Feb 8, 2021

you are moving way to fast with the modifications of ess-command. I don't have the breadth to fully follow and understand your experimentation but very quick changes like that is nothing else than playing with fire IMO. That's critical code and I think changing how it works should be carefully considered.

Vitalie, this is rather uncalled for. The switch to output delimiters and to .ess.command() happened months ago. That was indeed a fundamental design change but I think we can agree a desired change since it made it possible to fix several longstanding issues and allow for robust detection of command outputs that does not depend on fragile prompt detection.

There has been some fallout from these changes as would be expected. When I see problems with remotes (cf last week issues) or with malformed commands (this issue) I take the time to fix them ASAP to unblock users of the dev version. I also add unit tests, which takes a lot of my time. That's moving too fast and playing with fire? Please be considerate of my work and my time.

The problem with Martin's issue is rather complicated, here is how I understand it:

  • The ESS command, assembled from strings, is not parsing. Maybe a problem with the concatenation logic or the input strings. It's possible it hasn't parsed with Martin's setup for quite a long time. IIRC Martin has not had a great experience with ESS for a while. Errors in ESS commands are currently silently ignored so that might have gone under the radar. I've been planning to fix this in this release cycle and just opened issue Forward process errors in ess-command as Emacs errors #1113.

  • Before the output delimiter, if the command did not parse on the R side, ess-command would return immediately because the prompt after the error message would be detected. After the output delimiter, malformed commands cause hangs because we are looking not only for a prompt but also for the delimiter that is set up by .ess.command(). The problem is that if the command does not parse, .ess.command() never gets a chance to run and display the delimiters.

  • My first workaround for these malformed commands was to set up a short timeout. My reasoning was that blocking the Emacs UI is always a bug so it makes sense to throw an informative error. As we have discovered though, that's not usable with slow remotes, so we've bumped the timeout to 30 seconds. Because of this, malformed commands now hang for 30 seconds. Since they used to return immediately that's a regression that causes poor user experience.

I've started work on a fix for malformed commands in lionel-@3cabdc51. I think this will solve the issue generically.

However it will not solve Martin's problem with the malformed command which I believe is external to ESS. It's also quite possible I'm missing something so I keep an open mind to alternative explanations of Martin's trace. It's also possible there are multiple problems causing his observations.

@vspinu
Copy link
Member

vspinu commented Feb 8, 2021

Please be considerate of my work and my time.

I just stated that you are making changes to critical code way too fast and I don't have time to catch up. Not more, not less. Nothing personal or disrespectful implied or intended. Of course I respect your time and know how crazily hard you work and how many days and weeks of time is invested here. In fact seeing insult where it is not probably means you are badly overworked, so maybe a slower tempo would be indeed beneficial here.

If we take more time and discuss things slowly I am sure we can come up with globally optimal solutions faster. Otherwise one person can pile a lot of new infra and we might end up replacing it anyhow in the future. As an example a lot of issues that you are dealing right now (delimiters, partial accumulation of output, async execution, passing R errors to emacs level et) do overlap and have been considered with the already implemented simple MPI framework that we have. And I don't think you had a look at it so far.

@lionel-
Copy link
Member

lionel- commented Feb 8, 2021

Regarding MPI, I chose to keep things simple because we were dealing with hard to fix bugs. ESS has been quite unstable for a while before this and I did not have a complete picture of how this all worked. MPI seemed to add a layer of complexity that was not desirable to start with. We can switch to MPI for sending messages (sentinels and errors) in the close or far future. I don't think it's incompatible and the overlap is small really. Also we've always accumulated output in ESS commands, that's not new.

@lionel-
Copy link
Member

lionel- commented Feb 10, 2021

@mmaechler Have you updated your branch?

@lionel-
Copy link
Member

lionel- commented Feb 10, 2021

Here is a reprex:

(ess-command "1\n2\n")

This evaluates 12 instead of 1; 2. I'll fix this right away so you can go back to the main dev branch @mmaechler.

@vspinu
Copy link
Member

vspinu commented Feb 10, 2021

Is this about restarting to browser?

I was overthinking probably. My concern was that the command run as part of ess-command would invokeRestarts, but I guess that's too much to allow for anyhow.

if (!is.null(findRestart("browser")))

What about recover and debug are they also installed as browsers?

@lionel-
Copy link
Member

lionel- commented Feb 10, 2021

What about recover and debug are they also installed as browsers?

I don't know the internals of recover but it is not relevant here because the process is not "ready" when recovering. The prompt is looking for a frame number rather than an R expression so there is no > prompt which would allow background commands to be run. As for debug() that's just another way of entering a browser prompt.

Edit: oh I guess you meant a browser prompt after choosing a frame number in recover(). Then we enter a regular browser prompt I believe. To be confirmed.

@lionel-
Copy link
Member

lionel- commented Feb 10, 2021

I introduced this silly bug a while ago :(
I'm surprised this went under the radar for so long.

Still need to fix the hang when commands are actually malformed, so I'm keeping this open.

@mmaechler You should now have a usable ESS if you checkout the latest master.

@mmaechler
Copy link
Member Author

(indeed, it's back to well usable, thank you!)

@lionel- lionel- removed this from the next milestone Mar 26, 2021
@mmaechler
Copy link
Member Author

Well, I'm currently suffering again ... and I think only since I git pull ed a day ago or so:

terrible timeouts, and

Saving file /u/maechler/R/D/r-devel/R/src/library/base/R/grep.R...
Mark set
Error running timer ‘ess--idle-timer-function’: (error "Timeout during background ESS command ‘getwd()’")
Replaced 0 occurrences
Wrote /u/maechler/R/D/r-devel/R/src/library/base/R/grep.R
Quit [2 times]
Error running timer ‘ess--idle-timer-function’: (error "Timeout during background ESS command ‘getwd()’")
Buffer grep.R modified; save it? (y or n) y
Saving file /u/maechler/R/D/r-devel/R/src/library/base/R/grep.R...
Mark set
Quit [7 times]
Mark set [5 times]

in the *Messages* buffer .... (which also shows that I pressed C-g "like crazy" in order to get emacs back from its "hang"...

Still much better than having timeouts and nothing in both *Messages* and *ESS as I get from the poly-R evaluation of all 30'000 packages in my .libPaths() ..

@mmaechler
Copy link
Member Author

It really is about saving src/library/base/R/grep.R in my development version of R-devel.. The save-hooks are triggering the timeout, and I'm even not able to save the file, because I need to interrupt emacs' hang state ... and then it prompts me again if I want to save the file, etc etc.
Probably I can use my old trick to change the major mode of the file to fundamental and then to save it without triggering the bad hook which is starting to run some crazy R job behind my back ...

@lionel-
Copy link
Member

lionel- commented Apr 1, 2021

This is strange, it seems a bit improbable for getwd() to take more than 30 seconds (the timeout threshold). An ess-elisp-trace-mode output might give precious information about what is going on.

Also what do you get when you run (ess-get-working-directory) manually? Do you also get a timeout? A hang for several seconds?

@lionel-
Copy link
Member

lionel- commented Apr 1, 2021

The save-hooks are triggering the timeout

This is unlikely because ESS doesn't register any save hook. Can you look in your save hooks and confirm, if you think this is the case? FWIW the timeouts seem to be triggered by ess-synchronize-dirs which is run from a timer on new process output (see ess-when-new-input). New process output can occur at any time because of background commands.

@mmaechler
Copy link
Member Author

Thank you for mentioning the fact that other timers are likely to be the case.. rather than save-file-hooks.
Indeed, today I see other such timeouts I get out via 3 x C-c and without open files. Also, getwd() etc is typically not at all slow. Some background command is trying to do its job, locks up emacs (or just the two buffers <foo>.R and *R* I have open), and I stop it (via 3x C-c) I can type a few things ... and then it starts again: Emacs and I playing "infinite loop":

  1. emacs "locks" because of some background job
  2. I break that lock
  3. I can work for a few seconds
  4. goto 1.

@lionel-
Copy link
Member

lionel- commented Apr 4, 2021

We can't crack the case with so few information. Here are a few things you could do:

  • Run M-x ess-elisp-trace-mode during a hang.
  • If it used to work, climb the git history to find the faulty commit or PR merge.
  • Inspect the contents of the *ess-command-output* buffer and look for anything unusual (mismatched output delimiter?).
  • Step-debug through ess-command and/or inferior-ess-ordinary-filter to figure out what's going on.

@lionel-
Copy link
Member

lionel- commented Apr 16, 2021

I got the getwd() hang and was able to get a trace and other debug info. Haven't had time to look into it yet but hopefully will make it possible to fix this.

@lionel- lionel- added this to the next milestone Apr 16, 2021
@mmaechler
Copy link
Member Author

Since today, I'm suffering again from complete lockup when I start emacs (which loads many things, notably poly-R and ess-site, etc).

  1. I find it hangs at (require 'poly-R) -- yes, this is the official melpa version, not the github one of polymode.
  2. I disable the loading of poly-R, now I get the non-interruptable hang when emacs tries to load my ess-site (which is uptodate ESS from github master).
  3. .. well, yes, I can use emacs without ESS ... but that's not what I do 99% of the time ...
  4. Ok, I'll try M-x ess-elisp-trace-mode before loading ESS .. and hope to find more to report here...

@mmaechler
Copy link
Member Author

Well, of course M-x ess-elisp-trace-mode is not available before ess-site is loaded.

At the moment, I can just kill emacs from the OS (or the GNOME gui) ...

@lionel-
Copy link
Member

lionel- commented May 18, 2021

If you can't C-g at all it may be a bug in Emacs.

After upgrading to Emacs 27 on macOS I faced systematic freezes. I "fixed" it by installing a fork that uses a different implementation for the macOS glue. It turns out that the built-in macOS layer is less stable than the fork.

The Linux layer should be more stable than the macOS one though.

@mmaechler
Copy link
Member Author

Thank you, @lionel-; this is "stock" emacs 27.1 from Fedora Linux 32 ... and we are soon upgrading to F 34.
But that version of emacs has not been updated since Nov/Dec 2020 ... and I also have been using default
R = R-4.1.0 prerelease (and = R-4.1.0 from today) so I don't think the change in behavior can be attributed to there directly.
It may be the case that our network file system has been particularly slow today, so it may have "hit" one of the timeouts you've been using and which had been long enough till now, but no longer today. OTOH, this bad behavior now has been "happening" consistently for several hours. Here's a screenshot from just now:

Bildschirmfoto von 2021-05-18 17-39-37

@lionel-
Copy link
Member

lionel- commented May 18, 2021

The timeouts are set to 30 seconds, could it really take that long? Just to be sure you could add this to your config to set the timeouts to infinity:

(defvar ess--command-default-timeout most-positive-fixnum)

Well, of course M-x ess-elisp-trace-mode is not available before ess-site is loaded.

This mode is a single block of code and has no dependencies so you could copy it in your config to make it available earlier.

@lionel-
Copy link
Member

lionel- commented May 18, 2021

Do you also have the freezes when ess-can-eval-in-background is set to nil?

@mmaechler
Copy link
Member Author

mmaechler commented May 19, 2021

The problem "went away" (i.e., has no longer been exposed!) after a reboot and the disk-network server being back to speed.
But it has been very frustrating and there is nothing but ESS (+polymode) that react so catastrophically to it..

@lionel-
Copy link
Member

lionel- commented May 19, 2021

The problem likely occurs in one of these steps:

  1. Background commands (an essential piece of ESS infrastructure that powers completions and eldoc) taking a long time due to slow network.
  2. Emacs 27 sending interrupts as you type or move the cursor (this is why old versions of ESS are not compatible with Emacs 27 and we had to review the background command mechanism to account for these interrupts).
  3. The interrupts being forwarded to the R side and then processed asynchronously. They are supposed to return right away and finish in a process filter when R has finished interrupting the background command. Further background commands for this process are disabled in the mean time.

If that's the case, setting ess-can-eval-in-background to nil should be a quick workaround. Maybe we could find a way to detect problems and disable background commands automatically. It feels like slow network access shouldn't be causing this sort of catastrophic hangs though. There must be a bug somewhere.

@bryce-carson
Copy link

It seems I may have opened a partial duplicate of this issue in #1162. The trigger isn't the exact same, but the behaviour enumerated by @mmaechler here is essentially what I encounter and couldn't describe easily.

Thank you for mentioning the fact that other timers are likely to be the case.. rather than save-file-hooks. Indeed, today I see other such timeouts I get out via 3 x C-c and without open files. Also, getwd() etc is typically not at all slow. Some background command is trying to do its job, locks up emacs (or just the two buffers <foo>.R and *R* I have open), and I stop it (via 3x C-c) I can type a few things ... and then it starts again: Emacs and I playing "infinite loop":

1. emacs "locks" because of some background job

2. I break that lock

3. I can work for a few seconds

4. goto 1.

Since setting ess-can-eval-in-background to nil I haven't experience the hang, so the ideas presented in this issue are definitely the same causes of the presentation of errant behaviour I experience and reported in #1162.

@lionel-
Copy link
Member

lionel- commented Apr 16, 2023

Should be fixed by polymode/poly-R#38 which adds a timeout of 200ms to the background command launched by the poly-R menu.

If you see sluggishness instead of hangs because the timeout is too large, you can add this to your config to disable the background eval altogether: (setq poly-r-can-eval-in-background nil).

Does this solve it for you @mmaechler?

@lionel-
Copy link
Member

lionel- commented Apr 17, 2023

Closing since Martin confirmed on polymode/poly-R@8024e85 that this fixed the hang 🎉

@lionel- lionel- closed this as completed Apr 17, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants