Resource exhaustion with emacs 28's native compilation. #1222

Open
HinTak opened this issue Sep 3, 2022 · 49 comments
Comments

@HinTak

HinTak commented Sep 3, 2022

Somewhat disappointed with the lack of response with #1207 , so I gave current head ecd8865 a go against emacs 28.1 .

commit ecd8865bbbdf6664b66be5ffd5d4e62d5af78240 (HEAD -> master, origin/master, origin/HEAD)
Author: Vitalie Spinu <spinuvit@gmail.com>
Date:   Fri Sep 2 10:18:55 2022 +0200

    [Fix #1220] Make ess-r-initialize-on-start interactive

It still spawns multiple emacs processes and potentially leads to resource exhaustion and lock-ups/crashes.

I found one /tmp/emacs-async-comp-ess-custom-*.el and 123 /tmp/emacs-int-comp-subr--trampoline-*_delete_char_0-*.el files. Each corresponds to a spawned process, so I had 123 new emacs instances running the latest ess before I killed them.

Looking at the content of the single /tmp/emacs-async-comp-ess-custom-*.el, it says (with minor formatting for readability):

 (require 'comp) ...
... (message "Compiling %s..." "/usr/share/emacs/site-lisp/ess/lisp/ess-custom.el")

(comp--native-compile "/usr/share/emacs/site-lisp/ess/lisp/ess-custom.el" t)

According to https://koji.fedoraproject.org/koji/buildinfo?buildID=2002348 , fedora's emacs 28 is built with the new native compilation support, in the change log:

Build with Native Compilation support and natively compile all .el files

So there you are, the problem seems to be "(comp--native-compile "/usr/share/emacs/site-lisp/ess/lisp/ess-custom.el" t)"

@HinTak
Author

HinTak commented Sep 3, 2022

I believe the resource exhaustion comes from emacs trying to do (comp--native-compile "/usr/share/emacs/site-lisp/ess/lisp/ess-custom.el" t). You need a build of emacs 28 with native compilation enabled to try this.

@lionel-

This comment was marked as resolved.

@lionel-
Member

lionel- commented Sep 4, 2022

Sorry I can't reproduce.

@HinTak
Author

HinTak commented Sep 11, 2022

Native compilation is on by default for packages located under "/usr/share/emacs/site-lisp/". So you probably won't see it if you put ess under MELPA or your own ~/.emacs .

I did it by doing "git archive ...", generating a tarball on top of fedora's older rpm and rebuilding it. The rebuild process doesn't seem to do much other than trying to byte-compile (successful and uneventful, so this confirms that most of the warnings reported in #1207 are red herrings, as there are no warnings in the byte-compile). There are two things I haven't thought of: what little stubs the fedora maintainer adds, and how the system copes with trying to native compile to a system location by a non-root user. Maybe one of these two is the problem.

@HinTak
Author

HinTak commented Sep 11, 2022

The stub I talked about is just (require 'ess-site), I think, besides putting ess under /usr/share/emacs/site-lisp

https://src.fedoraproject.org/rpms/emacs-common-ess/blob/rawhide/f/emacs-common-ess.spec#_88

@lionel-
Member

lionel- commented Sep 12, 2022

So you probably won't see it if you put ess under MELPA or your own ~/.emacs .

I've launched native compilation manually on these files and it succeeded.

@HinTak
Author

HinTak commented Sep 12, 2022

You are sure your emacs is built with it enabled? ./configure... --with-native-compilation I think.
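
One way to check this from inside a running Emacs, rather than from the configure line, is to ask Emacs itself. A minimal sketch, assuming Emacs 28 or later, where `native-comp-available-p` exists (the `fboundp` guard keeps it harmless on older versions):

```elisp
;; Evaluate in *scratch* or with M-: to see whether this Emacs can native-compile.
(if (and (fboundp 'native-comp-available-p)
         (native-comp-available-p))
    (message "Native compilation is available")
  (message "Native compilation is NOT available in this build"))
```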

@lionel-
Member

lionel- commented Sep 12, 2022

yes and I see the compiled .eln files in my cache.

@HinTak
Author

HinTak commented Sep 12, 2022

I wonder how much of it is fedora-specific - there are 3 factors: how emacs is built, where ess is located (in /usr/share/emacs/site-lisp) and what little additions the fedora packager makes to ess. Hope somebody answers in the old thread about non-fedora systems.

I haven't used R for a number of years but use emacs on a daily basis. The logical choice for me is just to uninstall ess until I need R again, if ever. So I'd rather not spend too much time on this... I guess I could tar up my /usr/share/emacs/site-lisp/ess for comparison, and try a few of the "disable native compilation" directives to see if they work around things and give more clues to the problem.

One thing I thought of, but prefer not to try, is to launch emacs as root - if the cause is a permission problem of non-root user trying to native compile and write to system location and failing over and over somehow, this might work around it. But I'd rather not do that :-(.

@cgorac

cgorac commented Oct 17, 2022

I have no problem running emacs as root, at least not on my personal laptop, so I've tried and the problem is still there. Fedora 36, Emacs 28.1, ESS 18.10.2.

@HinTak
Author

HinTak commented Oct 17, 2022

@cgorac do you mean emacs works correctly, with ess installed, when run as root?

@juhp

juhp commented Oct 18, 2022

Dunno if it helps, but I heard that uim-1.8.9 included some fixes for newer Emacs, which supposedly/hopefully fix uim elisp installation for Emacs 28.

(I dunno if there has been any emacs upstream discussion about this general issue?)

@cgorac

cgorac commented Oct 18, 2022

@cgorac do you mean emacs works correctly, with ess installed, when run as root?

No, it hangs, just as when run under regular user account.

@juhp

juhp commented Oct 24, 2022

Anyone tried setting native-comp-async-jobs-number?

@cgorac

cgorac commented Oct 25, 2022

I've tried with (setq native-comp-async-jobs-number 4) in my .emacs, and it doesn't help.

@juhp

juhp commented Oct 25, 2022

I've tried with (setq native-comp-async-jobs-number 4) in my .emacs, and it doesn't help.

If you have 8 vcpus then 4 is already the default.
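
For reference, a sketch of how this setting could be inspected and pinned. As far as the Emacs 28 documentation goes, the default value of `native-comp-async-jobs-number` is 0, which means half of the CPU execution units (hence 4 on an 8-vCPU machine). The reports in this thread say that lowering it did not stop the runaway processes, so treat this as a diagnostic aid, not a fix:

```elisp
;; In ~/.emacs: cap asynchronous native compilation at one subprocess.
;; Setting the variable before comp.el is loaded is fine; the later
;; defcustom does not override an existing value.
(setq native-comp-async-jobs-number 1)

;; Later, with M-: or in *scratch*, confirm the value in effect.
(message "native-comp-async-jobs-number = %S"
         (bound-and-true-p native-comp-async-jobs-number))
```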

@cgorac

cgorac commented Oct 26, 2022

It doesn't matter, tried with 2 too, the same thing happens - Emacs GUI is stuck, in the background loads of Emacs processes are launched, until either interrupted or machine hangs.

@maitra

maitra commented Nov 13, 2022

Dunno if it helps, but I heard that uim-1.8.9 included some fixes for newer Emacs, which supposedly/hopefully fix uim elisp installation for Emacs 28.

(I dunno if there has been any emacs upstream discussion about this general issue?)

I do not have uim installed, and this is the first time I heard about it. But I was wondering if that would help with fixing the problem here.

@maitra

maitra commented Nov 13, 2022

I wonder how much of it is fedora-specific - there are 3 factors: how emacs is built, where ess is located (in /usr/share/emacs/site-lisp) and what little additions the fedora packager makes to ess. Hope somebody answers in the old thread about non-fedora systems.

I haven't used R for a number of years but use emacs on a daily basis. The logical choice for me is just to uninstall ess until I need R again, if ever. So I'd rather not spend too much time on this... I guess I could tar up my /usr/share/emacs/site-lisp/ess for comparison, and try a few of the "disable native compilation" directives to see if they work around things and give more clues to the problem.

One thing I thought of, but prefer not to try, is to launch emacs as root - if the cause is a permission problem of non-root user trying to native compile and write to system location and failing over and over somehow, this might work around it. But I'd rather not do that :-(.

Excellent point. If it is Fedora-specific, then we can have them fix it, though it does appear that Fedora has this issue only (?) with emacs-ess. Btw, here is the emacs spec file on fedora, which is how emacs is packaged there: emacs.zip. Because github does not support .spec files, I have put it in a zip archive. Perhaps that might help in reproducing the issue? Emacs-ESS is essentially useless in Emacs 28.1. I have downgraded to Emacs 27.2, but with Fedora 37 coming out soon, that may not be an option for those of us who want to upgrade. In any case, Fedora 36 will expire in May 2023, and then all users will need to upgrade.

@juhp

juhp commented Nov 14, 2022

Excellent point. If it is Fedora-specific, then we can have them fix it, though it does appear that Fedora has this issue only (?) with emacs-ess.

No, some other Fedora elisp packages are also affected (which is why I mentioned uim as an example).

This is the location of Fedora's emacs.spec

@juhp

juhp commented Nov 25, 2022

Okay, I found that the Fedora emacs-vm package added the following to its vm-init.el file to disable native-comp for itself:

+ ;; For some reason, native compilation breaks VM. As a workaround until the
+ ;; problem is understood and fixed, disable native compilation of all VM lisp files.
+ (eval-after-load "comp"
+     '(if (boundp 'native-comp-deferred-compilation-deny-list)
+         (add-to-list 'native-comp-deferred-compilation-deny-list "/vm.*\.el"))) 

https://src.fedoraproject.org/rpms/emacs-vm/c/909b0bc357976252c51502bf17ed1efc6aeb7b97?branch=rawhide

I suppose similar could be done for ess if you are suffering from this issue.

@HinTak
Author

HinTak commented Nov 25, 2022

That's a useful tip - I'll give it a try at some point.

@maitra

maitra commented Nov 26, 2022

Okay, I found that the Fedora emacs-vm package added the following to its vm-init.el file to disable native-comp for itself:

+ ;; For some reason, native compilation breaks VM. As a workaround until the
+ ;; problem is understood and fixed, disable native compilation of all VM lisp files.
+ (eval-after-load "comp"
+     '(if (boundp 'native-comp-deferred-compilation-deny-list)
+         (add-to-list 'native-comp-deferred-compilation-deny-list "/vm.*\.el"))) 

https://src.fedoraproject.org/rpms/emacs-vm/c/909b0bc357976252c51502bf17ed1efc6aeb7b97?branch=rawhide

I suppose similar could be done for ess if you are suffering from this issue.

Thanks very much for this lead! Can this be done locally by the user? In that case, I guess I put it in my local .emacs file?

@HinTak
Author

HinTak commented Nov 26, 2022

About that compilation deny list being in .emacs, I believe so. I intend to give it a try at some point...

@maitra

maitra commented Nov 26, 2022

I tried entering the following at the beginning (and separately, the end) of my .emacs file:


;; Startup settings for ESS (this is borrowed from VM)
;; 
;; For some reason, native compilation breaks VM. As a workaround until the
;; problem is understood and fixed, disable native compilation of all VM
;; lisp files.
(eval-after-load "comp"
    '(if (boundp 'native-comp-deferred-compilation-deny-list)
        (add-to-list 'native-comp-deferred-compilation-deny-list "/ess.*\.el")))

Unless I am making a mistake here in making the change from .vm to .ess, I got no different results than before, and the system became unusable.

@HinTak
Author

HinTak commented Nov 26, 2022

Pretty sure you did it wrong. The "/" at the beginning probably has a special meaning: if the directive does not come from a file in the same directory, you probably need the full path or something there. You need to consult the actual documentation on using the deny list from a config file located elsewhere from the native-compiled file.

@HinTak
Author

HinTak commented Nov 26, 2022

Also for ess, you need to get rid of the "." after - ess files are named "ess.el" and "ess-*.el", with a "-".

@maitra

maitra commented Nov 26, 2022

Thanks! Sorry, but including the entire path had not much effect. I do get stuck at Loading /usr/share/emacs/site-lisp/site-start.d/ess-init.el

I tried:

;; Startup settings for ESS (this is borrowed from VM)
;;
;; For some reason, native compilation breaks VM. As a workaround until the
;; problem is understood and fixed, disable native compilation of all VM
;; lisp files.
(eval-after-load "comp"
    '(if (boundp 'native-comp-deferred-compilation-deny-list)
        (add-to-list 'native-comp-deferred-compilation-deny-list "/usr/share/emacs/site-lisp/site-start.d/ess*\.el")))

Even explicitly including the entire path. Perhaps still not doing something correctly here.
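
A point that may help here: going by its docstring, the deny list holds regexps matched against file names, not shell globs, and inside an Emacs Lisp string `\.` reads as a plain `.`, so a literal dot needs `\\.`. The sketch below, using sample paths from this thread and hypothetical helper names prefixed `my/`, checks candidate patterns with `string-match-p` before they go anywhere near an init file. It only shows what a pattern matches; it does not claim that the deny list cures the runaway processes.

```elisp
;; Sample paths taken from this thread.
(defvar my/ess-source "/usr/share/emacs/site-lisp/ess/lisp/ess-custom.el")
(defvar my/ess-init "/usr/share/emacs/site-lisp/site-start.d/ess-init.el")

(defun my/deny-pattern-matches-p (regexp file)
  "Return t if REGEXP, read as an Emacs regexp, matches FILE."
  (and (string-match-p regexp file) t))

;; "/ess.*\\.el" matches the ESS sources under .../ess/lisp/.
(my/deny-pattern-matches-p "/ess.*\\.el" my/ess-source)   ; t

;; "ess*" read as a regexp means "es" followed by zero or more "s",
;; so this glob-style pattern does not match ess-init.el.
(my/deny-pattern-matches-p
 "/usr/share/emacs/site-lisp/site-start.d/ess*\\.el" my/ess-init)   ; nil
```

Note also that the ESS sources being compiled live under /usr/share/emacs/site-lisp/ess/lisp/, not under site-start.d, so a pattern anchored to site-start.d would not cover them even if it matched ess-init.el.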

@HinTak
Author

HinTak commented Nov 26, 2022

If it is still getting stuck at loading ess-init.el, the obvious thing to try is to insert that fragment (still need to look up the syntax etc for those bits) into the very beginning of that file.

@maitra

maitra commented Dec 4, 2022

If it is still getting stuck at loading ess-init.el, the obvious thing to try is to insert that fragment (still need to look up the syntax etc for those bits) into the very beginning of that file.

Which file? I would like to try and see if this can be resolved because as far as I am concerned, emacs has become unusable with 28.2 and emacs-ess.

@HinTak
Author

HinTak commented Dec 4, 2022

ess-init.el, of course.

@maitra

maitra commented Dec 4, 2022

ess-init.el, of course.

Currently, /usr/share/emacs/site-lisp/site-start.d/ess-init.el only has the following:

;;; Set up emacs-common-ess for Emacs.
;;;
;;; This file is automatically loaded by emacs's site-start.el
;;; when you start a new emacs session.

(require 'ess-site)

So, I put that text in here? I am confused.

@HinTak
Author

HinTak commented Dec 5, 2022

Why is that confusing? Put some "native-compile-deny..." stuff at the top of ess-init.el, before the "(require..." line, seems the obvious thing to try.

@maitra

maitra commented Dec 5, 2022

Why is that confusing? Put some "native-compile-deny..." stuff at the top of ess-init.el, before the "(require..." line, seems the obvious thing to try.

Honestly, I don't quite know what this means, and therefore I am flying blind here. So the file should be:

;;; Set up emacs-common-ess for Emacs.
;;;
;;; This file is automatically loaded by emacs's site-start.el
;;; when you start a new emacs session.
native-comp-deferred-compilation-deny-list "/usr/share/emacs/site-lisp/site-start.d/ess*\.el"
(require 'ess-site)

Is this correct, or should it be something else? Thanks!

@HinTak
Author

HinTak commented Dec 6, 2022

Hmm, I have over-estimated other people's knowledge of lisp. In a nutshell, ";" are comments and ignored, but "()" are meaningful.

So you need to do the whole:

(eval-after-load "comp"
    '(if (boundp 'native-comp-deferred-compilation-deny-list)
        (add-to-list 'native-comp-deferred-compilation-deny-list "/usr/share/emacs/site-lisp/site-start.d/ess*\.el")))

If you are not sure.

In fact you only need this part,

(add-to-list 'native-comp-deferred-compilation-deny-list "/usr/share/emacs/site-lisp/site-start.d/ess*\.el")

Since the other two lines are conditionals, and we already know they will be true. The "eval-after-load" part means "insert this and do it when". (Removing it means "do it now"). The "if boundp" part is a typical version check in emacs: instead of doing version checks for emacs, the emacs people recommend that you check for the actual features you want to use. Thus, that section means "if it is possible to disable native compile by a deny list, please add to the deny list...". There is no need to have the "if it is possible to disable native compile by a deny list," part.

@maitra

maitra commented Dec 6, 2022

Thanks, yes, you are over-estimating my knowledge of lisp. However, I tried as suggested which appears to be what I wrote above: my new ess-init.el file reads:


;;; Set up emacs-common-ess for Emacs.
;;;
;;; This file is automatically loaded by emacs's site-start.el
;;; when you start a new emacs session.

native-comp-deferred-compilation-deny-list "/usr/share/emacs/site-lisp/site-start.d/ess*\.el"
(require 'ess-site)

However, I get:

Loading /usr/share/emacs/site-lisp/site-start.d/auctex.el (source)...done
Loading /usr/share/emacs/site-lisp/site-start.d/auto-complete-init.el (source)...done
Loading /usr/share/emacs/site-lisp/site-start.d/autoconf-init.el (source)...done
Loading /usr/share/emacs/site-lisp/site-start.d/clang-format.el (source)...done
Loading /usr/share/emacs/site-lisp/site-start.d/clang-include-fixer.el (source)...done
Loading /usr/share/emacs/site-lisp/site-start.d/clang-rename.el (source)...done
Loading /usr/share/emacs/site-lisp/site-start.d/cmake-init.el (source)...done
Loading /usr/share/emacs/site-lisp/site-start.d/desktop-entry-mode-init.el (source)...done
Loading /usr/share/emacs/site-lisp/site-start.d/emacs-goodies-loaddefs.el (source)...done
Loading /usr/share/emacs/site-lisp/site-start.d/ess-init.el (source)...
load: Symbol’s value as variable is void: native-comp-deferred-compilation-deny-list

Is this last bit what I should be getting? Not sure. Also, ess does not seem to be loading anymore. Thanks!
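
The error message itself points at the cause: a bare symbol at the top level of a Lisp file is evaluated as a variable, and this one is not yet defined at that point. A minimal sketch of a syntactically valid version, assuming the variable only gets defined once comp.el is loaded (which the void-variable error suggests) and using a regexp with the dot properly escaped:

```elisp
;; Near the top of the init file, before (require 'ess-site).
;; The call has to be a parenthesized form, and it is deferred until
;; comp.el has defined the variable.
(with-eval-after-load 'comp
  (when (boundp 'native-comp-deferred-compilation-deny-list)
    (add-to-list 'native-comp-deferred-compilation-deny-list
                 "/ess.*\\.el")))
```

This only removes the load error. Whether the deny list stops the spawning of emacs processes is a separate question, and the reports in this thread suggest it does not.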

@HinTak
Author

HinTak commented Dec 17, 2022

I tried everything we discussed so far, and nothing seems to stop native compilation. So I think I'll try one very tedious thing next: there is one line you can insert into a *.el file to say, "don't native compile me". The entire ess file set only has 50-60 *.el files. So it is just inserting that 60 times. (Unfortunately it needs to be the first line). Some of it (40+ or so) can be programmatic. So it will probably take 30 minutes to do it all, automated plus some 15 manual edits.

@HinTak
Author

HinTak commented Dec 18, 2022

My barbarism (inserting 40+ "don't compile me" into 40+ *.el files) - seems to work.

@HinTak
Author

HinTak commented Dec 18, 2022

I did it by programmatically manipulating those lines that already have "lexical-binding: t". It is a bit disgusting that "make all" actually goes online and fetches two extra *.el files???
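
A rough sketch of what such a programmatic edit could look like, assuming the marker in question is the `no-native-compile: t` file-local variable placed in the first-line cookie. The helper name `my/add-no-native-compile-cookie` is hypothetical, and this is not necessarily how the edit was actually done:

```elisp
(defun my/add-no-native-compile-cookie (file)
  "Add `no-native-compile: t' to the first-line cookie of FILE.
Only touches files whose first line already has `lexical-binding: t'.
Return non-nil if FILE was modified."
  (with-temp-buffer
    (insert-file-contents file)
    (goto-char (point-min))
    (let ((eol (line-end-position)))
      ;; Skip files that already carry the marker on their first line.
      (when (and (not (save-excursion
                        (search-forward "no-native-compile" eol t)))
                 (re-search-forward "lexical-binding: *t\\b" eol t))
        ;; Point is now right after "lexical-binding: t".
        (insert "; no-native-compile: t")
        (write-region (point-min) (point-max) file nil 'silent)
        t))))
```

It could then be mapped over a copy of the ESS lisp directory with `directory-files` and `dolist`. Files without a `lexical-binding` cookie are left alone and would still need manual editing.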

@HinTak
Author

HinTak commented Dec 18, 2022

Native compilation still does not like the remaining *.el files and tries to native compile them on each launch without success, but at least it seems to stop after a short while, instead of going out of control.

I see a few ess files actually do "(require 'compile)", which is probably the source of this problem.

@HinTak
Author

HinTak commented Dec 18, 2022

Hurray, I think I understand the bug now, and it is generic to emacs. I have a very simple work-around. The workaround is this:

When you launch emacs, and it starts to eat resources and spawn a lot of processes of the form:

/usr/bin/emacs --batch -l /tmp/emacs-int-comp-subr--trampoline-64656c6574652d63686172_delete_char_0-QeNLBe.el

Do a "killall emacs" to kill them all. You should have lots of "emacs-int-comp-subr--trampoline*.el" leftovers in /tmp. Pick one, run this:

/usr/bin/emacs -Q --batch -l /tmp/one-of-those-files

Note the -Q there, that's important!!! That's it. Now you launch emacs, it should smoothly native-compile ess (i.e. it would spawn one or two new processes for a while, until it has done about 50 of them, quite gradually).

I think it is some kind of race condition: to do any native compilations at all, a natively compiled trampoline must first be built. Without the "-Q", when emacs tries to build the trampoline, it loads ESS before the build, and thus the problem escalates.

I found this out by scattering a lot of "no-native-compile" into ess's el files. That slows down the self-multiplying native compilation of ESS itself enough that once in a while the trampoline gets built, and one of my accounts worked afterwards. Deleting the native cache gets me back to the old situation. Some accounts (I was trying things out with both root and user) still had a copy of ~/.emacs.d/eln-cache/28.1-b1f2d84a/subr--trampoline-64656c6574652d63686172_delete_char_0.eln, and copying it over makes emacs work with another account again. I experimented with a "zero-sized" file for that, and it stopped native compilation altogether (I don't have R installed, so I don't know if ESS works that way or not). Then I figured out how to make it by hand with -Q.
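
The manual steps above can also be driven from Emacs Lisp. This is only a sketch of the same workaround, meant to be evaluated in an Emacs started with -Q after the runaway processes have been killed; the /tmp file pattern and the /usr/bin/emacs path are the ones quoted in this thread and may differ on other systems:

```elisp
;; Pick one leftover trampoline source in /tmp and load it in a
;; subprocess started with -Q, so no site or user init files are read.
(let ((leftover (car (file-expand-wildcards
                      "/tmp/emacs-int-comp-subr--trampoline-*.el"))))
  (if leftover
      (call-process "/usr/bin/emacs" nil nil nil
                    "-Q" "--batch" "-l" leftover)
    (message "No leftover trampoline files found in /tmp")))
```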

@HinTak
Author

HinTak commented Dec 19, 2022

Filed upstream as https://debbugs.gnu.org/cgi/bugreport.cgi?bug=60208 , '28.1; Resource exhaustion with emacs 28's native compilation; need "-Q" for trampoline'

@HinTak
Author

HinTak commented Dec 20, 2022

Just saving people from looking at the upstream gnu.org exchange - it turns out that the fact that Fedora has a "/usr/share/emacs/site-lisp/site-start.d/ess-init.el", which contains the single line "(require 'ess-site)", is important. Emacs doesn't do recursive loading via user config (~/.emacs) when native-compiling. But site-wide auto-loading via /usr/share/emacs/site-lisp/site-start.d/ess-init.el is not currently catered for.

At the moment the fix looks to be a new emacs release.

@HinTak
Author

HinTak commented Dec 22, 2022

Fix pushed to emacs 29 branch https://git.savannah.gnu.org/cgit/emacs.git/commit/?h=emacs-29

@juhp

juhp commented Dec 22, 2022

I think you mean this commit

@maitra

maitra commented Dec 22, 2022

Thanks, Emacs 29.1 is expected to be released in spring 2023: perhaps this means a wait of six months. Hopefully the patch is small enough to also make it into Emacs 28.2.

@HinTak
Author

HinTak commented Dec 22, 2022

Yes, it is just a two-line code change - should be easy enough for backport either way (patching by distro packager, or emacs 28.2). I'll ask if they could do it in 28.x too.

@HinTak
Author

HinTak commented Dec 22, 2022

Knowing where to stick the "-Q" in is the hard part. The change is very small. I filed at redhat bugzilla to get it backported anyway. Other affected distributions might want to do that too, before if/when 28.2 includes the diff.

@HinTak
Author

HinTak commented Dec 22, 2022

Upstream says there is no plan for a 28.x. I already filed with redhat to get it backported at the distro packaging level. Any non-redhat people affected by this here?
