Regarding remacs goals and design decisions #27

Closed
maplant opened this issue Jan 12, 2017 · 14 comments

maplant commented Jan 12, 2017

I would like to clarify some of Remacs' design goals here.
I've been using and following the development of Emacs for, I guess, eight years now, and while that tenure pales in comparison to many others', I do have a clear understanding of Emacs' successes and failings.
For one thing, Emacs Lisp is not particularly fast, even when it's JIT'd.
One thing that needs to be clarified: are we required to preserve Emacs' bytecode?
I would like to argue against doing that.
One thing that would be nice is leveraging LLVM to compile Emacs Lisp into bitcode, which can then be JIT'd or AOT-compiled. The difference here is that we can discard Emacs' slow, stack-machine bytecode in favor of a more efficient, closer-to-the-machine format.
It also allows us to be completely language agnostic. Obviously we want to preserve compatibility with Emacs Lisp, but now we can separate our concerns a little better. Remacs could include an FFI, an Emacs Lisp compiler, and a number of other compilers, such as for Scheme or any language that has a compiler targeting LLVM.
Another problem that comes up here is how closely aligned this project is with the philosophical goals of Emacs and the larger GNU project. One long-standing argument on the emacs-devel list is that people writing Emacs extensions would really like to have the AST for whatever language they are inspecting. For C, that would mean using GCC as some sort of plug-in, since GCC is already part of the GNU project. But RMS does not want GCC to be open like that, so development of more syntax-aware extensions is at a stalemate. If we ignore the philosophical ideals of Emacs, we can simply use LLVM or other tools that are not GPL licensed.
I hope that by answering these questions we can clarify the design goals of Remacs slightly.
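
As a rough illustration of the stack-machine point above, here is a minimal sketch in Python. It is not Emacs' actual bytecode: the opcode names `PUSH`, `ADD` and `MUL` are made up for illustration. Every operation goes through an operand stack and a dispatch loop, which is the kind of overhead a closer-to-the-machine format would try to avoid.

```python
def run(program):
    """Interpret a list of (opcode, operand) pairs on an operand stack."""
    stack = []
    for op, arg in program:
        if op == "PUSH":
            stack.append(arg)
        elif op == "ADD":
            b = stack.pop()
            a = stack.pop()
            stack.append(a + b)
        elif op == "MUL":
            b = stack.pop()
            a = stack.pop()
            stack.append(a * b)
        else:
            raise ValueError(f"unknown opcode: {op}")
    return stack.pop()

# (1 + 2) * 4 expressed as stack operations (hypothetical opcodes)
program = [("PUSH", 1), ("PUSH", 2), ("ADD", None), ("PUSH", 4), ("MUL", None)]
result = run(program)
```

A native compiler would typically keep these intermediate values in registers instead of pushing and popping them one at a time.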


roobie commented Jan 12, 2017

Would that require both an LLVM front-end and back-end to be implemented? It's a cool idea, notwithstanding.

In my opinion, the project should not be required to adhere to an ideology that is stalling the continued development and evolution of a piece of software. LLVM offers better integration possibilities? Then why not use it?

Author

maplant commented Jan 12, 2017

@roobie well, LLVM is a back-end, so no, just a front-end. There are some LLVM crates for Rust around, I believe.

Collaborator

Wilfred commented Jan 12, 2017

I'm open to exploring changes to the bytecode. We could add a version header so that we don't end up trying to execute old bytecode versions.
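
A version header of that kind could look roughly like the following Python sketch. The layout is an assumption made for illustration only, not an existing Emacs or Remacs format: the magic bytes `RBC`, the 16-bit version field and the names `pack_bytecode` and `load_bytecode` are all hypothetical.

```python
import struct

MAGIC = b"RBC"        # hypothetical magic bytes
CURRENT_VERSION = 2   # hypothetical current bytecode version

def pack_bytecode(code: bytes, version: int = CURRENT_VERSION) -> bytes:
    """Prefix raw bytecode with the magic bytes and a little-endian version."""
    return MAGIC + struct.pack("<H", version) + code

def load_bytecode(blob: bytes) -> bytes:
    """Return the raw bytecode, refusing files with a different version."""
    if len(blob) < 5 or blob[:3] != MAGIC:
        raise ValueError("not a bytecode file")
    (version,) = struct.unpack("<H", blob[3:5])
    if version != CURRENT_VERSION:
        raise ValueError(
            f"bytecode version {version} is not supported; recompile from source"
        )
    return blob[5:]
```

With this shape, bytecode written by an older compiler is rejected at load time instead of being executed with the wrong semantics.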

I'm also open to exploring combining with other tools, if they're GPL compatible (which MIT/BSD projects are).

Note that Emacs 25 already has an FFI. If you're interested in exploring LLVM bitcode, it's not machine agnostic.

Any modifications to elisp's execution, bytecode/JIT/AOT would still need to preserve lispiness: a user should be able to redefine or inspect any function or dynamically bound variable, and/or step through execution.

Author

maplant commented Jan 12, 2017

@Wilfred My question is whether it's necessary to include a bytecode at all. Consider that Emacs offers no real portability guarantees, and that the bytecode is inherently slow and is not a distribution format.
Consider what the email you linked is claiming: that LLVM bitcode is not a good format for storing programs that may run on multiple different architectures. Emacs bytecode does not really achieve this goal either. Files are compiled locally for a single architecture and are often JIT compiled anyway.
And because this JITing already takes place, any change in the architecture we might introduce would certainly preserve the lispiness of the system.

Contributor

dpzmick commented Jan 12, 2017

The LLVM crates are not wonderful, although it looks like some new ones have popped up since the last time I used one. I wouldn't go for LLVM in this case anyway. The LLVM JIT is pretty inflexible and it introduces quite a bit of overhead. I don't think there are good bindings for these either, but dynasm or luajit or libjit or any of the others might be a bit easier to work with.

Also, there's some interesting work going on here: https://github.com/burtonsamograd/emacs-jit

Author

maplant commented Jan 13, 2017

@dpzmick I'm not suggesting LLVM would necessarily be used, I'm just wondering why the architecture of Emacs is necessarily bound to executing a language dynamically. It seems to me that Emacs is on one level a GUI editing library which can accept function hooks for input and be introspected, and on another level a bytecode interpreter.

I'm wondering: are there cases where bytecode interpreting is really faster than a compiled program in memory? Because it is an editing environment functions are called a lot. I highly doubt there are many cases. It wouldn't just be an improvement in the function speed, but the caching ends up being better in the end as well.

The only time I can think of bytecode being faster is extremely dynamic or polymorphic cases like eval on strings or something. Otherwise I think it's strictly better.


roobie commented Jan 15, 2017

I think we all agree that the compiled format is not required to be distributable. Remacs should compile source code for the system it currently runs on.

@DataAnalysisCosby so this issue is not about whether or not to use LLVM specifically, but rather about whether Remacs should forgo the current elisp bytecode as its compiled format in favour of another, statically typed compiled format?

Collaborator

c-nixon commented Jan 15, 2017

One very real consideration is that an ahead-of-time compiled system would have to support Emacs Lisp's runtime mutability. A lot of guard code would need to be inserted. I suppose other Lisp implementations have had to deal with this before...
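
To make the guard idea concrete, here is a minimal Python sketch under stated assumptions. `FunctionTable` and `compile_call` are hypothetical names, and a single global redefinition counter is just one possible invalidation scheme, not something Emacs or Remacs is known to use.

```python
class FunctionTable:
    """Global function namespace that counts redefinitions."""

    def __init__(self):
        self.defs = {}
        self.epoch = 0

    def define(self, name, fn):
        self.defs[name] = fn
        self.epoch += 1  # any redefinition invalidates cached callees

def compile_call(table, name):
    """Return a callable that caches the callee and guards on the epoch."""
    cached_fn = table.defs[name]
    cached_epoch = table.epoch

    def call(*args):
        nonlocal cached_fn, cached_epoch
        if table.epoch != cached_epoch:   # guard: was anything redefined?
            cached_fn = table.defs[name]  # slow path: look the name up again
            cached_epoch = table.epoch
        return cached_fn(*args)

    return call

table = FunctionTable()
table.define("greet", lambda who: f"hello {who}")
greet = compile_call(table, "greet")
first = greet("world")
table.define("greet", lambda who: f"goodbye {who}")  # runtime redefinition
second = greet("world")
```

The fast path costs one integer comparison per call, and the redefinition is still picked up on the next call, which is the property the guard has to preserve.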

Author

maplant commented Jan 17, 2017

@roobie we are in agreement there.
My question is: why compile elisp to bytecode and then to machine code when we could simply compile it directly to machine code and have it be more efficient?


talwrii commented Jan 20, 2017

Dynamic nature of lisp and implications for jits

Expanding on c-nixon's comments, elisp allows you to:

  1. Swap out globals using dynamic scoping (I use this a lot when interacting with libraries. I'd go so far as to say it's a feature):
         (let ((mode-global-setting x))
           (mode-global-setting))
  2. Swap out function names using dynamic scoping (I've used this once before); see noflet.

In this context every function call and global lookup becomes a name lookup in a dictionary (late binding), with basically no types being known.

Your jitting then doesn't really do anything -- or your guards must look at every global variable used by, and function call made by, your jit'd function.
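
The late-binding behaviour described above can be modelled with a short Python sketch. This is a model of the idea, not how Emacs implements it: `globals_table` and `dynamic_let` are hypothetical names, and `fill-column` is used only as a stand-in for any global variable.

```python
from contextlib import contextmanager

globals_table = {"fill-column": 70}  # stand-in for a global variable

@contextmanager
def dynamic_let(name, value):
    """Temporarily rebind a global, like elisp `let` on a special variable."""
    missing = object()
    old = globals_table.get(name, missing)
    globals_table[name] = value
    try:
        yield
    finally:
        # Restore the previous binding, or remove the name if it had none.
        if old is missing:
            del globals_table[name]
        else:
            globals_table[name] = old

def current_width():
    # Late binding: the name is looked up at call time, not definition time.
    return globals_table["fill-column"]

before = current_width()
with dynamic_let("fill-column", 100):
    during = current_width()
after = current_width()
```

Because `current_width` cannot know at compile time which binding it will see, a compiled version would either have to repeat the lookup on every call or guard against rebinding.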

Cython as a case study

Cython is a Python compiler (rather than a JIT) where the underlying language is similarly dynamic. It allows the addition of type annotations so that the compiler can generate faster code.

An interesting observation is that the compiled code with no annotations is still about twice as fast as the python interpreter... though perhaps this is mostly achieved through the loss of debug info.

Relevant prior work

V8 for JavaScript (this uses a JIT, I think), PyPy (does JITting of Python)

Caveat

I'm not sure how noflet works; it might actually be macro magic, or call-time patching, rather than dynamic scopes.


talwrii commented Jan 20, 2017

The argument for bytecode

Bytecode is simple to debug and cheaply provides useful information at runtime, at the price of speed:

  • Easy debugging of your bytecode compilation step (bytecode tends to be pretty readable)
  • Simple relationship between code and bytecode makes implementing debug features easier (though you can leverage others' work here, e.g. LLVM tracing information)
  • Historically you could just wait a year or so for a CPU core to double in speed (this appears to no longer be true)
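
As an analogy for the readability point in the list above, CPython's bytecode can be inspected with the standard `dis` module (Emacs has a comparable `disassemble` command for byte-compiled elisp). The `add` function is just a made-up example, and the exact opcode names vary between Python versions.

```python
import dis

def add(a, b):
    return a + b

# Collect the opcode names; each one maps closely onto a piece of the source.
opnames = [ins.opname for ins in dis.get_instructions(add)]

# Print the human-readable listing: loads of a and b, a binary add, a return.
dis.dis(add)
```

The near one-to-one mapping between source and instructions is what makes stepping and debugging straightforward to implement on top of bytecode.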

Contributor

jeandudey commented Jan 22, 2017

What about using Rust MIR? see this.


JAremko commented Jan 23, 2017

But should you lock yourself into using %emacs_lisp_something% instead of designing Remacs in a way that the interpreter/VM/whatever will be sufficiently decoupled and easy to swap? There have been many attempts to replace Emacs Lisp with something better. I don't think that the programming community will simply give up and accept Emacs Lisp as "the one true language". Eventually something like Guile Emacs (or a project with a totally different approach) will be good enough to become the "Emacs Lisp" for the next few decades. If the Remacs project focuses its resources on other parts of Emacs and on providing as good a testing ground as humanly possible for such projects, then there will be a decent chance that the Emacs of the future, with the "New Emacs Lisp", will be based on Remacs.

Collaborator

Wilfred commented Apr 11, 2017

Now we have a Gitter chat room: https://gitter.im/remacs-discuss/Lobby I'm going to close this issue in favour of us having discussion there. I don't think this is an actionable issue.

Wilfred closed this as completed Apr 11, 2017
jeandudey pushed a commit to jeandudey/remacs that referenced this issue May 29, 2017
The recent changes to src/casefiddle.c cause build failure as seen
below:

    Starting program: /home/npostavs/src/emacs/emacs-bootstrapping/src/temacs
	--batch --load loadup bootstrap
    [Thread debugging using libthread_db enabled]
    Using host libthread_db library "/usr/lib/libthread_db.so.1".
    Loading loadup.el (source)...
    Using load-path (/home/npostavs/src/emacs/emacs-bootstrapping/lisp
	/home/npostavs/src/emacs/emacs-bootstrapping/lisp/emacs-lisp
	/home/npostavs/src/emacs/emacs-bootstrapping/lisp/language
	/home/npostavs/src/emacs/emacs-bootstrapping/lisp/international
	/home/npostavs/src/emacs/emacs-bootstrapping/lisp/textmodes
	/home/npostavs/src/emacs/emacs-bootstrapping/lisp/vc)
    Loading emacs-lisp/byte-run (source)...
    Loading emacs-lisp/backquote (source)...
    Loading subr (source)...
    Loading version (source)...
    Loading widget (source)...
    Loading custom (source)...
    Loading emacs-lisp/map-ynp (source)...
    Loading international/mule (source)...
    Loading international/mule-conf (source)...

    lread.c:3914: Emacs fatal error: assertion failed: !NILP (Vpurify_flag)

    Breakpoint 1, terminate_due_to_signal at emacs.c:363
    363	  signal (sig, SIG_DFL);
    (gdb) bt
    #0  0x0000000000579826 in terminate_due_to_signal at emacs.c:363
    #1  0x000000000060ec33 in die at alloc.c:7352
    #2  0x000000000066db40 in intern_c_string_1 at lread.c:3914
    #3  0x0000000000576884 in intern_c_string at lisp.h:3790
    #4  0x00000000005dc84f in prepare_casing_context at casefiddle.c:69
    #5  0x00000000005dd37f in casify_object at casefiddle.c:311
    #6  0x00000000005dd47f in Fcapitalize at casefiddle.c:356
    #7  0x00000000006325ac in eval_sub at eval.c:2219
    #8  0x0000000000632368 in eval_sub at eval.c:2184
    #9  0x000000000063446c in apply_lambda at eval.c:2875
    #10 0x00000000006329af in eval_sub at eval.c:2294
    #11 0x000000000062d462 in Fprogn at eval.c:449
    #12 0x000000000062d4cf in prog_ignore at eval.c:461
    #13 0x000000000062f19c in Fwhile at eval.c:982
    #14 0x00000000006321f4 in eval_sub at eval.c:2172
    #15 0x000000000062d462 in Fprogn at eval.c:449
    #16 0x000000000062f0c4 in Flet at eval.c:963
    #17 0x00000000006321f4 in eval_sub at eval.c:2172
    #18 0x0000000000632963 in eval_sub at eval.c:2290
    #19 0x000000000062d462 in Fprogn at eval.c:449
    #20 0x000000000062f0c4 in Flet at eval.c:963
    #21 0x00000000006321f4 in eval_sub at eval.c:2172
    #22 0x0000000000668caa in readevalloop at lread.c:1927
    #23 0x0000000000667253 in Fload at lread.c:1332
    #24 0x0000000000632683 in eval_sub at eval.c:2233
    #25 0x0000000000668caa in readevalloop at lread.c:1927
    #26 0x0000000000667253 in Fload at lread.c:1332
    #27 0x0000000000632683 in eval_sub at eval.c:2233
    #28 0x0000000000631be5 in Feval at eval.c:2041
    #29 0x000000000057e1af in top_level_2 at keyboard.c:1121
    #30 0x000000000062ffc7 in internal_condition_case at eval.c:1324
    #31 0x000000000057e1f0 in top_level_1 at keyboard.c:1129
    #32 0x000000000062f51e in internal_catch at eval.c:1091
    #33 0x000000000057e0ea in command_loop at keyboard.c:1090
    #34 0x000000000057d6d5 in recursive_edit_1 at keyboard.c:697
    #35 0x000000000057d8b4 in Frecursive_edit at keyboard.c:768
    #36 0x000000000057b55b in main at emacs.c:1687

    Lisp Backtrace:
    "capitalize" (0xffffcf70)
    "format" (0xffffd130)
    "define-charset" (0xffffd370)
    "while" (0xffffd560)
    "let" (0xffffd7c0)
    "dolist" (0xffffd910)
    "let" (0xffffdb70)
    "load" (0xffffdfe0)
    "load" (0xffffe4a0)

* src/casefiddle.c (syms_of_casefiddle): Declare four new symbols:
Qtitlecase, Qspecial_uppercase, Qspecial_lowercase and
Qspecial_titlecase.
(prepare_casing_context): Use aforementioned symbols.