Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Discussion] json dynamic module #676

Closed
yyoncho opened this issue Feb 28, 2019 · 18 comments
Closed

[Discussion] json dynamic module #676

yyoncho opened this issue Feb 28, 2019 · 18 comments

Comments

@yyoncho
Copy link
Member

yyoncho commented Feb 28, 2019

The performance bottleneck of lsp-mode is json parsing(50%+ of the time) and we already have a solution for Emacs 27+ via new native parsing. This PR is for tracking the discussion about solving this problem for emacs 25-26 with writing emacs dynamic module - https://www.gnu.org/software/emacs/manual/html_node/elisp/Dynamic-Modules.html .

We have some code in https://github.com/sebastiencs/emacs-json (see sebastiencs/emacs-json#1 )

@innerout volunteered to work on a such module In C/C++(and eventually RUST?) but we need a general agreement for the approach from the team members and the active contributors.

Also, I was thinking about creating a JSONRPC dynamic module which will do the message handling in separate thread(not blocking the Emacs UI thread) and then post the messages to Emacs. (not sure whether this is possible or whether it might cause Elisp GC failures)

(cc @MaskRay @sebastiencs @innerout @vibhavp @danielmartin @alanz)

@alanz
Copy link
Contributor

alanz commented Feb 28, 2019

It sounds like opening up a world of hurt, when we have to have a compiled module available on all the platforms that emacs supports.

Perhaps if it could be something that can be used if available, but falls back to the current one if needed.

And having some sort of wrapper executable that can talk s-exps on the one side and JSON-RPC on the other might provide a solution too.

@yyoncho
Copy link
Member Author

yyoncho commented Feb 28, 2019

@alanz

Perhaps if it could be something that can be used if available, but falls back to the current one if needed.

That's the plan.

And having some sort of wrapper executable that can talk s-exps on the one side and JSON-RPC on the other might provide a solution too.

Yes, this is an option too - we could use a tool to convert the json to the s-exps and then use built-in read-from-string which is written in C and it is very fast.

(Haskell is also on the table via https://github.com/sergv/emacs-module )

@danielmartin
Copy link
Contributor

What are the pros and cons of each option? What tool do you have in mind to convert between JSON to sexps? Is there anything already available? What's the benefit of using Haskell for this instead of C++/Rust?

@yyoncho
Copy link
Member Author

yyoncho commented Mar 1, 2019

@danielmartin

What are the pros and cons of each option?

We need to investigate what will work best given the present limitations. E. g. if we cannot do fast conversion to Elisp structures in emacs dynamic module then we may try the intermediate conversion to s-exps.

What tool do you have in mind to convert between JSON to sexps? Is there anything already available?

I am not aware of such tool. We may write a command line tool or dynamic module which converts the jsons to serializes elisp data structures .

What's the benefit of using Haskell for this instead of C++/Rust?

I added it for completenes. If it is easy to distribute it and if it is fast enough I think that there is no reason stopping us from using haskell...(and if someone wants to do it).

@yyoncho
Copy link
Member Author

yyoncho commented Mar 4, 2019

There is an article on how to write a async dynamic module by @skeeto - https://nullprogram.com/blog/2017/02/14/ .

@cireu
Copy link
Contributor

cireu commented Mar 5, 2019

Will it be possible to port Emacs 27's json.c to dynamic module in a simple way?

@yyoncho
Copy link
Member Author

yyoncho commented Mar 5, 2019

@cireu - it is an option too - IMO the main problem is how to convert effectively the parsed structure to elisp data structure and we cannot use json.c code for that since it has access to the internal emacs api(e. g. it can call the methods directly).

There is also: https://github.com/syohex/emacs-parson (which I havent had a chance to test).

@yyoncho
Copy link
Member Author

yyoncho commented Mar 6, 2019

I found the following post by @Alexander-Miller : https://www.reddit.com/r/emacs/comments/8plc1c/a_recipe_for_creating_parallel_futures_using/

It shows how to do async processing in emacs module which seems like exactly what we want to do.

@Alexander-Miller
Copy link

It shows how to do async processing in emacs module which seems like exactly what we want to do.

I haven't done any real performance testing with this thing, so you should be careful with that. One basic optimization I can think of is not to spawn a new elisp thread for every call that is being made. By the way have you tried async.el for non-blocking parsing? This seems like the kind of use-case it'd be well suited for.

IMO the main problem is how to convert effectively the parsed structure to elisp data structure

IME a native module will do you little good if you're constantly calling back to the emacs runtime to make your conversions. That comes with its own considerable call and marshalling overhead, especially when you're dealing with strings. Whatever you do, you should do it in one go.

Creating a string and feeding it straight into the interpreter is definitely worth trying, and I successfully used this method in treemacs where one of my python scripts directly returns a string like "#s(hash-table ...)".

And a note about languages: I tried to write a small dynamic module for treemacs in both C and Rust. The C version was incredibly verbose and error- and segfault-prone and just plain took a great deal of time and effort to get working (admittedly I don't really know any C). Rust was the complete opposite, its emacs bindings provide a great set of abstractions, so for the most part I didn't even notice that I was writing code for emacs. I didn't cause a single segfault or memory error either. The Rust code was even significantly faster, thanks to the WalkDir library. So yeah, rustc is not as widely available as gcc, but it will make actually writing a working module a whole lot easier.

@yyoncho
Copy link
Member Author

yyoncho commented Mar 6, 2019

@Alexander-Miller thank you for jumping in.

By the way have you tried async.el for non-blocking parsing?

We need non-blocking and fast(fast is more important). Correct me if I am wrong but if we delegate the parsing to emacs-async process this will slow down the parsing.

IME a native module will do you little good if you're constantly calling back to the emacs runtime to make your conversions.

Thanks for the info. We still need to benchmark such code vs json.el parsing but it seems like it may not result in the needed speedup.

@Alexander-Miller
Copy link

Correct me if I am wrong but if we delegate the parsing to emacs-async process this will slow down the parsing.

It's non-blocking, but it won't be faster - you still have the launch and marshal overhead from the secondary emacs process.

We still need to benchmark such code vs json.el parsing but it seems like it may not result in the needed speedup.

You're probably better off transforming the json into sexps. Also depending on the same of the responses you might be better off deserializing into hash tables instead of alists. The cut-off point for the former being faster is at about 20 elements.

@ubolonton
Copy link

Also, I was thinking about creating a JSONRPC dynamic module which will do the message handling in separate thread(not blocking the Emacs UI thread) and then post the messages to Emacs.

I think this is the most promising approach. If we keep doing it in the UI thread, there will still be UI freezes, even with json.c, for large payloads. Moreover, a dynamic module won't have access to Emacs internals, like json.c does, so there will be more overhead (data copies, calls into Emacs runtime, unavoidable string validations, ...).

(not sure whether this is possible or whether it might cause Elisp GC failures)

There shouldn't be an issue here. It's a matter of correctly using the ref-count interface (make_global_ref, free_global_ref) for dynamic module. The Rust binding already handles this, so it should be easier if we use Rust.

IME a native module will do you little good if you're constantly calling back to the emacs runtime to make your conversions. That comes with its own considerable call and marshalling overhead, especially when you're dealing with strings. Whatever you do, you should do it in one go.

This is my worry as well. One solution would be wrapping the parsed native messages in opaque user-ptr types, providing functions for partial, on-demand access. Conversion would only happen for "interesting" fields, or even ranges within lists, when/if needed.

Do we have some automated editing scenarios, e.g. replaying editing command sequences on a particular codebase? They would be useful in prototyping/benchmarking solutions.

@yyoncho
Copy link
Member Author

yyoncho commented Mar 24, 2019

@ubolonton

Moreover, a dynamic module won't have access to Emacs internals, like json.c does, so there will be more overhead (data copies, calls into Emacs runtime, unavoidable string validations, ...).

Both dynamic modules and json.c are considered external code and both perform validation of the strings which at this point is very slow because it may trigger external hooks, see https://lists.gnu.org/archive/html/bug-gnu-emacs/2019-03/msg00722.html .

    [-] Fmessage@0x00005555557901a5 (editfns.c:2861)
    [-] eval_sub@0x000055555579c284 (eval.c:2286)
    [-] Fprogn@0x0000555555798373 (eval.c:478)
    [-] eval_sub@0x000055555579c07d (eval.c:2267)
    [-] Fif@0x000055555579824d (eval.c:433)
    [-] eval_sub@0x000055555579c07d (eval.c:2267)
    [-] Fprogn@0x0000555555798373 (eval.c:478)
    [-] funcall_lambda@0x000055555579e89a (eval.c:3116)
    [-] Ffuncall@0x000055555579dc11 (eval.c:2861)
    [-] funcall_not@0x000055555579ce84 (eval.c:2547)
    [-] run_hook_with_args@0x000055555579d1d9 (eval.c:2655)
    [-] Frun_hook_with_args_until_failure@0x000055555579cede (eval.c:2565)
    [-] Fkill_buffer@0x0000555555710525 (buffer.c:1709)
    [-] code_conversion_restore@0x000055555565802c (coding.c:7849)
    [-] do_one_unbind@0x000055555579f6af (eval.c:3491)
    [-] unbind_to@0x000055555579fa38 (eval.c:3618)
    [-] decode_coding_object@0x0000555555659825 (coding.c:8215)
    [-] code_convert_string@0x000055555565e249 (coding.c:9468)
    [-] code_convert_string_norecord@0x000055555565e30d (coding.c:9488)
    [-] module_make_string@0x00005555557d8f0d (emacs-module.c:597)
    [-] emacs::convert::<impl emacs::IntoLisp<’e> for &’a T>::into_lisp@0x00007fffdeafe8c1 (Unknown source)
    [-] emacs_json_lsp::json_to_lisp@0x00007fffdeb0117e (Unknown source)
    [-] emacs_json_lsp::json_to_lisp@0x00007fffdeb0169f (Unknown source)
    [-] emacs_json_lsp::json_to_lisp@0x00007fffdeb012b3 (Unknown source)
    [-] emacs_json_lsp::json_to_lisp@0x00007fffdeb0169f (Unknown source)
    [-] emacs_json_lsp::json_to_lisp@0x00007fffdeb0169f (Unknown source)
    [-] emacs_json_lsp::parse_string@0x00007fffdeb019a9 (Unknown source)
    [-] std::panicking::try::do_call@0x00007fffdeb02a9f (Unknown source)
    [-] __rust_maybe_catch_panic@0x00007fffdeb2f26a (src/libpanic_unwind/lib.rs:102)
    [-] <emacs::CallEnv as emacs::func::HandleCall>::handle_call@0x00007fffdeb0277a (Unknown source)
    [-] emacs_json_lsp::init::extern_lambda@0x00007fffdeb01cc4 (Unknown source)
    [-] funcall_module@0x00005555557da628 (emacs-module.c:797)
    [-] funcall_lambda@0x000055555579e5d1 (eval.c:3055)
    [-] apply_lambda@0x000055555579e2fa (eval.c:2981)

I wonder whether that conversion could be optional in Emacs e. g. whether emacs could have a flag "trust-this-source and do not do the conversion/validation"(actually whether emacs-devel will accept such a behavour). If we have that we will be able to match the performance of native Emacs code?

There shouldn't be an issue here. It's a matter of correctly using the ref-count interface (make_global_ref, free_global_ref) for dynamic module. The Rust binding already handles this, so it should be easier if we use Rust.

Are you saying that emacs dynamic module could create a thread use the Env to create emacs primitives?

This is my worry as well. One solution would be wrapping the parsed native messages in opaque user-ptr types, providing functions for partial, on-demand access. Conversion would only happen for "interesting" fields, or even ranges within lists, when/if needed.

I was thinking about the same but I wonder whether there is the same performance overhead when you are calling dynamic module from elisp code and we will end up with worse performance when you want to consume the whole input.

Do we have some automated editing scenarios, e.g. replaying editing command sequences on a particular codebase? They would be useful in prototyping/benchmarking solutions.

No, I have few sample responses.

@ubolonton
Copy link

I wonder whether that conversion could be optional in Emacs e. g. whether emacs could have a flag "trust-this-source and do not do the conversion/validation"(actually whether emacs-devel will accept such a behavour). If we have that we will be able to match the performance of native Emacs code?

We can request emacs-devel to add the ability to bypass string validation with a flag, as you suggested, or through new additions to emacs-module.h. Failing that, we can bypass the validation on our own, using internal APIs (which json.c has access to, but doesn't use). We just need to find relevant header files from Emacs source. That's a hack that won't sit well with emacs-devel though.

Are you saying that emacs dynamic module could create a thread use the Env to create emacs primitives?

There are 2 potential approaches:

  1. Native background thread:
    • Native code spawns background thread.
    • Background thread runs native code only.
    • Background thread receives messages, parses, puts native data in shared buffer.
    • Background thread notifies UI thread and continues.
    • Native code in UI thread accesses the shared buffer.
    • Native code in UI thread uses Env to convert native data into Lisp objects (either fully, or lazily+partially (preferably), through opaque user-ptr objects).
  2. Lisp background thread (Emacs 26+):
    • Lisp code spawns background thread.
    • Background thread releases GIL, "switches" to native.
    • Native code in background thread receives messages, parses into native data.
    • Background thread re-acquires GIL.
    • Native code in background thread uses Env to fully convert native data into Lisp objects.
    • Background thread releases GIL.
    • Background thread puts these Lisp objects into the shared buffer.
    • Background thread notifies UI thread and continues.
    • Lisp code in UI thread picks up objects from shared buffer.

I'm not sure yet whether 2. will work. It depends on how functions from emacs-module.h interact with the GIL. I'll experiment with it.

I was thinking about the same but I wonder whether there is the same performance overhead when you are calling dynamic module from elisp code and we will end up with worse performance when you want to consume the whole input.

If we consume the whole output, the total overhead will be the same. Threads and user-ptr allow splitting it into smaller, optional, non-blocking chunks.

@yyoncho
Copy link
Member Author

yyoncho commented Mar 25, 2019

Failing that, we can bypass the validation on our own, using internal APIs (which json.c has access to, but doesn't use). We just need to find relevant header files from Emacs source. That's a hack that won't sit well with emacs-devel though.

Thank you for mentioning that option..

Few comments:

  • Option 1 wont be sufficient without bypassing the validations since most of the time (e. g. 90%+) is spent in conversion from serde structs to elisp structures.

  • Option 2 Can we aim higher by doing the conversion using Env in the native thread? I think that will be possible if the env methods do not call any elisp functions (e. g. if we have Env->make_map, etc) and each of them is backed by a C function so it wont require GIL? I am really speculating though.

If we consume the whole output, the total overhead will be the same. Threads and user-ptr allow splitting it into smaller, optional, non-blocking chunks.

You might be right but that would require lsp-mode to be able to work against two different structures since in case we make the dynamic module optional.

@ubolonton
Copy link

Option 2 Can we aim higher by doing the conversion using Env in the native thread? I think that will be possible if the env methods do not call any elisp functions (e. g. if we have Env->make_map, etc) and each of them is backed by a C function so it wont require GIL? I am really speculating though.

I did some experiment and checked emacs-module.c code. This is not possible. Calling any module functions (including conversion functions) requires holding the GIL. Even worse, "release GIL, switch to native, re-acquire GIL" is not possible. That means if a native-code thread is to be run in parallel with the UI thread, it can never access an Env. Until emacs-module.h exposes an explicit mechanism to acquire/release the GIL (like Python.h), "option 2" is not an option.

You might be right but that would require lsp-mode to be able to work against two different structures since in case we make the dynamic module optional.

Right. I think we should further investigate option 1 with full conversion, and experiment with:

  • Using a validation-less internal conversion API (just to see the actual performance implication).
  • Adding another background Lisp thread to do the native -> Lisp conversion. The goal of this conversion thread is splitting up the latency perceived by the UI thread, by calling thread-yield multiple times during conversion.

@yyoncho
Copy link
Member Author

yyoncho commented Feb 22, 2020

Closing as won't fix - apparently, this is not going to be implemented generally due to the emacs dynamic module limitations. Feel free to reopen it if you want to work on that

@yyoncho yyoncho closed this as completed Feb 22, 2020
@yyoncho
Copy link
Member Author

yyoncho commented Jun 7, 2020

If we consume the whole output, the total overhead will be the same. Threads and user-ptr allow splitting it into smaller, optional, non-blocking chunks.

You might be right but that would require lsp-mode to be able to work against two different structures since in case we make the dynamic module optional.

Just want to resurrect this old idea. With lsp-protocol.el we have almost abstracted all of the access to the protocol data structures so the ability to work against 2 data structures is kind of for free.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants