Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Interrupt is broken on JVM 20+ #296

Open
bbatsov opened this issue Jul 3, 2023 · 12 comments
Open

Interrupt is broken on JVM 20+ #296

bbatsov opened this issue Jul 3, 2023 · 12 comments

Comments

@bbatsov
Copy link
Contributor

bbatsov commented Jul 3, 2023

nREPL relied on Thread.stop to kill the session thread when needed and the function was finally removed from Java in Java 20 (after being deprecated for ages). The current nREPL implementation tries Tread.interrupt first and fallbacks on Thread.stop if that fails.

Any ideas for fixing this on Java 20+ are welcome!

See clojure-emacs/cider#3351 for more details.

@socksy
Copy link

socksy commented Jul 3, 2023

For reference, this is how it was solved in jshell: openjdk/jdk#10166

Looks like they override the jdk internal(!) MethodVisitor to add a check to a new static method on every single JVM jump instruction. If a new allStop boolean is true, then it generates a ThreadDeath object and throws it. They then instrument all code to be run in their execution environment with it. Then instead of Thread.stop they set this allStop value to be true, then call Thread.interrupt on the original thread. So basically, reimplementing

Few things about that:

  • :\
  • ThreadDeath is explicitly deprecated too, so that doesn't sound like a super long term solution. I'm surprised they didn't use the same mechanism that .interrupt uses, but maybe that's not possible from the bytecode (I'm no JVM expert...)
  • it's not clear what performance impact this might have. I would be personally surprised if I was running code in the repl and it was dramatically slower in running than the final JAR — I would expect it to mess with my code
  • depending on internal classes is something jshell can do because it's part of the JDK, but I guess it's not something nREPL should be doing?
  • perhaps this could be done on a clojure level instead, which would certainly make it easier to reason about, but doesn't help you if you're using java interop and the looping is happening there instead

@bbatsov
Copy link
Contributor Author

bbatsov commented Jul 3, 2023

@socksy Thanks for sharing this! I'm hoping we'd be able to come up with something simpler and more future-proof, although all my research so far indicates that'd be pretty hard. In essence Thread.interrupt expects that the threads you'd be interrupting will be cooperating, but in our case we have no idea what the user is evaluating in the session thread that we'd like to interrupt...

@bsless
Copy link

bsless commented Jul 3, 2023

I found an official JDK guide regarding this deprecation. Anything there you could use?

@bbatsov
Copy link
Contributor Author

bbatsov commented Jul 4, 2023

@bsless I don't see anything that can help us there. As we're running user code on the session eval thread we can't really add code there that would make it interruptible.

@danielcompton
Copy link

@bbatsov Perhaps it's worth bringing this up on the JDK mailing list to see if they would reconsider removing it, perhaps putting it behind a flag?

@bbatsov
Copy link
Contributor Author

bbatsov commented Jul 4, 2023

I'm guessing that ship has sailed if there's already a release without it. I've never heard of the JDK team going back on a removal.

@socksy
Copy link

socksy commented Jul 4, 2023

@bbatsov yeah, .interrupt only works if the user code respects it, which is why jshell is injecting in a boolean check in the jvm bytecode for every looping instruction — would that be possible here?

Alternatively, if you wanted to be (even more) hacky, perhaps we could write a fake Thread.stop library using pthreads directly in a JNI based library? 🙃

@bbatsov
Copy link
Contributor Author

bbatsov commented May 7, 2024

FYI - @alexander-yakushev proposed a potential fix for this in #318

@PEZ and @danielcompton - please, take a look, as the fix will require some client co-operation as it involves loading a JVMTI agent.

@PEZ
Copy link
Contributor

PEZ commented May 7, 2024

Thanks for headsup, @bbatsov!

In the PR you say that CIDER will inject this as jack-in params. Can you share what that would look like. As Calva's jack-in is quite similar, I think we can consider to do whatever CIDER does.

@alexander-yakushev
Copy link
Contributor

alexander-yakushev commented May 7, 2024

@PEZ

For tools.deps, this means adding -J-Djdk.attach.allowAttachSelf to the launch command. For Leiningen, it will be something similar to update :jvm-opts.

@cichli
Copy link
Member

cichli commented May 7, 2024

I looked at this issue recently too and came up with a POC that uses the JDWP agent and JDI to forcibly stop threads. The JDWP native lib has a way of stopping threads and is part of the JVM, so using it avoids the need to build our own native code. Unfortunately none of that lib is accessible via JNI (as far as I can tell), so we need to start the JDWP agent and communicate with it via JDI.

Just like the native agent solution, the JVM must be invoked with -Djdk.attach.allowAttachSelf. It also requires the JDWP agent to be started with -agentlib:jdwp=transport=dt_socket,server=y,suspend=n – it can’t be loaded dynamically after VM startup. Fortunately, running the JDWP agent introduces very little overhead.

One big tradeoff is that JDWP only supports a single transport. As it stands now the POC would prevent the use of JDWP debuggers completely. This could be mitigated by either:

  • Only connecting to the JDWP agent when we need to stop a thread, and then disconnecting afterwards – this would mean a JDWP debugger could still be used, but interrupting REPL evals would not work while said debugger is connected.

  • Keeping the connection persistently but adding an agent to nREPL that acts as a proxy to the JDWP agent using its own socket – anyone who wants to use a debugger would connect to that instead.

It's a bigger imposition on users than #318 but less code and less build complexity. What do you think @alexander-yakushev @bbatsov?

@alexander-yakushev
Copy link
Contributor

Thank you, Michael! If only you had shown your POC earlier, I wouldn't bother with the agent, and now we wouldn't have the trouble of choice 😄.

I like the fact about JDWP approach that we don't have to compile and ship any native code. And I don't like that approach having a runtime impact. If I understand the numbers in the article that you linked correctly, then the runtime overhead from JDWP is ~6% which is significant (not arguing whether it is big or not, but it's not within a measurement error). What also bothers me is that the overhead exists even if the debugger is disabled which means it comes from the capabilities requested by JDWP in its Agent_OnLoad section. Those caps influence how JVM behaves in regard to optimizations, tweak JIT and inliner, so you get a differently behaving environment, even if slightly.

Here's the comparison of two solutions as a table:

Custom agent JDWP
Uses non-conventional Java features Yes Yes
Requires compiling native code for multiple platforms Yes No
Needs new Clojure code to be loaded Yes, ~50 lines Yes, ~30 lines
Requires additional Java options -Djdk.attach.allowAttachSelf -Djdk.attach.allowAttachSelf -agentlib:jdwp=transport=dt_socket,server=y,suspend=n
Has runtime overhead None that I'm aware of Yes
Has impact on VM behavior if not used No, or minimal Yes
Has additional restrictions Yes, only works on platforms that we compiled agent for Yes, conflicts with Java debugger

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

7 participants