[WIP] CRaC POC #8743
base: main
Workaround for: Error (criu/cr-dump.c:203): 18 has rseq but kernel lacks get_rseq_conf feature
Signed-off-by: Daniel Kec <daniel.kec@oracle.com>
Hi @danielkec, I've played with this a bit to let CRaC checkpoint the webserver after it starts: https://github.com/rvansa/helidon/tree/crac-poc
Referring to my changes ^: Actually, the case where a |
curl --retry 10 --retry-all-errors --retry-delay 1 http://localhost:7001
printf "\n==== Warming up ...\n"
wrk -c 16 -t 16 -d 10s http://localhost:7001
Hi @rvansa, thanks for the cool fix, and sorry for the delay. It seems to be working for me, but when I do a little warmup before the snapshot, the snapshot fails with:
An exception during a checkpoint operation:
jdk.internal.crac.mirror.CheckpointException
Suppressed: jdk.internal.crac.mirror.impl.CheckpointOpenResourceException: FD fd=165 type=unknown path=anon_inode:[eventpoll]
at java.base/jdk.internal.crac.mirror.Core.translateJVMExceptions(Core.java:117)
at java.base/jdk.internal.crac.mirror.Core.checkpointRestore1(Core.java:188)
at java.base/jdk.internal.crac.mirror.Core.checkpointRestore(Core.java:286)
at java.base/jdk.internal.crac.mirror.Core.checkpointRestoreInternal(Core.java:299)
Suppressed: jdk.internal.crac.mirror.impl.CheckpointOpenResourceException: FD fd=183 type=unknown path=anon_inode:[eventpoll]
at java.base/jdk.internal.crac.mirror.Core.translateJVMExceptions(Core.java:117)
at java.base/jdk.internal.crac.mirror.Core.checkpointRestore1(Core.java:188)
at java.base/jdk.internal.crac.mirror.Core.checkpointRestore(Core.java:286)
at java.base/jdk.internal.crac.mirror.Core.checkpointRestoreInternal(Core.java:299)
Looks like there's a native component opening those epoll FDs; I wonder why this didn't pop up with a single request for warmup. FDs opened in native code need to be investigated with strace: https://github.com/CRaC/docs/blob/master/debugging.md#file-descriptors-in-native-code
I'll try to reproduce locally.
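Before reaching for strace, it can help to confirm which FDs of the running process are epoll instances. The following is only a minimal sketch, assuming Linux and a readable /proc/<pid>/fd directory; the function name list_epoll_fds is hypothetical and not part of CRaC or Helidon. It reports which descriptors point at anon_inode:[eventpoll], not which code opened them, so strace is still needed for that.

```shell
# list_epoll_fds DIR
# Print every file descriptor in DIR (normally /proc/<pid>/fd) whose
# symlink target is an epoll instance, in the same "fd=... path=..."
# shape that the checkpoint exception uses.
list_epoll_fds() {
  fd_dir="$1"
  for fd in "$fd_dir"/*; do
    # Entries in /proc/<pid>/fd are symlinks; skip anything else
    # (including the unexpanded glob when the directory is empty).
    [ -L "$fd" ] || continue
    target=$(readlink "$fd")
    case "$target" in
      'anon_inode:[eventpoll]')
        printf 'fd=%s path=%s\n' "${fd##*/}" "$target"
        ;;
    esac
  done
}

# Example (the PID is a placeholder for the warmed-up JVM process):
#   list_epoll_fds "/proc/<pid>/fd"
```

Running it once after a single curl request and once after the wrk warmup should show whether the extra epoll FDs only appear under concurrent load.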
Hi @danielkec, I can confirm this is an issue on the JDK (CRaC) side. There are some code paths in sun.nio that are triggered when the socket is created from a virtual thread, and we did not have test coverage for that case.
Helidon MP on CRaC
Coordinated Restore at Checkpoint
Helidon MP Implicit example on CRaC
examples/crac/README.md