rust agent crashes on shutdown #160

Closed
jodh-intel opened this issue Mar 13, 2020 · 7 comments

@jodh-intel
Contributor

Tracking issue for a problem I'm seeing related to adding tracing (which requires a clean shutdown):

thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: "SendError(..)"', src/libcore/result.rs:1188:5

This problem is seen calling destroySandbox on master and appears to be caused by a thread (created by a dependent crate?) failing when operating on an mpsc channel. Debugging is complicated by the fact that the crash is random (threads ;) and the generated backtrace is not super-helpful:

stack backtrace:
   0:     0x55555634a734 - <unknown>
   1:     0x555556370c6c - <unknown>
   2:     0x555556345537 - <unknown>
   3:     0x55555634cf4e - <unknown>
   4:     0x55555634cc41 - <unknown>
   5:     0x55555634d62b - <unknown>
   6:     0x55555634d1de - <unknown>
   7:     0x55555636cf0e - <unknown>
   8:     0x55555636d007 - <unknown>
   9:     0x5555560760be - <unknown>
  10:     0x555556091e9c - <unknown>
  11:     0x5555560958c2 - <unknown>
  12:     0x55555608ef03 - <unknown>
  13:     0x5555560962c1 - <unknown>
  14:     0x5555560967d0 - <unknown>
  15:     0x555556355eba - <unknown>
  16:     0x555556096658 - <unknown>
  17:     0x555556096393 - <unknown>
  18:     0x55555608ecee - <unknown>
  19:     0x555556065164 - <unknown>
  20:     0x55555633dacf - <unknown>
  21:     0x555556354f80 - <unknown>
  22:     0x7ffff7da1669 - start_thread
  23:     0x7ffff7ef3323 - clone
  24:                0x0 - <unknown>

@lifupan - I'll add more specific details to this issue on Monday but any thoughts? Can you recreate?
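For context, the panic text above is consistent with calling `.unwrap()` on the result of `send` on a channel whose receiving end has already been dropped. The following is a minimal sketch using the standard library's `std::sync::mpsc`, not the agent's or the gRPC crate's actual code; the function names are hypothetical and chosen only for illustration. It shows that `send` returns an `Err` once the receiver is gone, and how a worker thread can treat that error as a shutdown signal instead of panicking.

```rust
use std::sync::mpsc::{self, SendError};
use std::thread;

/// Hypothetical helper: sends one value after the receiver has been dropped.
fn send_after_receiver_dropped() -> Result<(), SendError<u32>> {
    let (tx, rx) = mpsc::channel::<u32>();
    drop(rx); // simulates the consumer going away during shutdown
    tx.send(1) // returns Err(SendError(1)); `.unwrap()` here would panic
}

/// Hypothetical worker: stops quietly when the receiver is gone.
fn worker_exits_cleanly() -> bool {
    let (tx, rx) = mpsc::channel::<u32>();
    drop(rx);
    let handle = thread::spawn(move || {
        for i in 0..3 {
            if tx.send(i).is_err() {
                // Receiver dropped: treat this as a shutdown signal.
                return;
            }
        }
    });
    // `join` returns Err only if the thread panicked.
    handle.join().is_ok()
}

fn main() {
    assert!(send_after_receiver_dropped().is_err());
    assert!(worker_exits_cleanly());
}
```

Whether the crashing thread in this issue follows the same pattern is an assumption; the sketch only illustrates the failure mode named in the panic message.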

@lifupan
Member

lifupan commented Mar 16, 2020

Hi @jodh-intel, from your description, did the panic occur here: https://github.com/kata-containers/kata-containers/blob/master/src/agent/src/grpc.rs#L1234 ?

@jodh-intel
Contributor Author

Hi @lifupan - The problem actually seems to be the custom VSOCK grpc-rs/grpc-sys crate we're using: shutdown seems to be unreliable with it, but not with an upstream version:

I now have a test program that can shut down its gRPC server cleanly using grpcio = "0.4.6", but with the crate we're using it intermittently crashes catastrophically on shutdown with the error shown at the top of this issue.

I really don't want to have to fix this behaviour in our custom crate as it was only ever supposed to be a temporary fix. So, this is a good time to talk about two things:

/cc @amshinde, @bergwolf, @gnawux.

@jodh-intel
Contributor Author

@teawater - could you provide any update on grpc/grpc#21121?

@jodh-intel
Contributor Author

Ping all 😄

If we are unable to find a way to get the vsock changes landed upstream, there is of course ttrpc.

@bergwolf - is there any more information to share (a branch?) relating to #148?

@lifupan
Member

lifupan commented Mar 25, 2020

> Ping all 😄
>
> If we are unable to find a way to get the vsock changes landed upstream, there is of course ttrpc.
>
> @bergwolf - is there any more information to share (a branch?) relating to #148?

Hi @jodh-intel

The ttrpc branch https://github.com/lifupan/kata-containers/commits/ttrpc is a PoC that we did two months ago.

@amshinde
Member

@lifupan Is the ttrpc code ready to be PR'd/merged?
@jodh-intel Can we make changes in our custom crate instead, to get you unblocked while the grpc vsock changes or the ttrpc code lands?

@bergwolf @gnawux Do you have any insights on the above? @jodh-intel is currently blocked on adding tracing support because of agent crashes seen on shutdown.

chavafg pushed a commit to chavafg/kata-containers that referenced this issue Apr 23, 2020
Fixes kata-containers#160

Signed-off-by: jinda.ljd <q8886888@qq.com>
@bergwolf
Member

This has been fixed. Please reopen if you still see it with the rust agent @jodh-intel

dgibson pushed a commit to dgibson/kata-containers that referenced this issue Aug 5, 2021
Some architectures and setups do not support DIMM/NUMA. However, they
can still use memory backends, provided a memory backend of the same ID
is specified under -machine. This was introduced in QEMU 5.0. Enable
this functionality in appendMemoryKnobs.

Fixes: kata-containers#160

Signed-off-by: Jakob Naucke <jakob.naucke@ibm.com>
sprt pushed a commit to sprt/kata-containers that referenced this issue Apr 9, 2024