CI - timeouts #137

Open
andreeaflorescu opened this issue Nov 29, 2022 · 4 comments

Comments

@andreeaflorescu

Recently, tests have been taking a really long time to finish (i.e. more than 5 minutes). As a result, the CI for various components is blocked and tests are reported as failed because they time out.

We have not yet identified a root cause, but so far this seems to affect all the CI hosts (including the ones running MSHV).

Some issues that we are aware of are:

  • the rustvmm/dev container is based on a very old Linux distribution release (i.e. Ubuntu 18)
  • threads seem to be blocked in epoll - more details needed here; initial investigation done by @aghecenco
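
For anyone who wants to check the epoll observation on a CI host, here is a minimal sketch of one way to list the threads of a process that are sleeping in a given kernel function. It reads the per-thread `wchan` file under `/proc`. This is not the method used in the initial investigation (the thread does not describe it); the helper name `threads_waiting_in` and the default symbol `ep_poll` are assumptions, and the symbol reported for a thread blocked in epoll_wait can differ between kernel versions.

```python
from pathlib import Path


def threads_waiting_in(pid, needle="ep_poll", proc_root="/proc"):
    """Return (tid, wchan) pairs for threads of `pid` whose wait channel
    contains `needle`.

    `needle` defaults to "ep_poll", which is an assumption about the kernel
    symbol shown for threads sleeping in epoll_wait; adjust it for your kernel.
    `proc_root` is a parameter only so the function can be pointed at a
    fake directory tree for testing.
    """
    blocked = []
    task_dir = Path(proc_root) / str(pid) / "task"
    for entry in sorted(task_dir.iterdir(), key=lambda p: p.name):
        try:
            wchan = (entry / "wchan").read_text().strip()
        except OSError:
            # The thread exited in the meantime, or the file is not readable.
            continue
        if needle in wchan:
            blocked.append((int(entry.name), wchan))
    return blocked
```

Calling `threads_waiting_in(<pid of the test process>)` on Linux returns the matching thread IDs; an empty list only means that no thread reported a wait channel containing that string at the moment of the check.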
@vireshk

vireshk commented Nov 30, 2022

@andreeaflorescu can we increase the timeout in CI to, let's say, 10-15 minutes, so the tests don't time out? At least until the issue is fixed?

@andreeaflorescu

@vireshk yes, we can do that. I have a PR (rust-vmm/rust-vmm-ci#116) that also fixes some other things that might be causing the timeout; I will try my best to get it ready by the end of the day. In any case, in the same PR I will also increase the timeout until we figure out what is going on.

@andreeaflorescu

andreeaflorescu commented Nov 30, 2022

The problem might be related to the host on which the CI is running. It seems oddly similar to the one reported here: https://lore.kernel.org/lkml/Y38h9oe4ZEGNd7Zx@quatroqueijos.cascardo.eti.br/T/#m3efc3916c892c9cca270d3ef6bfea780a2033c8e

What is strange though is that we are not using Linux 5.4, but Linux 5.15 on Ubuntu. This needs further investigation.

andreeaflorescu added a commit to andreeaflorescu/rust-vmm-ci that referenced this issue Nov 30, 2022
We need to increase the timeout for each test to 15 minutes because we
are experiencing timeouts when running the tests. I think this might be
actually related to the hosts running the CI, because some early
investigations by aghecen@amazon.com show that the threads are blocked
on doing epoll_wait. This is oddly similar to the issue discussed here
related to epoll: https://lore.kernel.org/all/Y1pY2n6E1Xa58MXv@kroah.com/

For more details, check: rust-vmm/community#137

Signed-off-by: Andreea Florescu <fandree@amazon.com>
lauralt pushed a commit to rust-vmm/rust-vmm-ci that referenced this issue Nov 30, 2022
@andreeaflorescu

We temporarily increased the CI timeout per test to 15 minutes; this is hopefully enough to stop the failures. To pick up the new timeout, the rust-vmm-ci submodule needs to be updated in each repository. Dependabot updates the rust-vmm-ci submodule weekly/monthly. In case you want to do it manually, you can follow the runbook here: https://github.com/rust-vmm/community/blob/main/CONTRIBUTING.md#updating-the-rust-vmm-ci
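
To illustrate why a slow but otherwise healthy test shows up as failed, here is a minimal sketch of a per-test timeout wrapper. It is not how rust-vmm-ci or the CI service enforces the limit; the helper name `run_with_timeout` and the constant `DEFAULT_TIMEOUT_S` are hypothetical, and only the 15-minute value comes from this thread.

```python
import subprocess
import sys

# 15 minutes, the temporary per-test limit mentioned in this thread.
DEFAULT_TIMEOUT_S = 15 * 60


def run_with_timeout(cmd, timeout_s=DEFAULT_TIMEOUT_S):
    """Run `cmd` and classify the outcome as 'passed', 'failed' or 'timed out'."""
    try:
        # subprocess.run kills the child and raises TimeoutExpired
        # once `timeout_s` seconds have elapsed.
        result = subprocess.run(cmd, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return "timed out"
    return "passed" if result.returncode == 0 else "failed"


if __name__ == "__main__":
    # A command that would succeed, but is slower than the (shortened) limit.
    slow_ok = [sys.executable, "-c", "import time; time.sleep(5)"]
    print(run_with_timeout(slow_ok, timeout_s=0.5))
```

The command in the usage example would exit successfully if left alone, but it is reported as timed out because it exceeds the limit, which is the same effect a hung or very slow test has in CI.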
