Hercules CI disabled for stuck VM tests #213

Closed
zowoq opened this issue Jul 22, 2023 · 19 comments
Labels
bug Something isn't working

Comments

@zowoq

zowoq commented Jul 22, 2023

@nix-community/lanzaboote

I've disabled hercules on this repo as your VM tests are getting stuck and causing OOMs on the infra.

https://hercules-ci.com/accounts/github/nix-community/derivations/%2Fnix%2Fstore%2F2j033ap4rs21iw8c90rr6s8sxqgdbi09-vm-test-run-lanzaboote.drv/log?via-job=42b9d7c2-fb82-47f8-9e66-0c616110034d

@RaitoBezarius RaitoBezarius changed the title hercules Hercules CI disabled for stuck VM tests Jul 22, 2023
@RaitoBezarius
Member

Thank you @zowoq, we will investigate and add timeouts to avoid this in the future. Can we re-enable Hercules ourselves, or do I need to ping you once we are ready again?

@zowoq
Author

zowoq commented Jul 22, 2023

No, you don't need to ping, and you can re-enable it yourself with the "Build this repository" switch here: https://hercules-ci.com/github/nix-community/lanzaboote.

@nikstur
Collaborator

nikstur commented Jul 22, 2023

I have a feeling this is because we are re-compiling half the world when we cross compile our stub. Sounds like another reason to go back to crane. I have a PR in the works already.

@RaitoBezarius
Member

> I have a feeling this is because we are re-compiling half the world when we cross compile our stub. Sounds like another reason to go back to crane. I have a PR in the works already.

This is unrelated because by the time we are running VM tests, the system closure is ready.

@zowoq
Author

zowoq commented Sep 25, 2023

@RaitoBezarius
Member

I see, this is an annoying bug in the test framework. I will try to come up with a fix in nixpkgs.

@RaitoBezarius
Member

RaitoBezarius commented Sep 27, 2023

I am exploring a proper solution in NixOS/nixpkgs#257535. In the meantime, I think I will add a timeout option to the test driver, which is the "easy solution".

@blitz blitz added the bug Something isn't working label Oct 20, 2023
@RaitoBezarius
Member

Hopefully NixOS/nixpkgs#262839

@RaitoBezarius
Member

The timeout has been merged, and lanzaboote master has been updated to use it.
We will need to rebase all PRs to make use of it in PR CI.

@RaitoBezarius
Member

I re-enabled CI.

@RaitoBezarius
Member

Let's re-open if Hercules CI has to be disabled again.

@zowoq
Author

zowoq commented Oct 30, 2023

> The timeout has been merged

How long is the timeout? With the number of NixOS tests this project has, it needs to be quite short so that it does not block other projects.

> We will need to rebase all PRs to make use of it in PR CI.

I've disabled CI again because I don't want PRs pushed without being rebased, ending up with stuck tests again.

Please rebase the PRs first, then re-enable CI.

@RaitoBezarius
Member

> How long is the timeout? With the number of NixOS tests this project has, it needs to be quite short so that it does not block other projects.

1 hour, by default on any NixOS test AFAIK. It is up on master.

@zowoq
Author

zowoq commented Oct 30, 2023

With this number of VM tests, an hour is too long for our limited resources. They seem to run quickly; can you do a 5-10 minute timeout?

@RaitoBezarius
Member

> With this number of VM tests, an hour is too long for our limited resources. They seem to run quickly; can you do a 5-10 minute timeout?

I can.

I rebased all the PRs. I will submit a 10-minute timeout as the default and then rebase everything again.

@nikstur
Collaborator

nikstur commented Oct 30, 2023

Let's try 5 minutes and if we see that that doesn't suffice we can go up to 10.

Edit: In the interest of fair resource sharing.

@RaitoBezarius
Member

#250 @nikstur can I let you do a quick review?

@RaitoBezarius
Member

Everything has been rebased; I am turning CI on again.

@zowoq
Author

zowoq commented Oct 30, 2023

Thanks!
