Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add validation when container OOMKilled #1146

Merged
merged 1 commit into from
May 10, 2023

Conversation

bitoku
Copy link
Contributor

@bitoku bitoku commented Apr 28, 2023

What type of PR is this?

/kind feature

What this PR does / why we need it:

This PR adds validation for OOMKilled container.

It checks if exitCode==137 && reason==OOMKilled

Which issue(s) this PR fixes:

Fixes #1102

Special notes for your reviewer:

Does this PR introduce a user-facing change?

NONE

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. kind/feature Categorizes issue or PR as related to a new feature. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Apr 28, 2023
@k8s-ci-robot
Copy link
Contributor

Welcome @bitoku!

It looks like this is your first PR to kubernetes-sigs/cri-tools 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes-sigs/cri-tools has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot k8s-ci-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Apr 28, 2023
@haircommander
Copy link
Contributor

thanks @bitoku is this still a draft?

@bitoku
Copy link
Contributor Author

bitoku commented Apr 28, 2023

I wanted to check the result of the tests before making it not-draft.
I thought some tests would be run when I created a PR.

I think I have to check the behaviour on my machine.

@SergeyKanzhelev
Copy link
Member

related to #1102?

@bitoku
Copy link
Contributor Author

bitoku commented Apr 28, 2023

Yes, sorry I put Fixes in a comment section.
I changed the first comment.

@bitoku bitoku marked this pull request as ready for review April 28, 2023 18:20
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 28, 2023
Copy link
Member

@SergeyKanzhelev SergeyKanzhelev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Apr 28, 2023
Copy link
Member

@saschagrunert saschagrunert left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CI fails because of this change:

  [FAILED] in [It] - github.com/kubernetes-sigs/cri-tools/pkg/validate/container_linux.go:202 @ 05/02/23 07:12:14.529
  STEP: stop PodSandbox @ 05/02/23 07:12:14.529
  STEP: delete PodSandbox @ 05/02/23 07:12:14.692
  << Timeline

  [FAILED] Timed out after 60.000s.
  Expected
      <v1.ContainerState>: 1
  to equal
      <v1.ContainerState>: 2
  In [It] at: github.com/kubernetes-sigs/cri-tools/pkg/validate/container_linux.go:202 @ 05/02/23 07:12:14.529
------------------------------

Summarizing 1 Failure:
  [FAIL] [k8s.io] Container OOM runtime should output OOMKilled reason [It] should terminate with exitCode 137 and reason OOMKilled
  github.com/kubernetes-sigs/cri-tools/pkg/validate/container_linux.go:202

Ran 84 of 89 Specs in 85.944 seconds

@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 2, 2023
@bitoku
Copy link
Contributor Author

bitoku commented May 2, 2023

I found the failure is because of containerd/containerd#7749.
This change is available from 1.6.13+. https://github.com/containerd/containerd/releases/tag/v1.6.13

Is there any specific reason to stay on the containerd@1.6.12 in critest?
containerd@1.6 is LTS but I think we don't have to hold on a specific patch version.

So there are two solutions I came up with:

  1. update containerd version to 1.6.13+
  2. specify MemorySwapLimitInBytes

Which one do you think is better?

@bitoku
Copy link
Contributor Author

bitoku commented May 3, 2023

I added MemorySwapLimitInBytes to make containerd 1.6.12 kill the container

@bitoku
Copy link
Contributor Author

bitoku commented May 5, 2023

It seems it timed out in the build step.
In my forked repo it worked fine. https://github.com/bitoku/cri-tools/actions/runs/4874757908/jobs/8738852586
what should I do to avoid this?

@SergeyKanzhelev
Copy link
Member

It seems it timed out in the build step. In my forked repo it worked fine. https://github.com/bitoku/cri-tools/actions/runs/4874757908/jobs/8738852586 what should I do to avoid this?

Looking at output

/home/runner/work/cri-tools/cri-tools/src/github.com/kubernetes-sigs/cri-tools/_output/bin/amd64-linux/critest: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), statically linked, Go BuildID=6hH4QIaLpEAi2pPTkXjM/p7rAmmQsKcwXNiSq71Pp/v2qCePMxS8o3LJbNl6vV/DLEqVztH4GdjjQQsnEhh, with debug_info, not stripped
/home/runner/work/cri-tools/cri-tools/src/github.com/kubernetes-sigs/cri-tools/_output/bin/386-linux/critest: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), statically linked, Go BuildID=8x_9goimsDpvOjYczyGi/TGXIQnu_fsndIJfvAp_3/pTfR-ku9DMEExIlFvF-2/5DzwE1wj07zx3aaWnDlH, with debug_info, not stripped
/home/runner/work/cri-tools/cri-tools/src/github.com/kubernetes-sigs/cri-tools/_output/bin/arm-linux/critest: ELF 32-bit LSB executable, ARM, EABI5 version 1 (SYSV), statically linked, Go BuildID=heqh_XiGFuGUUGyV7ajT/gIItSLVm05wWG-9wzJf1/PE884KcLqSm2IECGu6Yy/gbtPFxCeIcGu6SCATIQI, with debug_info, not stripped
/home/runner/work/cri-tools/cri-tools/src/github.com/kubernetes-sigs/cri-tools/_output/bin/arm64-linux/critest: ELF 64-bit LSB executable, ARM aarch64, version 1 (SYSV), statically linked, Go BuildID=3L_RUvc9CV4LBKdr1LNw/h_3YbKa4dy5amkTPxEgi/YxfWRACuvXTrVaMx3yld/3otM0klQQoX-wq39Qr8y, with debug_info, not stripped
Error: The operation was canceled.

and based on the ordering the windows/amd64 test fails. Does it pass OK for you? Do you use 1.6.2 on your local environment?

@bitoku
Copy link
Contributor Author

bitoku commented May 5, 2023

Yes It passes OK on github actions in my forked repo.
I didn't run it on my local.

Do you use 1.6.2 on your local environment?

Are you mentioning about containerd version?
I don't think containerd version has to do with the build action because it's not dependent on containerd.

I looked over the past commits and some commits failed build action due to timed out.
471c53d
8d781b8
73472f1

Do we need to set build timeout 20 mins?
Can we relax the timeout-minutes?

@SergeyKanzhelev
Copy link
Member

Do we need to set build timeout 20 mins?

Sounds reasonable for me. @saschagrunert?

@saschagrunert
Copy link
Member

Yeah let’s increase the timeout. This test flakes often because of that.

@bitoku bitoku mentioned this pull request May 9, 2023
@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 10, 2023
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: bitoku, saschagrunert, SergeyKanzhelev

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 10, 2023
@k8s-ci-robot k8s-ci-robot merged commit 74beab5 into kubernetes-sigs:master May 10, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/feature Categorizes issue or PR as related to a new feature. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

critest: add a test to validate the reason message returned for cgroup OOM-killed container
5 participants