Investigate the constant pipeline failures #993

Closed
git-hyagi opened this issue Jun 29, 2023 · 1 comment · Fixed by #998

git-hyagi commented Jun 29, 2023

Verify whether the API pod crashes are related to a memory leak (https://discourse.pulpproject.org/t/api-server-memory-leak/851/12).

Some ideas to help with the investigation:

git-hyagi (Collaborator, Author) commented:

Actually, the API pods were not crashing; they were being restarted because of liveness probe failures.
After running some tests and modifying the show_logs.sh script from CI, I could see this event:

    Warning  Unhealthy  4m15s (x5 over 5m35s)  kubelet  Liveness probe failed: Get "http://10.244.0.9:24817/pulp/api/v3/status/": dial tcp 10.244.0.9:24817: connect: connection refused

from kubectl describe pod <api pod>:

Containers:
    api:
      Container ID:  docker://49178ba159378cb6f76e9f4f8078d1058c7a71029902fdf946746fd37775263e
      Image:         quay.io/pulp/pulp-minimal:nightly
      Image ID:      docker-pullable://quay.io/pulp/pulp-minimal@sha256:16f6c061991fa1418223ddf87fc35c9a1dbb1b1ab74a9d983c8798ee09f759df
      Port:          24817/TCP
      Host Port:     0/TCP
      Args:
        pulp-api
      State:          Running
        Started:      Mon, 03 Jul 2023 18:50:08 +0000
      Last State:     Terminated
        Reason:       Error    <------- NOT OOM
        Exit Code:    137
        Started:      Mon, 03 Jul 2023 18:46:09 +0000
        Finished:     Mon, 03 Jul 2023 18:50:07 +0000
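For readers unfamiliar with the `Exit Code: 137` line above: by the usual shell and container-runtime convention, an exit code above 128 means the process was terminated by signal number `code - 128`, so 137 maps to signal 9 (SIGKILL). That fits a container being killed from outside (for example by the kubelet after failed liveness probes) rather than exiting on its own. A minimal sketch of that decoding follows; the function name and the small signal table are illustrative assumptions, not part of the operator or of kubectl.

```python
# Hypothetical helper: decode a container exit code using the
# "128 + signal number" convention. Only a few common Linux signals
# are listed; extend the table as needed.
SIGNAL_NAMES = {
    2: "SIGINT",
    6: "SIGABRT",
    9: "SIGKILL",
    11: "SIGSEGV",
    15: "SIGTERM",
}


def describe_exit_code(code: int) -> str:
    """Return a short human-readable description of an exit code."""
    if code > 128:
        signum = code - 128
        name = SIGNAL_NAMES.get(signum, f"signal {signum}")
        return f"terminated by {name}"
    return f"exited with status {code}"


if __name__ == "__main__":
    # 137 is the value reported under "Last State" in the output above.
    print(describe_exit_code(137))
```

Note that an OOM kill also ends in SIGKILL, so the exit code alone does not rule it out; it is the `Reason: Error` field (instead of `OOMKilled`) that points away from memory exhaustion here.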

The logs from a restarted pod show that it was still running migration tasks when the container was terminated.
As a workaround, we can increase the default failure threshold for the liveness probe until #991 is implemented.
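To get a rough feel for how much a larger failure threshold buys, the sketch below estimates when the kubelet would give up on a container that never answers its liveness probe. It assumes the standard Kubernetes probe fields (`initialDelaySeconds`, `periodSeconds`, `failureThreshold`) and ignores `timeoutSeconds` and the termination grace period, so it is an approximation only. The numbers in the example are hypothetical and are not the operator's actual defaults.

```python
import math


def seconds_until_restart(initial_delay_s: float, period_s: float,
                          failure_threshold: int) -> float:
    """Approximate time from container start to the Nth consecutive
    probe failure, where N is the failure threshold."""
    return initial_delay_s + (failure_threshold - 1) * period_s


def min_failure_threshold(startup_s: float, initial_delay_s: float,
                          period_s: float) -> int:
    """Smallest threshold for which the restart would only happen
    after startup_s seconds (e.g. after migrations finish)."""
    needed = math.floor((startup_s - initial_delay_s) / period_s) + 2
    return max(1, needed)


if __name__ == "__main__":
    # Hypothetical values: migrations take ~300 s, probe every 10 s,
    # no initial delay.
    n = min_failure_threshold(startup_s=300, initial_delay_s=0, period_s=10)
    print(n, seconds_until_restart(0, 10, n))
```

For reference, the `kubectl describe` output above shows the terminated container ran from 18:46:09 to 18:50:07, i.e. about 238 seconds, before it was killed.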

git-hyagi added a commit to git-hyagi/pulp-operator that referenced this issue Jul 5, 2023
git-hyagi added a commit that referenced this issue Jul 10, 2023