check-container: {live,ready}ness checks moved from templates #227
|
[test-openshift] |
0e609a0 to
a47a49a
Compare
|
I don't know that a liveness check which simply ensures you can exec into the container, and that pid 1 exists, is worth having. (If pid 1 didn't exist, the container would have terminated and been restarted, and if you can't exec into the container, the container doesn't exist.) |
|
pg_isready is actually used for both checks, or at least it should be |
my reading of the bash was that if --live is passed, it doesn't really matter what pg_isready returns (whether it returns 0 or 1, you'll exit 0 either way) |
|
pg_isready should return 1 if the server is starting, 2 if the readiness check failed |
ah, ok. lgtm then. |
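To make the exit-status discussion above concrete, here is a minimal sketch of one way a probe script could map a `pg_isready` status to a liveness or readiness verdict. The function name `probe_verdict` and the policy it encodes (a server that is still starting counts as live but not ready) are assumptions for illustration, not the script from this PR. The status meanings in the comments follow the `pg_isready` documentation.

```shell
#!/bin/bash
# Hypothetical helper, not the PR's check-container script.
# pg_isready exit statuses (per the PostgreSQL documentation):
#   0 = accepting connections
#   1 = rejecting connections (for example during startup)
#   2 = no response to the connection attempt
#   3 = no attempt made (for example invalid parameters)
probe_verdict() {
    local mode=$1 status=$2
    case "$mode" in
        --live)
            # Assumed policy: a server that is up or still starting is alive.
            case "$status" in
                0|1) return 0 ;;
                *)   return 1 ;;
            esac
            ;;
        --ready)
            # Ready only when the server accepts connections.
            [ "$status" -eq 0 ]
            ;;
        *)
            echo "usage: probe_verdict --live|--ready STATUS" >&2
            return 2
            ;;
    esac
}

# Possible wiring, only meaningful where pg_isready is installed:
#   pg_isready -h 127.0.0.1 -q; probe_verdict --live $?
```

Keeping the policy in a function that takes the status as an argument means it can be exercised without a running server.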
|
Tried testing out locally, getting: |
|
Oh right that is because of the update test since the official docker images do not have the file in them yet... |
|
I have to look closer, but can we somehow relax the tests (temporarily) so the tests are run only against git-version of the containers? (variable, option for jenkins...) |
|
Well, there would not be any reason to run the update test if we only used the git-built container. So the easiest temporary fix is to just not run it at all... Long term we should just use the live Openshift templates instead of the local ones for this test. We have separate tests for local templates, so that should be ok. |
|
Other test cases ran ok on my local box. |
|
So basically nothing prevents us from pushing this... right? Anyway, during the review, please pay attention to the timeouts. Previously, there was the (default) timeout for So to the limits (10s timeout for liveness check, first check 30s after container start...). It is still not perfect... if the db initialization takes minutes (e.g. s2i, or slower storage), the script waits till we execute /bin/postgres (the /proc/1/exe check).... But openshift runs the script 30s after container start, gives it a 10s time frame ... and then kills it (somehow). @bparees is it documented what happens to the process, so we could "catch the signal" and return "success" in such a case (initdb phase)? |
|
Basically, my remaining concern is that I'm still not sure that the liveness check won't kill the postgres pod |
you cannot catch the signal. once the liveness check fails, k8s will kill your container, period.
i do not believe claiming the pod is "live" serves any purpose. readiness probes are what determine routability. liveness probes are only for determining if the pod should be killed, so a loose liveness check with a long initial delay just means it won't get killed quickly. |
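As a rough illustration of the timing concern in this exchange, the sketch below estimates how soon after container start a liveness probe that always times out could lead to a restart. The 30s initial delay and 10s timeout are the values mentioned in the thread. A failure threshold of 3 and a period of 10s are the Kubernetes defaults and are assumed here, since the thread does not state them. The function name `liveness_kill_estimate` is hypothetical, and the formula ignores kubelet scheduling jitter, so treat the result as an approximation only.

```shell
#!/bin/bash
# Rough estimate, in seconds after container start, of the earliest moment a
# liveness probe that always times out could get the container restarted.
# Arguments: initial delay, probe timeout, failure threshold, probe period.
liveness_kill_estimate() {
    local initial_delay=$1 timeout=$2 failure_threshold=$3 period=$4
    # The first failure is reported once the first probe times out; each
    # further consecutive failure arrives about one period later.
    echo $(( initial_delay + timeout + (failure_threshold - 1) * period ))
}

# 30s delay and 10s timeout come from the thread; threshold 3 and period 10s
# are assumed Kubernetes defaults.
liveness_kill_estimate 30 10 3 10   # prints 60
```

Under these assumptions, an initialization phase that takes minutes would outlast the probe budget, which is why a longer initial delay only postpones the kill rather than preventing it.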
a47a49a to
c43401a
Compare
|
@praiskup Please remove the openshift update test from test cases that are being run to make sure we have at least some of the openshift tests working. I will look into how to do the test properly in the meantime. Otherwise LGTM. @bparees This change will only work with the centos versions of the images, since the |
I have created a PR, #229, that modifies the update test to use the template located in the openshift repository instead. |
c43401a to
36a45c3
Compare
|
[test-openshift] |
No, the template is shared by both centos and rhel users. You need to publish a rhel image that works before changing the template to avoid breaking people who pick up the template change. (yes most people get the template from origin but: 1) it's still not a good idea and 2) i don't want to risk us pulling this change into origin and breaking people using the rhel image) |
I don't really see a good reason to hold #226 so i guess i'd prefer to go ahead and merge it for now. |
36a45c3 to
0dc49c0
Compare
|
[test-openshift] |
This is bad; I would prefer not to have such requirements on the source repository. |
|
[test-openshift] |
|
@bparees what's actually the problem with merging this? You are afraid that |
primarily this.
somewhat this. We don't currently have any CI that tests the rhel images I don't think, the CI tests that would run would confirm the template works when centos imagestreams are defined (which would work in this case). All that said, i'm not going to block you guys on it. Moving forward I want to have the following ordering:
(Today we have a situation where we maintain the db-templates directly in origin (periodically updating from the image-repo manually), and library pulls from origin). |
|
Especially first step in the workflow is a must from our POV. Others we should do according So I think the workflow described by Ben is fine, and Ben acked the merge, @hhorak |
it's never going to live in library, there's just no way library can be responsible for CIing everyone's content. But yes, having it live in origin is plausible. Even longer term i'd like to get to the point where origin is not even distributing these templates and if you want them, you install them as an "scl_templates_and_imagestreams" package or something. |
|
Just cc'ing @pvalena, does the "two-way" CI setup make sense to you? |
Keeping |
|
@praiskup so when can we expect new rhel images that will include the check-container script so we can resume consuming the template that lives in this repo? |
|
@bparees I will be pulling upstream changes (inc. this one) for the next RHAH release. |
Related: #226