add docker_image_availability check #3461
Conversation
Compare: 8b7601f to 0cdadd0
def rpm_docker_images():
    return [
        "docker.io/openshift/origin",
        "registry.access.redhat.com/openshift3/ose-haproxy-router",
Not sure if this list is correct. Also, I realize that since we are using our registry rather than docker.io, maybe instead of using the ansible docker_image_facts module we could run oc get images ... on the host machine and verify that output instead.
As @sdodson said in #3461 (comment), we need to support the case when the repository prefix is something else. Perhaps we can get the "something else" precisely from the variable he pointed us at: oreg_url.
I think it'd be amazing if openshift had a function that would dump the entire list of images possible for a given format string. That way we leave ownership of which images are required for the core product within the openshift binary itself. But that would require changes to origin and we can't rely on that until it's widely available.
That does sound awesome.
Let's RFE that idea!
It wouldn't cover things like image streams or items from templates like registry-console, metrics, and logging. But it'd be enough to ensure the core platform worked.
Not sure if it's in scope for your work, but it'd be nice if non-standard image format strings were handled. For disconnected installs they're pretty much a sure thing.
Though I have some comments, this is in a good direction.
@staticmethod
def rpm_docker_images():
    return [
        "docker.io/openshift/origin",
I'd expect at least 2 different lists, or a sort of templating on the image names depending on OCP(OSE) / Origin.
if len(missing_images) > 0:
    return {"failed": True, "msg": "There are missing docker images or images that did not match the current OpenShift version (%s): %s:\n" % (openshift_release, missing_images)}

return {"Changed": False}
In modules, I think "changed" should be all lowercase. Here it doesn't really matter, we don't even need it.
For now we only look for a "failed" key:
openshift-ansible/roles/openshift_health_checker/action_plugins/openshift_health_check.py, lines 73 to 75 in f0a32af:

if r.get("failed", False):
    result["failed"] = True
    result["msg"] = "One or more checks failed"
matched_images.add(tag[0])

missing_images = set(self.rpm_docker_images()) - matched_images
if len(missing_images) > 0:
In Python, it is idiomatic to write:
if missing_images:
...
missing_images = set(self.rpm_docker_images()) - matched_images
if len(missing_images) > 0:
    return {"failed": True, "msg": "There are missing docker images or images that did not match the current OpenShift version (%s): %s:\n" % (openshift_release, missing_images)}
The ":\n" suffix seems extraneous. Instead of printing the list directly, maybe you can convert it to a string with something like ", ".join(missing_images).
The difference is:
There are missing docker images ...: ["foo", "bar"]
vs.
There are missing docker images ...: foo, bar
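As a hedged sketch of the suggestion (the function name format_missing is hypothetical, not from the PR):

```python
def format_missing(release, missing_images):
    # Join the set into a readable comma-separated string instead of
    # relying on the raw set/list repr that "%s" would produce.
    # sorted() keeps the message deterministic for sets.
    return (
        "There are missing docker images or images that did not match "
        "the current OpenShift version (%s): %s"
        % (release, ", ".join(sorted(missing_images)))
    )
```

With a raw "%s" on a set, the message would contain something like set(['foo', 'bar']); with join it reads as plain names.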
# if we find any docker images, attempt to do 1-to-1 match between
# each image description name key and items from rpm_docker_images
matched_images = set()
for i in docker_images['images']:
Please don't call it i. i is a good name for indices in a list... here image is probably better.
openshift_release = get_var(task_vars, "openshift_release")

args = {
    "name": self.rpm_docker_images(),
Shall we attempt to include version/tag in the images? Without an explicit :tag, docker_image_facts will look for :latest.
@rhcarvalho what do you think of instead having a method like docker_images_for() that takes a deployment_type like origin or openshift-enterprise and then a version? That method could contain the logic to determine if an image uses tags like v3.4 or v3.4.1.40.
Yes
}

docker_images = self.module_executor("docker_image_facts", args, tmp, task_vars)
Perhaps the first thing I'd do after calling the module is checking for a "failed" key with value True. If it failed, we can abort immediately.
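A minimal sketch of that early-abort pattern, with module_executor passed in as a plain callable stub (in the actual check it is a method on the class; the helper name check_images is illustrative):

```python
def check_images(module_executor, args, tmp, task_vars):
    # Run the module, then bail out immediately if it reports failure,
    # before touching any of its payload keys.
    docker_images = module_executor("docker_image_facts", args, tmp, task_vars)
    if docker_images.get("failed"):
        return {"failed": True,
                "msg": docker_images.get("msg", "docker_image_facts failed")}
    # ... continue inspecting docker_images["images"] here ...
    return {"changed": False}
```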
docker_images = self.module_executor("docker_image_facts", args, tmp, task_vars)

if len(docker_images['images']) == 0:
if not docker_images['images']:
@@ -0,0 +1,49 @@
# pylint: disable=missing-docstring
from openshift_checks import OpenShiftCheck, get_var
from openshift_checks.mixins import NotContainerized
Please rebase, this was renamed NotContainerizedMixin yesterday.
Compare: ff79ebc to a3b13bf
Thanks, went ahead and removed the registry hostname from image URLs. @sdodson @rhcarvalho Thanks for the feedback, review comments addressed.
Compare: ba0fa6f to 5f93de9
openshift_release = get_var(task_vars, "openshift_release")

# attempt to fetch fully qualified image tag
openshift_facts = self.module_executor("openshift_facts", {}, tmp, task_vars)
Because checks commonly depend on variables set by openshift_facts, it is guaranteed to have run -- it is a dependency of the openshift_health_checker role. There is no need to run it once again. All the variables you need are in task_vars already.
failed_pulls = set()

# attempt to pull each image and fail on
# the first one that cannot be fetched
This looks like function/method docstring. It would be best to put it on the method definition, not on its call site.
# attempt to pull each image and fail on
# the first one that cannot be fetched
pulled_images = self.pull_images(self.rpm_docker_images(), openshift_release, tmp, task_vars)
if "failed" in pulled_images and "failed_images" in pulled_images:
It is probably enough to check for a single condition? Looks redundant.
if failed_pulls:
    return {"failed": True, "msg": "Failed to pull required docker images: %s" % failed_pulls}

return {"changed": True}
We should probably mark changed: true in other code-paths as well (e.g., some images were pulled, but there was an error later on). And then, just like we verify if any check has failed: true, we need to verify if any check has changed: true and add that key/value to the result of the openshift_health_checker action plugin.
If this is not clear, I can help implementing.
@rhcarvalho thanks, this makes sense and I definitely agree; however, I think this effort should be part of a separate PR.
EDIT: at least for this PR, I have gone ahead and updated returned dictionaries to include changed: True if at least one image succeeded in being pulled.
This is the first check that intentionally makes changes; that's why I mentioned surfacing that in the runner.
Ah, makes sense. I added a commit that surfaces this so that a failure message (with at least one image successfully pulled) looks like:
Failure summary:
1. Host: <host>
Play: Run Docker image checks
Task: openshift_health_check
Changed: True
Message: One or more checks failed
Details: {u'docker_image_availability': {'changed': True,
'failed': True,
'msg': u"Failed to pull required docker images: set(['openshift3/ose-deployer:v3.4.2.7', 'openshift3/ose-haproxy-router:v3.4.2.7', 'openshift3/ose-pod:v3.4.2.7', 'openshift3/ose-docker-registry:v3.4.2.7'])"}}
@rhcarvalho if we are going to allow for checks that modify system state, would it also make sense to allow for skipping checks that may mutate state? It might also make sense to make sure we implement check_mode for those checks as well.
Yes, we discussed that in a call. I've created a Trello card to track being able to disable checks that match a tag / check mode. Perhaps it will be easier to tag checks known to be idempotent with an idempotent tag.
if "openshift" in facts["ansible_facts"]:
    if "common" in facts["ansible_facts"]["openshift"]:
        if "version" in facts["ansible_facts"]["openshift"]["common"]:
            return facts["ansible_facts"]["openshift"]["common"]["version"]
This could be written more simply as
openshift_image_tag = get_var("openshift", "common", "version", default=None)
(Doesn't even need this method)
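For illustration, a hypothetical re-implementation of such a lookup helper (the real get_var in openshift-ansible may differ in signature and behavior; this only shows the nested-dict-with-default idea that replaces the chained if statements):

```python
def get_var(task_vars, *keys, **kwargs):
    # Walk nested dicts one key at a time; on a missing key, return the
    # caller-supplied default instead of raising, if one was given.
    value = task_vars
    for key in keys:
        try:
            value = value[key]
        except (KeyError, TypeError):
            if "default" in kwargs:
                return kwargs["default"]
            raise
    return value

# Example facts layout (made up for the demo):
facts = {"openshift": {"common": {"version": "3.4.2.7"}}}
```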
    "openshift3/ose-pod",
]

def pull_images(self, images, tag, tmp, task_vars):
General tip: code is easier to test, understand and compose if you keep in mind the Single Responsibility Principle. Try to make functions / methods that do a single thing. Here, pull_images does at least two things: pulling images and appending tags to images.
My suggestion is to break this into two "functions": one that appends the tag, one that pulls whatever image it is given.
Appending a tag to a sequence of images:
images_with_tag = [image + ":" + tag for image in images_without_tag]
Pulling:
result = [
self.module_executor("docker_image", {"name": image}, tmp, task_vars)
for image in images_with_tag
]
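Putting the two fragments together, a self-contained sketch (module_executor is passed in as a stub here; in the actual check it is a method on the class):

```python
def add_image_tags(images, tag):
    # Single responsibility: append ":tag" to each image name.
    return [image + ":" + tag for image in images]

def pull_images(module_executor, images, tmp, task_vars):
    # Single responsibility: pull whatever images it is given,
    # collecting the per-image module results.
    return [
        module_executor("docker_image", {"name": image}, tmp, task_vars)
        for image in images
    ]
```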
}

res = self.module_executor("docker_image", args, tmp, task_vars)
if "failed" in res:
Subtle bug: "failed" might be in the result, but it could be failed: False...
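A tiny demonstration of the difference (the result dict is made up):

```python
# A module result where the key exists but the check did not fail.
res = {"failed": False, "changed": True}

# Buggy: treats failed=False as a failure, because it only tests
# for the presence of the key.
buggy = "failed" in res

# Correct: only true when the value itself is truthy.
correct = bool(res.get("failed"))
```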
Compare: a325632 to 398d69d
@rhcarvalho Thanks for the feedback, review comments addressed.
Compare: 5c871e9 to 24134cf
# pull each required image that requires a qualified image
# tag (v0.0.0.0) rather than a standard release format tag (v0.0)
pulled_qualified_images = self.pull_images(self.add_image_tags(self.rpm_qualified_docker_images(), "v" + openshift_image_tag), tmp, task_vars)
Would using skopeo make sense here to verify images without pulling them? Output of skopeo inspect docker://docker.io/openshift/origin:
{
"Name": "docker.io/openshift/origin",
"Tag": "latest",
"Digest": "sha256:7fe04a3bc9dc0508f846980888c948914663146c7cabc0dc0beb06169d75f21c",
"RepoTags": [
"b596f940a813a660be8982c37a2fdac29641b5f5",
"beta1",
"latest",
"testing",
"v0.2.1",
"v0.2.2",
"v0.3.1",
"v0.3.2",
"v0.3.3",
"v0.3.4",
"v0.3",
"v0.4.1",
"v0.4.2",
"v0.4.3",
"v0.4.4",
"v0.4",
"v0.5.1",
"v0.5.2",
"v0.5.3",
"v0.5.4",
"v0.5",
"v0.6.1",
"v0.6.2",
"v0.6",
"v1.0.0",
"v1.0.1",
"v1.0.2",
"v1.0.3",
"v1.0.4",
"v1.0.5",
"v1.0.6",
"v1.0.7",
"v1.0.8",
"v1.1.1.1",
"v1.1.1",
"v1.1.2",
"v1.1.3",
"v1.1.4",
"v1.1.5",
"v1.1.6",
"v1.1",
"v1.2.0-rc1",
"v1.2.0-rc2",
"v1.2.0",
"v1.2.1",
"v1.2.2",
"v1.3.0-alpha.0",
"v1.3.0-alpha.1",
"v1.3.0-alpha.2",
"v1.3.0-alpha.3",
"v1.3.0-rc1",
"v1.3.0",
"v1.3.1",
"v1.3.2",
"v1.3.3",
"v1.4.0-alpha.0",
"v1.4.0-alpha.1",
"v1.4.0-rc1",
"v1.4.0",
"v1.4.1",
"v1.5.0-alpha.0",
"v1.5.0-alpha.1",
"v1.5.0-alpha.2",
"v1.5.0-alpha.3"
],
"Created": "2017-02-28T01:26:25.908410645Z",
"DockerVersion": "1.12.6",
"Labels": {
"build-date": "20161214",
"io.k8s.description": "OpenShift Origin is a platform for developing, building, and deploying containerized applications. See https://docs.openshift.org/latest for more on running OpenShift Origin.",
"io.k8s.display-name": "OpenShift Origin Application Platform",
"license": "GPLv2",
"name": "CentOS Base Image",
"vendor": "CentOS"
},
"Architecture": "amd64",
"Os": "linux",
"Layers": [
"sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4",
"sha256:45a2e645736c4c66ef34acce2407ded21f7a9b231199d3b92d6c9776df264729",
"sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4",
"sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4",
"sha256:e24e870d9c831b7ee97c3f763d4118acea93781746b797665ceb4fb0fc327e25",
"sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4",
"sha256:6a8f42f1ab1e81a4365cce09ea66bc65e30f436b2754ab879242c1fe9ace11b8",
"sha256:190f15d3619652dbfdbf3a8d666f3b04bfeb27138050aa5fa993601af462c802",
"sha256:8a54e0c0e9d1ebd81c6915f8fdac5a14a6b4c0a0786a4721b879c032d3e4efaa",
"sha256:ebfb90719f35b6847ab381c48b887942c09621a5b4392e82d7f18601bf71922b",
"sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4",
"sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4",
"sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4",
"sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4",
"sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4"
]
}
We did talk about using skopeo, but I think we started with the docker* modules because they're readily available... Do we install skopeo as part of the install?
For preflight checks, we cannot assume skopeo is installed. Installing would be one more side effect for the check... something to discuss ;-)
We don't currently install skopeo as part of the install, but as we move towards system containers, it will become a hard dependency at some point. Even without skopeo, it may make sense to default to querying the registry API to search tags for images, rather than pulling images. It would also allow us to be a bit fuzzier on finding a matching container image.
One thing we asked ourselves yesterday is whether skopeo would do the same registry resolution as a docker pull would (trying to default to different registries based on config, e.g. docker.io, registry.access.redhat.com, ...). @detiber do you happen to know?
And while we could use skopeo once it is installed, for preflight checks (before installation) we'd need to rely on pre-installed software or install it as part of the check (one more side effect).
@detiber, can you explain what you mean by "allow us to be a bit fuzzier on finding a matching container image"?
@juanvallejo, thanks for digging in to skopeo. We need to find the right balance between idempotence and usefulness. Here's a question: is there already a skopeo container? I really don't want to require our checks to need any sort of subscription-manager work to register a system to pull RPMs. Until skopeo is somehow built in, it seems OK in my mind to pull a container in order to run it.
@rhcarvalho's comments are still valid. We'd have to be completely convinced we aren't going to have skopeo report that the correct images are available only to have a docker pull fail because a misconfigured registry is shadowing registry.access.redhat.com.
@juanvallejo, to test this further I'd be curious to know what happens if the same repository exists in two registries yet a specific tag only exists in a "shadowed" registry configured with docker. To test, you could set up a registry using the steps at https://docs.docker.com/registry/deploying/ that exposed an image called /openshift3/ose-pod:latest. Then configure that registry to be used by a docker pull with ADD_REGISTRY in /etc/sysconfig/docker. Put it before the entry for registry.access.redhat.com. Then make sure docker pull openshift3/ose-pod:v3.4 still works. It should first check the misconfigured registry, see that the image is missing, and proceed to check registry.access.redhat.com.
@brenton I'm not sure a system container is sufficient in this case either, not all systems will have docker available.
I think skopeo, or hitting directly against the search api, gives us the ability to actually query the list of tags for a given image, so it would give us the ability to query for the "latest" version for containerized installs without having to resort to repoquery hacks, pulling and querying a 'latest' image, or hardcoding default values. It also avoids the need for pulling images as part of a pre-requisite check.
@@ -79,6 +79,7 @@ def _format_failure(failure):
    (u'Host', host),
    (u'Play', play),
    (u'Task', task),
    (u'Changed', stringc(result._result.get('changed', u'???'), C.COLOR_CHANGED)),
This output was focused on listing a summary of failures. Not sure about changed True/False being valuable there.
@@ -73,6 +73,7 @@ def run(self, tmp=None, task_vars=None):
    if r.get("failed", False):
        result["failed"] = True
        result["msg"] = "One or more checks failed"
        result["changed"] = r.get("changed", False)
Strange logic, why would we add the "changed"
key only when a check failed?
We can always add "changed"
, and compute it from the check results:
result["changed"] = any(r.get("changed", False) for r in check_results.values())
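A runnable sketch of that aggregation (the contents of check_results are made up for the demo):

```python
# Per-check results keyed by check name; "changed" may be absent.
check_results = {
    "docker_image_availability": {"changed": True, "failed": True},
    "package_version": {"failed": False},
}

result = {}
# Compute "changed" from all check results, independent of failure.
result["changed"] = any(r.get("changed", False) for r in check_results.values())
```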
from openshift_checks.mixins import NotContainerizedMixin


class DockerImageAvailability(NotContainerizedMixin, OpenShiftCheck):
I think this check could work for both containerized and non-containerized installs?
Agreed, and actually it would influence which images are required. For a containerized install you would want the openshift/origin or openshift3/ose image to be available in addition to pod/registry/router/etc.
failed_pulls.update(pulled_images["failed_images"])

if not openshift_image_tag:
    if len(failed_pulls) < len(self.rpm_docker_images()) + len(self.rpm_qualified_docker_images()):
Logic: if we only tried to pull len(self.rpm_docker_images()) images above, and len(self.rpm_qualified_docker_images()) > 0, then the condition in this if-clause is always true. By construction we have len(failed_pulls) <= len(self.rpm_docker_images()).
Aha, this code seems to be duplicated below...
# pull each required image that requires a qualified image
# tag (v0.0.0.0) rather than a standard release format tag (v0.0)
pulled_qualified_images = self.pull_images(self.add_image_tags(self.rpm_qualified_docker_images(), "v" + openshift_image_tag), tmp, task_vars)
if "failed" in pulled_qualified_images and "failed_images" in pulled_qualified_images:
Redundant condition: when "failed" is in pulled_qualified_images, by construction "failed_images" is also in pulled_qualified_images.
@staticmethod
def rpm_qualified_docker_images():
    return [
        "openshift3/ose-haproxy-router",
I think here we should be able to support both ose and origin depending on the deployment_type.
Yes. I'm also kind of thinking about which images we want to test for... pulling all of them is probably excessive, but it seems like we ought to check for more than just one. Maybe the pod and deployer images at least?
changed = False
failed_pulls = set()

pulled_images = self.pull_images(self.add_image_tags(self.rpm_docker_images(), openshift_release), tmp, task_vars)
Can we first check to see if the required images already exist in the local docker store? That way this could support disconnected installs (https://docs.openshift.com/container-platform/3.4/install_config/install/disconnected_install.html).
Compare: 83f88db to 314b464
@juanvallejo there are some linting problems to be addressed:
@@ -74,6 +74,8 @@ def run(self, tmp=None, task_vars=None):
        result["failed"] = True
        result["msg"] = "One or more checks failed"

        result["changed"] = any(r.get("changed", False) for r in check_results.values())
Wrong indentation level, this should be outside of the for loop.
tags = ["preflight"]

# attempt to pull required docker images and fail if
# any images are missing, or error occurs during pull
This documentation comment would be best in the form of a docstring. I think it could complement the documentation of this class:
class DockerImageAvailability(OpenShiftCheck):
    """Check that required Docker images are available.

    This check attempts to pull required docker images and fail if
    any image is missing or an error occurs during pull.
    """
For one, that's more idiomatic than having comments on top of a method definition. And while we don't do it today, there is tooling for extracting docstrings and generating documentation -- so, for example, we could generate a page documenting all existing checks in the future.
if "failed_images" in pulled_images:
    failed_pulls.update(pulled_images["failed_images"])

if not openshift_image_tag:
I don't understand why we check on openshift_image_tag?!
And then "if not openshift_image_tag", we always fail?! I don't follow that logic, maybe I need lunch 🍝
Ah, I wrote this check under the assumption that openshift_image_tag could be optionally set in the config (so there would almost certainly be cases where this is not set). If it was not set, then I would not be able to pull images with qualified tags, so I failed before getting to that part.
However, it looks like I can count on openshift_image_tag to be set by the time this module runs, so I will remove that block.
def qualified_docker_images(is_origin_deployment):
    if is_origin_deployment:
        return [
            "openshift/origin-haproxy-router"
In some places in origin and openshift-ansible we derive image names from a pattern like prefix-component:version. Example:
https://github.com/rhcarvalho/origin/blob/bd3e362de2a9f849bd2d7d3ff2a372746f35fb2a/pkg/cmd/util/variable/imagetemplate.go#L30
In Origin, the DefaultImagePrefix is set to openshift/origin, and for OCP it is openshift3/ose. I think it would be natural to follow that pattern, and it would be easier to maintain these lists, since we know the image names follow a pattern.
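A hedged sketch of deriving the lists from that pattern (the prefix values mirror the ones mentioned above; the function name and component list are illustrative, not from the PR):

```python
def image_names(deployment_type, components, version):
    # Derive image names from a prefix-component:version pattern
    # instead of maintaining hardcoded per-deployment lists.
    prefixes = {
        "origin": "openshift/origin",
        "openshift-enterprise": "openshift3/ose",
    }
    prefix = prefixes[deployment_type]
    return ["%s-%s:%s" % (prefix, component, version) for component in components]
```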
failed.add(args["name"])

if failed:
    return {"failed": True, "failed_images": failed}
The logic for setting changed could be here, instead of repeated in the callers of pull_images:
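One possible shape for that refactor, as a sketch with module_executor passed in as a stub (the real method signature and return keys may differ):

```python
def pull_images(module_executor, images, tmp, task_vars):
    # Report both the set of failed images and whether any pull
    # changed the system, so callers need not repeat that logic.
    failed, changed = set(), False
    for image in images:
        res = module_executor("docker_image", {"name": image}, tmp, task_vars)
        if res.get("failed"):
            failed.add(image)
        if res.get("changed"):
            changed = True
    result = {"changed": changed}
    if failed:
        result.update({"failed": True, "failed_images": failed})
    return result
```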
@rhcarvalho thanks for the feedback.
We were talking today about how to get skopeo on the systems so we can use it to check image availability. This is kind of tricky.
1. RPM? Well, there's no guarantee it's there, and we don't really want to install one, especially not knowing the state of repos. Plus we can't install RPMs on Atomic hosts.
2. Docker container? That seems to make sense, except it relies on being able to pull an image with skopeo... in order to check that you can pull the images you're supposed to be able to... and that's a bit circular. Plus, disconnected hosts.
It almost seems like we'd have to bundle skopeo itself into an ansible module.
…On Wed, Mar 1, 2017 at 9:56 PM, Jason DeTiberus ***@***.***> wrote:
***@***.**** commented on this pull request.
------------------------------
In roles/openshift_health_checker/openshift_checks/
docker_image_availability.py
<#3461 (comment)>
:
> +
+ changed = False
+ failed_pulls = set()
+
+ pulled_images = self.pull_images(self.add_image_tags(self.rpm_docker_images(), openshift_release), tmp, task_vars)
+ if "failed_images" in pulled_images:
+ failed_pulls.update(pulled_images["failed_images"])
+
+ if not openshift_image_tag:
+ if len(failed_pulls) < len(self.rpm_docker_images()) + len(self.rpm_qualified_docker_images()):
+ changed = True
+ return {"changed": changed, "failed": True, "msg": "Unable to retrieve fully qualified image tag. Failed to fetch images: %s" % (set(failed_pulls).update(self.rpm_qualified_docker_images()))}
+
+ # pull each required image that requires a qualified image
+ # tag (v0.0.0.0) rather than a standard release format tag (v0.0)
+ pulled_qualified_images = self.pull_images(self.add_image_tags(self.rpm_qualified_docker_images(), "v" + openshift_image_tag), tmp, task_vars)
@brenton <https://github.com/brenton> I'm not sure a system container is
sufficient in this case either, not all systems will have docker available.
I think skopeo, or hitting directly against the search api, gives us the
ability to actually query the list of tags for a given image, so it would
give us the ability to query for the "latest" version for containerized
installs without having to resort to repoquery hacks, pulling and querying
a 'latest' image, or hardcoding default values. It also avoids the need for
pulling images as part of a pre-requisite check.
The option I like the most so far is having skopeo in the openshift-ansible image. That image will have to be available in the environment if customers are going to run these checks. If the checks are initiated from the control host it seems reasonable that the admin would know how to ensure that exact same image is available to all hosts in the environment for purposes of running skopeo.
On Atomic hosts skopeo is already present, but I agree to the package being problematic.
I don't think skopeo is a requirement for containerized tests, since the tests do not require restarting services. We could do pre-req tests using just docker. That said, docker isn't guaranteed to be on a minimal RHEL host either. If we need to set a minimum requirement to run, then requiring
This part is tricky, since it's a go binary. We could vendor skopeo in repo to be able to copy it remotely and execute on it, or even copy it as part of the action plugin. If we did that, we would have to have a process for managing the vendored package, though. That said, skopeo is hitting well-defined api endpoints to do the queries, so it might make more sense to just re-implement the logic as opposed to vendoring the binary.
I do like the idea of having a container with all the dependencies for openshift-ansible. It doesn't quite solve the chicken/egg scenario of how that container is run initially on a minimal host, though.
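One way to avoid both the vendored binary and the chicken-and-egg container pull is the re-implementation route mentioned above: skopeo ultimately talks to the Docker Registry HTTP API v2, so a hedged sketch (assuming anonymous access over HTTPS; real registries such as docker.io require an extra Bearer-token round trip, and all names below are illustrative) could query the manifest endpoint directly:

```python
import urllib.error
import urllib.request


def manifest_url(registry, image, tag):
    """Build the Docker Registry API v2 manifest endpoint for an image tag."""
    return "https://%s/v2/%s/manifests/%s" % (registry, image, tag)


def tag_exists(registry, image, tag, timeout=10):
    """Return True if the registry serves the tag, False on a 404.

    Assumes anonymous access; registries requiring token auth would need
    a Bearer-token handshake before this request.
    """
    request = urllib.request.Request(manifest_url(registry, image, tag),
                                     method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return False
        raise  # auth failures and rate limits need their own handling
```

This keeps the check free of any on-host skopeo or docker dependency, at the cost of re-implementing (and maintaining) a small slice of the registry protocol.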
while len(regs) and len(required_images):
    current_reg = regs[0]
    inspect = self.inspect_images(self.docker_path, self.skopeo_image, current_reg, required_images, tmp, task_vars)
    required_images = inspect.get("failed_images")
This block is ~~exponential~~ quadratic time at worst; however, a successful local check reduces the number of images it has to process, or skips this block completely. Also, if the skopeo image cannot be pulled, doesn't exist ahead of time locally, or does not contain the skopeo binary for whatever reason, the check fails early before reaching this block. Thoughts?
exponential time at worst
Why exponential?! This is O(N * M) where N is the number of registries and M the number of images? Depends on how inspect_images work internally.
Still, I think it is clear that we want to check images offline first before doing anything else.
Why exponential?! This is O(N * M) where N is the number of registries and M the number of images? Depends on how inspect_images work internally.
:) thanks. Also, I am calling docker_image_facts in the inspect_local_images func before this check, to do a pass on any images that have already been pulled.
Added a check that uses this skopeo image for now, with a local check that precedes it, at least while we decide how to handle
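The loop quoted above — try each registry, keep only the images that still failed — can be distilled into a registry-agnostic sketch (names invented here, not the PR's exact code) that shows why the worst case is O(N × M) while the common case exits after the first registry:

```python
def missing_images(registries, required_images, check_registry):
    """Return the set of images that no registry could provide.

    check_registry(registry, images) must return the subset of `images`
    NOT found in `registry` (as the quoted inspect_images call does via
    its "failed_images" key). Worst case is O(N * M) for N registries
    and M images; in the common case the first registry satisfies
    everything and the loop exits immediately.
    """
    remaining = set(required_images)
    for registry in registries:
        if not remaining:
            break  # every image accounted for; skip remaining registries
        remaining = check_registry(registry, remaining)
    return remaining
```

Running the local (already-pulled) check first simply shrinks `required_images` before this loop ever starts, which is exactly the optimization discussed above.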
@rhcarvalho Thanks for the review, comments addressed. All linting issues seem to stem from the vendored module: https://travis-ci.org/openshift/openshift-ansible/jobs/213938461#L196
# This file is a copy of https://github.com/ansible/ansible/commit/20bf02f6b96356ab5fe68578a3af9462b4ca42a5
# It has been temporarily vendored here because of X, Y, Z.
# We can remove this file once openshift-ansible requires ansible >= 2.x.x.x.
# pylint: skip-file
This must be at the very beginning of the module, right after the shebang, otherwise it has no effect.
https://pylint.readthedocs.io/en/latest/faq.html#how-can-i-tell-pylint-to-never-check-a-given-module
I'd also suggest that our own commentary should be the top-most thing, and everything else should be as-is in the original.
Thanks, done!
docker_info = self.module_executor("docker_info", {}, task_vars).get("info", "")
result = self.module_executor("docker_info", {}, task_vars)

# do test on module exception - see if this line is useful
Did you have time to check it?
Did you have time to check it?
Yes, at least when a call to the docker client.info() results in a 404 response from the daemon, the following is returned by the docker_info module: http://pastebin.test.redhat.com/467479
Both rc=1 and failed=True are present key-values in the response:
...
'failed': True,
'module_stderr': u'Traceback (most recent call last):\n File "/tmp/ansible_mKSMn8/ansible_module_docker_info.py", line 24, in <module>\n main()\n File "/tmp/ansible_mKSMn8/ansible_module_docker_info.py", line 19, in main\n info=client.info(),\n File "/usr/lib/python2.7/site-packages/docker/api/daemon.py", line 33, in info\n return self._result(self._get(self._url("/inf")), True)\n File "/usr/lib/python2.7/site-packages/docker/client.py", line 178, in _result\n self._raise_for_status(response)\n File "/usr/lib/python2.7/site-packages/docker/client.py", line 173, in _raise_for_status\n raise errors.NotFound(e, response, explanation=explanation)\ndocker.errors.NotFound: 404 Client Error: Not Found ("{"message":"page not found"}")\n',
'module_stdout': u'',
'msg': u'MODULE FAILURE',
'rc': 1}}}
Although, in the case of a module error due to the docker daemon being down, the entire check would fail before getting to this point, due to docker_image failing to connect during the self.update_skopeo_image step.
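Based on the failure shape shown in the paste above (failed=True and rc=1 on a daemon error), a small helper — hypothetical, not part of the PR — could classify a docker_info module result before any image work starts:

```python
def docker_daemon_reachable(result):
    """Interpret an Ansible docker_info module result dict.

    Treats failed=True or a nonzero rc as "daemon unreachable",
    matching the observed 404/traceback response shape quoted above.
    A clean result (no failed flag, rc absent or 0) counts as reachable.
    """
    return not result.get("failed", False) and result.get("rc", 0) == 0
```

Checking this early would give a clearer error message than letting the later docker_image pull be the first thing to hit the dead daemon.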
)

# check how container name collisions are handled
# make sure container is removed by module
This was another TODO, did you check it?
Yes, I verified (with docker ps) that in an enterprise install in an AWS EC2 instance, the container is not still there after the check completes.
Might sound extreme, and it is after midnight here... but if we're writing robust software, shouldn't we be thinking about the daemon going down concurrently with the check run? Or some other conditions.
#!/usr/bin/python
# pylint: skip-file
#
# This file is a copy of https://github.com/ansible/ansible/commit/20bf02f6b96356ab5fe68578a3af9462b4ca42a5
#
# This file is a copy of https://github.com/ansible/ansible/commit/20bf02f6b96356ab5fe68578a3af9462b4ca42a5
# It has been temporarily vendored here due to issue https://github.com/ansible/ansible/issues/22323
# We can remove this file once openshift-ansible requires ansible > 2.2.1.0.
- Please add one or two blank comment lines here to keep it separate from the copyright notice. Maybe some ASCII art to make the distinction clear, use your own judgement:

  # TODO: remove ....
  #
  # -------------------------------------------------------------------
  #
  # Copyright 2016 ...

- I suggest marking the TODO with a "TODO" comment, those are easier to find by grepping the code base than generic comments. Hopefully that would increase the likelihood we'll remove this file in the future.
Thanks, done
@juanvallejo please rebase / resolve the fixup commit and I'll have a look at this tomorrow morning :-)
@juanvallejo I sent out a commit to make Travis GREEN ;)
@@ -0,0 +1,2036 @@
#!/usr/bin/python
# pylint: skip-file
# flake8: noqa
@juanvallejo FYI this line disable "flake8" (checks formatting and some common problems), while the other one disables "pylint". The two tools have some overlap, but also some unique features. We happen to use both for now.
return {
    "failed": True,
    "changed": changed,
    "msg": "Failed to update Skopeo image.",
Perhaps here it would be useful to include the original error message from trying to update the image and what the image registry/name was.
Would be useful for us when evaluating a problem report.
@rhcarvalho Thanks. Although the registry from which the image pull was attempted is not available as part of the error message (if there is one), I have added the image namespace/name to the output.
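A sketch of the enriched failure result (function and field names here are illustrative, not the PR's exact code): name the image that failed, and append the underlying error when one is available:

```python
def skopeo_update_failure(image, error=None):
    """Build a check-failure dict that names the image that failed to pull.

    `error` is the original error message from the pull attempt, when the
    module provided one; the registry itself is not available here.
    """
    msg = "Failed to update Skopeo image %s." % image
    if error:
        msg += " Error: %s" % error
    return {"failed": True, "changed": False, "msg": msg}
```

Including the image name makes a problem report actionable even when the pull failed silently with no error text.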
return {"changed": changed}

def required_images(self, task_vars):
    """ Return a list of all required tagged images """
Please review all of the docstrings. Should use three double quotes, no spaces between the text and the quotes, end with a full-stop (".").
In particular, this docstring doesn't tell us much more than the function name alone. We'd be fine nuking it.
return images

def local_images(self, required_images, task_vars):
For consistency, s/required_images/images
def is_available_skopeo_image(self, image, registry, task_vars):
    """Uses Skopeo to determine if required image exists in a given registry.

     Returns the set of images not found in the given registry."""
- This statement is wrong, this function returns a bool...
- There seems to be an extra space here, bad indentation.
- Keep consistency in the file -- put the """ in a line on its own.
https://www.python.org/dev/peps/pep-0257/#multi-line-docstrings
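Applying all three review points, the docstring would look like this (a PEP 257-style sketch; the shell-out below is an assumption about how the check invokes skopeo, not the PR's exact command):

```python
import subprocess


def is_available_skopeo_image(image, registry):
    """Use Skopeo to check whether an image exists in a given registry.

    Returns True when `skopeo inspect` exits successfully, False otherwise.
    """
    # Hypothetical invocation; assumes skopeo is on PATH.
    cmd = ["skopeo", "inspect", "docker://%s/%s" % (registry, image)]
    return subprocess.call(cmd, stdout=subprocess.DEVNULL,
                           stderr=subprocess.DEVNULL) == 0
```

The summary line now matches the bool return value, the quotes sit on their own line, and the body is indented consistently with the summary.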
This patch adds a check to ensure that required docker images are available in at least one of the registries supplied in an installation host. Images are available if they are either already present locally, or able to be inspected using Skopeo on one of the configured registries.
Due to the use of a restricted name in the core `docker_container` module's result, any standard output of a docker container captured in the module's response was stripped out by ansible. Because of this, we are forced to vendor a patched version of this module, until a new version of ansible is released containing the patched module. This file should be removed once we begin requiring a release of ansible containing the patched `docker_container` module. This patch was taken directly from upstream, with no further changes: 20bf02f6b96356ab5fe68578a3af9462b4ca42a5
aos-ci-test
LGTM
[merge] thanks @juanvallejo
a35bd91 - State: success - All Test Contexts: aos-ci-jenkins/OS_unit_tests - Logs: https://aos-ci.s3.amazonaws.com/openshift/openshift-ansible/jenkins-openshift-ansible-2-unit-tests-1200/a35bd91e8f4a3a08dfdd9bb2a68d4023cb389408.txt
Evaluated for openshift ansible merge up to a35bd91
continuous-integration/openshift-jenkins/merge SUCCESS (https://ci.openshift.redhat.com/jenkins/job/merge_pull_request_openshift_ansible/84/) (Base Commit: adbf60c)
a35bd91 - State: success - All Test Contexts: "aos-ci-jenkins/OS_3.4_NOT_containerized, aos-ci-jenkins/OS_3.4_NOT_containerized_e2e_tests" - Logs: https://aos-ci.s3.amazonaws.com/openshift/openshift-ansible/jenkins-openshift-ansible-3-test-matrix-CONTAINERIZED=_NOT_containerized,OSE_VER=3.4,PYTHON=System-CPython-2.7,TOPOLOGY=openshift-cluster,TargetBranch=master,nodes=openshift-ansible-slave-1203/a35bd91e8f4a3a08dfdd9bb2a68d4023cb389408.txt
a35bd91 - State: success - All Test Contexts: "aos-ci-jenkins/OS_3.4_containerized, aos-ci-jenkins/OS_3.4_containerized_e2e_tests" - Logs: https://aos-ci.s3.amazonaws.com/openshift/openshift-ansible/jenkins-openshift-ansible-3-test-matrix-CONTAINERIZED=_containerized,OSE_VER=3.4,PYTHON=System-CPython-2.7,TOPOLOGY=openshift-cluster-containerized,TargetBranch=master,nodes=openshift-ansible-slave-1203/a35bd91e8f4a3a08dfdd9bb2a68d4023cb389408.txt
a35bd91 - State: success - All Test Contexts: "aos-ci-jenkins/OS_3.5_NOT_containerized, aos-ci-jenkins/OS_3.5_NOT_containerized_e2e_tests" - Logs: https://aos-ci.s3.amazonaws.com/openshift/openshift-ansible/jenkins-openshift-ansible-3-test-matrix-CONTAINERIZED=_NOT_containerized,OSE_VER=3.5,PYTHON=System-CPython-2.7,TOPOLOGY=openshift-cluster,TargetBranch=master,nodes=openshift-ansible-slave-1203/a35bd91e8f4a3a08dfdd9bb2a68d4023cb389408.txt
a35bd91 - State: success - All Test Contexts: "aos-ci-jenkins/OS_3.5_containerized, aos-ci-jenkins/OS_3.5_containerized_e2e_tests" - Logs: https://aos-ci.s3.amazonaws.com/openshift/openshift-ansible/jenkins-openshift-ansible-3-test-matrix-CONTAINERIZED=_containerized,OSE_VER=3.5,PYTHON=System-CPython-2.7,TOPOLOGY=openshift-cluster-containerized,TargetBranch=master,nodes=openshift-ansible-slave-1203/a35bd91e8f4a3a08dfdd9bb2a68d4023cb389408.txt
Something to get a rough idea of how this check might be implemented.
cc @rhcarvalho @brenton