add docker_image_availability check #3461
Conversation
Compare: 8b7601f to 0cdadd0
def rpm_docker_images():
    return [
        "docker.io/openshift/origin",
        "registry.access.redhat.com/openshift3/ose-haproxy-router",
Not sure if this list is correct. Also, I realize that since we are using our registry rather than docker.io, maybe instead of using the ansible docker_image_facts module we could run oc get images ... on the host machine and verify that output instead.
As @sdodson said in #3461 (comment), we need to support the case when the repository prefix is something else. Perhaps we can get the "something else" precisely from the variable he pointed us at: oreg_url.
I think it'd be amazing if openshift had a function that would dump the entire list of images possible for a given format string. That way we leave ownership of which images are required for the core product within the openshift binary itself. But that would require changes to origin and we can't rely on that until it's widely available.
That does sound awesome.
Let's RFE that idea!
It wouldn't cover things like image streams or items from templates like registry-console, metrics, and logging. But it'd be enough to ensure the core platform worked.
Not sure if it's in scope for your work, but it'd be nice if non-standard image format strings were handled. For disconnected installs they're pretty much a sure thing.
Though I have some comments, this is in a good direction.
@staticmethod
def rpm_docker_images():
    return [
        "docker.io/openshift/origin",
I'd expect at least 2 different lists, or a sort of templating on the image names depending on OCP(OSE) / Origin.
if len(missing_images) > 0:
    return {"failed": True, "msg": "There are missing docker images or images that did not match the current OpenShift version (%s): %s:\n" % (openshift_release, missing_images)}

return {"Changed": False}
In modules, I think "changed" should be all lowercase. Here it doesn't really matter, we don't even need it.
For now we only look for a "failed" key:
openshift-ansible/roles/openshift_health_checker/action_plugins/openshift_health_check.py, lines 73 to 75 in f0a32af:

if r.get("failed", False):
    result["failed"] = True
    result["msg"] = "One or more checks failed"
matched_images.add(tag[0])

missing_images = set(self.rpm_docker_images()) - matched_images
if len(missing_images) > 0:
In Python, it is idiomatic to write:
if missing_images:
...
missing_images = set(self.rpm_docker_images()) - matched_images
if len(missing_images) > 0:
    return {"failed": True, "msg": "There are missing docker images or images that did not match the current OpenShift version (%s): %s:\n" % (openshift_release, missing_images)}
The ":\n" suffix seems extraneous. Instead of printing the list directly, maybe you can convert it to a string with something like ", ".join(missing_images).
The difference is:
There are missing docker images ...: ["foo", "bar"]
vs.
There are missing docker images ...: foo, bar
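As a hedged sketch of the suggestion (the function name format_missing is hypothetical, not from the PR):

```python
def format_missing(release, missing_images):
    # Join the set into a readable comma-separated string instead of
    # relying on the raw set/list repr that "%s" would produce.
    # sorted() keeps the message deterministic for sets.
    return (
        "There are missing docker images or images that did not match "
        "the current OpenShift version (%s): %s"
        % (release, ", ".join(sorted(missing_images)))
    )
```

With a raw "%s" on a set, the message would contain something like set(['foo', 'bar']); with join it reads as plain names.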
# if we find any docker images, attempt to do 1-to-1 match between
# each image description name key and items from rpm_docker_images
matched_images = set()
for i in docker_images['images']:
Please don't call it i. i is a good name for indices in a list... here image is probably better.
openshift_release = get_var(task_vars, "openshift_release")

args = {
    "name": self.rpm_docker_images(),
Shall we attempt to include version/tag in the images? Without an explicit :tag, docker_image_facts will look for :latest.
@rhcarvalho what do you think of instead having a method like docker_images_for() that takes a deployment_type like origin or openshift-enterprise and then a version? That method could contain the logic to determine if an image uses tags like v3.4 or v3.4.1.40.
Yes
}

docker_images = self.module_executor("docker_image_facts", args, tmp, task_vars)
Perhaps the first thing I'd do after calling the module is checking for a "failed" key with value True. If it failed, we can abort immediately.
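A minimal sketch of that early-abort pattern, with module_executor passed in as a plain callable stub (in the actual check it is a method on the class; the helper name check_images is illustrative):

```python
def check_images(module_executor, args, tmp, task_vars):
    # Run the module, then bail out immediately if it reports failure,
    # before touching any of its payload keys.
    docker_images = module_executor("docker_image_facts", args, tmp, task_vars)
    if docker_images.get("failed"):
        return {"failed": True,
                "msg": docker_images.get("msg", "docker_image_facts failed")}
    # ... continue inspecting docker_images["images"] here ...
    return {"changed": False}
```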
docker_images = self.module_executor("docker_image_facts", args, tmp, task_vars)

if len(docker_images['images']) == 0:
if not docker_images['images']:
@@ -0,0 +1,49 @@
# pylint: disable=missing-docstring
from openshift_checks import OpenShiftCheck, get_var
from openshift_checks.mixins import NotContainerized
Please rebase, this was renamed NotContainerizedMixin yesterday.
Compare: ff79ebc to a3b13bf
Thanks, went ahead and removed the registry hostname from image URLs. @sdodson @rhcarvalho Thanks for the feedback, review comments addressed.
Compare: ba0fa6f to 5f93de9
openshift_release = get_var(task_vars, "openshift_release")

# attempt to fetch fully qualified image tag
openshift_facts = self.module_executor("openshift_facts", {}, tmp, task_vars)
Because checks commonly depend on variables set by openshift_facts, it is guaranteed to have run -- it is a dependency of the openshift_health_checker role. There is no need to run it once again. All the variables you need are in task_vars already.
failed_pulls = set()

# attempt to pull each image and fail on
# the first one that cannot be fetched
This looks like function/method docstring. It would be best to put it on the method definition, not on its call site.
# attempt to pull each image and fail on
# the first one that cannot be fetched
pulled_images = self.pull_images(self.rpm_docker_images(), openshift_release, tmp, task_vars)
if "failed" in pulled_images and "failed_images" in pulled_images:
It is probably enough to check for a single condition? Looks redundant.
if failed_pulls:
    return {"failed": True, "msg": "Failed to pull required docker images: %s" % failed_pulls}

return {"changed": True}
We should probably mark changed: true in other code-paths as well (e.g., some images were pulled, but there was an error later on). And then, just like we verify if any check has failed: true, we need to verify if any check has changed: true and add that key/value to the result of the openshift_health_checker action plugin.
If this is not clear, I can help implementing.
@rhcarvalho thanks, this makes sense and I definitely agree; however, I think this effort should be part of a separate PR.
EDIT: at least for this PR, I have gone ahead and updated returned dictionaries to include changed: True if at least one image succeeded in being pulled.
This is the first check that intentionally makes changes; that's why I mentioned surfacing that in the runner.
Ah, makes sense. I added a commit that surfaces this so that a failure message (with at least one image successfully pulled) looks like:
Failure summary:
1. Host: <host>
Play: Run Docker image checks
Task: openshift_health_check
Changed: True
Message: One or more checks failed
Details: {u'docker_image_availability': {'changed': True,
'failed': True,
'msg': u"Failed to pull required docker images: set(['openshift3/ose-deployer:v3.4.2.7', 'openshift3/ose-haproxy-router:v3.4.2.7', 'openshift3/ose-pod:v3.4.2.7', 'openshift3/ose-docker-registry:v3.4.2.7'])"}}
@rhcarvalho if we are going to allow for checks that modify system state, would it also make sense to allow for skipping checks that may mutate state? It might also make sense to make sure we implement check_mode for those checks as well.
Yes, we discussed that in a call. I've created a Trello card to track being able to disable checks that match a tag / check mode. Perhaps it will be easier to tag checks known to be idempotent with an idempotent tag.
if "openshift" in facts["ansible_facts"]:
    if "common" in facts["ansible_facts"]["openshift"]:
        if "version" in facts["ansible_facts"]["openshift"]["common"]:
            return facts["ansible_facts"]["openshift"]["common"]["version"]
This could be written more simply as
openshift_image_tag = get_var("openshift", "common", "version", default=None)
(Doesn't even need this method)
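For illustration, a hypothetical re-implementation of such a lookup helper (the real get_var in openshift-ansible may differ in signature and behavior; this only shows the nested-dict-with-default idea that replaces the chained if statements):

```python
def get_var(task_vars, *keys, **kwargs):
    # Walk nested dicts one key at a time; on a missing key, return the
    # caller-supplied default instead of raising, if one was given.
    value = task_vars
    for key in keys:
        try:
            value = value[key]
        except (KeyError, TypeError):
            if "default" in kwargs:
                return kwargs["default"]
            raise
    return value

# Example facts layout (made up for the demo):
facts = {"openshift": {"common": {"version": "3.4.2.7"}}}
```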
    "openshift3/ose-pod",
]

def pull_images(self, images, tag, tmp, task_vars):
General tip: code is easier to test, understand and compose if you keep in mind the Single Responsibility Principle. Try to make functions / methods that do a single thing. Here, pull_images does at least two things: pulling images and appending tags to images.
My suggestion is to break this into two "functions": one that appends the tag, one that pulls whatever image it is given.
Appending a tag to a sequence of images:
images_with_tag = [image + ":" + tag for image in images_without_tag]
Pulling:
result = [
self.module_executor("docker_image", {"name": image}, tmp, task_vars)
for image in images_with_tag
]
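Putting the two fragments together, a self-contained sketch (module_executor is passed in as a stub here; in the actual check it is a method on the class):

```python
def add_image_tags(images, tag):
    # Single responsibility: append ":tag" to each image name.
    return [image + ":" + tag for image in images]

def pull_images(module_executor, images, tmp, task_vars):
    # Single responsibility: pull whatever images it is given,
    # collecting the per-image module results.
    return [
        module_executor("docker_image", {"name": image}, tmp, task_vars)
        for image in images
    ]
```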
}

res = self.module_executor("docker_image", args, tmp, task_vars)
if "failed" in res:
Subtle bug: "failed" might be in the result, but it could be failed: False...
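A tiny demonstration of the difference (the result dict is made up):

```python
# A module result where the key exists but the check did not fail.
res = {"failed": False, "changed": True}

# Buggy: treats failed=False as a failure, because it only tests
# for the presence of the key.
buggy = "failed" in res

# Correct: only true when the value itself is truthy.
correct = bool(res.get("failed"))
```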
Compare: a325632 to 398d69d
@rhcarvalho Thanks for the feedback, review comments addressed.
Compare: 5c871e9 to 24134cf
# pull each required image that requires a qualified image
# tag (v0.0.0.0) rather than a standard release format tag (v0.0)
pulled_qualified_images = self.pull_images(self.add_image_tags(self.rpm_qualified_docker_images(), "v" + openshift_image_tag), tmp, task_vars)
Would using skopeo make sense here to verify images without pulling them? Output of skopeo inspect docker://docker.io/openshift/origin:
{
"Name": "docker.io/openshift/origin",
"Tag": "latest",
"Digest": "sha256:7fe04a3bc9dc0508f846980888c948914663146c7cabc0dc0beb06169d75f21c",
"RepoTags": [
"b596f940a813a660be8982c37a2fdac29641b5f5",
"beta1",
"latest",
"testing",
"v0.2.1",
"v0.2.2",
"v0.3.1",
"v0.3.2",
"v0.3.3",
"v0.3.4",
"v0.3",
"v0.4.1",
"v0.4.2",
"v0.4.3",
"v0.4.4",
"v0.4",
"v0.5.1",
"v0.5.2",
"v0.5.3",
"v0.5.4",
"v0.5",
"v0.6.1",
"v0.6.2",
"v0.6",
"v1.0.0",
"v1.0.1",
"v1.0.2",
"v1.0.3",
"v1.0.4",
"v1.0.5",
"v1.0.6",
"v1.0.7",
"v1.0.8",
"v1.1.1.1",
"v1.1.1",
"v1.1.2",
"v1.1.3",
"v1.1.4",
"v1.1.5",
"v1.1.6",
"v1.1",
"v1.2.0-rc1",
"v1.2.0-rc2",
"v1.2.0",
"v1.2.1",
"v1.2.2",
"v1.3.0-alpha.0",
"v1.3.0-alpha.1",
"v1.3.0-alpha.2",
"v1.3.0-alpha.3",
"v1.3.0-rc1",
"v1.3.0",
"v1.3.1",
"v1.3.2",
"v1.3.3",
"v1.4.0-alpha.0",
"v1.4.0-alpha.1",
"v1.4.0-rc1",
"v1.4.0",
"v1.4.1",
"v1.5.0-alpha.0",
"v1.5.0-alpha.1",
"v1.5.0-alpha.2",
"v1.5.0-alpha.3"
],
"Created": "2017-02-28T01:26:25.908410645Z",
"DockerVersion": "1.12.6",
"Labels": {
"build-date": "20161214",
"io.k8s.description": "OpenShift Origin is a platform for developing, building, and deploying containerized applications. See https://docs.openshift.org/latest for more on running OpenShift Origin.",
"io.k8s.display-name": "OpenShift Origin Application Platform",
"license": "GPLv2",
"name": "CentOS Base Image",
"vendor": "CentOS"
},
"Architecture": "amd64",
"Os": "linux",
"Layers": [
"sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4",
"sha256:45a2e645736c4c66ef34acce2407ded21f7a9b231199d3b92d6c9776df264729",
"sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4",
"sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4",
"sha256:e24e870d9c831b7ee97c3f763d4118acea93781746b797665ceb4fb0fc327e25",
"sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4",
"sha256:6a8f42f1ab1e81a4365cce09ea66bc65e30f436b2754ab879242c1fe9ace11b8",
"sha256:190f15d3619652dbfdbf3a8d666f3b04bfeb27138050aa5fa993601af462c802",
"sha256:8a54e0c0e9d1ebd81c6915f8fdac5a14a6b4c0a0786a4721b879c032d3e4efaa",
"sha256:ebfb90719f35b6847ab381c48b887942c09621a5b4392e82d7f18601bf71922b",
"sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4",
"sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4",
"sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4",
"sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4",
"sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4"
]
}
We did talk about using skopeo, but I think we started with the docker* modules because they're readily available... Do we install skopeo as part of the install?
For preflight checks, we cannot assume skopeo is installed. Installing would be one more side effect for the check... something to discuss ;-)
We don't currently install skopeo as part of the install, but as we move towards system containers, it will become a hard dependency at some point. Even without skopeo, it may make sense to default to querying the registry API to search tags for images, rather than pulling images. It would also allow us to be a bit fuzzier on finding a matching container image.
One thing we asked ourselves yesterday is whether skopeo would do the same registry resolution as a docker pull would (trying to default to different registries based on config, e.g. docker.io, registry.access.redhat.com, ...). @detiber do you happen to know?
And while we could use skopeo once it is installed, for preflight checks (before installation) we'd need to rely on pre-installed software or install it as part of the check (one more side effect).
@detiber, can you explain what you mean by "allow us to be a bit fuzzier on finding a matching container image"?
@juanvallejo, thanks for digging in to skopeo. We need to find the right balance between idempotence and usefulness. Here's a question: is there already a skopeo container? I really don't want to require our checks to need any sort of subscription-manager work to register a system to pull RPMs. Until skopeo is somehow built in, it seems OK in my mind to pull a container in order to run it.
@rhcarvalho's comments are still valid. We'd have to be completely convinced we aren't going to have skopeo report that the correct images are available only to have a docker pull fail because a misconfigured registry is shadowing registry.access.redhat.com.
@juanvallejo, to test this further I'd be curious to know what happens if the same repository exists in two registries yet a specific tag only exists in a "shadowed" registry configured with docker. To test, you could set up a registry using the steps at https://docs.docker.com/registry/deploying/ that exposed an image called /openshift3/ose-pod:latest. Then configure that registry to be used by a docker pull with ADD_REGISTRY in /etc/sysconfig/docker. Put it before the entry for registry.access.redhat.com. Then make sure docker pull openshift3/ose-pod:v3.4 still works. It should first check the misconfigured registry, see that the image is missing, and proceed to check registry.access.redhat.com.
@brenton I'm not sure a system container is sufficient in this case either, not all systems will have docker available.
I think skopeo, or hitting directly against the search api, gives us the ability to actually query the list of tags for a given image, so it would give us the ability to query for the "latest" version for containerized installs without having to resort to repoquery hacks, pulling and querying a 'latest' image, or hardcoding default values. It also avoids the need for pulling images as part of a pre-requisite check.
@@ -79,6 +79,7 @@ def _format_failure(failure):
    (u'Host', host),
    (u'Play', play),
    (u'Task', task),
    (u'Changed', stringc(result._result.get('changed', u'???'), C.COLOR_CHANGED)),
This output was focused on listing a summary of failures. Not sure about changed True/False being valuable there.
@@ -73,6 +73,7 @@ def run(self, tmp=None, task_vars=None):
    if r.get("failed", False):
        result["failed"] = True
        result["msg"] = "One or more checks failed"
        result["changed"] = r.get("changed", False)
Strange logic, why would we add the "changed"
key only when a check failed?
We can always add "changed"
, and compute it from the check results:
result["changed"] = any(r.get("changed", False) for r in check_results.values())
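A runnable sketch of that aggregation (the contents of check_results are made up for the demo):

```python
# Per-check results keyed by check name; "changed" may be absent.
check_results = {
    "docker_image_availability": {"changed": True, "failed": True},
    "package_version": {"failed": False},
}

result = {}
# Compute "changed" from all check results, independent of failure.
result["changed"] = any(r.get("changed", False) for r in check_results.values())
```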
from openshift_checks.mixins import NotContainerizedMixin


class DockerImageAvailability(NotContainerizedMixin, OpenShiftCheck):
I think this check could work for both containerized and non-containerized installs?
Agreed, and actually it would influence which images are required. For a containerized install you would want the openshift/origin or openshift3/ose image to be available in addition to pod/registry/router/etc.
failed_pulls.update(pulled_images["failed_images"])

if not openshift_image_tag:
    if len(failed_pulls) < len(self.rpm_docker_images()) + len(self.rpm_qualified_docker_images()):
Logic: if we only tried to pull len(self.rpm_docker_images()) images above, and len(self.rpm_qualified_docker_images()) > 0, then the condition in this if-clause is always true. By construction we have len(failed_pulls) <= len(self.rpm_docker_images()).
Aha, this code seems to be duplicated below...
# pull each required image that requires a qualified image
# tag (v0.0.0.0) rather than a standard release format tag (v0.0)
pulled_qualified_images = self.pull_images(self.add_image_tags(self.rpm_qualified_docker_images(), "v" + openshift_image_tag), tmp, task_vars)
if "failed" in pulled_qualified_images and "failed_images" in pulled_qualified_images:
Redundant condition: when "failed" is in pulled_qualified_images, by construction "failed_images" is also in pulled_qualified_images.
@staticmethod
def rpm_qualified_docker_images():
    return [
        "openshift3/ose-haproxy-router",
I think here we should be able to support both ose and origin depending on the deployment_type.
Yes. I'm also kind of thinking about which images we want to test for... pulling all of them is probably excessive, but it seems like we ought to check for more than just one. Maybe the pod and deployer images at least?
changed = False
failed_pulls = set()

pulled_images = self.pull_images(self.add_image_tags(self.rpm_docker_images(), openshift_release), tmp, task_vars)
Can we first check to see if the required images already exist in the local docker store? That way this could support disconnected installs (https://docs.openshift.com/container-platform/3.4/install_config/install/disconnected_install.html).
Compare: 83f88db to 314b464
@juanvallejo there are some linting problems to be addressed:
@@ -74,6 +74,8 @@ def run(self, tmp=None, task_vars=None):
        result["failed"] = True
        result["msg"] = "One or more checks failed"

        result["changed"] = any(r.get("changed", False) for r in check_results.values())
Wrong indentation level, this should be outside of the for loop.
tags = ["preflight"]

# attempt to pull required docker images and fail if
# any images are missing, or error occurs during pull
This documentation comment would be best in the form of a docstring. I think it could complement the documentation of this class:
class DockerImageAvailability(OpenShiftCheck):
    """Check that required Docker images are available.

    This check attempts to pull required docker images and fail if
    any image is missing or an error occurs during pull.
    """
For one, that's more idiomatic than having comments on top of a method definition. And while we don't do it today, there is tooling for extracting docstrings and generating documentation -- so, for example, we could generate a page documenting all existing checks in the future.
if "failed_images" in pulled_images:
    failed_pulls.update(pulled_images["failed_images"])

if not openshift_image_tag:
I don't understand why we check on openshift_image_tag?!
And then "if not openshift_image_tag", we always fail?! I don't follow that logic, maybe I need lunch 🍝
Ah, I wrote this check under the assumption that openshift_image_tag could be optionally set in the config (so there would almost certainly be cases where this is not set). If it was not set, then I would not be able to pull images with qualified tags, so I failed before getting to that part.
However, it looks like I can count on openshift_image_tag to be set by the time this module runs, so I will remove that block.
def qualified_docker_images(is_origin_deployment):
    if is_origin_deployment:
        return [
            "openshift/origin-haproxy-router"
In some places in origin and openshift-ansible we derive image names from a pattern like prefix-component:version. Example:
https://github.com/rhcarvalho/origin/blob/bd3e362de2a9f849bd2d7d3ff2a372746f35fb2a/pkg/cmd/util/variable/imagetemplate.go#L30
In Origin, the DefaultImagePrefix is set to openshift/origin, and for OCP it is openshift3/ose. I think it would be natural to follow that pattern, and it would be easier to maintain these lists, since we know the image names follow a pattern.
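A hedged sketch of deriving the lists from that pattern (the prefix values mirror the ones mentioned above; the function name and component list are illustrative, not from the PR):

```python
def image_names(deployment_type, components, version):
    # Derive image names from a prefix-component:version pattern
    # instead of maintaining hardcoded per-deployment lists.
    prefixes = {
        "origin": "openshift/origin",
        "openshift-enterprise": "openshift3/ose",
    }
    prefix = prefixes[deployment_type]
    return ["%s-%s:%s" % (prefix, component, version) for component in components]
```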
failed.add(args["name"])

if failed:
    return {"failed": True, "failed_images": failed}
The logic for setting changed could be here, instead of repeated in the callers of pull_images:
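One possible shape for that refactor, as a sketch with module_executor passed in as a stub (the real method signature and return keys may differ):

```python
def pull_images(module_executor, images, tmp, task_vars):
    # Report both the set of failed images and whether any pull
    # changed the system, so callers need not repeat that logic.
    failed, changed = set(), False
    for image in images:
        res = module_executor("docker_image", {"name": image}, tmp, task_vars)
        if res.get("failed"):
            failed.add(image)
        if res.get("changed"):
            changed = True
    result = {"changed": changed}
    if failed:
        result.update({"failed": True, "failed_images": failed})
    return result
```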
@rhcarvalho thanks for the feedback.
We were talking today about how to get skopeo on the systems so we can use it to check image availability. This is kind of tricky.
1. RPM? Well, there's no guarantee it's there, and we don't really want to install one, especially not knowing the state of repos. Plus we can't install RPMs on Atomic hosts.
2. Docker container? That seems to make sense, except it relies on being able to pull an image with skopeo... in order to check that you can pull the images you're supposed to be able to... and that's a bit circular. Plus, disconnected hosts.
It almost seems like we'd have to bundle skopeo itself into an ansible module.
…On Wed, Mar 1, 2017 at 9:56 PM, Jason DeTiberus ***@***.***> wrote:
***@***.**** commented on this pull request.
------------------------------
In roles/openshift_health_checker/openshift_checks/
docker_image_availability.py
<#3461 (comment)>
:
> +
+ changed = False
+ failed_pulls = set()
+
+ pulled_images = self.pull_images(self.add_image_tags(self.rpm_docker_images(), openshift_release), tmp, task_vars)
+ if "failed_images" in pulled_images:
+ failed_pulls.update(pulled_images["failed_images"])
+
+ if not openshift_image_tag:
+ if len(failed_pulls) < len(self.rpm_docker_images()) + len(self.rpm_qualified_docker_images()):
+ changed = True
+ return {"changed": changed, "failed": True, "msg": "Unable to retrieve fully qualified image tag. Failed to fetch images: %s" % (set(failed_pulls).update(self.rpm_qualified_docker_images()))}
+
+ # pull each required image that requires a qualified image
+ # tag (v0.0.0.0) rather than a standard release format tag (v0.0)
+ pulled_qualified_images = self.pull_images(self.add_image_tags(self.rpm_qualified_docker_images(), "v" + openshift_image_tag), tmp, task_vars)
@brenton <https://github.com/brenton> I'm not sure a system container is
sufficient in this case either, not all systems will have docker available.
I think skopeo, or hitting directly against the search api, gives us the
ability to actually query the list of tags for a given image, so it would
give us the ability to query for the "latest" version for containerized
installs without having to resort to repoquery hacks, pulling and querying
a 'latest' image, or hardcoding default values. It also avoids the need for
pulling images as part of a pre-requisite check.
The option I like the most so far is having skopeo in the openshift-ansible image. That image will have to be available in the environment if customers are going to run these checks. If the checks are initiated from the control host it seems reasonable that the admin would know how to ensure that exact same image is available to all hosts in the environment for purposes of running skopeo.
On Atomic hosts skopeo is already present, but I agree to the package being problematic.
I don't think skopeo is a requirement for containerized tests, since the tests do not require restarting services. We could do pre-req tests using just docker. That said, docker isn't guaranteed to be on a minimal RHEL host either. If we need to set a minimum requirement to run, then requiring
This part is tricky, since it's a go binary. We could vendor skopeo in repo to be able to copy it remotely and execute on it, or even copy it as part of the action plugin. If we did that, we would have to have a process for managing the vendored package, though. That said, skopeo is hitting well-defined api endpoints to do the queries, so it might make more sense to just re-implement the logic as opposed to vendoring the binary.
I do like the idea of having a container with all the dependencies for openshift-ansible. It doesn't quite solve the chicken/egg scenario of how that container is run initially on a minimal host, though.
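One way to avoid both the vendored binary and the chicken-and-egg container pull is the re-implementation route mentioned above: skopeo ultimately talks to the Docker Registry HTTP API v2, so a hedged sketch (assuming anonymous access over HTTPS; real registries such as docker.io require an extra Bearer-token round trip, and all names below are illustrative) could query the manifest endpoint directly:

```python
import urllib.error
import urllib.request


def manifest_url(registry, image, tag):
    """Build the Docker Registry API v2 manifest endpoint for an image tag."""
    return "https://%s/v2/%s/manifests/%s" % (registry, image, tag)


def tag_exists(registry, image, tag, timeout=10):
    """Return True if the registry serves the tag, False on a 404.

    Assumes anonymous access; registries requiring token auth would need
    a Bearer-token handshake before this request.
    """
    request = urllib.request.Request(manifest_url(registry, image, tag),
                                     method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return False
        raise  # auth failures and rate limits need their own handling
```

This keeps the check free of any on-host skopeo or docker dependency, at the cost of re-implementing (and maintaining) a small slice of the registry protocol.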
while len(regs) and len(required_images):
    current_reg = regs[0]
    inspect = self.inspect_images(self.docker_path, self.skopeo_image, current_reg, required_images, tmp, task_vars)
    required_images = inspect.get("failed_images")
This block is ~~exponential~~ quadratic time at worst; however, a successful local check reduces the number of images it has to process, or skips this block completely. Also, if the skopeo image cannot be pulled, doesn't exist ahead of time locally, or does not contain the skopeo binary for whatever reason, the check fails early before reaching this block. Thoughts?
exponential time at worst
Why exponential?! This is O(N * M) where N is the number of registries and M the number of images? Depends on how inspect_images work internally.
Still, I think it is clear that we want to check images offline first before doing anything else.
Why exponential?! This is O(N * M) where N is the number of registries and M the number of images? Depends on how inspect_images work internally.
:) thanks. Also, I am calling docker_image_facts in the inspect_local_images func before this check, to do a pass on any images that have already been pulled.
Added a check that uses this skopeo image for now, with a local check that precedes it, at least while we decide how to handle
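The loop quoted above — try each registry, keep only the images that still failed — can be distilled into a registry-agnostic sketch (names invented here, not the PR's exact code) that shows why the worst case is O(N × M) while the common case exits after the first registry:

```python
def missing_images(registries, required_images, check_registry):
    """Return the set of images that no registry could provide.

    check_registry(registry, images) must return the subset of `images`
    NOT found in `registry` (as the quoted inspect_images call does via
    its "failed_images" key). Worst case is O(N * M) for N registries
    and M images; in the common case the first registry satisfies
    everything and the loop exits immediately.
    """
    remaining = set(required_images)
    for registry in registries:
        if not remaining:
            break  # every image accounted for; skip remaining registries
        remaining = check_registry(registry, remaining)
    return remaining
```

Running the local (already-pulled) check first simply shrinks `required_images` before this loop ever starts, which is exactly the optimization discussed above.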
@rhcarvalho Thanks for the review, comments addressed. All linting issues seem to stem from the vendored module: https://travis-ci.org/openshift/openshift-ansible/jobs/213938461#L196
# This file is a copy of https://github.com/ansible/ansible/commit/20bf02f6b96356ab5fe68578a3af9462b4ca42a5
# It has been temporarily vendored here because of X, Y, Z.
# We can remove this file once openshift-ansible requires ansible >= 2.x.x.x.
# pylint: skip-file
This must be at the very beginning of the module, right after the shebang, otherwise it has no effect.
https://pylint.readthedocs.io/en/latest/faq.html#how-can-i-tell-pylint-to-never-check-a-given-module
I'd also suggest that our own commentary should be the top-most thing, and everything else should be as-is in the original.
Thanks, done!
docker_info = self.module_executor("docker_info", {}, task_vars).get("info", "")
result = self.module_executor("docker_info", {}, task_vars)

# do test on module exception - see if this line is useful
Did you have time to check it?
Did you have time to check it?
Yes, at least when a call to the docker client.info() results in a 404 response from the daemon, the following is returned by the docker_info module: http://pastebin.test.redhat.com/467479
Both rc=1 and failed=True are present key-values in the response:
...
'failed': True,
'module_stderr': u'Traceback (most recent call last):\n File "/tmp/ansible_mKSMn8/ansible_module_docker_info.py", line 24, in <module>\n main()\n File "/tmp/ansible_mKSMn8/ansible_module_docker_info.py", line 19, in main\n info=client.info(),\n File "/usr/lib/python2.7/site-packages/docker/api/daemon.py", line 33, in info\n return self._result(self._get(self._url("/inf")), True)\n File "/usr/lib/python2.7/site-packages/docker/client.py", line 178, in _result\n self._raise_for_status(response)\n File "/usr/lib/python2.7/site-packages/docker/client.py", line 173, in _raise_for_status\n raise errors.NotFound(e, response, explanation=explanation)\ndocker.errors.NotFound: 404 Client Error: Not Found ("{"message":"page not found"}")\n',
'module_stdout': u'',
'msg': u'MODULE FAILURE',
'rc': 1}}}
Although, in the case of a module error due to the docker daemon being down, the entire check would fail before getting to this point, due to docker_image failing to connect during the self.update_skopeo_image step.
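Based on the failure shape shown in the paste above (failed=True and rc=1 on a daemon error), a small helper — hypothetical, not part of the PR — could classify a docker_info module result before any image work starts:

```python
def docker_daemon_reachable(result):
    """Interpret an Ansible docker_info module result dict.

    Treats failed=True or a nonzero rc as "daemon unreachable",
    matching the observed 404/traceback response shape quoted above.
    A clean result (no failed flag, rc absent or 0) counts as reachable.
    """
    return not result.get("failed", False) and result.get("rc", 0) == 0
```

Checking this early would give a clearer error message than letting the later docker_image pull be the first thing to hit the dead daemon.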
)

# check how container name collisions are handled
# make sure container is removed by module
This was another TODO, did you check it?
Yes, I verified (with docker ps) that in an enterprise install in an AWS EC2 instance, the container is not still there after the check completes.
Might sound extreme, and it is after midnight here... but if we're writing robust software, shouldn't we be thinking about the daemon going down concurrently with the check run? Or some other conditions.
#!/usr/bin/python
# pylint: skip-file
#
# This file is a copy of https://github.com/ansible/ansible/commit/20bf02f6b96356ab5fe68578a3af9462b4ca42a5
#
# This file is a copy of https://github.com/ansible/ansible/commit/20bf02f6b96356ab5fe68578a3af9462b4ca42a5
# It has been temporarily vendored here due to issue https://github.com/ansible/ansible/issues/22323
# We can remove this file once openshift-ansible requires ansible > 2.2.1.0.
- Please add one or two blank comment lines here to keep it separate from the copyright notice. Maybe some ASCII art to make the distinction clear, use your own judgement:

  # TODO: remove ....
  #
  # -------------------------------------------------------------------
  #
  # Copyright 2016 ...

- I suggest marking the TODO with a "TODO" comment, those are easier to find by grepping the code base than generic comments. Hopefully that would increase the likelihood we'll remove this file in the future.
Thanks, done
@juanvallejo please rebase / resolve the fixup commit and I'll have a look at this tomorrow morning :-)
@juanvallejo I sent out a commit to make Travis GREEN ;)
@@ -0,0 +1,2036 @@
#!/usr/bin/python
# pylint: skip-file
# flake8: noqa
@juanvallejo FYI this line disable "flake8" (checks formatting and some common problems), while the other one disables "pylint". The two tools have some overlap, but also some unique features. We happen to use both for now.
return {
    "failed": True,
    "changed": changed,
    "msg": "Failed to update Skopeo image.",
Perhaps here it would be useful to include the original error message from trying to update the image and what the image registry/name was.
Would be useful for us when evaluating a problem report.
@rhcarvalho Thanks. Although the registry from which the image pull was attempted is not available as part of the error message (if there is one), I have added the image namespace/name to the output.
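A sketch of the enriched failure result (function and field names here are illustrative, not the PR's exact code): name the image that failed, and append the underlying error when one is available:

```python
def skopeo_update_failure(image, error=None):
    """Build a check-failure dict that names the image that failed to pull.

    `error` is the original error message from the pull attempt, when the
    module provided one; the registry itself is not available here.
    """
    msg = "Failed to update Skopeo image %s." % image
    if error:
        msg += " Error: %s" % error
    return {"failed": True, "changed": False, "msg": msg}
```

Including the image name makes a problem report actionable even when the pull failed silently with no error text.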
return {"changed": changed}

def required_images(self, task_vars):
    """ Return a list of all required tagged images """
Please review all of the docstrings. Should use three double quotes, no spaces between the text and the quotes, end with a full-stop (".").
In particular, this docstring doesn't tell us much more than the function name alone. We'd be fine nuking it.
return images

def local_images(self, required_images, task_vars):
For consistency, s/required_images/images
def is_available_skopeo_image(self, image, registry, task_vars):
    """Uses Skopeo to determine if required image exists in a given registry.

     Returns the set of images not found in the given registry."""
- This statement is wrong, this function returns a bool...
- There seems to be an extra space here, bad indentation.
- Keep consistency in the file -- put the """ in a line on its own.
https://www.python.org/dev/peps/pep-0257/#multi-line-docstrings
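Applying all three review points, the docstring would look like this (a PEP 257-style sketch; the shell-out below is an assumption about how the check invokes skopeo, not the PR's exact command):

```python
import subprocess


def is_available_skopeo_image(image, registry):
    """Use Skopeo to check whether an image exists in a given registry.

    Returns True when `skopeo inspect` exits successfully, False otherwise.
    """
    # Hypothetical invocation; assumes skopeo is on PATH.
    cmd = ["skopeo", "inspect", "docker://%s/%s" % (registry, image)]
    return subprocess.call(cmd, stdout=subprocess.DEVNULL,
                           stderr=subprocess.DEVNULL) == 0
```

The summary line now matches the bool return value, the quotes sit on their own line, and the body is indented consistently with the summary.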
This patch adds a check to ensure that required docker images are available in at least one of the registries supplied in an installation host. Images are available if they are either already present locally, or able to be inspected using Skopeo on one of the configured registries.
Due to the use of a restricted name in the core `docker_container` module's result, any standard output of a docker container captured in the module's response was stripped out by ansible. Because of this, we are forced to vendor a patched version of this module, until a new version of ansible is released containing the patched module. This file should be removed once we begin requiring a release of ansible containing the patched `docker_container` module. This patch was taken directly from upstream, with no further changes: 20bf02f6b96356ab5fe68578a3af9462b4ca42a5
aos-ci-test
LGTM
[merge] thanks @juanvallejo
a35bd91 - State: success - All Test Contexts: aos-ci-jenkins/OS_unit_tests - Logs: https://aos-ci.s3.amazonaws.com/openshift/openshift-ansible/jenkins-openshift-ansible-2-unit-tests-1200/a35bd91e8f4a3a08dfdd9bb2a68d4023cb389408.txt
Evaluated for openshift ansible merge up to a35bd91
continuous-integration/openshift-jenkins/merge SUCCESS (https://ci.openshift.redhat.com/jenkins/job/merge_pull_request_openshift_ansible/84/) (Base Commit: adbf60c)
a35bd91 - State: success - All Test Contexts: "aos-ci-jenkins/OS_3.4_NOT_containerized, aos-ci-jenkins/OS_3.4_NOT_containerized_e2e_tests" - Logs: https://aos-ci.s3.amazonaws.com/openshift/openshift-ansible/jenkins-openshift-ansible-3-test-matrix-CONTAINERIZED=_NOT_containerized,OSE_VER=3.4,PYTHON=System-CPython-2.7,TOPOLOGY=openshift-cluster,TargetBranch=master,nodes=openshift-ansible-slave-1203/a35bd91e8f4a3a08dfdd9bb2a68d4023cb389408.txt
a35bd91 - State: success - All Test Contexts: "aos-ci-jenkins/OS_3.4_containerized, aos-ci-jenkins/OS_3.4_containerized_e2e_tests" - Logs: https://aos-ci.s3.amazonaws.com/openshift/openshift-ansible/jenkins-openshift-ansible-3-test-matrix-CONTAINERIZED=_containerized,OSE_VER=3.4,PYTHON=System-CPython-2.7,TOPOLOGY=openshift-cluster-containerized,TargetBranch=master,nodes=openshift-ansible-slave-1203/a35bd91e8f4a3a08dfdd9bb2a68d4023cb389408.txt
a35bd91 - State: success - All Test Contexts: "aos-ci-jenkins/OS_3.5_NOT_containerized, aos-ci-jenkins/OS_3.5_NOT_containerized_e2e_tests" - Logs: https://aos-ci.s3.amazonaws.com/openshift/openshift-ansible/jenkins-openshift-ansible-3-test-matrix-CONTAINERIZED=_NOT_containerized,OSE_VER=3.5,PYTHON=System-CPython-2.7,TOPOLOGY=openshift-cluster,TargetBranch=master,nodes=openshift-ansible-slave-1203/a35bd91e8f4a3a08dfdd9bb2a68d4023cb389408.txt
a35bd91 - State: success - All Test Contexts: "aos-ci-jenkins/OS_3.5_containerized, aos-ci-jenkins/OS_3.5_containerized_e2e_tests" - Logs: https://aos-ci.s3.amazonaws.com/openshift/openshift-ansible/jenkins-openshift-ansible-3-test-matrix-CONTAINERIZED=_containerized,OSE_VER=3.5,PYTHON=System-CPython-2.7,TOPOLOGY=openshift-cluster-containerized,TargetBranch=master,nodes=openshift-ansible-slave-1203/a35bd91e8f4a3a08dfdd9bb2a68d4023cb389408.txt
Something to get a rough idea of how this check might be implemented.
cc @rhcarvalho @brenton