
add docker_image_availability check #3461

Merged

Conversation

juanvallejo
Contributor

Something to get a rough idea of how this check might be implemented.

cc @rhcarvalho @brenton

@juanvallejo juanvallejo force-pushed the jvallejo/add-docker-image-check branch from 8b7601f to 0cdadd0 Compare February 23, 2017 00:19
def rpm_docker_images():
return [
"docker.io/openshift/origin",
"registry.access.redhat.com/openshift3/ose-haproxy-router",
Contributor Author

Not sure if this list is correct. Also, since we are using our own registry rather than docker.io, maybe instead of using the ansible docker_image_facts module we could run oc get images ... on the host machine and verify that output instead.

Contributor

As @sdodson said in #3461 (comment), we need to support the case when the repository prefix is something else.

Perhaps we can get the "something else" precisely from the variable he pointed us at: oreg_url.

Member

@sdodson sdodson Feb 23, 2017

I think it'd be amazing if openshift had a function that would dump the entire list of images possible for a given format string. That way we leave ownership of which images are required for the core product within the openshift binary itself. But that would require changes to origin, and we can't rely on that until it's widely available.

Contributor

That does sound awesome.

Contributor

Let's RFE that idea!

Member

It wouldn't cover things like image streams or items from templates like registry-console, metrics, and logging. But it'd be enough to ensure the core platform worked.

@sdodson
Member

sdodson commented Feb 23, 2017

Not sure if it's in scope for your work, but it'd be nice if non-standard image format strings were handled. For disconnected installs they're pretty much a sure thing.

#oreg_url=example.com/openshift3/ose-${component}:${version}

Contributor

@rhcarvalho rhcarvalho left a comment

Though I have some comments, this is in a good direction.

def rpm_docker_images():
return [
"docker.io/openshift/origin",
"registry.access.redhat.com/openshift3/ose-haproxy-router",
Contributor

As @sdodson said in #3461 (comment), we need to support the case when the repository prefix is something else.

Perhaps we can get the "something else" precisely from the variable he pointed us at: oreg_url.

@staticmethod
def rpm_docker_images():
return [
"docker.io/openshift/origin",
Contributor

I'd expect at least 2 different lists, or a sort of templating on the image names depending on OCP(OSE) / Origin.

if len(missing_images) > 0:
return {"failed": True, "msg": "There are missing docker images or images that did not match the current OpenShift version (%s): %s:\n" % (openshift_release, missing_images)}

return {"Changed": False}
Contributor

In modules, I think "changed" should be all lowercase. Here it doesn't really matter, we don't even need it.

For now we only look for a "failed" key:

if r.get("failed", False):
result["failed"] = True
result["msg"] = "One or more checks failed"

matched_images.add(tag[0])

missing_images = set(self.rpm_docker_images()) - matched_images
if len(missing_images) > 0:
Contributor

In Python, it is idiomatic to write:

if missing_images:
    ...


missing_images = set(self.rpm_docker_images()) - matched_images
if len(missing_images) > 0:
return {"failed": True, "msg": "There are missing docker images or images that did not match the current OpenShift version (%s): %s:\n" % (openshift_release, missing_images)}
Contributor

The ":\n" suffix seems extraneous.

Instead of printing the list directly, maybe you can convert it to a string with something like ", ".join(missing_images).

The difference is:

There are missing docker images ...: ["foo", "bar"]

versus

There are missing docker images ...: foo, bar

# if we find any docker images, attempt to do 1-to-1 match between
# each image description name key and items from rpm_docker_images
matched_images = set()
for i in docker_images['images']:
Contributor

Please don't call it i. i is a good name for indices in a list... here image is probably better.

openshift_release = get_var(task_vars, "openshift_release")

args = {
"name": self.rpm_docker_images(),
Contributor

Shall we attempt to include version/tag in the images?
Without an explicit :tag, docker_image_facts will look for :latest.
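
For illustration, a rough sketch of passing tagged names to the module, reusing the module_executor call and image list from the code under review (whether the tag should come from openshift_release, and its exact format, are assumptions):

tag = get_var(task_vars, "openshift_release")  # e.g. "v3.4" -- source and format are assumptions
tagged_images = ["{}:{}".format(image, tag) for image in self.rpm_docker_images()]
docker_images = self.module_executor("docker_image_facts", {"name": tagged_images}, tmp, task_vars)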

Contributor

@rhcarvalho what do you think of instead having a method like docker_images_for() that takes a deployment_type like origin or openshift-enterprise and then a version? That method could contain the logic to determine whether an image uses tags like v3.4 or v3.4.1.40.
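
Something like the following sketch, where the image names and tag rules are purely illustrative:

@staticmethod
def docker_images_for(deployment_type, version):
    # Hypothetical helper: image names and tag handling below are illustrative only.
    if deployment_type == "origin":
        images = ["openshift/origin", "openshift/origin-haproxy-router"]
    else:  # "openshift-enterprise"
        images = ["openshift3/ose", "openshift3/ose-haproxy-router"]
    # e.g. enterprise images may require a fully qualified tag such as v3.4.1.40,
    # while origin images use a release tag such as v3.4 (assumption for illustration).
    return ["{}:{}".format(image, version) for image in images]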

Contributor

Yes

}

docker_images = self.module_executor("docker_image_facts", args, tmp, task_vars)

Contributor

Perhaps the first thing I'd do after calling the module is checking for a "failed" key with value True. If it failed, we can abort immediately.
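
A minimal sketch of that early exit, reusing the call from the code under review:

docker_images = self.module_executor("docker_image_facts", args, tmp, task_vars)
if docker_images.get("failed", False):
    # abort immediately, surfacing the module's own message if it provided one
    return {"failed": True, "msg": docker_images.get("msg", "docker_image_facts failed")}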


docker_images = self.module_executor("docker_image_facts", args, tmp, task_vars)

if len(docker_images['images']) == 0:
Contributor

if not docker_images['images']:

@@ -0,0 +1,49 @@
# pylint: disable=missing-docstring
from openshift_checks import OpenShiftCheck, get_var
from openshift_checks.mixins import NotContainerized
Contributor

Please rebase, this was renamed NotContainerizedMixin yesterday.

@juanvallejo juanvallejo force-pushed the jvallejo/add-docker-image-check branch 4 times, most recently from ff79ebc to a3b13bf Compare February 23, 2017 20:30
@juanvallejo
Contributor Author

@sdodson

Not sure if it's in scope for your work, but it'd be nice if non-standard image format strings were handled. For disconnected installs they're pretty much a sure thing.

Thanks, went ahead and removed the registry hostname from image URLs. With the docker_image module, what is there now should be enough to prompt docker to search in all specified registries.

@sdodson @rhcarvalho Thanks for the feedback, review comments addressed

@juanvallejo juanvallejo force-pushed the jvallejo/add-docker-image-check branch from ba0fa6f to 5f93de9 Compare February 24, 2017 16:44
openshift_release = get_var(task_vars, "openshift_release")

# attempt to fetch fully qualified image tag
openshift_facts = self.module_executor("openshift_facts", {}, tmp, task_vars)
Contributor

Because checks commonly depend on variables set by openshift_facts, it is guaranteed to have run -- it is a dependency of the openshift_health_checker role.

There is no need to run it once again. All the variables you need are in task_vars already.
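
For example (a sketch assuming the same get_var helper and fact layout used elsewhere in this check):

# openshift_facts already ran as a role dependency, so its facts are in task_vars
openshift_image_tag = get_var(task_vars, "openshift", "common", "version", default=None)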

failed_pulls = set()

# attempt to pull each image and fail on
# the first one that cannot be fetched
Contributor

This looks like function/method docstring. It would be best to put it on the method definition, not on its call site.

# attempt to pull each image and fail on
# the first one that cannot be fetched
pulled_images = self.pull_images(self.rpm_docker_images(), openshift_release, tmp, task_vars)
if "failed" in pulled_images and "failed_images" in pulled_images:
Contributor

It is probably enough to check for a single condition? Looks redundant.

if failed_pulls:
return {"failed": True, "msg": "Failed to pull required docker images: %s" % failed_pulls}

return {"changed": True}
Contributor

We should probably mark changed: true in other code-paths as well (e.g., some images were pulled, but there was an error later on). And then, just like we verify whether any check has failed: true, we need to verify whether any check has changed: true and add that key/value to the result of the openshift_health_checker action plugin.

If this is not clear, I can help implementing.

Contributor Author

@rhcarvalho thanks, this makes sense and I definitely agree; however, I think this effort should be part of a separate PR.

EDIT: at least for this PR, I have gone ahead and updated the returned dictionaries to include changed: True if at least one image was pulled successfully.

Contributor

This is the first check that intentionally makes changes; that's why I mentioned surfacing that in the runner.

Contributor Author

Ah, makes sense. I added a commit that surfaces this so that a failure message (with at least one image successfully pulled) looks like:

Failure summary:

  1. Host:     <host>
     Play:     Run Docker image checks
     Task:     openshift_health_check
     Changed:  True
     Message:  One or more checks failed
     Details:  {u'docker_image_availability': {'changed': True,
                                               'failed': True,
                                               'msg': u"Failed to pull required docker images: set(['openshift3/ose-deployer:v3.4.2.7', 'openshift3/ose-haproxy-router:v3.4.2.7', 'openshift3/ose-pod:v3.4.2.7', 'openshift3/ose-docker-registry:v3.4.2.7'])"}}

Contributor

@rhcarvalho if we are going to allow checks that modify system state, would it also make sense to allow skipping checks that may mutate state? We should probably also make sure we implement check_mode for those checks.

Contributor

Yes, we discussed that in a call. I've created a trello card to track being able to disable checks that match a tag / check mode.

Perhaps it will be easier to tag checks known to be idempotent with an idempotent tag.

if "openshift" in facts["ansible_facts"]:
if "common" in facts["ansible_facts"]["openshift"]:
if "version" in facts["ansible_facts"]["openshift"]["common"]:
return facts["ansible_facts"]["openshift"]["common"]["version"]
Contributor

This could be written more simply as

openshift_image_tag = get_var(task_vars, "openshift", "common", "version", default=None)

(Doesn't even need this method)

"openshift3/ose-pod",
]

def pull_images(self, images, tag, tmp, task_vars):
Contributor

General tip: code is easier to test, understand, and compose if you keep in mind the Single Responsibility Principle. Try to make functions/methods that do a single thing. Here, pull_images does at least two things: pulling images and appending tags to image names.
My suggestion is to break this into two functions: one that appends the tag, and one that pulls whatever image it is given.

Appending a tag to a sequence of images:

images_with_tag = [image + ":" + tag for image in images_without_tag]

Pulling:

result = [
    self.module_executor("docker_image", {"name": image}, tmp, task_vars)
    for image in images_with_tag
]

}

res = self.module_executor("docker_image", args, tmp, task_vars)
if "failed" in res:
Contributor

Subtle bug: "failed" might be in the result, but it could be failed: False...
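
A minimal sketch of the fix, checking the value rather than the presence of the key:

res = self.module_executor("docker_image", args, tmp, task_vars)
if res.get("failed", False):
    # only treat an actual failure as failed; the key may exist with value False
    failed.add(args["name"])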

@juanvallejo juanvallejo force-pushed the jvallejo/add-docker-image-check branch 3 times, most recently from a325632 to 398d69d Compare February 27, 2017 19:27
@juanvallejo
Contributor Author

@rhcarvalho Thanks for the feedback, review comments addressed

@juanvallejo juanvallejo force-pushed the jvallejo/add-docker-image-check branch 3 times, most recently from 5c871e9 to 24134cf Compare February 27, 2017 20:45

# pull each required image that requires a qualified image
# tag (v0.0.0.0) rather than a standard release format tag (v0.0)
pulled_qualified_images = self.pull_images(self.add_image_tags(self.rpm_qualified_docker_images(), "v" + openshift_image_tag), tmp, task_vars)
Contributor

Would using skopeo make sense here to verify images without pulling them?

Output of skopeo inspect docker://docker.io/openshift/origin:

{
    "Name": "docker.io/openshift/origin",
    "Tag": "latest",
    "Digest": "sha256:7fe04a3bc9dc0508f846980888c948914663146c7cabc0dc0beb06169d75f21c",
    "RepoTags": [
        "b596f940a813a660be8982c37a2fdac29641b5f5",
        "beta1",
        "latest",
        "testing",
        "v0.2.1",
        "v0.2.2",
        "v0.3.1",
        "v0.3.2",
        "v0.3.3",
        "v0.3.4",
        "v0.3",
        "v0.4.1",
        "v0.4.2",
        "v0.4.3",
        "v0.4.4",
        "v0.4",
        "v0.5.1",
        "v0.5.2",
        "v0.5.3",
        "v0.5.4",
        "v0.5",
        "v0.6.1",
        "v0.6.2",
        "v0.6",
        "v1.0.0",
        "v1.0.1",
        "v1.0.2",
        "v1.0.3",
        "v1.0.4",
        "v1.0.5",
        "v1.0.6",
        "v1.0.7",
        "v1.0.8",
        "v1.1.1.1",
        "v1.1.1",
        "v1.1.2",
        "v1.1.3",
        "v1.1.4",
        "v1.1.5",
        "v1.1.6",
        "v1.1",
        "v1.2.0-rc1",
        "v1.2.0-rc2",
        "v1.2.0",
        "v1.2.1",
        "v1.2.2",
        "v1.3.0-alpha.0",
        "v1.3.0-alpha.1",
        "v1.3.0-alpha.2",
        "v1.3.0-alpha.3",
        "v1.3.0-rc1",
        "v1.3.0",
        "v1.3.1",
        "v1.3.2",
        "v1.3.3",
        "v1.4.0-alpha.0",
        "v1.4.0-alpha.1",
        "v1.4.0-rc1",
        "v1.4.0",
        "v1.4.1",
        "v1.5.0-alpha.0",
        "v1.5.0-alpha.1",
        "v1.5.0-alpha.2",
        "v1.5.0-alpha.3"
    ],
    "Created": "2017-02-28T01:26:25.908410645Z",
    "DockerVersion": "1.12.6",
    "Labels": {
        "build-date": "20161214",
        "io.k8s.description": "OpenShift Origin is a platform for developing, building, and deploying containerized applications. See https://docs.openshift.org/latest for more on running OpenShift Origin.",
        "io.k8s.display-name": "OpenShift Origin Application Platform",
        "license": "GPLv2",
        "name": "CentOS Base Image",
        "vendor": "CentOS"
    },
    "Architecture": "amd64",
    "Os": "linux",
    "Layers": [
        "sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4",
        "sha256:45a2e645736c4c66ef34acce2407ded21f7a9b231199d3b92d6c9776df264729",
        "sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4",
        "sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4",
        "sha256:e24e870d9c831b7ee97c3f763d4118acea93781746b797665ceb4fb0fc327e25",
        "sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4",
        "sha256:6a8f42f1ab1e81a4365cce09ea66bc65e30f436b2754ab879242c1fe9ace11b8",
        "sha256:190f15d3619652dbfdbf3a8d666f3b04bfeb27138050aa5fa993601af462c802",
        "sha256:8a54e0c0e9d1ebd81c6915f8fdac5a14a6b4c0a0786a4721b879c032d3e4efaa",
        "sha256:ebfb90719f35b6847ab381c48b887942c09621a5b4392e82d7f18601bf71922b",
        "sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4",
        "sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4",
        "sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4",
        "sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4",
        "sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4"
    ]
}

Contributor

We did talk about using skopeo, but I think we started with the docker* modules because they're readily available... Do we install skopeo as part of the install?

For preflight checks, we cannot assume skopeo is installed. Installing would be one more side effect for the check... something to discuss ;-)

Contributor

We don't currently install skopeo as part of the install, but as we move towards system containers, it will become a hard dependency at some point. Even without skopeo, it may make sense to default to querying the registry API for image tags, rather than pulling images. That would also allow us to be a bit fuzzier when finding a matching container image.

Contributor

One thing we asked ourselves yesterday is whether skopeo would do the same registry resolution as a docker pull would (trying to default to different registries based on config, e.g. docker.io, registry.access.redhat.com, ...). @detiber do you happen to know?

And while we could use skopeo once it is installed, for preflight checks (before installation), we'd need to rely on pre-installed software or install it as part of the check (one more side effect).

Contributor

@detiber, can you explain what you mean by "allow us to be a bit fuzzier on finding a matching container image"?

@juanvallejo, thanks for digging in to skopeo. We need to find the right balance between idempotence and usefulness. Here's a question: is there already a skopeo container? I really don't want our checks to require any sort of subscription-manager work to register a system to pull RPMs. Until skopeo is somehow built in, it seems OK in my mind to pull a container in order to run it.

@rhcarvalho's comments are still valid. We'd have to be completely convinced we aren't going to have skopeo report that the correct images are available only to have a docker pull fail because a misconfigured registry is shadowing registry.access.redhat.com.

@juanvallejo, to test this further I'd be curious to know what happens if the same repository exists in two registries but a specific tag exists only in a "shadowed" registry configured with docker. To test, you could set up a registry using the steps at https://docs.docker.com/registry/deploying/ that exposes an image called /openshift3/ose-pod:latest. Then configure that registry to be used by a docker pull with ADD_REGISTRY in /etc/sysconfig/docker, putting it before the entry for registry.access.redhat.com. Then make sure docker pull openshift3/ose-pod:v3.4 still works: it should first check the misconfigured registry, see that the image is missing, and proceed to check registry.access.redhat.com.

Contributor

@brenton I'm not sure a system container is sufficient in this case either, not all systems will have docker available.

I think skopeo, or hitting the registry API directly, gives us the ability to actually query the list of tags for a given image, so we could determine the "latest" version for containerized installs without having to resort to repoquery hacks, pulling and querying a 'latest' image, or hardcoding default values. It also avoids the need to pull images as part of a pre-requisite check.
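
For reference, a rough sketch of querying a registry's tag list over the Docker Registry v2 API; it ignores authentication (registries such as registry.access.redhat.com may require token auth) and assumes the requests library is available, so treat it purely as an illustration:

import requests  # assumed dependency, not currently required by the checks

def image_tags(registry, repository):
    # GET /v2/<name>/tags/list is the standard Docker Registry v2 tag-listing endpoint
    url = "https://{}/v2/{}/tags/list".format(registry, repository)
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    return response.json().get("tags", [])

# e.g. "v3.4" in image_tags("registry.access.redhat.com", "openshift3/ose-pod")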

@@ -79,6 +79,7 @@ def _format_failure(failure):
(u'Host', host),
(u'Play', play),
(u'Task', task),
(u'Changed', stringc(result._result.get('changed', u'???'), C.COLOR_CHANGED)),
Contributor

This output was focused on listing a summary of failures. Not sure about changed True/False being valuable there.

@@ -73,6 +73,7 @@ def run(self, tmp=None, task_vars=None):
if r.get("failed", False):
result["failed"] = True
result["msg"] = "One or more checks failed"
result["changed"] = r.get("changed", False)
Contributor

Strange logic, why would we add the "changed" key only when a check failed?

We can always add "changed", and compute it from the check results:

result["changed"] = any(r.get("changed", False) for r in check_results.values())

from openshift_checks.mixins import NotContainerizedMixin


class DockerImageAvailability(NotContainerizedMixin, OpenShiftCheck):
Contributor

I think this check could work for both containerized and non-containerized installs?

Member

Agreed, and actually it would influence which images are required. For a containerized install you would want the openshift/origin or openshift3/ose image to be available in addition to pod/registry/router/etc.

failed_pulls.update(pulled_images["failed_images"])

if not openshift_image_tag:
if len(failed_pulls) < len(self.rpm_docker_images()) + len(self.rpm_qualified_docker_images()):
Contributor

Logic: if we only tried to pull len(self.rpm_docker_images()) images above, and len(self.rpm_qualified_docker_images()) > 0, then the condition in this if-clause is always true.

By construction we have len(failed_pulls) <= len(self.rpm_docker_images()).

Contributor

Aha, this code seems to be duplicated below...

# pull each required image that requires a qualified image
# tag (v0.0.0.0) rather than a standard release format tag (v0.0)
pulled_qualified_images = self.pull_images(self.add_image_tags(self.rpm_qualified_docker_images(), "v" + openshift_image_tag), tmp, task_vars)
if "failed" in pulled_qualified_images and "failed_images" in pulled_qualified_images:
Contributor

Redundant condition, when "failed" in pulled_qualified_images, by construction, "failed_images" is also in pulled_qualified_images.

@staticmethod
def rpm_qualified_docker_images():
return [
"openshift3/ose-haproxy-router",
Contributor

I think here we should be able to support both ose and origin depending on the deployment_type.

Member

@sosiouxme sosiouxme Feb 28, 2017

Yes. I'm also kind of thinking about which images we want to test for... pulling all of them is probably excessive, but it seems like we ought to check for more than just one. Maybe the pod and deployer images at least?

changed = False
failed_pulls = set()

pulled_images = self.pull_images(self.add_image_tags(self.rpm_docker_images(), openshift_release), tmp, task_vars)
Member

Can we first check to see if the required images already exist in the local docker store? That way this could support disconnected installs (https://docs.openshift.com/container-platform/3.4/install_config/install/disconnected_install.html).
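
A rough sketch of that local pass, assuming docker_image_facts is used the way it is elsewhere in this PR and that its result mirrors docker inspect output (the "images"/"RepoTags" field names are assumptions):

facts = self.module_executor("docker_image_facts", {"name": list(required_images)}, tmp, task_vars)
found_locally = set()
for image in facts.get("images", []):
    found_locally.update(image.get("RepoTags", []))
# only images not already in the local docker store need a registry check or pull
still_missing = set(required_images) - found_locally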

@juanvallejo juanvallejo force-pushed the jvallejo/add-docker-image-check branch from 83f88db to 314b464 Compare March 1, 2017 00:11
@rhcarvalho
Contributor

@@ -74,6 +74,8 @@ def run(self, tmp=None, task_vars=None):
result["failed"] = True
result["msg"] = "One or more checks failed"

result["changed"] = any(r.get("changed", False) for r in check_results.values())
Contributor

Wrong indentation level, this should be outside of the for loop.

tags = ["preflight"]

# attempt to pull required docker images and fail if
# any images are missing, or error occurs during pull
Contributor

This documentation comment would be best in the form of a docstring. I think it could complement the documentation of this class:

class DockerImageAvailability(OpenShiftCheck):
    """Check that required Docker images are available.

    This check attempts to pull required docker images and fail if
    any image is missing or an error occurs during pull.
    """

For one, that's more idiomatic than having comments on top of a method definition. And while we don't do it today, there is tooling for extracting docstrings and generating documentation -- so, for example, we could generate a page documenting all existing checks in the future.

if "failed_images" in pulled_images:
failed_pulls.update(pulled_images["failed_images"])

if not openshift_image_tag:
Contributor

I don't understand why we check on openshift_image_tag?!

And then "if not openshift_image_tag", we always fail?! I don't follow that logic, maybe I need lunch 🍝

Contributor Author

Ah, I wrote this check under the assumption that openshift_image_tag could be optionally set in the config (so there would almost certainly be cases where it is not set). If it was not set, I would not be able to pull images with qualified tags, so I failed before getting to that part.

However, it looks like I can count on openshift_image_tag being set by the time this module runs, so I will remove that block.

def qualified_docker_images(is_origin_deployment):
if is_origin_deployment:
return [
"openshift/origin-haproxy-router"
Contributor

In some places in origin and openshift-ansible we derive image names from a pattern like prefix-component:version.
Example:
https://github.com/rhcarvalho/origin/blob/bd3e362de2a9f849bd2d7d3ff2a372746f35fb2a/pkg/cmd/util/variable/imagetemplate.go#L30

In Origin, the DefaultImagePrefix is set to openshift/origin, and for OCP it is openshift3/ose. I think it would be natural to follow that pattern, and it would be easier to maintain these lists, since we know the image names follow a pattern.
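
A sketch of what following that pattern could look like here (the component list is an illustrative subset of the images mentioned in this PR):

def qualified_docker_images(deployment_type, version):
    # <prefix>-<component>:<version>, mirroring origin's DefaultImagePrefix convention
    prefix = "openshift/origin" if deployment_type == "origin" else "openshift3/ose"
    components = ["haproxy-router", "docker-registry", "deployer", "pod"]  # illustrative subset
    return ["{}-{}:{}".format(prefix, component, version) for component in components]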

failed.add(args["name"])

if failed:
return {"failed": True, "failed_images": failed}
Contributor

@juanvallejo
Contributor Author

@rhcarvalho thanks for the feedback
@sosiouxme will add a local image check using the docker_image_facts ansible module

@sosiouxme
Member

sosiouxme commented Mar 3, 2017 via email

@brenton
Contributor

brenton commented Mar 6, 2017

The option I like the most so far is having skopeo in the openshift-ansible image. That image will have to be available in the environment if customers are going to run these checks. If the checks are initiated from the control host it seems reasonable that the admin would know how to ensure that exact same image is available to all hosts in the environment for purposes of running skopeo.

@detiber
Contributor

detiber commented Mar 6, 2017

We were talking today about how to get skopeo on the systems so we can use
it to check image availability. This is kind of tricky.

  1. RPM? Well, there's no guarantee it's there, and we don't really want to
    install one, especially not knowing the state of repos. Plus we can't
    install RPMs on Atomic hosts.

On Atomic hosts skopeo is already present, but I agree to the package being problematic.

  2. Docker container? That seems to make sense, except it relies on being
    able to pull an image with skopeo... in order to check that you can pull
    the images you're supposed to be able to... and that's a bit circular.
    Plus, disconnected hosts.

I don't think skopeo is a requirement for containerized tests, since the tests do not require restarting services. We could do pre-req tests using just docker. That said, docker isn't guaranteed to be on a minimal RHEL host either. If we need to set a minimum requirement to run, then requiring atomic instead of just docker as a dependency does not seem bad.

It almost seems like we'd have to bundle skopeo itself into an ansible
module.

This part is tricky, since it's a go binary. We could vendor skopeo in the repo to be able to copy it to remote hosts and execute it, or even copy it as part of the action plugin. If we did that, we would have to have a process for managing the vendored package, though. That said, skopeo is hitting well-defined API endpoints to do the queries, so it might make more sense to just re-implement the logic rather than vendor the binary.

The option I like the most so far is having skopeo in the openshift-ansible image. That image will have to be available in the environment if customers are going to run these checks. If the checks are initiated from the control host it seems reasonable that the admin would know how to ensure that exact same image is available to all hosts in the environment for purposes of running skopeo.

I do like the idea of having a container with all the dependencies for openshift-ansible. It doesn't quite solve the chicken/egg scenario of how that container is run initially on a minimal host, though.

while len(regs) and len(required_images):
current_reg = regs[0]
inspect = self.inspect_images(self.docker_path, self.skopeo_image, current_reg, required_images, tmp, task_vars)
required_images = inspect.get("failed_images")
Contributor Author

This block is exponential time at worst; however, a successful local check reduces the number of images it has to process, or skips this block completely. Also, if skopeo cannot be pulled, doesn't exist ahead of time locally, or the "skopeo image" does not contain the skopeo bin for whatever reason, the check fails early before getting to this block. Thoughts?

Contributor

exponential time at worst

Why exponential?! This is O(N * M), where N is the number of registries and M the number of images? It depends on how inspect_images works internally.

Still, I think it is clear that we want to check images offline first before doing anything else.

Contributor Author

Why exponential?! This is O(N * M) where N is the number of registries and M the number of images? Depends on how inspect_images work internally.

:) thanks. Also, I am calling docker_image_facts in the inspect_local_images func before this check to do a pass on any images that have already been pulled.

@juanvallejo
Contributor Author

Added a check that uses this skopeo image for now, with a local check that precedes it, at least while we decide how to handle skopeo as a dep

@juanvallejo juanvallejo force-pushed the jvallejo/add-docker-image-check branch 2 times, most recently from 3124655 to ff60737 Compare March 22, 2017 18:12
@juanvallejo
Contributor Author

@rhcarvalho Thanks for the review, comments addressed. All linting issues seem to stem from the vendored https://travis-ci.org/openshift/openshift-ansible/jobs/213938461#L196 module

# This file is a copy of https://github.com/ansible/ansible/commit/20bf02f6b96356ab5fe68578a3af9462b4ca42a5
# It has been temporarily vendored here because of X, Y, Z.
# We can remove this file once openshift-ansible requires ansible >= 2.x.x.x.
# pylint: skip-file
Contributor

This must be at the very beginning of the module, right after the shebang, otherwise it has no effect.
https://pylint.readthedocs.io/en/latest/faq.html#how-can-i-tell-pylint-to-never-check-a-given-module

I'd also suggest that our own commentary should be the top-most thing, and everything else should be as-is in the original.

Contributor Author

Thanks, done!

@juanvallejo juanvallejo force-pushed the jvallejo/add-docker-image-check branch 2 times, most recently from 8011688 to 28c8afb Compare March 22, 2017 21:31
docker_info = self.module_executor("docker_info", {}, task_vars).get("info", "")
result = self.module_executor("docker_info", {}, task_vars)

# do test on module exception - see if this line is useful
Contributor

Did you have time to check it?

Contributor Author

Did you have time to check it?

Yes, at least when a call to the docker client.info() results in a 404 response from the daemon, the following is returned by the docker_info module: http://pastebin.test.redhat.com/467479

Both rc=1 and failed=True are present as keys/values in the response:

...
'failed': True,
                                                       'module_stderr': u'Traceback (most recent call last):\n  File "/tmp/ansible_mKSMn8/ansible_module_docker_info.py", line 24, in <module>\n    main()\n  File "/tmp/ansible_mKSMn8/ansible_module_docker_info.py", line 19, in main\n    info=client.info(),\n  File "/usr/lib/python2.7/site-packages/docker/api/daemon.py", line 33, in info\n    return self._result(self._get(self._url("/inf")), True)\n  File "/usr/lib/python2.7/site-packages/docker/client.py", line 178, in _result\n    self._raise_for_status(response)\n  File "/usr/lib/python2.7/site-packages/docker/client.py", line 173, in _raise_for_status\n    raise errors.NotFound(e, response, explanation=explanation)\ndocker.errors.NotFound: 404 Client Error: Not Found ("{"message":"page not found"}")\n',
                                                       'module_stdout': u'',
                                                       'msg': u'MODULE FAILURE',
                                                       'rc': 1}}}

Contributor Author

Although, in the case of a module error due to the docker daemon being down, the entire check would fail before getting to this point, due to docker_image failing to connect during the self.update_skopeo_image step

)

# check how container name collisions are handled
# make sure container is removed by module
Contributor

This was another TODO, did you check it?

Contributor Author

Yes, I verified (with docker ps) that in an enterprise install on an AWS EC2 instance, the container is no longer there after the check completes.

Contributor

Might sound extreme, and it is after midnight here... but if we're writing robust software, we should be thinking about the daemon going down concurrently with the check run, or other such conditions.

#!/usr/bin/python
# pylint: skip-file
#
# This file is a copy of https://github.com/ansible/ansible/commit/20bf02f6b96356ab5fe68578a3af9462b4ca42a5
Contributor

#
# This file is a copy of https://github.com/ansible/ansible/commit/20bf02f6b96356ab5fe68578a3af9462b4ca42a5
# It has been temporarily vendored here due to issue https://github.com/ansible/ansible/issues/22323
# We can remove this file once openshift-ansible requires ansible > 2.2.1.0.
Contributor

  • Please add one or two blank comment lines here to keep it separate from the copyright notice. Maybe some ASCII art to make the distinction clear; use your own judgement:

    # TODO: remove ....
    #
    # -------------------------------------------------------------------
    #
    # Copyright 2016 ...
  • I suggest marking the TODO with a "TODO" comment; those are easier to find by grepping the code base than generic comments. Hopefully that would increase the likelihood we'll remove this file in the future.

Contributor Author

Thanks, done

@juanvallejo juanvallejo force-pushed the jvallejo/add-docker-image-check branch 2 times, most recently from ed96d5c to 0e980ff Compare March 22, 2017 23:12
@rhcarvalho
Contributor

@juanvallejo please rebase / resolve the fixup commit and I'll have a look at this tomorrow morning :-)

@rhcarvalho
Contributor

@juanvallejo I sent out a commit to make Travis GREEN ;)
Please rebase so that we can start the merge dance.

@@ -0,0 +1,2036 @@
#!/usr/bin/python
# pylint: skip-file
# flake8: noqa
Contributor

@juanvallejo FYI this line disables "flake8" (which checks formatting and some common problems), while the other one disables "pylint". The two tools have some overlap, but also some unique features. We happen to use both for now.

return {
"failed": True,
"changed": changed,
"msg": "Failed to update Skopeo image.",
Contributor

Perhaps here it would be useful to include the original error message from trying to update the image, and which image registry/name was used.
That would be useful for us when evaluating a problem report.
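
A minimal sketch of including that context (the variable and field names here are assumptions about the surrounding code):

return {
    "failed": True,
    "changed": changed,
    "msg": "Failed to update Skopeo image {}: {}".format(
        self.skopeo_image,                  # image namespace/name being updated
        result.get("msg", "unknown error"), # original error from the docker_image module, if any
    ),
}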

Contributor Author

@rhcarvalho Thanks. Although the registry from which the pull was attempted is not available as part of the error message (if there is one), I have added the image namespace/name to the output.

return {"changed": changed}

def required_images(self, task_vars):
""" Return a list of all required tagged images """
Contributor

Please review all of the docstrings. They should use three double quotes, have no spaces between the text and the quotes, and end with a full stop (".").

In particular, this docstring doesn't tell us much more than the function name alone; we'd be fine nuking it.


return images

def local_images(self, required_images, task_vars):
Contributor

For consistency, s/required_images/images

def is_available_skopeo_image(self, image, registry, task_vars):
"""Uses Skopeo to determine if required image exists in a given registry.

Returns the set of images not found in the given registry."""
Contributor

  1. This statement is wrong; this function returns a bool...
  2. There seems to be an extra space here, bad indentation.
  3. Keep consistency in the file -- put the """ on a line of its own.
    https://www.python.org/dev/peps/pep-0257/#multi-line-docstrings
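
Putting those points together, a corrected version might read something like (wording is a guess at the intent):

def is_available_skopeo_image(self, image, registry, task_vars):
    """Use Skopeo to determine whether the image exists in the given registry.

    Returns True if the image was found, False otherwise.
    """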

@juanvallejo juanvallejo force-pushed the jvallejo/add-docker-image-check branch from c0f85f2 to c3d1baf Compare March 23, 2017 15:53
This patch adds a check to ensure that required docker images are
available in at least one of the registries configured on an installation
host. Images are considered available if they are either already present
locally, or able to be inspected using Skopeo on one of the configured
registries.
Due to the use of a restricted name in the core `docker_container`
module's result, any standard output of a docker container captured
in the module's response was stripped out by ansible.

Because of this, we are forced to vendor a patched version of this
module, until a new version of ansible is released containing the
patched module.

This file should be removed once we begin requiring a release of ansible
containing the patched `docker_container` module.

This patch was taken directly from upstream, with no further changes:
20bf02f6b96356ab5fe68578a3af9462b4ca42a5
@juanvallejo juanvallejo force-pushed the jvallejo/add-docker-image-check branch from c3d1baf to a35bd91 Compare March 23, 2017 16:01
@rhcarvalho
Contributor

aos-ci-test

Contributor

@rhcarvalho rhcarvalho left a comment

LGTM

@rhcarvalho
Contributor

[merge]

thanks @juanvallejo

@openshift-bot

@openshift-bot

Evaluated for openshift ansible merge up to a35bd91

@openshift-bot

openshift-bot commented Mar 23, 2017

continuous-integration/openshift-jenkins/merge SUCCESS (https://ci.openshift.redhat.com/jenkins/job/merge_pull_request_openshift_ansible/84/) (Base Commit: adbf60c)

@openshift-bot openshift-bot merged commit 8751f88 into openshift:master Mar 23, 2017
@juanvallejo juanvallejo deleted the jvallejo/add-docker-image-check branch March 23, 2017 19:31