add docker_image_availability check #3461

Merged
merged 2 commits into from Mar 23, 2017

Conversation

@juanvallejo
Member

juanvallejo commented Feb 23, 2017

Something to get a rough idea of how this check might be implemented.

cc @rhcarvalho @brenton

...s/openshift_health_checker/openshift_checks/docker_image_availability.py

def rpm_docker_images():
    return [
        "docker.io/openshift/origin",
        "registry.access.redhat.com/openshift3/ose-haproxy-router",

@juanvallejo

juanvallejo Feb 23, 2017

Member

Not sure if this list is correct. Also, since we are using our registry rather than docker.io, maybe instead of using the Ansible docker_image_facts module we could run oc get images ... on the host machine and verify that output instead.

@rhcarvalho

rhcarvalho Feb 23, 2017

Contributor

As @sdodson said in #3461 (comment), we need to support the case when the repository prefix is something else.

Perhaps we can get the "something else" precisely from the variable he pointed us at: oreg_url.

@sdodson

sdodson Feb 23, 2017

Member

I think it'd be amazing if openshift had a function that would dump the entire list of possible images for a given format string. That way we leave ownership of which images are required for the core product within the openshift binary itself. But that would require changes to origin, and we can't rely on that until it's widely available.

@brenton

brenton Feb 23, 2017

Contributor

That does sound awesome.

@rhcarvalho

rhcarvalho Feb 23, 2017

Contributor

Let's RFE that idea!

@sdodson

sdodson Feb 23, 2017

Member

It wouldn't cover things like image streams or items from templates like registry-console, metrics, and logging. But it'd be enough to ensure the core platform worked.

@sdodson

Member

sdodson commented Feb 23, 2017

Not sure if it's in scope for your work, but it'd be nice if non-standard image format strings were handled. For disconnected installs they're pretty much a sure thing.

#oreg_url=example.com/openshift3/ose-${component}:${version}
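For illustration, a minimal sketch (not from this PR) of expanding such a format string into concrete image names; string.Template handles the ${...} placeholders, and the component list below is only an example:

import string

def expand_oreg_url(oreg_url, components, version):
    # Substitute ${component} and ${version} in an oreg_url-style format string.
    template = string.Template(oreg_url)
    return [template.substitute(component=component, version=version)
            for component in components]

# expand_oreg_url("example.com/openshift3/ose-${component}:${version}",
#                 ["pod", "deployer"], "v3.4")
# -> ['example.com/openshift3/ose-pod:v3.4', 'example.com/openshift3/ose-deployer:v3.4']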

@rhcarvalho

Though I have some comments, this is in a good direction.

...s/openshift_health_checker/openshift_checks/docker_image_availability.py

def rpm_docker_images():
    return [
        "docker.io/openshift/origin",
        "registry.access.redhat.com/openshift3/ose-haproxy-router",

@rhcarvalho

rhcarvalho Feb 23, 2017

Contributor

As @sdodson said in #3461 (comment), we need to support the case when the repository prefix is something else.

Perhaps we can get the "something else" precisely from the variable he pointed us at: oreg_url.

...s/openshift_health_checker/openshift_checks/docker_image_availability.py

@staticmethod
def rpm_docker_images():
    return [
        "docker.io/openshift/origin",

@rhcarvalho

rhcarvalho Feb 23, 2017

Contributor

I'd expect at least 2 different lists, or a sort of templating on the image names depending on OCP(OSE) / Origin.

...s/openshift_health_checker/openshift_checks/docker_image_availability.py

if len(missing_images) > 0:
    return {"failed": True, "msg": "There are missing docker images or images that did not match the current OpenShift version (%s): %s:\n" % (openshift_release, missing_images)}
return {"Changed": False}

@rhcarvalho

rhcarvalho Feb 23, 2017

Contributor

In modules, I think "changed" should be all lowercase. Here it doesn't really matter, we don't even need it.

For now we only look for a "failed" key:

if r.get("failed", False):
    result["failed"] = True
    result["msg"] = "One or more checks failed"

...s/openshift_health_checker/openshift_checks/docker_image_availability.py

matched_images.add(tag[0])
missing_images = set(self.rpm_docker_images()) - matched_images
if len(missing_images) > 0:

@rhcarvalho

rhcarvalho Feb 23, 2017

Contributor

In Python, it is idiomatic to write:

if missing_images:
    ...
...s/openshift_health_checker/openshift_checks/docker_image_availability.py

missing_images = set(self.rpm_docker_images()) - matched_images
if len(missing_images) > 0:
    return {"failed": True, "msg": "There are missing docker images or images that did not match the current OpenShift version (%s): %s:\n" % (openshift_release, missing_images)}

@rhcarvalho

rhcarvalho Feb 23, 2017

Contributor

The ":\n" suffix seems extraneous.

Instead of printing the list directly, maybe you can convert it to a string with something like ", ".join(missing_images).

The difference is:

There are missing docker images ...: ["foo", "bar"]

vs.

There are missing docker images ...: foo, bar
...s/openshift_health_checker/openshift_checks/docker_image_availability.py
# if we find any docker images, attempt to do 1-to-1 match between
# each image description name key and items from rpm_docker_images
matched_images = set()
for i in docker_images['images']:

@rhcarvalho

rhcarvalho Feb 23, 2017

Contributor

Please don't call it i. i is a good name for indices in a list... here image is probably better.

...s/openshift_health_checker/openshift_checks/docker_image_availability.py

openshift_release = get_var(task_vars, "openshift_release")
args = {
    "name": self.rpm_docker_images(),

@rhcarvalho

rhcarvalho Feb 23, 2017

Contributor

Shall we attempt to include version/tag in the images?
Without an explicit :tag, docker_image_facts will look for :latest.

@brenton

brenton Feb 23, 2017

Contributor

@rhcarvalho what do you think of instead having a method like docker_images_for() that takes a deployment_type like origin or openshift-enterprise and then a version? That method could contain the logic for determining whether an image uses tags like v3.4 or v3.4.1.40.

@rhcarvalho

rhcarvalho Feb 23, 2017

Contributor

Yes
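For illustration, a rough sketch of the suggested docker_images_for() helper (hypothetical, not the PR's code; the component list and prefix choice are assumptions based on the image names mentioned in this PR):

def docker_images_for(deployment_type, version):
    # Image prefix depends on the deployment type (Origin vs. OpenShift Enterprise).
    prefix = "openshift/origin" if deployment_type == "origin" else "openshift3/ose"
    components = ["pod", "deployer", "docker-registry", "haproxy-router"]
    # The caller decides whether `version` is a release tag (v3.4) or a fully
    # qualified tag (v3.4.1.40); this helper only assembles the names.
    return ["%s-%s:%s" % (prefix, component, version) for component in components]

# docker_images_for("openshift-enterprise", "v3.4.2.7")
# -> ['openshift3/ose-pod:v3.4.2.7', 'openshift3/ose-deployer:v3.4.2.7', ...]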

...s/openshift_health_checker/openshift_checks/docker_image_availability.py
}
docker_images = self.module_executor("docker_image_facts", args, tmp, task_vars)

@rhcarvalho

rhcarvalho Feb 23, 2017

Contributor

Perhaps the first thing I'd do after calling the module is checking for a "failed" key with value True. If it failed, we can abort immediately.
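For illustration, a minimal sketch of that early abort (a fragment in the style of the other snippets here; the names come from the surrounding method):

docker_images = self.module_executor("docker_image_facts", args, tmp, task_vars)
if docker_images.get("failed", False):
    # Abort immediately; there is no point inspecting the (missing) image list.
    return {"failed": True, "msg": docker_images.get("msg", "docker_image_facts failed")}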

...s/openshift_health_checker/openshift_checks/docker_image_availability.py
docker_images = self.module_executor("docker_image_facts", args, tmp, task_vars)
if len(docker_images['images']) == 0:

@rhcarvalho

rhcarvalho Feb 23, 2017

Contributor

if not docker_images['images']:
...s/openshift_health_checker/openshift_checks/docker_image_availability.py
@@ -0,0 +1,49 @@
# pylint: disable=missing-docstring
from openshift_checks import OpenShiftCheck, get_var
from openshift_checks.mixins import NotContainerized

@rhcarvalho

rhcarvalho Feb 23, 2017

Contributor

Please rebase, this was renamed NotContainerizedMixin yesterday.

@juanvallejo

Member

juanvallejo commented Feb 23, 2017

@sdodson

Not sure if it's in scope for your work, but it'd be nice if non-standard image format strings were handled. For disconnected installs they're pretty much a sure thing.

Thanks, went ahead and removed the registry hostname from the image URLs. With the docker_image module, what is there now should be enough to prompt docker to search in all specified registries.

@sdodson @rhcarvalho Thanks for the feedback, review comments addressed

...s/openshift_health_checker/openshift_checks/docker_image_availability.py
openshift_release = get_var(task_vars, "openshift_release")
# attempt to fetch fully qualified image tag
openshift_facts = self.module_executor("openshift_facts", {}, tmp, task_vars)

@rhcarvalho

rhcarvalho Feb 27, 2017

Contributor

Because checks commonly depend on variables set by openshift_facts, it is guaranteed to have run -- it is a dependency of the openshift_health_checker role.

There is no need to run it once again. All the variables you need are in task_vars already.

...s/openshift_health_checker/openshift_checks/docker_image_availability.py
failed_pulls = set()
# attempt to pull each image and fail on
# the first one that cannot be fetched

@rhcarvalho

rhcarvalho Feb 27, 2017

Contributor

This looks like a function/method docstring. It would be best to put it on the method definition, not at its call site.

...s/openshift_health_checker/openshift_checks/docker_image_availability.py
# attempt to pull each image and fail on
# the first one that cannot be fetched
pulled_images = self.pull_images(self.rpm_docker_images(), openshift_release, tmp, task_vars)
if "failed" in pulled_images and "failed_images" in pulled_images:

@rhcarvalho

rhcarvalho Feb 27, 2017

Contributor

It is probably enough to check for a single condition? Looks redundant.

...s/openshift_health_checker/openshift_checks/docker_image_availability.py

if failed_pulls:
    return {"failed": True, "msg": "Failed to pull required docker images: %s" % failed_pulls}
return {"changed": True}

@rhcarvalho

rhcarvalho Feb 27, 2017

Contributor

We should probably mark changed: true in other code-paths as well (e.g., some images were pulled, but there was an error later on). And then, just like we verify if any check has failed: true, we need to verify if any check has changed: True and add that key/value to the result of the openshift_health_checker action plugin.

If this is not clear, I can help implementing.

@juanvallejo

juanvallejo Feb 27, 2017

Member

@rhcarvalho thanks, this makes sense and I definitely agree, however I think this effort should be part of a separate PR

EDIT: at least for this PR, I have gone ahead and updated returned dictionaries to include changed: True if at least one image succeeded in being pulled

@rhcarvalho

rhcarvalho Feb 27, 2017

Contributor

This is the first check that intentionally makes changes, that's why I mentioned surfacing that in the runner.

@juanvallejo

juanvallejo Feb 27, 2017

Member

Ah, makes sense. I added a commit that surfaces this so that a failure message (with at least one image successfully pulled) looks like:

Failure summary:

  1. Host:     <host>
     Play:     Run Docker image checks
     Task:     openshift_health_check
     Changed:  True
     Message:  One or more checks failed
     Details:  {u'docker_image_availability': {'changed': True,
                                               'failed': True,
                                               'msg': u"Failed to pull required docker images: set(['openshift3/ose-deployer:v3.4.2.7', 'openshift3/ose-haproxy-router:v3.4.2.7', 'openshift3/ose-pod:v3.4.2.7', 'openshift3/ose-docker-registry:v3.4.2.7'])"}}

@detiber

detiber Feb 28, 2017

Contributor

@rhcarvalho if we are going to allow for checks that modify system state, would it also make sense to allow for skipping checks that may mutate state? It might also make sense to make sure we implement check_mode for those checks as well.

@rhcarvalho

rhcarvalho Feb 28, 2017

Contributor

Yes, we discussed that in a call. I've created a Trello card to track being able to disable checks that match a tag / check mode.

Perhaps it will be easier to tag checks known to be idempotent with an idempotent tag.

...s/openshift_health_checker/openshift_checks/docker_image_availability.py

if "openshift" in facts["ansible_facts"]:
    if "common" in facts["ansible_facts"]["openshift"]:
        if "version" in facts["ansible_facts"]["openshift"]["common"]:
            return facts["ansible_facts"]["openshift"]["common"]["version"]

@rhcarvalho

rhcarvalho Feb 27, 2017

Contributor

This could be written more simply as

openshift_image_tag = get_var(task_vars, "openshift", "common", "version", default=None)

(Doesn't even need this method)

...s/openshift_health_checker/openshift_checks/docker_image_availability.py

    "openshift3/ose-pod",
]

def pull_images(self, images, tag, tmp, task_vars):

@rhcarvalho

rhcarvalho Feb 27, 2017

Contributor

General tip: code is easier to test, understand and compose if you keep in mind the Single Responsibility Principle. Try to make functions / methods that do a single thing. Here, pull_images does at least two things: pulling images and appending tags to images.
My suggestion is to break this into two "functions": one that appends the tag, one that pulls whatever image it is given.

Appending a tag to a sequence of images:

images_with_tag = [image + ":" + tag for image in images_without_tag]

Pulling:

result = [
    self.module_executor("docker_image", {"name": image}, tmp, task_vars)
    for image in images_with_tag
]
...s/openshift_health_checker/openshift_checks/docker_image_availability.py

}
res = self.module_executor("docker_image", args, tmp, task_vars)
if "failed" in res:

@rhcarvalho

rhcarvalho Feb 27, 2017

Contributor

Subtle bug: "failed" might be in the result, but it could be failed: False...
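For illustration, the truthiness-safe version of that check (a fragment; failed and args come from the surrounding method):

# Treat a missing "failed" key and failed: False the same way:
if res.get("failed", False):
    failed.add(args["name"])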

@juanvallejo

Member

juanvallejo commented Feb 27, 2017

@rhcarvalho Thanks for the feedback, review comments addressed

...s/openshift_health_checker/openshift_checks/docker_image_availability.py

# pull each required image that requires a qualified image
# tag (v0.0.0.0) rather than a standard release format tag (v0.0)
pulled_qualified_images = self.pull_images(self.add_image_tags(self.rpm_qualified_docker_images(), "v" + openshift_image_tag), tmp, task_vars)


@detiber

detiber Feb 28, 2017

Contributor

Would using skopeo make sense here to verify images without pulling them?

Output of skopeo inspect docker://docker.io/openshift/origin:

{
    "Name": "docker.io/openshift/origin",
    "Tag": "latest",
    "Digest": "sha256:7fe04a3bc9dc0508f846980888c948914663146c7cabc0dc0beb06169d75f21c",
    "RepoTags": [
        "b596f940a813a660be8982c37a2fdac29641b5f5",
        "beta1",
        "latest",
        "testing",
        "v0.2.1",
        "v0.2.2",
        "v0.3.1",
        "v0.3.2",
        "v0.3.3",
        "v0.3.4",
        "v0.3",
        "v0.4.1",
        "v0.4.2",
        "v0.4.3",
        "v0.4.4",
        "v0.4",
        "v0.5.1",
        "v0.5.2",
        "v0.5.3",
        "v0.5.4",
        "v0.5",
        "v0.6.1",
        "v0.6.2",
        "v0.6",
        "v1.0.0",
        "v1.0.1",
        "v1.0.2",
        "v1.0.3",
        "v1.0.4",
        "v1.0.5",
        "v1.0.6",
        "v1.0.7",
        "v1.0.8",
        "v1.1.1.1",
        "v1.1.1",
        "v1.1.2",
        "v1.1.3",
        "v1.1.4",
        "v1.1.5",
        "v1.1.6",
        "v1.1",
        "v1.2.0-rc1",
        "v1.2.0-rc2",
        "v1.2.0",
        "v1.2.1",
        "v1.2.2",
        "v1.3.0-alpha.0",
        "v1.3.0-alpha.1",
        "v1.3.0-alpha.2",
        "v1.3.0-alpha.3",
        "v1.3.0-rc1",
        "v1.3.0",
        "v1.3.1",
        "v1.3.2",
        "v1.3.3",
        "v1.4.0-alpha.0",
        "v1.4.0-alpha.1",
        "v1.4.0-rc1",
        "v1.4.0",
        "v1.4.1",
        "v1.5.0-alpha.0",
        "v1.5.0-alpha.1",
        "v1.5.0-alpha.2",
        "v1.5.0-alpha.3"
    ],
    "Created": "2017-02-28T01:26:25.908410645Z",
    "DockerVersion": "1.12.6",
    "Labels": {
        "build-date": "20161214",
        "io.k8s.description": "OpenShift Origin is a platform for developing, building, and deploying containerized applications. See https://docs.openshift.org/latest for more on running OpenShift Origin.",
        "io.k8s.display-name": "OpenShift Origin Application Platform",
        "license": "GPLv2",
        "name": "CentOS Base Image",
        "vendor": "CentOS"
    },
    "Architecture": "amd64",
    "Os": "linux",
    "Layers": [
        "sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4",
        "sha256:45a2e645736c4c66ef34acce2407ded21f7a9b231199d3b92d6c9776df264729",
        "sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4",
        "sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4",
        "sha256:e24e870d9c831b7ee97c3f763d4118acea93781746b797665ceb4fb0fc327e25",
        "sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4",
        "sha256:6a8f42f1ab1e81a4365cce09ea66bc65e30f436b2754ab879242c1fe9ace11b8",
        "sha256:190f15d3619652dbfdbf3a8d666f3b04bfeb27138050aa5fa993601af462c802",
        "sha256:8a54e0c0e9d1ebd81c6915f8fdac5a14a6b4c0a0786a4721b879c032d3e4efaa",
        "sha256:ebfb90719f35b6847ab381c48b887942c09621a5b4392e82d7f18601bf71922b",
        "sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4",
        "sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4",
        "sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4",
        "sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4",
        "sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4"
    ]
}
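For illustration, a minimal sketch of checking a tag against that inspect output instead of pulling (assumes skopeo is available on the host's PATH, which the rest of this thread debates):

import json
import subprocess

def skopeo_tag_exists(image, tag):
    # e.g. skopeo inspect docker://docker.io/openshift/origin
    out = subprocess.check_output(["skopeo", "inspect", "docker://" + image])
    return tag in json.loads(out.decode("utf-8")).get("RepoTags", [])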

@rhcarvalho

rhcarvalho Feb 28, 2017

Contributor

We did talk about using skopeo, but I think we started with the docker* modules for they're readily available... Do we install skopeo as part of the install?

For preflight checks, we cannot assume skopeo is installed. Installing would be one more side effect for the check... something to discuss ;-)

@detiber

detiber Mar 1, 2017

Contributor

We don't currently install skopeo as part of the install, but as we move towards system containers, it will become a hard dependency at some point. Even without skopeo, it may make sense to default to querying the registry API to search tags for images, rather than pulling images. It would also allow us to be a bit fuzzier on finding a matching container image.
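For illustration, a rough sketch of querying a registry's tag list directly over the Docker Registry v2 API (assumes the standard /v2/<name>/tags/list endpoint and anonymous access; registries that require token auth, such as docker.io, would need an extra step):

import json
try:
    from urllib.request import urlopen  # Python 3
except ImportError:
    from urllib2 import urlopen  # Python 2

def registry_has_tag(registry, name, tag):
    url = "https://%s/v2/%s/tags/list" % (registry, name)
    tags = json.loads(urlopen(url).read().decode("utf-8")).get("tags") or []
    return tag in tags

# registry_has_tag("registry.access.redhat.com", "openshift3/ose-pod", "v3.4")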

@rhcarvalho

rhcarvalho Mar 1, 2017

Contributor

One thing we asked ourselves yesterday is whether skopeo would do the same registry resolution as a docker pull would (trying to default to different registries based on config, e.g. docker.io, registry.access.redhat.com, ...). @detiber do you happen to know?

And while we could use skopeo once it is installed, for preflight checks (before installation), we'd need to rely on pre-installed software or install it as part of the check (one more side effect).

@brenton

brenton Mar 1, 2017

Contributor

@detiber, can you explain what you mean by "allow us to be a bit fuzzier on finding a matching container image"?

@juanvallejo, thanks for digging into skopeo. We need to find the right balance between idempotence and usefulness. Here's a question: is there already a skopeo container? I really don't want to require our checks to need any sort of subscription-manager work to register a system to pull RPMs. Until skopeo is somehow built in, it seems OK in my mind to pull a container in order to run it.

@rhcarvalho's comments are still valid. We'd have to be completely convinced we aren't going to have skopeo report that the correct images are available only to have a docker pull fail because a misconfigured registry is shadowing registry.access.redhat.com.

@juanvallejo, to test this further I'd be curious to know what happens if the same repository exists in two registries yet a specific tag only exists in a "shadowed" registry configured with docker. To test, you could set up a registry using the steps at https://docs.docker.com/registry/deploying/ that exposed an image called /openshift3/ose-pod:latest. Then configure that registry to be used by a docker pull with ADD_REGISTRY in /etc/sysconfig/docker. Put it before the entry for registry.access.redhat.com. Then make sure docker pull openshift3/ose-pod:v3.4 still works. It should first check the misconfigured registry, see that the image is missing, and proceed to check registry.access.redhat.com.

@detiber

detiber Mar 2, 2017

Contributor

@brenton I'm not sure a system container is sufficient in this case either, not all systems will have docker available.

I think skopeo, or hitting directly against the search api, gives us the ability to actually query the list of tags for a given image, so it would give us the ability to query for the "latest" version for containerized installs without having to resort to repoquery hacks, pulling and querying a 'latest' image, or hardcoding default values. It also avoids the need for pulling images as part of a pre-requisite check.

roles/openshift_health_checker/callback_plugins/zz_failure_summary.py
@@ -79,6 +79,7 @@ def _format_failure(failure):
(u'Host', host),
(u'Play', play),
(u'Task', task),
(u'Changed', stringc(result._result.get('changed', u'???'), C.COLOR_CHANGED)),

@rhcarvalho

rhcarvalho Feb 28, 2017

Contributor

This output was focused on listing a summary of failures. Not sure about changed True/False being valuable there.

roles/openshift_health_checker/action_plugins/openshift_health_check.py

@@ -73,6 +73,7 @@ def run(self, tmp=None, task_vars=None):
if r.get("failed", False):
    result["failed"] = True
    result["msg"] = "One or more checks failed"
    result["changed"] = r.get("changed", False)

@rhcarvalho

rhcarvalho Feb 28, 2017

Contributor

Strange logic, why would we add the "changed" key only when a check failed?

We can always add "changed", and compute it from the check results:

result["changed"] = any(r.get("changed", False) for r in check_results.values())
...s/openshift_health_checker/openshift_checks/docker_image_availability.py
from openshift_checks.mixins import NotContainerizedMixin
class DockerImageAvailability(NotContainerizedMixin, OpenShiftCheck):

@rhcarvalho

rhcarvalho Feb 28, 2017

Contributor

I think this check could work for both containerized and non-containerized installs?

@sosiouxme

sosiouxme Feb 28, 2017

Member

Agreed, and actually it would influence which images are required. For a containerized install you would want the openshift/origin or openshift3/ose image to be available in addition to pod/registry/router/etc.

...s/openshift_health_checker/openshift_checks/docker_image_availability.py

failed_pulls.update(pulled_images["failed_images"])
if not openshift_image_tag:
    if len(failed_pulls) < len(self.rpm_docker_images()) + len(self.rpm_qualified_docker_images()):

@rhcarvalho

rhcarvalho Feb 28, 2017

Contributor

Logic: if we only tried to pull len(self.rpm_docker_images()) images above, and len(self.rpm_qualified_docker_images()) > 0, then the condition in this if-clause is always true.

By construction we have len(failed_pulls) <= len(self.rpm_docker_images()).

@rhcarvalho

rhcarvalho Feb 28, 2017

Contributor

Aha, this code seems to be duplicated below...

...s/openshift_health_checker/openshift_checks/docker_image_availability.py

# pull each required image that requires a qualified image
# tag (v0.0.0.0) rather than a standard release format tag (v0.0)
pulled_qualified_images = self.pull_images(self.add_image_tags(self.rpm_qualified_docker_images(), "v" + openshift_image_tag), tmp, task_vars)
if "failed" in pulled_qualified_images and "failed_images" in pulled_qualified_images:

@rhcarvalho

rhcarvalho Feb 28, 2017

Contributor

Redundant condition, when "failed" in pulled_qualified_images, by construction, "failed_images" is also in pulled_qualified_images.

...s/openshift_health_checker/openshift_checks/docker_image_availability.py

@staticmethod
def rpm_qualified_docker_images():
    return [
        "openshift3/ose-haproxy-router",

@rhcarvalho

rhcarvalho Feb 28, 2017

Contributor

I think here we should be able to support both ose and origin depending on the deployment_type.

@sosiouxme

sosiouxme Feb 28, 2017

Member

Yes. I'm also kind of thinking about which images we want to test for... pulling all of them is probably excessive, but it seems like we ought to check for more than just one. Maybe the pod and deployer images at least?

...s/openshift_health_checker/openshift_checks/docker_image_availability.py
changed = False
failed_pulls = set()
pulled_images = self.pull_images(self.add_image_tags(self.rpm_docker_images(), openshift_release), tmp, task_vars)

@sosiouxme

sosiouxme Feb 28, 2017

Member

Can we first check to see if the required images already exist in the local docker store? That way this could support disconnected installs (https://docs.openshift.com/container-platform/3.4/install_config/install/disconnected_install.html).
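For illustration, a rough sketch (hypothetical helper) of consulting the local Docker store first via docker_image_facts, so pre-seeded images on disconnected hosts count as available:

def locally_missing_images(self, images, tmp, task_vars):
    # docker_image_facts only reports images already present in the local store.
    facts = self.module_executor("docker_image_facts", {"name": images}, tmp, task_vars)
    local_tags = set()
    for image in facts.get("images", []):
        local_tags.update(image.get("RepoTags", []))
    return [image for image in images if image not in local_tags]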

roles/openshift_health_checker/action_plugins/openshift_health_check.py
@@ -74,6 +74,8 @@ def run(self, tmp=None, task_vars=None):
result["failed"] = True
result["msg"] = "One or more checks failed"
result["changed"] = any(r.get("changed", False) for r in check_results.values())

@rhcarvalho

rhcarvalho Mar 1, 2017

Contributor

Wrong indentation level, this should be outside of the for loop.

...s/openshift_health_checker/openshift_checks/docker_image_availability.py
tags = ["preflight"]
# attempt to pull required docker images and fail if
# any images are missing, or error occurs during pull

@rhcarvalho

rhcarvalho Mar 1, 2017

Contributor

This documentation comment would be best in the form of a docstring. I think it could complement the documentation of this class:

class DockerImageAvailability(OpenShiftCheck):
    """Check that required Docker images are available.

    This check attempts to pull required docker images and fails if
    any image is missing or an error occurs during pull.
    """

For one, that's more idiomatic than having comments on top of a method definition. And while we don't do it today, there is tooling for extracting docstrings and generating documentation -- so, for example, we could generate a page documenting all existing checks in the future.

...s/openshift_health_checker/openshift_checks/docker_image_availability.py

if "failed_images" in pulled_images:
    failed_pulls.update(pulled_images["failed_images"])
if not openshift_image_tag:

@rhcarvalho

rhcarvalho Mar 1, 2017

Contributor

I don't understand why we check on openshift_image_tag?!

And then "if not openshift_image_tag", we always fail?! I don't follow that logic, maybe I need lunch 🍝

@juanvallejo

juanvallejo Mar 1, 2017

Member

Ah, I wrote this check under the assumption that openshift_image_tag could be optionally set in the config (so there would almost certainly be cases where this is not set). If it was the case that it was not set, then I would not be able to pull images with qualified tags, so I failed before getting to that part.

However it looks like I can count on openshift_image_tag to be set by the time this module runs, so I will remove that block.

...s/openshift_health_checker/openshift_checks/docker_image_availability.py

def qualified_docker_images(is_origin_deployment):
    if is_origin_deployment:
        return [
            "openshift/origin-haproxy-router"

@rhcarvalho

rhcarvalho Mar 1, 2017

Contributor

In some places in origin and openshift-ansible we derive image names from a pattern like prefix-component:version.
Example:
https://github.com/rhcarvalho/origin/blob/bd3e362de2a9f849bd2d7d3ff2a372746f35fb2a/pkg/cmd/util/variable/imagetemplate.go#L30

In Origin, the DefaultImagePrefix is set to openshift/origin, and for OCP it is openshift3/ose. I think it would be natural to follow that pattern, and it would be easier to maintain these lists, since we know the image names follow a pattern.

...s/openshift_health_checker/openshift_checks/docker_image_availability.py

failed.add(args["name"])
if failed:
    return {"failed": True, "failed_images": failed}
@juanvallejo

Member

juanvallejo commented Mar 1, 2017

@rhcarvalho thanks for the feedback
@sosiouxme will add a local image check using the docker_image_facts ansible module

@sosiouxme

Member

sosiouxme commented Mar 3, 2017

@brenton

Contributor

brenton commented Mar 6, 2017

The option I like the most so far is having skopeo in the openshift-ansible image. That image will have to be available in the environment if customers are going to run these checks. If the checks are initiated from the control host it seems reasonable that the admin would know how to ensure that exact same image is available to all hosts in the environment for purposes of running skopeo.

@detiber

Contributor

detiber commented Mar 6, 2017

We were talking today about how to get skopeo on the systems so we can use
it to check image availability. This is kind of tricky.

  1. RPM? Well, there's no guarantee it's there, and we don't really want to
    install one, especially not knowing the state of repos. Plus we can't
    install RPMs on Atomic hosts.

On Atomic hosts skopeo is already present, but I agree that the package is problematic.

  2. Docker container? That seems to make sense, except it relies on being
    able to pull an image with skopeo... in order to check that you can pull
    the images you're supposed to be able to... and that's a bit circular.
    Plus, disconnected hosts.

I don't think skopeo is a requirement for containerized tests, since the tests do not require restarting services. We could do pre-req tests using just docker. That said, docker isn't guaranteed to be on a minimal RHEL host either. If we need to set a minimum requirement to run, then requiring atomic instead of just docker as a dependency does not seem bad.

It almost seems like we'd have to bundle skopeo itself into an ansible
module.

This part is tricky, since it's a go binary. We could vendor skopeo in repo to be able to copy it remotely and execute on it, or even copy it as part of the action plugin. If we did that, we would have to have a process for managing the vendored package, though. That said, skopeo is hitting well-defined api endpoints to do the queries, so it might make more sense to just re-implement the logic as opposed to vendoring the binary.

The option I like the most so far is having skopeo in the openshift-ansible image. That image will have to be available in the environment if customers are going to run these checks. If the checks are initiated from the control host it seems reasonable that the admin would know how to ensure that exact same image is available to all hosts in the environment for purposes of running skopeo.

I do like the idea of having a container with all the dependencies for openshift-ansible. It doesn't quite solve the chicken/egg scenario of how that container is run initially on a minimal host, though.

...s/openshift_health_checker/openshift_checks/docker_image_availability.py

while len(regs) and len(required_images):
    current_reg = regs[0]
    inspect = self.inspect_images(self.docker_path, self.skopeo_image, current_reg, required_images, tmp, task_vars)
    required_images = inspect.get("failed_images")

@juanvallejo

juanvallejo Mar 6, 2017

Member

This block is exponential time at worst, however a successful local check reduces the number of images it has to process / skips this completely. Also, if skopeo is not able to be pulled, or doesn't exist ahead of time locally, or the "skopeo image" does not contain the skopeo bin for whatever reason, the check fails early before getting to this block. Thoughts?

@rhcarvalho

rhcarvalho Mar 7, 2017

Contributor

exponential time at worst

Why exponential?! This is O(N * M) where N is the number of registries and M the number of images? Depends on how inspect_images work internally.

Still, I think it is clear that we want to check images offline first before doing anything else.

This comment has been minimized.

@juanvallejo

juanvallejo Mar 7, 2017

Member

Why exponential?! This is O(N * M) where N is the number of registries and M the number of images? Depends on how inspect_images work internally.

:) thanks. Also, I am calling docker_image_facts in the inspect_local_images func before this check to do a pass over any images that have already been pulled.

@juanvallejo

Member

juanvallejo commented Mar 6, 2017

Added a check that uses this skopeo image for now, with a local check that precedes it, at least while we decide how to handle skopeo as a dep

@rhcarvalho

Good direction; there are some points where we can simplify the code.

Why aren't we using docker_image_facts directly?

I see the flow now is check-image-offline, then docker-run-skopeo. I'm still not sure on skopeo vs. docker pull... docker pull had the advantage of caching the images.
Depending on where this check will be run, we may favor either one or the other approach.

@brenton, thoughts? What is a more complete usage flow for this check?

Show outdated Hide outdated roles/openshift_health_checker/library/docker_info.py
def main():
argument_spec = dict(
name=dict(type='list'),
)

This comment has been minimized.

@rhcarvalho

rhcarvalho Mar 7, 2017

Contributor

nit: bad indentation

Show outdated Hide outdated roles/openshift_health_checker/library/docker_info.py
Ansible module for determining information about the docker host.
"""
from ansible.module_utils.docker_common import *

This comment has been minimized.

@rhcarvalho

rhcarvalho Mar 7, 2017

Contributor

It's bad form in Python to import *, and preferred to import explicitly the names we need.

Which brings us to the next question, why does DockerInfo need to inherit from DockerBaseClass?

This comment has been minimized.

@juanvallejo

juanvallejo Mar 7, 2017

Member

It's bad form in Python to import *, and preferred to import explicitly the names we need.

Thanks, will fix this

Which brings us to the next question, why does DockerInfo need to inherit from DockerBaseClass?

It is what I saw docker-related core modules do: https://github.com/ansible/ansible-modules-core/blob/devel/cloud/docker/docker_image.py#L259

This comment has been minimized.

@rhcarvalho

rhcarvalho Mar 7, 2017

Contributor

uggh... my comments apply to that file as well -- not a good source for learning Python ;)

Show outdated Hide outdated roles/openshift_health_checker/library/docker_info.py
self.client = client
self.results = results
self.results['info'] = self.get_info()

This comment has been minimized.

@rhcarvalho

rhcarvalho Mar 7, 2017

Contributor

This is bad style. We should not do "work" in __init__, just initialize the instance.
Also bad style that we rely on mutation of the arguments instead of returning values.

Continues below

Show outdated Hide outdated roles/openshift_health_checker/library/docker_info.py
info=[]
)
DockerInfo(client, results)

This comment has been minimized.

@rhcarvalho

rhcarvalho Mar 7, 2017

Contributor

continuing from above

For readers of this code, it looks like we "throw away" the instance of the class we've just created. This is pretty weird.

Again, it is surprising that this call is here just for the side effect of changing results... It is reminiscent of C functions where you pass a pointer to a "return value" to compensate for the lack of multiple return values.

To implement what we have so far, we don't really need a class at all.

http://docs.ansible.com/ansible/dev_guide/developing_modules_general.html#common-module-boilerplate

from ansible.module_utils.docker_common import AnsibleDockerClient


def main():
    client = AnsibleDockerClient(
        argument_spec=dict(
            name=dict(type='list'),
        ),
    )

    client.module.exit_json(
        changed=False,
        info=module.info(),  # note: the version of AnsibleDockerClient I have installed doesn't have this method...
    )


if __name__ == '__main__':
    main()

Which makes me think, why do we even need this custom module instead of using the standard module?!

This comment has been minimized.

@juanvallejo

juanvallejo Mar 7, 2017

Member

Thanks, will update with your suggestion.

# note: the version of AnsibleDockerClient I have installed doesn't have this method...

client.info() should work (rather than client.module.info())

Which makes me think, why do we even need this custom module instead of using the standard module?!

If I try to call AnsibleDockerClient directly in the docker_image_availability check, I get the error:

{"msg": "Error: Module unable to decode valid JSON on stdin.  Unable to figure out what parameters were passed", "failed": true}

Using module_executor has been the only way I have seen this work

This comment has been minimized.

@rhcarvalho

rhcarvalho Mar 7, 2017

Contributor

Isn't AnsibleDockerClient already exposed as some module in Ansible?
I thought this would be the place where we call docker_image_facts.

This comment has been minimized.

@juanvallejo

juanvallejo Mar 7, 2017

Member

Isn't AnsibleDockerClient already exposed as some module in Ansible?
I thought this would be the place where we call docker_image_facts.

No, docker_image_facts does not expose the ansible docker client as far as I can tell.

We do use docker_image_facts when we call self.inspect_local_images; however, once we begin using skopeo after the local check, we need the AnsibleDockerClient in order to know which registries to target when inspecting images.
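
A hedged sketch of that usage; module_executor and the docker_info module are from this PR, but the method name and the "Registries" key are assumptions about the module's output shape:

def known_docker_registries(self, task_vars):
    result = self.module_executor("docker_info", {}, task_vars)
    if result.get("failed"):
        return []
    # fall back to an empty list instead of indexing and risking a KeyError
    return result.get("info", {}).get("Registries", [])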

This comment has been minimized.

@rhcarvalho

rhcarvalho Mar 7, 2017

Contributor

Got it, makes sense... this is to get Docker config like configured registries. My bad, took me some time to fully understand it.

Show outdated Hide outdated ...s/openshift_health_checker/openshift_checks/docker_image_availability.py
msg = "Failed to pull or use existing Skopeo image: %s" % skopeo_image.get("msg")
return {"failed": True, "msg": msg}
docker_info_module = self.module_executor("docker_info", {}, task_vars)

This comment has been minimized.

@rhcarvalho

rhcarvalho Mar 7, 2017

Contributor

This name is misleading: the return value is not a "module" but a dict, the common return value of calling a module...

Show outdated Hide outdated ...s/openshift_health_checker/openshift_checks/docker_image_availability.py
@staticmethod
def rpm_docker_images(version):
    return [
        "openshift3/registry-console:{version}".format(version=version)

This comment has been minimized.

@rhcarvalho

rhcarvalho Mar 7, 2017

Contributor

Hard-coded prefix?

Show outdated Hide outdated ...s/openshift_health_checker/openshift_checks/docker_image_availability.py
for image in tagged_images:
    # the run command pulls the {skopeo_image} if it has not already been pulled.
    # if pulling this image fails, the current image will be added to the failed set.
    cmd_str = "{docker_path} run {skopeo_image} skopeo inspect docker://{registry}/{tagged_image}".format(

This comment has been minimized.

@rhcarvalho

rhcarvalho Mar 7, 2017

Contributor

Note: we can end up having version mismatches -- what if the image available/used is not the image (in the version) we expect?

Show outdated Hide outdated ...s/openshift_health_checker/openshift_checks/docker_image_availability.py
# use a "docker run ..." command on the remote host to inspect each specific
# image inside of a Skopeo image container. If stdout contains zero lines,
# it is safe to assume that an image with the specified tag does not exist
# in the given registry (all output in case of image not found goes to stderr).

This comment has been minimized.

@rhcarvalho

rhcarvalho Mar 7, 2017

Contributor

Why aren't we looking at exit codes?
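
A sketch of the exit-code variant; the Ansible command module reports rc alongside stdout/stderr, so the helper could be as small as this (the method name is illustrative):

def skopeo_inspect_failed(self, cmd_str, task_vars):
    # a non-zero rc means skopeo could not reach or find the image,
    # regardless of what ends up on stderr
    result = self.module_executor("command", {"_raw_params": cmd_str}, task_vars)
    return result.get("rc", 0) != 0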

Show outdated Hide outdated ...s/openshift_health_checker/openshift_checks/docker_image_availability.py
cmd = self.exec_cmd(cmd_str, task_vars)
stderr = cmd.get("stderr")
if len(stderr):

This comment has been minimized.

@rhcarvalho

rhcarvalho Mar 7, 2017

Contributor

Throughout the code, you can replace instances of:

if len(s):

with

if s:
Show outdated Hide outdated ...s/openshift_health_checker/openshift_checks/docker_image_availability.py
def exec_cmd(self, cmd_str, task_vars):
    cmd = self.module_executor("command", {"_raw_params": cmd_str}, task_vars)
    return cmd

This comment has been minimized.

@rhcarvalho

rhcarvalho Mar 7, 2017

Contributor

No need for the cmd variable?
Better no name than a bad name ;-) it is not a cmd / command, but the result of calling one.

@brenton

Contributor

brenton commented Mar 7, 2017

@rhcarvalho, to your question about using skopeo or docker pull: right now I don't think we need to focus on using this check to speed up installs. I suspect we may get to that soon, but it would be a follow-up PR.

You raised a good point about the image we're planning to use for skopeo. We'll want to make sure that we always pull the latest one we require for our checks. We could do that a number of different ways:

  • Pull the image that matches the same version as the version of openshift-ansible we're running
  • Something really low tech like hardcoding a value or just latest (these have obvious drawbacks)

Honestly, I suspect some day we may be able to rely on skopeo being on the hosts, so I don't want to overthink things right now. It sounds like it's built into Atomic Host. Once we're confident our RPM content checks are working and guiding customers to properly configure hosts, I'm OK with requiring the skopeo RPM to be installed on the remote hosts before we run these docker image checks.

@juanvallejo

Member

juanvallejo commented Mar 7, 2017

@rhcarvalho thanks for the feedback, review comments addressed

Show outdated Hide outdated roles/openshift_health_checker/library/docker_info.py
# pylint: disable=missing-docstring
"""
Ansible module for determining information about the docker host.
"""

This comment has been minimized.

@rhcarvalho

rhcarvalho Mar 7, 2017

Contributor

Now I get why we wrap this into a module... we need it to run on the inventory hosts, not the control host.
:-)

This comment has been minimized.

@rhcarvalho

rhcarvalho Mar 16, 2017

Contributor

Please document the intent here: why did we have to write this module ourselves, what were the alternatives that did not work, and clarify "information" (what kind of thing can we expect in the output?) and "docker host" (does that mean "docker daemon"?).
I think a bit more reasoning here can help us later understand why we added this file instead of using some existing module.

Show outdated Hide outdated ...s/openshift_health_checker/openshift_checks/docker_image_availability.py
    return self.exec_cmd(cmd_str, task_vars)["stderr"]

def get_docker_info(self, task_vars):
    return self.module_executor("docker_info", {}, task_vars)["info"]

This comment has been minimized.

@rhcarvalho

rhcarvalho Mar 7, 2017

Contributor

This could raise a KeyError.

Show outdated Hide outdated ...s/openshift_health_checker/openshift_checks/docker_image_availability.py
name = "docker_image_availability"
tags = ["preflight"]
skopeo_image = "juanvallejo/skopeo:latest"

This comment has been minimized.

@rhcarvalho

rhcarvalho Mar 7, 2017

Contributor

Adding :latest is a no-op in this case, since that is the default tag. It doesn't make docker run re-pull the image -- but I confess it's been a long time since I read those docs, so I might be wrong.
I think there might be a flag to force pull? Can't remember.

This comment has been minimized.

@juanvallejo

juanvallejo Mar 8, 2017

Member

I did not see a flag for force pulling, but I updated the update_skopeo_image method to execute docker pull on the skopeo image on each run of the check

Show outdated Hide outdated ...s/openshift_health_checker/openshift_checks/docker_image_availability.py
required_images.update(self.qualified_docker_images(reg_prefix, "v" + openshift_image_tag))
# use docker_image_facts to inspect local images before attempting to use Skopeo
required_images = self.inspect_local_images(required_images, task_vars)

This comment has been minimized.

@rhcarvalho

rhcarvalho Mar 7, 2017

Contributor

I'm missing the case in which all images are found locally and then we can return here, before involving skopeo?
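
Something like the following right after the local pass would cover that case (the return shape is kept minimal for the sketch):

required_images = self.inspect_local_images(required_images, task_vars)
if not required_images:
    # every required image is already present locally, no need to involve skopeo
    return {"changed": False}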

Show outdated Hide outdated ...s/openshift_health_checker/openshift_checks/docker_image_availability.py
required_images = self.inspect_local_images(required_images, task_vars)
# ensure skopeo image exists or can be pulled and used successfully
failed = self.check_skopeo_image(task_vars)

This comment has been minimized.

@rhcarvalho

rhcarvalho Mar 7, 2017

Contributor

As a reader, I'd expect this to be a bool, but it is surprisingly a string / the stderr from running a command.

Maybe calling it stderr would be a better name?

skopeo_stderr = ...

if skopeo_stderr:
    return {... "msg": skopeo_stderr}

This comment has been minimized.

@rhcarvalho

rhcarvalho Mar 7, 2017

Contributor

Why do we try to run skopeo --version first, instead of calling the subcommand we actually intend?
How is this initial call to check_skopeo_image useful?

This comment has been minimized.

@juanvallejo

juanvallejo Mar 8, 2017

Member

Why do we try to run skopeo --version first, instead of calling the subcommand we actually intend?
How is this initial call to check_skopeo_image useful?

Its aim was to prove that the skopeo image could be pulled down from one of the registries (if it was not already present locally), and that the skopeo binary could be used in a container created from that image (hence the skopeo --version portion).

However, I have updated this method to do a docker pull on the skopeo image instead (to ensure we have the latest version every time the check is run). stderr will still be returned by that method if the image cannot be pulled for whatever reason, preventing the check from proceeding any further in that case.
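
A rough sketch of the updated method as described, reusing exec_cmd and skopeo_image from this PR (the exact method body is illustrative):

def update_skopeo_image(self, task_vars):
    # force a fresh pull of the skopeo image on every run; pull errors surface on stderr
    pull_result = self.exec_cmd("docker pull {}".format(self.skopeo_image), task_vars)
    return pull_result.get("stderr", "")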
