Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug 1755140: do not force skip tls verify to true on image source injection #110

Merged
merged 1 commit into from
Nov 8, 2019

Conversation

gabemontero
Copy link
Contributor

/assign @adambkaplan

@openshift/openshift-team-developer-experience FYI

The registry.conf built by the OCM and passed to openshift/builder should have the correct settings based on what has been supplied to the build config for pulling in source from an image, or from any global image config that has been created.

@openshift-ci-robot
Copy link
Contributor

@gabemontero: This pull request references Bugzilla bug 1755140, which is valid. The bug has been moved to the POST state. The bug has been updated to refer to the pull request using the external bug tracker.

In response to this:

Bug 1755140: do not force skip tls verify to true on image source injection

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci-robot openshift-ci-robot added bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Nov 4, 2019
@gabemontero
Copy link
Contributor Author

a networking metrics flake in e2e-aws
/test e2e-aws

@gabemontero
Copy link
Contributor Author

@adambkaplan per our slack exchange yesterday, simply removing the setting of skip tls verify was insufficient.

Our image source tests are having tls difficulties:

2019-11-04T19:57:13.40473225Z error: error creating buildah builder: Error initializing source docker://image-registry.openshift-image-registry.svc:5000/e2e-test-build-image-source-b27j8/inputimage@sha256:8c80a437edd66ceabc6bcb69ffc85f71c93d997888624abeec8860796df6fa0e: error pinging docker registry image-registry.openshift-image-registry.svc:5000: Get https://image-registry.openshift-image-registry.svc:5000/v2/: x509: certificate signed by unknown authority

Looks like we need to explicitly seed the the containers client in openshift/builder with the certs provided by OCM.

@gabemontero
Copy link
Contributor Author

/test e2e-aws

@adambkaplan
Copy link
Contributor

@nalind shouldn't buildah use the certs in /etc/docker/certs.d, as we do elsewhere?

@gabemontero
Copy link
Contributor Author

Technically speaking @adambkaplan the config map certs are stored in /etc/docker/certs.d via https://github.com/openshift/builder/blob/master/cmd/main.go#L59-L60

However, certs from secrets are copied to /etc/pki/tls/certs/cluster.crt via https://github.com/openshift/builder/blob/master/cmd/main.go#L52-L53

In the test failure, we were trying to pull from the image registry; its certs are in the pull secret cert, no?

That said, I would have expected /etc/pki/tls/certs/cluster.crt to be picked up as well.

Or perhaps we need to explicitly set SystemContext.DockerDaemonCertPath to /etc/pki/tls/certs ??

@nalind ??

@gabemontero
Copy link
Contributor Author

Found the answer to my question: https://github.com/openshift/builder/blob/master/vendor/github.com/containers/image/v5/docker/docker_client.go#L50

/etc/pki/tls/certs is not in the list

@adambkaplan
Copy link
Contributor

@gabemontero the errors are x509: certificate signed by unknown authority. This implies that when we're extracting content for the pull we aren't trusting the internal registry. This is weird because normal image pull for a build works just fine.

@gabemontero
Copy link
Contributor Author

agree on the weirdness @adambkaplan

I suspect the extract source from image setup is different enough from builder image pull to cause the weirdness. Certainly the original //TODO comment about getting it from the host implies that (at least it does to me).

Oh, and I forgot that /etc/pki/tls/certs is an linux level thing ... most likely does not pertain do docker daemon cert path.

@gabemontero
Copy link
Contributor Author

One thing that is interesting is that the dockerClient passed into extractSourceFromImage is not current used @adambkaplan @nalind

Going to see about pulling the SystemContext from it and augmenting it with the auth.

@gabemontero
Copy link
Contributor Author

ok let's see what effect this additional commit has

@gabemontero
Copy link
Contributor Author

So passing in the context from our containers/image based client didn't help.

So doing some more digging @adambkaplan @nalind so far I see:

  • when we pull the builder image, we use the DockerClient.PullImage and that finds the cert
  • turns out, with this buildah.NewBuilder form of interaction, the only actual use of it is in source.go as part of extracting image content. There are a couple of other calls to it in the repo, but they are in inert paths which are not called
    -- daemonlessRun
    -- CreateContainer
  • this is my first foray into the buildah code, but after a few minutes of diving in, it is certainly different enough from the containers/image docker client we construct for pulling the builder that I am less surprised it is not behaving the same way.

Still digging, but of course any pointers/clues are appreciated.

@gabemontero
Copy link
Contributor Author

gabemontero commented Nov 6, 2019

I believe I'm zeroing in on it. c/images' detectPropertiesHelper is creating a vanilla tlsclientconfig vs. leveraging the one created when buildah's dockerClient is constructed.

There are a few callers into it so I'm having to sort the various permutations.

This method is called by the extract img source/buildah mount path, but not I believe in the generic image pull path.

@gabemontero
Copy link
Contributor Author

Prior theory debunked .... still trying to uncover what is different in the two buildah/images paths.

@gabemontero
Copy link
Contributor Author

Actually I misspoke earlier ... the image registry cert is in the config map.

Mounted in locations like /var/run/configs/openshift.io/certs/..2019_11_06_21_45_49.155501101/certs.d/image-registry.openshift-image-registry.svc:5000/ca.crt

@gabemontero
Copy link
Contributor Author

OK finally got somewhere ... openshift/builder is not successful in its population of /etc/docker/certs.d, at least for this scenario.

Debug:

time="2019-11-06T22:37:05Z" level=info msg="GGM setup certs /etc/docker/certs.d/image-registry.openshift-image-registry.svc:5000 does not exist"
time="2019-11-06T22:37:05Z" level=info msg="GGM dumping docker certs.d"
time="2019-11-06T22:37:05Z" level=info msg="GGM fetchManifest"

Now, why do pulls of the builder image work? .... Still a little fuzzy on details, but so far:

  • the buildah copy source from image path is doing a ping over https first to distinguish between v1 and v2 docker and that is where the lack of certs gets us
  • the call to buildah pull is _, err = buildah.Pull(context.TODO(), "docker://"+imageName, options) ... perhaps the "docker://" protocol looking thing there actually bypasses https

@adambkaplan let's actively pursue @nalind or somebody else on container team on Thursday to confirm/deny that second bullet if he doesn't catch up with these PR comments by then.

In the interim, going to see about fixing openshift/builder's copy of the config map data to /etc/docker/certs.d

@gabemontero
Copy link
Contributor Author

@adambkaplan realized what is up in scrum today ... we are doing the image/src stuff in a init container vs. regular one, and the img registry cert cfg map is not getting mounted in the init containers for the build pod.

I'm starting on the needed OCM PR.

@gabemontero
Copy link
Contributor Author

@gabemontero
Copy link
Contributor Author

locally testing with openshift/openshift-controller-manager#46 also shows that commit 641f36b is not needed, so will remove

@gabemontero
Copy link
Contributor Author

/retest

@gabemontero
Copy link
Contributor Author

CI / acquire lease flake on e2e-aws-builds

/retest

@gabemontero
Copy link
Contributor Author

/retest

@gabemontero
Copy link
Contributor Author

tests are green now @adambkaplan ... bump/ptal

@adambkaplan
Copy link
Contributor

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Nov 8, 2019
@openshift-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: adambkaplan, gabemontero

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 8, 2019
@openshift-merge-robot openshift-merge-robot merged commit 507dc53 into openshift:master Nov 8, 2019
@openshift-ci-robot
Copy link
Contributor

@gabemontero: All pull requests linked via external trackers have merged. Bugzilla bug 1755140 has been moved to the MODIFIED state.

In response to this:

Bug 1755140: do not force skip tls verify to true on image source injection

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@gabemontero gabemontero deleted the tls-verify branch November 8, 2019 20:08
@gabemontero
Copy link
Contributor Author

/cherrypick release-4.2

@openshift-cherrypick-robot

@gabemontero: new pull request created: #111

In response to this:

/cherrypick release-4.2

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@gabemontero
Copy link
Contributor Author

/cherrypick release-4.1

@openshift-cherrypick-robot

@gabemontero: new pull request created: #112

In response to this:

/cherrypick release-4.1

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. lgtm Indicates that a PR is ready to be merged. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants