Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Inject the trusted-ca-bundle into openshift-apiserver pods #226

Merged
merged 3 commits into from Aug 28, 2019

Conversation

stlaz
Copy link
Member

@stlaz stlaz commented Aug 20, 2019

@openshift-ci-robot openshift-ci-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Aug 20, 2019
@stlaz stlaz force-pushed the add_trusted_ca_bundle branch 2 times, most recently from ca5b3c7 to 642f3c6 Compare August 21, 2019 06:58
@openshift-ci-robot openshift-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Aug 21, 2019
@stlaz stlaz changed the title Inject the trusted-ca-bundle into openshift-apiserver pods WIP: Inject the trusted-ca-bundle into openshift-apiserver pods Aug 21, 2019
@openshift-ci-robot openshift-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Aug 21, 2019
@stlaz stlaz force-pushed the add_trusted_ca_bundle branch 5 times, most recently from 75361fa to 18216dc Compare August 21, 2019 13:49
@stlaz stlaz changed the title WIP: Inject the trusted-ca-bundle into openshift-apiserver pods Inject the trusted-ca-bundle into openshift-apiserver pods Aug 22, 2019
@openshift-ci-robot openshift-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Aug 22, 2019
@@ -101,6 +134,12 @@ spec:
- name: serving-cert
secret:
secretName: serving-cert
- name: trusted-ca-bundle
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this be optional?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, given a message from proxy people yesterday, non-optional CMs like this may prolong upgrades since the network operator may start injecting the CMs too late

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right answer was no :-) We want to require this.

Copy link
Member Author

@stlaz stlaz Aug 23, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apparently yes was still the good answer :)) There's people who want us to have it optional for the reasons above, this is now handled by bash guard in the pod's commands so that you don't destroy your system trust store if the configmap is missing (shouldn't be) or empty

Copy link

@danehans danehans Aug 23, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

optional is needed in the case where your operator starts before the cluster network operator that responds to the cm injection request.

@stlaz
Copy link
Member Author

stlaz commented Aug 22, 2019

/test e2e-aws
/hold
I wonder whether the errors we're seeing could be caused by mounting an empty configmap to trusted store dir

@stlaz
Copy link
Member Author

stlaz commented Aug 22, 2019

/test e2e-aws

@stlaz
Copy link
Member Author

stlaz commented Aug 22, 2019

/hold

@openshift-ci-robot openshift-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Aug 22, 2019
@mfojtik
Copy link
Member

mfojtik commented Aug 22, 2019

@deads2k @sttts just so we are clear... this add mechanism that will force all three openshift apiservers to restart at the same time without rolling new revision out (daemonset). I'm fine with it, because this just substitute the dynamic cert reloading. From what I tested manually, the restart is pretty fast, but aggregator might notice it.

@stlaz
Copy link
Member Author

stlaz commented Aug 22, 2019

The tests passing now only proves that the CM injection is kind of flawed, will add a two step process to make sure the CM always has data not to wedge the cluster by deleting its system trust store when mounting a CM with no data...

@stlaz stlaz force-pushed the add_trusted_ca_bundle branch 2 times, most recently from 95e0b08 to d21939c Compare August 22, 2019 13:23
@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

5 similar comments
@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@stlaz
Copy link
Member Author

stlaz commented Aug 27, 2019

/hold
Latest logs show that openshift-apiserver most probably broke, you can also see that the watchdog can’t find the PID

@openshift-ci-robot openshift-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Aug 27, 2019
@stlaz
Copy link
Member Author

stlaz commented Aug 27, 2019

From openshift-apiserver logs: “Waiting for watchdog........”

@openshift-ci-robot openshift-ci-robot removed the lgtm Indicates that a PR is ready to be merged. label Aug 27, 2019
args:
- "--config=/var/run/configmaps/config/config.yaml"
- |
echo $$ > /var/run/watchdog/pid
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You probably noticed, these dollar signs seem to be escaped, so they won’t be interpreted.

I wonder if we could either use realpath of /proc/self or get the value of either /proc/self/stat{,us}. Starts to be a bit hackish.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

they should be escaped and passed to bash, which then interprets them.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yaml doesn't escape I hope

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but I agree that W0827 11:29:32.124287 29 cmd.go:211] Unable to parse pid file /var/run/watchdog/pid: strconv.Atoi: parsing "$": invalid syntax is convincing.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’m just assuming given the watchdog reports “/var/run/watchdog/pid: strconv.Atoi: parsing "$": invalid syntax”

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

$$$$ does the trick. Is that yaml escaping of dollar?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hard to tell, I did not think dollar sign has a special meaning for yaml files :/

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

try kubectl run --restart=Never -i -t busybox --image busybox -- /bin/sh -ec 'echo $$$$'

@sttts
Copy link
Contributor

sttts commented Aug 28, 2019

Upgrade flakes seem unrelated.

/retest
/hold cancel

@openshift-ci-robot openshift-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Aug 28, 2019
@mfojtik
Copy link
Member

mfojtik commented Aug 28, 2019

/hold

@deads2k had opinions about using the watchdog, I think we should just rollout new deployment when proxy change.

@openshift-ci-robot openshift-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Aug 28, 2019
Copy link
Member

@soltysh soltysh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Aug 28, 2019
@openshift-ci-robot
Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: mfojtik, soltysh, stlaz

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@mfojtik
Copy link
Member

mfojtik commented Aug 28, 2019

/hold cancel

Alright, we will go with this to unblock 4.2, but there should be BZ created with follow up to revert this approach and just use the existing operator mechanism to rollout new version.

@openshift-ci-robot openshift-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Aug 28, 2019
@openshift-merge-robot openshift-merge-robot merged commit 897ccf8 into openshift:master Aug 28, 2019
@sttts
Copy link
Contributor

sttts commented Aug 28, 2019

Follow-up bugzilla https://bugzilla.redhat.com/show_bug.cgi?id=1746375

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

9 participants