-
Notifications
You must be signed in to change notification settings - Fork 238
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bug 1883903:Add retries to SDN's RBAC proxy #786
Bug 1883903:Add retries to SDN's RBAC proxy #786
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall LGTM, although, could you please put up a warning message should the retry fails?
9b6c5df
to
0cc2102
Compare
@danielmellado Done |
0cc2102
to
bc88d74
Compare
until [ "$retries" -ge 20 ] | ||
do |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
until [ "$retries" -ge 20 ] | |
do | |
while [[ "${retries}" -lt 20 ]]; do |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
all the stylistic suggestions here still apply, here and elsewhere. (single-line rather than multi-line and while
rather than until
because those are both more standard. [[
rather than [
and ${retries}
rather than $retries
because they have fewer gotchas and it matches the kube/OCP shell script style guide that was never finalized...
"https://${KUBERNETES_SERVICE_HOST}:${KUBERNETES_SERVICE_PORT}/api/v1/namespaces/openshift-sdn/services/sdn" | | ||
python -c 'import json,sys; print(json.load(sys.stdin)["metadata"]["creationTimestamp"])' && | ||
break || | ||
echo "WARN: Failed to get sdn service from API" 1>&2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this is expected it should shouldn't be a "warning". And you should say you're retrying.
Also, can you use if
/else
rather than &&
/||
? (I realize the variable assignment might make that tricky, so maybe no...)
(EDIT: fixed "shouldn't")
) | ||
retries=$(( retries + 1 )) | ||
sleep 15 | ||
done | ||
|
||
TS=$(date -d "${TS}" +%s) | ||
WARN_TS=$(( ${TS} + $(( 20 * 60)) )) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hm... github won't let me review outside the context of the diff, but below here I see:
if [[ "${CUR_TS}" -gt "WARN_TS" ]]; then
echo $(date -Iseconds) WARN: sdn-metrics-certs not mounted after 20 minutes.
elif [[ "${HAS_LOGGED_INFO}" -eq 0 ]] ; then
echo $(date -Iseconds) INFO: sdn-metrics-certs not mounted. Waiting one hour.
"one hour" should be "20 minutes" shouldn't it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, will slide that in the PR
--cacert /var/run/secrets/kubernetes.io/serviceaccount/ca.crt \ | ||
-H "Authorization: Bearer $(cat /var/run/secrets/kubernetes.io/serviceaccount/token)" \ | ||
"https://${KUBERNETES_SERVICE_HOST}:${KUBERNETES_SERVICE_PORT}/api/v1/namespaces/openshift-sdn/services/sdn" | | ||
python -c 'import json,sys; print(json.load(sys.stdin)["metadata"]["creationTimestamp"])' && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a nitpick, but are we 100% sure that the json python module would be available within the system? Maybe there's something around that installs it and I'm not aware of it, but if that's not the case, we should also catch that exception or make sure it's installed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a RHEL container so we're guaranteed to have it.
943a321
to
f1efa45
Compare
@danwinship while I was addressing it, I realized it never never exited if the retries were never successful, so besides addressing your comments I also added these new lines:
|
/retest |
f1efa45
to
36eedb4
Compare
Do we need a similar patch for any of the other daemonsets? |
36eedb4
to
92a8c6f
Compare
/lgtm |
/retest Please review the full test history for this PR and help us cut down flakes. |
7 similar comments
/retest Please review the full test history for this PR and help us cut down flakes. |
/retest Please review the full test history for this PR and help us cut down flakes. |
/retest Please review the full test history for this PR and help us cut down flakes. |
/retest Please review the full test history for this PR and help us cut down flakes. |
/retest Please review the full test history for this PR and help us cut down flakes. |
/retest Please review the full test history for this PR and help us cut down flakes. |
/retest Please review the full test history for this PR and help us cut down flakes. |
Still flaky |
92a8c6f
to
7a0dca9
Compare
There's something wrong, just added a -x to get more logging, don't review yet |
cc3bd35
to
8a95b2d
Compare
@danwinship I had to replace the if curl | python; then break; fi for a curl | python && break. With the variable asignation it wasn't working as I expected. |
8a95b2d
to
7782ba1
Compare
7782ba1
to
ae77c76
Compare
Because kube-proxy may not be initialized by the time the RBAC proxy starts it may crashloop for a while. Doesn't have any actual impact but the restarts show in oc get pod and people may worry about that.
/hold cancel |
ae77c76
to
dbdc25e
Compare
There was an issue doing the break inside the subshell. Now this is fixed and ready to merge.Can you please lgtm? |
/lgtm |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: juanluisvaladas, rcarrillocruz The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
@juanluisvaladas: The following tests failed, say
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
/retest Please review the full test history for this PR and help us cut down flakes. |
/retitle Bug 1883903:Add retries to SDN's RBAC proxy |
@juanluisvaladas: This pull request references Bugzilla bug 1883903, which is valid. The bug has been moved to the POST state. The bug has been updated to refer to the pull request using the external bug tracker. 3 validation(s) were run on this bug
In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
@juanluisvaladas: All pull requests linked via external trackers have merged: Bugzilla bug 1883903 has been moved to the MODIFIED state. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
Because kube-proxy may not be initialized by the time the RBAC proxy
starts it may crashloop for a while. Doesn't have any actual impact but
the restarts show in oc get pod and people may worry about that.