Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

keep bootstrapper pod alive when error occurs #921

Merged
merged 3 commits into from Jun 6, 2018

Conversation

kunmingg
Copy link
Contributor

@kunmingg kunmingg commented Jun 4, 2018

fix #901


This change is Reviewable

@kunmingg
Copy link
Contributor Author

kunmingg commented Jun 4, 2018

/assign @ankushagarwal

@kunmingg
Copy link
Contributor Author

kunmingg commented Jun 4, 2018

/retest

log.Fatalf("%v\n", err)
if s.InCluster && s.KeepAlive {
log.Infof("Bootstrapper failed with error: %v\n", err)
log.Infof("Keeping pod alive...")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please log a description of why we're doing this so that it's easier for the user to understand what's happening

@ankushagarwal
Copy link
Contributor

/lgtm
/approve

/cc @jlewi
/hold in case @jlewi has comments

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ankushagarwal

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ankushagarwal
Copy link
Contributor

/hold

@@ -43,6 +44,14 @@ func main() {
}

if err := app.Run(s); err != nil {
log.Fatalf("%v\n", err)
if s.InCluster && s.KeepAlive {
log.Infof("Bootstrapper failed with error: %v\n", err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use log.Errorf here since you are logging an error.

@jlewi
Copy link
Contributor

jlewi commented Jun 5, 2018

One minor comment but otherwise looks good.

@k8s-ci-robot k8s-ci-robot removed the lgtm label Jun 5, 2018
@ankushagarwal
Copy link
Contributor

/lgtm

@kunmingg
Copy link
Contributor Author

kunmingg commented Jun 5, 2018

/retest

@kunmingg
Copy link
Contributor Author

kunmingg commented Jun 5, 2018

/hold cancel

@jlewi
Copy link
Contributor

jlewi commented Jun 5, 2018

Looks like a test flake; filed
kubeflow/testing#147

/test all

@kunmingg
Copy link
Contributor Author

kunmingg commented Jun 5, 2018

/retest

3 similar comments
@kunmingg
Copy link
Contributor Author

kunmingg commented Jun 5, 2018

/retest

@kunmingg
Copy link
Contributor Author

kunmingg commented Jun 5, 2018

/retest

@kunmingg
Copy link
Contributor Author

kunmingg commented Jun 5, 2018

/retest

@jlewi
Copy link
Contributor

jlewi commented Jun 6, 2018

/test all

Looks like the most recent failure was a minikube error waiting for the VM to be deleted.

@k8s-ci-robot k8s-ci-robot merged commit 9dfda4f into kubeflow:master Jun 6, 2018
saffaalvi pushed a commit to StatCan/kubeflow that referenced this pull request Feb 11, 2021
* keep bootstrapper pod alive when running inside k8s

* Edit message when error happens

* handle review feedback
surajkota pushed a commit to surajkota/kubeflow that referenced this pull request Jun 13, 2022
…rces (kubeflow#921)

* Applications should not try to take ownership of cluster scoped resources.

* See kubeflow#4767 - If applications try to take ownership
  of cluster scoped resources or resources in other namespaces
  this can violate requirements of the K8s GC and lead to unpredictable
  behavior.

* It looks like this might be causing the profile controller deployment
  to get GC'd after 24 hours.

* Bump application version.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

bootstrapper should not crash loop on error
4 participants