-
Notifications
You must be signed in to change notification settings - Fork 143
Conversation
@andreyvelich: GitHub didn't allow me to request PR reviews from the following users: jiaqianjing. Note that only kubeflow members and repo collaborators can review this PR, and authors cannot review their own PRs. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
/hold
|
I think it is caused by the version of the codegen tools |
@gaocegege Any ideas which version can affect on this? |
Can we merge it now? |
No idea about it..
|
@gaocegege We can try to merge it, but unit test can still be failed because of |
https://travis-ci.com/organizations/kubeflow/migrate @jlewi Hi Jeremy, can you help do it? |
@andreyvelich and @gaocegege As WG leads can you please come up with an approach for maintaining and supporting your testing infrastructure that minimizes the burden on the GitHub org admins. If you need a GitHub org change please open an issue and assign it to one of the org admins (other than me) listed in: |
I think the best approach is to give WG Leads the permission to manage projects in these test infras. I will try to promote it with GitHub org admins |
What does this mean in terms of GitHub permissions? |
@gaocegege @jlewi Will admin permission for pytorch-operator repository be enough to control Travis? /cc @johnugeorge |
Can we merge it now? |
@gaocegege Sure, let's create an issue to track this problem. |
/lgtm |
/hold cancel |
/retest |
@Jeffwan @jinchihe This PR is needed to fix Travis and e2e bugs. @jinchihe Do you have any ideas how to fix: |
Yeah, I found that's OK if use old k8s cluster, but for now the k8s 1.14.x cannot be deployed any more. |
@jinchihe We use the same command in Katib to deploy cluster and it is working (https://github.com/kubeflow/katib/blob/master/test/scripts/v1beta1/create-cluster.sh#L36), but use kubectl and kustomize to deploy Katib and operators. I believe it fails on this step. |
Not sure, but guess here, need to confirm. Thanks.
|
@andreyvelich: The following test failed, say
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
152b920
to
d3f22c8
Compare
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: andreyvelich, gaocegege The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/lgtm |
Related: #292.
I was able to run unit test locally after these changes .
I changed
expectedServiceDeletions
from 0 to 1. Here: https://github.com/kubeflow/pytorch-operator/blob/master/pkg/controller.v1/pytorch/job.go#L163-L171, we don't check thatjob.Spec.CleanPodPolicy == common.CleanPodPolicyRunning
before deleting services.@jiaqianjing is that correct behaviour? In TF operator we have one loop for services and pods: https://github.com/kubeflow/tf-operator/blob/master/pkg/controller.v1/tensorflow/job.go#L162-L172, but in PyTorch operator we analyse services and pods in the different loops.
To fix this error:
Fail in goroutine after TestCopyLabelsAndAnnotation has completed
, I changed run test controller, like here: https://github.com/kubeflow/pytorch-operator/blob/master/pkg/controller.v1/pytorch/controller_test.go#L341-L347.Do you know how we can add travis check before PR submission?
/assign @gaocegege @johnugeorge
/cc @jiaqianjing