Fix e2e test: gang scheduling. #835
Comments
@thandayuthapani , please help to check what happened :) |
@k82cn It is passing in my local test setup |
I'm able to find the PodGroup Unschedulable event in my local test setup, but in Travis the test times out at this step because it cannot find that event, or the event is not being generated. I will check what the problem is. |
@thandayuthapani , we found a similar issue in Volcano, which was caused by a change in e2e; refer to volcano-sh/volcano@c53779d#diff-8349006db2c242fd7424e1dfb3295840R430 for more detail. |
@k82cn The test case fails because the Unschedulable event is not generated, so the test times out waiting for it. The event is not generated because the fields in podGroupStatus of the PodGroup object are reset to their nil values when the status updater function is called. StatusUpdater uses the K8s Update API call; when that is replaced with an UpdateStatus call, the test case passes. There seems to be some problem with the Update call, where the status data gets reset to its nil value. |
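For readers following along, here is a minimal sketch of one way Update and UpdateStatus can behave differently. It assumes the PodGroup resource has the status subresource enabled, which this thread does not confirm; in that case a write to the main resource ignores the status stanza, so a status set through Update never reaches storage. Everything in it (`fakeStore`, the field names, the `Phase` values) is a hypothetical toy model, not kube-batch code or the client-go API.

```go
package main

import "fmt"

// Toy stand-ins for the real PodGroup types; field names are illustrative only.
type PodGroupSpec struct{ MinMember int }

type PodGroupStatus struct {
	Phase   string
	Running int
}

type PodGroup struct {
	Name   string
	Spec   PodGroupSpec
	Status PodGroupStatus
}

// fakeStore is a hypothetical in-memory model of an API server for a
// resource whose status subresource is enabled (an assumption).
type fakeStore struct{ objects map[string]PodGroup }

func newFakeStore() *fakeStore {
	return &fakeStore{objects: map[string]PodGroup{}}
}

// Create stores the object with an empty status, as the main endpoint
// does not accept status when the subresource is enabled.
func (s *fakeStore) Create(pg PodGroup) {
	pg.Status = PodGroupStatus{}
	s.objects[pg.Name] = pg
}

// Update models a write to the main resource: the spec is persisted,
// the status in the request is ignored.
func (s *fakeStore) Update(pg PodGroup) PodGroup {
	stored := s.objects[pg.Name]
	stored.Spec = pg.Spec
	s.objects[pg.Name] = stored
	return stored
}

// UpdateStatus models a write to the status subresource: the status is
// persisted, the spec in the request is ignored.
func (s *fakeStore) UpdateStatus(pg PodGroup) PodGroup {
	stored := s.objects[pg.Name]
	stored.Status = pg.Status
	s.objects[pg.Name] = stored
	return stored
}

func main() {
	store := newFakeStore()
	store.Create(PodGroup{Name: "pg-1", Spec: PodGroupSpec{MinMember: 3}})

	// The status updater wants to record that the group is pending.
	desired := PodGroup{
		Name:   "pg-1",
		Spec:   PodGroupSpec{MinMember: 3},
		Status: PodGroupStatus{Phase: "Pending"},
	}

	viaUpdate := store.Update(desired)
	fmt.Printf("after Update:       phase=%q\n", viaUpdate.Status.Phase)

	viaUpdateStatus := store.UpdateStatus(desired)
	fmt.Printf("after UpdateStatus: phase=%q\n", viaUpdateStatus.Status.Phase)
}
```

In this model the object returned by Update still carries the zero-value status, while UpdateStatus persists it. Whether this is the actual root cause here still needs to be checked against the cluster the test runs on.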
Can you help check the history to see why it started failing recently? |
It used to pass with my local DIND v1.13 setup, but once I cleaned that setup and brought up a new DIND v1.13 setup, I started facing the same problem as CI. I think the problem appears after DIND pulls new images for the Kubernetes components: my old DIND setup did not show the problem with the same code, while the new DIND setup (with new images pulled for the Kubernetes components) does, with no change in code. The DIND version did not change; only the images pulled by the DIND cluster for the Kubernetes components are new. |
but volcano-sh/kube-batch seems fine without this fix. |
That test case has been skipped in volcano/kube-batch |
xref https://travis-ci.com/volcano-sh/volcano/jobs/198124601 , it has been fixed in the commit that I mentioned above. |
In volcano-sh/volcano we are using KIND to bring up the k8s cluster, but in kubernetes-sigs/kube-batch we use DIND to bring up the k8s cluster.
then try kind; honestly, I'm uncomfortable with the fix if we do not know the root cause :)
With a local KIND setup, the gang scheduling and statement test cases are passing. Should I change Kube-Batch to use KIND in CI now?
ok, let's try kind :) |
Is this a BUG REPORT or FEATURE REQUEST?:
/kind bug
What happened:
After the revert of batch job patches #806 , the gang scheduling test case is failing; we need to investigate and fix it.