[Test Failure] TFJob test failure; no module named py #1218
Comments
For #1200 it looks like the test passed on retry which is very weird. |
Looking into this. |
/assign @lluunn Thanks |
Found pod: kubeflow-test-infra kubeflow-presubmit-kubeflow-e2e-gke-1224-e8f6cef-2600-21af-237888385 Logs: describe pod Environment:
cannot exec into it: error: cannot exec into a container in a completed pod; current phase is Failed |
However, no
|
https://github.com/kubeflow/tf-operator/blob/master/py/__init__.py It's not deleted |
It's weird:
|
So |
|
It's there for this one http://testing-argo.kubeflow.org/workflows/kubeflow-test-infra/kubeflow-presubmit-kubeflow-e2e-gke-1224-1296b1b-2610-cc8a?tab=workflow So it looks like the bootstrapper deleted it somehow..? |
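A minimal sketch of the check discussed above: whether the `py` directory and its `__init__.py` are present in a checkout. The function name `check_package`, the status strings, and the temporary directory layout are hypothetical and are not taken from the test infrastructure; the real source root on the test volume would have to be substituted.

```python
import os
import tempfile


def check_package(src_root, package="py"):
    """Report whether `package` under `src_root` looks like a regular package.

    Hypothetical helper: returns "missing_dir", "missing_init", or "ok".
    """
    pkg_dir = os.path.join(src_root, package)
    if not os.path.isdir(pkg_dir):
        # The whole directory is gone (e.g. deleted by some other step).
        return "missing_dir"
    if not os.path.isfile(os.path.join(pkg_dir, "__init__.py")):
        # The directory exists but has no __init__.py marker file.
        return "missing_init"
    return "ok"


def demo():
    """Exercise the three cases against a throwaway directory."""
    with tempfile.TemporaryDirectory() as root:
        results = [check_package(root)]  # nothing checked out yet
        os.mkdir(os.path.join(root, "py"))
        results.append(check_package(root))  # directory without __init__.py
        open(os.path.join(root, "py", "__init__.py"), "w").close()
        results.append(check_package(root))  # complete package
    return results
```

Running a check like this at the start of each workflow step would narrow down which step removes the directory.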
#1218 (comment)
So something else is deleting it.. |
testing repo also got deleted. |
What do you mean? |
@lluunn Nice work! |
That one is not merged, it's closed. |
Seems fixed now |
…1218)
* GoogleCloudPlatform/kubeflow-distribution#33 is tracking GCP blueprints on private GKE with VPC-SC
  * This PR doesn't fully enable that but it includes a lot of necessary changes.
* cluster-private-patch.yaml is a cluster patch that turns on a lot of settings to deploy GKE with private GKE
  * For ease of use we make the master publicly accessible anywhere; users could configure that behavior if desired using patch overlays.
* Use kpt setters to name all the networking resources (firewall rules, networks, etc...)
  * This ensures the names are unique based on the KF deployment name and won't conflict with existing rules.
  * The setters also ensure that the references get set correctly; e.g. the firewall rules correctly refer to the newly created network.
* Add a CNRM resource to enable CloudDNS.
  * Per GoogleCloudPlatform/kubeflow-distribution#31 we should probably use CNRM and not AnthosCLI to enable all required services.
* Add a kpt setter to control firewall rule logging
  * Enabling firewall rule logging can be useful to debug why connections are blocked. Enable logging on firewall rules.
* Add an extra firewall rule for ISTIO
  * Per https://istio.io/docs/setup/platform-setup/gke/ we need to manually create an additional firewall rule to allow traffic to the ISTIO pilot webhook port.
* Add a NAT to allow outbound internet egress
  * Egress is still blocked by firewall rules
  * Per kbueflow/gcp-blueprints#34 this was an attempt to make it possible to pull images from DockerHub and Quay.IO. This was partially successful; pulling from DockerHub works but for Quay.IO the firewall rules are still blocking required connections.
* Fix the v3 version of the cert-manager package.
  * kubeflow#1134 moved the kubeflow issuer into its own package to avoid race conditions
  * That refactor means that the v3 packages no longer included the actual cert-manager resources
  * This PR fixes that by having the v3 package pull in the base package
TFJob test is failing with
https://k8s-gubernator.appspot.com/build/kubernetes-jenkins/pr-logs/pull/kubeflow_kubeflow/1200/kubeflow-presubmit/2586/
Seems completely unrelated to that PR.
Need to check whether it's also failing @ HEAD.