[Failing Test] ci-cluster-api-provider-gcp-make-conformance-v1alpha3-k8s-ci-artifacts #95729

Closed
thejoycekung opened this issue Oct 20, 2020 · 6 comments · Fixed by kubernetes-sigs/cluster-api-provider-gcp#318
Assignees
Labels
kind/failing-test Categorizes issue or PR as related to a consistently or frequently failing test. sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. triage/accepted Indicates an issue or PR is ready to be actively worked on.

Comments

@thejoycekung
Contributor

thejoycekung commented Oct 20, 2020

Which jobs are failing:
ci-cluster-api-provider-gcp-make-conformance-v1alpha3-k8s-ci-artifacts

Which test(s) are failing:
Overall

Since when has it been failing:
At least October 5

Testgrid link:
https://testgrid.k8s.io/sig-release-master-informing#capg-conformance-v1alpha3-k8s-master

Reason for failure:
There are a few different error messages. I'm not sure which one is the root cause of the failure, so I'll include all of them.

# Get kubeconfig and store it locally.
kubectl get secrets test1-kubeconfig -o json | jq -r .data.value | base64 --decode > ./kubeconfig
timeout 15m bash -c "while ! kubectl --kubeconfig=./kubeconfig get nodes | grep master; do sleep 1; done"
Error from server (InternalError): an error on the server ("") has prevented the request from succeeding
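One way to narrow down the empty server error above is to confirm that the kubeconfig was actually written before polling with it. The following is a minimal sketch, not the job's actual script: `fetch_kubeconfig` is a hypothetical helper name, and it swaps the `jq` step for kubectl's built-in `-o jsonpath` output so that a missing `.data.value` yields an empty file rather than the literal string `null`.

```shell
# Hypothetical helper: write the kubeconfig stored in a secret to a file and
# fail early if nothing usable was written.
fetch_kubeconfig() {
  local secret="$1" out="$2"

  # A missing secret or missing .data.value produces empty output here.
  kubectl get secret "$secret" -o jsonpath='{.data.value}' \
    | base64 --decode > "$out"

  # -s is true only when the file exists and is non-empty.
  if [ ! -s "$out" ]; then
    echo "kubeconfig from secret ${secret} is empty or missing" >&2
    return 1
  fi
}

# Usage (assumed names taken from the log above):
#   fetch_kubeconfig test1-kubeconfig ./kubeconfig || exit 1
```

If the helper fails, the problem lies in the secret or its decoding rather than in the workload cluster's API server.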

There's also a long string of error messages from Gazelle, like:

gazelle: finding module path for import titanic.biz/bar: exit status 1: go: finding module for package titanic.biz/bar
cannot find module providing package titanic.biz/bar: module titanic.biz/bar: reading https://proxy.golang.org/titanic.biz/bar/@v/list: 410 Gone
	server response: not found: titanic.biz/bar@latest: unrecognized import path "titanic.biz/bar": parsing titanic.biz/bar: XML syntax error on line 1: expected attribute name in element

or

gazelle: finding module path for import lib: exit status 1: package lib is not in GOROOT (/bazel-scratch/.cache/bazel/_bazel_root/cae228f2a89ef5ee47c2085e441a3561/external/go_sdk/src/lib)
gazelle: finding module path for import nosuchpkg: exit status 1: package nosuchpkg is not in GOROOT (/bazel-scratch/.cache/bazel/_bazel_root/cae228f2a89ef5ee47c2085e441a3561/external/go_sdk/src/nosuchpkg)

or

gazelle: finding module path for import golang.org/x/tools/internal/lsp/circular/double/one: exit status 1: go: finding module for package golang.org/x/tools/internal/lsp/circular/double/one
module golang.org/x/tools@latest found (v0.0.0-20201019175715-b894a3290fff), but does not contain package golang.org/x/tools/internal/lsp/circular/double/one

Also looks like it's having trouble copying out the results:

INFO: Build completed successfully, 3323 total actions
INFO: Build completed successfully, 3323 total actions
mkdir -p /home/prow/go/src/k8s.io/kubernetes/_output/bin/
+ cp /home/prow/go/src/k8s.io/kubernetes/bazel-bin/test/e2e/e2e.test /home/prow/go/src/k8s.io/kubernetes/_output/bin/e2e.test
cp: cannot stat '/home/prow/go/src/k8s.io/kubernetes/bazel-bin/test/e2e/e2e.test': No such file or directory
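The build reports success but the `cp` source path does not exist, so one possibility (not confirmed in this issue) is that the binary was written somewhere else under `bazel-bin`. The sketch below shows a way to check that: `copy_built_binary` is a hypothetical helper that searches a directory tree for the file instead of assuming a fixed path, and reports which path it used.

```shell
# Hypothetical helper: find a built binary by name under a search root and copy
# it to a destination directory. Prints the path that was copied.
copy_built_binary() {
  local search_root="$1" name="$2" dest_dir="$3"
  local found

  # -L follows symlinks, since bazel-bin is normally a symlink into the cache.
  found="$(find -L "$search_root" -type f -name "$name" 2>/dev/null | head -n 1)"
  if [ -z "$found" ]; then
    echo "no ${name} found under ${search_root}" >&2
    return 1
  fi

  mkdir -p "$dest_dir"
  cp "$found" "$dest_dir/$name"
  echo "$found"
}

# Usage (paths taken from the log above):
#   copy_built_binary bazel-bin/test/e2e e2e.test _output/bin
```

The printed path makes it visible in the job log when the binary's location differs from what the script expected.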

Also looks like it's having trouble finding kubeconfig:

+ kubectl --kubeconfig=/home/prow/go/src/k8s.io/kubernetes/kubeconfig version
error: stat /home/prow/go/src/k8s.io/kubernetes/kubeconfig: no such file or directory

And finally, gcloud is having trouble with access configs?

+ gcloud compute --project k8s-jkns-gce-slow-1-4 instances add-access-config --zone us-east4-a test1-md-0-p7bzr
ERROR: (gcloud.compute.instances.add-access-config) Could not fetch resource:
 - At most one access config currently supported.
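The "At most one access config" message suggests the instance already had an access config when the script tried to add one. A minimal sketch of a guard follows, under these assumptions: `ensure_access_config` is a hypothetical helper name, and a non-empty `natIP` on the first network interface is treated as "already configured", which may not cover every case.

```shell
# Hypothetical helper: add an access config only if the first network interface
# does not already report an external (NAT) IP.
ensure_access_config() {
  local project="$1" zone="$2" instance="$3"
  local nat_ip

  nat_ip="$(gcloud compute instances describe "$instance" \
    --project "$project" --zone "$zone" \
    --format='value(networkInterfaces[0].accessConfigs[0].natIP)')" || return 1

  if [ -n "$nat_ip" ]; then
    echo "access config already present on ${instance} (${nat_ip}); skipping"
    return 0
  fi

  gcloud compute instances add-access-config "$instance" \
    --project "$project" --zone "$zone"
}

# Usage (values taken from the log above):
#   ensure_access_config k8s-jkns-gce-slow-1-4 us-east4-a test1-md-0-p7bzr
```

With a guard like this, the error would only appear when the add itself fails, not when the instance is already reachable.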

There are some errors during cleanup/teardown but I think those are checking to make sure the resource is properly removed, e.g.:

+ timeout 600 kubectl delete cluster test1
cluster.cluster.x-k8s.io "test1" deleted
+ timeout 600 kubectl wait --for=delete cluster/test1
Error from server (NotFound): clusters.cluster.x-k8s.io "test1" not found
+ true
+ make kind-reset
make: *** No rule to make target 'kind-reset'.  Stop.
+ true
++ go env GOPATH
+ cd /home/prow/go/src/k8s.io/kubernetes
+ rm -f _output/bin/e2e.test
+ gcloud compute forwarding-rules delete --project k8s-jkns-gce-slow-1-4 --global test1-apiserver --quiet
ERROR: (gcloud.compute.forwarding-rules.delete) Could not fetch resource:
 - The resource 'projects/k8s-jkns-gce-slow-1-4/global/forwardingRules/test1-apiserver' was not found
+ true
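The `+ true` lines in the trace are consistent with teardown commands being followed by `|| true`, which would explain why these errors do not fail the job. As an illustration only, the same pattern can be wrapped in a small helper so that ignored failures stay easy to spot in the log; `best_effort` is a hypothetical name, not something from the job's script.

```shell
# Hypothetical helper: run a teardown command, log a failure to stderr, and
# always return success so the rest of the cleanup continues.
best_effort() {
  if ! "$@"; then
    echo "ignored failure during cleanup: $*" >&2
  fi
  return 0
}

# Usage (commands taken from the log above):
#   best_effort timeout 600 kubectl wait --for=delete cluster/test1
#   best_effort gcloud compute forwarding-rules delete --project k8s-jkns-gce-slow-1-4 \
#     --global test1-apiserver --quiet
```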

Anything else we need to know:
Example spyglass link: https://prow.k8s.io/view/gcs/kubernetes-jenkins/logs/ci-cluster-api-provider-gcp-make-conformance-v1alpha3-k8s-ci-artifacts/1318486842399526912

/sig cluster-lifecycle
/cc @kubernetes/ci-signal

@thejoycekung thejoycekung added the kind/failing-test Categorizes issue or PR as related to a consistently or frequently failing test. label Oct 20, 2020
@k8s-ci-robot k8s-ci-robot added sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Oct 20, 2020
@thejoycekung thejoycekung added this to New (no response yet) in CI Signal team (SIG Release) Oct 20, 2020
@neolit123
Member

neolit123 commented Oct 20, 2020

i've notified the #cluster-api channel on k8s slack about this.
thanks for the report @thejoycekung

@cpanato
Member

cpanato commented Oct 20, 2020

/assign
/triage accepted

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Oct 20, 2020
@thejoycekung thejoycekung moved this from New (no response yet) to Under investigation (prioritized) in CI Signal team (SIG Release) Oct 20, 2020
CI Signal team (SIG Release) automation moved this from Under investigation (prioritized) to Observing (observe test failure/flake before marking as resolved) Oct 22, 2020
@cpanato
Member

cpanato commented Oct 23, 2020

Much better: https://k8s-testgrid.appspot.com/sig-cluster-lifecycle-cluster-api-provider-gcp#capg-conformance-v1alpha3-k8s-master still has one failure, but I will keep observing over the following days.

@neolit123
Member

neolit123 commented Oct 23, 2020 via email

@thejoycekung
Contributor Author

@cpanato Thanks for your help! I think the failure is unrelated; that test is failing/flaking across several other jobs, so I logged another issue for it.

@cpanato
Member

cpanato commented Oct 23, 2020

I opened a PR for a fix proposal for that test: #95831

@thejoycekung thejoycekung moved this from Observing (observe test failure/flake before marking as resolved) to Resolved in CI Signal team (SIG Release) Nov 18, 2020
@thejoycekung thejoycekung moved this from Resolved to Resolved (2+ weeks) in CI Signal team (SIG Release) Dec 1, 2020
Projects
CI Signal team (SIG Release)
  
Resolved (2+ weeks)

4 participants