gce-master-scale-performance tests are timing out #71215

Closed
mariantalla opened this Issue Nov 19, 2018 · 10 comments

6 participants
Contributor

mariantalla commented Nov 19, 2018

Which jobs are failing:
sig-release-master-blocking#gce-master-scale-performance

Which test(s) are failing:
The tests are timing out.

Since when has it been failing:
2018-11-16

Anything else we need to know:
The diff since the last passing test (Nov 15th) is here.

/kind failing-test

Contributor

mariantalla commented Nov 19, 2018

Assigning priority important-soon for now; the job has been passing until now, so this could be a flake. Will promote to critical-urgent if run 260 fails.

cc @AishSundar @jberkus @mortent

/priority important-soon

jberkus commented Nov 19, 2018

/sig scalability

jberkus commented Nov 19, 2018

This may just be #69600 again; the next run will determine whether we're seeing something new.

Contributor

AishSundar commented Nov 20, 2018

Member

mborsz commented Nov 20, 2018

/assign

Member

mborsz commented Nov 20, 2018

I just checked this and it's safe to ignore: it's a problem with the test itself, not a regression.

I see in the GCE activity logs that the following VMs were restarted due to hostError:

  • gce-scale-cluster-minion-group-4-ttff
  • gce-scale-cluster-minion-group-q9bw
  • gce-scale-cluster-minion-group-2-9sbz
  • gce-scale-cluster-minion-group-1-v8jc
  • gce-scale-cluster-minion-group-3-2dk9

Our test doesn't handle nodes disappearing for that reason well, and it fails when that happens. I'm working on making the test resilient to cases like that.
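For illustration only, and not the actual fix: a minimal sketch of the kind of tolerance a test could apply, assuming it can list which nodes it expected and which are currently ready. `check_nodes`, `NodeCheck`, and `max_missing` are hypothetical names and are not part of the Kubernetes test framework.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeCheck:
    """Result of comparing expected nodes against ready nodes (hypothetical type)."""
    missing: frozenset
    tolerated: bool


def check_nodes(expected, ready, max_missing=0):
    """Report which expected nodes are not ready and whether that is acceptable.

    max_missing is an assumed knob: the number of nodes allowed to be absent
    (for example, restarted after a host error) before the run is treated as
    invalid.
    """
    missing = frozenset(expected) - frozenset(ready)
    return NodeCheck(missing=missing, tolerated=len(missing) <= max_missing)


# Usage: one node is missing, and a budget of one missing node is allowed.
result = check_nodes(
    expected=["node-a", "node-b", "node-c"],
    ready=["node-a", "node-c"],
    max_missing=1,
)
print(sorted(result.missing), result.tolerated)
```

A budget like this lets a run continue past a handful of restarted VMs while still failing when a large part of the cluster is gone.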

Contributor

AishSundar commented Nov 20, 2018

jberkus commented Nov 21, 2018

OK, switching this to a non-blocker with a long-term fix.

/priority important-longterm
/kind flake
/remove-kind failing-test

jberkus commented Nov 26, 2018

ooops

/remove-priority important-soon

Member

wojtek-t commented Nov 29, 2018

This is a duplicate of: kubernetes/test-infra#10188

The problem is that Prow started the job a second time, which killed the cluster underlying this run. Without a cluster, it's not surprising that the test timed out.

wojtek-t closed this Nov 29, 2018
