Increase the timeout on apiserver requests to fetch node stats. #4739

Merged (1 commit) on Feb 23, 2015
18 changes: 15 additions & 3 deletions test/e2e/cadvisor.go
@@ -25,6 +25,12 @@ import (
 	. "github.com/onsi/ginkgo"
 )

+const (
+	timeout       = 1 * time.Minute
+	maxRetries    = 10
+	sleepDuration = time.Minute
+)
+
 var _ = Describe("Cadvisor", func() {
 	var c *client.Client

@@ -44,21 +50,27 @@ func CheckCadvisorHealthOnAllNodes(c *client.Client, timeout time.Duration) {
 	nodeList, err := c.Nodes().List()
 	expectNoError(err)
 	var errors []error
-	for start := time.Now(); time.Since(start) < timeout; time.Sleep(5 * time.Second) {
+	retries := maxRetries
+	for {
 		errors = []error{}
 		for _, node := range nodeList.Items {
 			// cadvisor is not accessible directly unless its port (4194 by default) is exposed.
 			// Here, we access '/stats/' REST endpoint on the kubelet which polls cadvisor internally.
 			statsResource := fmt.Sprintf("api/v1beta1/proxy/minions/%s/stats/", node.Name)
 			By(fmt.Sprintf("Querying stats from node %s using url %s", node.Name, statsResource))
-			_, err = c.Get().AbsPath(statsResource).Timeout(1 * time.Second).Do().Raw()
+			_, err = c.Get().AbsPath(statsResource).Timeout(timeout).Do().Raw()
Member

Am I reading this right? Each timeout is a minute, across 2 nodes, for 10 retries, so you want a budget of 20 minutes for this test to fail?

Contributor Author

Yes. That is the intent. At this point, I don't even know the reason for the failure. We can dial this back once the test passes successfully.

 			if err != nil {
 				errors = append(errors, err)
 			}
 		}
 		if len(errors) == 0 {
 			return
 		}
+		if retries--; retries <= 0 {
+			break
+		}
+		Logf("failed to retrieve kubelet stats -\n %v", errors)
+		time.Sleep(sleepDuration)
 	}
-	Failf("Timed out after %v waiting for cadvisor to be healthy on all nodes. Errors:\n%v", timeout, errors)
+	Failf("Failed after retrying %d times for cadvisor to be healthy on all nodes. Errors:\n%v", maxRetries, errors)
 }