fix(cluster): add https req timeout & show time left waiting for healthz #427

Merged
metral merged 1 commit into master from metral/fix-healthz-timeout on Aug 24, 2020



@metral metral commented Aug 18, 2020

Proposed changes

fix(cluster): add https req timeout & show time left waiting for healthz

Related issues (optional)

Fixes #423


metral commented Aug 19, 2020

Tested using the following setup, with the client machine neither going through a bastion nor sitting in the same VPC as the cluster.

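import * as awsx from "@pulumi/awsx";
import * as eks from "@pulumi/eks";

// VPC for the test cluster.
const vpc = new awsx.ec2.Vpc("myvpc", {
    tags: { "Name": "myvpc" },
});

// Private-only API endpoint: not reachable from a client outside the VPC without a bastion.
const cluster2 = new eks.Cluster("mycluster", {
    vpcId: vpc.id,
    publicSubnetIds: vpc.publicSubnetIds,
    endpointPrivateAccess: true,
    endpointPublicAccess: false,
});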

See the video demo below, which shows the new request timeout and the updated time-left information (the video is sped up midway for demo purposes). When the update times out and eventually errors, as expected in this setup without a bastion, it does so shortly after the 5 min endpoint test window, versus the 15+ min it currently takes.

https://drive.google.com/file/d/1nerx0eBXMCi1dWrI-gUimmO6nRg-WVRM/view

cc @clstokes ^


@lblackstone lblackstone left a comment

Looks good other than the UX comment

nodejs/eks/cluster.ts (outdated, resolved)

metral commented Aug 20, 2020

@clstokes any feedback?


clstokes commented Aug 20, 2020 via email


metral commented Aug 21, 2020

> Does reqTimeoutMilliseconds = 1000; mean it will time out after 1 second?

Yes: a 1 second timeout for the HTTP request itself, and 5 seconds between retries.
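
A single health probe with a per-request timeout might look like the sketch below. It is not the code merged in this PR: `healthzUrl` and `probeHealthz` are hypothetical names, and the `/healthz` path and `rejectUnauthorized: false` are assumptions made only for illustration. Note that Node's `timeout` option is a socket idle timeout, so it bounds how long the request may sit without activity rather than its total duration.

```typescript
import * as https from "https";

// Hypothetical helper: builds the health-check URL from a cluster endpoint,
// tolerating a trailing slash. The "/healthz" path is an assumption.
function healthzUrl(endpoint: string): string {
    return `${endpoint.replace(/\/+$/, "")}/healthz`;
}

// Hypothetical probe: resolves true on HTTP 200, false on any error or when
// the socket stays idle longer than timeoutMs.
function probeHealthz(endpoint: string, timeoutMs: number): Promise<boolean> {
    return new Promise<boolean>((resolve) => {
        const req = https.get(
            healthzUrl(endpoint),
            // Assumption for this sketch only: skip certificate verification.
            // Real code should trust the cluster's CA instead.
            { timeout: timeoutMs, rejectUnauthorized: false },
            (res) => {
                res.resume(); // discard the body; only the status matters
                resolve(res.statusCode === 200);
            },
        );
        // Node only emits "timeout"; the request must be destroyed manually,
        // which then surfaces as an "error" event.
        req.on("timeout", () => req.destroy());
        req.on("error", () => resolve(false));
    });
}
```

Calling `probeHealthz(endpoint, 1000)` would correspond to the 1 second request timeout described above; the 5 second pause between retries belongs to whatever loop calls the probe.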

> How does this behave when deploying from the U.S. to a region on the other side of the world (or vice versa)?

I ran tests against each region below, with neither endpointPrivateAccess nor endpointPublicAccess set, from a client in us-east-2 (Ohio) going over the public Internet.

  • Brazil - endpoint ready on the first attempt in both runs
  • Frankfurt - endpoint ready on the first attempt in both runs
  • Ireland - endpoint ready on the first attempt in the first run, and after 10 sec in the second run
  • Oregon - endpoint ready after ~1 min in two runs, and after 10 sec in a third run
  • Ohio - endpoint ready on the first attempt in both runs

There don't seem to be any issues or delays with the endpoint check for local or international regions.

If the clusters are inaccessible, e.g. private access on the cluster with no bastion, the check times out and quits shortly after the 5 min window.
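
The behavior described in this thread (a bounded request, a fixed pause between retries, an overall window, and a time-left message) can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in nodejs/eks/cluster.ts: `PollOptions`, `Probe`, `timeLeftSeconds`, `maxAttempts`, and `waitForHealthz` are all hypothetical names, and the probe is injected so the loop does not depend on any particular HTTP client.

```typescript
// All names here are hypothetical; this is not the code from nodejs/eks/cluster.ts.
interface PollOptions {
    windowMs: number;        // total time allowed for the endpoint to become ready
    reqTimeoutMs: number;    // upper bound passed to each individual request
    retryIntervalMs: number; // pause between attempts
}

// A probe resolves true when the endpoint answers within timeoutMs.
type Probe = (timeoutMs: number) => Promise<boolean>;

// Whole seconds remaining until the deadline, never negative.
function timeLeftSeconds(deadlineMs: number, nowMs: number): number {
    return Math.max(0, Math.ceil((deadlineMs - nowMs) / 1000));
}

// Number of attempts started inside the window if every attempt uses the
// full request timeout and is followed by the full retry interval.
function maxAttempts(opts: PollOptions): number {
    return Math.ceil(opts.windowMs / (opts.reqTimeoutMs + opts.retryIntervalMs));
}

// Polls until the probe succeeds or the window closes.
async function waitForHealthz(
    probe: Probe,
    opts: PollOptions,
    sleep: (ms: number) => Promise<void> = (ms) =>
        new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<boolean> {
    const deadline = Date.now() + opts.windowMs;
    while (Date.now() < deadline) {
        if (await probe(opts.reqTimeoutMs)) {
            return true;
        }
        console.log(`Cluster endpoint not ready; ${timeLeftSeconds(deadline, Date.now())}s left`);
        await sleep(opts.retryIntervalMs);
    }
    return false; // window exhausted; the caller decides how to report the failure
}
```

With the values mentioned in this thread (a 300 second window, a 1 second request timeout, and 5 seconds between retries), `maxAttempts` works out to 50. In this sketch the last attempt can begin just before the deadline, so the loop may overrun the window by roughly one request timeout plus one retry interval, which is in line with quitting shortly after the 5 min mark rather than after 15+ min.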

@metral metral merged commit 1631f97 into master Aug 24, 2020
@pulumi-bot pulumi-bot deleted the metral/fix-healthz-timeout branch August 24, 2020 16:11
Successfully merging this pull request may close these issues.

healthz check can take much longer than 300 seconds if the cluster is unreachable