Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

For GCE, compute the external IP by using the local metadata server #8899

Merged
1 commit merged into from
May 28, 2015

Conversation

roberthbailey
Copy link
Contributor

This is in many ways a revert of #7530 but after auditing the code I found that this function is now only used to determine an address of the node where it is currently running.

@roberthbailey
Copy link
Contributor Author

Still running e2e tests...

@roberthbailey
Copy link
Contributor Author

@mbforbes FYI

@@ -466,10 +468,10 @@ func (gce *GCECloud) getInstanceByName(name string) (*compute.Instance, error) {
}

// NodeAddresses is an implementation of Instances.NodeAddresses.
func (gce *GCECloud) NodeAddresses(instance string) ([]api.NodeAddress, error) {
externalIP, err := gce.getExternalIP(instance)
func (gce *GCECloud) NodeAddresses(_ string) ([]api.NodeAddress, error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a TODO to rename/redefine this function across the implementations of the interface to make it clear that it is getting the current instance's Addresses.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added the TODO to the cloud provider interface file rather than here.

server. This is in many ways a revert of kubernetes#7530 but after auditing
the code I found that this function is now only used to determine
an address of the node where it is currently running.
@ghost
Copy link

ghost commented May 28, 2015

I'm pretty uncomfortable merging this as is, as the function NodeAddresses(instance string) just throws away it's argument and doesn't actually do what the interface definition says it should. Is there a good reason to defer fixing that until later? I'd feel better removing the parameter from the interface definition, and fixing whatever breakage that results in. I don't want to be pedantic, but I guess I've come across too many TODO comments at the heart of system bugs recently. I could be convinced otherwise if the TODO comment was replaced by a P1 v1.0 issue.

@roberthbailey
Copy link
Contributor Author

@quinton-hoole that's a perfectly fair response. I didn't have time to do a more proper fix -- this was meant to stop the bleeding in our jenkins project. I'm happy to open a P1 issue to update/change the API definition if you can find an owner to carry it through to completion.

@ghost
Copy link

ghost commented May 28, 2015

Lets leave the scalability test disabled to save GCE QPS quota until we can
fix this PR properly (hopefully within a day or two), unless that
significantly inconveniences someone.

@piosz @wojtek @fgrzadkowski any objections?

Until a fixed version of this PR is merged, the scalability tests are causing us to exceed our GCE QPS quota in the project in which we run continuous integration (Jenkins) e2e tests. So if we can disable the scalability tests in Jenkins while we fix this, it would help.

@roberthbailey
Copy link
Contributor Author

e2e test result:

Summarizing 5 Failures:

[Fail] Density [AfterEach] [Performance suite] should allow starting 30 pods per node 
/go/src/github.com/GoogleCloudPlatform/kubernetes/_output/dockerized/go/src/github.com/GoogleCloudPlatform/kubernetes/test/e2e/density.go:84

[Fail] PD [It] should schedule a pod w/ a RW PD, remove it, then schedule it on another host 
/go/src/github.com/GoogleCloudPlatform/kubernetes/_output/dockerized/go/src/github.com/GoogleCloudPlatform/kubernetes/test/e2e/pd.go:105

[Timeout...] Networking [It] should function for intra-pod communication 
/go/src/github.com/GoogleCloudPlatform/kubernetes/_output/dockerized/go/src/github.com/GoogleCloudPlatform/kubernetes/test/e2e/networking.go:244

[Panic!] Services [BeforeEach] should be able to change the type and nodeport settings of a service 
/usr/src/go/src/runtime/asm_amd64.s:401

[Fail] Density [AfterEach] [Performance suite] should allow starting 3 pods per node 
/go/src/github.com/GoogleCloudPlatform/kubernetes/_output/dockerized/go/src/github.com/GoogleCloudPlatform/kubernetes/test/e2e/density.go:84

Ran 52 of 60 Specs in 3513.927 seconds
FAIL! -- 47 Passed | 5 Failed | 1 Pending | 7 Skipped --- FAIL: TestE2E (3581.17s)
FAIL

@ghost
Copy link

ghost commented May 28, 2015

Those are all flaky tests, consistent with the kubernetes-e2e-gce-flaky job in Jenkins.

Tomorrow I will mark them as such in the test definitions (as opposed to just in our Jenkins continuous integration config), so that everyone is consistently aware of what's currently flaky and what's not.

@wojtek-t
Copy link
Member

@quinton-hoole
I completely agree that we should have scalability tests switched off until this PR is merged.

On the other hand, I have a very strong feeling that switching them on should be task of a very high priority - they are (at least partially) driving our optimizations efforts and can uncover possible regressions.
So I think that merging this PR as is (since your comment is not about correctness of this change but about maintainability/readability/...) and creating a v1.0 issue to fix it is what we really should do here.

@fgrzadkowski - what do you think?

@fgrzadkowski
Copy link
Contributor

@roberthbailey One suggestion to address Quinton's suggestion is to check if the node name matches name of the local machine (is it possible using metadata server or something else which is lightweight?) and return error if it's not. That would not break interface (in some sense) and should be easy to fix.

I agree with @wojtek-t that re-enabling scalability tests is very important for us and if my suggestion doesn't make sense, we should submit this PR and submit issue to address TODO.

@ghost ghost mentioned this pull request May 28, 2015
@ghost
Copy link

ghost commented May 28, 2015

OK, opened #8919 to fix this. Merging this PR in the mean time to stop the bleeding.

ghost pushed a commit that referenced this pull request May 28, 2015
For GCE, compute the external IP by using the local metadata server
@ghost ghost merged commit 9b67435 into kubernetes:master May 28, 2015
@wojtek-t
Copy link
Member

Thanks @quinton-hoole - can we now enable scalability tests on Jenkins?

@ghost
Copy link

ghost commented May 28, 2015

@wojtek-t Yes, I'll do that now. Thanks for reminding me 👍

@mbforbes
Copy link
Contributor

Out of curiosity, how did this break? NatIP no longer correct?

@ghost
Copy link

ghost commented May 28, 2015

Pre this PR, it just chewed up all of our API QPS quota on GCE, due to the large number of reads per node.

@roberthbailey roberthbailey unassigned ghost Aug 12, 2015
@roberthbailey roberthbailey deleted the ip-from-metadata branch August 20, 2015 21:36
This pull request was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants