-
Notifications
You must be signed in to change notification settings - Fork 39k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
For GCE, compute the external IP by using the local metadata server #8899
Conversation
Still running e2e tests... |
@mbforbes FYI |
@@ -466,10 +468,10 @@ func (gce *GCECloud) getInstanceByName(name string) (*compute.Instance, error) { | |||
} | |||
|
|||
// NodeAddresses is an implementation of Instances.NodeAddresses. | |||
func (gce *GCECloud) NodeAddresses(instance string) ([]api.NodeAddress, error) { | |||
externalIP, err := gce.getExternalIP(instance) | |||
func (gce *GCECloud) NodeAddresses(_ string) ([]api.NodeAddress, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a TODO to rename/redefine this function across the implementations of the interface to make it clear that it is getting the current instance's Addresses.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added the TODO to the cloud provider interface file rather than here.
server. This is in many ways a revert of kubernetes#7530 but after auditing the code I found that this function is now only used to determine an address of the node where it is currently running.
b788a75
to
1dfaa93
Compare
I'm pretty uncomfortable merging this as is, as the function NodeAddresses(instance string) just throws away it's argument and doesn't actually do what the interface definition says it should. Is there a good reason to defer fixing that until later? I'd feel better removing the parameter from the interface definition, and fixing whatever breakage that results in. I don't want to be pedantic, but I guess I've come across too many TODO comments at the heart of system bugs recently. I could be convinced otherwise if the TODO comment was replaced by a P1 v1.0 issue. |
@quinton-hoole that's a perfectly fair response. I didn't have time to do a more proper fix -- this was meant to stop the bleeding in our jenkins project. I'm happy to open a P1 issue to update/change the API definition if you can find an owner to carry it through to completion. |
Lets leave the scalability test disabled to save GCE QPS quota until we can @piosz @wojtek @fgrzadkowski any objections? Until a fixed version of this PR is merged, the scalability tests are causing us to exceed our GCE QPS quota in the project in which we run continuous integration (Jenkins) e2e tests. So if we can disable the scalability tests in Jenkins while we fix this, it would help. |
e2e test result:
|
Those are all flaky tests, consistent with the kubernetes-e2e-gce-flaky job in Jenkins. Tomorrow I will mark them as such in the test definitions (as opposed to just in our Jenkins continuous integration config), so that everyone is consistently aware of what's currently flaky and what's not. |
@quinton-hoole On the other hand, I have a very strong feeling that switching them on should be task of a very high priority - they are (at least partially) driving our optimizations efforts and can uncover possible regressions. @fgrzadkowski - what do you think? |
@roberthbailey One suggestion to address Quinton's suggestion is to check if the node name matches name of the local machine (is it possible using metadata server or something else which is lightweight?) and return error if it's not. That would not break interface (in some sense) and should be easy to fix. I agree with @wojtek-t that re-enabling scalability tests is very important for us and if my suggestion doesn't make sense, we should submit this PR and submit issue to address TODO. |
OK, opened #8919 to fix this. Merging this PR in the mean time to stop the bleeding. |
For GCE, compute the external IP by using the local metadata server
Thanks @quinton-hoole - can we now enable scalability tests on Jenkins? |
@wojtek-t Yes, I'll do that now. Thanks for reminding me 👍 |
Out of curiosity, how did this break? |
Pre this PR, it just chewed up all of our API QPS quota on GCE, due to the large number of reads per node. |
This is in many ways a revert of #7530 but after auditing the code I found that this function is now only used to determine an address of the node where it is currently running.