Azure SD Failure metric and 404 Handling #10476
…okup Signed-off-by: David N Perkins <David.N.Perkins@ibm.com>
Signed-off-by: David N Perkins <David.N.Perkins@ibm.com>
Signed-off-by: David N Perkins <David.N.Perkins@ibm.com>
@roidelapluie Please review when you have the time.
discovery/azure/azure.go
@@ -539,7 +555,7 @@ func mapFromVMScaleSetVM(vm compute.VirtualMachineScaleSetVM, scaleSetName strin
 	}
 }

-func (client *azureClient) getNetworkInterfaceByID(ctx context.Context, networkInterfaceID string) (*network.Interface, error) {
+func (client *azureClient) getNetworkInterfaceByID(ctx context.Context, networkInterfaceID string) (*network.Interface, *autorest.DetailedError) {
Functions/methods should always return errors as the abstract error interface.
-func (client *azureClient) getNetworkInterfaceByID(ctx context.Context, networkInterfaceID string) (*network.Interface, *autorest.DetailedError) {
+func (client *azureClient) getNetworkInterfaceByID(ctx context.Context, networkInterfaceID string) (*network.Interface, error) {
I don't see any advantages to doing this and several drawbacks.
- The calling code has to use `errors.As()` to try and get a `DetailedError`, which would require another failure check.
- There's also no indication from the method's signature that a `DetailedError` would get returned. That information gets hidden from the caller. This makes code maintenance and future development harder.
I see how this would make sense if several different errors could be returned. Am I missing something else?
In Go, fallible functions return error and never concrete types. The complexities that arise from concrete error return types are, well, far too numerous to put into a GitHub PR comment 😉
> The calling code has to use errors.As()

This is standard operating procedure.

> There's also no indication from the method's signatures that a DetailedError would get returned.

It's expected that this kind of thing is captured in method documentation.
Do you have any references that go into further details?
Can we narrow down the error to ignore in getNetworkInterfaceByID? As written, this check would ignore 404s in 3 different calls.
If callers need to access specific details from the error, then you can return an error type in the function...

    case code was 404:
        return nil, &DetailedError{...}

And use errors.As at the callsite...

    var detailedErr *DetailedError
    iface, err := c.getNetworkInterfaceByID(ctx, id)
    switch {
    case err == nil:
        // handle success case
    case errors.As(err, &detailedErr):
        // handle DetailedError case
    case err != nil:
        // handle default error case
    }

Or...

    if iface, err := c.getNetworkInterfaceByID(ctx, id); err == nil {
        // handle success case
    } else if detailedErr := &(DetailedError{}); errors.As(err, &detailedErr) {
        // handle DetailedError case
    } else if err != nil {
        // handle default error case
    }
Yes, I found similar patterns when researching custom errors after your first post. I asked, because I didn't see any advantage to creating custom errors over the current implementation of just returning a DetailedError.
It’s a good idea for functions that return errors always to use the error type in their signature (as we did above) rather than a concrete type such as *MyError, to help guarantee the error is created correctly. As an example, os.Open returns an error even though, if not nil, it’s always of concrete type *os.PathError.
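One concrete reason behind this advice is the nil-interface pitfall. The following is a minimal sketch, not code from this PR: `MyError`, `lookupConcrete`, `viaConcrete`, and `viaInterface` are hypothetical names chosen for illustration. It shows that a nil pointer of a concrete error type, once converted to the `error` interface, no longer compares equal to `nil`.

```go
package main

import "fmt"

// MyError is a hypothetical concrete error type used only for illustration.
type MyError struct{ Code int }

func (e *MyError) Error() string { return fmt.Sprintf("status %d", e.Code) }

// lookupConcrete declares the concrete type in its signature.
// On success it returns a nil *MyError.
func lookupConcrete(fail bool) *MyError {
	if fail {
		return &MyError{Code: 404}
	}
	return nil
}

// viaConcrete forwards the concrete result as an error.
// The nil *MyError is converted to an interface value that holds a type,
// so the returned error is not equal to nil even on success.
func viaConcrete(fail bool) error {
	return lookupConcrete(fail)
}

// viaInterface declares the error interface in its signature
// and returns a literal nil on success.
func viaInterface(fail bool) error {
	if fail {
		return &MyError{Code: 404}
	}
	return nil
}

func main() {
	fmt.Println(viaConcrete(false) == nil)  // false: typed nil inside an interface
	fmt.Println(viaInterface(false) == nil) // true
}
```

Any caller that writes `if err != nil` after `viaConcrete` would treat the success case as a failure, which is why the signature normally uses `error`.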
Thank you for the references. They were very helpful.
I've updated this method to return the error interface with different concrete types. The 404 check is performed here instead of by the caller.
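A rough sketch of what that shape could look like, under stated assumptions: `detailedError` is a local stand-in for the SDK's error type, and `fetch`, `getInterfaceByID`, and `classify` are hypothetical helpers, not the functions in `discovery/azure/azure.go`. The lookup maps a 404 to a sentinel error and wraps everything else, so the caller only deals with the `error` interface.

```go
package main

import (
	"errors"
	"fmt"
)

// detailedError is a hypothetical stand-in for an SDK error that carries an HTTP status.
type detailedError struct {
	StatusCode int
	Message    string
}

func (e *detailedError) Error() string { return fmt.Sprintf("%d: %s", e.StatusCode, e.Message) }

// errorNotFound is a sentinel error returned when the lookup gets a 404.
var errorNotFound = errors.New("network interface does not exist")

// fetch is a hypothetical stand-in for the remote API call.
func fetch(id string) (string, error) {
	switch id {
	case "missing":
		return "", &detailedError{StatusCode: 404, Message: "not found"}
	case "broken":
		return "", &detailedError{StatusCode: 500, Message: "server error"}
	}
	return "nic-" + id, nil
}

// getInterfaceByID returns the plain error interface.
// The 404 check happens here, so callers never inspect status codes.
func getInterfaceByID(id string) (string, error) {
	nic, err := fetch(id)
	if err == nil {
		return nic, nil
	}
	var de *detailedError
	if errors.As(err, &de) && de.StatusCode == 404 {
		return "", errorNotFound
	}
	return "", fmt.Errorf("looking up %q: %w", id, err)
}

// classify shows how a caller can branch on the returned error.
func classify(id string) string {
	_, err := getInterfaceByID(id)
	switch {
	case err == nil:
		return "ok"
	case errors.Is(err, errorNotFound):
		return "skipped"
	default:
		return "failed"
	}
}

func main() {
	for _, id := range []string{"a", "missing", "broken"} {
		fmt.Println(id, classify(id))
	}
}
```

Because the non-404 error is wrapped with `%w`, a caller that still needs the status code can recover the underlying value with `errors.As`.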
discovery/azure/azure.go
if detailedErr != nil {
	if detailedErr.StatusCode == 404 {
If you need to access elements of a specific error type, use errors.As.
Fixed.
Signed-off-by: David N Perkins <David.N.Perkins@ibm.com>
We built and deployed the version in this PR. The results looked good. The happy path with no issues works fine, and when we forced an error condition, the failedCount metric worked as expected.
…erent errors Signed-off-by: David N Perkins <David.N.Perkins@ibm.com>
@peterbourgon do you want to give this a last review?
LGTM
Thanks! Error strings should not start with an uppercase letter; that is my last nit. Thanks @peterbourgon for the extra review.
discovery/azure/azure.go
@@ -561,6 +577,11 @@ func mapFromVMScaleSetVM(vm compute.VirtualMachineScaleSetVM, scaleSetName strin
 	}
 }

+var errorNotFound = errors.New("Network interface does not exist")
-var errorNotFound = errors.New("Network interface does not exist")
+var errorNotFound = errors.New("network interface does not exist")
Fixed.
Signed-off-by: David N Perkins <David.N.Perkins@ibm.com>
Thanks!!
* For Azure sd, added failure counter and skipping of 404's from Nic lookup Signed-off-by: David N Perkins <David.N.Perkins@ibm.com>
This commit introduces a new metric to count the number of failed requests to Linode's API when using Linode SD. Resolves prometheus#10672, inspired by prometheus#10476. _Note_: this doesn't count failures when polling the `/account/events` endpoint, as a `401` there is how we determine if the supplied token has the needed API scopes to do event polling vs full refreshes each interval. Signed-off-by: TJ Hoplock <t.hoplock@gmail.com>

For issue 10455. In Azure service discovery:
- `404` responses log and continue instead of returning an error

No existing unit tests, so needs manual testing.
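The log-and-continue behaviour described above can be sketched roughly as follows. This is an illustration under assumptions, not the PR's code: `refresh` and `demoLookup` are hypothetical, a plain `int` stands in for the failure counter metric, and the sketch assumes that any error other than not-found aborts the refresh.

```go
package main

import (
	"errors"
	"fmt"
	"log"
)

// errNotFound is a hypothetical sentinel for a 404 from the lookup.
var errNotFound = errors.New("network interface does not exist")

// refresh resolves each ID with lookup. IDs that are not found are logged
// and skipped. Any other error increments failures and aborts the refresh.
func refresh(ids []string, lookup func(string) (string, error), failures *int) ([]string, error) {
	var targets []string
	for _, id := range ids {
		addr, err := lookup(id)
		if errors.Is(err, errNotFound) {
			log.Printf("skipping %s: %v", id, err)
			continue
		}
		if err != nil {
			*failures++
			return nil, fmt.Errorf("refresh failed for %s: %w", id, err)
		}
		targets = append(targets, addr)
	}
	return targets, nil
}

// demoLookup is a hypothetical lookup used to exercise both paths.
func demoLookup(id string) (string, error) {
	switch id {
	case "gone":
		return "", errNotFound
	case "down":
		return "", errors.New("service unavailable")
	}
	return id + ":9100", nil
}

func main() {
	failures := 0

	// "gone" is skipped, the other two IDs become targets.
	targets, err := refresh([]string{"vm1", "gone", "vm2"}, demoLookup, &failures)
	fmt.Println(targets, err, failures)

	// "down" aborts the refresh and bumps the failure counter.
	_, err = refresh([]string{"vm1", "down"}, demoLookup, &failures)
	fmt.Println(err, failures)
}
```

Since the page notes there are no existing unit tests, passing the lookup in as a function is one way such a loop could be tested without a live API.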