New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
pkg/oauth2: consider session cancellation #182
pkg/oauth2: consider session cancellation #182
Conversation
Currently, if telemeter assumes a refresh token is still valid, it will treat errors from fetching an access token as fatal. In reality, the authorization server might have reset our session and invalidate the refresh token server-side despite still being valid. In that case it returns a 400 status code. This fixes it. Fixes MON-690
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good generally, just some extra metrics.
pkg/oauth2/token_source.go
Outdated
return c.passwordCredentialsToken() | ||
} | ||
|
||
func (c *passwordCredentialsTokenSource) passwordCredentialsToken() (*oauth2.Token, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This shouldn’t happen too often, but it would be good to know when it does. Let’s add a counter metric to understand how often this happens.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agreed 👍 added a metric, PTAL
pkg/oauth2/token_source.go
Outdated
var ( | ||
grantsTotal = prometheus.NewCounter( | ||
prometheus.CounterOpts{ | ||
Name: "password_credentials_grants_total", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this should have a telemeter_
prefix
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed, ptal
pkg/oauth2/token_source.go
Outdated
) | ||
|
||
func init() { | ||
prometheus.MustRegister(grantsTotal) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we not use the global registry? I'd like us to generally move away from this pattern, and we should be a role model for the rest of the teams.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good idea. we can do a head-start here 👍 I suggest to dependency-inject the registry here, then gradually factor all the other places, wdyt?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that works for me
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
addressed, ptal
pkg/oauth2/token_source.go
Outdated
} | ||
|
||
func (c *passwordCredentialsTokenSource) passwordCredentialsToken() (*oauth2.Token, error) { | ||
c.grantsCounter() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it seems to me we may want to differentiate in a successful and errored token retrieval no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have that indication (status code returned by tollbooth) already in the oauth
client metric, i.e. client_api_requests_total{client="oauth",code="..."}
. This metric is also used to set up alerting.
Would it be idiomatic to add a err="false|true"
here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but it's differentiated in terms of refresh token use and credentials flow? if so lgtm
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed oob:
- The existing
client_api_requests_total
metric does not differentiate the cause of the refresh. I added thecause
label for that reason. - We'll add a
status
label here to have more insight on the result of the password grant operation.
ptal
username: username, | ||
password: password, | ||
cfg: cfg, | ||
grantsCounter: grantsCounter, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typically the actual metric is passed in here, or even grouped in a metrics struct
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That was just to ease testing, but I can pass a metrics struct here (or the counter interface) too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
adressed
This gives us more insights on a) why the refresh token was re-issued (cause label) b) if the operation failed (status label)
/lgtm |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: brancz, s-urbaniak The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Currently, if telemeter assumes a refresh token is still valid, it will
treat errors from fetching an access token as fatal.
In reality, the authorization server might have reset our session and
invalidate the refresh token server-side despite still being valid.
In that case it returns a 400 status code.
This fixes it.
Fixes MON-690
/cc @brancz @squat @paulfantom