Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pkg/oauth2: consider session cancellation #182

Merged
merged 6 commits into from Jun 13, 2019

Conversation

s-urbaniak
Copy link
Contributor

@s-urbaniak s-urbaniak commented Jun 7, 2019

Currently, if telemeter assumes a refresh token is still valid, it will
treat errors from fetching an access token as fatal.

In reality, the authorization server might have reset our session and
invalidate the refresh token server-side despite still being valid.

In that case it returns a 400 status code.

This fixes it.

Fixes MON-690

/cc @brancz @squat @paulfantom

Currently, if telemeter assumes a refresh token is still valid,
it will treat errors from fetching an access token as fatal.

In reality, the authorization server might have reset our session
and invalidate the refresh token server-side despite still being valid.

In that case it returns a 400 status code.

This fixes it.

Fixes MON-690
@openshift-ci-robot openshift-ci-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jun 7, 2019
Copy link
Contributor

@brancz brancz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good generally, just some extra metrics.

return c.passwordCredentialsToken()
}

func (c *passwordCredentialsTokenSource) passwordCredentialsToken() (*oauth2.Token, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This shouldn’t happen too often, but it would be good to know when it does. Let’s add a counter metric to understand how often this happens.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed 👍 added a metric, PTAL

var (
grantsTotal = prometheus.NewCounter(
prometheus.CounterOpts{
Name: "password_credentials_grants_total",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should have a telemeter_ prefix

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed, ptal

)

func init() {
prometheus.MustRegister(grantsTotal)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we not use the global registry? I'd like us to generally move away from this pattern, and we should be a role model for the rest of the teams.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good idea. we can do a head-start here 👍 I suggest to dependency-inject the registry here, then gradually factor all the other places, wdyt?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that works for me

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

addressed, ptal

}

func (c *passwordCredentialsTokenSource) passwordCredentialsToken() (*oauth2.Token, error) {
c.grantsCounter()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it seems to me we may want to differentiate in a successful and errored token retrieval no?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have that indication (status code returned by tollbooth) already in the oauth client metric, i.e. client_api_requests_total{client="oauth",code="..."}. This metric is also used to set up alerting.

Would it be idiomatic to add a err="false|true" here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but it's differentiated in terms of refresh token use and credentials flow? if so lgtm

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed oob:

  1. The existing client_api_requests_total metric does not differentiate the cause of the refresh. I added the cause label for that reason.
  2. We'll add a status label here to have more insight on the result of the password grant operation.

ptal

username: username,
password: password,
cfg: cfg,
grantsCounter: grantsCounter,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typically the actual metric is passed in here, or even grouped in a metrics struct

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That was just to ease testing, but I can pass a metrics struct here (or the counter interface) too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

adressed

This gives us more insights on
a) why the refresh token was re-issued (cause label)
b) if the operation failed (status label)
@brancz
Copy link
Contributor

brancz commented Jun 13, 2019

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Jun 13, 2019
@openshift-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: brancz, s-urbaniak

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-robot openshift-merge-robot merged commit 83d33cd into openshift:master Jun 13, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants