Skip to content
This repository has been archived by the owner. It is now read-only.

oauth client_id's that are being logged as expired #324

Closed
jrgm opened this issue Aug 15, 2015 · 13 comments
Closed

oauth client_id's that are being logged as expired #324

jrgm opened this issue Aug 15, 2015 · 13 comments
Assignees

Comments

@jrgm
Copy link
Contributor

@jrgm jrgm commented Aug 15, 2015

I looked at production oauth logs for token.token.verify.expired over the past two days and did a rollup of the reported client_id's. So here it is on a percentage basis.

  92%   5882386c6d801776  - Firefox
   6%   ea3ca969f8c6bb0d  - Firefox Accounts
   2%   3332a18d142636cb  - Fennec
@jrgm
Copy link
Contributor Author

@jrgm jrgm commented Aug 15, 2015

I only put this in percentage, but if/when the question comes up, the answer is "a significant number of requests".

@rfk
Copy link
Member

@rfk rfk commented Aug 16, 2015

Thanks @jrgm! I'm glad to see it's all assertion-grant-flow-using client codes in this list, not third-party reliers.

The "Firefox Accounts" one should be fixed already IIUC since we're no longer caching the access tokens.

The next thing we need to do is confirm whether Firefox and Fennec are handling 401s from expired tokens correctly. If they both recover successfully by e.g. just issuing themselves a new token, then I think we'll be OK to re-enable the expiry.

(Ideally, clients would be using the server-provided expiry time to refresh their tokens before they expire, but we can't depend on that in general, so must expect some stable background rate of 401s. If it remains "a significant number of requests" then we should try to prioritize changes to client behaviour to reduce it.)

@jrgm
Copy link
Contributor Author

@jrgm jrgm commented Aug 17, 2015

The "Firefox Accounts" one should be fixed already IIUC since we're no longer caching the access tokens.

Note that this data is from production traffic in the past two days.

One other thing to note, is that the tokens that are reporting as expired were issued months ago in some percentage of cases.

@rfk
Copy link
Member

@rfk rfk commented Aug 17, 2015

Note that this data is from production traffic in the past two days.

Ah, interesting, we'll have to dig in deeper. Linking to kibana dashboard with these events in it for my reference:

https://kibana.fxa.us-west-2.prod.mozaws.net/index.html#/dashboard/elasticsearch/OAuth%20App%20Logs

@rfk rfk added this to the FxA-0: quality milestone Aug 28, 2015
@rfk
Copy link
Member

@rfk rfk commented Sep 15, 2015

Summarizing current status: we believe that the latest versions of all of these clients should cope gracefully with expired tokens, but they'll happily keep using a token until we tell them it has expired. These events will therefore continue at their current rate until we start expiring the tokens.

@jrgm is going to take a look at the latest breakdown and client versions associated with these logs.

We should then try some manual testing with expired tokens on the latest versions of Desktop, Fennec and content-server to confirm that their behaviour is OK. If so, once we're at sufficient penetration of update-to-date versions then we can try re-enabling the expiry.

@jrgm
Copy link
Contributor Author

@jrgm jrgm commented Sep 16, 2015

I looked again at two days and proportions are about the same

95%   5882386c6d801776
 3%   ea3ca969f8c6bb0d
 2%   3332a18d142636cb

Unfortunately, as these are server-based calls to /verify, we don't have any UA info.

@rfk
Copy link
Member

@rfk rfk commented Sep 18, 2015

I tried latest fxa-content-server with an expired oauth token, and it did indeed seem to discard its cached token and fetch a new one.

@rfk
Copy link
Member

@rfk rfk commented Sep 18, 2015

I tried desktop Fx release channel with an expired oauth token, via the following procedure:

  • setup fxa-local-dev, and npm start a new firefox profile connected to it
  • connect to sync, set a profile picture, and visit about:preferences#sync so that my picture is displayed
  • restart the local oauth server, clearing its token database
  • change my avatar picture via the web
  • refresh about:preferences#sync to cause it to re-fetch my avatar

No errors were shown in UI, but it failed to fetch the updated avatar. Console output showed a bunch of errors and unhandled promise rejections. So sadly, I suspect we're not ready to start expiring these just yet :-/

I'll keep digging

@rfk
Copy link
Member

@rfk rfk commented Oct 30, 2015

Status update for the planning meeting: I'm still working on this, slowly, as a background task. I need to re-test the above with the new Firefox release.

@rfk
Copy link
Member

@rfk rfk commented Nov 27, 2015

I tried this again in latest release (Firefox 42) and it looks like it's working correctly. The browser detects the 401 from the oauth server, discards its cached token and fetches a new one, and successfully proceeds to update its local version of my profile picture.

@jrgm, assuming we haven't started seeing any other client ids in token.token.verify.expired logs, I think we're ready to switch expiry back on. We will see a elevated baseline rate of 401s on the oauth server, but this is expected behaviour. Fx desktop, Fx android, and fxa-content-server will all recover gracefully from 401s by issuing themselves a new token on demand.

@rfk rfk assigned jrgm and unassigned rfk Nov 27, 2015
@rfk
Copy link
Member

@rfk rfk commented Nov 27, 2015

(putting this back in next and assigning to @jrgm for the next action, which is to confirm that no new client ids have started showing up in the token.token.verify.expired logs).

@jrgm
Copy link
Contributor Author

@jrgm jrgm commented Nov 30, 2015

Current counts don't show any new client_id's using expired tokens, and these proportions:

88.50% 5882386c6d801776 - Firefox
0.01% ea3ca969f8c6bb0d - Firefox Accounts
11.49% 3332a18d142636cb - Fennec

@jrgm jrgm assigned rfk and unassigned jrgm Nov 30, 2015
@rfk
Copy link
Member

@rfk rfk commented Dec 1, 2015

Thanks @jrgm, sounds like we can go ahead with switching the expiry back on. I'm going to close this out.

@rfk rfk closed this Dec 1, 2015
@rfk rfk removed the waffle:now label Dec 1, 2015
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
3 participants