Stateful Authentication #197

Open
HassanAbouelela opened this issue Aug 5, 2022 · 0 comments

Current Auth System

Currently, the authorization system on forms is split into two parts: Discord OAuth and a stateless JWT user token. The login process starts on the frontend, where:

  1. Users click login.
  2. They are sent to Discord to authenticate.
  3. The frontend callback URL is called with a Discord access token.
  4. That token is sent to the backend, which uses it to fetch the user's Discord data and create the JWT.
  5. The JWT is returned to the frontend in a Set-Cookie header, which creates a cookie that is not visible to the frontend.
  6. The frontend tracks state by saving a second, less secure cookie, which is set when the backend returns a 200 code when authorizing.

On top of the typical auth flow, there are a few more processes:

  • The frontend schedules a refresh, which uses a backend API and the user's Discord refresh token to keep them logged in.
  • The backend will occasionally re-authenticate and re-fetch the Discord data invisibly in places where having the most up-to-date information is important. One such place is submission.
  • The backend will also clear the cookie invisibly and fail the request if it fails to authorize.

Issues

The current system is disorganized and spread out. A lot of responsibility sits on the frontend, which is challenging to fulfill, and the different cookies and steps can sometimes desync, which causes errors that are difficult to diagnose. The desync issue was especially prevalent during the code jam, where a lot of forms were created and users were logging in frequently enough that all portions of the system were firing at various times. Some stopgap solutions were implemented for the desync issue, such as clearing auth state when any error is encountered, but the result is still a bad user experience. This is especially true since the error only appears after a user has spent time filling out a form, and (until python-discord/forms-frontend#418 is implemented) it means all the data is gone.

There are other issues with this setup as well, including:

  • Making the frontend responsible for refreshes is messy to schedule, and users often do not keep the site open long enough for a refresh to fire. Updating user information can only happen when a user triggers a request, since we don't keep their access token on hand.
  • Because the tokens are completely stateless, we've had issues with leaked tokens, and we are unable to invalidate tokens or block users. This is partially solvable by creating a unique key for every token and keeping one table for leaked tokens and another for banned users, but then we lose the benefits of stateless authentication.
  • It's a lot more difficult to set up some other desired features - such as bot tokens - since it would effectively require two completely separate systems running at the same time (the current stateless JWTs, and a separate stateful authorization system which is not tied to Discord OAuth).
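The partial fix mentioned in the second bullet (a unique key per token, plus tables for leaked tokens and banned users) can be sketched as follows. This is only an illustration with in-memory sets standing in for the two tables; `TokenGate` and its fields are hypothetical names, and `jti` is borrowed from the standard JWT claim for a token ID.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class TokenGate:
    """Checks token claims against two server-side lists (sets stand in for tables)."""

    leaked_token_ids: set = field(default_factory=set)
    banned_user_ids: set = field(default_factory=set)

    def new_claims(self, user_id: str) -> dict:
        # Every token gets a unique ID so it can be blocked individually.
        return {"sub": user_id, "jti": uuid.uuid4().hex}

    def is_allowed(self, claims: dict) -> bool:
        # Both checks are lookups against server-side state on every request.
        if claims["jti"] in self.leaked_token_ids:
            return False
        return claims["sub"] not in self.banned_user_ids


gate = TokenGate()
claims = gate.new_claims("1234")
gate.leaked_token_ids.add(claims["jti"])  # block just this one token
```

Every call to `is_allowed` consults stored state, which is the trade-off the bullet describes: once these lookups exist, the token is no longer stateless in practice.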

Proposed Solution

One proposed solution is to switch to a stateful authentication model (it could still use JWTs, but other session tracking methods are on the table). It alleviates some of the issues mentioned above:

  • The backend will shoulder all responsibilities. Only one token needs to exist, and it does not need to be updated to insert new user data. There is nothing left to desync.
  • Refreshing Discord access tokens can be done server-side with much more ease, and the data can be updated in real time instead of waiting for a user to make a request. This is very beneficial for keeping track of permissions, updating access, displaying more accurate information about historical submissions, and so on.
  • We can easily revoke, modify, and delete user tokens.
  • In terms of performance, it shouldn't be any worse than what we currently have. Despite using a stateless token, the current system is actually a blend of the worst elements of both models, with database look-ups on every request, some external network IO with Discord in certain circumstances, etc.
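As a rough sketch of what the stateful model could look like, the example below keeps sessions server-side and hands the client only an opaque ID. It is a sketch under assumptions, not a design decision from this issue: `SessionStore`, `Session`, and the field names are hypothetical, and the in-memory dict stands in for whatever database the backend would actually use.

```python
import secrets
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class Session:
    user_id: str
    discord_refresh_token: str  # kept server-side so the backend can refresh on its own
    expires_at: float


class SessionStore:
    """In-memory stand-in for a sessions table."""

    def __init__(self) -> None:
        self._sessions = {}

    def create(self, user_id: str, discord_refresh_token: str,
               ttl: float = 7 * 24 * 3600, now: Optional[float] = None) -> str:
        now = time.time() if now is None else now
        session_id = secrets.token_urlsafe(32)  # opaque value sent to the client
        self._sessions[session_id] = Session(user_id, discord_refresh_token, now + ttl)
        return session_id

    def get(self, session_id: str, now: Optional[float] = None) -> Optional[Session]:
        now = time.time() if now is None else now
        session = self._sessions.get(session_id)
        if session is None:
            return None
        if session.expires_at <= now:
            del self._sessions[session_id]  # drop expired sessions lazily
            return None
        return session

    def revoke(self, session_id: str) -> bool:
        """Invalidate a single session, e.g. a leaked one."""
        return self._sessions.pop(session_id, None) is not None

    def revoke_user(self, user_id: str) -> int:
        """Invalidate every session of a user, e.g. on a ban."""
        doomed = [sid for sid, s in self._sessions.items() if s.user_id == user_id]
        for sid in doomed:
            del self._sessions[sid]
        return len(doomed)
```

Because the client only holds the session ID, user data can change on the server without reissuing anything, and revocation is a single delete.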

Of the issues described, this won't solve the following:

  • We still need to do frontend refreshes of the forms token every x days for security purposes, so some amount of mess is still to be expected.
  • There are still going to be two systems, one for normal users and another for bot accounts. They should be a lot more similar, if not the same, in this implementation though.
  • We still need to handle auth failures more gracefully. This should hopefully be fixed by the redux issue.

Challenges & Considerations

  • If the backend maintains access to a user's account even after they have stopped using the site, then it would be user-friendly to allow them to delete their refresh token from the backend. This means another route and a frontend feature.
  • We would need a scheduler to keep all the Discord tokens alive and keep our database up to date. We use a multi-worker model running on multiple Kubernetes pods, which makes it difficult to properly communicate with and delegate to a scheduler. I see two possible workarounds here:
    • Properly implement support through Kubernetes features, whether as a sidecar, proper delegation, or a standalone pod. This is not a very flexible solution: it would need to be reworked if we moved away from Kubernetes, and it can be difficult to use in developer environments.
    • Do locking, queuing, and delegation through a shared resource. The one which would be available in all cases in our setup would probably be the database.
  • This last one is a more general issue, but one you might encounter when dealing with the change. All our data is stored in NoSQL databases, parsed into pydantic models, and validated by spectree. Making any modification to the model structure requires figuring out how to handle old data which no longer follows the models. We also lose out on some benefits of relational databases, like foreign keys.