Skip to content

Conversation

ewjoachim
Copy link
Contributor

@ewjoachim ewjoachim commented Dec 19, 2019

Fixes #6051

High level description to ease review:

(Assuming you're familiar with the ticket).

The view

In this PR, we create a view (github_disclose_token) which:

  • Is a json POST API view meant to be called by GitHub
  • Returns either 204 no content or 40x and a json error payload
  • Expects a json body and 2 custom headers

This view uses 2 main components described below.

GitHubTokenScanningPayloadVerifyService

This service validates that the call originates from GitHub by doing the following:

  • A request to GitHub meta API & parsing the result to retrieve a public key, given a fingerprint
  • Check that the provided signature of the json payload of the request is valid given this public key

TokenLeakAnalyzer

This part is voluntarily independant from GitHub
The request JSON payload contains leaked tokens to analyse. Each token might either be a string of the form pypi-xxx or a b64 encoded form of __token__:pypi-xxx.

  • We extract the token from whichever form it is
  • We check if the token exists in the DB and it's signature is correct
  • If so we invalidate it
  • And send a mail to the user

The rest

  • We needed a new function check_if_macaroon_exists in the macaroon service (& associated tests)
  • There's a new email function send_password_compromised_email_leak and the associated email
  • There's a FAQ page

TODO:

  • Get some answers regarding:
    • How to cache things?
    • Do we have a github token, and if not, would we plan to get one to avoid IP based rate limiting? Well, I've implemented it, but it's optionnal
    • When things go bad, how should we log?
    • How to check multiple headers in the view config predicates?
  • tests:
    • GitHubTokenScanningPayloadVerifyService
    • NullGitHubTokenScanningPayloadVerifyService
    • check_if_macaroon_exists
    • TokenLeakAnalyzer
    • github_disclose_token view
    • send_password_compromised_email_leak
  • FAQ in the doc
  • Get an answer from GitHub and implement the signature crypto check
  • Test that it works on the dev env

@ewdurbin
Copy link
Member

ewdurbin commented Dec 19, 2019

Hi @ewjoachim https://github.com/pypa/warehouse/blob/master/warehouse/email/ses/views.py has a lot of good reference that may help address many of your TODOs.

@ewjoachim
Copy link
Contributor Author

Ah, I'll look at this, thank you a lot.

@ewjoachim
Copy link
Contributor Author

I'm not sure I understand why the MessageVerifier is implemented as a class in /utils rather than as a service. I believe the architecture of what I'm trying to achieve is indeed very similar, so I'm enclined to do the same with a GitHubTokenScanningRequestVerifier, but I'm not sure I really understand why this should not be a service ?

@ewjoachim ewjoachim force-pushed the token-scanning-6051 branch 2 times, most recently from 0fcdcd3 to 0cda935 Compare December 22, 2019 15:14
@ewjoachim
Copy link
Contributor Author

Ok, it still misses tests & docs, but it should start being readable.

I'm interested in pointers for the remaining todos, feedback if my code style was right (escpecially if I need to redo large parts of the code, before I start writing all the tests) and maybe wording advice for the email.

@dstufft
Copy link
Member

dstufft commented Dec 22, 2019

Services are for things where you might realistically want to replace the implementation with something else at runtime. So the code that interacts with MessageVerifier is never going to want to use anything but a MessageVerifier, it makes no sense to make it swapable because that code is so directly tied to SES that there's not really a second implementation that makes sense.

However something like... where we store file objects is something that we might absolutely want to swap at runtime (and we do, we use different storage engines at runtime versus at development time) so it makes perfect sense for that to be a service. Another example is the HIBP code, one option when we were developing that was taking the HIBP data and running our own copy locally, using a service let us make that a more feasible option with minimal changes to the code if we ever decide to go that route.

@ewjoachim
Copy link
Contributor Author

It feels like we could have a reason to swap the fetching and signature verification part at run or test time (for example for a version that would not contact github and work with local keys and key ids).

On the other hand, the token check, revocation and user warning code seems pretty much straightfoward.

Is that to say I should have implemented this a a 3 parts:

  • Payload parsing & basic checks, not as a service
  • Signature check (including contacting github to check the public key), as a service
  • Examining tokens and revoking the valid ones, not as a service

?

Side note: I've tried to implement part 3 completely independently from GitHub, with the idea that we may want to have revocation protocols from other services (google ? manual with a rate limit ? Who knows ?), but whatever the origin of the disclosure is, it doesn't mean the implementation would change, so this would still say "not a service" if I understand correctly.

So, shall I change it all ?

@ewjoachim
Copy link
Contributor Author

So, shall I change it all ?

I've done that.

New iteration, far less # TODOs. Next step, get an answer from GitHub, and write some tests.

@ewjoachim ewjoachim force-pushed the token-scanning-6051 branch from ca0700a to 13c0c02 Compare January 5, 2020 10:45

macaroon_service = self._request.find_service(IMacaroonService, context=None)

database_macaroon = macaroon_service.find_macaroon_from_raw(raw_macaroon=token)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OMG ! I nearly missed that I need to somehow verify that the signature on that macaroon is correct, otherwise I'm just going to check that there is a macaroon with this ID in the DB.
It's probably not that bad, given I'd need to guess an existing UUID, but... Still...

@ewjoachim ewjoachim force-pushed the token-scanning-6051 branch 2 times, most recently from 2fd98d6 to f93ffdf Compare January 8, 2020 23:50
@ewjoachim ewjoachim force-pushed the token-scanning-6051 branch from 3aae784 to 0c9a791 Compare February 5, 2020 23:44
@ewjoachim
Copy link
Contributor Author

Missing coverage:

warehouse/accounts/services.py                      335      1     88      0    99%   805
warehouse/accounts/utils.py                          67     39      8      0    40%   37, 43-47, 82-104, 111, 115-146
warehouse/accounts/views.py                         284     19     90      0    94%   731-765
warehouse/email/__init__.py                          72      1     10      0    99%   174

Still a few open questions for which I'll need someone's help though (see in the PR message at the top)

@ewjoachim ewjoachim force-pushed the token-scanning-6051 branch 7 times, most recently from 0d6b49a to 183bfb7 Compare February 9, 2020 01:49
@ewjoachim ewjoachim marked this pull request as ready for review February 9, 2020 01:50
@ewjoachim ewjoachim changed the title WIP - GitHub token scanning api view GitHub token scanning api view Feb 9, 2020
@ewjoachim
Copy link
Contributor Author

Ok, I'm ready for review. I've been told the PR is likely too big to be easily review, which I would totally understand, but then I need suggestion on how to break it up into smaller ones.

Also, there are still unanswered questions regarding cache, logging and Pyramid's header predicate (see at the top in the todo-list)

@ewjoachim
Copy link
Contributor Author

Rebased, fixed conflict.

@ewdurbin / @dstufft Do you think you might have some time to make a first round of review ? Is there something I can do to make it easier to you ?

@ewjoachim
Copy link
Contributor Author

ok, make translations does not create diff, neither does make reformat. The api token is now optional.

@di di merged commit af44261 into pypi:master Sep 14, 2020
@di
Copy link
Member

di commented Sep 14, 2020

Thanks @ewjoachim for all your work here!

di added a commit that referenced this pull request Sep 15, 2020
di added a commit that referenced this pull request Sep 15, 2020
@ewjoachim ewjoachim deleted the token-scanning-6051 branch September 15, 2020 07:17
ewjoachim pushed a commit to ewjoachim/warehouse that referenced this pull request Sep 15, 2020
ewjoachim pushed a commit to ewjoachim/warehouse that referenced this pull request Nov 11, 2020
ewjoachim pushed a commit to ewjoachim/warehouse that referenced this pull request Nov 18, 2020
ewjoachim pushed a commit to ewjoachim/warehouse that referenced this pull request Dec 11, 2020
ewjoachim pushed a commit to ewjoachim/warehouse that referenced this pull request Mar 15, 2021
ewjoachim pushed a commit to ewjoachim/warehouse that referenced this pull request Mar 16, 2021
ewdurbin pushed a commit that referenced this pull request Mar 22, 2021
* GitHub Token scanning view (un-revert)

(See #7124)

* Don't check for macaroon signature

* attr was removed

* Code review on help page & email

* Apply suggestions from code review

Co-authored-by: Dustin Ingram <di@users.noreply.github.com>

* Translations

* Remove Base64BasicAuthTokenLeakMatcher, probably overkill

* Renaming test

* Fix a logic error

generator was never consumed so cache was never written

* MacaroonService.find_from_raw should raise on error, not return None

* Payload data is already bytes

* Tasks need to be called with task wrapper

* find_from_raw misses return

* Fix cache: needs to actually persist between calls

* No task parameter if bind=False

* Celery-based request objects need timers too

* Fake Celery requests need a fake IP too

In the future we probably want to make the UserEvent's ip_address field nullable instead

* Add security log

* Translations

* Wording

* Fix test

* Remove intermediate function time_ms

Co-authored-by: Dustin Ingram <di@users.noreply.github.com>

Co-authored-by: Dustin Ingram <di@users.noreply.github.com>
@di di mentioned this pull request Dec 5, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

GitHub Token Scanning Partnership
6 participants