GitHub token scanning api view #7124
Hi @ewjoachim https://github.com/pypa/warehouse/blob/master/warehouse/email/ses/views.py has a lot of good reference material that may help address many of your TODOs.
Ah, I'll look at this, thanks a lot.
I'm not sure I understand why the
0fcdcd3 to 0cda935
OK, it's still missing tests & docs, but it should be starting to be readable. I'm interested in pointers for the remaining TODOs, feedback on whether my code style is right (especially if I need to redo large parts of the code, before I start writing all the tests) and maybe wording advice for the email.
0cda935 to 96e496e
Services are for things where you might realistically want to replace the implementation with something else at runtime. So the code that interacts with However something like... where we store file objects is something that we might absolutely want to swap at runtime (and we do: we use different storage engines at runtime versus at development time), so it makes perfect sense for that to be a service. Another example is the HIBP code: one option when we were developing it was to take the HIBP data and run our own copy locally. Using a service makes that a more feasible option, with minimal changes to the code, if we ever decide to go that route.
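To make the "swap the implementation at runtime" idea concrete, here is a minimal sketch. It is not Warehouse's code: `FileStorage`, `InMemoryStorage` and `ServiceRegistry` are hypothetical names, and the registry only imitates the shape of a `find_service` lookup.

```python
from typing import Protocol


class FileStorage(Protocol):
    """Hypothetical interface: anything that can store and fetch bytes."""

    def store(self, path: str, data: bytes) -> None: ...

    def get(self, path: str) -> bytes: ...


class InMemoryStorage:
    """Development/test implementation that keeps files in a dict."""

    def __init__(self) -> None:
        self._files: dict[str, bytes] = {}

    def store(self, path: str, data: bytes) -> None:
        self._files[path] = data

    def get(self, path: str) -> bytes:
        return self._files[path]


class ServiceRegistry:
    """Tiny stand-in for a service lookup (assumption, not Pyramid's API)."""

    def __init__(self) -> None:
        self._services: dict[type, object] = {}

    def register(self, iface: type, implementation: object) -> None:
        # Which implementation is registered is a configuration decision.
        self._services[iface] = implementation

    def find_service(self, iface: type) -> object:
        return self._services[iface]


registry = ServiceRegistry()
registry.register(FileStorage, InMemoryStorage())
```

Calling code only ever asks the registry for `FileStorage`, so a different backend could be registered in production without touching the callers. Code with a single plausible implementation gains nothing from that indirection.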
It feels like we could have a reason to swap the fetching and signature verification part at run or test time (for example for a version that would not contact GitHub and would work with local keys and key IDs). On the other hand, the token check, revocation and user warning code seems pretty straightforward. Is that to say I should have implemented this as 3 parts:
? Side note: I've tried to implement part 3 completely independently from GitHub, with the idea that we may want to have revocation protocols from other services (Google? Manual with a rate limit? Who knows?), but whatever the origin of the disclosure is, the implementation wouldn't change, so this would still say "not a service" if I understand correctly. So, shall I change it all?
96e496e to ca0700a
I've done that. New iteration, far less
ca0700a to 13c0c02
warehouse/accounts/utils.py
    macaroon_service = self._request.find_service(IMacaroonService, context=None)
    database_macaroon = macaroon_service.find_macaroon_from_raw(raw_macaroon=token)
OMG! I nearly missed that I need to somehow verify that the signature on that macaroon is correct; otherwise I'm just going to check that there is a macaroon with this ID in the DB.
It's probably not that bad, given I'd need to guess an existing UUID, but... Still...
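The difference between "a macaroon with this ID exists" and "this macaroon is authentic" can be sketched as follows. This is a deliberately simplified stand-in: it uses a single HMAC over the identifier rather than a real macaroon signature chain, and `STORED_KEYS`, `macaroon_exists` and `macaroon_is_authentic` are hypothetical names, not part of the macaroon service.

```python
import hashlib
import hmac

# Hypothetical stand-in for the macaroons table: identifier -> secret key.
STORED_KEYS = {"3f2a": b"per-macaroon-secret"}


def sign(key: bytes, identifier: str) -> str:
    """Simplified signature: HMAC-SHA256 of the identifier under the key."""
    return hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


def macaroon_exists(identifier: str) -> bool:
    """Weak check: only proves that the caller knows (or guessed) an ID."""
    return identifier in STORED_KEYS


def macaroon_is_authentic(identifier: str, signature: str) -> bool:
    """Stronger check: the signature must match the one the stored key yields."""
    key = STORED_KEYS.get(identifier)
    if key is None:
        return False
    expected = sign(key, identifier).encode("utf-8")
    # compare_digest avoids leaking the mismatch position through timing.
    return hmac.compare_digest(expected, signature.encode("utf-8"))
```

With these definitions, a token carrying a known ID and a made-up signature passes `macaroon_exists` but fails `macaroon_is_authentic`.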
2fd98d6 to f93ffdf
3aae784 to 0c9a791
Missing coverage:
There are still a few open questions for which I'll need someone's help though (see the PR message at the top).
0d6b49a to 183bfb7
OK, I'm ready for review. I've been told the PR is likely too big to be easily reviewed, which I would totally understand, but then I need suggestions on how to break it up into smaller ones. Also, there are still unanswered questions regarding cache, logging and Pyramid's header predicate (see the todo-list at the top).
183bfb7 to 44f2e67
7f0b52d to e4b4a46
ok,
Thanks @ewjoachim for all your work here!
* GitHub Token scanning view (un-revert) (See #7124)
* Don't check for macaroon signature
* attr was removed
* Code review on help page & email
* Apply suggestions from code review
  Co-authored-by: Dustin Ingram <di@users.noreply.github.com>
* Translations
* Remove Base64BasicAuthTokenLeakMatcher, probably overkill
* Renaming test
* Fix a logic error: generator was never consumed so cache was never written
* MacaroonService.find_from_raw should raise on error, not return None
* Payload data is already bytes
* Tasks need to be called with task wrapper
* find_from_raw misses return
* Fix cache: needs to actually persist between calls
* No task parameter if bind=False
* Celery-based request objects need timers too
* Fake Celery requests need a fake IP too
  In the future we probably want to make the UserEvent's ip_address field nullable instead
* Add security log
* Translations
* Wording
* Fix test
* Remove intermediate function time_ms

Co-authored-by: Dustin Ingram <di@users.noreply.github.com>
Fixes #6051
High level description to ease review:
(Assuming you're familiar with the ticket).
The view
In this PR, we create a view (github_disclose_token) which:
This view uses 2 main components described below.
GitHubTokenScanningPayloadVerifyService
This service validates that the call originates from GitHub by doing the following:
TokenLeakAnalyzer
This part is deliberately independent of GitHub.
The request JSON payload contains leaked tokens to analyse. Each token might be either a string of the form pypi-xxx or a base64-encoded form of __token__:pypi-xxx.
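The two token forms can be handled along these lines. This is a minimal sketch, not the analyzer from the PR: `extract_token` and `TOKEN_RE` are hypothetical names, and the character set accepted after the pypi- prefix is an assumption.

```python
import base64
import binascii
import re
from typing import Optional

# Assumption: the part after "pypi-" is URL-safe base64 characters.
TOKEN_RE = re.compile(r"pypi-[A-Za-z0-9_=-]+")


def extract_token(candidate: str) -> Optional[str]:
    """Return the pypi-... token from either form, or None if neither matches."""
    # Form 1: the raw token itself.
    if TOKEN_RE.fullmatch(candidate):
        return candidate

    # Form 2: base64 of "__token__:pypi-...", as sent in a basic auth header.
    try:
        decoded = base64.b64decode(candidate, validate=True).decode("utf-8")
    except (binascii.Error, UnicodeDecodeError, ValueError):
        return None

    username, separator, password = decoded.partition(":")
    if separator and username == "__token__" and TOKEN_RE.fullmatch(password):
        return password
    return None
```

Both forms resolve to the same raw token, so the rest of the analysis (lookup, revocation, user warning) only has to deal with one representation.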
The rest
check_if_macaroon_exists in the macaroon service (& associated tests)
TODO: