
Add support for API keys #994

Closed
edmorley opened this issue Mar 2, 2016 · 70 comments

@edmorley

edmorley commented Mar 2, 2016

A scary number of people embed their PyPI username and password in their Travis config (using Travis encrypted variables), to enable automatic releases for certain branches (Travis even has a guide for it).

In addition, the packaging docs example encourages users to save their password in plaintext on disk in their .pypirc (they can of course use twine's password prompting, but I wonder how many read that far, rather than just copy the example verbatim?)

Whilst in an ideal world credentials of any form wouldn't be saved unencrypted to disk (or given to a third party such as Travis), and users would instead be prompted every time, I don't think this is realistic in practice.

API keys would offer the following advantages:

  1. Higher-entropy credentials that are guaranteed not to have been reused across multiple sites.
  2. The ability to give the API key a smaller permissions scope than that of the owner's username/password. For example, an API key would not be permitted to change a user's listed GPG key or, in the future, their 2FA settings. Or an API key could be limited to a specific package.
  3. Since this would be separate from the existing username/password auth, a signing-based approach (e.g. HMAC) could be used without breaking older clients. This would ensure that if a connection was MITMed (e.g. due to a protocol or client exploit), the API key itself would still remain secure.
  4. Eventually, support could be dropped for the password field in .pypirc, leaving a much safer choice between password prompting every time, or creating an API key that could be saved to disk.
  5. If/when support is added for 2FA, users who need to automate PyPI uploads won't have to forgo 2FA for their whole account. They could instead create a 2FA-circumventing API key for just the one package that needs automated uploads.

Many thanks :-)

(I've filed this against warehouse since I'm presuming this is beyond the scope of maintenance-only changes being made to the old PyPI codebase)

@dstufft
Member

dstufft commented Mar 3, 2016

This is another thing I've been wanting to do, but it is likely a post-launch task. I'm a bit on the fence about exactly how to handle them, but one option I've been thinking about is, instead of API keys, using TLS client certificates, which would give built-in support for a signing-based approach, high entropy, usability for all uploads (the certificate could typically be stored password-protected, with a password-less option for automation), and expiration of the token.

One problem with this is that it would mean we can't route uploads through our CDN. However, uploads don't really gain anything by going through the CDN (and in fact, it's a bit harmful: since uploads need a longer timeout than normal requests, we're forced to have high, 20+ second timeouts on upload).

I've also considered something like OAuth here instead of just an API Key which would solve a lot of these problems as well, in addition to making it possible to securely grant other projects the ability to modify just one package (or one scope inside of that package).

There's also the likely future signing tool, TUF, where we could just enforce that all uploads must be signed by a valid key for that author, and use that key as the authentication.

A lot of different options here, which is another reason why it's likely a post-launch task :)

@lukesneeringer

I really, really want to get my PyPI information out of CI. At the risk of responding to a years-old thread... I want to volunteer to do this work (as well as #996). :-)

At this point, Warehouse is launched (albeit in beta), and the legacy upload endpoint is deprecated. I assert it would be reasonable to add this, although others who have actually been thinking about this for more than a few days might know better than I (so feel free to chime in and tell me!).

Before reading @dstufft's comments above, my thought was, "Implement API keys", but I have nothing against the idea of certificates.

Here is what I think is needed (sub out "key" below with "certificate" if we go that route):

  • The ability to create a new key.
  • The ability to invalidate a key.
  • The ability to see when each existing key was last used.
  • The ability to scope a key's permissions to only be able to publish particular packages.

Would anyone have any objection to my taking some time to scope this out further, with an eye to getting the work in soon-ish? (Since I am new around here, it would probably require some review cycles from @ewdurbin, @dstufft, etc.)

@brainwane
Contributor

Thanks for your note, @lukesneeringer, and sorry for the slow response! Thank you for volunteering to do this work!

As I think you know, but just for context for future folks finding this discussion, the folks working on Warehouse have gotten funding to concentrate on improving and deploying Warehouse, and have kicked off work towards our development roadmap -- the most urgent task is to improve Warehouse to the point where we can redirect pypi.python.org to pypi.org so the site is more sustainable and reliable.

So that's what Ernest, Dustin, and Nicole have been concentrating on and will be concentrating on for the next few months. But I'm putting your suggestion on the agenda for a sync-up meeting we'll have tomorrow, and we'll have more thoughts for you then.

Also, Ernest wants to help folks get started as new Warehouse contributors, and has 30-minute 1:1 slots available each week, in case that's something you, or someone you know, is interested in doing.

Thanks again! Talk with you soon.

@lukesneeringer

But I'm putting your suggestion on the agenda for a sync-up meeting we'll have tomorrow, and we'll have more thoughts for you then.

Sounds good.

My guess is that this is probably work that can be done in parallel to the Warehouse improvements. The trick would be that the keys would not work on legacy PyPI, and therefore anyone using the legacy URL would not be able to use them. (However, I suppose it might be the case that review cycles or whatnot would not be available.)

Also, Ernest wants to help folks get started as new Warehouse contributors, and has 30-minute 1:1 slots available each week, in case that's something you, or someone you know, is interested in doing.

Yep -- we already did that. :-)

@brainwane added this to the 6. Post Legacy Shutdown milestone Feb 12, 2018
@brainwane
Contributor

@lukesneeringer Oh great, glad you and Ernest have already started working together!

In our meeting today we said "yay" about you working on this! Please go ahead and start scoping it out and let us know your thought process as you work. I could imagine you finding the SimplySecure resources useful on a UX level.

We also decided that, as a new feature, this belongs in a future milestone. But we will do our level best to review your work as you have it!

Could I please ask you to also comment at #996 to mention there that you're working on it?

Also, will you be at the PyCon sprints?

@lukesneeringer

That sounds good.
I currently plan to be at only day one of PyCon sprints, but I have not booked plane tickets yet, so that is mutable.

@brainwane
Contributor

I'm going to be there all four days, and I think a number of other Python packaging/distribution developers will be too. I think it'll likely be a good time to hash out architectural stuff and do some pair programming and in-person reviews. So if you could be there for two or two and a half days, that would probably be of benefit.

@brainwane
Contributor

@lukesneeringer how is this going? Do you have any plans or code that you'd like us to look at?

@lukesneeringer

@brainwane Hi there; I have been on vacation. I will have a plan (and some code) for you to look at on Friday. :-)

@lukesneeringer

lukesneeringer commented Mar 9, 2018

@brainwane @ewdurbin et al.

I have started doing research and have put a minimal amount of code to paper, but I want to bring in other voices at this point.

The API keys themselves

I assert that a new database model for API keys should be added to packaging. My rationale for putting this in packaging rather than in accounts is simply that it is going to have a relationship to Project, and this avoids a circular import (or circular mental reasoning).

Key Contents

As far as the contents of the keys, I am leaning toward using RSA keys, and having the interface essentially allow you to upload the public keys (meaning that, initially, the user will be responsible for creating said keys). The request would include a signature (signed with the private key), which is verified using the public key.

There are a few downsides to this approach; the big one is that it puts the burden of generating the key on the package maintainer. We could potentially later do what some other sites do, where they provide generation, store the public key in the database, and give a forced one-time download of the private key. I think we should start with user-generated keys, however, because it allows users to generate encrypted keys (and store the encryption key in CI).
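
To make the proposed flow concrete, here is a minimal sketch using the cryptography library; the payload layout, key size, and error handling are illustrative assumptions, not a real PyPI API:

# Hedged sketch of the proposed scheme using the cryptography library.
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives import hashes, serialization
from cryptography.hazmat.primitives.asymmetric import padding, rsa

PSS = padding.PSS(mgf=padding.MGF1(hashes.SHA256()),
                  salt_length=padding.PSS.MAX_LENGTH)

# Maintainer generates a key pair locally and uploads only the public half.
private_key = rsa.generate_private_key(public_exponent=65537, key_size=3072)
public_pem = private_key.public_key().public_bytes(
    serialization.Encoding.PEM,
    serialization.PublicFormat.SubjectPublicKeyInfo,
)

# Client side: sign something request-specific, e.g. the distribution's digest.
payload = b"sha256:...digest of the uploaded file..."  # illustrative payload
signature = private_key.sign(payload, PSS, hashes.SHA256())

# Server side: load the stored public key and verify the request's signature.
public_key = serialization.load_pem_public_key(public_pem)
try:
    public_key.verify(signature, payload, PSS, hashes.SHA256())
except InvalidSignature:
    raise PermissionError("signature does not match the stored public key")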

Data Structure

I propose the following data model:

from sqlalchemy import Column, DateTime, ForeignKey, Text, orm, sql

from warehouse import db  # import paths assumed for this sketch
from warehouse.accounts.models import User
from warehouse.packaging.models import Project


class AccessKey(db.Model):
    '''Access keys for project access, separate from passwords.'''

    __tablename__ = "access_keys"

    # We store a public key, and the client is responsible for signing
    # requests using the corresponding private key.
    public_key = Column(Text)

    # An access key must be attached to either a user or a project,
    # and may be attached to both.
    #
    # Attaching to a user limits the scope of the key to projects which
    # that user can access (at the time access is attempted, not when the
    # key is made). It is possible for this set of projects to be empty.
    #
    # Attaching to a project limits the scope of the key to that project.
    #
    # Nullability lives on the foreign-key columns (it is not a valid
    # relationship() argument); the target table names are assumed here.
    user_id = Column(ForeignKey("users.id"), nullable=True)
    user = orm.relationship(
        User,
        backref="access_keys",
        lazy=False,
    )

    project_id = Column(ForeignKey("projects.id"), nullable=True)
    project = orm.relationship(
        Project,
        backref="access_keys",
        lazy=False,
    )

    expiry = Column(
        DateTime(timezone=False),
        nullable=True,
    )

    created = Column(
        DateTime(timezone=False),
        nullable=False,
        server_default=sql.func.now(),
    )

What is important here is the relationships with user and project -- essentially, a key can be attached to either or both. If attached to a user (and only a user), then uploads may be performed for anything that user can access at the time the upload is attempted. Keys attached to projects (and only a project) may be used for that project with no other authentication. Keys attached to both must meet both restrictions (this implies that the key could provide no privileges whatsoever should the user lose access to the project later).
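
To make those rules concrete, here is a minimal sketch of the check the upload path could perform (user_can_access is a hypothetical stand-in for whatever permission lookup Warehouse actually performs):

from datetime import datetime

def key_allows_upload(key, project, user_can_access):
    # Expired keys are never valid.
    if key.expiry is not None and datetime.utcnow() >= key.expiry:
        return False
    # A key attached to neither a user nor a project grants nothing.
    if key.user is None and key.project is None:
        return False
    # Project-attached: usable only for that exact project.
    if key.project is not None and key.project != project:
        return False
    # User-attached: the user must have access now, not at key creation.
    if key.user is not None and not user_can_access(key.user, project):
        return False
    return True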

Implementation

I assert that this would require an additional auth_policy class to be added, which would validate that an API key was sent. If user-based authentication was also performed, or if the key was tied to a specific user, it would return that user; otherwise it would return a stub user object.

Then, logic needs to be added to forklift/legacy.py's file_upload function to (a) short-circuit the user check at the top of the function, allowing logic to continue if an API key was provided, and (b) extend request.has_permission to validate the API key for the project. Additionally, the "create a new project" logic would need to be short-circuited if the API key did not positively identify a single user.

Finally, this would entail a change in setuptools to use keys when provided. Ideally, we would search for keys in certain directories (e.g. the project directory, ~/.pypa or its Windows equivalent, etc.) for a file with a specific naming convention, and use it if found.

Restrictions

The biggest restriction on this is that the API keys would only initially be usable for the upload functionality. (Presumably register could very shortly follow.)

Concerns

My biggest concern about this is the keys. Using RSA keys provides several useful benefits (passphrases, high entropy, etc.), but it also feels a good bit more complicated than what (for example) npm does. Other package managers just use direct API keys (which seems awfully insecure) or some less secure form of key-secret combo. One concern here is that if this is deemed too difficult to get set up, people may choose not to use it.

Another concern is "key collision". The idea here is to be able to have single package tokens, but most people work on lots of Python packages. Similarly, one might want a passphrase-based key to go in CI and a passphrase-less key to go on local disk. I think this sort of thing is solvable by being smart about naming and ordering. A potentially attractive idea is to actually look for project-specific keys in a subdirectory of the user's home directory before looking for a project-specific key in the project folder, then look for user-wide keys in the reverse order.

Conclusion

This is a writeup for the moment. I have the model written (a trivial task) and am going to start on the various pieces of plumbing described above. Feedback is definitely desired before I get too far into it.

@moshez

moshez commented Mar 10, 2018

Looks awesome!

One question though --
What does it mean to "provide" a private key? Presumably we're not expecting users to literally send their private keys to warehouse. Would it be a signature? On what? (File hash? File hash + file name?)

@dstufft
Member

dstufft commented Mar 10, 2018

I'm still digesting this, but I wanted to jot down my initial thoughts:

  • I am hesitant to have API keys that are not attached to users in some way. The permissions system in play should be able to support it, but I believe that there are likely assumptions baked in throughout (and maybe even in external systems?) that a user, not an API key, is uploading the file. More importantly, though, it provides a clear owner of the credentials, so that if something were uploaded with them, the project knows who to go talk to, and it acts as an audit log of who created the key.
  • I think that we need more detail about the specific threat model that we're going to be using when designing this system. What kind of attacks are we hoping to protect against? What capabilities can we assume the attacker has? It's hard to judge whether or not there's something of value being provided by the use of RSA keys over something simpler without a clearly defined threat model to judge possible solutions against.
  • Designing something that involves changes to other tooling (such as twine or setuptools) should ideally get some buy-in from the authors of those tools, and possibly discussion on distutils-sig (or at least a pointer mailed to distutils-sig about the discussion). One benefit of a simpler API key is that it could simply be piped through those other tools as a password, and thus wouldn't require wider agreement.

@lukesneeringer

What does it mean to "provide" a private key? Presumably we're not expecting users to literally send their private keys to warehouse. Would it be a signature? On what? (File hash? File hash + file name?)

Sorry, I misspoke. I meant that they should upload a public key.

I am hesitant to have API keys that are not attached to users in some way.

Here is my rationale: oftentimes, organizations want API keys that are independent of individual users. Essentially, a company does not want all of their keys to break because an individual user leaves, and most people do not make separate accounts on PyPI for work vs. personal use.

A heavier-weight way to solve this problem would be to have explicit organizations to which credentials could be attached. A lower-weight version: encourage the use of organization-level "users" -- but that has the downside of things like a single password that everyone shares.

What kind of attacks are we hoping to protect against? What capabilities can we assume the attacker has? It's hard to judge whether or not there's something of value being provided by the use of RSA keys over something simpler without a clearly defined threat model to judge possible solutions against.

Ironically enough, I actually went for something heavier-weight because of your thinking in the previous comment. :-)

Given that we have been storing passwords in plaintext since time immemorial, and given that most other package managers go for simpler solutions, there is a good chance that I am overthinking here. I think there are two primary concerns: (1) mistakenly leaked or otherwise woefully unsecured keys, and (2) sniffing. The proposal I am putting forward does basically nothing for (1) and effectively guards against (2).

Designing something that involves changes to other tooling (such as twine or setuptools) should ideally get some buy-in from the authors of those tools, and possibly discussion on distutils-sig (or at least a pointer mailed to distutils-sig about the discussion). One benefit of a simpler API key is that it could simply be piped through those other tools as a password, and thus wouldn't require wider agreement.

I would not have any issue with this approach; it would still improve on the status quo. (This is, of course, an inferior approach for the sniffing concern, but it has been fine for most other package managers to the best of my knowledge.)

@dstufft
Member

dstufft commented Mar 13, 2018

I've been thinking about this a lot and I think I've come up with the start of a proposal for how to handle this.

To start off with, I think that a public key crypto scheme is generally overkill for this application. We don't have N untrusted parties that need to be able to verify the authenticity of a request, just a single trusted party (PyPI), which means there is little need for a scheme that hides the credential from PyPI itself. A public key crypto scheme would prevent people who can intercept traffic from getting the upload credentials; however, PyPI also mandates TLS, which provides the same properties (and if you can break our TLS, you can also just upload your own key to the account of the user you wish to upload as).

I do think that some sort of request signing scheme could be useful, in that in the case of a TLS break it limits the need to re-roll credentials across the board. I think that would be more generally fit for an "upload 2.0" API that would eventually sunset the current API, rather than extending the current API. Utilizing a bearer token authentication scheme today would mean that twine, etc. just work immediately, and we can constrain the effort of needing to get agreement between multiple parties to a point in the future when we actually want to design a new upload API.

So given that, I think the best path forward is to use some sort of bearer token authentication scheme. The simplest of these would just be a simple API key where Warehouse generates a high entropy secret and shares that with the user. However that has a number of drawbacks, such as:

  • Minting new API keys requires talking to Warehouse with some form of "root" credential for that account.
  • Any mechanism for limiting an API key would have to be built into Warehouse itself and would need to be stored in the database for each individual API key.

After thinking about this for a few days and talking it over with some folks who are much smarter than me, I think that the best path forward here is to use Macaroons. Macaroons are a form of bearer token, where Warehouse would mint a macaroon and pass it on to the user. In this regard they are similar to the simple API key design. Where macaroons get more powerful is that instead of baking things like "here is a list of projects that this macaroon is able to access" into the database, that information is stored as part of the macaroon itself, AND, given a macaroon, you can add additional "caveats" (like which projects it can access) and mint a new macaroon without ever talking to Warehouse.

This would allow a workflow like:

  1. User gets a Macaroon from Warehouse that is specific to their user and has access to all permissions and does not expire.
  2. User decides they want to utilize Travis to upload their "foobar" package, so they take their Macaroon and attach a new caveat, project: foobar to it, and mint a new Macaroon which does not expire and hands it to Travis.
  3. Travis wants to limit the ability for credentials they have to leak and be used persistently, so when their deployer code runs, instead of giving a token that is good ~forever, they attach a new caveat, expires: now() + 10 minutes and mint a new Macaroon that they pass into the deployer code as the password.
  4. Warehouse looks at the macaroon that the Travis deployer uploaded and then:
    1. Sees which root key it used, looks that up from the DB (call this k0), and does HMAC(k0, <initial data>) to get k1 (this is what was given to the user in step 1).
    2. Adds the caveats added in step 2 and does HMAC(k1, <new data>) to get k2 (this is what was given to Travis in step 2).
    3. Adds the caveat added in step 3 and does HMAC(k2, <new data>) to get k3 (this is what was sent to Warehouse by Travis in step 3).
    4. Verifies that the macaroon sent by Travis is the same thing we generated as k3.
    5. Iterates over all of the caveats added throughout all of the steps, and evaluates them to ensure that they all evaluate to True for this specific request.

I think this ends up making a really nice system that can be retrofit to the current upload API, but that allows a lot of new capabilities for delegation and restricting delegation.
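
For illustration, here is roughly what that flow could look like with the pymacaroons library; the keys, identifiers, and caveat strings are made up for the example, and real verification code would parse and evaluate each predicate rather than pattern-match:

from pymacaroons import Macaroon, Verifier

ROOT_KEY = "k0-looked-up-from-the-database"  # never leaves Warehouse

# Step 1: Warehouse mints a macaroon for the user.
m1 = Macaroon(location="pypi.org", identifier="key-id-1", key=ROOT_KEY)

# Step 2: the user attenuates it for Travis, without talking to Warehouse.
m2 = Macaroon.deserialize(m1.serialize())
m2.add_first_party_caveat("project: foobar")

# Step 3: Travis attenuates it further with a short expiry, also offline.
m3 = Macaroon.deserialize(m2.serialize())
m3.add_first_party_caveat("expires: 2018-03-14T00:10:00Z")

# Step 4: Warehouse re-derives the HMAC chain from the root key and
# checks that every caveat holds for this request.
v = Verifier()
v.satisfy_exact("project: foobar")  # the request really is for foobar
v.satisfy_general(lambda c: c.startswith("expires: "))  # stand-in time check
assert v.verify(m3, ROOT_KEY)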

We would need to figure out which caveats we want to support in Warehouse (even though anyone can add caveats, the caveats have to be supported by Warehouse so you can't add arbitrary ones). Off the top of my head I can think of the following (naming can be adjusted):

  • not-before: An object representing at what point this macaroon becomes valid. Ideally some serialization of a UTC datetime (omitting means it is always valid).
  • expires: An object representing when this macaroon expires. This would ideally be some serialization of a UTC datetime I think (omitting means it never expires).
  • permissions: A list of permissions that this macaroon has been scoped to (omitting means all permissions, an empty list would be no permissions). Exposing this would likely require auditing the names of our permissions in Warehouse so that they make sense as a public API since currently they're just internal details. It would also require auditing the existing permissions to make sure that the scope of them is not too wide or too narrow.
  • resources?: A list of resources that this macaroon is scoped to. This is one where I'm the least sure of how to handle it, so ignore the name for now. Basically we would want a way to allow a macaroon to be scoped to specific resources. I think ideally it would support being scoped to multiple resources at a time, possibly of disparate types (offhand, the kinds I could think of being useful are Project, Release, File, User, maybe more in the future?). These might be best as top-level caveats like users: [], releases: [], etc. It will require work to figure out the best way to represent this.

One caveat to the above is that we should likely require both a not-before and an expires, or neither; having only one of them should likely be an error. The general idea is that in order for expiration like that to work as expected, the clock of the system adding the expiration caveats and the clock of the system verifying them have to be relatively in sync. If someone sets expires: now() + 10 minutes but their clock is accidentally set to 10 years in the future, they're going to get a macaroon that is valid for 10 years and 10 minutes instead of just 10 minutes; worse, it's going to be a silent error where it just appears to work. However, if we require both fields to be either present or absent, then they'd only be able to generate a macaroon that becomes valid 10 years from now and lasts 10 minutes, which would get rejected immediately, and they'd be able to determine something is wrong.
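
A minimal sketch of that both-or-neither rule (a hypothetical helper; datetimes are assumed to be timezone-aware UTC):

from datetime import datetime, timezone

def check_time_caveats(not_before, expires):
    # Enforce the both-or-neither rule described above.
    if (not_before is None) != (expires is None):
        raise ValueError("not-before and expires must be supplied together")
    if not_before is None:
        return True  # no time restriction at all
    now = datetime.now(timezone.utc)
    return not_before <= now < expires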

Internal to Warehouse, we'd need a table that would store all of the initial root keys (k0 in the above). Each of these keys would essentially be an (identifier, key) tuple that the verification code could then look up. We would never expose the k0 to end users. We might also want to store any initial caveats that were added to the macaroon given to the user, so that we can give the user a list of their macaroons as well as the caveats that were attached to each. Obviously we'd also need a UI for managing the macaroons that a user has, with the ability to delete them (which would effectively revoke them and make any sub-macaroon worthless) as well as create them, ideally with a UI to add caveats to the initial macaroon (even though the system will support end users adding caveats on their own without talking to Warehouse, a UI in Warehouse will be a lot more user friendly for the common case).

There is still the question of whether it makes sense to have these Macaroons able to be specified without a user or not. For right now, I think we should make them owned by a specific user; HOWEVER, I think that we should always add an additional caveat that describes which user the macaroon belongs to, essentially scoping the macaroon to just that user (not the same as scoping it to a specific user in the resources above, but saying "this macaroon can effectively act as $USER"). While the first cut may always add that, and may have a database column that always links a macaroon to a specific user, adding that caveat to start with means that in the future, if we want to have macaroons that are not tied to a specific user, we will be able to do that without accidentally granting super privileges to all existing macaroons.

What do you think?

@dstufft
Member

dstufft commented Mar 13, 2018

One nice thing about the above idea is that end users can completely ignore the extra power granted by Macaroons if they want. They can simply treat it as an API key: generate a Macaroon via Warehouse, pass it to Twine as the password for their account, and go on with life without giving it a second thought. Of course, someone who wants to unlock additional power for delegation or resource constraining can opt into that and utilize the additional benefits provided by Macaroons. The design of Macaroons is fairly brilliant in this respect: it's very much a low-cost abstraction for the basic use case, but it enables you to delve deeper to unlock a lot more power.

In the hypothetical Travis example, the end user might not even be aware that Travis is adding additional caveats to their Macaroon (or that it's even possible to do that). It could easily be presented to them as nothing more than adding the API key to Travis, with Travis doing the extra expiration stuff completely behind the scenes for the benefit of the user.

I could even potentially see something like Twine always minting a new macaroon scoped very specifically to what it is about to upload, with a very short expiration. Since that doesn't require talking to Warehouse, it would be very fast and very secure, allowing Twine to limit the capabilities of the token it actually sends on the wire. While we generally trust TLS to protect these credentials, automatic scope limitation like that is basically zero cost and provides defense in depth, so that in case someone is able to look into the TLS stream (for example, a company MITM proxy), the credentials they get are practically useless once they've been used once.

@moshez

moshez commented Mar 13, 2018

To extend what @dstufft said -- we can have a constraint be "upload file with hash <hash>", which means replay attacks (short of pre-image attacks) are useless. If we further have twine auto-attenuate with this plus a short time frame, future pre-image attacks are useless too; an attacker would need a pre-image attack ready now.

This means that while, of course, we all love TLS a lot, with this in place, TLS would not be needed for security -- even complete breakage of TLS would not allow someone to upload a package with malicious code.
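
As an illustration, client-side auto-attenuation along those lines could look like the following with pymacaroons (the caveat syntax is invented for the example; this is not an existing twine feature):

import hashlib
from datetime import datetime, timedelta, timezone

from pymacaroons import Macaroon

def attenuate_for_upload(serialized_token, path, lifetime_minutes=10):
    # Derive a narrowly scoped token bound to one file and a short window.
    m = Macaroon.deserialize(serialized_token)
    with open(path, "rb") as f:
        digest = hashlib.sha256(f.read()).hexdigest()
    m.add_first_party_caveat("file-sha256: " + digest)
    window = datetime.now(timezone.utc) + timedelta(minutes=lifetime_minutes)
    m.add_first_party_caveat("expires: " + window.isoformat())
    return m.serialize()  # send this on the wire, never the long-lived token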

@lukesneeringer

lukesneeringer commented Mar 14, 2018

I do think that some sort of request signing scheme could be useful, in that in the case of a TLS break it limits the need to re-roll credentials across the board. I think that would be more generally fit for an "upload 2.0" API that would eventually sunset the current API, rather than extending the current API. Utilizing a bearer token authentication scheme today would mean that twine, etc. just work immediately, and we can constrain the effort of needing to get agreement between multiple parties to a point in the future when we actually want to design a new upload API.

I like this idea. One thing we could do is allow either the API key to be sent directly (meaning that we get the constraint of effort you mention) or a signing algorithm, which then tools could opt in to.

I think this ends up making a really nice system that can be retrofit to the current upload API, but that allows a lot of new capabilities for delegation and restricting delegation.

I like this idea too. +1.

There is still the question of whether it makes sense to have these Macaroons able to be specified without a user or not. For right now, I think we should make them owned by a specific user, HOWEVER I think that we should always add an additional caveat that describes which user the macroon belongs to.

I am okay with this provisionally but I think it is an important limitation. I do think that group permissions will be necessary. I do think it is reasonable to add groups first and then group level permissions, rather than the converse order that I originally proposed.

I am a little confused about the user caveat. I would like to understand its purpose. Would we allow Macaroons to be moved? This seems implausible. Additionally, I do think we eventually need to end up with user-independent tokens. The point needs to be that the credentials continue to work after a user is no longer part of that group.

I could even potentially see something like Twine just always mint a new macaroon scoped very specifically to what it is about to upload, with a very short expiration. Since that doesn't require talking to Warehouse to do, it would be very fast and very secure, allowing Twine to limit the capabilities of the token they actually send on the wire.

This is definitely something that would be easy and valuable to do. It gives you the value of request signing, effectively.


I am sold on this. I will get an implementation of this in soon. Also, thanks @dstufft for teaching me about Macaroons. That is really valuable.

@dstufft
Member

dstufft commented Mar 14, 2018

@lukesneeringer I should mention that after talking through this more with people, I think that the right implementation would look something like:

  • Add a configuration value for the root key; this would need to include an identifier as well as the key. If we ever lose this key, we'd revoke all macaroons by changing this setting.
  • Add another caveat to the macaroons which is basically just some opaque, unique value (it could be just a uuid, or some random bytes from os.urandom()) and store that value in the database. The caveat would then effectively be id: <opaque value>, and verification would look up the opaque value in the database table; if it doesn't exist, the macaroon is not valid. That allows us (and users) to revoke a single macaroon "tree" without having to revoke the entire set of macaroons, and this scheme also means we don't need to store root keys in the database. (See the sketch after this list.)
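
A sketch of the revocation check that the id caveat enables (names are hypothetical, and caveats are assumed to be predicate strings):

def macaroon_is_live(known_ids, caveats):
    # A macaroon is live only if its id caveat is still present in the
    # database table; deleting that row revokes the whole macaroon "tree".
    for caveat in caveats:
        if caveat.startswith("id: "):
            return caveat[len("id: "):] in known_ids
    return False  # no id caveat at all: reject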

Some replies to your comments:

I like this idea. One thing we could do is allow either the API key to be sent directly (meaning that we get the constraint of effort you mention) or a signing algorithm, which then tools could opt in to.

We need a number of the values out of the macaroons in order to construct what the signing key should be in this hypothetical signing algorithm. It's possible we could do something where we rip enough stuff out of the macaroon format so that caveats are still sent along with the request, but not the actual HMAC signatures, so that the server could still construct the expected signing key. I'm not sure that would be worth the effort.

I am okay with this provisionally but I think it is an important limitation. I do think that group permissions will be necessary. I do think it is reasonable to add groups first and then group level permissions, rather than the converse order that I originally proposed.

I am a little confused about the user caveat. I would like to understand its purpose. Would we allow Macaroons to be moved? This seems implausible. Additionally, I do think we eventually need to end up with user-independent tokens. The point needs to be that the credentials continue to work after a user is no longer part of that group.

Basically, the user caveat is how you say "this macaroon (and by nature, all macaroons created from this macaroon) is scoped to only resources that X user has access to, and acts as if it were X user". The reason this is a caveat instead of just a column in the database (although it likely should be one of those too) is to keep our options open in the future, so that we can potentially start creating macaroons without that caveat (perhaps with a resources: Project(foobar) caveat) without accidentally upgrading all previous macaroons to unscoped.

So to start out with, we'd always include that acts-as-user: caveat, but maybe in the future we don't and everything works fine.

@dstufft
Member

dstufft commented Mar 14, 2018

Oh, and the opaque, unique value would also give us something we can enumerate to display a list of macaroons owned by a user (to allow them to delete/revoke unused ones) and allows us to do things like record in the database whenever they are used, so we can display the last time each one was used too.

@lukesneeringer

lukesneeringer commented Mar 15, 2018

All this sounds good. Also, apologies, I was replying inline before I read the entire post, so my first quote above is less relevant than I thought. If twine ever makes the addition you recommend to add the date range, it accomplishes the same thing as signing would.

@webknjaz
Member

@brainwane filed a bug: #6262

@brainwane
Contributor

Upload-only API tokens (both user-scoped and project-scoped) are now in beta on PyPI and Test PyPI! Our update on Discourse is at https://discuss.python.org/t/pypi-security-work-multifactor-auth-progress-help-needed/1042/31 .

Uploading with an API token is currently optional but encouraged; in the future, PyPI will set and enforce a policy requiring users with two-factor authentication enabled to use API tokens to upload (rather than just their password sans second factor). Once the beta period for API tokens is complete, we will make a launch announcement on the pypi-announce mailing list, and start to notify project maintainers and owners of the upcoming policy change. Then, after a suitable waiting period, we will begin to enforce this restriction, and include a notice in the error message returned to clients.

@davidism

davidism commented Jul 29, 2019

Is there any chance of adding 2FA support for uploads, as opposed to only accepting tokens? Seems like 2FA should be supported and preferred for dev machines. Storing API tokens locally doesn't seem any more secure than storing the username and password locally in that case.

@moshez

moshez commented Jul 29, 2019

"more secure" depends, of course, on your threat model.

One common problem with storing secrets locally is that they are available to any future application that runs as the user (any current application that runs as the user is a threat even for 2FA-based systems, since it can directly hijack the session). However, this threat can be mitigated by only storing short-lived tokens. The Macaroon system we are implementing allows adding such validity caveats to tokens before storing them. For example, you could create a token valid for 5 minutes before each upload.

In addition, it is also expected and straightforward to invalidate API tokens through the UI.

Can you indicate what threat model you think 2FA for uploads solves that short-lived tokens do not?

@Carreau
Contributor

Carreau commented Jul 29, 2019

PyPI will set and enforce a policy requiring users with two-factor authentication enabled to use API tokens to upload

Are there any discussions as to where/how the project-name:api-token mapping should be stored?
Typically, to tell twine which token it should use? In CI it's easy with an environment variable; not so much on dev machines, which may release multiple projects.

Or do you just expect devs to use a token with the same scope as the user?

@davidism

davidism commented Jul 29, 2019

So you're saying the workflow to upload a package from a dev machine would be to log in to pypi with 2FA, get a short-lived token, and tell twine about it? Why not just cut out the middle step and tell twine about 2FA?

@ThiefMaster

That sounds like a terrible idea to me - I really do not want to have to involve a browser (or enter my password in a CLI, which would be worse, since I generally do not know my passwords but generate them from a master password or randomly (and then store them in a password manager)) to publish packages. I think there are two use cases here:

  • publishing from CI: For this you want a token that is long-lived (possibly no expiration), and does not require 2FA
  • publishing from a developer's machine: This is where 2FA makes sense!

When publishing to npm right now, everything works straightforwardly: I run npm publish, it tries to log in with the access token stored locally, and because I have 2FA enabled it asks me for a 2FA token, then retries the publish using the access token and the 2FA token. This not only adds extra security but also prevents accidentally publishing something, since every time you publish you need to enter a 2FA token!

So ideally, I'd like to have the same behavior with pypi/twine. I wouldn't mind if it was internally using a long-lived token that requires 2FA in addition and created a short-lived token to do the actual publish. This would actually be convenient for cases where you publish multiple packages in one go so you don't need to enter multiple tokens (and even allow reuse of a TOTP which is probably a bad idea).

@fschulze
Copy link

This could be implemented by adding a 2fa caveat. In the UI one would create a token with a 2fa requirement (maybe with a simple checkbox). Then twine would see that token (the macaroons can easily be inspected), ask the user for the 2fa code, add it to the existing token and send that to pypi. The new token would expire automatically when TOTP is used and the token from the UI would be useless without the 2fa code.

I think this sounds sensible, useful, and relatively straightforward to implement. Unless I overlooked something fundamental.
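
For illustration, the twine side of that flow could look something like this with pymacaroons (the caveat strings are hypothetical):

from pymacaroons import Macaroon

def needs_second_factor(serialized_token):
    # inspect() renders the identifier, caveats, and signature as text.
    return "2fa: required" in Macaroon.deserialize(serialized_token).inspect()

def add_totp_caveat(serialized_token, totp_code):
    m = Macaroon.deserialize(serialized_token)
    m.add_first_party_caveat("totp: " + totp_code)
    return m.serialize()  # useful only as long as the TOTP code is valid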

@pganssle
Contributor

One thing I would like to throw into the mix here with regards to 2FA for uploads is that #726 proposes a "two-phase" upload, where packages are uploaded from the command line and (optionally) go into a "staging area" before they become available to the public and immutable. I would find that enormously useful for other reasons, but it also adds another workflow where the final upload is necessarily gated by 2FA: uploads into the staging area would be possible using the upload API key, but none of that would actually be published without logging in with the 2FA key.

Obviously that doesn't help anyone who deliberately wants to avoid using the browser as any part of their upload workflow, but it may be more convenient than the "get a short-term key from the browser and paste it into twine" workflow.

@woodruffw
Member

Just 0.02c from the implementation side: yes, I think the right way to do this would be with an additional 2FA caveat as @fschulze proposed. That's out of scope for the current work (and would involve changes to twine and other uploaders), but wouldn't be too difficult to implement.

OTOH, single-use and/or time-scoped tokens (as proposed by @moshez) that require a second factor for minting would provide similar security properties and potentially be less invasive for automatic deployments.

@webknjaz
Member

@Carreau does it really make sense to have multiple tokens on a dev machine? If yes, you could have multiple "repository" entries, one per token/project. If no (better DX) — just use a user-scoped token. There's been some discussion @ https://twitter.com/Ewjoachim/status/1154479563419869184

@Carreau
Contributor

Carreau commented Jul 29, 2019

does it really make sense to have multiple tokens on dev machine

For me, yes it does. I only want my work machine to be able to publish some packages, and vice versa for my home machine. Also, tokens are "upload only" (or are going to be), so I can keep my password safer.

For now I'm good with a custom solution, but I would love for an agreed-upon way of doing it to exist before various incompatible solutions emerge.

@brainwane
Contributor

Heads-up for people trying the beta of uploading with API tokens:

@graingert
Contributor

graingert commented Aug 1, 2019

Heads-up for people trying the beta of uploading with API tokens:

That's ok, as long as you don't change existing tokens

@brainwane
Contributor

@graingert I'm sorry, but yes, we will probably be making it so that tokens you have already created do not work. As the manager on this project I'm comfortable making that choice during this beta, since we have warned people that there was a chance this would need to happen during the beta. To quote @ewdurbin in #6287 (comment):

We know who has provisioned API tokens and can email them to give them a heads-up 24 hours before disabling the older grammar.

@ewdurbin
Member

ewdurbin commented Aug 5, 2019

We have updated the token username and prefix in #6342.

username: @token => __token__
password/token: pypi:<base64 token body> => pypi-<base64 token body>

These changes should alleviate the need for escaping heroics.

The previous format will continue to work for now, but users will be notified to update their configurations to match the new syntax before the beta period is over.
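
For example, a .pypirc entry using the new syntax would look like this (token body elided):

[pypi]
username = __token__
password = pypi-<base64 token body>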

@brainwane
Contributor

@takluyver asked:

Is there any plan for an API to create upload tokens? E.g. I'd like to have a command-line tool prompt me once for my password & 2FA code, then obtain and store a project-scoped token to use for uploads.

Sorry for moving your comment here, @takluyver, but I want to keep this issue focused on API keys and that issue on the rollout!

We don't have a specific plan for that API feature yet, no. I filed your request as #6396.

@brainwane
Contributor

We've rolled out scoped API tokens for package upload on PyPI. It is in beta, and #5661 is a meta-issue where we are tracking its rollout and getting the last few items fixed before ending the beta, and the policy changes (requiring API token usage for some users) we'll make after that.

We've now implemented all the items in this API token checklist. Some features are out of scope for our current funding.

So, per agreement with other maintainers in that meeting, I'm closing this issue.

Please enjoy upload API tokens on Warehouse, and file new issues to request new API key-related features. Thank you all!
