Implement "hook" support for package signature verification. #1035

Closed
nejucomo opened this Issue Jul 8, 2013 · 52 comments

nejucomo commented Jul 8, 2013

Synopsis

Some people want package signature verification during their pip installs. Other people think relying on authenticated package repository connections (such as over TLS) is sufficient for their needs.

Of those who want package signature verification, there is disagreement about how to tell pip which signatures to trust (and how users will manage package signing public keys).

Rationale

The rationale for this ticket is to provide a mechanism in mainline pip for signature verification enthusiasts to experiment with different approaches. If a particular approach becomes popular, pip could consider incorporating that particular approach.

In the meantime, rather than have endless committee-style arguments about how to do package verification, we should have a system that lets users choose for themselves, but only if they opt in.

Also, it keeps package verification cleanly separate from the pip codebase.

Criteria

This ticket may be marked as wontfix, or some other status to indicate that the pip developers reject this proposal.

This ticket may be marked closed, only when these conditions are met:

  • A user can configure an arbitrary "hook" to process downloaded packages prior to proceeding.
    • By default, this hook is configured to nothing, and pip with this feature behaves like pip without this feature.
    • A hook provides an interface which:
      • takes two inputs:
        1. a path to a local file, which is a newly downloaded package
        2. a URL which the file was retrieved from
      • and returns a single boolean indicating "accept/reject".
    • When a user has a hook configured, it is invoked on every downloaded package file before any other processing. If it returns true, pip proceeds as normal. If it returns false, pip logs an error and exits.
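As a rough illustration of the criteria above, the following sketch shows how a download step might consult an optional hook. The names `process_download` and `reject_tarballs` are hypothetical and not part of pip; only the two inputs and the boolean result come from the proposal.

```python
import logging

logger = logging.getLogger("hook_sketch")


def process_download(package_path, source_url, hook=None):
    """Run the optional hook on a freshly downloaded package file.

    With no hook configured this does nothing, which matches the
    behaviour of pip without the feature.
    """
    if hook is None:
        return
    if not hook(package_path, source_url):
        # Rejection: log an error and exit, as the criteria describe.
        logger.error("hook rejected %s (from %s)", package_path, source_url)
        raise SystemExit(1)


def reject_tarballs(package_path, source_url):
    """Example hook (illustration only): accept anything except .tar.gz files."""
    return not package_path.endswith(".tar.gz")
```

Calling `process_download(path, url)` without a hook is a no-op, so the default behaviour is unchanged.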

Implementation Details

I prefer a hook api where the config specifies a path to an executable in pip's config file. The inputs are passed as commandline arguments to a subprocess which invokes that command. The hook's stdout & stderr are the same as the parent pip process. The exit status is 0 to indicate "accept package" and non-zero to indicate "reject package".

But I'd be happy with any system that fulfills the Criteria above.
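The executable-based variant could look roughly like the sketch below. It assumes the configured hook is given as an argument list (`hook_argv`), and `EXAMPLE_HOOK` is a stand-in written as a Python one-liner so the example runs anywhere; neither name exists in pip.

```python
import subprocess
import sys


def run_executable_hook(hook_argv, package_path, source_url):
    """Invoke the hook command with the two inputs as command-line arguments.

    The child inherits the parent's stdout and stderr because no
    redirection is requested. Exit status 0 means "accept package";
    any other status means "reject package".
    """
    argv = list(hook_argv) + [package_path, source_url]
    return subprocess.call(argv) == 0


# Stand-in for a configured hook: accept only https URLs.
# Inside the child, sys.argv[1] is the package path and sys.argv[2] the URL.
EXAMPLE_HOOK = [
    sys.executable,
    "-c",
    "import sys; sys.exit(0 if sys.argv[2].startswith('https://') else 1)",
]
```

Passing the inputs as separate list elements, rather than formatting them into a shell string, keeps the URL from being interpreted by a shell.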

Related Issues

Note, there is a less-well-specified ticket #425. I made this ticket because the vagueness of that ticket makes it difficult to close. (Is #425 satisfied by TLS authentication to package repositories based on a standard OS or user trust root? Does it imply or require package signature verification?)

dstufft Jul 8, 2013

Member

Thanks for this ticket!

Immediate thoughts are that I think package signing at this stage in the game is premature as there are other avenues of very serious attacks available... However, the proposed system is not strictly related to signing. It could also be used to implement something like https://pypi.python.org/pypi/peep. So personally I'm going to think about this ticket a little bit beforehand to figure out if I believe it's going to provide a useful feature without serious shortcomings in the near term.

dstufft Jul 8, 2013

Member

This has another useful purpose too, companies or organizations could use it to disallow installing items that haven't been through a security audit or license review or what have you. For instance OpenStack could potentially use it to help ensure that an unapproved dependency isn't added.

westurner Jul 9, 2013

A syntax like the following would be convenient:

pip install --verify-<sig> -e git+https://github.com/pypa/pip#egg=pip

...

These may be helpful for creating documentation on this feature and how it relates to other components of a secure python packaging process:

Source Repository GPG

Python Package GPG (./<package>.asc)

Python Wheel JWS S/MIME (PEP 427)

Index Mirror DSA (PEP 381)

Package Signatures for .deb, .rpm, ...

Python Package Configuration Management Systems

[Cryptographic] Hash Functions

seeAlso: #425 (this comment)

westurner Jul 10, 2013

This has another useful purpose too, companies or organizations could use it to disallow installing items that haven't been through a security audit or license review or what have you. For instance OpenStack could potentially use it to help ensure that an unapproved dependency isn't added.

So would a use case be something like verifying a dependency graph of packages' checksums and metadata?

pnasrat Jul 11, 2013

Contributor

I'm pretty interested in implementing this. Should I knock up a strawman Pull Request?

dstufft Jul 12, 2013

Member

So I've thought about this some more, and it's really started to grow on me.

Some thoughts on what I'd like to see:

I think the hook should be a Python hook; that allows us to pass data about the thing we are trying to install into the hook easily, and receive more complex return types than pass/fail. If someone just wants to call a command, the subprocess module is simple to use, so the Python portion of the hook in that case would be a small shim.

I think there needs to be more return types than Pass/Fail. In my mind there are four distinct return values. They are Pass, Warn, Retry, Fail. The definitions of them (again in my mind) would be:

Pass: The installation looks fine, go ahead and install it
Warn: The installation is ok, but there is a warning that should be presented to the user (this one is possibly not needed and warning could be done via the logging system).
Retry: This particular package is unsuitable, but pip can attempt to locate another package that fulfils this dependency (either from a different location, a different type, or a different version).
Fail: This package is unsuitable. Pip should not attempt to satisfy it and should throw an error.
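One way to picture the four outcomes is the sketch below. `HookResult` and `install_first_acceptable` are hypothetical names, and the candidate list stands in for whatever alternatives pip's finder would offer on a retry.

```python
import enum


class HookResult(enum.Enum):
    PASS = "pass"
    WARN = "warn"
    RETRY = "retry"
    FAIL = "fail"


def install_first_acceptable(candidates, hook):
    """Return the first candidate the hook accepts, moving on when it says RETRY.

    `candidates` is an ordered list of (package_path, source_url) pairs.
    Returns (package_path, warnings).
    """
    warnings = []
    for package_path, source_url in candidates:
        result = hook(package_path, source_url)
        if result is HookResult.PASS:
            return package_path, warnings
        if result is HookResult.WARN:
            # Acceptable, but record something to show the user.
            warnings.append("warning for %s" % package_path)
            return package_path, warnings
        if result is HookResult.FAIL:
            # Unsuitable: do not look for alternatives.
            raise RuntimeError("hook rejected %s" % package_path)
        # HookResult.RETRY: fall through and try the next candidate.
    raise RuntimeError("no acceptable candidate found")
```

The difference between Retry and Fail shows up only in control flow: Retry continues the loop, Fail aborts it.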

At least that's what I think :) I'd love a PR that implements this hook feature.

westurner Jul 12, 2013

I think there needs to be more return types than Pass/Fail. In my mind there are four distinct return values. They are Pass, Warn, Retry, Fail. The definitions of them (again in my mind) would be:

Pass: The installation looks fine, go ahead and install it
Warn: The installation is ok, but there is a warning that should be presented to the user (this one is possibly not needed and warning could be done via the logging system).
Retry: This particular package is unsuitable, but pip can attempt to locate another package that fulfils this dependency (either from a different location, a different type, or a different version).
Fail: This package is unsuitable. Pip should not attempt to satisfy it and should throw an error.

What are the sys.exit() codes for each of these?

So would a use case be something like verifying a dependency graph of packages' checksums and metadata?

Pip package lists are specified as requirement specifiers in requirements.txt files.

So, in order to verify a list (a topologically sorted dependency graph) of Python packages required for an environment, it would be necessary to determine the path to the .asc file for each package listed in a requirements description format.

dstufft Jul 12, 2013

Member

I don't think it should be shelling out to an executable by default. It should call a Python function as a hook and use a Python return value. If people want their particular instance of the hook to shell out, that's a simple Python wrapper that they can write on their own.
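If the hook is a Python function, pip needs some way to find it. A minimal sketch, assuming the configuration names the hook as a `module:function` string (a format borrowed from setuptools entry points, not something this thread specifies), with `load_hook` as a hypothetical helper:

```python
import importlib


def load_hook(spec):
    """Resolve a 'package.module:function' string to a callable."""
    module_name, sep, attr = spec.partition(":")
    if not sep or not module_name or not attr:
        raise ValueError("expected 'module:callable', got %r" % spec)
    hook = getattr(importlib.import_module(module_name), attr)
    if not callable(hook):
        raise TypeError("%s is not callable" % spec)
    return hook
```

A hook that shells out would then be an ordinary function in the user's own module that calls `subprocess` and translates the exit status into a return value.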

westurner Jul 14, 2013

takes two inputs:

  • a path to a local file, which is a newly downloaded package
  • a URL which the file was retrieved from

from distlib.index import PackageIndex

source_url = "https://pypi.python.org/packages/source/p/pip/pip-1.3.1.tar.gz#md5=cbb27a191cebc58997c4da8513863153"
asc_url = "https://pypi.python.org/packages/source/p/pip/pip-1.3.1.tar.gz.asc#md5=cbb27a191cebc58997c4da8513863153"
pkg_file = "./path/to/pip-1.3.1.tar.gz"
asc_file = "./path/to/pip-1.3.1.tar.gz.asc"

def verify(pkg_file, asc_file, source_url, asc_url):
    # Delegates to distlib; the two URLs are accepted but unused here.
    return PackageIndex().verify_signature(asc_file, pkg_file)

... http://pythonhosted.org/distlib/tutorial.html#verifying-signatures

dstufft Jul 14, 2013

Member

Distlib's Signature support is inherently broken. You cannot just pipe out to GPG and trust whatever keys are in the trustdb. Just because you trust me for X does not mean you trust me for Y.

westurner Jul 14, 2013

[Distlib Signature Support]

So remove [distlib].index? I guess the question I was trying to ask was: what is the minimal Python function call signature necessary to most correctly verify what it is we are trying to verify here?

I prefer a hook api where the config specifies a path to an executable in pip's config file.

http://stevedore.readthedocs.org/en/latest/ may be useful for adding hooks / plugins / extension points and/or as a reference for [setuptools entry_point configuration]

Mercurial hooks and extensions pass something like a context dict instead of positional arguments as with the verify() interface listed above.

The inputs are passed as commandline arguments to a subprocess which invokes that command.

How and when should I sanitize this input? What is the best way to specify the command arguments?

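import subprocess
# Pass an argument sequence with shell=False so the string is never parsed by a shell.
subprocess.call(("bash", string_downloaded_from_the_internets), shell=False)
# NOT
subprocess.call("bash %s" % string_downloaded_from_the_internets, shell=True)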

The hook's stdout & stderr are the same as the parent pip process. The exit status is 0 to indicate "accept package" and non-zero to indicate "reject package".

From a shell script, is there then a way to differentiate between failed and sig-check-failed for an install -r requirements.txt?

I would be in favor of either or both:

  • Python argspec for verify(): e.g. verify(pkg_file, asc_file, source_url, asc_url)
  • a context dict keyset for the argspec parameters: e.g. dict.fromkeys(('pkg_file', 'asc_file', 'source_url', 'asc_url'), None)
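The two calling conventions can coexist, as this sketch suggests. `verify` here is only a placeholder check, and `verify_context` and `HOOK_KEYS` are hypothetical names for an adapter that accepts a context dict.

```python
HOOK_KEYS = ("pkg_file", "asc_file", "source_url", "asc_url")


def verify(pkg_file, asc_file, source_url, asc_url):
    """Placeholder check: require both local paths (not a real verification)."""
    return pkg_file is not None and asc_file is not None


def verify_context(context):
    """Accept a context dict and forward the known keys to verify()."""
    unknown = set(context) - set(HOOK_KEYS)
    if unknown:
        raise KeyError("unexpected context keys: %s" % sorted(unknown))
    # Missing keys default to None, as dict.fromkeys(HOOK_KEYS, None) would give.
    args = dict.fromkeys(HOOK_KEYS, None)
    args.update(context)
    return verify(**args)
```

A context dict lets new keys be added later without breaking existing hooks, at the cost of a less explicit signature.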

westurner Jul 14, 2013

... These were the stevedore documentation links I was looking for:

  • Enabled through Installation
  • Enabled explicitly
  • Self-Enabled

westurner Jul 14, 2013

Is the package signature hook called for .zip, .egg, and .whl packages AND for editable distributions?

There are new metadata attributes for package source locations.

# PEP 345 Metadata 1.2
download_url = str
# PEP 426 Metadata 2.0
source_url = {
    'key': http_path,
    'key_2': 'git+https://editable/path@version'
}

@westurner westurner referenced this issue in conda/conda Dec 17, 2013

Closed

ENH: Mirroring/caching support #414

vsajip Mar 27, 2014

Contributor

Distlib's Signature support is inherently broken. You cannot just pipe out to GPG and trust whatever keys are in the trustdb.

How so? You can specify the keystore to use. If necessary to support a potentially different keystore for each file, this could be accommodated via an extra argument to the verify_signature method. This is an incremental change to the API to make it more convenient, but can you explain why you think it's inherently broken?

dstufft Mar 27, 2014

Member

Because throwing cryptography at a problem without providing a solution to the actual problem doesn't do anything. Your solution uses GPG, and GPG has a built-in trust model which doesn't work for PyPI-style packaging, where it's a free-for-all. GPG's web of trust validates identity, but it doesn't validate that a person is allowed to sign for a particular file. You say that you can just point to a different trustdb in that case, but that still doesn't solve the underlying problem of how something gets into the trustdb to begin with.

Implementing packaging signing needs to start with a proper trust model, just slapping some crypto on top of it doesn't solve the problem.

vsajip Mar 28, 2014

Contributor

I see what you mean, but how something gets into the trust database is not really up to distlib to solve. To do things properly you need something like a web of trust - the distlib approach can still work in specific environments and scenarios for some people / organisations. No one piece of software can solve the trust problem, and it's not up to low-level software like distlib to determine which keys are trusted (that would be policy, not mechanism). Providing a piece of the puzzle is not "throwing cryptography at a problem" - it's more like "if you have keys you trust, then distlib provides a straightforward way of verifying signatures".

Ivoz Mar 28, 2014

Member

Keys you trust for what?

vsajip Mar 28, 2014

Contributor

Keys you trust for what?

A key you trust to verify the signature of a specific package you downloaded. This will be the package publisher's public key (the corresponding private key having been used by the publisher to sign the package you downloaded), which you will have obtained through some trusted channel (so that you know the key belongs to the publisher, rather than someone claiming to be the publisher). This is easier said than done, but certainly doable for specific packages and publishers, with their cooperation.

Ivoz Mar 28, 2014

Member

Ok, so what's the mechanism of specifying that a certain key is only trusted for a certain package?

vsajip Mar 28, 2014

Contributor

Ok, so what's the mechanism of specifying that a certain key is only trusted for a certain package?

  1. Get the key you trust for a package into a GPG keystore in directory /path/to/keys.
  2. If index is an instance of distlib.index.PackageIndex, do index.gpg_home = '/path/to/keys'.
  3. Ensure that you have downloaded the archive and signature for the package to e.g. /path/to/package.tar.gz and /path/to/package.tar.gz.asc.
  4. Call index.verify_signature('/path/to/package.tar.gz.asc', '/path/to/package.tar.gz')
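The four steps above can be sketched as a small helper. `verify_with_keystore` is a hypothetical name; it relies only on the `gpg_home` attribute and `verify_signature` method mentioned in the steps, and takes the index as a parameter so it can be exercised without GnuPG installed.

```python
def verify_with_keystore(index, keys_dir, signature_path, archive_path):
    """Point the index at a dedicated keystore, then verify a detached signature.

    `index` is expected to behave like distlib.index.PackageIndex.
    """
    index.gpg_home = keys_dir
    return index.verify_signature(signature_path, archive_path)


# Intended use (needs distlib and a gpg executable, so it is not run here):
#   from distlib.index import PackageIndex
#   ok = verify_with_keystore(PackageIndex(), "/path/to/keys",
#                             "/path/to/package.tar.gz.asc",
#                             "/path/to/package.tar.gz")
```

Using one keystore directory per package is one way to express "this key is trusted only for this package", although it leaves open how the key got into that directory.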

westurner Mar 29, 2014

Implementing packaging signing needs to start with a proper trust model, just slapping some crypto on top of it doesn't solve the problem.

.

Ok, so what's the mechanism of specifying that a certain key is only trusted for a certain package?

.

  1. Get the key you trust for a package into a GPG keystore in directory /path/to/keys.

So the trust model must include a mechanism for specifying which keys are valid for which packages?

  1. Keyserver
  2. Key <-> Package mappings
  3. Key <-> Package mapping server
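To make the second item concrete, here is a minimal sketch of a key-to-package mapping check. The mapping contents and fingerprint strings are placeholders, and the sketch says nothing about how such a mapping would be distributed or authenticated, which is the open question in this thread.

```python
# Hypothetical mapping from package name to the key fingerprints allowed to sign it.
KEY_PACKAGE_MAP = {
    "pip": {"FINGERPRINT-A"},
    "distlib": {"FINGERPRINT-B"},
}


def key_may_sign(package_name, fingerprint, mapping=KEY_PACKAGE_MAP):
    """Return True only if this key is listed for this specific package."""
    return fingerprint in mapping.get(package_name, set())
```

With such a check, a valid signature from a key trusted for one package would not be accepted for another.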

https://en.wikipedia.org/wiki/Web_of_trust

Ivoz Mar 29, 2014

Member
  1. Get the key you trust for a package into a GPG keystore in directory /path/to/keys.

See, this is the entirety of the hard part of the problem domain, but you've neatly tucked it away in a single sentence. Actual signing and verifying has been easy for the past decade. It's so mechanically easy it's hardly worth implementing (and possibly even dangerous to do so, as you may give users a false sense of security) until you have a rigorous design for problem number 1: how do I get keys for people I trust, and how do I decide what the heck I trust them with, and when, and for what.

I'd liken implementing package signing and verification without a well-thought-out identity, ownership and trust model overlying it, to implementing SSL in a browser without a PKI or certificate verification.

westurner Mar 29, 2014

2. Key <-> Package mappings

Is this a signed graph with typed edges?

With SSL, certs are tied to DNS (technically "Common Name") identifiers.

Not all packages are on PyPI, so a PyPI URN wouldn't solve for as many cases as just mapping Keys to Package URIs with 'types' (or 'roles'?): {"committer", [...], "-er" }.

To me, this seems like a useful metadata requirement to impose upon software project teams.

Could such "Key <-> Package mappings" metadata be inlined (topologically) with checksums in requirements.txt and/or requirements.txt.lock files (like peep)? Are there cert store formats which can store (package_uri, role, key) tuples?
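As a sketch of what storing (package_uri, role, key) tuples might involve, the following uses JSON as a stand-in format. `KeyGrant` and the helper functions are hypothetical names; this is not an existing cert store format.

```python
import json
from typing import NamedTuple


class KeyGrant(NamedTuple):
    package_uri: str
    role: str
    key: str


def dump_grants(grants):
    """Serialize grants to a JSON string."""
    return json.dumps([g._asdict() for g in grants], sort_keys=True)


def load_grants(text):
    """Parse the JSON produced by dump_grants back into KeyGrant tuples."""
    return [KeyGrant(**item) for item in json.loads(text)]


def keys_for(grants, package_uri, role):
    """Return the set of keys granted the given role for the given package URI."""
    return {g.key for g in grants if g.package_uri == package_uri and g.role == role}
```

Such records could sit next to the checksums in a requirements file, but the file itself would then need to be obtained over a trusted channel.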

[EDIT]

vsajip Mar 30, 2014

Contributor

Get the key you trust for a package into a GPG keystore in directory /path/to/keys.

See, this is the entirety of the hard part of the problem domain, but you've neatly tucked it away in a single sentence.

Why write a screed when a short sentence will do? This has been discussed elsewhere many times.

Actual signing and verifying has been easy for the past decade. It's so mechanically easy it's hardly worth implementing

Well, I've implemented it for my own use, and others can use that implementation or not, just as they choose :-)

(and possibly even dangerous to do so, as you may give users a false sense of security) until you have a rigorous design for problem number 1

You're saying you shouldn't provide a solution for some people unless you provide a solution for everyone? I don't agree with this argument - it's a bit like saying PKI shouldn't have been invented at all, or that C shouldn't have been invented until the problem of buffer overflow exploits was solved ;-) There are scenarios where one can obtain and use trusted keys, and I have used PKI and GnuPG successfully in such scenarios. And a "false sense of security" can even bite seasoned security pros - just look at all the exploits around SSL - but that doesn't mean we should have nothing in its place.

@ncoghlan

ncoghlan Mar 30, 2014

Member

It's worth noting that the complexity of the trust problem for package distribution is the main reason http://www.python.org/dev/peps/pep-0458/ and "The Update Framework" itself exist.

In relation to the idea of "implement a hook that assumes an already verified GPG trust DB", well, that's the same reason I signed off on Daniel's embedded signature support in PEP 427 - he had a constrained environment where he wanted to use that feature, and it was easy enough for everyone else to just ignore. Same goes for folks that have sorted out their GPG trust issues.

As far as this issue goes, +1 from me for the notion of making the verification step pluggable - we just need to be careful how those plugins get configured, because indirect attack vectors are always fun for all involved :)

@ypid

ypid Oct 3, 2016

As there is no native support available I am using a workaround based on Verifying PyPI and Conda Packages for my packages. Examples: yaml4rst, hlc

@ypid ypid referenced this issue in debops/debops-tools Oct 3, 2016

Open

Signed PyPI releases #170

@westurner

westurner Mar 23, 2017

From https://github.com/blockchain-certificates/cert-schema/issues/25#issuecomment-282571524 :

# EXAMPLE 4: A signature chain in a Linked Data document
{
  "@context": "https://w3id.org/identity/v1",
  "title": "Hello World!",
  "signatureChain": [{
    "type": "RsaSignature2015",
    "creator": "http://example.com/i/pat/keys/5",
    "created": "2011-09-23T20:21:34Z",
    "domain": "example.org",
    "nonce": "2bbgh3dgjg2302d-d2b3gi423d42",
    "signatureValue": "OGQzNGVkMzVm4NTIyZTkZDY...NmExMgoYzI43Q3ODIyOWM32NjI="
  }, {
    "type": "RsaSignature2015",
    "creator": "http://bank.example.com/notary/keys/7f3j",
    "created": "2011-09-23T20:24:12Z",
    "domain": "example.org",
    "nonce": "83jj4hd62j49gk38",
    "signatureValue": "yZTkZDYOGzNGVkMVm4NTIQz...M32NjINmExMDIyOWgoYzI43Q3O="
  }]
}

@westurner

westurner Mar 23, 2017

# EXAMPLE 5: A complete example of a signature suite

{
  "id": "https://w3id.org/security#LinkedDataSignature2015",
  "type": "SignatureSuite",
  "canonicalizationAlgorithm": "https://w3id.org/security#URDNA2015",
  "digestAlgorithm": "http://example.com/digests#sha512",
  "signatureAlgorithm": "http://www.w3.org/2000/09/xmldsig#rsa-sha256"
}
@dstufft

dstufft Mar 31, 2017

Member

I'm going to close this. I don't think we're going to implement it (nor do I think we want to implement it), and TUF will provide a better mechanism for signed packages once that is implemented.

@dstufft dstufft closed this Mar 31, 2017

@NicoHood

NicoHood May 18, 2017

@dstufft We really need a feature like this nowadays. As you might have noticed, multiple websites have been compromised; HandBrake is one example. Users need to be able to verify the source via GPG to ensure that no modifications were made in transit or on the server.

This is especially important because a lot of users use pip to download their Python modules, either because the modules are not available through the operating system or because lots of Google results suggest it. Most of those results also suggest installing via sudo rather than with --user. Without GPG source verification, this is a very large attack vector.

Please add an option for GPG verification, and also prompt the user to verify the source if signatures are available (and display the fingerprint to the user).

@dstufft

dstufft May 18, 2017

Member

It's almost certain there is not going to be an option to verify GPG signatures within pip. GPG signatures are practically worthless on their own unless you have a trust model (and the built-in web of trust is not good enough), and any effort that goes into implementing a trust model around GPG that works for us would be better spent implementing TUF.

@NicoHood

NicoHood May 18, 2017

@dstufft You specify the trusted key in the install command, as written above. The website that tells you to install those dependencies would also list the fingerprints of the signed sources. pip then compares the fingerprints provided by the PyPI server with the ones given on the command line. This way, a PyPI server-side hack will be noticed.

This is a general problem of crypto. But you can't dismiss GPG as unusable just because it's not 100% failsafe. It's the best and only real solution we have to verify sources. And if you don't make it too complicated for the use cases above, it's a fairly simple process.
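
The comparison step described above could look roughly like the sketch below. It assumes the signer's fingerprint has already been extracted from a checked signature by some other means. The function names and the idea of passing pinned fingerprints in are hypothetical; pip has no such option.

```python
import hmac
from typing import Iterable

_HEX_DIGITS = set("0123456789ABCDEF")


def normalize_fingerprint(fingerprint: str) -> str:
    """Strip spaces, colons and an optional 0x prefix, then upper-case the hex digits."""
    cleaned = fingerprint.strip().replace(" ", "").replace(":", "")
    if cleaned.lower().startswith("0x"):
        cleaned = cleaned[2:]
    return cleaned.upper()


def is_full_fingerprint(fingerprint: str) -> bool:
    # A version 4 OpenPGP fingerprint is 160 bits, i.e. 40 hex digits.
    # Short key IDs are rejected because they are easy to collide.
    return len(fingerprint) == 40 and set(fingerprint) <= _HEX_DIGITS


def fingerprint_is_pinned(signer_fingerprint: str, pinned: Iterable[str]) -> bool:
    """Return True only if the signer matches one of the user-supplied fingerprints."""
    signer = normalize_fingerprint(signer_fingerprint)
    if not is_full_fingerprint(signer):
        return False
    for candidate in pinned:
        candidate = normalize_fingerprint(candidate)
        if is_full_fingerprint(candidate) and hmac.compare_digest(signer, candidate):
            return True
    return False
```

This only covers the string comparison. It says nothing about how the user learned the right fingerprint in the first place, which is the part the rest of the thread is arguing about.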

@westurner

westurner May 18, 2017

@ncoghlan

ncoghlan May 19, 2017

Member

@NicoHood We're thoroughly skeptical of claims that this is in high demand or a major end user security concern, as we have zero commercial pip redistributors reporting sufficient customer demand for them to invest engineering time in improving the security model of the tooling. Instead, they either cache the published hashes, or cache entire artifacts, such that PyPI compromises after the initial release won't have any impact on them and their customers.

Similarly, publishers can detect any such post-publication compromises for themselves by maintaining a list of previously published hashes, and checking them against what PyPI is providing (or what redistributors are providing, for that matter - assuming they're republishing unmodified sources without applying any downstream patches).
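
As a rough illustration of that publisher-side check, here is a minimal sketch. It assumes the publisher keeps a mapping of artifact filename to SHA-256 hex digest recorded at upload time, and has re-downloaded the current artifacts into a local directory. The function names and the layout of the record are assumptions, not part of any PyPA tool.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large artifacts do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_changed_artifacts(recorded: Dict[str, str], download_dir: Path) -> List[str]:
    """Return the filenames that are missing or whose hash no longer matches the record."""
    changed = []
    for filename, expected in sorted(recorded.items()):
        candidate = download_dir / filename
        if not candidate.is_file() or sha256_of(candidate) != expected.lower():
            changed.append(filename)
    return changed
```

An empty result means everything still matches what was recorded; anything else is worth investigating by hand.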

Signatures are only useful as a way of verifying publishers, and GPG has no trust model to enable that in a useful form for an open platform like PyPI (this isn't like a Linux distro where you'd just be trusting the GPG key used in the distro's build system).

@westurner

westurner May 19, 2017

We're thoroughly skeptical of claims that this is in high demand or a major end user security concern, as we have zero commercial pip redistributors reporting sufficient customer demand for them to invest engineering time in improving the security model of the tooling. Instead, they either cache the published hashes, or cache entire artifacts, such that PyPI compromises after the initial release won't have any impact on them and their customers.

Is there demand for end-to-end security in a continuous deployment workflow?

  • git and hg support hash-checking
  • pip supports hash-checking
  • pipenv supports hash-checking
  • OS package management systems support hash-checking (debsums, rpm -V/--verify) and GPG signatures
    • any key added to the trust ring is considered okay for any package from any repository

...

GitPython

pygit2

mercurial

OS Packages

Similarly, publishers can detect any such post-publication compromises for themselves by maintaining a list of previously published hashes, and checking them against what PyPI is providing (or what redistributors are providing, for that matter - assuming they're republishing unmodified sources without applying any downstream patches).

  • (package, version, source://uri@$rev, [{patches}])
    • a packaging attribute (in addition to possibly a semver version string git rev identifier) to track the original {git, hg, } revision URI would be helpful
      • (repo uri, [branch name,] commit id)
    • [git] diff -r $rev -r $rev_after_packaging_packages_are_applied would be real nice.

Signatures are only useful as a way of verifying publishers, and GPG has no trust model to enable that in a useful form for an open platform like PyPI (this isn't like a Linux distro where you'd just be trusting the GPG key used in the distro's build system).

What could solve for this?

"signature": {
  "type": ["MerkleProof2017", "Extension"],
  "merkleRoot": "68f3ede17fdb67ffd4a5164b5687a71f9fbb68da803b803935720f2aa38f7728",
  "targetHash": "c9ead76a54426b4ce4899bb921e48f5b55ea7592e5cee4460c86ebf4698ac3a6",
  "proof": [{
    "right": "7fef060cb17614fdfddd8c558e102fbb96433f5281e96c80f805459773e51163"
  }],
  "anchors": [{
    "sourceId": "8623beadbc7877a9e20fb7f83eda6c1a1fc350171f0714ff6c6c4054018eb54d",
    "type": "BTCOpReturn"
  }]
}
  • Challenges:
    • With which key would one sign the ACL document?
      • Would pypa/pypi/warehouse then need to bless a project master key and then sign it (sort of like a blockchain transaction)?

@ncoghlan

ncoghlan May 20, 2017

Member

@westurner You've been warned multiple times on multiple projects not to post random link dumps into tracker issues (and elsewhere). Please voluntarily refrain from doing so, so it doesn't need to escalate to another block.

@westurner

westurner May 20, 2017

@ncoghlan

ncoghlan May 21, 2017

Member

@westurner We've had a defined technical solution to this problem for years, and Donald referred to it above: The Update Framework.

The details are covered in two PEPs: PEP 458 and PEP 480.

This was also one of the key points of concern I raised in my overview of the state of Python packaging last year: http://www.curiousefficiency.org/posts/2016/09/python-packaging-ecosystem.html#making-pypi-security-independent-of-ssl-tls

It is not a technical problem now, and hasn't been since those PEPs were written. Throwing more technical ideas or evidence of unfunded demand at the PyPA developers does nothing to advance the situation.

Instead, it's a funding and sustainability problem, that requires folks either to lobby commercial redistributors to tackle this problem comprehensively on behalf of their customers, or else to make the case for why the PSF should fund this when vendors with a strong reputation for handling open source security management concerns on behalf of their customers decline to do so. Either way, the PyPA developers are not the right people to be directing any advocacy towards.

@westurner

westurner May 21, 2017

@westurner

westurner May 21, 2017

@dstufft

dstufft May 21, 2017

Member

Each repository is responsible for its own security, so if you're using PyPI, then packages installed from PyPI derive their trust from a PyPI-specific set of root keys. If you're using DevPI, it will be up to DevPI to support TUF with its own instance-specific DevPI set of root keys. DevPI would/could validate the trust from PyPI before mirroring it onto DevPI and signing it itself.

@rhuddleston

rhuddleston Sep 17, 2017

This issue should be re-opened. I'm not asking for the system to be perfect. I will download the GPG public keys for the packages I want to be able to install via pip. I simply want pip to only allow installation of packages that match those signatures. If someone changes the key (or removes it), it's my problem to figure out whether the key was legitimately changed or whether someone compromised the package. It's really no different than what I do for deb repos, for example.

@ncoghlan

ncoghlan Sep 17, 2017

Member

@rhuddleston If you're willing to trust the GPG key management practices of arbitrary publishers, then it's already entirely feasible to implement your own pip wrapper that adds the check you're seeking.

You don't need anyone's permission for that, and you certainly don't need to wait for hook support in the official pip client. (As a previous example of something like this, checking downloads against previously recorded hashes started out as a peep feature, rather than as a pip one)

But we're not going to recommend GPG as a general measure, because the web of trust model doesn't scale adequately for an open publishing platform with arbitrary publishers: it relies on the assumption that the signing keys are managed securely, and we simply don't agree that that's a well-founded assumption in the context of PyPI.
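
For anyone who does want to go the wrapper route, a minimal sketch might look like the following. It assumes detached `.asc` signatures were obtained out of band and placed in a separate directory, and that the GPG home directory contains only keys you have decided to trust. The function names, the directory layout and the injectable `run` parameter are all hypothetical; this is not an official or recommended tool.

```python
import subprocess
import sys
from pathlib import Path
from typing import List


def download_command(requirement: str, dest: Path) -> List[str]:
    # Fetch the artifact without installing it or its dependencies.
    return [sys.executable, "-m", "pip", "download", "--no-deps",
            "--dest", str(dest), requirement]


def gpg_verify_command(keyring_dir: Path, signature: Path, artifact: Path) -> List[str]:
    # gpg exits non-zero if the signature is bad or the key is not in this keyring.
    return ["gpg", "--homedir", str(keyring_dir), "--verify",
            str(signature), str(artifact)]


def install_command(artifact: Path) -> List[str]:
    return [sys.executable, "-m", "pip", "install", "--no-deps", str(artifact)]


def install_verified(requirement: str, keyring_dir: Path, signature_dir: Path,
                     workdir: Path, run=subprocess.run) -> bool:
    """Download, check every artifact against its detached signature, then install.

    Returns False (and installs nothing) if any signature is missing or rejected.
    """
    run(download_command(requirement, workdir), check=True)
    artifacts = sorted(path for path in workdir.iterdir() if path.is_file())
    if not artifacts:
        return False
    for artifact in artifacts:
        # Assumed layout: <signature_dir>/<artifact filename>.asc
        signature = signature_dir / (artifact.name + ".asc")
        if not signature.is_file():
            return False
        result = run(gpg_verify_command(keyring_dir, signature, artifact), check=False)
        if result.returncode != 0:
            return False
    for artifact in artifacts:
        run(install_command(artifact), check=True)
    return True
```

Since this passes `--no-deps`, each dependency would need the same treatment; the sketch deliberately leaves out where the keys and signatures come from.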

@NicoHood

NicoHood Sep 17, 2017

So instead of using (the not perfect) GPG you simply leave it as it is without any kind of verification?

@ncoghlan

ncoghlan Sep 17, 2017

Member

No, we use the only verification we can currently meaningfully offer:

  • hash checking to ensure that previously downloaded artifacts don't change
  • completely out-of-band signature checking that bypasses PyPI and the PyPA tooling entirely (as if you genuinely don't trust the PyPI admins or services, you can't trust any package signatures that PyPI publishes, nor any signature checking tools obtained from PyPI).
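
For the first item, pip's hash-checking mode reads pinned hashes from a requirements file. Below is a small sketch of producing such a line from an artifact you have already downloaded and inspected; the helper name `requirement_line` and the example file names are made up for illustration.

```python
import hashlib
from pathlib import Path


def requirement_line(name: str, version: str, artifact: Path) -> str:
    """Build a requirements.txt line that pins one artifact by its SHA-256 digest."""
    digest = hashlib.sha256(artifact.read_bytes()).hexdigest()
    return f"{name}=={version} --hash=sha256:{digest}"
```

Lines like this go into a requirements file that is then installed with `pip install --require-hashes -r requirements.txt`, so that an artifact whose content differs from the recorded digest is refused.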

Unlike Linux distros, where GPG signatures provide assurance that the software you're installing was actually published by the distro, GPG signatures provide no meaningful assurance in the context of an open publication platform like PyPI - believing they do is only possible in the absence of clearly defined threat modelling that identifies the actors and actions you're aiming to defend against, and the kinds of trust you're aiming to enable.

It is possible to create a trust management system that would meaningfully improve the state of PyPI security by reducing the reliance on the HTTPS CA system for delivery assurance (see the links to PEP 458 and PEP 480 above), but "just add GPG!" isn't it.

@westurner

westurner Sep 17, 2017

@dstufft

dstufft Sep 17, 2017

Member

To be clear, if end users correctly managed a trust store that mapped project names to GPG keys then it is fine and that would add an additional layer of security over what currently exists.

The issue is ultimately one of impact. Due to differences in the distro vs PyPI/pip case, we do not currently have the mechanism in place to automatically map projects to GPG keys, which means that end users will be responsible for doing this themselves. It is my opinion that the vast bulk of people will simply not bother, and thus we will have added this feature for little benefit except for a minority of users.

Now, one could argue that adding a feature that a user can ignore doesn't cost them anything, but in my opinion it does. It adds additional overhead in the things they need to understand in order to actually use pip, more things they need to weed through. On the maintenance side it also adds additional complexity, which means that it's harder to test, develop and maintain pip in the long run, particularly for something that we're pretty sure we're not going to be using.

The other problem here is an ecosystem one. By providing a way to validate GPG keys we're implicitly telling people that they should be signing their packages with GPG, however we're already pretty sure that we're not going to be using that so it is effectively going to be making work for people that they're going to want to throw away at some point.

@westurner

westurner Sep 17, 2017

@obestwalter

obestwalter Sep 17, 2017

The other problem here is an ecosystem one. By providing a way to validate GPG keys we're implicitly telling people that they should be signing their packages with GPG, however we're already pretty sure that we're not going to be using that so it is effectively going to be making work for people that they're going to want to throw away at some point.

I think that is a very important point. We do not need to create busywork for FOSS maintainers if it does not really improve things.

@ncoghlan

ncoghlan Sep 18, 2017

Member

The reason GPG signing is effective in the Linux distro case is that its main purpose is to let the publishers of the distro itself ensure the integrity of the link from the distro's build system to end user installations, even when that link traverses untrusted systems like public mirrors and the internet. The publishing system and the consumption system are controlled by the same entity, and you have to go through some form of review process to get access to the publishing end. The meaningful assurances of trustworthiness then come from the combination of GPG content signing, pre-publication review, and publisher key management, not from content signing alone. (The Linux distros also take care of ensuring that GPG key management infrastructure is available and working for both publishers and consumers, whereas "GPG will already be available and working" is an entirely invalid assumption on non-Linux systems.)

Aside from the UX train wreck that is attempting to set up GPG signature checking on non-Linux systems, the key architectural differences in the PyPI case are that there is no pre-publication review process, and no standardised process for publisher key management. Adding only the GPG content signing part without addressing either of those aspects thus becomes purely a matter of security theatre, adding minimal value beyond the link integrity protection offered by HTTPS.

The lack of end-to-end signing support (outside the embedded signature support in the wheel file format) does mean that both the PyPI admins and the Fastly CDN admins constitute an "insider threat" for all consumers of content from PyPI. Now, it is possible for us to design and develop a system to inherently neutralise that threat (and PEP 480 describes one such system), but it's also possible to neutralise it through less mathematically sophisticated methods, like folks publishing expected artifact hashes through an independent registry, and publishers explicitly checking that the artifacts that PyPI publishes are the ones they uploaded.
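As a rough illustration of the "less mathematically sophisticated" option, the sketch below compares a downloaded artifact against a SHA-256 digest obtained from some independent source. The function names and the shape of the `expected_hashes` mapping are hypothetical, chosen only for this example; no particular registry format is implied.

```python
import hashlib
import os


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA-256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def artifact_matches(path, expected_hashes):
    """Check a downloaded artifact against independently published hashes.

    `expected_hashes` is a hypothetical mapping of artifact filename to hex
    SHA-256 digest, fetched from somewhere other than the package index.
    Unknown artifacts are rejected rather than silently accepted.
    """
    expected = expected_hashes.get(os.path.basename(path))
    if expected is None:
        return False
    return sha256_of(path) == expected.lower()
```

A publisher could run the same comparison against the digests of the files they uploaded in order to confirm that the index is serving unmodified artifacts. On the consumer side, pip's hash-checking mode (`--require-hashes` with `--hash` entries in a requirements file) already performs a comparable check against hashes recorded by the user.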

However, effectively designing such a system requires people to actually define and document the threat model they're attempting to defend against, and choose the appropriate tools and techniques to provide the greatest increase in integrity assurance at the lowest cost in time and effort for publishers, infrastructure maintainers, and end users, rather than simply assuming that because a particular technique (i.e. GPG content signing) works well in the context of a Linux distribution, that same technique will be able to provide meaningful assurances in the context of an open publication platform like PyPI.

@rhuddleston

rhuddleston Oct 5, 2017

GPG checks are much better than checksums. They tell me that the person publishing this package is still the same person who published it when I first started using it. Even if only a minority of people downloaded the keys via a different channel and only installed packages correctly signed with specific keys, that minority (e.g. security professionals) would notice when a popular package's key changes and would investigate. Basically it's a second factor.
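The "same publisher as before" check can be sketched as trust-on-first-use pinning. This is only an illustration under assumptions: the JSON pin file, the function name, and the status strings are all hypothetical, and the signing key's fingerprint is assumed to have been extracted from an already verified signature by some other tool.

```python
import json
from pathlib import Path


def check_pinned_key(project, fingerprint, pin_file):
    """Compare a signing key fingerprint with the one pinned for `project`.

    Returns "first-use" (fingerprint recorded), "match", or "changed".
    The pin file is a hypothetical JSON mapping of project name to fingerprint.
    """
    pin_path = Path(pin_file)
    pins = json.loads(pin_path.read_text()) if pin_path.exists() else {}
    # Normalise so that "ab cd" and "ABCD" compare equal.
    fingerprint = fingerprint.replace(" ", "").upper()
    pinned = pins.get(project)
    if pinned is None:
        pins[project] = fingerprint
        pin_path.write_text(json.dumps(pins, indent=2, sort_keys=True))
        return "first-use"
    return "match" if pinned == fingerprint else "changed"
```

A "changed" result is the signal described above: it does not prove compromise, since keys are also rotated legitimately, but it gives a reason to investigate before installing.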

For example, if I managed to log in to the PyPI account of a popular package's maintainer, I could replace the package with a slightly modified version that includes a backdoor or malware, and no one would easily notice.

The thinking that GPG is not perfect, so it's better to have nothing, is misguided. For example, that is what happened here: conda/conda#1395

It seems there has been discussion about including TUF since 2011, yet are there any concrete plans for when this might be released? If there is a commitment to implement TUF, are there any major roadblocks to getting this done?

Another thing with TUF is that it's not easy for many people to understand. The only place I've seen TUF implemented is with Notary and Docker. For example, all "official" repos on Docker Hub are signed, and anyone can use DOCKER_CONTENT_TRUST=1 to pull and validate these images. On the other hand, almost no one else signs images when uploading to Docker Hub. I think a big part of this is because it's a lot more complicated than GPG to set up and use. I'm hoping to create some easy step-by-step articles to make this easier so more people will use it.

I worry that even if TUF is implemented on PyPI, no one will bother if it's too difficult for developers to use. If I build a perfect security system and then no one uses it, overall we will not be better off. For example, if we can get a majority of maintainers to add GPG signatures, that will be a big improvement. If you required GPG or TUF before someone is allowed to publish to PyPI, this could be a big advantage.

When TUF is implemented on PyPI, will this be a requirement? How can we make sure this is a simple process so developers won't find it to be a burden? Basically it will need to be easier than Docker Hub, if Docker Hub's adoption rate is any indication.

@ncoghlan

ncoghlan Oct 6, 2017

Member