pip should not execute arbitrary code from the Internet #425

Closed
glyph opened this issue Jan 4, 2012 · 67 comments
Labels
auto-locked Outdated issues that have been locked by automation type: security Has potential security implications

Comments

@glyph

glyph commented Jan 4, 2012

When you 'pip install' something, it fetches the code from the internet, and then executes it. If you follow the advice of many projects and 'sudo pip install' something, pip executes that code from the internet as root.

pip does not do TLS certificate verification, nor does it do package signature verification, nor does it even do DNSSEC. There is no assurance whatsoever that the code being installed came from the intended source. The archetypical hipster hacker doing a 'pip install django' over some cafe's wifi will be pwned within seconds if the DNS for pypi.python.org happens to be spoofed.

I believe that this might be addressed by #402 but that deals with a bunch of other issues as well, and I felt there should be a report somewhere about this somewhat well-known deficiency in pip's download and update procedures.
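
To make the missing check concrete, here is a minimal sketch of what certificate and hostname verification looks like with the present-day Python standard library. It is not pip's code, and `make_verifying_context` is a name invented for this illustration.

```python
import ssl


def make_verifying_context(cafile=None):
    """Build a TLS context that rejects unverified or mismatched certificates."""
    # create_default_context() loads the system CA store (or `cafile` if given).
    context = ssl.create_default_context(cafile=cafile)
    # These are already the defaults; they are spelled out to show what matters.
    context.check_hostname = True
    context.verify_mode = ssl.CERT_REQUIRED
    return context


context = make_verifying_context()
print(context.verify_mode == ssl.CERT_REQUIRED, context.check_hostname)
```

A context like this can be passed to `urllib.request.urlopen(url, context=context)`; a spoofed host without a valid certificate then fails the handshake instead of serving code.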

@carljm
Contributor

carljm commented Jan 4, 2012

Indeed, thanks for the concise summary. In addition to TLS cert verification and package signature verification, we should also have an option to forbid downloading any off-PyPI sdist that isn't served by HTTPS.

@glyph
Author

glyph commented Jan 4, 2012

Ultimately, package signature verification is the main thing. If the bytes are properly signed and authenticated, the transport can be any old insecure crud and it shouldn't matter: the software being executed is the right software regardless of how it got there.

@kumar303
Contributor

kumar303 commented Nov 2, 2012

Using requests + certifi would be an easy way to add proper cert checking.

@qwcode
Contributor

qwcode commented Nov 2, 2012

If pip gets support for "wheel" (see this fork: https://github.com/qwcode/pip), we'd be doing this for wheels at some point at least, since the wheel spec provides for it, but @dholth can speak to that better than I can.

wheel docs: http://wheel.readthedocs.org/en/latest/

@ioerror

ioerror commented Nov 7, 2012

The main reason not to rely on package signatures alone is that old signatures can be replayed. Defense in depth seems to be a reasonable idea when it comes to installing and updating code.

A rather good overview of the entire nightmare was written by Cappos et al:

https://www.updateframework.com/
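
The replay concern can be sketched as follows, assuming the signature on the metadata has already been verified. The `version` and `expires` fields and the function name are hypothetical, loosely modeled on the kind of freshness checks an update framework performs.

```python
from datetime import datetime, timezone


def is_fresh(metadata, last_seen_version, now=None):
    """Reject validly signed metadata that is expired or older than what we saw."""
    now = now or datetime.now(timezone.utc)
    if metadata["expires"] <= now:
        return False  # possibly a replay of a stale, once-valid file
    if metadata["version"] < last_seen_version:
        return False  # rollback to an older release list
    return True


now = datetime(2013, 2, 1, tzinfo=timezone.utc)
old = {"version": 3, "expires": datetime(2013, 1, 1, tzinfo=timezone.utc)}
new = {"version": 5, "expires": datetime(2013, 3, 1, tzinfo=timezone.utc)}
print(is_fresh(old, last_seen_version=4, now=now))
print(is_fresh(new, last_seen_version=4, now=now))
```

A signature alone passes for both files; only the extra freshness state on the client distinguishes the replayed one.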

@wyuenho

wyuenho commented Feb 3, 2013

Any progress on this? I'd really like to see this issue be given the highest priority given the recent attack on Rubygems.org. Package authors aren't going to sign their packages unless they know the installer supports it.

@kirubakaran

+1

@beaumartinez

+1

@byrongibson

+1

@reidrac

reidrac commented Feb 3, 2013

+1

@PaulMcMillan

This is a little easier today than it was a year ago when we last talked about it. Pip no longer supports Python 2.4, which caused much trouble. Python 2.5's SSL support is stone-age at best, but most users are on 2.6 or better. If pip included the backported hostname checking code from 3.2 (http://pypi.python.org/pypi/backports.ssl_match_hostname/3.2a3) and only validated certificates on Python 2.6 and newer (the same way Mercurial does), this might be possible with a relatively small patch.

@kumar303
Contributor

kumar303 commented Feb 3, 2013

with that code, where does the CA bundle come from? Wouldn't pip also need something like certifi? It looks like you can't get root certs out of the box on 2.6+.

@qwcode
Contributor

qwcode commented Feb 4, 2013

Hello, I'm one of the pip maintainers. I don't claim to have the security expertise to lead this effort, but I'm certainly interested in helping anyone who's willing to attempt pull requests when it comes to the basics of code placement and writing tests.

@zyga

zyga commented Feb 4, 2013

Hi.

I'm writing a small subsystem that can be plugged into pip (but also into other tools, including those in the Ruby world) and that manages trust in things downloaded from the Internet. Ping me on twitter @ZYGOON, here on github (zyga) or irc (again zyga) if you are interested in helping out.

@reidrac

reidrac commented Feb 4, 2013

distutils supports package signing with GPG: http://docs.python.org/2/distutils/uploading.html

It creates a PACKAGE.asc file that pip could potentially download and verify with gpg (adding a flag to pip, not by default). It won't solve the key management problem, but at least if you're interested you can get the gpg key of the developer(s) and add them to your keyring so the signature can be verified.

That could be a good start.

PyPI should then encourage packagers to sign their packages (maybe including a "how to" for gpg newbies: create a key, make a backup, create a revocation cert, make a backup, potentially export the key to a keyserver, etc).

Potentially it would be a good recommendation that the author_email from setup.py matches the gpg key email, so it can be checked by pip.
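
The opt-in check described above could look roughly like this. It is a sketch that shells out to the gpg command line tool (`gpg --verify SIGNATURE FILE`); the function names are made up for illustration, and nothing here is an existing pip flag or API.

```python
import subprocess


def gpg_verify_command(archive_path, signature_path=None):
    """Build the gpg invocation for checking a detached PACKAGE.asc signature."""
    if signature_path is None:
        signature_path = archive_path + ".asc"
    return ["gpg", "--verify", signature_path, archive_path]


def has_good_signature(archive_path, signature_path=None):
    """Return True if gpg reports a good signature from a key in the keyring."""
    result = subprocess.run(
        gpg_verify_command(archive_path, signature_path),
        capture_output=True,
    )
    return result.returncode == 0


print(gpg_verify_command("Django-1.4.3.tar.gz"))
```

A zero exit status only says that some key in the local keyring made a good signature; whether that key should be trusted for this particular package is a separate question.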

@zyga

zyga commented Feb 4, 2013

@reidrac That is insufficient: all it does is allow anyone to do a MITM attack by repackaging any software as "Joe User" with a valid GPG signature (for that user).

@zyga

zyga commented Feb 4, 2013

I've started working on a tool that could be integrated with pip (and other tools) to verify downloaded software. It does not require SSL or any trusted networking of any kind. Have a look and help me design and implement it: https://github.com/zyga/distrust

@dholth
Member

dholth commented Feb 4, 2013

With digital signatures you would probably want a system that trusts
the signing key per-package. For example, I would accept the Django
publisher's key for Django but not for Turbogears.
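
A minimal sketch of that per-package trust idea, assuming keys are identified by fingerprint strings. The mapping, the placeholder fingerprints, and the function name are all hypothetical.

```python
# Hypothetical trust map: package name -> fingerprints allowed to sign it.
TRUSTED_SIGNERS = {
    "django": {"DJANGO-PUBLISHER-FINGERPRINT"},
    "turbogears": {"TURBOGEARS-PUBLISHER-FINGERPRINT"},
}


def is_signer_trusted(package, fingerprint, trust_map=TRUSTED_SIGNERS):
    """Accept a signing key only for the packages it is listed under."""
    return fingerprint in trust_map.get(package.lower(), set())


print(is_signer_trusted("Django", "DJANGO-PUBLISHER-FINGERPRINT"))
print(is_signer_trusted("TurboGears", "DJANGO-PUBLISHER-FINGERPRINT"))
```

The key point is that a valid signature is checked against the package being installed, not against a global list of acceptable keys.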

On Mon, Feb 4, 2013, at 08:37 AM, Zygmunt Krynicki wrote:

@reidrac That is insufficient, for all it does it allows anyone
to do a MITM attack by repackaging any software as "Joe User" that
has a valid GPG signature (for that user).

@zyga

zyga commented Feb 4, 2013

@dholth yes, this is exactly what distrust aims to implement

@reidrac

reidrac commented Feb 4, 2013

@zyga There's no code in your repo, but as a spec it looks interesting. Looks like a good answer for pip signature verification.

@zyga

zyga commented Feb 4, 2013

Code is coming this evening, I'm still working on it and I'm busy doing my regular job stuff ATM

@dholth
Member

dholth commented Feb 4, 2013

The most engineered [Python] update security system is probably https://www.updateframework.com/ . It has a lot of interesting ideas, most importantly the ability to survive certain types of key compromises.

@jsullivanlive

+1 last PyCon (or the one before?) a speaker was going to show us how to intercept the pip communication via injecting a packet before pip could respond. I love the idea that all my pip packages are signed so I can use 3rd party repos or mirrors without worrying.

@dholth
Member

dholth commented Feb 4, 2013

@zyga please read about:

SDSI/SPKI: http://crypto.stackexchange.com/questions/790/need-an-introduction-to-spki-or-spki-for-dummies

Wheel signatures: http://www.python.org/dev/peps/pep-0427/#signed-wheel-files
(the wheel repository at https://bitbucket.org/dholth/wheel/src/e783bb5d75fe392294e018b405a40b788fa69d5d/wheel/signatures/keys.py?at=default has a mechanism for keeping track of key - package trust). Wheel uses JSON web signatures which are very easy to implement.

http://tack.io

http://convergence.io

Did you know you can use the ssh-agent to do public key signing and verification?

@zyga

zyga commented Feb 4, 2013

@dholth I read all of that quickly but I don't know which part of that I should find interesting. Correct me if I skipped something essential.

Wheel signatures are good but they are in no way improving over the existing signatures for source tarballs. Note that I'm not implementing a crypto system or a certificate authority replacement as that is all not really solving the problem for software distribution (so what that code is signed if anyone can sign it).

As for all the other things, how are they going to improve the situation? Code signing in itself is not useful for anything as anyone can sign everything. The idea I proposed builds a thin layer of trust semantics on top of the existing GPG system. Do you think I could reuse any of the tools you've mentioned to implement that faster/better/more correct?

@zyga

zyga commented Feb 4, 2013

The wheel command is pretty much identical to what I've proposed but weaker, as it 1) cannot take advantage of the existing GPG identity network and 2) has no support for improving trust in unsigned files.

It's still interesting though as other ideas seem to match exactly to what I wrote

@radiosilence
Contributor

Signed packages and using verified SSL by default are two separate issues. The former is more difficult to do (every developer has to sign their packages), whereas the latter, I'm honestly shocked doesn't happen. Even a simple one line fix of changing the default index to https://pypi.python.org/simple/ would go some way, but verifying SSL certificates is a must.

@jsullivanlive

Are there any all-python solutions for signing? That may make it more likely to work cross-platform without a lot of overhead (/me looks at windows).

@pnasrat
Contributor

pnasrat commented Feb 4, 2013

Even if we used certifi the cert is a cacert one, which IIRC is not in the Mozilla bundle

https://bugzilla.mozilla.org/show_bug.cgi?id=215243#c158

subject=/CN=pypi.python.org
issuer=/O=CAcert Inc./OU=http://www.CAcert.org/CN=CAcert Class 3 Root

@nejucomo

nejucomo commented Jul 8, 2013

My impression is that this ticket is too vaguely specified so that comments will grow endlessly, but the ticket will never be closed, or if it is, some people will complain that it should not be.

So:

I propose we replace this ticket with different tickets which are more specific and distinct, such as those for TLS verification, and those related to package signatures.

If you think of more specific issues, please create more specific tickets and cross-link them here. I'm just a pip fan, not a core part of the community, so if anyone prefers not following my suggestion, please speak up.

I'll start the ball rolling with: ticket #1035 with a package signature verification "hook" that could allow people to experiment and users to choose and opt-in to their preferred scheme.

@isislovecruft

@dstufft Awesome. This is all really good to hear! Especially that 1.4 will allow opting out of pip's URL crawling via a flag, because then it can be forcefully enabled per project via requirements.txt. I was about one more email away from sending my bug report and POC to cve-assign@mitre.org for pip's crawling behaviour. If this is still helpful to push distros to update their packages, I can still do so. And, of course, I can give you/the PyPI team my audit notes and discussions of the current/recent CVEs. I am really glad that you all are working on more secure update mechanisms. Thanks a lot! :)

Though, I must point out -- as others have on this ticket -- that the following are separate issues:

  • SSL verification
  • package integrity checking
  • developer/code signature verification

The first is happily completed. I have yet to re-audit it to confirm.

The second...well, pip and PyPI both default to md5. In PyPI (please correct me if I missed something!), there doesn't seem to be a way to give SHA256 or similar-grade hash digests to package URLs. I was glad to find that there is now a "links" interface in PyPI for maintainers to control what URLs pip downloads from, though if there is a way to specify an alternative hash digest which is actually checked on the client end, I missed it.

The brief version of why you should not use MD5:

  • In 1993, just one year after MD5 was developed by RSA, den Boer and Bosselaers developed a partial pseudo collision for MD5. [0]
  • In 1996, Dobbertin developed a collision for the MD5 compression function, which is when cryptographers started recommending using alternatives.
  • In 2005, Wang Xiaoyun et al. at Shandong University, China, produced collisions for full MD5. [1] Though as is clear in a paper on replicating Wang et al.'s methods, [2] as well as numerous other publicly available cryptography papers (no links because I'm being lazy), Wang is not exactly known for politely sharing her research. (This is especially worrisome. Details next section)
  • In 2008, several researchers, Sotirov, Stevens, Lenstra, Molnar, Appelbaum (@ioerror), Weger, and Osvik, were able to create a rogue intermediary CA certificate for RapidSSL which appeared legitimate if checked with MD5. [3]
  • In 2010, Feng and Xie posted a squeamish ossifrage, claiming to have found a single-block collision for MD5, to see if anyone could find another single-block collision. [4]
  • Marc Stevens, in 2012, found such another, reporting "[...] we are able to find collisions for MD5 in about 2^24.1 compressions for recommended IHV's which takes approx. 6 seconds on a 2.6GHz Pentium 4." [5]

And on to the short version of why you should not switch to SHA1:

  • In 2005, Wang et al. (mentioned above) released two papers on partial collision attacks on full SHA1. [6] They aren't known for sharing details, sometimes for years, and I would not be the least bit surprised if they announced a full collision within the next two years.
  • In 2006, a classification of the then-known types of disturbance vectors for collisions on SHA1 was made, with a cost-time computation for all known attacks. [7] Wang et al.'s disturbance vector ranked highest, with an estimated time complexity of log_2 66.

For the third issue, code signature verification, there should be a way to specify which developer is allowed to sign which packages. See my above post on this ticket.

@nejucomo I am happy to split my responses into multiple tickets, as the pypa devs see fit. I'm not a contributor either, and if there is some pattern to project management I haven't been able to decipher it. Don't wanna mess with their flow. :)

References:

  0. den Boer, B., & Bosselaers, A. (1994, January). Collisions for the compression function of MD5. In Advances in Cryptology – EUROCRYPT '93 (pp. 293-304). Springer Berlin Heidelberg. http://www.cosic.esat.kuleuven.be/publications/article-143.pdf

  1. Manuel, S. (2011). Classification and generation of disturbance vectors for collision attacks against SHA-1. Designs, Codes and Cryptography, 59(1-3), 247-263. http://eprint.iacr.org/2008/469.pdf
  2. Black, J., Cochran, M., & Highland, T. (2006, January). A study of the MD5 attacks: Insights and improvements. In Fast Software Encryption (pp. 262-277). Springer Berlin Heidelberg. http://www.iacr.org/archive/fse2006/40470265/40470265.pdf
  3. Sotirov, A., Stevens, M., Appelbaum, J., Lenstra, A., Molnar, D., Osvik, D. A., & de Weger, B. (2008). MD5 considered harmful today. Creating a rogue CA certificate. http://www.win.tue.nl/hashclash/rogue-ca/
  4. Xie, T., & Feng, D. (2010). Construct MD5 Collisions Using Just A Single Block Of Message. IACR Cryptology ePrint Archive, 2010, 643.
  5. http://marc-stevens.nl/research/md5-1block-collision/
  6. X. Wang, Y.L. Yin, and H. Yu. Finding Collisions in the Full SHA-1. In V. Shoup, editor, Advances in Cryptology – CRYPTO 2005, volume 3621 of Lecture Notes in Computer Science, pages 17–36. Springer-Verlag, 2005. http://f3.tiera.ru/2/Cs_Computer%20science/CsLn_Lecture%20notes/Advances%20in%20Cryptology%20-%20EUROCRYPT%202005,%2024%20conf.(LNCS3494,%20Springer,%202005)(ISBN%203540259104)(588s).pdf#page=48
  7. Manuel, S. (2011). Classification and generation of disturbance vectors for collision attacks against SHA-1. Designs, Codes and Cryptography, 59(1-3), 247-263. http://eprint.iacr.org/2008/469.pdf

@dstufft
Member

dstufft commented Jul 8, 2013

If you want to fire off a CVE for it I'll gladly include it in the release notes. I was going to figure out how to do it myself to be honest but I don't care how it happens :)

As far as hashes go, pip itself doesn't default to anything. It can use any hashing algorithm supplied by a URL, as long as it is guaranteed to be in hashlib (notably md5, sha1, and any of the SHA-2s). I think I added this in 1.2, or maybe 1.3, so that I could use sha256 hashes on Crate.io.

The md5's come from PyPI itself and currently they are still md5's because of setuptools/easy_install which only support md5. As far as I'm aware it is not currently feasible to generate a second preimage attack against md5 (if you know of an attack that allows this please please tell me so I can use it to convince people we need to switch). This is another thing on my list of things I want to fix but I have currently put it on the back burner due to there being no preimage attack on md5 that I was aware of.

As far as pip itself goes, unless a different scheme than #hashfunc=hash is ultimately devised, it's already prepared to handle other hashes; it just needs the index to give them to it. There could be an argument against pip allowing md5 at all, but again, until there's either a preimage attack or PyPI itself switches, that is unlikely to gain much traction.
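
For illustration, the #hashfunc=hash convention can be checked along these lines. This is a sketch under stated assumptions rather than pip's implementation: `check_url_hash` and `ALLOWED_HASHES` are hypothetical names, and the allowed set simply mirrors the algorithms mentioned above.

```python
import hashlib
from urllib.parse import urlsplit

# Assumed allow-list: md5, sha1 and the SHA-2 family, as named in the thread.
ALLOWED_HASHES = {"md5", "sha1", "sha224", "sha256", "sha384", "sha512"}


def check_url_hash(url, data):
    """Compare downloaded bytes against the digest in the URL fragment."""
    name, sep, expected = urlsplit(url).fragment.partition("=")
    if not sep or name not in ALLOWED_HASHES:
        raise ValueError("no usable #hashfunc=hash fragment in %r" % url)
    actual = hashlib.new(name, data).hexdigest()
    return actual == expected.lower()


payload = b"example archive bytes"
link = "https://example.org/pkg-1.0.tar.gz#sha256=" + hashlib.sha256(payload).hexdigest()
print(check_url_hash(link, payload))
print(check_url_hash(link, b"tampered bytes"))
```

Note that the digest is only as trustworthy as the page that supplied the link, which is why the index has to be fetched over a verified connection for this to help.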

As far as package signatures go, #1035 was recently opened and is probably the best place to talk about that currently. Again this was something on my list and was punted to deal with more pressing issues.

I should probably also mention that any changes to the hash function or package signatures will likely need to go through distutils-sig and go through the bikeshedding contained within.

@westurner

Though, I must point out -- as others have on this ticket -- that the following are separate issues:

Documentation links from here, reddit (2), [...] :

PEPs

JSON metadata (pymeta.json, pymeta-dependencies.json) is generated from setup.py ... http://hg.python.org/peps/file/default/pep-0426/pymeta-schema.json

Python

PIP

Setuptools

Distlib

Narrative Documentation

Glossaries

Reddit threads:

tag: https://en.wikipedia.org/wiki/DevOps

@westurner

https://python-packaging-user-guide.readthedocs.org/en/latest/packaging_tutorial.html#create-your-first-release

Source Repository GPG

Python Package GPG (./<package>.asc)

For any archive downloaded from an index, you can retrieve any signature by just appending .asc to the path portion of the download URL for the archive, and downloading that.
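
As a small sketch of that rule, the signature URL can be derived from the archive URL like this. The function name and the example URL are hypothetical; dropping the hash fragment is a choice made here because the digest describes the archive, not the signature.

```python
from urllib.parse import urlsplit


def signature_url(archive_url):
    """Append .asc to the path portion only, leaving any query string intact."""
    parts = urlsplit(archive_url)
    return parts._replace(path=parts.path + ".asc", fragment="").geturl()


print(signature_url("https://example.org/packages/pkg-1.0.tar.gz#md5=0123abcd"))
```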

Python Wheel JWS S/MIME (PEP 427)

Index Mirror DSA (PEP 381)

[Cryptographic] Hash Functions

x-post to #1035

@dstufft
Member

dstufft commented Jul 14, 2013

I'm going to close this ticket as it has no clear goal. Pip no longer downloads things from PyPI without TLS.

If there are specific deficiencies in what pip offers, a separate ticket should be opened for each one.

@dstufft dstufft closed this as completed Jul 14, 2013
@radiosilence
Contributor

Also, could I point out that doing sudo pip install is something you should never do? Either install with --user, use a virtualenv, or use a system package manager for installing things systemwide.

@dstufft
Member

dstufft commented Jul 15, 2013

Using sudo pip install is fine. Not all systems with sudo even have a package manager, and oftentimes the system package manager is very outdated or missing that item altogether.

pip vs OS packages is a user choice with various trade offs to both sides of the argument.

@radiosilence
Contributor

If your system doesn't have a package manager, why not use a virtualenv?

If you are using a system with any kind of package manager, you should never sudo pip install, because then you are messing with a file structure that is supposed to be managed with the package manager, and say a system component requires a specific version of something, you are going to get file conflicts and all kinds of hell.

If a system-wide package requires a python component, the dependency should be resolved with the package manager. If not, virtualenv.

I don't think it is a personal choice, because it's a terrible habit and I've seen many systems get screwed up due to it by people who are making a "user choice" and have no idea the implications of what they are doing. If you are using, say, OSX for development, use a virtualenv, if you are creating a distributable package, then use brew (or whatever) to install Python dependencies (and if brew doesn't have the package, change that by making a brew package).

For instance - I want to run uWSGI as a system-wide daemon. The version of uWSGI in my package manager (let's say Debian wheezy) is totally out of date. So, I create a virtualenv in /opt/uwsgi and install it there, then have my init script reference /opt/uwsgi/bin/uwsgi, and don't fuck up my system.

It's just being responsible.

@dstufft
Member

dstufft commented Jul 15, 2013

If my system doesn't have a package manager how am I supposed to install virtualenv ;) Also needing to activate a virtualenv for using command line tools is ugly.

I don't know what Linux systems you use, but in my preferred Debian based ones sudo pip install installs to /usr/local which, as far as I understand the FHS, is outside of the domain of the system package manager and is intended for just such use.

@radiosilence
Contributor

Ok, makes sense - though I would explicitly check your distro does this first. But then, what's the problem with doing pip install --user and then adding ~/.local/bin to your $PATH?
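
A quick way to see whether that setup is in place is sketched below. It assumes the POSIX layout, where scripts from a --user install land in a bin directory under the user base (typically ~/.local/bin); the function names are made up for this example.

```python
import os
import site


def user_bin_dir():
    """Scripts directory for --user installs, assuming the POSIX layout."""
    return os.path.join(site.getuserbase(), "bin")


def user_bin_on_path(environ=None):
    """Report whether the user scripts directory is listed in PATH."""
    environ = os.environ if environ is None else environ
    entries = environ.get("PATH", "").split(os.pathsep)
    return user_bin_dir() in entries


print(user_bin_dir(), user_bin_on_path())
```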

@dstufft
Member

dstufft commented Jul 15, 2013

It's only available to my user :) Gets annoying when I'm bouncing between different accounts for different things.

@westurner

AFAIK, sudo pip install (like every other unchecked source of executable code) should not be run as root on an actual system. Debian mitigates around this with fakeroot and virtual containers for building packages. Files built are signed and checksummed.

If pip dropped privs to the minimum it needs to install into a path in PATH and sys.path, it would still be a risk to execute sudo pip install.

Could someone indicate to me how live.sysinternals.com is more or less of a risk than just 'sudo rm *'?

Don't open (executable) email attachments. Don't sudo pip install.

@westurner

I like virtualenv and virtualenvwrapper a lot. I like not having write permission to scripts that I execute often (e.g. --user).

Does pip support --prefix? Does pip do snapshots/backups prior to installation?

@merwok

merwok commented Jul 15, 2013

how am I supposed to install virtualenv

You don’t have to: just download virtualenv.py and run it. From that first venv, you can get pip, venvwrapper, etc. Yes, it’s per-user.

needing to activate a virtualenv for using command line tools is ugly.

I add the bin directory of the bootstrap venv I describe above to my PATH.

@radiosilence
Contributor

Also, perhaps one could avoid depending on pip --user (which I find irritating) by running virtualenv ~/.local

@westurner

Also, perhaps one could avoid depending on pip --user (which I find
irritating) by running virtualenv ~/.local

For the sake of consistency, a mkvirtualenv local and a symlink to
$VIRTUAL_ENV/local make it easier to work with virtualenvwrapper.

@merwok

merwok commented Jul 15, 2013

I’m not sure consistency is an argument here. The venvs that I manage with virtualenvwrapper are used for my Python projects and can be deleted or re-created at any time, but the global/bootstrap venv I created in ~/.usr (or ~/.local in your example) is not throw-away: I depend on having things installed in there (mostly scripts/programs, not libs).

@westurner

This is way OT. To each their own. I should have been more clear. I believe consistency to be an argument here because I like to consistently apply the same tools and processes to managing virtual environments. Commands like cpvirtualenv, cdvirtualenv, and cdsitepackages are not present with a (special-cased) plain-old virtualenv.

When I back up a virtualenv, I usually only need the pip freeze and contents of ./src. When I back up a homedir, I usually don't want to copy compiled, version-specific objects into a backup archive. (So I install scripts/programs as a dotfiles egg with a ./scripts folder and/or [console_scripts] entrypoints in setup.py.)

@nejucomo

I consider the discussion about user permissions and install locations when running pip to be orthogonal to this ticket, which is about TLS verification. So, I created #1169 to capture that orthogonal issue.
