This repository has been archived by the owner. It is now read-only.

Can register packages that match system packages #585

Closed
stestagg opened this issue Jan 9, 2017 · 7 comments

@stestagg
commented Jan 9, 2017

tl;dr: Allowing people to register a package named 'json' (or any other standard library module) on PyPI provides no advantage, and should be prevented.

During a Python event in London, we noticed that PyPI allows registering packages with the same name as standard library modules (for example, 'sys'!).

This isn't a massive security issue per se, as PyPI package contents are untrusted anyway, but it feels like something that's trivial to prevent, and preventing it would stop any bad actors from exploiting this going forward.
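To illustrate how small such a check could be, here is a minimal Python sketch of a registration-time name test. It is not PyPI's actual code: the helper names `normalize` and `is_reserved` are hypothetical, and it assumes Python 3.10+ for `sys.stdlib_module_names` (the fallback set is only a small illustrative sample).

```python
import re
import sys

# Assumption: sys.stdlib_module_names exists (Python 3.10+). The fallback is a
# small illustrative sample, not a complete list of standard library modules.
_FALLBACK = frozenset({"json", "os", "sys", "math", "socket", "string",
                       "sqlite3", "tempfile", "stat", "platform"})
STDLIB_NAMES = frozenset(getattr(sys, "stdlib_module_names", _FALLBACK))


def normalize(name):
    """Normalize a project name as described in PEP 503."""
    return re.sub(r"[-_.]+", "-", name).lower()


# Private modules (leading underscore) are skipped in this sketch.
RESERVED = frozenset(normalize(n) for n in STDLIB_NAMES if not n.startswith("_"))


def is_reserved(project_name):
    """Return True if the requested project name matches a stdlib module."""
    return normalize(project_name) in RESERVED


if __name__ == "__main__":
    for candidate in ("sys", "JSON", "requests"):
        print(candidate, is_reserved(candidate))
```

A list built this way only covers the modules of the interpreter running the check, so Python 2 era names such as 'cpickle' would need to be added separately.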

One hypothetical attack might go something like this:

  1. Attacker registers a 'sys' package on PyPI and uploads a malicious payload
  2. Inexperienced Python user runs 'pip install sys' because "you use pip to install new modules"

Simple case:
  3. Attacker is now running code on the user's machine
  4. User starts up Python and runs 'import sys'; this works, because the sys module is built in, so the user suspects nothing

More complex case:
  3. Attacker code prompts the user: 'sys package is system-related so has to be installed with sudo (even when using vagrant)', and traces back
  4. User re-runs with sudo
  5. Attacker owns the user's machine

There is also the possibility that people have written automatic requirements.txt generators that scrape imports to work out dependencies. In that situation, imports of built-in modules will end up in requirements files too.


The above may seem fairly unlikely, but when I noticed this issue (I emailed the security contacts on the PyPI site the same day but got no reply), I proactively registered many standard library package names*, using a dummy payload that does nothing more than raise an exception telling the user not to install it from PyPI.

This meant that I could use the PyPI download logs to measure downloads.

The following table shows the total number of downloads of my most popular system packages during December 2016 (where the installer was pip):

project     num_downloads
json        10710
sqlite3     5541
cpickle     5156
platform    4616
os          4060
sys         3441
sqlite      2152
math        1838
socket      1837
string      685
tempfile    262
stat        59
For reference, the query that generated this is:

SELECT
  file.project,
  COUNT(*) AS num_downloads
FROM `the-psf.pypi.downloads201612*`
WHERE LOWER(file.project) IN ('cpickle', 'json', 'sqlite3', 'sqlite', 'marshal', 'math', 'os', 'pickle', 'platform', 'socket', 'stat', 'string', 'sys', 'tempfile')
  AND details.installer.name = 'pip'
GROUP BY file.project
ORDER BY num_downloads DESC

I'm sure there are many automated build scripts trying to install these packages, but the data indicates that pip is being used to install them from many different subnets, countries, versions of Python and package installers.

As the owner of these packages, I don't mind them being taken off me, or access to them disabled as part of any fix.


(*) I thought a lot about whether this was the right thing to do, but decided on this approach based on several factors:

  • There isn't any realistic useful purpose for anyone using these package names, so 'squatting' them wouldn't cause any real project any issues
  • Listing the packages on PyPI could add to people's confusion (if they saw a package listed called 'sys', they might think it was needed). I'm hoping the fix to this bug resolves this issue
  • Adding the dummy package (that traces back) was the most uncertain part of this, but I believe it was important to find out how significant an attack vector this was, so I decided that having the numbers was better than not uploading a dummy package.
  • The dummy version gives users more information, and more useful information than registering the package with no uploaded version, and I know what I uploaded was safe.
  • Download volumes are fairly high on these packages, so the moment I disclosed this issue, anyone with malicious intent could start to exploit this problem. Therefore, my only option was to register as many modules as I thought might cause problems.
@nicktimko

commented May 21, 2017

Saw https://hackernoon.com/building-a-botnet-on-pypi-be1ad280b8d6

I'm curious what other package managers do, as saying "install X from Y" requires that you fully trust both X and Y. The common names are probably good to squat, but it's a pick-two-of-three: open for anyone to register, safe, and cheap (labor-wise).

@llazzaro

commented May 23, 2017

I've always had the idea that someone could do this.

@berkerpeksag

Member

commented May 23, 2017

Thank you for the detailed report! Could you please start a discussion on distutils-sig? Most of the packaging experts follow that list, so you are more likely to get feedback there. We can still use this issue to discuss or review implementation details if an agreement is reached on the mailing list.

@nicktimko

commented May 23, 2017

Related: [Taking over 17000 hosts by] Typosquatting programming language package managers. There the researcher instrumented the package's setup.py to submit information from the client's system back to a server.

Crawling the 404 logs of PyPI for multiple failed install requests across multiple users/IPs to the same package and blacklisting/squatting them would be a good proactive step. Basically: crowd-source the names to protect.
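A rough sketch of that idea in Python, under loud assumptions: the log format here (an iterable of `(client_ip, project_name)` pairs for requests that returned 404) is hypothetical, as are the function name `names_to_protect` and the threshold. PyPI's real logs are not described in this thread.

```python
from collections import defaultdict


def names_to_protect(failed_requests, min_distinct_ips=3):
    """Rank missing project names by how many distinct clients asked for them.

    failed_requests: iterable of (client_ip, project_name) pairs taken from
    404 responses (hypothetical format, assumed for this sketch).
    """
    ips_by_name = defaultdict(set)
    for client_ip, project_name in failed_requests:
        ips_by_name[project_name.lower()].add(client_ip)

    candidates = [
        (name, len(ips))
        for name, ips in ips_by_name.items()
        if len(ips) >= min_distinct_ips
    ]
    # Most requested first; ties broken alphabetically for stable output.
    return sorted(candidates, key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    sample = [
        ("10.0.0.1", "Sys"), ("10.0.0.2", "sys"), ("10.0.0.3", "sys"),
        ("10.0.0.1", "sys"), ("10.0.0.1", "jsn"), ("10.0.0.2", "jsn"),
    ]
    print(names_to_protect(sample, min_distinct_ips=2))
```

Counting distinct clients instead of raw requests keeps a single misconfigured build server from pushing a name onto the list by itself.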

@jamadden

Member

commented Jun 2, 2017

For reference, there is at least one other report in this repository about a PyPI package with a stdlib name ("logging") being available and actually causing issues. It doesn't seem to have malicious intent and I can't get it to break pip in the way the report describes, but it does still exist.

@pradyunsg

Member

commented Jun 16, 2017

@dstufft Do you know whether Warehouse currently fixes this?

Should I file an issue there?

@dstufft

Member

commented Jun 17, 2017

No, Warehouse doesn't currently prevent it. An issue would be fine.
