Block package names that conflict with core libraries #2151

Closed
GadgetSteve opened this issue Jun 28, 2017 · 26 comments

@GadgetSteve GadgetSteve commented Jun 28, 2017

It has been pointed out online, on Hacker Noon, that PyPI currently allows people to register and upload packages with the same names as core Python libraries. This presents a potential attack vector: pip install -U will "upgrade" the core library to the uploaded package, which may be listed as a dependency of some other package.

Anybody, with the possible exception of the core python developers, trying to do this should definitely fail with an error message and possibly be flagged as suspicious activity.

I have tried to suggest blocking any upgrades to core packages at the pip level, in pypa/pip#4527, but the consensus there is that this is really a problem at the PyPI/Warehouse end.

Contributor

@jonemo jonemo commented Sep 15, 2017

What would be the correct/best way to compile the list of standard library modules that should be blocked? I am aware of the standard library module index at https://docs.python.org/3/py-modindex.html. However, that only covers the CPython 3.6 standard library. Other Python implementations have additional modules (e.g. IronPython has clr). Occasionally, module names change between versions (e.g. xmlrpclib vs xmlrpc and copy_reg vs copyreg from 2.7 to 3.0).

In summary: The first step to dealing with this is to compile an authoritative list of package names.
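As a rough sketch of what a per-interpreter starting point could look like: Python 3.10 and later expose sys.stdlib_module_names, which lists the standard library modules of the running interpreter. The helper name current_stdlib_names below is made up for illustration, and the fallback for older interpreters only sees compiled-in modules, so it is incomplete. This does not solve the cross-version or cross-implementation problem described above; it would have to be run per interpreter and the results merged.

```python
import sys


def current_stdlib_names():
    """Return public stdlib module names for the running interpreter.

    Hypothetical helper: uses sys.stdlib_module_names (Python 3.10+)
    and falls back to the compiled-in modules on older interpreters,
    where the result is only a partial list.
    """
    names = getattr(sys, "stdlib_module_names", None)
    if names is None:
        names = sys.builtin_module_names
    # Skip private helper modules such as _thread or _io.
    return {name for name in names if not name.startswith("_")}


if __name__ == "__main__":
    for name in sorted(current_stdlib_names()):
        print(name)
```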

It seems like the only place where the name of the uploaded package is checked is here. If that's true, the only blocked package names are requirements.txt and rrequirements.txt. Note that I'm very new to this codebase, so this is definitely worth double-checking.
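To make the idea concrete, here is a minimal sketch of an upload-time name check. It is not Warehouse's actual code: the function names and the extra stdlib entries in the set are assumptions for illustration. It normalizes names as described in PEP 503 so that variants differing only in case or in runs of -, _ and . compare equal.

```python
import re

# The two names mentioned above, plus a few stdlib names added purely
# for illustration. A real list is assumed to be much longer.
PROHIBITED = {"requirements.txt", "rrequirements.txt", "os", "sys", "xmlrpclib"}


def normalize(name):
    """Normalize a project name as described in PEP 503."""
    return re.sub(r"[-_.]+", "-", name).lower()


def is_prohibited(name, prohibited=PROHIBITED):
    """Return True if the normalized name matches a prohibited name."""
    normalized = {normalize(entry) for entry in prohibited}
    return normalize(name) in normalized


if __name__ == "__main__":
    for candidate in ("OS", "Requirements_txt", "requests"):
        print(candidate, is_prohibited(candidate))
```

A check like this would only apply when a new project name is registered, so it says nothing about projects that already exist under a conflicting name.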

Author

@GadgetSteve GadgetSteve commented Sep 15, 2017

Contributor

@jonemo jonemo commented Sep 15, 2017

I am quite curious about this issue and would be willing to help move it forward, but after another half hour of background reading, I am not certain whether there is community/maintainer support for this proposal.

A few observations and thoughts (please correct me if I'm wrong):

  • In the discussion in pypa/pip#4527, the agreed-upon response is that pip should not be responsible for preventing users from installing (potentially) malicious code.
  • The conclusion there is that maybe PyPI can provide this functionality by blocking specific names.
  • It seems like nobody is suggesting going beyond filtering package names (e.g. by inspecting package content).
  • Examples show that package names that could have unintended or dangerous effects on one system are useful on other systems:
  • Given that the selection of "dangerous" package names is system dependent, it cannot be performed by the package index. I know this is the opposite conclusion from the one reached in pypa/pip#4527.
Contributor

@jonemo jonemo commented Sep 16, 2017

Related PR: #2396

Member

@dstufft dstufft commented Sep 17, 2017

One problem to sort out here: what should happen when a new standard library module is added whose name collides with an existing project on PyPI? And what if someone wants to backport a new module to older versions of Python?

Contributor

@jonemo jonemo commented Sep 17, 2017

List of Python 3.6 standard library packages as text file: https://gist.github.com/jonemo/57c0eeff88ac5495592d4a4f9d60a96b
Script I used to check for existence and author/maintainer of each on PyPI: https://gist.github.com/jonemo/a1c0f4768f2c0aa25e31388c0fd6e377
Output of said script shortly before the timestamp of this comment: https://docs.google.com/spreadsheets/d/15WoAkoaUW1BRSVt9yAOcObHgkWhfQOqUY0_xNbkTwL8/edit?usp=sharing

Stats:

  • standard lib module names that are also PyPI package names: 71
  • of those 71, registered by @stestagg as part of his disclosure: 13
  • standard lib module names that are not PyPI package names: 139

My (relatively uninformed newbie/bystander) suggestion is to:

Possible next steps after this:

  • Review the 58 remaining PyPI-registered packages that clash with standard library names for:
    • malicious content
    • abandoned, unused and otherwise delete-worthy content
  • Collect a list of standard library module names from previous Python versions and add to the list of banned names (e.g. xmlrpclib)
  • Collect list of standard library module names from other Python implementations to add to the list of banned names (e.g. clr from IronPython)
  • Also block obvious cases of "typo-squatting" (either manually or automatically via a string-similarity metric) to avoid the problem described here
Author

@GadgetSteve GadgetSteve commented Sep 17, 2017

@jonemo Nice report, but please note that I don't have a single package registered in my name on PyPI; the above sounds like I have 13. The registration of those 13 names was performed by @stestagg, another Steve I know, who did specifically state in pypa/pypi-legacy#585 that "As the owner of these packages, I don't mind them being taken off me, or access to them disabled as part of any fix."
I did raise an enhancement proposal to build filtering into pip (pypa/pip#4527), but that was felt not to be worth pursuing at the pip end, as it would not treat the root cause and would not address any other package installer; hence this ticket.

Member

@ewdurbin ewdurbin commented Sep 17, 2017

https://pypi.org/project/stdlib-list is maintained and appears to be kept up to date. looks like it could be helpful, thanks to @jackmaney

Member

@ewdurbin ewdurbin commented Sep 18, 2017

with #2409 shipped here's what I see as remaining items to wrap this up:

  • Audit currently registered packages which conflict. (thanks for analysis @jonemo)
  • Remove project names currently prohibited by the blacklist from said list
  • Determine what stdlib modules exist in other Python Interpreters, PR to stdlib_list
  • Improve messaging/documentation (https://pypi.org/help)

Anything else?

I think that

Also block obvious cases of "typo-squatting" (either manually or automatically via a string-similarity metric) to avoid the problem described here

is another issue, as that will be a more difficult problem to get right.

Member

@ewdurbin ewdurbin commented Sep 18, 2017

#2410 addresses messaging/documentation


@jackmaney jackmaney commented Sep 18, 2017

Thank you for using my library (stdlib-list)! I update it after every minor version release (i.e. the next one will be 3.7). Please let me know if you find something that's missing in any of the lists.


@hangtwenty hangtwenty commented Sep 20, 2017

Regarding this point

[Blocking obvious cases of typo-squatting] Is another issue as that will be more difficult problem to get right.

I understand this hesitation, but -- Perfect is the enemy of good, no? Seems like it could be gotten right enough for the top N most popular downloads. If there is a possibility of going down this path, I would be glad to enlist to help.

Contributor

@jonemo jonemo commented Sep 20, 2017

Now that new uploads of stdlib-shadowing names are no longer possible, can someone with the power to do so please remove the dummy packages that have been placed there by @stestagg? See @GadgetSteve's comment for context and pypa/pypi-legacy#585 for a list of these dummy packages.

@GadgetSteve: Apologies for confusing you with @stestagg, who could have known that one Steve reports an issue previously blogged about by another Steve? 😬

Author

@GadgetSteve GadgetSteve commented Sep 21, 2017

@jonemo No problem on the confusion; it is not exactly new. At work we have another person in a different division with the same first name and surname, and one in the same office with a surname that sounds similar.
@hangtwenty Just to point out that there are two types of typo-squatting. One is things like duplicated and transposed letters (e.g. urlllib or urlilb). The other, increasingly popular, is Unicode mimicry: e.g. a package called аррӏе (actually u"\u0430\u0440\u0440\u04cf\u0435") could spoof one called apple. One approach to the latter would be to require all package names to use 7-bit ASCII or similar, but that has obvious limitations and may not be desirable.
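To illustrate the second kind, here is a small sketch that lists the non-ASCII characters in a name together with their Unicode names. The helper name non_ascii_report is made up for this example; it only relies on the standard unicodedata module.

```python
import unicodedata


def non_ascii_report(name):
    """Return (character, Unicode name) pairs for each non-ASCII character."""
    return [
        (ch, unicodedata.name(ch, "UNKNOWN"))
        for ch in name
        if ord(ch) > 127
    ]


spoof = "\u0430\u0440\u0440\u04cf\u0435"  # renders like "apple"

print(spoof == "apple")  # the two strings are different code points
for ch, description in non_ascii_report(spoof):
    print(repr(ch), description)
```

A plain ASCII name such as apple produces an empty report, so a non-empty result is enough to flag a name for rejection or review.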

Member

@ncoghlan ncoghlan commented Sep 21, 2017

@GadgetSteve We do indeed restrict PyPI name registrations to 7-bit ASCII: https://www.python.org/dev/peps/pep-0508/#names

While we don't spell out the reasoning there, the vast array of Unicode confusables is indeed the reason we have that restriction - with ASCII, it's mainly only l1 and O0 that you need to worry about.
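A minimal sketch of both points, with assumptions stated: the regular expression is the name pattern given in PEP 508, while the helper names and the l1/O0 folding table are made up for illustration and are not how PyPI implements anything. The re.ASCII flag keeps case-insensitive matching from accepting non-ASCII characters that case-fold to ASCII letters.

```python
import re

# Name pattern from PEP 508, matched case-insensitively and ASCII-only.
PEP508_NAME = re.compile(
    r"^([A-Z0-9]|[A-Z0-9][A-Z0-9._-]*[A-Z0-9])$",
    re.IGNORECASE | re.ASCII,
)

# Fold the two ASCII look-alike pairs mentioned above: 1 -> l and 0 -> o.
FOLD = str.maketrans({"1": "l", "0": "o"})


def is_valid_name(name):
    """Return True if the whole name matches the PEP 508 pattern."""
    return PEP508_NAME.fullmatch(name) is not None


def skeleton(name):
    """Lowercase a name and fold digit look-alikes into letters."""
    return name.lower().translate(FOLD)


if __name__ == "__main__":
    print(is_valid_name("urllib3"))
    print(is_valid_name("\u0430\u0440\u0440\u04cf\u0435"))
    print(skeleton("url1ib") == skeleton("urllib"))
```

Two names with the same skeleton are candidates for a closer look; the folding is deliberately crude and will also flag harmless pairs.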

As far as the actual typosquatting problem goes, my proposal in #2268 is to distribute the review workload by notifying the maintainers of the projects with similar names, rather than always notifying the PyPI admins (since admin time and attention is a very limited resource). The PyPI admins would then only get direct notifications when registered project names are close to ones on the already prohibited list.

Author

@GadgetSteve GadgetSteve commented Sep 21, 2017


@hangtwenty hangtwenty commented Sep 21, 2017

This might be obvious to people but for calculating the similarity we could use Levenshtein distance.
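For reference, a plain dynamic-programming version is short enough to sketch here. This is only an illustration, not a proposal for the exact metric or threshold; the example names are the ones used earlier in this thread.

```python
def levenshtein(a, b):
    """Edit distance counting insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                     # delete char_a
                current[j - 1] + 1,                  # insert char_b
                previous[j - 1] + (char_a != char_b) # substitute or match
            ))
        previous = current
    return previous[-1]


if __name__ == "__main__":
    print(levenshtein("urllib", "urlllib"))  # one duplicated letter
    print(levenshtein("urllib", "urlilb"))   # two transposed letters
```

Note that plain Levenshtein distance scores a transposition of two adjacent letters as two edits; a variant that counts transpositions as a single edit (Damerau-Levenshtein) may fit typo detection better.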

Relevant blog post by the way:


@stestagg stestagg commented Sep 23, 2017

Please only remove my packages if the name blocking is applied to pypi as well as warehouse!

Member

@ewdurbin ewdurbin commented Sep 23, 2017

@stestagg blocking of names only occurs on upload of a new package name and all such uploads must now be via warehouse, so we’re good here!


@stestagg stestagg commented Sep 23, 2017

ok, cool, I wasn't aware that had happened :)

Author

@GadgetSteve GadgetSteve commented Sep 24, 2017

Very happy with the outcome.

Apologies to @stestagg for not CCing on the original submission of this ticket.

Member

@ewdurbin ewdurbin commented Jun 20, 2019

Thanks to a helpful nudge from @brainwane... Audit of existing projects in conflict follows. I was able to quickly assess some modules based on authorship/ownership, and also quickly removed some that do not have any files or download links.

abc			deleted
argparse		valid	
ast			needs inspection
asyncio			valid
buildtools		needs inspection	
calendar		needs inspection
cd			needs inspection
chunk			needs inspection
code			deleted
colorpicker		needs inspection	
commands		deleted
compiler		needs inspection	
configparser		valid
contextvars		valid	
csv			valid
ctypes			needs inspection
dataclasses		valid
datetime		valid
device			needs inspection
dis			needs inspection
distutils		needs inspection
dl			needs inspection
email			needs inspection
enum			valid
exceptions		needs inspection	
faulthandler		valid
formatter		needs inspection	
framework		needs inspection
functools		needs inspection
gl			needs inspection
hashlib			needs inspection
hmac			needs inspection
html			needs inspection
html-parser		needs inspection
htmlparser		needs inspection
http			needs inspection
http-client		needs inspection
imp			deleted
importlib		valid	
importlib-resources	valid		
io			needs inspection
ipaddress		needs inspection
jpeg			needs inspection
logging			needs inspection
logging-config		needs inspection
mailbox			needs inspection
modulefinder		needs inspection
multiprocessing		valid
nav			needs inspection
new			needs inspection
numbers			needs inspection
parser			needs inspection
pathlib			valid
pipes			deleted
pprint			needs inspection
queue			deleted
readline		needs inspection	
repr			needs inspection
resource		needs inspection
secrets			needs inspection
select			needs inspection
selectors		needs inspection
sets			needs inspection
shelve			needs inspection
signal			needs inspection
ssl			valid
statistics		needs inspection	
test			needs inspection
time			needs inspection
token			needs inspection
trace			needs inspection
turtle			needs inspection
typing			valid
unittest		needs inspection	
uuid			needs inspection
w			needs inspection
wave			needs inspection
wsgiref			valid
xmlrpclib		needs inspection
Member

@brainwane brainwane commented Jun 27, 2019

Thanks @ewdurbin - so the owners of those packages need to consider the name conflict? OK for me to open a new packaging-problems issue with that list?

Member

@ewdurbin ewdurbin commented Jun 28, 2019

@brainwane next step would be for someone to take a closer look at all of those listed above as needs inspection. some will be valid and can/should remain on PyPI. I'm not sure that any action from project owners with existing conflicting names is needed.


@mertzjames mertzjames commented Dec 5, 2019

@brainwane @ewdurbin what's the status of this effort? Do you need any help?

Member

@brainwane brainwane commented Dec 20, 2019

Heads-up @xmunoz in case you want to skim this issue for background
