Detect packages being published with typo'ish names #4998

aaronlelevier · 2018-11-02T22:27:12Z

What's the problem this feature will solve?
Prevent malicious packages being published with typo'ish names

Describe the solution you'd like
I'd like to propose an algorithm that blocks malicious packages from being published when their names closely resemble those of well-known packages.

Recently there were articles about 12 malicious packages found. Several of them had names very close to Django, and as an avid Django user, this got my attention.

The algorithm could combine Levenshtein distance with other input features, such as the number of similar file names and the number of similar code lines compared to legitimate packages with a similar name. If there is a close resemblance, the package could either be held from publication until a human reviews it or be blocked permanently.
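To make the name-similarity part concrete, here is a minimal sketch of a Levenshtein check. The `POPULAR` list, the `similar_popular_names` helper, and the `max_distance` threshold are assumptions for illustration only; nothing here is Warehouse code, and the file-name and code-line features are not covered.

```python
from typing import List, Sequence


def levenshtein(a: str, b: str) -> int:
    """Edit distance counting single-character inserts, deletes, and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


# Hypothetical stand-in for a list of well-known project names.
POPULAR = ["django", "requests", "numpy"]


def similar_popular_names(
    candidate: str,
    popular: Sequence[str] = POPULAR,
    max_distance: int = 1,  # assumed threshold; would need tuning against false positives
) -> List[str]:
    """Return popular names within max_distance edits of candidate, excluding exact matches."""
    name = candidate.lower()
    return [p for p in popular if p != name and levenshtein(name, p) <= max_distance]
```

For example, `similar_popular_names("djago")` flags `django`, while `similar_popular_names("flask")` returns an empty list because nothing in the assumed list is within one edit.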

The algorithm could also be a lot more sophisticated, something like Android's approach, which uses machine learning to detect malicious apps and measures 700+ features, I believe.

I am just proposing something of this nature if it hasn't already been proposed.

Additional context
Here is the article link that I am referencing:

https://www.zdnet.com/article/twelve-malicious-python-libraries-found-and-removed-from-pypi/

Thanks,
Aaron

di · 2018-11-03T00:03:40Z

This is more or less the same as #2268, so I'm going to close this as a duplicate, but thanks for the feature request!

aaronlelevier · 2018-11-03T00:14:06Z

@di thanks for taking a look at my issue. I'll check the existing issue then. Thanks.

brainwane · 2019-06-21T01:49:31Z

I'm actually going to reopen this, because I think it would be useful to have this issue (about typosquatting prevention/detection before/during upload) distinct from #2268 (which is about notifications, alerts, a "packages with similar names" widget, etc.). Thanks @aaronlelevier!

brainwane · 2019-06-21T01:56:53Z

Today I discussed this idea -- checking for typosquatting, pre-upload -- with @dstufft and @ewdurbin. It would be pretty hard to do this without LOTS of false positives. Donald mentioned a person at Netflix whose approach was: remove the dashes from popular project names, register the resulting strings.

We could increase the scope of our current normalization rules to cover more scenarios -- there will be existing collisions, including with that preemptive registration project.
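As a rough sketch of what broader normalization might look like: the first function below is the PEP 503 name normalization, which I am assuming is what "current normalization rules" refers to. The `squashed` and `find_collisions` helpers are hypothetical and only illustrate the "remove the dashes" idea.

```python
import re
from typing import Iterable, List


def pep503_normalize(name: str) -> str:
    """PEP 503 normalization: runs of '-', '_', '.' become one '-', then lowercase."""
    return re.sub(r"[-_.]+", "-", name).lower()


def squashed(name: str) -> str:
    """Hypothetical stricter key: drop the separators entirely."""
    return re.sub(r"[-_.]+", "", name).lower()


def find_collisions(new_name: str, existing: Iterable[str]) -> List[str]:
    """Existing names that differ under PEP 503 but collide once separators are removed."""
    new_norm = pep503_normalize(new_name)
    new_key = squashed(new_name)
    return sorted(
        e for e in existing
        if pep503_normalize(e) != new_norm and squashed(e) == new_key
    )
```

With this, `find_collisions("pythondateutil", ["python-dateutil", "requests"])` returns `["python-dateutil"]`. Running the same comparison across all existing names would surface the existing collisions mentioned above.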

In any case, this kind of checking ought to be built as part of a pipeline where automated systems run checks, and then flag packages/projects for deletion/review/ok by PyPI admins.

brainwane · 2019-09-02T17:01:05Z

Per conversation last week: We'll be addressing this problem during upcoming work on automated detection of malicious uploads/typosquatting. First we'll need to develop good tools to detect and flag the pytosquatting/typosquatting, then we'll add tools in that pipeline for PyPI to automatically prevent/reject publication of packages that hit a certain "hey, that looks dodgy" score.
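One way to picture the flag-then-reject flow is sketched below. Everything in it is hypothetical: the `Verdict` values, the `run_checks` function, the convention that each check returns a score between 0 and 1, and both thresholds are assumptions for illustration, not the design that was later built.

```python
from enum import Enum
from typing import Callable, Dict, Tuple


class Verdict(Enum):
    OK = "ok"
    REVIEW = "review"  # flag for a PyPI admin to look at
    REJECT = "reject"  # refuse publication automatically


# Assumed convention: a check maps a project name to a "dodginess" score in [0, 1].
Check = Callable[[str], float]


def run_checks(
    name: str,
    checks: Dict[str, Check],
    review_at: float = 0.5,  # assumed threshold
    reject_at: float = 0.9,  # assumed threshold
) -> Tuple[Verdict, Dict[str, float]]:
    """Run every check and derive a verdict from the highest score."""
    scores = {label: check(name) for label, check in checks.items()}
    worst = max(scores.values(), default=0.0)
    if worst >= reject_at:
        verdict = Verdict.REJECT
    elif worst >= review_at:
        verdict = Verdict.REVIEW
    else:
        verdict = Verdict.OK
    return verdict, scores
```

Returning the per-check scores alongside the verdict keeps the reasons visible to whoever reviews a flagged upload. Starting with only the `REVIEW` outcome enabled would match the "detect first, reject later" ordering described above.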

xmunoz · 2020-02-18T23:47:05Z

PR #7377 has been merged. If someone wants to contribute such a malware check, the documentation for how is here: https://warehouse.pypa.io/development/malware-checks/

pradyunsg · 2020-09-17T05:38:11Z

From pypi/support#526 (comment):

One idea that has been floated before is automatically blocking (or at least requiring approval for) any new project name for which PyPI currently serves a non-trivial number of 404 responses. This would easily identify internal-only project names like this and prevent them from being used maliciously.
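A minimal sketch of that 404-based idea, assuming the requested names that returned 404 are available as a plain iterable of strings. The `names_needing_approval` helper, the input shape, and the threshold value are all hypothetical; how PyPI would actually collect these counts is not specified here.

```python
import re
from collections import Counter
from typing import Iterable, Set


def normalize(name: str) -> str:
    """PEP 503 name normalization, so 'Internal_Lib' and 'internal-lib' count together."""
    return re.sub(r"[-_.]+", "-", name).lower()


def names_needing_approval(
    not_found_names: Iterable[str],
    threshold: int = 100,  # assumed meaning of "non-trivial"; would need tuning
) -> Set[str]:
    """Names requested (and not found) at least `threshold` times."""
    counts = Counter(normalize(n) for n in not_found_names)
    return {name for name, count in counts.items() if count >= threshold}
```

A registration request for a name in the returned set could then be held for approval rather than granted immediately.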

brainwane · 2020-10-16T22:20:09Z

A new LWN piece covers recent analysis of typosquatting attacks on PyPI and efforts to combat them.

Julian · 2021-05-13T21:53:00Z

Are we aware of this paper from March 2020 which investigated typosquatting on PyPI? Or have the authors reached out?

I've only skimmed the abstract, and haven't looked at the tool they say they developed there, but it seemed like an interesting read.

(I'm finding this issue after seeing #9527).

di closed this as completed Nov 3, 2018

brainwane added this to the Package signing & detection/verification milestone Jun 21, 2019

brainwane reopened this Jun 21, 2019

This was referenced Jun 21, 2019

admin interface for review of flagged packages #6062

Closed

post-registration alerts for packages with similar names (typosquatting) #2268

Open

brainwane added the feature request label Sep 2, 2019

brainwane changed the title ~~Prevent malicious packages being published with typo'ish names~~ Detect packages being published with typo'ish names Sep 2, 2019

di mentioned this issue Aug 5, 2020

Security research packages removed pypi/support#526

Closed

xmunoz mentioned this issue Apr 26, 2021

Productionize Malware Detection psf/fundable-packaging-improvements#38

Open

jspeed-meyers mentioned this issue May 13, 2021

Reduce Typosquatting Harm via Social Distancing for Top PyPI Packages #9527

Open

snyk-bot mentioned this issue Jul 28, 2021

[Snyk] Upgrade uglify-js from 3.2.1 to 3.13.10 MaxMood96/warehouse#3

Open

miketheman added the malware-detection Issues related to automated malware detection. label Feb 9, 2024
