Publish a list of malicious packages that have been taken down #4703

di · 2018-09-12T15:42:59Z

What's the problem this feature will solve?
Users who may have installed malicious packages have no insight into which packages have been taken down by PyPI administrators.

Describe the solution you'd like
PyPI should publish both a human-readable and machine-readable (API) list of malicious packages that have been taken down. Ideally the human-readable list would be sortable by package name, or by the date it was created/taken down.

Additional context
Feature request to automatically uninstall packages via this API in pip: pypa/pip#5777


waseem18 · 2018-09-23T14:19:42Z

@di How do I find packages that have been taken down - from the database point of view? Is there any flag (is_removed or is_malicious) in the table?

di · 2018-09-23T14:48:54Z

@waseem18 Nope, there isn't, so we would have to add that flag and manually infer it from the comments of previously removed packages.

waseem18 · 2018-09-23T15:17:03Z

Okay, so we add the flag to the respective table and set it to true for any projects we want to mark as malicious.

If I understand you correctly, since the data for already removed packages doesn't exist in our database, we would need to infer it from the Warehouse GH issues.

So after we add the flag, the API call would return any packages that are flagged as malicious + the list we have of already removed packages.

Please do correct me if I'm wrong.

di · 2018-09-23T16:29:34Z

There is a comment field on the BlacklistedProject model: https://github.com/pypa/warehouse/blob/97f28dfa5a4017dd1e1a7630f772ce01ec1af749/warehouse/packaging/models.py#L568

What I meant was that once we add the ability to mark a BlacklistedProject as malicious, there should be some way the administrators can go back and manually set this marker based on the comments we left. There's only about 200 projects right now, so this wouldn't be a terrible burden.

waseem18 · 2018-09-23T17:31:55Z

Gotcha!

So we can add the is_malicious flag to the BlacklistedProject model with a default value of false.

And the API endpoint would query the BlacklistedProject table for entries with the flag set to true.
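
To make the proposal above concrete, here is a minimal sketch using the standard library's sqlite3 module rather than Warehouse's actual ORM. The table name, the column names, and the sample rows are all assumptions made for illustration; only the idea of an is_malicious flag defaulting to false, plus a query that returns flagged entries, comes from the discussion.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create an in-memory table standing in for the BlacklistedProject model."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE blacklisted_project (
            name TEXT PRIMARY KEY,
            comment TEXT NOT NULL DEFAULT '',
            -- Hypothetical flag: defaults to false (0) for existing rows.
            is_malicious INTEGER NOT NULL DEFAULT 0
        )
        """
    )
    return conn


def list_malicious(conn: sqlite3.Connection) -> list[str]:
    """Return the names an API endpoint could expose: flagged rows only."""
    rows = conn.execute(
        "SELECT name FROM blacklisted_project "
        "WHERE is_malicious = 1 ORDER BY name"
    )
    return [name for (name,) in rows]


conn = make_db()
# A blocked name that was never flagged keeps the default (not malicious).
conn.execute(
    "INSERT INTO blacklisted_project (name, comment) VALUES (?, ?)",
    ("example-squatted", "name squatting"),
)
# An administrator sets the flag explicitly, e.g. after reading the comment.
conn.execute(
    "INSERT INTO blacklisted_project (name, comment, is_malicious) "
    "VALUES (?, ?, 1)",
    ("example-malware", "malware"),
)
print(list_malicious(conn))
```

Defaulting the flag to false means the roughly 200 existing entries stay unlisted until an administrator reviews them.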

pfalcon · 2018-10-13T09:33:15Z

Please be sure to provide the reason for each takedown case - e.g. DMCA request, government/security services involvement, somebody's whim, etc.

di · 2018-10-13T13:53:24Z

This issue is only about malicious packages, which are taken down by the PyPI administrators at their discretion.

pradyunsg · 2018-10-16T06:16:37Z

Is anyone working on this? I would like to work toward this during the Bloomberg Sprint.

If nothing else, figuring out how this works/is exposed from Warehouse's side should be a good start.

waseem18 · 2018-10-16T06:27:08Z

I'm not working on this, @pradyunsg. Feel free to pick it up.

oliviersm199 · 2018-10-27T16:05:42Z

Hey, I will pick it up at the Bloomberg NYC Sprint!

oliviersm199 · 2018-10-27T20:08:37Z

#4962 relates to this issue

brainwane · 2019-09-02T16:52:27Z

Blocked on #5117.

di · 2019-09-09T19:52:38Z

> Blocked on #5117.

Not necessarily. We sometimes remove malicious packages manually, and not yet being able to detect malicious packages automatically shouldn't prevent us from publishing which packages we've manually taken down.

di · 2019-09-09T19:54:12Z

#4962 mostly implements the first step towards this, but wasn't finished.

di · 2020-05-01T18:13:10Z

Per #7840, this list should include all "blocked" packages along with the reason for blocking, if applicable.

pradyunsg · 2020-05-14T19:01:19Z

> we should publish all "blocked" packages along with the reason for blocking, if applicable.

To be clear, you mean providing a publicly accessible list/table of all blocked packages and why they were blocked; and not changing/putting up new releases on that name. Correct?

di · 2020-05-14T19:21:44Z

Thanks, my comment was unclear. I've updated it to 'this list should include all "blocked" packages'; I'm not suggesting we actually publish (create releases for) these packages.

zooba · 2020-09-01T20:38:19Z

Is there a need for the flag in the database to distinguish between "blocked" reasons? As long as we're preserving PyPI admin discretion (which I agree with), it seems like that additional sort of information doesn't need to be exposed at this level.

And in my understanding, that would simplify this down to an API and possibly a formatted page (though I'm not totally convinced) that would return the list of blocked names. So all of the previous PR is not needed.

Though given #7840, perhaps we can also return a different status code for blocked names on install (rather than 404)? That would allow installers to handle an exceptional case directly, rather than having to maintain a list from our new API.

Guessing this just needs someone to work on it?
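
As a rough sketch of the status-code idea above, this is how an installer might tell a blocked name apart from a name that simply doesn't exist. The thread does not say which status code would be used, so `BLOCKED_STATUS = 410` is purely a placeholder assumption, and `classify` is a hypothetical helper, not part of pip or Warehouse.

```python
from enum import Enum

# Placeholder assumption: the discussion names no specific status code.
BLOCKED_STATUS = 410


class Lookup(Enum):
    FOUND = "found"
    NOT_FOUND = "not found"
    BLOCKED = "blocked"
    ERROR = "error"


def classify(status_code: int) -> Lookup:
    """Map an index response status to what the installer should report."""
    if status_code == 200:
        return Lookup.FOUND
    if status_code == BLOCKED_STATUS:
        # Distinct from 404, so the installer can warn that the name was
        # blocked by administrators instead of saying "no such project".
        return Lookup.BLOCKED
    if status_code == 404:
        return Lookup.NOT_FOUND
    return Lookup.ERROR
```

With a distinct code, the installer would not need to download and maintain a separate list just to explain why a name cannot be installed.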

zooba · 2020-09-16T16:31:06Z

For anyone interested, my PR in #8533 works but is probably stalled on having a good path for the API. All the existing JSON APIs are under /pypi/<project-name>, which doesn't leave an obvious place to add this one (short of claiming the project name matching the API). Ernest already rejected putting it under /admin because that path is excluded from the CDN.

Happy to receive any suggestions either here or there. I don't have nearly enough insight into PyPI's routing design to make a confident decision myself.

di · 2020-09-16T16:39:35Z

Probably blocked on #284.

zooba · 2020-09-16T17:03:22Z

If we're going to wait for a complete API redesign and potential technology change, can we just manually dump the list of banned names into a public text file somewhere until that's ready?
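
If such a text file existed, a client-side check could be quite small. The sketch below assumes a format of one project name per line with optional `#` comments, which is an assumption and not anything PyPI publishes. Names are compared after PEP 503 normalization, and the `installed_names` helper relies on the standard library's `importlib.metadata`.

```python
import re
from importlib import metadata


def normalize(name: str) -> str:
    """Normalize a project name as described in PEP 503."""
    return re.sub(r"[-_.]+", "-", name).lower()


def parse_blocklist(text: str) -> set[str]:
    """Parse the assumed format: one name per line, '#' starts a comment."""
    names = set()
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if line:
            names.add(normalize(line))
    return names


def installed_names() -> set[str]:
    """Collect normalized names of distributions in the current environment."""
    names = set()
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:
            names.add(normalize(name))
    return names


def find_blocked(installed, blocked: set[str]) -> list[str]:
    """Return installed names that also appear in the blocklist, sorted."""
    return sorted({normalize(n) for n in installed} & blocked)


# Made-up sample data for illustration only.
sample = "# hypothetical banned names\nExample_Malware\n\nother.bad-pkg\n"
blocked = parse_blocklist(sample)
print(find_blocked(["example-malware", "requests"], blocked))
```

In practice one would call `find_blocked(installed_names(), blocked)` after fetching the file from wherever it ends up being hosted.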

tueda · 2021-05-23T07:30:35Z

As far as I understand, PyPI still does not provide any reasonable way to check whether a package has been taken down (I hit a package name that is not listed on PyPI but prohibited). From the viewpoint of a package developer, this is not a good situation; the only way to check whether a package name was already taken by package name squatters and then taken down by admins is to try to squat the package name.

westonsteimel · 2021-12-03T07:11:49Z

@di what do you think about feeding these into the PyPA Advisory Database? They would then flow into OSV and anything else consuming those data sources. Of course it'd also be great to get pip-audit able to detect these. I realise that it would involve a few changes, since I think everything currently depends on the PyPI package JSON existing and it won't for these removed packages, but I think it'd be worth trying. I'm happy to have a go at getting a couple of initial advisory entries created and see what happens from there.

di · 2021-12-03T14:47:23Z

@westonsteimel I had the same thought. We could either do that, which is a bit circuitous but will work, or, if we decide OSV or the Advisory Database is not the right place for these types of things, we could just take the easier route of including these in the vulnerabilities field based on Warehouse's internal state about these prohibited names.

Let's raise an issue at https://github.com/pypa/advisory-db to decide whether malware/spam/etc should generate advisories.

sethmlarson · 2023-11-06T16:11:44Z

Noting here that there's now an OSV database hosted by the OpenSSF that tracks this information: https://github.com/ossf/malicious-packages
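
For readers who want to consume that data, here is a small sketch of pulling PyPI project names out of an OSV-format record. The field names (`id`, `affected`, `package`, `ecosystem`, `name`) follow the OSV schema; the sample record itself, including its ID and package names, is made up for illustration and is not a real entry from that repository.

```python
import json

# Made-up record for illustration; real entries carry more fields.
SAMPLE_RECORD = json.loads(
    """
    {
      "id": "MAL-0000-example",
      "summary": "Malicious code in example-malware (PyPI)",
      "affected": [
        {"package": {"ecosystem": "PyPI", "name": "example-malware"}},
        {"package": {"ecosystem": "npm", "name": "example-malware-js"}}
      ]
    }
    """
)


def pypi_names(record: dict) -> list[str]:
    """Return the PyPI package names listed in one OSV record."""
    names = []
    for affected in record.get("affected", []):
        package = affected.get("package", {})
        if package.get("ecosystem") == "PyPI" and "name" in package:
            names.append(package["name"])
    return names


print(pypi_names(SAMPLE_RECORD))
```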

di · 2023-11-06T19:31:29Z

Also, note that that database only includes a fraction of the packages being taken down; there is currently no PyPI -> OSV link that populates that database.

miketheman · 2024-03-20T19:39:26Z

See here for an update on this topic: https://discuss.python.org/t/pypi-malware-observation-report-outcomes-private-preview/49060

di added feature request APIs/feeds labels Sep 12, 2018

oliviersm199 mentioned this issue Oct 27, 2018

Add a tag to specify the reason why a project was blacklisted. #4962

Closed

brainwane added this to the Package signing & detection/verification milestone Jun 20, 2019

brainwane added the needs discussion a product management/policy issue maintainers and users should discuss label Jun 21, 2019

brainwane mentioned this issue Jun 22, 2019

Feature request: Automatically uninstall malicious packages taken down from PyPI pypa/pip#5777

Open

di mentioned this issue Jul 17, 2019

Feed of removed packages #6201

Closed

brainwane added the blocked Issues we can't or shouldn't get to yet label Sep 2, 2019

brainwane removed this from the Package signing & detection/verification milestone Sep 2, 2019

di removed the blocked Issues we can't or shouldn't get to yet label Mar 25, 2020

di mentioned this issue May 1, 2020

redirects for Databased Backed Blacklists #7840

Closed

zooba mentioned this issue Sep 9, 2020

Fixes #4703 Implement API for obtaining prohibited names #8533

Open

westonsteimel mentioned this issue Dec 4, 2021

Should malware/spam/etc which gets removed from pypi generate advisories here? pypa/advisory-database#45

Open

miketheman added the malware-detection Issues related to automated malware detection. label Jan 31, 2024

merwok mentioned this issue Mar 3, 2024

HTTPError: 403 Forbidden from https://upload.pypi.org/legacy/ pypa/packaging-problems#731

Closed
