Publish a list of malicious packages that have been taken down #4703

Open
di opened this issue Sep 12, 2018 · 27 comments
Labels
APIs/feeds; feature request; malware-detection (Issues related to automated malware detection.); needs discussion (a product management/policy issue maintainers and users should discuss)

Comments

@di
Member

di commented Sep 12, 2018

What's the problem this feature will solve?
Users who may have installed malicious packages don't have insight into which packages have been taken down by PyPI administrators.

Describe the solution you'd like
PyPI should publish both a human-readable and machine-readable (API) list of malicious packages that have been taken down. Ideally the human-readable list would be sortable by package name, or by the date it was created/taken down.

Additional context
Feature request to automatically uninstall packages via this API in pip: pypa/pip#5777

@waseem18
Contributor

waseem18 commented Sep 23, 2018

@di How do I find packages that have been taken down - from the database point of view? Is there any flag (is_removed or is_malicious) in the table?

@di
Member Author

di commented Sep 23, 2018

@waseem18 Nope, there isn't, so we would have to add that flag and manually infer it from the comments of previously removed packages.

@waseem18
Contributor

waseem18 commented Sep 23, 2018

Okay, so we add the flag to the respective table and set it to true for any projects we want to mark as malicious.

If I understand you correctly, as the data for already removed packages doesn't exist in our database, we would need to infer it from the Warehouse GH issues.

So after we add the flag, the API call would return any packages that are flagged as malicious + the list we have of already removed packages.

Please do correct me if I'm wrong.

@di
Member Author

di commented Sep 23, 2018

There is a comment field on the BlacklistedProject model: https://github.com/pypa/warehouse/blob/97f28dfa5a4017dd1e1a7630f772ce01ec1af749/warehouse/packaging/models.py#L568

What I meant was that once we add the ability to mark a BlacklistedProject as malicious, there should be some way the administrators can go back and manually set this marker based on the comments we left. There are only about 200 projects right now, so this wouldn't be a terrible burden.

@waseem18
Contributor

Gotcha!

So we can add the is_malicious flag to the BlacklistedProject model with a default value of false.

And the API endpoint would query the BlacklistedProject table for rows with the flag set to true.
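
A minimal sketch of the flag-and-query idea above, using the standard-library sqlite3 module rather than Warehouse's actual models. The table name `blacklisted_projects`, its column layout, and the helper `malicious_project_names` are assumptions made for illustration; only the `is_malicious` flag defaulting to false and the existing comment field come from the discussion.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create an in-memory table standing in for the BlacklistedProject model."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE blacklisted_projects (
            name TEXT PRIMARY KEY,
            comment TEXT NOT NULL DEFAULT '',
            -- Hypothetical flag; 0 (false) unless an admin marks the row.
            is_malicious INTEGER NOT NULL DEFAULT 0
        )
        """
    )
    return conn


def malicious_project_names(conn: sqlite3.Connection) -> list:
    """Return the names an API endpoint might expose: rows flagged malicious."""
    rows = conn.execute(
        "SELECT name FROM blacklisted_projects "
        "WHERE is_malicious = 1 ORDER BY name"
    )
    return [name for (name,) in rows]


conn = make_db()
# A name blocked for a non-malicious reason keeps the default flag value.
conn.execute(
    "INSERT INTO blacklisted_projects (name, comment) VALUES (?, ?)",
    ("reserved-name", "reserved by admins"),
)
# A name an admin has gone back and marked, based on the stored comment.
conn.execute(
    "INSERT INTO blacklisted_projects (name, comment, is_malicious) "
    "VALUES (?, ?, 1)",
    ("evil-pkg", "malware"),
)
print(malicious_project_names(conn))
```

Defaulting the flag to false means existing rows stay out of the published list until an administrator reviews them, which matches the manual back-fill described earlier in the thread.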

@pfalcon

pfalcon commented Oct 13, 2018

Please be sure to provide the reason for each takedown case - e.g. DMCA request, government/security services involvement, somebody's whim, etc.

@di
Member Author

di commented Oct 13, 2018

This issue is only about malicious packages, which are taken down by the PyPI administrators at their discretion.

@pradyunsg
Contributor

pradyunsg commented Oct 16, 2018

Is anyone working on this? I would like to work toward this during the Bloomberg Sprint.

If nothing else, figuring out how this works/is exposed from Warehouse's side should be a good start.

@waseem18
Contributor

I'm not working on this, @pradyunsg. Feel free to pick it up.

@oliviersm199

Hey, I will pick it up at the Bloomberg NYC Sprint!

@oliviersm199

#4962 relates to this issue

@brainwane
Contributor

Blocked on #5117.

@di
Member Author

di commented Sep 9, 2019

> Blocked on #5117.

Not necessarily. We sometimes remove malicious packages manually, and not yet being able to detect malicious packages automatically shouldn't prevent us from publishing which packages we've manually taken down.

@di
Member Author

di commented Sep 9, 2019

#4962 mostly implements the first step towards this, but wasn't finished.

di removed the "blocked" label (Issues we can't or shouldn't get to yet) on Mar 25, 2020

@di
Member Author

di commented May 1, 2020

Per #7840, this list should include all "blocked" packages along with the reason for blocking, if applicable.

@pradyunsg
Contributor

> we should publish all "blocked" packages along with the reason for blocking, if applicable.

To be clear, you mean providing a publicly accessible list/table of all blocked packages and why they were blocked; and not changing/putting up new releases on that name. Correct?

@di
Member Author

di commented May 14, 2020

Thanks, my comment was unclear. I've updated it to 'this list should include all "blocked" packages'; I'm not suggesting we actually publish (create releases for) these packages.

@zooba
Contributor

zooba commented Sep 1, 2020

Is there a need for the flag in the database to distinguish between "blocked" reasons? As long as we're preserving PyPI admin discretion (which I agree with), it seems like that additional sort of information doesn't need to be exposed at this level.

And in my understanding, that would simplify this down to an API and possibly a formatted page (though I'm not totally convinced) that would return the list of blocked names. So all of the previous PR is not needed.

Though given #7840, perhaps we can also return a different status code for blocked names on install (rather than 404)? That would allow installers to handle an exceptional case directly, rather than having to maintain a list from our new API.

Guessing this just needs someone to work on it?
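
A rough sketch of how an installer could use a distinct status code for blocked names, as suggested above. The thread does not pick a code, so `410 Gone` is purely an assumption here, and `classify_index_response` is a hypothetical helper, not part of pip or Warehouse.

```python
from http import HTTPStatus

# Assumption: 410 Gone stands in for the "different status code".
# The discussion only says it should not be 404.
BLOCKED_STATUS = HTTPStatus.GONE


def classify_index_response(status: int) -> str:
    """Map an index response status to what an installer might do with it."""
    if status == HTTPStatus.OK:
        return "available"
    if status == BLOCKED_STATUS:
        # The name exists on the block list: warn loudly instead of
        # reporting a plain "no matching distribution".
        return "blocked"
    if status == HTTPStatus.NOT_FOUND:
        return "not-found"
    return "error"


for code in (200, 404, 410):
    print(code, classify_index_response(code))
```

With this shape, an installer needs no separate copy of the block list; the index response for the requested name carries the signal.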

@zooba
Contributor

zooba commented Sep 16, 2020

For anyone interested, my PR in #8533 works but is probably stalled on having a good path for the API. All the existing JSON APIs are under /pypi/<project-name>, which doesn't leave an obvious place to add this one (short of claiming the project name matching the API). Ernest already rejected putting it under /admin because that path is excluded from the CDN.

Happy to receive any suggestions either here or there. I don't have nearly enough insight into PyPI's routing design to make a confident decision myself.

@di
Member Author

di commented Sep 16, 2020

Probably blocked on #284.

@zooba
Contributor

zooba commented Sep 16, 2020

If we're going to wait for a complete API redesign and potential technology change, can we just manually dump the list of banned names into a public text file somewhere until that's ready?
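
If such a text file existed, a consumer could check its environment against it along these lines. This is only a sketch: the file format (one name per line, `#` for comments) and the helpers `parse_blocklist`, `blocked_installed`, and `installed_names` are assumptions. The name normalization follows the rule from PEP 503.

```python
import re
from importlib import metadata
from typing import Iterable, List, Set


def normalize(name: str) -> str:
    """Normalize a project name as described in PEP 503."""
    return re.sub(r"[-_.]+", "-", name).lower()


def parse_blocklist(text: str) -> Set[str]:
    """Parse a hypothetical one-name-per-line file, skipping blanks and comments."""
    names = set()
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            names.add(normalize(line))
    return names


def blocked_installed(blocklist: Set[str], installed: Iterable[str]) -> List[str]:
    """Return the normalized installed names that appear in the block list."""
    return sorted({normalize(name) for name in installed} & blocklist)


def installed_names() -> List[str]:
    """List distribution names in the current environment."""
    names = []
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if name:  # skip distributions with broken metadata
            names.append(name)
    return names


sample = "# banned names (made-up examples)\nEvil_Pkg\n\nother.name\n"
blocklist = parse_blocklist(sample)
print(blocked_installed(blocklist, ["Evil.Pkg", "requests"]))
```

In real use, `installed_names()` would supply the second argument; the sample list keeps the example independent of whatever happens to be installed.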

@tueda

tueda commented May 23, 2021

As far as I understand, PyPI still does not provide any reasonable way to check whether a package has been taken down (I hit a package name that is not listed on PyPI but prohibited). From the viewpoint of a package developer, this is not a good situation; the only way to check whether a package name was already taken by package name squatters and then taken down by admins is to try to squat the package name.

@westonsteimel

@di what do you think about feeding these into the PyPA Advisory Database? Then it would feed into OSV and anything else consuming those data sources. Of course it'd also be great to get pip-audit able to detect these. I realise that it would involve a few changes, since I think currently everything depends on the PyPI package JSON existing and it won't for these removed packages, but I think it'd be worth trying. I'm happy to have a go at getting a couple of initial advisory entries created and see what happens from there.

@di
Member Author

di commented Dec 3, 2021

@westonsteimel I had the same thought. We could either do that, which is a bit circuitous but will work, or, if we decide OSV or the Advisory Database is not the right place for these types of things, we could just take the easier route of including these in the vulnerabilities field based on Warehouse's internal state about these prohibited names.

Let's raise an issue at https://github.com/pypa/advisory-db to decide whether malware/spam/etc should generate advisories.

@sethmlarson
Contributor

Noting here that there's now an OSV database hosted by the OpenSSF that tracks this information: https://github.com/ossf/malicious-packages

@di
Member Author

di commented Nov 6, 2023

Also, note that that database only includes a fraction of the packages being taken down; there is currently no PyPI -> OSV link that populates that database.
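
For anyone consuming the OSV-format records mentioned above, a minimal sketch of pulling out the affected PyPI names might look like this. It relies only on the `affected[].package.ecosystem` and `affected[].package.name` fields of the OSV schema; the sample records and their IDs are made up, and `pypi_names` is a hypothetical helper. Given the coverage gap noted above, a name missing from the result says nothing about whether PyPI has taken it down.

```python
from typing import Iterable, List


def pypi_names(records: Iterable[dict]) -> List[str]:
    """Collect the PyPI package names listed as affected in OSV-style records."""
    names = set()
    for record in records:
        for affected in record.get("affected", []):
            package = affected.get("package", {})
            name = package.get("name")
            if package.get("ecosystem") == "PyPI" and name:
                names.add(name)
    return sorted(names)


# Made-up records for illustration only; not real advisories.
sample_records = [
    {
        "id": "EXAMPLE-0001",
        "affected": [{"package": {"ecosystem": "PyPI", "name": "evil-pkg"}}],
    },
    {
        "id": "EXAMPLE-0002",
        "affected": [{"package": {"ecosystem": "npm", "name": "evil-js"}}],
    },
]
print(pypi_names(sample_records))
```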

@miketheman
Member
