Detect malicious packages, for later removal #5117

E3V3A · 2018-11-26T15:45:55Z

Looking at the simple package index, there are a number of highly questionable packages (at least judging by their names).

Packages without proper names, authors or descriptions should probably be removed. If not for bloat reasons, then for security concerns.

Stuff like this:

di · 2018-11-26T15:49:11Z

There are almost 200K projects on PyPI. We don't have the ability to manually audit each one. How do you propose this should be done?

E3V3A · 2018-11-26T17:32:49Z

> There are almost 200K projects on PyPI

Exactly! -- And probably 99.9% useless, outdated, fake, deprecated (at best), or possibly containing malware, at worst!

> How do you propose this should be done?

:) We are programmers so I'm sure we can figure that out!

How about searching for packages where:

The name is weird (random or repeated ASCII)
There is no author
The author has not provided an email
The package has:
- No valid homepage URL
- No description
- No releases in the last 3 years
- No downloads/installs in the last 2 years

That's just a start... and would probably remove a siht load of crud.
It would definitely be interesting to make such a search to see just how many hits we'd get.
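
The criteria listed above could be sketched roughly as follows. This is only an illustration, not anything PyPI runs: the `meta` dict and its keys (`name`, `author`, `author_email`, `home_page`, `description`, `last_release`) are hypothetical stand-ins for whatever metadata is available, the "weird name" test only catches repeated characters, and the download-count criterion is left out because it needs separate statistics.

```python
import re
from datetime import datetime, timedelta, timezone


def metadata_flags(meta, now=None):
    """Return the heuristic flags raised by one package's metadata.

    `meta` is a hypothetical dict; `last_release` is a timezone-aware
    datetime, or None if the package has no releases.
    """
    now = now or datetime.now(timezone.utc)
    flags = []

    name = meta.get("name") or ""
    # The same character four or more times in a row, e.g. "aaaa".
    if re.search(r"(.)\1{3,}", name):
        flags.append("repeated-characters-in-name")
    if not (meta.get("author") or "").strip():
        flags.append("no-author")
    if not (meta.get("author_email") or "").strip():
        flags.append("no-author-email")
    # Very loose URL check: scheme, then something containing a dot.
    if not re.match(r"https?://\S+\.\S+", meta.get("home_page") or ""):
        flags.append("no-valid-homepage")
    if not (meta.get("description") or "").strip():
        flags.append("no-description")

    last = meta.get("last_release")
    if last is None or now - last > timedelta(days=3 * 365):
        flags.append("no-recent-release")
    return flags
```

A package raising several flags at once would be a candidate for a closer look; a single flag on its own says very little.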

E3V3A · 2018-11-26T17:36:39Z

Another related issue is that there seems to be some kind of cybersquatting of package names going on there as well: packages with little or meaningless content that occupy useful names.

How do you plan to deal with that?

di · 2018-11-27T21:18:17Z

> Another related issue is that there seems to be some kind of cybersquatting of package names going on there as well: packages with little or meaningless content that occupy useful names.
>
> How do you plan to deal with that?

See PEP 541 and #1506.

brainwane · 2019-06-21T02:30:41Z

Thanks for filing this issue, @E3V3A!

Per discussion today, we'll be addressing this problem during upcoming work on automated detection of malicious uploads. In this issue we'll be nailing down our criteria for "how do we determine what is a bad package?" and plans for removing those packages.

(Note that we're distinguishing between a malicious upload and spam, and between malware and typosquatting, and that there are other issues -- like #194, #4319 and #4004 -- that concentrate on filtering re: packages that have noncompliant metadata or no recent releases.)

brainwane · 2019-09-02T16:24:42Z

Per a discussion with @ewdurbin last week:

The work we'll do on automated detection of malicious uploads will first concentrate on finding malicious packages, and building the tools around that. Only after that will we be able to provide automated tools to help PyPI admins remove them.

di · 2019-12-05T17:02:07Z

From #7061:

What's the problem this feature will solve?
Malicious and insecure packages are a challenge in the open source community. Malicious packages have been removed several times in the last few years. Improved automated auditing techniques would make it easier for security specialists to quickly remove malicious packages. Smart bad actors would be able to use the same test suite, certainly, but it would at minimum allow for the vetting of existing packages. Likewise, this would set up an automated process which could be enhanced over time.

Describe the solution you'd like
Python's exec() function is not secure and may be a good heuristic for finding malicious packages. There may be other additional heuristics that make a package appear more suspicious, and a likely target for manual auditing. Add a badge or other indicator for packages that pass/fail these tests.
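
As a rough illustration of the `exec()` heuristic, a scan could parse each source file with the standard-library `ast` module and report direct calls without running any of the code. The function name `find_dynamic_exec` is made up for this sketch, and including `eval` alongside `exec` is an assumption beyond what the request says.

```python
import ast

# Builtins whose direct use is treated as worth a second look.
# `exec` comes from the request above; `eval` is an added assumption.
SUSPICIOUS_CALLS = {"exec", "eval"}


def find_dynamic_exec(source):
    """Return sorted (line, name) pairs for direct calls to suspicious builtins.

    The source is only parsed, never executed.
    """
    hits = []
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id in SUSPICIOUS_CALLS
        ):
            hits.append((node.lineno, node.func.id))
    return sorted(hits)
```

A check like this is easy to evade (for example by reaching `exec` through `getattr`), and plenty of legitimate packages call `exec()`, so a hit would be a prompt for manual auditing rather than proof of anything.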

mertzjames · 2019-12-05T17:57:29Z

I'm very interested in this effort and would like to help. Given how many packages there are, here are a few suggestions:

Most legit packages will have a few things in common:
- a readme/description
- link to source code
- 2 or more contributors
- other common fields filled out such as classifiers
A tally of the 1000 (or more) most-downloaded packages could be collected and compared against other or new packages:
- Compare the name of the package to see if it's very similar (typo squatting)
- Compare the name of the package to see if it's very different. This is a common issue with malicious websites and there's a tool to calculate this (https://github.com/MarkBaggett/freq)
- Code analysis? This would be a very difficult thing to do I think
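
The name-similarity comparison could look something like the sketch below, which uses the standard-library `difflib` rather than the `freq` tool linked above. The function name, the `popular` list, and the `0.8` cutoff are all assumptions chosen for illustration; a real list would come from download statistics.

```python
import difflib


def possible_typosquats(name, popular, cutoff=0.8):
    """Return popular names that `name` closely resembles without matching.

    `popular` is a hypothetical list of most-downloaded package names.
    """
    normalized = name.lower().replace("_", "-")
    if normalized in popular:
        # An exact match is the popular package itself, not a squat.
        return []
    return difflib.get_close_matches(normalized, popular, n=3, cutoff=cutoff)
```

For example, with `popular = ["requests", "urllib3", "numpy", "django"]`, the name `reqeusts` would be reported as close to `requests`, while `requests` itself would not be reported at all.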

xmunoz · 2019-12-13T19:29:20Z

Hello friends! I will be working on the backend implementation of the system for adding malware checks. You can track the progress of this work by checking out the malware-detection label.

nicowaisman · 2019-12-23T14:40:19Z

Hey everyone.
We are currently working on a proof of concept at GitHub to detect malicious code in package managers.
We are still setting up an environment to run our tests, but our first step is to use a static analysis tool, CodeQL, to model the way certain backdoors work so that we can detect them as they get included into PyPI.

brainwane · 2020-01-27T22:22:03Z

@xmunoz I'm excited about this work! Will we be able to discuss it with you at PyCon and/or help improve it during the sprints?

xmunoz · 2020-02-12T18:16:29Z

Yes, absolutely! I'm actually giving a charla about this system at PyCon, but for interested non-Spanish speakers, I can give the English version during the sprints. Also, I'd really love to get feedback on this contribution documentation, and this sounds like a great way to do that.

#7369

nicowaisman · 2020-02-12T18:34:23Z

@xmunoz Are there any slides of that charla?
Do you guys maintain a database of previous backdoors/malware introduced to PyPI? I have slowly started building my own collection, and I would love to expand it.

xmunoz · 2020-02-12T20:07:00Z

For the first question, I'll follow up over email :)

The second question could potentially be answered by @ewdurbin.

xmunoz · 2020-02-18T23:43:38Z

The malware-detection branch has been merged into master with PR #7377.

farhaan710 · 2022-07-25T12:10:08Z

I need to develop a tool that detects malicious repositories. Can you help me with it, @xmunoz?

E3V3A changed the title ~~Remove scary number of suspicious packages~~ Remove bad or suspicious packages Nov 26, 2018

di added the feature request label Nov 27, 2018

brainwane changed the title ~~Remove bad or suspicious packages~~ Find and remove malicious packages Jun 20, 2019

brainwane added this to the Package signing & detection/verification milestone Jun 20, 2019

brainwane added the needs discussion label Jun 21, 2019

brainwane changed the title ~~Find and remove malicious packages~~ Detect malicious packages, for later removal Sep 2, 2019

brainwane mentioned this issue Sep 2, 2019

Publish a list of malicious packages that have been taken down #4703

Open

di mentioned this issue Dec 5, 2019

Flag packages with questionable security practices #7061

Closed

xmunoz closed this as completed Feb 18, 2020

martintoreilly mentioned this issue Jun 4, 2020

Process for adding python/R/Julia/Ubuntu/etc. packages alan-turing-institute/data-safe-haven#622

Closed

2 tasks
