ENH: Deny/permit configuration for read_pickle to address security concerns #48049

yilun11 · 2022-08-11T22:31:40Z

Feature Type

Adding new functionality to pandas
Changing existing functionality in pandas
Removing existing functionality in pandas

Problem Description

Pandas.read_pickle() has a security concern, and today we recommend people not to load untrusted pickles. See the disputed CVE-2020-13091 for more background. Many commercial scanning software flag pandas due to the CVE and makes it challenging to adopt/use pandas in security conscious environments.

Feature Description

Implemented at https://github.com/yilun11/fanniemae-pandas/tree/CVE-2020-13091

I added a pickle option for off/deny/permit with a default to deny. Under deny, it blocks calls to os.system/posix.system and addresses the CVE concern. I also added restrictions on builtins like eval(). It passes all the pandas tests. I think this is backwards compatible assuming pickles don't make these type of calls in the wild.

Under permit, it will only allow calls to classes in the permit list. This fails the pandas tests. This is the original Python restricted globals example, and may be of limited use (or at least will need to add wildcard support). Under off, it does not do any security checks and reverts to current behavior.

In addition to the default settings, users can configure via a YML file or set_options().

Alternative Solutions

Since read_pickle() relies on pickle.load(), it may be possible to make the changes there. I think sandboxing Python is another way to reduce the potential for bad pickles to cause damage.

Additional Context

This is a long outstanding issue with pickle, and I've tried to read the history behind it. We don't flag Python yet for vulnerabilities, yet we do so for Pandas, hence I have the changes here. My goal is to make a backwards compatible change that addressed the disputed CVE and hope this does so. I welcome any feedback on this matter.

mroeschke · 2022-08-11T22:55:31Z

Definitely understand the security concern (hence why the read_pickle docstring has a warning regarding untrusted pickles) https://pandas.pydata.org/docs/reference/api/pandas.read_pickle.html

But as you mentioned, since read_pickle is essentially a wrapper of pickle.load, IMO it would be more maintainable (and benefit the wider Python ecosystem) if these behaviors were a feature of pickle.load

yilun11 · 2022-08-12T04:58:05Z

Another idea is leaving the default option to off, so there is full backwards compatibility. I assume this is a key concern against adoption.

Many orgs interested in security will be installing packages from an internal repo/mirror. At the repo level, I can probably enable support so it installs under deny mode by default for orgs that wish it. This could apply to other packages as well where a subset wants higher security. If I went upstream to CPython, I wouldn't have this repo setting option. I'd need to take this up with Sonatype (my repo vendor) but I feel like this could work for all parties.

mroeschke · 2022-08-12T05:52:35Z

I think one of the concerns for integration into pandas is maintenance and scope of the issue. read_pickle is just calling pickle.load so the solution would not be targeting the key issue.

Delengowski · 2022-08-13T13:34:21Z

@yilun11 I work on air gapped systems where I deal with similar issues. You need to find a security poc who also has software experience and can freethink rather than just reading a CVE.

Going to nvd and performing a search of "python pickle" reveals this issue to be rampant and not as you pointed out not tied the library but the pickle protocol itself. https://nvd.nist.gov/vuln/search/results?form_type=Basic&results_type=overview&query=Python+pickle&search_type=all&isCpeNameSearch=false

Does your work pay for Matlab and use it? The same thing can happen with .mat files for classes that implement saveobj https://www.mathworks.com/help/matlab/ref/saveobj.html

What I am suggesting is you need to point out the inconsistencies that your security may or may not have. You may gain ground here depending on how you approach it. This has worked best for me. Focus should be on not allowing 3rd party pickles on the system.

I see you are aware of this http://docs.python.org/3.4/library/pickle.html#restricting-globals given your branch. As others suggested this should be done more broadly or else you'd be performing this update for any library that has similar cve.

yilun11 · 2022-08-13T16:51:33Z

I can look for another group to maintain the deny/permit lists to address the maintenance question. This will just leave the logic, which I hope will be agreeable. In terms of scope, I can implement it in cpython pickle.py but the blocker is I have no way of setting sitewide Python configurations at the org level, assuming default to off for compatibility. With packages, this is easier since I can ask private repos to do file injection to configure packages for the org that want the added security. This is the best that I could come up with, absent a PEP proposal to allow package installation configuration by repos.

I will do outreach to various groups to get more feedback. @Delengowski In some ways, this is a fool's errand. I do see audit hooks introduced in Py3.8, so there is progress on these issues.

jreback · 2022-08-13T17:02:57Z

-1 on changes in pandas to expose these types of options

this is clearly a cpython request

mroeschke · 2022-08-13T17:19:12Z

Thanks for the suggestion, but as discussed, pandas isn't really the correct scope to address and fix this issue so closing.

If pickle.load ever exposes a security feature through a keyword argument, it would definitely make sense to also include it in read_pickle.

yilun11 added Enhancement Needs Triage Issue that has not been reviewed by a pandas team member labels Aug 11, 2022

mroeschke added IO Pickle read_pickle, to_pickle Upstream issue Issue related to pandas dependency Needs Discussion Requires discussion from core team before further action and removed Needs Triage Issue that has not been reviewed by a pandas team member labels Aug 11, 2022

mroeschke closed this as completed Aug 13, 2022

MarcoGorelli mentioned this issue Nov 21, 2022

QST: How to eliminate this threat for Pandas? #49810

Closed

2 tasks

darakian mentioned this issue Jul 31, 2023

[GHSA-cmm9-mgm5-9r42] pandas through 1.0.3 can unserialize and execute commands... github/advisory-database#2553

Closed

mroeschke mentioned this issue Jan 26, 2024

CVE 2020 13091 #57094

Closed

5 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ENH: Deny/permit configuration for read_pickle to address security concerns #48049

ENH: Deny/permit configuration for read_pickle to address security concerns #48049

yilun11 commented Aug 11, 2022

mroeschke commented Aug 11, 2022

yilun11 commented Aug 12, 2022

mroeschke commented Aug 12, 2022

Delengowski commented Aug 13, 2022

yilun11 commented Aug 13, 2022

jreback commented Aug 13, 2022

mroeschke commented Aug 13, 2022

ENH: Deny/permit configuration for read_pickle to address security concerns #48049

ENH: Deny/permit configuration for read_pickle to address security concerns #48049

Comments

yilun11 commented Aug 11, 2022

Feature Type

Problem Description

Feature Description

Alternative Solutions

Additional Context

mroeschke commented Aug 11, 2022

yilun11 commented Aug 12, 2022

mroeschke commented Aug 12, 2022

Delengowski commented Aug 13, 2022

yilun11 commented Aug 13, 2022

jreback commented Aug 13, 2022

mroeschke commented Aug 13, 2022