New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH: Deny/permit configuration for read_pickle to address security concerns #48049
Comments
Definitely understand the security concern (hence why the But as you mentioned, since |
Another idea is leaving the default option to off, so there is full backwards compatibility. I assume this is a key concern against adoption. Many orgs interested in security will be installing packages from an internal repo/mirror. At the repo level, I can probably enable support so it installs under deny mode by default for orgs that wish it. This could apply to other packages as well where a subset wants higher security. If I went upstream to CPython, I wouldn't have this repo setting option. I'd need to take this up with Sonatype (my repo vendor) but I feel like this could work for all parties. |
I think one of the concerns for integration into pandas is maintenance and scope of the issue. |
@yilun11 I work on air gapped systems where I deal with similar issues. You need to find a security poc who also has software experience and can freethink rather than just reading a CVE. Going to nvd and performing a search of "python pickle" reveals this issue to be rampant and not as you pointed out not tied the library but the pickle protocol itself. https://nvd.nist.gov/vuln/search/results?form_type=Basic&results_type=overview&query=Python+pickle&search_type=all&isCpeNameSearch=false Does your work pay for Matlab and use it? The same thing can happen with .mat files for classes that implement What I am suggesting is you need to point out the inconsistencies that your security may or may not have. You may gain ground here depending on how you approach it. This has worked best for me. Focus should be on not allowing 3rd party pickles on the system. I see you are aware of this http://docs.python.org/3.4/library/pickle.html#restricting-globals given your branch. As others suggested this should be done more broadly or else you'd be performing this update for any library that has similar cve. |
I can look for another group to maintain the deny/permit lists to address the maintenance question. This will just leave the logic, which I hope will be agreeable. In terms of scope, I can implement it in cpython pickle.py but the blocker is I have no way of setting sitewide Python configurations at the org level, assuming default to off for compatibility. With packages, this is easier since I can ask private repos to do file injection to configure packages for the org that want the added security. This is the best that I could come up with, absent a PEP proposal to allow package installation configuration by repos. I will do outreach to various groups to get more feedback. @Delengowski In some ways, this is a fool's errand. I do see audit hooks introduced in Py3.8, so there is progress on these issues. |
-1 on changes in pandas to expose these types of options this is clearly a cpython request |
Thanks for the suggestion, but as discussed, pandas isn't really the correct scope to address and fix this issue so closing. If |
Feature Type
Adding new functionality to pandas
Changing existing functionality in pandas
Removing existing functionality in pandas
Problem Description
Pandas.read_pickle() has a security concern, and today we recommend people not to load untrusted pickles. See the disputed CVE-2020-13091 for more background. Many commercial scanning software flag pandas due to the CVE and makes it challenging to adopt/use pandas in security conscious environments.
Feature Description
Implemented at https://github.com/yilun11/fanniemae-pandas/tree/CVE-2020-13091
I added a pickle option for off/deny/permit with a default to deny. Under deny, it blocks calls to os.system/posix.system and addresses the CVE concern. I also added restrictions on builtins like eval(). It passes all the pandas tests. I think this is backwards compatible assuming pickles don't make these type of calls in the wild.
Under permit, it will only allow calls to classes in the permit list. This fails the pandas tests. This is the original Python restricted globals example, and may be of limited use (or at least will need to add wildcard support). Under off, it does not do any security checks and reverts to current behavior.
In addition to the default settings, users can configure via a YML file or set_options().
Alternative Solutions
Since read_pickle() relies on pickle.load(), it may be possible to make the changes there. I think sandboxing Python is another way to reduce the potential for bad pickles to cause damage.
Additional Context
This is a long outstanding issue with pickle, and I've tried to read the history behind it. We don't flag Python yet for vulnerabilities, yet we do so for Pandas, hence I have the changes here. My goal is to make a backwards compatible change that addressed the disputed CVE and hope this does so. I welcome any feedback on this matter.
The text was updated successfully, but these errors were encountered: