Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Do not allow pickle.load in read_array by default #1585

Closed
wants to merge 1 commit into from

Conversation

danigm
Copy link

@danigm danigm commented May 20, 2024

This patch adds a new optional argument to the read_array method to enable pickle. By default the pickle load is disabled.

This is based on the actual code in numpy/lib/format.py: numpy/numpy@a2bd3a7

Fix CVE-2024-34997, #1582

This patch adds a new optional argument to the read_array method to
enable pickle. By default the pickle load is disabled.

This is based on the actual code in numpy/lib/format.py:
numpy/numpy@a2bd3a7

Fix CVE-2024-34997, joblib#1582
@tomMoral
Copy link
Contributor

Hello,

Thanks for the PR, however, I don't think this feature makes sense in the context of the joblib library.

Here, the NumpyArrayWrapper is used internally to persist numpy arrays in the context of sharing objects between two processes/distributed experiments/caching.
One of joblib's goal is to allow the user to serialize any Python object to communicate tasks or results, or to cache results. To achieve this, we need to be as open as possible with our serialization. Putting this allow_pickle will reduce the type of object that can be persisted and thus break some user's code which have objects in their numpy arrays.
So we would need to add some extra code to allow them to keep their use case by specifying this option in the high-level API.

This seems overcomplicated while this feature does not make the code safer IMO.
I could pickle the initial class A from #1582 and joblib.load() will run whoami, so why add an extra feature to avoid this in a nested case?
It is well known that pickle should never be shared between different users as is, both for security reasons and because there are no guarantees that a pickle file can be opened on a different Python installation.
This extra option will not make the pickle format safer and I am therefore inclined to close this PR.

We are already making it clear that this feature should not be used for inter-user sharing on this page: https://joblib.readthedocs.io/en/stable/generated/joblib.load.html
We could also make this class private to make it clear it should not be used outside the parallel/caching context.

I am closing this PR for now but feel free to continue the discussion if you think I am missing some points.

@tomMoral tomMoral closed this May 20, 2024
@danigm
Copy link
Author

danigm commented May 21, 2024

@tomMoral Thanks for the detailed explanation. I agree with you, this change just makes the code a bit more complicated, and the CVE looks overrated in this case. The only way to exploit is when there's a bad usage of the library and it's well documented so not a major risk and as you said, maybe nothing to fix.

@tomMoral
Copy link
Contributor

Thanks for the feedback and thanks for the proposed fix :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants