[IMPROVE] Dynamic loading of Pickle data #33

Closed
AlexAltea opened this issue Mar 7, 2023 · 4 comments

@AlexAltea

Related: OpenBB-finance/OpenBB#4422

Not sure whether to flag this as a bug or an improvement (it is probably closer to a security vulnerability). Either way, I think it's fairly dangerous to have terminals download Pickle files on demand from random places on the Internet.

Loading a Pickle file can execute arbitrary code, and Python offers no sandbox mechanism, so this is not so different from loading a DLL. See the warnings at https://docs.python.org/3/library/pickle.html and https://pandas.pydata.org/docs/reference/api/pandas.read_pickle.html
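
To make the risk concrete, here is a minimal, harmless sketch of how unpickling can invoke a callable chosen by whoever produced the file. The `Payload` class is hypothetical and exists only for illustration. It uses `int` as the callable, whereas a hostile file could name something like `os.system` instead.

```python
import pickle


class Payload:
    """Hypothetical class; __reduce__ tells pickle what to call on load."""

    def __reduce__(self):
        # pickle records this callable and its arguments, then invokes
        # them while loading. int is harmless; an attacker would pick
        # a callable with side effects instead.
        return (int, ("42",))


blob = pickle.dumps(Payload())

# Loading calls int("42") and returns its result,
# not a Payload instance.
result = pickle.loads(blob)
print(type(result).__name__, result)
```

The point is that the loader never gets to decide what runs: the bytes in the file do.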


What's the feature or data that should be improved?

Code should move away from dynamically loading Pickle containers, e.g.:

self.data = pd.read_pickle(the_path, compression="xz")

Describe how you would like the feature improved

Same functionality using a safe serialization format.

Possibly describe the ideal way to improve this

E.g. msgpack, BSON or compressed JSON instead of Pickle.
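
As a rough sketch of the compressed JSON option, using only the standard library: the helper names `dump_records` and `load_records` and the sample rows are assumptions for illustration, not part of the project. Decoding JSON can only produce plain data types (dicts, lists, strings, numbers), so no code runs on load.

```python
import json
import lzma


def dump_records(records):
    """Serialize a list of plain dicts to xz-compressed JSON bytes."""
    return lzma.compress(json.dumps(records).encode("utf-8"))


def load_records(blob):
    """Decode xz-compressed JSON back into plain Python data."""
    return json.loads(lzma.decompress(blob).decode("utf-8"))


# Hypothetical sample rows, not taken from the real database.
rows = [
    {"symbol": "AAA", "sector": "Tech"},
    {"symbol": "BBB", "sector": "Energy"},
]
blob = dump_records(rows)
restored = load_records(blob)
```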

Additional information

N/A

@AlexAltea AlexAltea changed the title [IMPROVE] M [IMPROVE] Insecure loading of external Pickle data Mar 7, 2023
@AlexAltea AlexAltea changed the title [IMPROVE] Insecure loading of external Pickle data [IMPROVE] Dynamic loading of Pickle data Mar 7, 2023
@AlexAltea
Author

Changed the title because, technically speaking, it's not "external" data: you are in control of the entire project, which includes both the code and the Pickle files.

But I'm concerned simply because there's no need to risk RCE every time people want to query the data.

@JerBouma
Owner

JerBouma commented Mar 7, 2023

So I don't entirely see the issue here. As you say, I am in full control of the database, but I am also affiliated directly with OpenBB. There are no incentives for me whatsoever to include malicious code.

Furthermore, the database is set up in such a way that under no circumstances would I accept a PR that changes the pickles; that's what I have GitHub Actions for. You would need to add the malicious code directly into the CSV file and have it activate only when a pickle is read and not a CSV. On top of that, I would have to fail to notice it: it couldn't affect my local repository, file sizes couldn't change at all, and all functionality would still have to work exactly the same.

That is a lot of steps to take to achieve such a feat, and therefore I don't really see the risk or a major security issue, especially since there has been no PR to improve the database in three years.

In any case, I'll convert to the next best thing, compressed CSVs, to eliminate the risk altogether.
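
For illustration only, a compressed-CSV round trip could look roughly like the sketch below. It is not the code from the actual fix; `write_csv_xz`, `read_csv_xz` and the sample rows are hypothetical names and data, and only the standard library is used. With pandas, the analogous read would presumably be `pd.read_csv(the_path, compression="xz")`.

```python
import csv
import io
import lzma


def write_csv_xz(rows, fieldnames):
    """Write dict rows as CSV text and return it xz-compressed."""
    buf = io.StringIO(newline="")
    writer = csv.DictWriter(buf, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return lzma.compress(buf.getvalue().encode("utf-8"))


def read_csv_xz(blob):
    """Decompress and parse CSV; every value comes back as a string."""
    text = lzma.decompress(blob).decode("utf-8")
    return list(csv.DictReader(io.StringIO(text, newline="")))


# Hypothetical sample rows, not taken from the real database.
rows = [
    {"symbol": "AAA", "country": "NL"},
    {"symbol": "BBB", "country": "US"},
]
blob = write_csv_xz(rows, ["symbol", "country"])
restored = read_csv_xz(blob)
```

Parsing CSV yields only text fields, so a tampered file can at worst deliver wrong data, not run code.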

@JerBouma
Owner

JerBouma commented Mar 8, 2023

Fixed with new release.

@JerBouma JerBouma closed this as completed Mar 8, 2023
@AlexAltea
Author

AlexAltea commented Mar 8, 2023

Thank you very much for the quick answer and your fix!

> There are no incentives for me whatsoever to include malicious code.

Indeed! My concern was rather: What if a reviewer overlooks a malicious PR? What if your account gets compromised? What if GitHub or an Actions runner gets compromised? I understand the chances of this are very low, but the impact of hundreds of thousands of users dynamically downloading files that can run arbitrary code without a sandbox was nightmare material for me.

Thanks a lot for taking care of this. ❤️
