[IMPROVE] Dynamic loading of Pickle data #33

AlexAltea · 2023-03-07T14:40:24Z

Not sure whether to flag this as a bug or an improvement (it is probably closer to a security vulnerability). Either way, I think it's fairly dangerous to have terminals download Pickle files on demand from random places on the Internet.

Pickle files allow for remote code execution and Python offers no sandbox mechanism. This is not so different from loading a DLL. See the warning at: https://docs.python.org/3/library/pickle.html or https://pandas.pydata.org/docs/reference/api/pandas.read_pickle.html
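
To make the risk concrete, here is a minimal sketch of why unpickling untrusted bytes amounts to running code. The `Payload` class is hypothetical, and a harmless `eval` expression stands in for whatever an attacker would actually run.

```python
import pickle


class Payload:
    """Hypothetical attacker-controlled object (illustration only)."""

    def __reduce__(self):
        # pickle stores this (callable, args) pair and calls it on load.
        # A harmless expression stands in for something like os.system(...).
        return (eval, ("6 * 7",))


blob = pickle.dumps(Payload())

# Loading never rebuilds a Payload; it calls eval("6 * 7") instead.
result = pickle.loads(blob)
print(result)
```

The same thing happens inside any wrapper that unpickles the bytes, which is why the warnings linked above apply to `pd.read_pickle` as well.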

What's the feature or data that should be improved?

Code should move away from dynamically loading Pickle containers, e.g.:

FinanceDatabase/financedatabase/helpers.py

Line 47 in 3d55640

self.data = pd.read_pickle(the_path, compression="xz")

Describe how you would like the feature improved

Same functionality using a safe serialization format.

Possibly describe the ideal way to improve this

E.g. msgpack, BSON or compressed JSON instead of Pickle.
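
As a rough sketch of the compressed JSON option, assuming the data can be represented as a list of plain dictionaries: the helper names, file name, and sample record below are hypothetical and not part of FinanceDatabase.

```python
import json
import lzma
import os
import tempfile


def save_records(path, records):
    """Write a list of plain dicts as xz-compressed JSON."""
    with lzma.open(path, "wt", encoding="utf-8") as handle:
        json.dump(records, handle)


def load_records(path):
    """Read xz-compressed JSON; parsing yields only data, never callables."""
    with lzma.open(path, "rt", encoding="utf-8") as handle:
        return json.load(handle)


sample = [{"symbol": "AAPL", "sector": "Technology"}]  # hypothetical data

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "equities.json.xz")  # hypothetical file name
    save_records(path, sample)
    loaded = load_records(path)
```

A tampered file can at worst produce unexpected values or a parse error here, since the JSON parser only builds dicts, lists, strings, numbers, booleans, and `None`.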

Additional information

N/A

AlexAltea · 2023-03-07T14:44:07Z

Changed the title: technically speaking it's not "external" data, since you control the entire project, including both the code and the Pickle files.

But I'm concerned simply because there's no need to risk RCE every time people want to query the data.

JerBouma · 2023-03-07T22:54:25Z

So I don't entirely see the issue here. As you say, I am in full control of the database, but I am also directly affiliated with OpenBB. There are no incentives for me whatsoever to include malicious code.

Furthermore, the database is set up in such a way that under no circumstances would I accept a PR that changes the pickles; that's what I have GitHub Actions for. You would need to add the malicious code directly into the CSV file for it to work, and have it activate only when a pickle is read and not a CSV. On top of that, I would have to fail to notice it, which means it could not affect my local repository, file sizes could not change at all, and all functionality would still have to work exactly the same.

That is a lot of steps to take to achieve such a feat, so I don't really see the risk or a major security issue, especially since there has been no PR to improve the database in three years.

In any case, I'll convert to the next best thing, compressed CSVs, to eliminate the risk altogether.
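
For illustration only, and not the code from the actual release, a compressed CSV round trip can be sketched with the standard library alone. The function names and sample row are hypothetical. With pandas, `pd.read_csv` accepts the same `compression="xz"` argument that the original `read_pickle` call used.

```python
import csv
import io
import lzma


def dump_csv_xz(rows, fieldnames):
    """Serialize dict rows to xz-compressed CSV bytes."""
    text = io.StringIO()
    writer = csv.DictWriter(text, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return lzma.compress(text.getvalue().encode("utf-8"))


def load_csv_xz(blob):
    """Parse xz-compressed CSV bytes into a list of dicts of strings."""
    text = lzma.decompress(blob).decode("utf-8")
    return list(csv.DictReader(io.StringIO(text)))


rows = [{"symbol": "AAPL", "sector": "Technology"}]  # hypothetical sample data
blob = dump_csv_xz(rows, ["symbol", "sector"])
restored = load_csv_xz(blob)
```

One trade-off worth noting: CSV carries no type information, so every value comes back as a string and any numeric columns need explicit conversion after loading.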

JerBouma · 2023-03-08T13:34:05Z

Fixed with new release.

AlexAltea · 2023-03-08T14:32:21Z

Thank you very much for the quick answer and your fix!

> There are no incentives for me whatsoever to include malicious code.

Indeed! My concern was rather: What if a reviewer overlooks a malicious PR? What if your account gets compromised? What if GitHub or an Actions runner gets compromised? I understand the chances of this are very low, but the impact of 100,000s of users dynamically downloading files that can run arbitrary code without a sandbox was nightmare material for me.

Thanks a lot for taking care of this. ❤️

AlexAltea changed the title ~~[IMPROVE] M~~ [IMPROVE] Insecure loading of external Pickle data Mar 7, 2023

AlexAltea changed the title ~~[IMPROVE] Insecure loading of external Pickle data~~ [IMPROVE] Dynamic loading of Pickle data Mar 7, 2023

JerBouma mentioned this issue Mar 7, 2023

[IMPROVE] Insecure loading of external Pickle data OpenBB-finance/OpenBB#4422

Closed

JerBouma closed this as completed Mar 8, 2023
