Security issues with pickles #2522

KOLANICH · 2020-03-26T22:54:33Z

Pickles are a security nightmare because they allow arbitrary code execution, and because this code is not explicitly visible: extracting and analysing it requires tools that currently don't exist. They are good places to plant hard-to-discover backdoors. But this lib relies heavily on them: it downloads some pickled pretrained stuff and doesn't work without it.
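
To make the risk concrete, here is a minimal sketch of how unpickling runs code chosen by whoever produced the file. The `Payload` class is hypothetical and the payload is deliberately harmless (it only evaluates `6 * 7`); a real attacker could name any importable callable instead.

```python
import pickle


class Payload:
    """Hypothetical class, used only to illustrate the mechanism."""

    def __reduce__(self):
        # pickle stores "call eval('6 * 7')" in the stream.
        # The callable and its arguments are chosen by the file's author.
        return (eval, ("6 * 7",))


blob = pickle.dumps(Payload())

# Loading the bytes calls eval(); no Payload instance is ever rebuilt.
result = pickle.loads(blob)
print(result)
```

Note that the loader does not need the `Payload` class at all: the stream only references `builtins.eval`, so nothing in the consuming code hints at what will be executed.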

We need to solve this. There are several problems here:

Pickles are used. They should be replaced. The replacement can be custom code plus either a feature-specific binary format or a general-purpose binary format such as CBOR.
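
As a rough illustration of the "custom code plus a feature-specific binary format" option, here is a sketch that stores a flat list of float weights using only the standard library. The layout (a `WTS1` magic tag, a 32-bit count, then little-endian doubles) and the function names are assumptions made up for this example, not anything the library defines.

```python
import struct

MAGIC = b"WTS1"  # hypothetical 4-byte format tag


def dump_weights(weights):
    """Serialize a list of floats as: magic, uint32 count, float64 values."""
    header = MAGIC + struct.pack("<I", len(weights))
    return header + struct.pack(f"<{len(weights)}d", *weights)


def load_weights(blob):
    """Parse the format above; the data can only ever become floats."""
    if blob[:4] != MAGIC:
        raise ValueError("not a weights file")
    (count,) = struct.unpack_from("<I", blob, 4)
    if len(blob) != 8 + 8 * count:
        raise ValueError("length does not match the declared count")
    return list(struct.unpack_from(f"<{count}d", blob, 8))


weights = [0.5, -1.25, 3.0]
restored = load_weights(dump_weights(weights))
```

Unlike a pickle, a file in such a format carries only data: the loader decides what to build from it, and a malformed file leads to a `ValueError` rather than to code execution.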
I haven't found the recipes to build the pretrained models. I mean that for each pretrained model there should be
- a Python file that:
  - fetches the needed datasets
  - preprocesses the data and trains the model
  - evaluates its performance
  - is intentionally written to be easily auditable
- and a JSON config file storing
  - hyperparams
  - dataset locations
  - previously achieved performance

So, if using pickles provided by the project or third-party devs is unacceptable because I cannot trust them, I should be able to recreate my own pickles from scratch. Anyway, even if we replace pickles with something else, we still need a way to improve the pretrained models, e.g. by retraining them on better datasets or using better hyperparams (I have a lib for hyperparams tuning, BTW). So, IMHO, for every pretrained model there should be code reproducing its creation.
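
The kind of recipe meant here could be sketched roughly as follows. Everything in it is hypothetical: the config keys, the example URL, the metric name, and the `fetch`/`train`/`evaluate` callables, which stand in for whatever a given model actually needs.

```python
import json

# Hypothetical config; in practice this would live in its own JSON file.
CONFIG_JSON = """
{
  "hyperparams": {"epochs": 5, "learning_rate": 0.1},
  "datasets": ["https://example.org/corpus.txt"],
  "previous_performance": {"accuracy": 0.9}
}
"""

REQUIRED_KEYS = ("hyperparams", "datasets", "previous_performance")


def load_config(text):
    """Parse the JSON config and check that the expected sections exist."""
    config = json.loads(text)
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise ValueError(f"config is missing: {missing}")
    return config


def build_model(config, fetch, train, evaluate):
    """Run the recipe: fetch the data, train, then evaluate the result."""
    data = fetch(config["datasets"])
    model = train(data, config["hyperparams"])
    score = evaluate(model, data)
    return model, score


config = load_config(CONFIG_JSON)
```

Comparing the returned score against `previous_performance` would then show whether a rebuilt model matches the published one.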
