Security issues with pickles #2522

Open
KOLANICH opened this issue Mar 26, 2020 · 0 comments


KOLANICH commented Mar 26, 2020

Pickles are a security nightmare: they allow arbitrary code execution, and because that code is not explicitly visible, extracting and analysing it requires tools that currently don't exist. That makes them a good place to plant hard-to-discover backdoors. Yet this library relies heavily on them: it downloads pickled pretrained data and doesn't work without it.
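
To make the concern concrete, here is a minimal sketch of how a pickle can carry a call that runs on load, together with a rough way to list the import and call opcodes without loading it. It uses only the standard library's `pickle` and `pickletools`; the `Payload` class, the `scan` helper and the `SUSPICIOUS` set are hypothetical names made up for this illustration. This is a heuristic that only surfaces which names a pickle references, not the kind of full analysis described above.

```python
import pickle
import pickletools

# Opcodes that import names or call objects while a pickle is being loaded.
SUSPICIOUS = {
    "GLOBAL", "STACK_GLOBAL", "INST", "OBJ",
    "NEWOBJ", "NEWOBJ_EX", "REDUCE", "BUILD",
}


class Payload:
    """Hypothetical object whose pickle calls print() when it is loaded."""

    def __reduce__(self):
        # pickle.loads() would call print(...) with this argument.
        # A real attacker could return any callable here.
        return (print, ("this ran during unpickling",))


def scan(blob):
    """List (opcode name, argument) pairs for suspicious opcodes.

    pickletools.genops only parses the byte stream; it never executes it.
    """
    return [
        (op.name, arg)
        for op, arg, _pos in pickletools.genops(blob)
        if op.name in SUSPICIOUS
    ]


# Protocol 3 is chosen so the imported name appears in a single GLOBAL opcode.
blob = pickle.dumps(Payload(), protocol=3)
findings = scan(blob)
```

Here `findings` contains a `GLOBAL` entry naming `builtins print` followed by a `REDUCE` entry, which is the import-then-call pattern a backdoor would also use. An empty result does not prove a pickle is safe.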

This needs to be solved. There are two separate problems here:

  1. Pickles are used. They should be replaced. The replacement could be custom code combined with either a feature-specific binary format or a general-purpose binary format, such as CBOR.
  2. I haven't found the recipes used to build the pretrained models. For each pretrained model there should be
    • a Python file that:
      • fetches the needed datasets
      • preprocesses the data and trains the model
      • evaluates its performance
      • is deliberately written to be easy to audit
    • and a JSON config file storing
      • hyperparameters
      • dataset locations
      • previously achieved performance
    So, if using pickles provided by the project or by third-party developers is unacceptable because I cannot trust them, I should be able to recreate my own pickles from scratch. In any case, even if we replace pickles with something else, we still need a way to improve the pretrained models, e.g. by retraining them on better datasets or with better hyperparameters (I have a library for hyperparameter tuning, BTW). So, IMHO, for every pretrained model there should be code that reproduces its creation.
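
A rough sketch of what such a recipe could look like, under heavy assumptions: the config keys, the `synthetic://line` location, the function names and the toy linear model are all hypothetical and are not taken from this library. JSON from the standard library stands in for the data-only format here (CBOR, as suggested in point 1, would need a third-party package). The point is only the shape: a config holding hyperparameters, dataset location and previous performance, plus a short script that fetches, trains, evaluates and stores the result as plain data.

```python
import json

# Hypothetical config; in a real recipe this would be a JSON file in the repo.
CONFIG_JSON = """
{
  "dataset": {"location": "synthetic://line", "n_samples": 20},
  "hyperparams": {"learning_rate": 0.5, "epochs": 500},
  "previous_performance": {"mse": 1e-6}
}
"""


def fetch_dataset(spec):
    """Stand-in for a download step: generates points on y = 2x + 1."""
    n = spec["n_samples"]
    xs = [i / n for i in range(n)]
    ys = [2.0 * x + 1.0 for x in xs]
    return xs, ys


def train(xs, ys, learning_rate, epochs):
    """Fit y = w*x + b by plain gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return {"w": w, "b": b}


def evaluate(model, xs, ys):
    """Mean squared error of the model on the given data."""
    errors = [(model["w"] * x + model["b"] - y) ** 2 for x, y in zip(xs, ys)]
    return sum(errors) / len(errors)


def build(config_text):
    """Run the whole recipe: fetch, train, evaluate."""
    config = json.loads(config_text)
    xs, ys = fetch_dataset(config["dataset"])
    model = train(xs, ys, **config["hyperparams"])
    mse = evaluate(model, xs, ys)
    # Compare against the performance recorded in the config.
    reproduced = mse <= config["previous_performance"]["mse"]
    return model, mse, reproduced


model, mse, reproduced = build(CONFIG_JSON)

# Store the parameters as plain data: loading them cannot execute code.
serialized = json.dumps(model)
restored = json.loads(serialized)
```

Because the stored model is only numbers in a data-only format, anyone who distrusts a published file can rerun the recipe and compare the result with the recorded performance.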