Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Security review of pickle/marshal docs #35339

Closed
tim-one opened this issue Oct 16, 2001 · 31 comments
Closed

Security review of pickle/marshal docs #35339

tim-one opened this issue Oct 16, 2001 · 31 comments
Assignees
Labels
extension-modules C modules in the Modules dir

Comments

@tim-one
Copy link
Member

tim-one commented Oct 16, 2001

BPO 471893
Nosy @tim-one, @warsaw, @akuchling
Files
  • pickle-docs.patch: Patch to pickle and marshal docs.
  • pickle-docs.patch: Updated version of patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/akuchling'
    closed_at = <Date 2003-05-14.16:52:05.000>
    created_at = <Date 2001-10-16.22:42:24.000>
    labels = ['extension-modules']
    title = 'Security review of pickle/marshal docs'
    updated_at = <Date 2003-05-14.16:52:05.000>
    user = 'https://github.com/tim-one'

    bugs.python.org fields:

    activity = <Date 2003-05-14.16:52:05.000>
    actor = 'akuchling'
    assignee = 'akuchling'
    closed = True
    closed_date = None
    closer = None
    components = ['Extension Modules']
    creation = <Date 2001-10-16.22:42:24.000>
    creator = 'tim.peters'
    dependencies = []
    files = ['155', '156']
    hgrepos = []
    issue_num = 471893
    keywords = []
    message_count = 31.0
    messages = ['6959', '6960', '6961', '6962', '6963', '6964', '6965', '6966', '6967', '6968', '6969', '6970', '6971', '6972', '6973', '6974', '6975', '6976', '6977', '6978', '6979', '6980', '6981', '6982', '6983', '6984', '6985', '6986', '6987', '6988', '6989']
    nosy_count = 7.0
    nosy_names = ['tim.peters', 'nobody', 'jhylton', 'barry', 'akuchling', 'dcjim', 'phr']
    pr_nums = []
    priority = 'high'
    resolution = 'accepted'
    stage = None
    status = 'closed'
    superseder = None
    type = None
    url = 'https://bugs.python.org/issue471893'
    versions = []

    @tim-one
    Copy link
    Member Author

    tim-one commented Oct 16, 2001

    Paul Rubin points out that the security implications
    of using marshal and/or pickle aren't clear from the
    docs. Assigning to Jeremy as he's more sensitive to
    such issues than I am; maybe Barry would like to get
    paranoid too <wink>.

    A specific example: the pickle docs say that pickle
    doesn't support code objects, and "at least this
    avoids the possibility of smuggling Trojan horses into
    a program". However,

    1. The marshal docs don't mention this vulnerability
      at all.

    while

    1. The pickle docs don't spell out possible dangers
      due to things pickle does that marshal doesn't (like
      importing modules, and running class constructors).

    @tim-one tim-one closed this as completed Oct 16, 2001
    @tim-one tim-one added the extension-modules C modules in the Modules dir label Oct 16, 2001
    @tim-one tim-one closed this as completed Oct 16, 2001
    @tim-one tim-one added the extension-modules C modules in the Modules dir label Oct 16, 2001
    @phr
    Copy link
    Mannequin

    phr mannequin commented Oct 16, 2001

    Logged In: YES
    user_id=72053

    Certainly anyone unserializing potentially malicious data
    with pickle, marshal, or anything else, should check the
    results before doing anything dangerous with them (like
    executing code). However, unpickle can potentially do
    damage before it even returns, by creating loading modules
    and creating initialized class instances. So pickle.loads
    should never be used on untrusted strings, except possibly
    in a rexec wrapper (as proposed by Tim). Another
    possibility (also by Tim) is to override the load_inst
    method of the Pickler class, though I don't think you can
    do that for cPickle.

    A sample exploit for unpickle can be found at
    <http://www.nightsong.com/phr/python/pickletest.py\>.
    Unpickling the test string runs penguin.__init__ contrary
    to the doc's saying no initialization unless there's a
    __getinitargs__ method in the class def.

    The "exploding penguin" class is artificial, but
    applications are vulnerable if there's an unsafe
    constructor anywhere in any class of the application or in
    the python library (example: the NNTP constructor opens an
    IP connection to an arbitrary address, so a malicious
    imported string can send a message through your firewall
    when you import it).

    @nobody
    Copy link
    Mannequin

    nobody mannequin commented Nov 7, 2001

    Logged In: NO

    Irmen de Jong points out that the standard cookie module
    uses pickling for serial and smart cookies. The 2.1.1
    cookie module docs explicitly say not to use those
    classes because of the security hole--that they're provided
    for backward compatibility only (but with what?! Any
    server that uses those classes on open ports needs to be
    changed right away).

    Irmen's library, http://pyro.sourceforge.net, also uses
    unpickle insecurely (he's aware of this now and is figuring
    out a fix).

    IMO this is really a code bug rather than a documentation
    bug, and should be fixed in the code rather than the docs.
    Documenting the bug rather than fixing it leaves a
    deficiency in the Python library: obvious uses of pickling,
    like Pyro and the cookie module, can't be implemented
    using cPickle and have to resort to a slower Python
    deserializer, or use marshal and have possible compatibility
    problems between versions (and not be able to serialize
    class instances).

    Paul

    @jhylton
    Copy link
    Mannequin

    jhylton mannequin commented Nov 7, 2001

    Logged In: YES
    user_id=31392

    What's the code bug? Your last message has a lot of gloom
    and doom <wink>, but I'm not sure what specific problem
    you'd like to see fixed. Are you saying that something
    needs to be fixed in cPickle and not in pickle?

    @nobody
    Copy link
    Mannequin

    nobody mannequin commented Nov 7, 2001

    Logged In: NO

    IMO it's a code bug that you can't unpickle strings from
    untrusted sources. Pyro and the cookie module are examples
    of programs that got bitten by this bug. Whether it's
    really a bug is a matter of opinion--I had a big email
    exchange with Guido and Tim about it, and they felt it
    was enough to fix the pickle documentation.

    Pickle has the same problem as cPickle, but with pickle
    you can subclass the pickler and override the method that
    unpickles class objects, and work around the (IMO) bug.
    The workaround doesn't help cPickle since cPickle can't
    be subclassed. See bug bpo-467384 for some related discussion.

    Paul

    @jhylton
    Copy link
    Mannequin

    jhylton mannequin commented Nov 8, 2001

    Logged In: YES
    user_id=31392

    I don't think of the issue you describe as a bug in the
    code. You're suggesting a new feature for pickle. As far
    as I can tell, the original design requirements for pickle
    did not include the ability to securely load a pickle from
    an untrusted source.

    It may be a legitimate feature request, but it's too late to
    make it into Python 2.2. I suggest we look at the design
    issues for Python 2.3 and decide if it's a feature we want
    to support. I imagine a PEP may be necessary to lay out the
    issues and the solution. Do you want to have a hand in that
    PEP?

    I still don't understand what it means that Pyro and cookie
    were bit by a bug. It sounds like they were using pickle in
    ways that pickle was not intended to support. A careful
    analysis of how those two applications use pickle would be
    helpful for generating requirements.

    @nobody
    Copy link
    Mannequin

    nobody mannequin commented Nov 9, 2001

    Logged In: NO

    Well, Guido and Tim agree with you that it's not a pickle
    bug. I still feel it is one, because its docs currently
    make you think you can securely load untrusted pickles, and
    because it's a natural, non-obscure thing to want to do
    (see pyro and the cookie module), but whatever. If it's
    not a code bug then I feel it's a significant functionality
    shortcoming in the python library.

    Pyro uses pickle to serialize data for RPC calls over the
    internet. A malicious client could make a hostile pickle
    take over the server. The cookie module lets web
    applications store user session state in browser cookies.
    Its SerialCookie and SmartCookie classes let you put
    arbitrary Python objects into the user session, and
    serializes them when pickle. Again, a malicious client
    can make a hostile pickle, send it in a cookie header to
    the http server, and take over the server when the
    application unpickles the cookie.

    The current documentation for the pickle module makes it
    very clear to me that the doc writer thought it was safe
    to unpickle untrusted cookies. If pickle wasn't designed
    for that, then there was a communication failure between
    the designer and the doc writer.

    Yes, I'm willing to help with a PEP for fixing this
    situation.

    Paul

    @warsaw
    Copy link
    Member

    warsaw commented Nov 10, 2001

    Logged In: YES
    user_id=12800

    I'm going to agree with Paul that this is a problem needing
    fixing, however there are really several issues.

    1. Cookie module makes it too easy to code exploits. Cookie
      exports a class, also called Cookie, which is aliased to
      SmartCookie, so that a naive program will simply pass cookie
      data to Cookie.Cookie() and you're screwed. So, Cookie
      module's defaults should be for more security rather than
      less, and Cookie.Cookie should be aliased to SimpleCookie
      instead.

    2. There is no built-in safe mechanism for de-serializing
      data from untrusted sources. You can't use pickle without
      overloading a "magic" method. You can't use cPickle because
      you can't do the overloading trick. You can't use marshal
      because it isn't bulletproof against recursive
      datastructures. So how /do/ you do it?

    I think it just may be serious enough to deal with in Python
    2.2, and I volunteer to address it (so I'll steal this bug
    report). Without looking at the code, or the level of
    effort necessary, I would make the following suggestions:

    1. Add to the public interface of pickle and cPickle, a flag
      that either disables the unpickling of instances altogether,
      or disables calling any code with unpickled data, e.g.
      constructors.

    2. Fix marshal to be bulletproof against recursive
      datastructures.

    3. Update the docs for both pickle/cPickle and marshal to
      explain how to safely write de-serializers of untrusted strings.

    @nobody
    Copy link
    Mannequin

    nobody mannequin commented Nov 10, 2001

    Logged In: NO

    See bug bpo-467384 for discussion about marshal. Besides the
    recursion issue, marshal's format is explicitly undocumented
    and subject to change--you can't rely on it to interoperate
    between two different Python versions, so it's no good as
    an RPC serializer. The format has kludges (e.g. the
    representation of long ints) that make it undesirable to
    freeze and document it and force future versions to be
    backward compatible.

    Adding a pickle.loads flag to prevent instance unpickling
    isn't perfect but is probably the best alternative on
    your list. Perhaps the flag can have a value that allows
    unpickling the instances by restoring the instance
    attributes rather than calling the initializer. That's
    not always the right way to unpickle an instance (that's
    why the unpickler no longer works that way) but it's good
    enough a lot of the time.

    There's another issue with pickle/cPickle which is that they
    unpickle quoted strings by evaling them. This is scary.
    While I don't see an immediate exploit, I also haven't
    examined the 1000's of lines of code I'd need to examine
    to convince myself that there's NOT an exploit. I think
    the unpickler should be changed to never call eval but just
    parse the string as it needs to.

    Guido seemed to think pickle might have other possible
    exploits. I don't know what he had in mind but before
    declaring it safe for untrusted data I think it needs to
    be gone over with a fine toothed comb.

    Paul

    @jhylton
    Copy link
    Mannequin

    jhylton mannequin commented Nov 10, 2001

    Logged In: YES
    user_id=31392

    I don't think we should be doing anything about marshal.
    Maybe we should name in pyclib or something <0.9 wink>. It
    works fine for .pyc files, but I don't see a reason for it
    to do anymore than is necessary for that purpose.

    I think the notion of an unpickler that only handles
    builtin datatypes is the most attractive option you offer,
    but Paul has a good point about eval for strings. (It
    currently has some hacks that address specific exploits,
    but I doubt they are sufficient.) I also wonder how hard
    it is to handle builtin types and avoid subclasses of
    builtin types.

    If there are any changes to pickle, I think we need to be
    careful about how it is described. If we claim that an
    unpickler is safe for untrusted pickles, we've made a
    fairly strong claim. I still think such a design change
    requires a PEP that includes some requirements and use
    cases and a thorough analysis of potential exploits.

    @nobody
    Copy link
    Mannequin

    nobody mannequin commented Nov 12, 2001

    Logged In: NO

    I like marshal and think it's much cleaner than pickle.
    I'd like to see its format cleaned up and documented
    so it can be a portable transport format. Even the
    recursion issue isn't so compelling--there's only a danger
    on marshalling, not unmarshalling, and the application
    has control over what it marshals. So I think the idea
    of documenting marshal (bug bpo-467384) should be kept alive.
    However, it shouldn't be changed in 2.2.

    If the Cookie class is still really aliased by default to
    SmartCookie, that's a bad bug and should definitely be
    fixed ASAP. The Cookie docs clearly say not to use
    SmartCookie or SerialCookie. In fact I think SmartCookie
    and SerialCookie should be disabled altogether. Any
    applications using them on the open internet have a security
    but and should be fixed immediately.

    Here's a possible temporary approach to fixing SmartCookie
    and SerialCookie: append a cryptographic message
    authentication code (MAC) to each pickled cookie. That
    means the application has some secret string K as a
    configuration parameter (it should be 10 or more random
    8-bit characters, or 20 or more random hex digits, etc).
    Then a smart cookie would be a pickle p, followed by
    SHA(p+K). The Cookie module would validate the MAC before
    calling unpickle, and raise an exception if the MAC wasn't
    valid.

    The security of this scheme rests on K being kept secret
    from attackers, which is generally not an easy thing to
    manage, but it's something to think about.

    Paul

    @dcjim
    Copy link
    Mannequin

    dcjim mannequin commented Nov 12, 2001

    Logged In: YES
    user_id=73023

    It sounds like there are some documentation bugs:

    • The security ramifications are not discussed, nor are the
      remedies.

    • The cPickle module isn't documented very well. I submitted
      some
      documentation a long time ago that never got incorporated
      AFAIK.
      I wish I still had it. :)

    • cPickle has a feature for turning off instance support and
      for
      restricting which classes can be unpickled. You can set
      the find_global
      attribute on a cPickle.Unpickler. The find_global
      attribute can be
      a function or None. If it is None, then no instances can
      be
      unpickled. If it is a function, then it should accept a
      module and name
      and return the corresponding global. It is responsible
      for looking
      up the global and can raise an error to prevent a
      global's use.

      See the ZEO storage server implementation for an example
      of using this hook.

    @tim-one
    Copy link
    Member Author

    tim-one commented Nov 12, 2001

    Logged In: YES
    user_id=31435

    Why are people (Paul, Jeremy) concerned about eval'ing
    strings? cPickle and pickle both check that they're
    properly quoted, and this isn't sh or Perl: Python has
    no "dynamic" gimmicks buried in string literals. All
    eval'ing a string literal can do is produce a binary blob
    via interpeting simple escape sequences. They're like C
    strings this way -- maybe we'll run out of memory, but
    that's it.

    I would agree that Python should be refactored internally
    to supply a clean function for changing string literals
    into binary blobs, but that would be for clarity and
    efficiency, not security.

    @nobody
    Copy link
    Mannequin

    nobody mannequin commented Nov 13, 2001

    Logged In: NO

    It's possible that the eval is safe, but validating that
    takes a line by line code review of all the paths that
    evaling a string can go through. It's also brittle in
    that maybe someone will change the evaluator (say to
    support Perl-like interpolation) in a future Python version
    and not remember to change the unpickler. Something like
    that has already happened with the Cookie module. My
    guess is that it happened with the pickle module--the
    whole point of that quoted string check is that the
    original pickle implementer didn't trust the input.
    The stuff about unpickling class instances was added
    later (maybe by another person) without remembering the
    security issue.

    Using eval this way is like storing a vat of cyanide in
    your child's bedroom. Sure, maybe if you check the seals
    and locks on the vat carefully enough, you can convince
    yourself that your child won't be able to get to the
    cyanide. But wouldn't you feel safer just not having the
    vat there at all? That's basic safety-consciousness.
    Security consciousness works the same way. Try to keep
    dangerous ingredients and attackers as far away from each
    other as possible.

    Paul

    @nobody
    Copy link
    Mannequin

    nobody mannequin commented Nov 13, 2001

    Logged In: NO

    The find_global variable sounds encouraging and if it
    fixes this problem, that's great. I'd still feel better
    if the eval call goes away. Is removing it allowed
    under the 2.2 feature freeze? It doesn't affect anything
    visible, so no tests would have to change, etc.

    Paul

    @warsaw
    Copy link
    Member

    warsaw commented Nov 13, 2001

    Logged In: YES
    user_id=12800

    Assigning to Jeremy so he'll remember to forward me Jim's
    email. Jeremy can ping-pong it to Fred if he wants, and
    then reassign to me after forwarding the email.

    @warsaw
    Copy link
    Member

    warsaw commented Nov 16, 2001

    Logged In: YES
    user_id=12800

    I have rewritten the pickle documentation, and updated the
    marshal, cPickle, and copy_reg documentation. I believe all
    the documentation issues raised here have been fixed.

    Implementation issues will be pushed to Python 2.3, and the
    plan is to write a PEP to plan future work.

    @nobody
    Copy link
    Mannequin

    nobody mannequin commented Nov 16, 2001

    Logged In: NO

    Barry, can you also do something about the Cookie module,
    or at least about its documentation? If Cookie is aliased
    to SmartCookie, I think that needs mention.
    --Paul

    @nobody
    Copy link
    Mannequin

    nobody mannequin commented Nov 17, 2001

    Logged In: NO

    Are the new docs downloadable from somewhere? thanx --Paul

    @nobody
    Copy link
    Mannequin

    nobody mannequin commented Nov 18, 2001

    Logged In: NO

    Barry - the new docs just went up and they're a big
    improvement over the old. However the sentence
    "The issue here is usually not that a class's constructor
    will get called -- it won't by the unpickler -- but that the
    class's destructor (i.e. its __del__() method) might get
    called when the object is garbage collected." isn't
    correct.

    There's a flag in the pickle stream that tells
    the unpickler that the pickler found a __getinitargs__
    method at pickling time. If the flag is set in the pickle,
    then the unpickler calls the class constructor, whether
    there's a __getinitargs__ method in the class or not.

    The sample exploit that I posted earlier on,
    <http://www.nightsong.com/phr/python/pickletest.py\>,
    is an example of an artificially concocted pickle that
    makes a class constructor get called.

    @warsaw
    Copy link
    Member

    warsaw commented Nov 18, 2001

    Logged In: YES
    user_id=12800

    You're right of course. I meant to fix that and forgot.
    Will do so asap. Glad you like the rest of it, though! :)

    @akuchling
    Copy link
    Member

    Logged In: YES
    user_id=11375

    Re-opening. With the changes to pickling in Python 2.3,
    the security material should be replaced with a warning to
    not unpickle untrusted data. Patch attached.

    @warsaw
    Copy link
    Member

    warsaw commented Feb 5, 2003

    Logged In: YES
    user_id=12800

    Karmically (no, not comically) reassigning to Tim who
    started this whole thing, and who happens to be dumping his
    pickles these days.

    @tim-one
    Copy link
    Member Author

    tim-one commented Feb 5, 2003

    Logged In: YES
    user_id=31435

    Andrew, didn't you go overboard in deleting text here? The
    __safe_for_unpickling__ check is gone in 2.3, but most of the
    rest of the text still applies. In particular, Cookie is still a
    problem for people inclined to worry about this, and overriding
    methods like find_global and load_global is still valuable stuff
    (the unpickler still never imports anything that doesn't come
    from an opcode triggering one of these methods).

    @warsaw
    Copy link
    Member

    warsaw commented Feb 5, 2003

    Logged In: YES
    user_id=12800

    I'll just mention that anybody using anything other than
    Cookie.SimpleCookie is insane, and even using the Cookie
    module at all to parse real-world cookies is mildly nuts
    because it's way too strict. I prefer m&m's in my cookies.

    @akuchling
    Copy link
    Member

    Logged In: YES
    user_id=11375

    The Cookie classes that use pickle have DeprecationWarnings in
    2.3, and should disappear in 2.4. If {find,load}_global are
    still going to be supported, though, then they should still be documented (though, given that you shouldn't be unpickling untrusted data, why would you ever bother overriding find_global?)

    @tim-one
    Copy link
    Member Author

    tim-one commented Feb 6, 2003

    Logged In: YES
    user_id=31435

    I think there are several reasons to override these methods.
    The one most relevant to this bug report is that, while Python
    has stopped pretending that pickles are secure by default,
    the choke points are still there, and motivated users can still
    expolit them.

    For example, search pickle.py for __import__. The only
    occurrence of __import__ in the Unpickler class is in method
    find_class(), and that's by design. If a user overrides
    find_class(), the only imports the Unpickler *can* do are
    those the user explicitly performs in their own find_class()
    implementation. So if that's a notion of "security" a user is
    happy with, they can still have it. The docs trying to describe
    this are still valid. It's only the "by magic" safety checks that
    have gone away (and they were buggy anyway, so no loss).

    @akuchling
    Copy link
    Member

    Logged In: YES
    user_id=11375

    OK; here's a reworked version that retitles the "Pickle security"
    section to "Subclassing Unpicklers", leaving the discussion of subclassing alone except for a few typo fixes.

    @akuchling
    Copy link
    Member

    Logged In: YES
    user_id=11375

    Can I check this in?

    @tim-one
    Copy link
    Member Author

    tim-one commented May 13, 2003

    Logged In: YES
    user_id=31435

    Yes, please do! And thank you.

    @akuchling
    Copy link
    Member

    Logged In: YES
    user_id=11375

    Checked in.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 9, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    extension-modules C modules in the Modules dir
    Projects
    None yet
    Development

    No branches or pull requests

    3 participants