Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

set_swap_bodies is unsafe #47027

Closed
Rhamphoryncus mannequin opened this issue May 7, 2008 · 16 comments
Closed

set_swap_bodies is unsafe #47027

Rhamphoryncus mannequin opened this issue May 7, 2008 · 16 comments
Assignees
Labels
type-bug An unexpected behavior, bug, or error

Comments

@Rhamphoryncus
Copy link
Mannequin

Rhamphoryncus mannequin commented May 7, 2008

BPO 2778
Nosy @rhettinger
Files
  • brokensetswap.py
  • python-setswap.diff
  • python-setswap-2.diff
  • python-setswap-3.diff
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/rhettinger'
    closed_at = <Date 2008-05-08.17:21:09.070>
    created_at = <Date 2008-05-07.02:19:30.625>
    labels = ['type-bug']
    title = 'set_swap_bodies is unsafe'
    updated_at = <Date 2008-05-14.22:11:52.663>
    user = 'https://bugs.python.org/Rhamphoryncus'

    bugs.python.org fields:

    activity = <Date 2008-05-14.22:11:52.663>
    actor = 'Rhamphoryncus'
    assignee = 'rhettinger'
    closed = True
    closed_date = <Date 2008-05-08.17:21:09.070>
    closer = 'rhettinger'
    components = []
    creation = <Date 2008-05-07.02:19:30.625>
    creator = 'Rhamphoryncus'
    dependencies = []
    files = ['10204', '10217', '10315', '10321']
    hgrepos = []
    issue_num = 2778
    keywords = ['patch']
    message_count = 16.0
    messages = ['66349', '66358', '66371', '66395', '66401', '66404', '66406', '66426', '66427', '66428', '66429', '66430', '66772', '66824', '66833', '66835']
    nosy_count = 2.0
    nosy_names = ['rhettinger', 'Rhamphoryncus']
    pr_nums = []
    priority = 'low'
    resolution = 'fixed'
    stage = None
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue2778'
    versions = []

    @Rhamphoryncus
    Copy link
    Mannequin Author

    Rhamphoryncus mannequin commented May 7, 2008

    set_swap_bodies() is used to cheaply create a frozenset from a set,
    which is then used for lookups within a set. It does this by creating a
    temporary empty frozenset, swapping its contents with the original set,
    doing the lookup using the frozenset, then swapping the contents back
    and releasing the temporary frozenset.

    Unfortunately, the lookup can invoke arbitrary code, which may examine
    the original set and find it empty (until the lookup completes). The
    lookup may also save a reference to the temporary frozenset, which
    mutates to being empty after the lookup completes.

    The purpose seems to be allowing "someset in someotherset" to
    automatically convert someset to a frozenset. A brief search didn't
    reveal a rationale for this, and in fact PEP-218's history section
    claims the opposite: "Auto-conversion between mutable and immutable sets
    was dropped." Perhaps this is a forgotten vestige of that?

    set_intersection_update uses set_swap_bodies for a different purpose and
    it may be safe. It depends on whether subclasses of set may retain a
    reference to the tmp set somehow.

    @Rhamphoryncus Rhamphoryncus mannequin added the type-bug An unexpected behavior, bug, or error label May 7, 2008
    @rhettinger
    Copy link
    Contributor

    set_swap_bodies() is fine as it involves pure C with no possible
    callbacks. The issue is in its use in the __contains__() check. I'll
    take a look at it and see what if anything needs to be changed.

    Am lowering the priority because you really have to be trying to create
    this behavior.

    @Rhamphoryncus
    Copy link
    Mannequin Author

    Rhamphoryncus mannequin commented May 7, 2008

    The intended use is unsafe. contains, remove, and discard all use it
    for a lookup, which can't be fixed.

    Upon further inspection, intersection_update is fine. Only a temporary
    set (not frozenset!) is given junk, which I don't see as a problem.

    If someone can confirm that contains/remove/discard's usage is an
    artifact I'll submit a patch that removes it.

    @Rhamphoryncus
    Copy link
    Mannequin Author

    Rhamphoryncus mannequin commented May 8, 2008

    I decided not to wait. Here's a patch.

    Several of set's unit tests covered the auto-conversion, so I've
    modified them.

    @rhettinger
    Copy link
    Contributor

    Rejecting this patch which simply disables a feature that some users
    consider to be important.

    I will look at it further. Right now, I'm inclined to simply document
    that the three temporary autoconversions deny meaningful
    contemporaneous access to a set used as a key.

    The set_swap_bodies() function itself is fine -- it behaves just like
    an atomic version of the pure python sequence: t=set(a); a.clear();
    a.update(b); b.clear(); b.update(t); del t.

    The issue is simply that the swap/search/swap dance allows the
    possibility that a determined user could graft onto the search step and
    access but not modify the temporary swapped-in frozenset. It doesn't
    crash; it simply produces an undefined result. I'm not losing sleep
    over this scenario.

    I'm am entertaining an alternative where contains/discard/remove would
    duplicate instead of swap the set bodies; however, that approach may do
    more harm than good.

    @Rhamphoryncus
    Copy link
    Mannequin Author

    Rhamphoryncus mannequin commented May 8, 2008

    PEP-218 explicitly dropped auto-conversion as a feature. Why should
    this be an exception?

    @rhettinger
    Copy link
    Contributor

    If needed, I'll update the PEP to be more clear. The sets.py module
    had two protocols: __as_immutable__() and __as_temporarily_immutable__
    (). The first protocol was the one that was dropped -- it triggered
    with something like "s1.add(s2)" and it automatically created a
    frozenset copy of s2. The second protocol is used in
    contains/remove/discard. Alex Martelli successfully lobbied for its
    retention.

    @rhettinger
    Copy link
    Contributor

    Added documentation in r62873. Leaving the code as-is.

    @Rhamphoryncus
    Copy link
    Mannequin Author

    Rhamphoryncus mannequin commented May 8, 2008

    I've been unable to find any discussion on this feature. If anything, I
    think when PEP-218 was discussed and accepted (and PEP-351 rejected),
    the assumption was it didn't exist. Adding it now should be regarded as
    a new feature and discussed on python-dev.

    @Rhamphoryncus
    Copy link
    Mannequin Author

    Rhamphoryncus mannequin commented May 8, 2008

    Nevermind that the current implementation *is* broken, even if you
    consider fixing it to be a low priority. Closing the report with a doc
    tweak isn't right.

    @rhettinger
    Copy link
    Contributor

    Sorry, you don't like the search with autopromotion feature. It has
    been around since sets were first coded in C. It is a natural
    extension/consequence of the idea that frozenset('abc')==set('abc').

    @Rhamphoryncus
    Copy link
    Mannequin Author

    Rhamphoryncus mannequin commented May 8, 2008

    So why doesn't set() in {} work? Why was PEP-351 rejected when it would
    do this properly?

    @Rhamphoryncus
    Copy link
    Mannequin Author

    Rhamphoryncus mannequin commented May 13, 2008

    Here's another approach to avoiding set_swap_bodies. The existing
    semantics are retained. Rather than creating a temporary frozenset and
    swapping the contents, I check for a set and call the internal hash
    function directly (bypassing PyObject_Hash).

    I even retain the current semantics of PySet_Discard and PySet_Contains,
    which do NOT do the implicit conversion (and have unit tests to verify
    that!)

    I do have some concern that calling PySet_Check on every call may be too
    slow. It may be better to only call it on failure (which is
    more-or-less what the old code did.)

    set_swap_bodies has only one remaining caller, and their use case could
    probably be significantly simplified.

    @Rhamphoryncus
    Copy link
    Mannequin Author

    Rhamphoryncus mannequin commented May 14, 2008

    Revised again. sets are only hashed after PyObject_Hash raises a TypeError.

    This also fixes a regression in test_subclass_with_custom_hash. Oddly,
    it doesn't show up in trunk, but does when my previous patch is applied
    to py3k.

    @rhettinger
    Copy link
    Contributor

    By replacing temporary immutability with temporary hashability, does
    this approach create the possibility that someone could mutate the key-
    set during a search? Is it possible to get the __eq__ check out-of-
    sync with the __hash__ value?

    Also, after a quick look at the patch, I'm not too keen on any
    modifications to set_swap_bodies() nor with changing the signature of
    set_contains_key().

    @Rhamphoryncus
    Copy link
    Mannequin Author

    Rhamphoryncus mannequin commented May 14, 2008

    There is no temporary hashability. The hash value is calculated, but
    never stored in the set's hash field, so it will never become out of
    sync. Modification while __hash__ or __eq__ is running is possible, but
    for __eq__ that applies to any mutable type.

    set_contains_key only has two callers, one for each value of the
    treat_set_key_as_frozen argument, so I could inline it if you'd prefer that?

    set_swap_bodies has only one remaining caller, which uses a normal set,
    not a frozenset. Using set_swap_bodies on a frozenset would be visible
    except in a few special circumstances (ie it only contains builtin
    types), so a sanity check against that seems appropriate. The old code
    reset ->hash to -1 in case one of the arguments was a frozenset -
    impossible now, so I sanity check that it's always -1.

    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    type-bug An unexpected behavior, bug, or error
    Projects
    None yet
    Development

    No branches or pull requests

    1 participant