Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Explicitly specify MyClass.__subclasses__() returns classes in definition order #78986

Closed
pekkaklarck mannequin opened this issue Sep 26, 2018 · 21 comments
Closed

Explicitly specify MyClass.__subclasses__() returns classes in definition order #78986

pekkaklarck mannequin opened this issue Sep 26, 2018 · 21 comments
Assignees

Comments

@pekkaklarck
Copy link
Mannequin

pekkaklarck mannequin commented Sep 26, 2018

BPO 34805
Nosy @gvanrossum, @rhettinger, @MojoVampire, @pablogsal, @miss-islington, @tirkarthi, @BNMetrics
PRs
  • bpo-34805: Guarantee that __subclasses__() is in definition order. #23844
  • [3.9] bpo-34805: Guarantee that __subclasses__() is in definition order. (GH-23844) #23850
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/rhettinger'
    closed_at = <Date 2020-12-19.01:17:58.448>
    created_at = <Date 2018-09-26.06:18:03.007>
    labels = []
    title = 'Explicitly specify `MyClass.__subclasses__()` returns classes in definition order'
    updated_at = <Date 2020-12-19.01:17:58.447>
    user = 'https://bugs.python.org/pekkaklarck'

    bugs.python.org fields:

    activity = <Date 2020-12-19.01:17:58.447>
    actor = 'rhettinger'
    assignee = 'rhettinger'
    closed = True
    closed_date = <Date 2020-12-19.01:17:58.448>
    closer = 'rhettinger'
    components = []
    creation = <Date 2018-09-26.06:18:03.007>
    creator = 'pekka.klarck'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 34805
    keywords = ['patch']
    message_count = 21.0
    messages = ['326420', '327800', '327815', '329389', '329395', '329400', '329402', '329437', '329438', '329439', '329441', '329445', '329446', '329448', '352241', '352242', '352243', '352247', '352254', '383341', '383350']
    nosy_count = 8.0
    nosy_names = ['gvanrossum', 'rhettinger', 'pekka.klarck', 'josh.r', 'pablogsal', 'miss-islington', 'xtreak', 'BNMetrics']
    pr_nums = ['23844', '23850']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = None
    url = 'https://bugs.python.org/issue34805'
    versions = []

    @pekkaklarck
    Copy link
    Mannequin Author

    pekkaklarck mannequin commented Sep 26, 2018

    I had a use case where MyClass.__subclasses__() returning classes in the definition order would have made the code simpler. They seemed to be returned in that order, but because the docs didn't say anything about it [1] I did some searching to find out can I trust the order to stay the same also in the future.

    Based on my searches it seems that prior to Python 3.5 the information was stored in a list and that preserved the order. In Python 3.5 the information was stored into a dict for performance reasons as part of bpo-17936 [2]. Back then dicts weren't ordered, so the order is undefined in Python 3.5, but in Python 3.6+ the order is again preserved. In practice Python 3.5 is the only current version where the order of __subclasses__() is not defined.

    My proposal is to make the current, and old, behavior of returning classes from __subclassses__() in definition order explicit. If nobody has anything against that, I'd be willing to provide a pull request updating docs and adding unit tests making sure the order is not messed up in the future. Let me also know if even this kind of simple changes would need to go through python-ideas or python-dev. The PEP process feels overkill when there are no code changes required.

    PS: I know both Antoine and Guido commented bpo-17936 [3] that they don't know any use cases for the order. I can explain my use case if someone is interested.

    [1] https://docs.python.org/3/library/stdtypes.html?highlight=__subclasses__#class.\_\_subclasses__
    [2] https://bugs.python.org/issue17936
    [3] https://bugs.python.org/issue17936#msg189959

    @pablogsal
    Copy link
    Member

    Hi and thank you for the proposal. Could you detail a bit more about your use-case? Thanks

    @pekkaklarck
    Copy link
    Mannequin Author

    pekkaklarck mannequin commented Oct 16, 2018

    My use case was implementing conversion of strings to different objects based on type information got from function arguments. Initially I had a converter class with methods for each conversion (e.g. _convert_bool, _convert_int) but this class got so big that I decided to split each converter into a separate class (e.g. BooleanConverter, IntegerConverter) with a common base class. The base class also has a converter_for classmethod that is a factory that returns a concrete converter for a certain type.

    For the factory to be able to return a correct converter, it obviously needs to know what converters exist and what types they handle. My first idea was to simply go through cls.__subclasses__() and return the first one that handles the specified type, but because order matters for us this doesn't work. The reason order matters is that we handle both Boolean and integer types, and bool being a subclass of int means we need to handle the former first.

    Because I couldn't use cls.__subclasses__(), I needed to implement a custom way for the converters to register themselves. That wasn't particularly complicated but some extra work anyway.

    The code I wrote happens to be open source and visible at [1] if you are interested to look it more closely. The @TypeConverter.register stuff could have been avoided is cls.__subclasses__() were ordered. It could also be removed once we drop Python 3.5 support if order is guaranteed to be preserved going forward.

    [1] https://github.com/robotframework/robotframework/blob/master/src/robot/running/arguments/typeconverters.py

    @BNMetrics
    Copy link
    Mannequin

    BNMetrics mannequin commented Nov 6, 2018

    Hi Pekka Klärck, I would like to work on this issue and make a PR if you haven't done it yet.

    Thanks!:)

    @pekkaklarck
    Copy link
    Mannequin Author

    pekkaklarck mannequin commented Nov 6, 2018

    Haven't created a PR yet. Go ahead and great one if you have time Luna! We'd need a decision about this too, but if the decision is no, then it would nevertheless be a good idea to mention in the docs that the order is not guaranteed.

    @pablogsal
    Copy link
    Member

    I would kindly suggest waiting until there is some consensus on this before doing any Pull Request.

    @gvanrossum
    Copy link
    Member

    Since this is how it works since 3.6, I think it's reasonable to make this part of the spec. For dictionary order we waited a release between implementing it that way and making it part of the spec; effectively we've already waited two releases (3.6 and 3.7) for this. So I say let's do it. This isn't something that's PEP-worthy anyway.

    @MojoVampire
    Copy link
    Mannequin

    MojoVampire mannequin commented Nov 7, 2018

    First off, the OP's original case seems like a use case for functools.singledispatch. Not really related to the problem, just thought I'd mention it.

    Secondly, are we sure we want to make such a guarantee? That restricts the underlying storage to ordered types (list/dict; possibly tuple at the cost of making modifications slightly more expensive), or an unordered type with additional ordering layered on it (like old-school OrderedDict).

    That does tie our hands in the future. For example, it seems like it would be a perfectly reasonable approach for the internal collection of subclasses to be implemented as a weakref.WeakSet (some future version of it implemented in C, rather than the current Python layer version) so as to reduce code duplication and improve handling when a subclass disappears. Right now, tp_subclasses is a dict keyed by the raw memory address of the subclass (understandable, but eww), with a value of a weakref to the subclass itself. There is tons of custom code involved in handling this (e.g. the dict only self-cleans because the dealloc for classes explicitly removes the subclass from the parent classes, but every use of the dict still has to assume weakrefs have gone dead anyway, because of reentrancy issues; these are solved problems in WeakSet which hides all the complexity from the user). Being able to use WeakSets would mean a huge amount of special purpose code in typeobject.c could go away, but guaranteeing ordering would make that more difficult (it would require providing an ordering guarantee for WeakSet, which, being built on set, would likely require ordering guarantees for sets in general, or changing WeakSet to be built on dicts).

    There is also (at least) one edge case that would need to be fixed (based on a brief skim of the code). type_set_bases (which handles assignment to __bases__ AFAICT, admittedly a niche use case) simplified its own implementation by making the process of changing __bases__ be to remove itself as a subclass of all of its original bases, then add itself as a subclass of the new bases. This is done even if there are overlaps in the bases, and even if the new bases are the same.

    Minimal repro:

        >>> class A: pass
        >>> class B(A): pass
        >>> class C(A): pass
        >>> A.__subclasses__()  # Appear in definition order
        [__main__.B, __main__.C]
    
        >>> B.__bases__ = B.__bases__    # Should be no-op...
        >>> A.__subclasses__()           # But oops, order changed
        [__main__.C, __main__.B]

    I'm not going to claim this is common or useful (I've done something like this exactly once, interactively, while making an OrderedCounter from OrderedDict and Counter back before dicts were ordered; I got the inheritance order wrong and reversed it after the fact), but making the guarantee would be more than just stating it; we'd either have to complicate the code to back it up, or qualify the guarantee with some weird, possibly CPython-specific details.

    @gvanrossum
    Copy link
    Member

    Thanks for the long post! Clearly there is more here than the eye can easily see.

    Nevertheless, I feel that, *in this case*, it's not likely that such a re-implementation will ever happen, so I think it is okay to constrain the future so we can guarantee (the ordering aspect of) the current behavior. The current behavior also *feels* natural, regardless of the validity of the OP's use case.

    The edge case of assignment to __bases__ is a good one to call out (in the docs and in the test) but I don't think the current behavior there is sufficiently dicey to change it or to exclude it from the guarantee.

    @MojoVampire
    Copy link
    Mannequin

    MojoVampire mannequin commented Nov 7, 2018

    I'm also a little skeptical of the OP's proposed use case for other reasons. In any circumstance other than "all classes are defined in the same module", you can't really make useful guarantees about subclass definition order, because:

    1. If the converters are defined in multiple modules in a single package, the module with IntConverter could be imported first explicitly, and now BoolConverter will come second.

    2. If all the provided converters occur in a single monolithic module, and some other package tries to make a converter for their own int subclass, well, IntConverter is already first in the list of subclasses, so the other package's converter will never be called (unless it's for the direct subclass of int, rather than a grandchild of int, but that's an implementation detail of the OP's project).

    Essentially, to write such a class hierarchy properly, you'd need to rejigger the ordering each time a class was registered such that any converter for a parent class was pushed until after the converter for all of its descendant classes (and if there is multiple inheritance involved, you're doomed).

    Even ignoring all that, their use case doesn't require explicit registration if they don't want it to. By making a simple metaclass for the root class, the metaclass's __new__ can perform registration on the descendant class's behalf, either with the definition time ordering of the current design, or with a more complicated rejiggering I described that would be necessary to ensure parent classes are considered after child classes (assuming no multiple inheritance).

    @gvanrossum
    Copy link
    Member

    Josh, I have to ask -- do you have plans to rewrite the __subclasses__
    implementation? Because at this point I don't really feel like arguing over
    the OP's use case. It looks like you have a strong objection over the
    requested feature that's quite independent from the validity of any use
    cases, and I'm curious why. The argument "but then we couldn't change the
    implementation in some future version" feels pretty weak.

    @MojoVampire
    Copy link
    Mannequin

    MojoVampire mannequin commented Nov 7, 2018

    I wrote the response to the OP's use case before I saw your response; it wasn't really intended as an additional critique of the proposed change or a counterargument to your post, just a note that the required behavior could be obtained on all versions of Python via metaclasses, including on 3.5.

    I have no specific plans to rewrite the typeobject.c, nor make a C implemented WeakSet. I'm just leery of adding language guarantees that limit future development when they:

    1. Provide very little benefit (I doubt one package in 10,000 even uses __subclasses__, let alone relies on its ordering)

    2. The benefit is achievable without herculean efforts with existing tools (metaclasses can provide the desired behavior with minimal effort at the trivial cost of an additional side-band dict on the root class)

    If the guarantee never limits a proposed change, then our best case scenario is we provided a guarantee that benefits almost no one (guaranteed upside minimal). But if it limits a proposed change, we might lose out on a significant improvement in performance, code maintainability, what have you (much larger potential downside). I'm just not seeing enough of a benefit to justify the potential cost.

    @MojoVampire
    Copy link
    Mannequin

    MojoVampire mannequin commented Nov 7, 2018

    Keep in mind, had this guarantee been in place in 3.4, the improvement in 3.5 couldn't have been made, and bpo-17936 might have been closed and never addressed, even once dicts were ordered, simply because we never rechecked it. It makes the whole "potential downside" more obvious, because we would have paid that price not so long ago. Knowing that 3.5 improved by breaking this guarantee was part of what made me cautious here.

    @gvanrossum
    Copy link
    Member

    OK, I don't have time to keep arguing, and I doubt any other core devs
    care. We should probably just close it as Won't Fix, since it's not
    important enough to keep deliberating.

    Luna, I recommend that you pick another project.

    @rhettinger
    Copy link
    Contributor

    If there are no objections, I would like to revive this. All we need to do is add a one-line guarantee to the docs and a test to back it up.

    Except for the aberration on Py3.5, add_subclass() tracks new subclasses in the order created. This behavior is intuitive and potentially useful. Unless there is a compelling reason to switch the underlying container to a set object, no other reasonable implementation choice would upset the current behavior.

    I think the OP's request was reasonable. For us, guaranteeing the current behavior is not a difficult thing to do. If we don't, the alternative for the user isn't reasonable. They would need to write a new metaclass that does almost exactly what we already do, except that they can guarantee the use of an ordered collection. This seems silly when we already use an ordered collection but haven't made it a promise.

    @rhettinger rhettinger reopened this Sep 13, 2019
    @rhettinger rhettinger self-assigned this Sep 13, 2019
    @gvanrossum
    Copy link
    Member

    Thank you Raymond.

    Luna, are you still interested?

    @BNMetrics
    Copy link
    Mannequin

    BNMetrics mannequin commented Sep 13, 2019

    Yes I am! :)
    Should I start looking into this?

    @gvanrossum
    Copy link
    Member

    It's just a small doc change right? I'd just makle a PR and see if Raymond
    accepts it.

    @pekkaklarck
    Copy link
    Mannequin Author

    pekkaklarck mannequin commented Sep 13, 2019

    First of all, thanks Raymond for the revival. Secondly, I agree with Josh that there are better ways to handle my original use case (e.g. functools.singledispatch) but __subclasses__() preserving the definition order could nevertheless be useful in other cases.

    The main reason I proposed this issue was to get the behavior documented one way or the other. I'd prefer the current behavior if it doesn't cause any/much extra work nor or later. It's hard for me to see why subclasses weren't stored in a dict in the future, but problem assigning to __bases__ pointed out by Josh might be worth a though. Not sure is it OK to just consider this kind of side effects OK in such a niche case, should the related code be changed, or is this big enough problem to prevent preserving subclass order in the first place.

    Anyway, if the decision is to preserve the order, I'd say a test making sure the order is actually preserved would be needed in addition to the doc change. If the decision is to keep the order undefined, then the docs should preferably be updated anyway to make the situation clear.

    @rhettinger
    Copy link
    Contributor

    New changeset 51f4688 by Raymond Hettinger in branch 'master':
    bpo-34805: Guarantee that __subclasses__() is in definition order. (GH-23844)
    51f4688

    @rhettinger
    Copy link
    Contributor

    New changeset 7826658 by Miss Islington (bot) in branch '3.9':
    bpo-34805: Guarantee that __subclasses__() is in definition order. (GH-23844) (GH-23850)
    7826658

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    None yet
    Projects
    None yet
    Development

    No branches or pull requests

    4 participants