
Design of HWAddress equality and hashing #3

Closed
duynhaaa opened this issue Jan 19, 2022 · 21 comments

Comments

@duynhaaa
Contributor

duynhaaa commented Jan 19, 2022

This issue is a continuation of the discussion that happened within pull request #2. Here's @mentalisttraceur's dump of thoughts:

Quick question: do you have any thoughts on different classes, or classes with different names, hashing differently vs the same?

I was thinking that the right default is for subclass instances to compare equal to their base class instances by default (they don't right now, despite the __eq__ docstring saying that they do, due to a bug that I haven't gotten around to committing a fix for) and this means that subclasses should probably hash the same way as their parent class.

Couple examples of this:

  1. The MAC and EUI48 classes. I think the most sensible behavior is that they're effectively aliases, so much so that maybe I should have just done MAC = EUI48 instead of class MAC(EUI48): .... However, maybe someone sees a good use-case for them to count as two different members of a set?

  2. MAC and the MACAllowsTrailingDelimiters example - this seems like a totally clear-cut case of a situation where the difference of type or even just type name shouldn't cause objects to compare or hash differently. Of course, it was kinda bad on my part to suggest changing formats by subclassing. If I provide and encourage a better way, then this becomes irrelevant.

But the current idea of class-aware equality comparison creates a weird hashing and set membership situation.

Sets and dictionaries and so on consider things the same if they are equal, but they expect and rely on equal things to hash the same, right?

So for example, if we have two subclasses of MAC, then they both compare equal to MAC instances with the same address (once the bug I described is fixed) but they compare unequal to each other. Even if we get hashing to be the same, this means we can get different sets purely based on the order of adds to those sets! That's messed up.

For another example, if two HWAddress types compare equal, then you have to know how they hash, and if they hash differently then I think it becomes nondeterministic whether they get counted as one member or two in the set (deterministic only if we know the underlying implementation details and all operations done, like how many buckets are in the underlying hash table at any given moment and how the hash value is mapped to each bucket).

My current thoughts are:

  1. If addresses just reject anything that's not exactly the class that they are, it breaks the use of subclasses for various stuff, but is easy to fix by overriding equality operations as appropriate, which could "cast" self to the class that we want to be equivalent to and forward to that equality method (see the sketch after this list). This is consistent with a hash by type and address value. The only reasons right now for classes to do subclassing like this are the MAC/EUI-48 situation (honestly I should probably just alias them) and the formatting situation (I should probably expose the relevant logic so people could do it without subclassing - how often is there a good reason for this to actually come up anyway? never?)

  2. If classes compare equal by just size and address value, then this is consistent with hashing by size and value. It might also break situations people have with identically-sized addresses that they want to be distinct (when though? is that a real case? I can imagine cases, but real ones? Like what if you have token ring and Ethernet MAC addresses - you probably want logic to distinguish them at the type level, but for equality or hashing purposes you might not? If you have two different hardware address types that have the same bit pattern, are you more benefitted from a "if equal: bad" case being easy to implement, or from keeping them both from colliding in a set or as dictionary keys?)
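
A minimal sketch of the "cast to the intended class" equality override mentioned in option 1, using a toy model rather than this library's actual classes:

class Address:
    """Toy stand-in for a strict, option-1-style address class."""

    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        # Option 1: reject anything that's not exactly this class.
        if type(other) is not type(self):
            return NotImplemented
        return self.value == other.value

    def __hash__(self):
        # Consistent with the equality above: hash by type and value.
        return hash((type(self), self.value))


class EnrichedAddress(Address):
    """Toy subclass that opts back into equality with Address."""

    def __eq__(self, other):
        # "Cast" self to the class we want to be equivalent to,
        # and forward to that class's equality method.
        return Address(self.value) == other

    def __hash__(self):
        # Keep hashing consistent with the equality above.
        return hash(Address(self.value))


assert EnrichedAddress(1) == Address(1)
assert hash(EnrichedAddress(1)) == hash(Address(1))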

Of course I should also keep my eyes on the common cases! Those should guide a lot of these choices. The uncommon cases should just be reasonably doable.

The common case has no subclassing beyond what this library does.

The commonest appropriate use case for subclassing a HWAddress subclass would presumably be to add unrelated enriching behavior, and probably should not affect equality or hashing.

Treating MAC and EUI48 as equivalent in all ways seems like it is the right behavior more often than not. (In particular, a reasonable situation is that a user writes code like if isinstance(something, macaddress.MAC), and this should not break if something was initialized by someone who used macaddress.EUI48.) So I think I've decided that MAC and EUI48 should be aliases of each other. This is a minor breaking change that would probably affect no one. The one downside of this is that error messages use the class name, so you could get error messages that refer to the other class name. Maybe this means that MAC just shouldn't exist as a name exported by this library, but that's a much bigger breaking change.

Alright that concludes this chunk of the thought dump.

@duynhaaa
Contributor Author

That's quite a train of thoughts you have there. I'll try my best to respond to all of them, so if something is missing or needs clarification, please let me know.

@duynhaaa reopened this on Jan 19, 2022
@duynhaaa
Contributor Author

Please ignore me accidentally closing this issue

I was thinking that the right default is for subclass instances to compare equal to their base class instances by default (they don't right now, despite the __eq__ docstring saying that they do, due to a bug that I haven't gotten around to committing a fix for) and this means that subclasses should probably hash the same way as their parent class.

I believe that the comparison should only be possible for concrete base classes and their subclasses, but not for abstract base classes if there's one. For example, comparing a MAC instance with an HWAddress should not be possible, because "hardware address" is an abstract concept while a MAC address is very concrete. On the other hand, comparing a MAC with any other derived MAC should be sensible, because at the end of the day, they are still MAC addresses. Well, unless the derived MAC is no longer a MAC address - when the subclass discards the original purpose and only reuses the representation and behaviors - but that's a story of bad design and programming.

To conclude this, comparisons should only be implemented starting from the concrete class rather than the abstract base class to prevent leaking logic to other concrete classes. This is where I draw the line.

Hashing is a different story though. As far as I'm aware, the main purpose of hashing is to differentiate objects when used in data structures such as set, frozenset, and dict. Therefore, I believe that instances of different classes should produce different hashes, whether they share the same base class or not. For example, when a MAC instance and a DerivedMAC instance with the same value are added to a set, I expect to see two objects in the set, rather than just one. After all, they are semantically different objects.
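
For illustration only - using the hypothetical DerivedMAC from above, and assuming a version of the library where these classes are hashable (which is what this issue is about) - the expectation described here is:

import macaddress

class DerivedMAC(macaddress.MAC):
    """Hypothetical subclass for illustration."""

addresses = {
    macaddress.MAC('01-23-45-67-89-ab'),
    DerivedMAC('01-23-45-67-89-ab'),
}
# Under the "different class, different hash" view described above,
# this set would be expected to end up with two members.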

@mentalisttraceur
Owner

mentalisttraceur commented Jan 20, 2022

Oh I see, sorry, I didn't make this clear enough: in Python, it is not allowed for two objects to both

  1. compare equal, and
  2. hash differently.

From the Python documentation for __hash__:

The only required property is that objects which compare equal have the same hash value

Basically __eq__ and __hash__ must be consistent: if two things compare equal, they must hash to the same value (if they don't compare equal, they are allowed to hash to either a different value or the same value, but if too many unequal objects hash to the same value it is bad for the performance of dictionaries, sets, and so on).

Otherwise you can get a bug that is effectively random and rare, where those two objects usually count as different objects, but sometimes count as the same object, in things like dictionaries or sets.

(This makes sense if you consider that the normal way to implement things like dictionaries and sets is with a hash table. Hash tables internally have some number of "buckets", which are lists in which elements are stored, and the number of buckets changes dynamically depending on stuff like how many objects are stored in the table. The hash of an object controls which bucket an object gets placed in, and it is normal for many hash values to map to the same bucket. A simple version would be to do hash_value % number_of_buckets to decide which bucket the object goes into. But then since buckets can hold more than one object, equality checks are used to decide if an object is the same as any other object in the bucket. So if two objects hash differently but compare equal, they may end up counting as the same object or not, depending on what their hash values are and how many buckets there are and how the implementation uses the hash value to decide what bucket to put the object into.)
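
A minimal illustration of the contract, with a toy class rather than anything from this library:

class Size:
    def __init__(self, bits):
        self.bits = bits

    def __eq__(self, other):
        if not isinstance(other, Size):
            return NotImplemented
        return self.bits == other.bits

    def __hash__(self):
        # Hash only what equality looks at, so that objects which
        # compare equal always hash the same.  (Python 3 even sets
        # __hash__ to None if a class defines __eq__ without
        # __hash__, to keep this contract from being broken by
        # accident.)
        return hash(self.bits)


assert Size(48) == Size(48)
assert hash(Size(48)) == hash(Size(48))
assert len({Size(48), Size(48)}) == 1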

@mentalisttraceur
Owner

mentalisttraceur commented Jan 20, 2022

You may notice that the world could actually do away with the "hash must be consistent with equality" requirement. A hash table implementation could simply do hash(thing) == hash(bucket[i]) and thing == bucket[i] instead of just doing thing == bucket[i], once it has found the right bucket for a thing and is checking whether it's already in it or not.

This would of course require recalculating or caching the hash value for every item in the bucket. That's probably one reason why it's not done. It adds overhead to the implementation, just to enable having more than one notion of "is this thing the same as that thing" in the language.

(Although it seems possible for the caching to be worth doing anyway for a speedup in cases where the hash table has a lot of rebalancing (changing the number of buckets), if the extra space overhead doesn't cause the whole thing to get big enough to lose more speed thanks to cache misses.)

Of course, there is some design benefit to minimizing the number of different but similar "is this thing the same as that thing" concepts in a language. The fewer distinct concepts and details developers have to understand and remember just to use something, the less room there is for errors in code due to edge cases that don't match human expectations.

Plus, we don't need set membership and dict keys to be based on something other than equality, because it's very easy to wrap equal objects to add a dimension of difference: for example, (type(my_object), my_object) if we want a set member or dict key to be distinct from another when the two are of different classes.
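
For example, with plain built-in values:

# 0 and 0.0 compare equal and hash the same, so they collide:
assert len({0, 0.0}) == 1

# Wrapping each value with its type adds a dimension of difference,
# so they stay distinct as set members (or dict keys):
assert len({(type(0), 0), (type(0.0), 0.0)}) == 2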

@duynhaaa
Contributor Author

Thank you for enlightening me more on how hashed collections work. Do you have a reference link that I can indulge myself in?

Now that I know that two objects cannot be equal and produce different hashes at the same time, I'm leaning more toward the "different class, different hash" side. As in your original thought, there will be a problem with EUI48 and MAC since they basically are aliases. But because you decided on the "subclass route", I, as an end-user, will always expect EUI48 and MAC objects to exist separately in hashed collections. I guess that's a trade-off though: either you force users to learn that EUI48 and MAC are the same and educate themselves on the latest IEEE specification in the "alias route", or you spoil them with convenient debugging messages in the "subclass route". That's your call. Even if this introduces some breaking changes, I think it's worth it, because technology changes and breaks more frequently than ever.

Other than that, I don't see any further problem, assuming that you still use both the integer address value and the format structure of the address as hashing components.

Also, I don't think there is a need to create additional subclasses for what the library already provides. Honestly, there shouldn't be. And if there is, that's insane!

@mentalisttraceur
Owner

Do you have a reference link that I can indulge myself in?

Well, the implementation of set in the Python source might be a good example, if you're fine with lower-level C code and want to work your way through the exact optimizations.

@duynhaaa
Contributor Author

Do you have a reference link that I can indulge myself in?

Well, the implementation of set in the Python source might be a good example, if you're fine with lower-level C code and want to work your way through the exact optimizations.

I'm totally not fine at all with C, so it will probably take a while for me to digest and dip into the low-level stuff, but thanks for the tip.

Anyway, I'm not well versed in networking, so it's your call on how you want your library to be. But with the exploding number of networking devices due to IoT, I would prefer following the IEEE specification, and, maybe, guiding users to a more updated standard?

@mentalisttraceur changed the title from "Make HWAddress hashable" to "Design of HWAddress equality and hashing" on Jan 20, 2022
@mentalisttraceur
Owner

Verdict (when+if I have time+motivation, I will post more of my reasoning for these choices):

  1. In version 1.* releases, equality behavior will stay the same. The docstring's claim about subclasses being equal, which was what I originally intended but which is inconsistent with the behavior I actually had in every release so far, will be treated as mistaken.

  2. I will release version 1.2.0 with hashing. Because equality behavior will not change within 1.*, instances of different classes, even if one is a subclass of the other, will be distinct as set members and dictionary keys, regardless of hashing choice.

  3. In version 2.0.0, equality will be changed so that any HWAddress instance, including all subclasses, can be equal to any other, if the address size and address value are equal. Of the provided classes, this only affects EUI48 and MAC behavior, because all of the others have a distinct size.

  4. In version 2.0.0, I will hash on size and address integer, to match the changes in equality logic.

  5. In version 1.2.0, I will hash on the class and address integer, because ideally we maximize how often instances that won't compare equal hash differently (both choices are sketched after this list).

  6. The exact hash choice in both versions will be an implementation detail - not a guaranteed part of the API. So this can be changed later if there is good reason.
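
For reference, a rough sketch of the two hashing choices in items 4 and 5 - hypothetical helper names, and int(address) stands in for however the library gets at the underlying address integer (per item 6, the real choice is an implementation detail):

# 1.2.0-style: hash on the class and the address integer, so that
# instances of different classes (which never compare equal in 1.*)
# also hash differently as often as possible.
def _hash_v1_style(address):
    return hash((type(address), int(address)))

# 2.0.0-style: hash on the size and the address integer, to match
# the planned equality based on size and value.
def _hash_v2_style(address):
    return hash((type(address).size, int(address)))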

I'm closing this issue because I like open/closed status to mean "does this need resolution?" and my mind is made up now to my satisfaction, but further comments or arguments are always welcome, even if an issue is closed.

@mentalisttraceur
Owner

mentalisttraceur commented Jan 27, 2022

Re: the EUI-48 vs "MAC" issue specifically:

I created #9 for further discussion of that topic in its own right.

As far as it concerns this issue, I saw the cases like this:

  1. If we decide on no separate MAC subclass of EUI48, then these are totally independent decisions.
  2. If we decide that a MAC subclass instance should be equal to an EUI48 class instance of the same address (same as set member and mapping key), then that is evidence in favor of the subclasses-equal way.
  3. If we see a good reason for them to be unequal for the same address (or to be distinct as set members or mapping keys), then this is evidence in favor of the subclasses-not-equal way.

Options 1 and 2 are equivalent for practical purposes with regard to having this library provide a MAC vs EUI-48 distinction out-of-the-box. (Either way if you want that distinction, you have to take a conscious subclassing/composition step to create an address class which does not compare equal and does not hash equal. In one of them you would be forced to explicitly cause inequality.)

To a large extent I share @gytqby's suggestion/preference for guiding users towards the "EUI-48" name and the EUI48 class over the "MAC" name and MAC class. This suggests option 1.

If not option 1, then option 2 seems by far more useful than option 3 to me. More on that later.

@mentalisttraceur
Owner

Re:

I don't think there is a need to create additional subclasses for what the library already provides. Honestly, there shouldn't be. And if there is, that's insane!

Yeah, I think in the typical case, no one needs any other addresses.

However, there is the occasional good use-case.

For example, I looked around in the projects listed by GitHub's "used by" feature on this repo, and found a real-world usage of subclassing to support custom formats.

(In that situation, the subclass is just being used for validation, but we could easily imagine a similar use-case where the constructed address class is used throughout the program to represent the address, and equality and so on is actually relied-upon.)

@mentalisttraceur
Owner

mentalisttraceur commented Feb 4, 2022

@gytqby I made a simplified Python sketch of how a mapping type like dict could be implemented using a hash table:

class Map:
    def __init__(self):
        # At its core, a hash table is just a list of
        # "buckets" - each bucket is itself a small
        # list that holds a few of the items in the
        # hash table.
        self._bucket_list = []

    def __setitem__(self, key, value):
        number_of_buckets = len(self._bucket_list)

        # If there are no buckets yet, we need to add
        # one so that the new item has a place to go.
        if number_of_buckets < 1:
            self._bucket_list.append([])
            number_of_buckets = 1

        # So here's the key point about hash tables:
        # they try to evenly distribute items into
        # the buckets. This is where the speedup of
        # hash tables over simple lists comes from:
        # if we break up the list of all the items
        # into many little lists, and have some way
        # to pick the right little list for an item,
        # we never need to work with one big list.
        # And a hash function is one way to pick.
        bucket_index = hash(key) % number_of_buckets
        bucket = self._bucket_list[bucket_index]

        # This is why:
        #
        # 1. The hash function is supposed to produce
        #    evenly distributed numbers. If it did
        #    not, there would be a bias in which
        #    bucket things go into. For example, if
        #    the hash is always zero, then all keys
        #    will end up going in the first bucket.
        #
        # 2. The hash function needs to be immutable
        #    and deterministic. If the hash for the
        #    same key object changes, we could get
        #    into situations where a key is already
        #    in one bucket, but its hash function
        #    currently maps it to another bucket.
        #    So gets, sets, and deletes would have
        #    to search every bucket anyway, losing
        #    all the speedups relative to a list.
        #    This is also why mutable objects are
        #    either not hashable, or do not hash on
        #    their mutable parts.

        # (By the way, notice how Python's `hash` and
        # `__hash__` make it so that our hash table
        # code doesn't need to know how to hash each
        # possible type of key. Instead, each class
        # brings its own appropriate hashing.)

        # We store both the key and the value in the
        # bucket. The next step will explain why.
        item = (key, value)

        # In case other items have already been added to
        # the hash table, we need to check if the key is
        # already in the bucket, so that we can update
        # that item instead of adding a duplicate item:
        index = _find_key_in_bucket(bucket, key)
        if index == -1:
            # `key` not in bucket:
            bucket.append(item)
        else:
            bucket[index] = item

        # If we keep adding items, the buckets get
        # bigger, and we get closer to performance
        # of a simple list, because loops like the
        # one in `_find_key_in_bucket` have to
        # look at each item in the bucket.

        # The solution is to "rebalance" the hash table
        # whenever it gets too big: create more buckets
        # and evenly redistribute the items across the
        # new number of buckets - more on that later.
        self._rebalance_if_needed()

    def __getitem__(self, key):
        number_of_buckets = len(self._bucket_list)
        if number_of_buckets < 1:
            raise KeyError('key not in map')

        bucket_index = hash(key) % number_of_buckets
        bucket = self._bucket_list[bucket_index]

        index = _find_key_in_bucket(bucket, key)
        if index == -1:
            raise KeyError('key not in map')
        item = bucket[index]
        key, value = item
        return value

    def __delitem__(self, key):
        number_of_buckets = len(self._bucket_list)
        if number_of_buckets < 1:
            raise KeyError('key not in map')

        bucket_index = hash(key) % number_of_buckets
        bucket = self._bucket_list[bucket_index]

        index = _find_key_in_bucket(bucket, key)
        if index == -1:
            raise KeyError('key not in map')
        del bucket[index]

        # If we keep removing items, the buckets get
        # empty, and we end up wasting space, both
        # for the bucket list reference and for any
        # memory that the bucket has allocated which
        # it is still holding onto.

        # The solution is to "rebalance" the hash table
        # whenever it gets too small: reduce the number
        # of buckets and evenly redistribute the items.
        self._rebalance_if_needed()

    def items(self):
        # To get at all items we can just go into
        # each bucket and grab everything in it,
        # we don't need to bother with hashes.
        for bucket in self._bucket_list:
            for item in bucket:
                yield item
                # So for example, if `self._bucket_list`
                # is `[[foo, bar], [qux]]`, then `items`
                # yields `foo`, `bar`, `qux`.

    def keys(self):
        for key, _ in self.items():
            yield key

    def values(self):
        for _, value in self.items():
            yield value

    def __len__(self):
        return sum(map(len, self._bucket_list))

    def _rebalance_if_needed(self):
        # For this teaching example we'll target one
        # bucket for each item, and rebalance on any
        # deviation from that. But since rebalancing
        # is costly, in a mature real-world hash
        # table we would use some logic to minimize
        # the amount of rebalancings and to do them
        # when it brings the most benefit.
        desired_bucket_amount = len(self)
        current_bucket_amount = len(self._bucket_list)
        if desired_bucket_amount == current_bucket_amount:
            return
        new_bucket_list = [[] for _ in range(desired_bucket_amount)]
        for item in self.items():
            key, _ = item
            bucket_index = hash(key) % desired_bucket_amount
            bucket = new_bucket_list[bucket_index]
            bucket.append(item)
        self._bucket_list = new_bucket_list

    def __repr__(self):
        return '<Map ' + repr(self._bucket_list) + '>'


def _find_key_in_bucket(bucket, key):
    for index, preexisting_item in enumerate(bucket):
        preexisting_key, preexisting_value = preexisting_item
        if preexisting_key == key:
            return index
    return -1

You may find this more helpful as an introductory reference than the CPython source.
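
For example, a quick demo of the Map sketch above (illustrative only):

m = Map()
m['eui48'] = 48
m['eui64'] = 64
m['eui48'] = 48  # updating an existing key does not add a duplicate

assert m['eui64'] == 64
assert len(m) == 2
assert sorted(m.keys()) == ['eui48', 'eui64']

del m['eui48']
assert len(m) == 1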

@mentalisttraceur
Owner

@gytqby I came across a nice real-world example of the same equality and hashing problem: https://bidict.readthedocs.io/en/main/learning-from-bidict.html#python-surprises

@mentalisttraceur
Owner

mentalisttraceur commented Oct 21, 2022

I finally wrote up rationale for "The docstring's claim about subclasses being equal, which was what I originally intended but which is inconsistent with the behavior I actually had in every release so far, will be treated as mistaken" in #24 .

Now here's a short rationale for "equality will be changed so that any HWAddress instance, including all subclasses, can be equal to any other, if the address size and address value are equal":

I think per something like the subclass substitution principle, a subclass instance should compare equal unless deliberately made to not compare equal. It's not exactly violating Liskov substitutability to have subclasses compare not-equal by default (after all, even if you change the answer you're still fulfilling the interface of subclasses being comparable with == and !=), but it can definitely be very unintuitive.

This point does depend on subclassing being supported:

  1. Is subclassing worth doing, in ways that don't change equality behavior, often enough? I think yes - partially covered above, and partially in #18 (Alternatives for custom formats instead of subclassing?) and maybe #4 (Class vs ABC vs metaclass vs decorator vs __init_subclass__).
  2. Per Hyrum's Law, or more specifically defensive coding to mitigate the bad results of Hyrum's Law, if subclasses aren't supported then they ought to be somehow prevented, or at least made more difficult and inconvenient than the supported alternatives. The ramification is that not supporting something needs to be worth the preventative measures, and if it isn't, then it's better to accept that it's going to happen and support it... and in Python, the costs of impeding subclassing are higher than seems worth it (partially covered in #4, I think).

@mentalisttraceur
Owner

I'm going to consider the design choices explained enough now, unless someone has further questions.

@mentalisttraceur
Owner

Actually, two more rationale pieces from my old draft notes:

  1. Consider 0 == 0.0 == False (see the snippet after this list). Those all collide as set members and dictionary keys, even though they are objects of different types. This isn't always the right design, but it's a good example of how equality across classes can be useful. It is also a good example of how Python does things, so when thinking with Python design expectations and conventions, this feels more consistent and natural.

  2. The language gives us these two distinct things - type queries like isinstance and comparison queries like ==. When equality considers type too specifically, it answers the same questions that the type system is already designed to answer, instead of answering some other orthogonal questions that might be useful to developers. This seems like inefficient use of the available affordances of the language.
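
A concrete illustration of point 1, using only built-ins:

assert 0 == 0.0 == False
assert hash(0) == hash(0.0) == hash(False)

# So they collide as set members and as dictionary keys:
assert {0, 0.0, False} == {0}
assert len({0: 'int', 0.0: 'float', False: 'bool'}) == 1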

@mentalisttraceur
Owner

I'm finding myself really drawn to a middle ground:

  • class Foo(EUI48): ... should equal (and thus hash the same) and freely cast to and from EUI48, but
  • class Qux(HWAddress): size = 48 absolutely should not be assumed to be the same as EUI48.

Unfortunately, there's no easy way in Python to implement this - not without either

  1. worse ergonomics for users who need to make custom address classes (adding an explicit class attribute, having to overload __init__, etc),
  2. traversing the inheritance hierarchy (all methods for inspecting it that I know of don't work on minimal Pythons like MicroPython, where I imagine a library like this being pretty useful),
  3. adding a metaclass or __init_subclass__ (again doesn't work on minimal Pythons like MicroPython - so __init_subclass__ is only viable for progressive enhancement like detecting nice-to-catch-in-testing usage errors - not for stuff that's necessary to make intended functionality work at all), or
  4. replacing inheritance from HWAddress with a decorator.

If I could have a working version of "same class, or one is a subclass of the other" compatibility, I would do that (possibly while retaining the same size as an additional sanity-check constraint).

But here's a somewhat not-awful compromise that gets us most of the way there:

  1. keep equality (and hashing, why not?) behavior the same as it is in v1 (ordering could use an enhancement, so that classes of the same size are still grouped by class name rather than being arbitrarily mixable, but that's a separate matter),
  2. adjust copy construction to allow same class, subclass, or superclass arguments, so long as they have the same size (this way two unrelated subclasses of addresses can't accidentally convert into each other, but intentional conversion can still happen by first casting through a known-acceptable and instantiatable superclass - so if you have a MAC subclass, you know any MAC address is acceptable, and you trust third-party subclasses to be substitutable in all relevant ways, you can do my_mac_instance = MyMAC(MAC(their_mac_instance)); see the sketch below).

(I am also starting to take for granted that subclassing-for-formatting will not be the main way to do custom formats in the future, and also that subclasses with different formats are perhaps best thought of as not interchangeable with other arbitrary subclasses.)

I need to sleep on it, but this feels like the correct way to go now.
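
A sketch of the intended usage under item 2 above, with hypothetical MyMAC and TheirMAC subclasses - this is the proposed behavior, not necessarily what any released version does:

import macaddress

class MyMAC(macaddress.MAC):
    """Hypothetical subclass used throughout my own code."""

class TheirMAC(macaddress.MAC):
    """Hypothetical subclass defined by some third-party code."""

their_mac_instance = TheirMAC('01-23-45-67-89-ab')

# Two unrelated sibling subclasses would not accidentally convert
# into each other, but intentional conversion can still happen by
# casting through the known-acceptable superclass:
my_mac_instance = MyMAC(macaddress.MAC(their_mac_instance))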

@mentalisttraceur
Owner

mentalisttraceur commented Oct 28, 2022

Oh, I forgot to reply to this by @duynhaaa

I believe that the comparison should only be possible for concrete base classes and their subclasses, but not for abstract base classes if there's one.

Yeah I've always 100% agreed that this is the ideal, and conceptually I totally see HWAddress as an abstract base, which was part of the reason for #4 (maybe I said as much there) - it's just a big challenge to do that in Python without downsides that don't sit well with me.

(Although... the original ergonomics bar was "hopefully we can think of something that Just Works automatically because that's what's normally desirable", but now since I'm leaning towards keeping our initial equality behavior by default, and have convinced myself that it's actually the right behavior most of the time, the bar is lower.)

@mentalisttraceur
Owner

mentalisttraceur commented Oct 31, 2022

Slightly annoying nit: if equality is based on type, but comparison/ordering is just based on size, then I think it's not strictly speaking a "total order", because there can be addresses such that none of foo < bar, foo > bar, and foo == bar are true. I still gotta check if that matters anywhere, primarily whether it breaks functools.total_ordering in some edge case.

@mentalisttraceur
Owner

The flip side is that I could tighten ordering by adding type(self) vs type(other) to the comparison, but then the edge case of subclasses with the same name would sort weirdly.

@mentalisttraceur
Owner

mentalisttraceur commented Nov 2, 2022

Actually it would sort fine enough, I don't know what I was thinking (though I do know that I was very mentally tired at the time I wrote that, enough that simulating the cases was getting seriously hard): if I just add the id of the type as the last value of the ordering tuple, then this will restore consistency of ordering with equality.

The only sorting quirks would be when there are addresses of exactly the same length and numerical/binary value, all different classes, and one of them has a different name than the others (which should come up approximately never (and if anyone ever does have a need for better sorting of equal-but-for-type address classes, we can also add the class name to the sorting tuple after the aligned address integer and size and before class ID, and that would eliminate most problems - this is also an extremely self-serviceable problem for the approximately zero users who will ever have this edge case)).
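
A sketch of that ordering key - a hypothetical helper, not the library's actual code, where int(address) stands in for the aligned address integer:

def _ordering_key(address):
    # Order by size and address integer first (in whatever order the
    # library already uses), then use id(type(address)) as a final
    # tie-breaker so that ordering stays consistent with type-aware
    # equality.  The optional class-name component mentioned above
    # would slot in just before the class id.
    return (address.size, int(address), id(type(address)))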

@mentalisttraceur
Owner

I do still think that the ideal is for hardware address classes to compare and hash equal when they are subclasses of the one common concrete base class.

In the case of HWAddress, the easiest "anchor" for abstract-vs-concrete is size - as if HWAddress had an @abc.abstractproperty-decorated size, except I am disinclined to use the abc stuff, at least in part due to metaclass downsides (I can't remember if I had other reasons).

So if I went down the road of making this real/enforced in the code, I would do size = None in HWAddress and then down each branch of the inheritance tree, the most rootward class to define size = {{something that quacks like an integer}} instead of None becomes a concrete base class.

Of course I imagine the size becoming "locked" within each concrete base class - a subclass of EUI48 would be allowed to leave size unset, or to set it to some size that still passes == EUI48.size, but it would not be allowed to set it to anything else (except perhaps None, to create yet another abstract subclass, for common behavior in further concrete subclasses, but those would still be limited to having their size == EUI48.size).

But in order to do that I'd need to implement some mutating stuff in __init_subclass__, so I'm going to save that for later - another one of those "wait and see if there's a good use-case or enough demand for it".
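
A hedged sketch of what that __init_subclass__ enforcement could look like - toy code, not the library's, and only illustrating the "size is locked below the first concrete class" rule described above:

class HWAddress:
    size = None  # None marks an abstract class with no fixed size.

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # Find the closest ancestor that has already fixed a size.
        for base in cls.__mro__[1:]:
            locked_size = getattr(base, 'size', None)
            if locked_size is not None:
                break
        else:
            return  # No concrete ancestor yet, so any size is fine.
        # Below a concrete class, a subclass may inherit the size,
        # re-declare the same size, or set it back to None to be
        # abstract again - but it may not change to another size.
        if cls.size is not None and cls.size != locked_size:
            raise TypeError(
                cls.__name__ + ' cannot change the size locked by '
                + base.__name__
            )


class EUI48(HWAddress):
    size = 48

class PrettyEUI48(EUI48):
    pass         # fine: inherits size 48

class AbstractEUI48(EUI48):
    size = None  # fine: abstract again, but still locked to 48 below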
