Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use OrderedSet as default set_class #1896

Merged
merged 8 commits into from
Jul 20, 2023
Merged

Use OrderedSet as default set_class #1896

merged 8 commits into from
Jul 20, 2023

Conversation

lafrech
Copy link
Member

@lafrech lafrech commented Oct 27, 2021

Use OrderedSet as default set_class.

Closes #1744.

We could do that already but I'd rather wait until we drop Python 3.6 so that the change is easier to document: all users will be affected.

No need to change ordered at this stage. In marshmallow 4 we can do the same with dict_class and let users override if they really want an OrderedDict.

This change generates mypy issues. I'd be happy to get help on this.

@lafrech
Copy link
Member Author

lafrech commented Mar 6, 2022

I can't seem to find a way to silence mypy, here. Help welcome.

@kkirsche
Copy link
Contributor

Mypy error for those curious:

src/marshmallow/schema.py:919: error: Incompatible types in assignment (expression has type "OrderedSet", variable has type "Optional[Union[Sequence[str], Set[str]]]")
src/marshmallow/schema.py:924: error: Incompatible types in assignment (expression has type "OrderedSet", variable has type "Set[Union[Any, str]]")
src/marshmallow/schema.py:972: error: Incompatible types in assignment (expression has type "AbstractSet[Any]", variable has type "OrderedSet")

The first error is because self.only is configured to be StrSequenceOrSet | set_class | None so when you assign an instance variable to it, it's no longer valid as that type, as StrSequenceOrSet is defined as StrSequenceOrSet = typing.Union[typing.Sequence[str], typing.Set[str]] which doesn't match the MutableSet being subclassed (set vs. mutable set being the key error here).

Updating StrSequenceOrSet to the following fixes the first issue:

StrSequenceOrSet = typing.Union[
    typing.Sequence[str], typing.Set[str], typing.MutableSet[str]
]

The second one is similar in nature, self.exclude is inferred as Set[Any] because of:

        self.exclude = set(self.opts.exclude) | set(exclude)

To address this, you need to allow mutable sets because that's what the OrderedSet is subclassing.

        self.exclude: typing.Set[typing.Any] | typing.MutableSet[typing.Any] = set(
            self.opts.exclude
        ) | set(exclude)

Finally, to address the last issue you need to tell Python that the field_names is ok being less specific (an AbstractSet rather than either a Mutable or immutable set:

            field_names: typing.AbstractSet[typing.Any] = self.set_class(self.only)

With that, you'll pass all tests. Though, the OrderedSet is still only partially defined as it's missing the generic parameters and whatnot

@kkirsche
Copy link
Contributor

I should note, per https://docs.python.org/3/library/typing.html typing.AbstractSet is preferred over typing.Set when possible:

To annotate arguments it is preferred to use an abstract collection type such as AbstractSet.

This is because AbstractSet is the base collection object and doesn't limit you to a specific type of set. If you need more specific behaviors though, that's when reaching for MutableSet or Set can be helpful. This aligns with how it's a best practice not to accept things like List[str] or Dict[str, str] as parameters, instead preferring the protocols like Collection[str] and Mapping[str, str] which allow for various types to satisfy their constraints

@kkirsche
Copy link
Contributor

kkirsche commented Aug 23, 2022

https://docs.python.org/3/library/collections.abc.html#collections-abstract-base-classes can help understand the hierarchy of base collection types, https://docs.python.org/3/library/typing.html#typing.AbstractSet recommends collections.abc.Set as of 3.9, to connect that back to it

EDIT: connected how AbstractSet relates here

@kkirsche
Copy link
Contributor

This is a rough re-write of OrderedSet with type information. I didn't make a PR as I'm not confident enough in how map and end are being used to ensure that the types are correct. The use of lists instead of tuples makes typing a little more difficult as well, because we can't type locations in a list while we can with a tuple.

# OrderedSet
# Copyright (c) 2009 Raymond Hettinger
#
# Permission is hereby granted, free of charge, to any person
# obtaining a copy of this software and associated documentation files
# (the "Software"), to deal in the Software without restriction,
# including without limitation the rights to use, copy, modify, merge,
# publish, distribute, sublicense, and/or sell copies of the Software,
# and to permit persons to whom the Software is furnished to do so,
# subject to the following conditions:
#
#     The above copyright notice and this permission notice shall be
#     included in all copies or substantial portions of the Software.
#
#     THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
#     EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
#     OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
#     NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
#     HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
#     WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
#     FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
#     OTHER DEALINGS IN THE SOFTWARE.
from collections.abc import MutableSet
from typing import Optional, Generator, Iterable, TypeVar

_Key = TypeVar("_Key")


class OrderedSet(MutableSet[_Key]):
    def __init__(self, iterable: Optional[Iterable[_Key]] = None) -> None:
        self.end = end = []  # type: ignore
        end += [None, end, end]  # sentinel node for doubly linked list
        self.map = {}  # key --> [key, prev, next] # type: ignore
        if iterable is not None:
            self |= set(iterable)

    def __len__(self) -> int:
        return len(self.map)

    def __contains__(self, key: object) -> bool:
        return key in self.map

    def add(self, key: _Key) -> None:
        if key not in self.map:
            end = self.end
            curr = end[1]
            curr[2] = end[1] = self.map[key] = [key, curr, end]

    def discard(self, key: _Key) -> None:
        if key in self.map:
            key, prev, next = self.map.pop(key)
            prev[2] = next
            next[1] = prev

    def __iter__(self) -> Generator[_Key, None, None]:
        end = self.end
        curr = end[2]
        while curr is not end:
            yield curr[0]
            curr = curr[2]

    def __reversed__(self) -> Generator[_Key, None, None]:
        end = self.end
        curr = end[1]
        while curr is not end:
            yield curr[0]
            curr = curr[1]

    def pop(self, last: bool = True) -> _Key:
        if not self:
            raise KeyError("set is empty")
        key: _Key = self.end[1][0] if last else self.end[2][0]
        self.discard(key)
        return key

    def __repr__(self) -> str:
        if not self:
            return f"{self.__class__.__name__}()"
        return f"{self.__class__.__name__}({list(self)!r})"

    def __eq__(self, other: object) -> bool:
        if isinstance(other, OrderedSet):
            return len(self) == len(other) and list(self) == list(other)
        if isinstance(other, Iterable):
            return set(self) == set(other)
        return False


if __name__ == "__main__":
    s = OrderedSet("abracadaba")
    t = OrderedSet("simsalabim")
    print(s | t)
    print(s & t)
    print(s - t)

Hope this helps :)

@lafrech
Copy link
Member Author

lafrech commented Aug 23, 2022

Thank you so much for looking into this!

Regarding the OrderedSet type hints, feel free to send a PR even as draft, if only for history. Also, you might want to have a look at ordered-set lib, see how they did id: https://github.com/rspeer/ordered-set/blob/master/ordered_set/__init__.py (and perhaps this SO question: https://stackoverflow.com/questions/52569356/provide-orderedsetint-like-types-using-stub-file-without-modifying-ordered-set).

@kkirsche
Copy link
Contributor

Thanks, submitted a few PR's to her repo to address some incorrect type usage (specifically, incorrect usage of Any). If you're curious, this comment explains the difference between using Any and object as a parameter:
python/typeshed#8469 (comment)

greyli added a commit to apiflask/apiflask that referenced this pull request Dec 24, 2022
greyli added a commit to apiflask/apiflask that referenced this pull request Dec 24, 2022
@lafrech lafrech force-pushed the ordered_set_default branch 2 times, most recently from cfa54e8 to 52d7e24 Compare December 30, 2022 23:25
@lafrech lafrech marked this pull request as ready for review December 30, 2022 23:30
@lafrech
Copy link
Member Author

lafrech commented Dec 30, 2022

I think this is ready to go.

I removed the section about ordered from the quickstart because it is outdated and the option is now only required for a very specific purpose: using an OrderedDict for output, which doesn't need to be documented in a quickstart.

Not sure how and where to document ordered, now.

Does this change warrant an "upgrading to ..." section?

@lafrech
Copy link
Member Author

lafrech commented Jul 2, 2023

TODO:

  • Add test showing how to specify OderedDict as a dict class as class attribute.
  • Deprecate ordered
  • Deprecate dict_class argument of get_declared_fields

@lafrech
Copy link
Member Author

lafrech commented Jul 16, 2023

test_options.py is quite focused on the ordered option and deprecating it would require changes there. Besides, those tests would need to be kept to ensure they don't break until 4.0, and I'm not sure how to silent the deprecation warnings that would be raised.

I don't want to engage in this right now. We can deal with this later. In fact, we can even skip the deprecation phase and remove the parameter and those tests in 4.0.

@sloria I shall merge/release this soonish. Ping me if you want time to review.

@lafrech lafrech merged commit 03889c6 into dev Jul 20, 2023
8 checks passed
@lafrech lafrech deleted the ordered_set_default branch July 20, 2023 20:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[RFC] set_class = OrderedSet by default?
2 participants