[RFC] set_class = OrderedSet by default? #1744

lafrech · 2021-02-12T00:05:03Z

I assume the performance impact would be minimal. From a quick look, it would only affect schema instantiation.

The benefit is that when using Python 3.7+, all schemas would be ordered. They still wouldn't return OrderedDict instances unless ordered Meta is passed, but they would keep fields declaration order in their output and in their schema list, so that order would be respected in API responses and apispec users would get their fields documented in declaration order.

Currently, users can achieve this in a base schema with

class OrderedBaseSchema(Schema):
    set_class = OrderedSet

My gut feeling is that this change would have no functional downside and a negligible perf impact at field init (less that we care about when adding a new feature), while using ordered=True does have an impact on serialization performance (I didn't measure it) and functionality (equality test depends on order), hence my proposal. There would still be an interest in allowing the use of OrderedDict.

If we go this route, we could use OrderedSet by default in marshmallow 3.x, and in marshmallow 4 perhaps rework the ordered feature because schemas would already be ordered so it would be specifically for users really needing an OrderedDict instance. Perhaps drop the ordered Meta option and let the user specify a dict class of their choice another way. But that's another story.

See marshmallow-code/flask-smorest#228 (comment).

The text was updated successfully, but these errors were encountered:

sloria · 2021-03-18T03:55:40Z

I'm not opposed to this, so long as there is no perf impact (I think your intuition on this is right).

lafrech · 2021-04-08T15:08:28Z

Or since this is easily achieved in user code thanks to a base schema as shown above, we postpone this to marshmallow 4 as described in my last paragraph.

Adding v4 milestone for now.

deckar01 · 2021-10-26T18:53:02Z

using ordered=True does have an impact on serialization performance

I believe that is caused by OrderedDict. None of the OrderedSet instance attributes are ever accessed during dump(), only the OrderedDict dump_fields. Even though dict is ordered now, it is still faster than OrderedDict. Waiting until 3.7+ would avoid using OrderedDict also.

OrderedDict Benchmark

#!/usr/bin/env python3.9

from datetime import datetime
from collections import OrderedDict


def run(D):
    A = [('a', None), ('b', None), ('r', None), ('c', None), ('d', None)]
    B = [('1', None), ('2', None), ('3', None)]
    t0 = datetime.now()
    for i in range(5000000):
        s = D(A)
        s['e'] = None
        del s['e']
        s.update(B)
    t = datetime.now() - t0
    return t


a = run(dict)
b = run(OrderedDict)
m = min(a, b)

print(f'dict\t\t{a}\t{100*(a/m-1):+.0f}%')
print(f'OrderedDict\t{b}\t{100*(b/m-1):+.0f}%')

dict            0:00:02.280595  +0%
OrderedDict     0:00:04.608753  +102%

OrderedSet is significantly slower than set.

OrderedSet Benchmark

#!/usr/bin/env python3.9

from datetime import datetime
from marshmallow.orderedset import OrderedSet


def run(D, i=200000):
    A = ['a', 'b', 'c', 'd']
    B = D(['1', '2', '3'])
    t0 = datetime.now()
    for _ in range(i):
        s = D(A)
        s.add('e')
        s.remove('e')
        s &= B
    return datetime.now() - t0


a = run(set)
b = run(OrderedSet)
m = min(a, b)

print(f'set\t\t{a}\t{100*(a/m-1):+.0f}%')
print(f'OrderedSet\t{b}\t{100*(b/m-1):+.0f}%')

set             0:00:00.065251  +0%
OrderedSet      0:00:01.575174  +2314%

If an application has schema instantiation in its hot loop I think the vast majority of the execution time is going to be consumed by memory allocation. +1 for dropping (or deprecating + ignoring) ordered in 4.0 and using OrderedSet (and dict). I have read that there are more performant implementations of OrderedSet out there that are wrappers around the 3.7+ ordered dict.

lafrech · 2021-10-27T07:31:09Z

I believe that is caused by OrderedDict. None of the OrderedSet instance attributes are ever accessed during dump(), only the OrderedDict dump_fields.

Yes, this is exactly what I meant. Hence my proposal. In 4.0, I'd drop ordered and just allow setting set and dict classes as class attributes. Meanwhile, we can already use OrderedSet by default.

deckar01 · 2021-10-27T15:54:45Z

@lafrech I was intrigued by #1891 (comment), so I took a stab at replacing set_class with list. The only set operations we use are unique, intersection, and difference. I implemented them as generators with a set membership check in 4-5 line utilities. Some operations may be more performant than the existing unordered set logic, because they only build 1 set instead of (up to) 3.

dev...deckar01:1744-list-attributes

lafrech · 2021-11-08T22:57:25Z

@deckar01 Nice.

On the short run (marshmallow 3.x) we'll probably be keeping set_class if only for backwards compatibility. There seems to be consensus with this so we may go on with #1896. This can wait until we drop Python 3.6.

The question of whether we keep vendoring the OrderedSet recipe or we import it from a 3rd-party package is independent (#1891).

Later on (marshmallow 4), we can either keep set_class as a class property, either hide it as an implementation detail. Then we may consider switching to an implementation such as yours. I have no strong feeling. Some would call that reinventing the wheel, some would like the simplicity. The code is small enough to be error-proofed, but arguably it is a little less straight-forward when used from Schema.

greyli · 2022-12-24T04:16:45Z

+1 for this. It will become a key issue when the user generates the local OpenAPI spec file with apispec while the spec output is not deterministic (apiflask/apiflask#373).

lafrech mentioned this issue Feb 12, 2021

Docs: inaccurate claim on schema field ordering marshmallow-code/flask-smorest#228

Closed

lafrech added this to the 4.0 milestone Apr 8, 2021

lafrech mentioned this issue Oct 26, 2021

Meta not inherited #1894

Closed

lafrech mentioned this issue Oct 27, 2021

Use OrderedSet as default set_class #1896

Merged

lafrech mentioned this issue Nov 8, 2021

unbundle ordered-set #1891

Closed

lafrech mentioned this issue Aug 8, 2022

Why is ordered=True not the default for Schemas? #2025

Closed

lafrech mentioned this issue Jul 16, 2023

Remove ordered option #2146

Open

lafrech closed this as completed in #1896 Jul 20, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[RFC] set_class = OrderedSet by default? #1744

[RFC] set_class = OrderedSet by default? #1744

lafrech commented Feb 12, 2021

sloria commented Mar 18, 2021

lafrech commented Apr 8, 2021

deckar01 commented Oct 26, 2021 •

edited

lafrech commented Oct 27, 2021

deckar01 commented Oct 27, 2021 •

edited

lafrech commented Nov 8, 2021

greyli commented Dec 24, 2022 •

edited

[RFC] set_class = OrderedSet by default? #1744

[RFC] set_class = OrderedSet by default? #1744

Comments

lafrech commented Feb 12, 2021

sloria commented Mar 18, 2021

lafrech commented Apr 8, 2021

deckar01 commented Oct 26, 2021 • edited

lafrech commented Oct 27, 2021

deckar01 commented Oct 27, 2021 • edited

lafrech commented Nov 8, 2021

greyli commented Dec 24, 2022 • edited

deckar01 commented Oct 26, 2021 •

edited

deckar01 commented Oct 27, 2021 •

edited

greyli commented Dec 24, 2022 •

edited