New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Serialization drops first item of custom iterators #353

Closed
jmcarp opened this Issue Dec 4, 2015 · 3 comments

Comments

Projects
None yet
2 participants
@jmcarp
Contributor

jmcarp commented Dec 4, 2015

from marshmallow import Schema, fields

class Range:

    def __init__(self, max):
        self.value = 0
        self.max = max

    def __iter__(self):
        return self

    def __next__(self):
        self.value += 1
        if self.value > self.max:
            raise StopIteration
        return {'value': self.value}

class RangeSchema(Schema):
    value = fields.Int()

RangeSchema(many=True).dump(Range(5)).data
# [{'value': 2}, {'value': 3}, {'value': 4}, {'value': 5}]

list(Range(5))
# [{'value': 1}, {'value': 2}, {'value': 3}, {'value': 4}, {'value': 5}]

I'm guessing this is the same problem described in #343, but this should be easier to reproduce. This could be solved by detecting that the value is an iterator and casting to a list, as for generators (which might not be straightforward), or using something like itertools.tee to get separate iterators (one for serializing, one for inspecting the first item). Or maybe this is all overkill--maybe values serialized with many should have to be eager sequences, so that it isn't marshmallow's responsibility to cast to lists or inspect iterators without advancing them.

@sloria sloria added the bug label Dec 6, 2015

@sloria

This comment has been minimized.

Member

sloria commented Dec 6, 2015

Ideally, marshmallow would handle iterators and generators transparently. Maybe we disallow implicit field creation if serializing a generator?

@jmcarp

This comment has been minimized.

Contributor

jmcarp commented Dec 6, 2015

I think generators are handled correctly already because of this:

if isinstance(obj, types.GeneratorType):
    obj = list(obj)

The problem is with non-generator iterables like the example in the issue. Some iterables should probably be cast to lists and eagerly evaluated, like the example, but others shouldn't--strings, dicts, namedtuples, etc. What if dump cast list-like iterables to lists, but only when many is true? That might be a bit more intuitive than always listifying generators.

Another option would be to peek into the iterable without advancing it, which could be done using itertools.tee.

@sloria

This comment has been minimized.

Member

sloria commented Dec 6, 2015

Both of those solutions sound reasonable.

jmcarp added a commit to jmcarp/marshmallow that referenced this issue Dec 6, 2015

Eagerly evaluate list-like iterators with `many`.
Eagerly evaluate non-string iterators on `dump` when `many` is passed.
This ensures that serializing many values doesn't advance an iterator
and drop the first item. Values are only evaluated eagerly when `many`
is set so that we don't inappropriately cast other iterables (dicts,
namedtuples, etc) to lists.

[Resolves marshmallow-code#353]

@sloria sloria closed this in #355 Dec 8, 2015

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment