How to create a Schema containing a dict of nested Schema? #483

Closed
lafrech opened this Issue Jun 29, 2016 · 25 comments

9 participants
@lafrech
Member

lafrech commented Jun 29, 2016

Hi. I've been digging around and couldn't find the answer to this.

Say I've got a model like this:

class AlbumSchema(Schema):
    year = fields.Int()

class ArtistSchema(Schema):
    name = fields.Str()
    albums = ...

I want albums to be a dict of AlbumSchema, so that ArtistSchema serializes as

{ 'albums': { 'Hunky Dory': {'year': 1971},
              'The Man Who Sold the World': {'year': 1970}},
  'name': 'David Bowie'}

Naively, I would expect syntaxes like this to work:

fields.List(Schema)
fields.Dict(Schema)

or maybe

fields.List(fields.Nested(Schema))
fields.Dict(fields.Nested(Schema))

Serializing a list of Schema can be achieved through Nested(Schema, many=True), which I find less intuitive, and I don't know about a dict of Schema.

Is there any way to do it? Or a good reason not to do it?

(Question also asked on SO.)

@deckar01

Member

deckar01 commented Jun 30, 2016

I want albums to be a dict of AlbumSchema. Is there any way to do it?

Currently you must either provide explicitly named fields or use the Dict field and abandon all notion of an underlying schema.

Or a good reason not to do it?

Taking a collection of like objects, plucking out a key, and using it as the key in a dictionary destroys the order.

Since marshmallow seems to strive for expressivity, I think this use case represents a void in the marshmallow interface. If an API can index homogeneous collections using strings, marshmallow probably should too.

I would call this interface NestedDict and implement it as a thin wrapper around Nested(many=True).

from marshmallow.fields import Nested

class NestedDict(Nested):
    """Nested(many=True) that serializes to a dict indexed by the `key` field."""

    def __init__(self, nested, key, *args, **kwargs):
        super(NestedDict, self).__init__(nested, many=True, *args, **kwargs)
        self.key = key

    def _serialize(self, nested_obj, attr, obj):
        # Serialize as a list first, then index each item by its key field.
        nested_list = super(NestedDict, self)._serialize(nested_obj, attr, obj)
        nested_dict = {item[self.key]: item for item in nested_list}
        return nested_dict

    def _deserialize(self, value, attr, data):
        # Drop the dict keys; each item already carries its key as a field.
        raw_list = list(value.values())
        nested_list = super(NestedDict, self)._deserialize(raw_list, attr, data)
        return nested_list

The usage would look very similar to Nested, except that a field name is provided to index the dictionary, and many=True is implicitly applied.

from marshmallow import fields, Schema

class AlbumSchema(Schema):
    name = fields.Str()
    year = fields.Int()

class ArtistSchema(Schema):
    name = fields.Str()
    albums = fields.NestedDict(AlbumSchema, key='name')

artist_schema = ArtistSchema()

obj, errors = artist_schema.load({
    'name': 'Artist Name',
    'albums': {
        'Album A': {'name': 'Album A', 'year': 1999},
        'Album B': {'name': 'Album B', 'year': 2005}
    }
})
print(obj)
# {'name': 'Artist Name', 'albums': [{'name': 'Album A', 'year': 1999}, {'name': 'Album B', 'year': 2005}]}

data, errors = artist_schema.dump(obj)
print(data)
# {'name': 'Artist Name', 'albums': {'Album A': {'name': 'Album A', 'year': 1999}, 'Album B': {'name': 'Album B', 'year': 2005}}}
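
For readers who want to see the list/dict conversion in isolation, here is a minimal sketch in plain Python, without marshmallow. The helper names index_by and to_list are hypothetical and only mirror what the proposed _serialize and _deserialize methods do to the data.

```python
def index_by(items, key):
    """Turn a list of dicts into a dict keyed by each item's `key` field."""
    return {item[key]: item for item in items}


def to_list(mapping):
    """Drop the keys again; each item still carries its key as a field."""
    return list(mapping.values())


albums = [
    {"name": "Album A", "year": 1999},
    {"name": "Album B", "year": 2005},
]

albums_by_name = index_by(albums, "name")

# Plain dicts preserve insertion order on Python 3.7+, so the round trip
# gives back the original list order.
round_tripped = to_list(albums_by_name)

# Caveat: items sharing the same key silently overwrite each other.
clashing = index_by(
    [{"name": "Album A", "year": 1999}, {"name": "Album A", "year": 2005}],
    "name",
)
```

The overwrite on duplicate keys is one reason the indexing field should be unique within the collection.
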
@jta

jta commented Jun 30, 2016

👍 , would find NestedDict very useful.

@lafrech

Member

lafrech commented Jun 30, 2016

Hi @deckar01. Thank you for your feedback.

I think I have been unclear in my question.

My point is not to serialize a list as a dict but to serialize/deserialize a dict of like objects of known schema. In the original object, the data is stored as a dict already.

In other words, how would you write a Schema to serialize/deserialize such an object?

{'name': 'Artist Name', 'albums': {'Album A': {'year': 1999}, 'Album B': {'year': 2005}}}

It should be close to this, but there is a missing piece:

class AlbumSchema(Schema):
    year = fields.Int()

class ArtistSchema(Schema):
    name = fields.Str()
    albums = ...

Currently, we don't know how to serialize this. We had to modify the object to let albums be a list and the album name be a name attribute of each album. The downside of this is that we can't call artist['Album_A'], obviously. We have to either recreate a dict or search the list.

In fact, I don't really care how the object is serialized, as long as I get it deserialized properly. I just don't see any reason not to represent it like this:

{ 'albums': { 'Hunky Dory': {'year': 1971},
              'The Man Who Sold the World': {'year': 1970}},
  'name': 'David Bowie'}

I hope this is clearer now.

@deckar01

Member

deckar01 commented Jul 6, 2016

I suspect this use case has not occurred before, because most users are serializing records from a relational database.

It sounds like you can control the schema since you made it a list, but you still want a dictionary to aid in the lookup process. Most users would probably just dump the records in their database and query by the name if they needed to.

Can you provide more specific details about where your data is coming from, why you are accessing it through a dictionary, and what you are doing with the data when you are done?

@lafrech

Member

lafrech commented Jul 6, 2016

Indeed, the use case does not involve a database.

My colleague is writing an application and he wants a way to store user data. (Basically, the application runs numeric simulations, so the user data is made of simulation parameters and sets of results.)

He can do that quick and dirty using pickle, but I suggested that he serialize his objects into text files. And since I use Marshmallow on other projects (for database or API related stuff), I introduced him to Marshmallow.

The only issue was that dict. He currently made it a list as a workaround, losing the lookup feature in the process.

Besides that, it all went smoothly and it helped him get the serialization part out of his business objects, so he's happy with it and he'll most probably be sticking to Marshmallow anyway. (Unless it is a wrong choice because it was not designed for such use cases?)

I was a bit surprised to be blocked by what I thought would be a rather simple use case.

Do you think it would make sense to add such a possibility to Marshmallow?

@deckar01

Member

deckar01 commented Jul 6, 2016

Do you think it would make sense to add such a possibility to Marshmallow?

It seems like a reasonable feature to me.

The implementation may be more complex than the requirements make it sound though. Marshmallow is built on the assumption that homogeneous collections are lists.

For a Nested field to be able to handle a dictionary when many=True, the marshmallow core would need to use the more generic iterable interface instead of the list interface.

I am going to mull this over and make sure there is not a simpler solution I am overlooking.

@deckar01

Member

deckar01 commented Jul 6, 2016

fields.Dict(fields.Nested(Schema))

This did not fully sink in when I first read the issue. I like this. This would give new purpose to the otherwise unstructured Dict field.

deckar01 pushed a commit to deckar01/marshmallow that referenced this issue Jul 6, 2016

@deckar01

Member

deckar01 commented Jul 6, 2016

Proof of concept: dev...deckar01:483-structured-dict

It doesn't handle ordered dicts yet, but I thought I would get some feedback before it gets too involved.

@sloria If this looks like a viable option I can flesh out the docs and add support for ordered dicts.

@lafrech

Member

lafrech commented Jul 7, 2016

This looks great. Thanks @deckar01.

As a sidenote, for someone discovering Marshmallow, an equivalent syntax for List

fields.List(fields.Nested(Schema))

could be more intuitive than fields.Nested(Schema, many=True), especially if fields.Dict(fields.Nested(Schema)) is implemented.

I don't mean to break everything in Marshmallow's core. And maybe it would be hiding the underlying principles too much.

It's all about

Marshmallow is built on the assumption that homogeneous collections are lists.

Once this is understood, the fields.Nested(Schema, many=True) makes sense. But coming from another serialization lib, it can be surprising.

MongoEngine's ListField, for instance, works like this:

class Page(Document):
    tags = ListField(StringField(max_length=50))

I just checked in docs/source/tests. MongoEngine has both a DictField that acts like Marshmallow's Dict and a MapField that enforces a given field type for its items. However, it looks like MapField(StringField()) is just a shortcut for DictField(field=StringField()). I guess there are historical reasons for this. But I believe it is clearer to just have Dict with an optional (and first positional) field type argument.

@lsenta

lsenta commented Sep 18, 2016

+1 for this feature. My use case is a wizard where a user can create a bunch of "phases" for a workflow; they'll be serialized to YAML, and I wanted to use marshmallow for that.

Why not have validation for keys? I'll need to check some properties in the keys of the dict too. Something like fields.Dict(key=fields.Str(), values=fields.Nested(SomeSchema)) would be helpful.

@lafrech

Member

lafrech commented Mar 12, 2017

I like fields.Dict(key=fields.Str(), values=fields.Nested(SomeSchema)).

We use Marshmallow in a MongoDB ODM: uMongo.

If I want to serialize this in Mongo

    {2011: 12, 2012: 15, 2013: 16, 2014: 18}

using a Dict field does not allow me to enforce a schema. I need to create a dedicated nested structure and put it in a list:

    class DatedValue(MyBaseObject):
        year = Int()
        value = Int()

    class MyObject(MyBaseObject):
        dated_values = List(DatedValue)

and then I only get a list I can't access by keys (unless I create a dict from the list each time I load the object).

It would definitely be much less cumbersome if I could just write:

    class MyObject(MyBaseObject):
        values = Dict(keys=Int(), values=Int())

Other benefits:

  • allows validation of both keys and values
  • allows serialization of keys other than strings (any hashable and serializable object can be used as a key)

Edit: Actually, using a Schema for keys in uMongo would be a bad idea for MongoDB specific reasons, so we'll stick to string keys there, but adding schemas to keys could make sense in Marshmallow anyway.
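
To illustrate the point about non-string keys, here is a small sketch using only the standard json module (not marshmallow or uMongo). JSON object keys are always strings, so integer keys have to be converted back explicitly on load, which is the job a key field would take over.

```python
import json

values_by_year = {2011: 12, 2012: 15, 2013: 16, 2014: 18}

# json.dumps turns the integer keys into strings.
encoded = json.dumps(values_by_year)

# Loading gives string keys back: {'2011': 12, '2012': 15, ...}
decoded_raw = json.loads(encoded)

# Manual equivalent of deserializing keys and values as integers.
decoded = {int(year): int(value) for year, value in decoded_raw.items()}
```
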

@sloria sloria added the needs review label May 9, 2017

@sloria sloria added this to the 3.0 milestone May 9, 2017

@sloria

Member

sloria commented May 9, 2017

This is something I'd like to review for 3.0. I think there are valid use cases for validating keys, and I think @lsenta 's and @lafrech 's proposed API is reasonable.

@sloria sloria added the help wanted label May 9, 2017

@sloria

Member

sloria commented May 13, 2017

@lafrech Would you like to send a PR implementing your proposed API?

@lafrech

Member

lafrech commented May 15, 2017

@sloria I'm afraid I won't be able to do this anytime soon, but if I get the time, I'll be happy to give it a go.

Note that it wouldn't be a breaking change, so it could be added in a later 3.x.

Did you get the chance to look at @deckar01's proposal?

@sloria

Member

sloria commented May 28, 2017

No problem, @lafrech . @deckar01 's proposal is on the right track; I think it would also be nice to have validation for keys, as suggested in #483 (comment)

@lnunno

lnunno commented Jun 12, 2017

Is there any way to get around this limitation right now? I need this functionality replicated. Can this be achieved with a pre_dump? It appears to load ok somehow, but doesn't know how to dump to the correct schema.

@aldanor

aldanor commented Sep 26, 2017

This is a bugger (and a surprise it doesn't work out of the box). Anyone care to revive @deckar01's suggestion?

@sloria

Member

sloria commented Sep 27, 2017

@deckar01 Would you be up for sending a PR with your proposal?

@christian-storm

christian-storm commented Dec 6, 2017

+1 for the suggestion. Another use case is (de)serializing to a protocol buffer map field, which accepts a string, bool, or int as key and any type as value (enum, message (~ Python class), scalar types).
I'm trying to decide between Cerberus and Marshmallow. Cerberus has one critical feature: it allows one to set keyschema and valueschema. I'm more keen on Marshmallow's ecosystem and prefer its class-based schemas. I'm a bit dead in the water without this, but I'm trying to find a workaround. It strikes me as odd that a foundational data structure, the mapping, isn't a consideration. Has anybody found a way to support this?

@deckar01

Member

deckar01 commented Dec 6, 2017

I will rebase my branch and work on a PR.

deckar01 pushed a commit to deckar01/marshmallow that referenced this issue Dec 6, 2017

@ArthurPBressan

ArthurPBressan commented Dec 6, 2017

I needed something similar to what @christian-storm needs, and I ended up taking parts (or maybe all, can't remember) of what @deckar01 had worked on a few months ago, and hacked on it until it did what I needed it to do: https://gist.github.com/ArthurPBressan/4f6dc8b7826e352884f0561ac79d6898

Maybe it's useful for someone as a starting point, since I removed some functionality that I didn't really need, and didn't implement tests.

@deckar01

Member

deckar01 commented Dec 6, 2017

@sloria How should deserialization errors be communicated for invalid keys? Maybe prefix the error message to indicate that the message is for the key? Invalid key: {}?

@deckar01

Member

deckar01 commented Dec 6, 2017

@sloria How should I handle the error message when a key and its value have errors? Concatenate the error message lists together?
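
One possible answer to both questions, sketched in plain Python rather than marshmallow: instead of prefixing or concatenating messages, nest them per entry under separate "key" and "value" labels. The function load_mapping, the label names, and the use of ValueError are assumptions made for this illustration only.

```python
def load_mapping(data, load_key, load_value):
    """Deserialize a dict, reporting key and value errors separately."""
    result, errors = {}, {}
    for raw_key, raw_value in data.items():
        entry_errors = {}
        try:
            key = load_key(raw_key)
        except ValueError as exc:
            entry_errors["key"] = [str(exc)]
        try:
            value = load_value(raw_value)
        except ValueError as exc:
            entry_errors["value"] = [str(exc)]
        if entry_errors:
            # Errors are indexed by the raw key so the caller can locate them.
            errors[raw_key] = entry_errors
        else:
            result[key] = value
    return result, errors


result, errors = load_mapping(
    {"2011": "12", "oops": "15", "2013": "x", "bad": "y"},
    load_key=int,
    load_value=int,
)
```

With this shape, an entry whose key and value are both invalid simply carries both labels, so no messages need to be merged.
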

deckar01 pushed a commit to deckar01/marshmallow that referenced this issue Dec 7, 2017

deckar01 pushed a commit to deckar01/marshmallow that referenced this issue Dec 8, 2017

deckar01 pushed a commit to deckar01/marshmallow that referenced this issue Dec 10, 2017

deckar01 pushed a commit to deckar01/marshmallow that referenced this issue Dec 12, 2017

@sloria sloria closed this in #700 Dec 30, 2017

@sloria

Member

sloria commented Dec 30, 2017

This is released in version 3.0.0b5. Thanks everyone for your feedback!

@lafrech

Member

lafrech commented May 15, 2018

This is unrelated to this issue. Please open a new one.
