Memory growth fast when schema is defined in a function #657

jackeyvzhang · 2017-07-12T03:25:13Z

When I use the following code, I observe that memory growth is very fast, but when I define the schema outside the function, there is no problem.

from marshmallow import Schema, fields

def test_schema():
    # A new schema class is created on every call
    class FooSchema(Schema):
        foo = fields.String()
    # not do anything
    return "xxx"

if __name__ == '__main__':
    while True:
        print(test_schema())


justanr · 2017-07-12T04:36:36Z

For two reasons:

1. You're defining a new class with each invocation. I'm fuzzy on the details, but I don't think classes get freed by reference counting, because of their relationship with the module in which they are created. Even if they would normally be GC'd, that is made irrelevant by the schema registry.
2. Internally, marshmallow keeps a record of all schemas defined so it can resolve a schema you reference by name rather than by class (e.g. you write 'MySchema' rather than just MySchema; note the quotation marks). Schemas are stored as a map of name to list of schema classes AND full path to schema class.

Note that because of reference counting in CPython (I'm making an assumption on your environment here), this doesn't result in double memory for a single registry entry.

tl;dr your infinite loop creates infinite registry entries, which use infinite memory. And even without the registry, this might happen anyway due to module-class relationships.

jtillman · 2017-07-21T08:32:00Z

Hey @justanr, thanks for the reasoning and detailed explanation here, which I appreciate. I too have hit this issue, which causes high memory usage in my production environment. Now that I am aware of it, I would like to know the suggested solution for what I see as a valid use case.

I typically use inheritance to generate classes to give standard packaging.

class ResponseObject(Schema):
    @classmethod
    def load_paged_result(cls, data):
        class ResponsePageObject(Schema):
            page_number = fields.Integer()
            total = fields.Integer()
            items = fields.List(fields.Nested(cls))
        # loads() is an instance method, so instantiate the schema first
        return ResponsePageObject().loads(data)

class UserResponse(ResponseObject):
    name = fields.String()

This way I can easily standardize my response formats across my schemas.

UserResponse.load_paged_result(response.content)

Is there a suggested solution here? I was thinking about manually checking things in spots, but I fear it's hard to train other users to follow suit and avoid the memory leaks.

sloria · 2017-07-21T12:34:41Z

@jtillman You can prevent marshmallow from storing your generated classes in its internal registry of schemas by generating the classes without a name.

from marshmallow import Schema, fields

class ResponseObject(Schema):
    @classmethod
    def load_paged_result(cls, data):
        # Named attrs (not fields) so the imported fields module is not shadowed.
        # Add a 'Meta' entry here if the generated schema needs options.
        attrs = {
            'page_number': fields.Integer(),
            'total': fields.Integer(),
            'items': fields.List(fields.Nested(cls)),
        }
        # Create a nameless class to prevent storing the class in
        # marshmallow's in-memory registry
        ResponsePageObject = type(str(''), (Schema,), attrs)
        return ResponsePageObject(strict=True).loads(data)

class UserResponse(ResponseObject):
    name = fields.String()

I've also proposed #660, which would expose a more straightforward way to bypass the registry. In the meantime, the above workaround should get you most of the way.

jtillman · 2017-07-22T18:47:57Z

Thanks for the workaround. Thinking about this, it feels like a global registry should be something that people have to opt into rather than out of, simply because it's a feature you have to dig for to know about, and basic use of the library that generates schemas can lead to huge memory issues that are a pain to pinpoint (which was the case for me).

lafrech · 2018-04-03T22:15:16Z

Let's follow the discussion in #660.

lafrech closed this as completed Apr 3, 2018

lafrech mentioned this issue Apr 3, 2018

Proposal: Add a class Meta option to prevent adding a class to the internal registry #660

Closed
