Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Memory growth fast when schema is defined in a function #657

Closed
jackeyvzhang opened this issue Jul 12, 2017 · 5 comments
Closed

Memory growth fast when schema is defined in a function #657

jackeyvzhang opened this issue Jul 12, 2017 · 5 comments

Comments

@jackeyvzhang
Copy link

When I use the following code, I observe that memory growth is very fast, but when I define the schema outside the function, there is no problem.

def test_schema():
    class FooSchema(Schema):
        foo = fields.String()
    # not do anything
    return "xxx"
if __name__ == '__main__':
    while True:
        print test_schema()```
@justanr
Copy link
Contributor

justanr commented Jul 12, 2017

For two reasons:

  • You're defining a new class with each invocation. I'm fuzzy on the details, but I don't think classes would be referenced out due to their relationship with the module in which they are created. Even if they would be normally GC'd that is made irrelevant by the schema registry.

  • Internally, marshmallow keeps a record of all schemas defined so it can do resolution when you reference a schema by name rather than by class (e.g. you do 'MySchema' rather than just MySchema - note the quotation marks). Schemas are stored as a map of name to list of schema class AND fullpath to schema class.

Note that because of reference counting in CPython (I'm making an assumption on your environment here), this doesn't result in double memory for a single registry entry.

tl;dr your infinite loop causes infinite registry entries which uses infinite memory. And despite the registry, this might happen anyways due to module-class relationships.

@jtillman
Copy link

jtillman commented Jul 21, 2017

Hey @justanr, thanks for the reasoning and details explanation here which I appreciate. I too have hit this issue which causes high memory issues in my production environment. Now that I am aware of it I would like to know the suggested solution, in what I see as a valid use case here.

I typically use inheritance to generate classes to give standard packaging.

class ResponseObject(Schema):
    @classmethod
    def load_paged_result(cls, data):
        class ResponsePageObject(Schema):
            page_number = fields.Integer()
            total = fields.Integer()
            items = fields.List(fields.Nested(cls))
        return ResponsePageObject.loads(data)

class UserResponse(ResponseObject):
    name = fields.String()

This way I can easily standardize my response formats across my schemas.

UserResponse.load_paged_result(response.content)

Is there a suggested solution here. I was thinking about manually checking things in spots, but I fear its hard to train others users to follow suit and avoid the memory leaks.

@sloria
Copy link
Member

sloria commented Jul 21, 2017

@jtillman You can prevent marshmallow from storing your generated classes in its internal registry of schemas by generating the classes without a name.

class ResponseObject(Schema):
    @classmethod
    def load_paged_result(cls, data):
        fields = {
            'page_number': fields.Integer(),
            'total': fields.Integer(),
            'items': fields.List(fields.Nested(cls)),
        }
        attrs = dict(fields, Meta=Meta)
        # Create a nameless class to prevent storing the class in
        # marshmallow's in-memory registry
        ResponsePageObject = type(str(''), (ma.Schema,), attrs)
        return ResponsePageObject(strict=True).loads(data)

class UserResponse(ResponseObject):
    name = fields.String()

I've also proposed #660 , which would expose a more straightforward way to bypass the registry. In the meantime, the above workaround should get you most of the way.

@jtillman
Copy link

Thanks for the workaround. In thinking about this, it feels like a global registry should be something that people have to opt into rather than out of. Simply because its a feature that you have to dig for to know about and basic use of the library that generate schemas, can lead to huge memory issues that are a pain to pinpoint (which was the case for me).

@lafrech
Copy link
Member

lafrech commented Apr 3, 2018

Let's follow the discussion in #660.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants