Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

dictionary based webargs usage 'leaks' memory #101

Closed
frankslaughter opened this issue Apr 8, 2016 · 7 comments

Comments

@frankslaughter
Copy link

commented Apr 8, 2016

argmap2schema in core.py creates an ArgSchema each time a tornado request handler get/post/etc. method is called. Each ArgSchema is registered with the marshmallow class registry, so it is not dereferenced when the request handler completes.

I don't have a suggestion for how to deal with this. The workaround I chose was to switch all of my webargs usage to either the use_args decorator, or to create the arguments Schema explicitly.

Below is some demonstration code:

tornado server

import pdb
import resource
import tornado.ioloop
import tornado.web

from webargs import fields
from webargs.tornadoparser import parser, use_args

from marshmallow import Schema


class ArgsSchema(Schema):
    name = fields.String()


args_schema = ArgsSchema(strict=True)
args_dict = dict(name=fields.String())


class DictHandler(tornado.web.RequestHandler):
    def get(self):
        args = parser.parse(args_dict, self.request)
        response = dict(message="Hello %s" % args.get('name'))
        self.write(response)


class SchemaHandler(tornado.web.RequestHandler):
    def get(self):
        args = parser.parse(args_schema, self.request)
        response = dict(message="Hello %s" % args.get('name'))
        self.write(response)


class UseArgsHandler(tornado.web.RequestHandler):
    @use_args(args_dict)
    def get(self, args):
        response = dict(message="Hello %s" % args.get('name'))
        self.write(response)


class MemUseHandler(tornado.web.RequestHandler):
    def get(self):
        response = dict(
            memory_used=resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
        )
        self.write(response)


class DebugHandler(tornado.web.RequestHandler):
    def get(self):
        pdb.set_trace()


def make_app():
    return tornado.web.Application(
        [
            (r"/dict", DictHandler),
            (r"/schema", SchemaHandler),
            (r"/useargs", UseArgsHandler),
            (r"/memuse", MemUseHandler),
            (r"/debug", DebugHandler),
        ]
    )


if __name__ == "__main__":
    app = make_app()
    app.listen(8888)
    tornado.ioloop.IOLoop.current().start()

exerciser

import requests
import sys

MEMUSE_URL = "http://localhost:8888/memuse"
USE_ARGS_URL = "http://localhost:8888/useargs"
SCHEMA_URL = "http://localhost:8888/schema"
DICT_URL = "http://localhost:8888/dict"


def main():
    count = int(sys.argv[1])

    print "initial memory used: %s" % memuse()

    for i in range(count):
        requests.get(USE_ARGS_URL, params=dict(name="Ed Meese"))

    print "memory used after %s '/use_args' calls: %s" % (count, memuse())

    for i in range(count):
        requests.get(SCHEMA_URL, params=dict(name="Ross Perot"))

    print "memory used after %s '/schema' calls: %s" % (count, memuse())

    for i in range(count):
        requests.get(DICT_URL, params=dict(name="Ralph Nader"))

    print "memory used after %s '/dict' calls: %s" % (count, memuse())


def memuse():
    r = requests.get(MEMUSE_URL)
    return r.json()['memory_used']


if __name__ == "__main__":
    main()

exerciser output

(quick_tests) vagrant@frank-dev-vm:/tiptap/experiments/quick_tests/webargs_memory_leak$ python test.py 10000
initial memory used: 15672
memory used after 10000 '/use_args' calls: 16588
memory used after 10000 '/schema' calls: 16588
memory used after 10000 '/dict' calls: 100116

@sloria sloria added the performance label Apr 8, 2016

@sloria

This comment has been minimized.

Copy link
Member

commented Apr 8, 2016

Thanks for the thorough report. This is somewhat of a known issue. One solution is to prevent ArgSchema from getting added to marshmallow's class registry, though I haven't given much thought to implementation yet.

I would certainly review and merge a PR for this.

@sloria sloria changed the title dictionary based webargs usage with with tornadoparser 'leaks' memory dictionary based webargs usage 'leaks' memory Apr 9, 2016

@sloria

This comment has been minimized.

Copy link
Member

commented Apr 9, 2016

Also, this isn't specific to the Tornado parser. I hope you don't mind that I've updated the issue title accordingly.

@frankslaughter

This comment has been minimized.

Copy link
Author

commented Apr 9, 2016

I was pretty sure that was the case, but since I only tested it with Tornado I didn't want to overstate my claim. Thanks for fixing the title.

sloria added a commit to marshmallow-code/marshmallow that referenced this issue Apr 9, 2016

Only add Schema classes to registry if name is provided
Allows for Schema classes to be created dynamically without getting
added to the class registry, which saves memory usage in turn.

The impetus for this change comes from marshmallow-code/webargs#101
@sloria

This comment has been minimized.

Copy link
Member

commented Apr 9, 2016

I think I have a fix for this. It requires a minor change in marshmallow. I'll release the fix with marshmallow 2.7.1 and webargs 1.3.1.

@frankslaughter

This comment has been minimized.

Copy link
Author

commented Apr 9, 2016

That's a slick fix. I like it.

@sloria sloria closed this in af9210e Apr 9, 2016

@sloria

This comment has been minimized.

Copy link
Member

commented Apr 9, 2016

The fix is now on dev. Unfortunately, there seems to a be a problem with uploading tarball distributions to PyPI at the moment, so I'm going to hold off on releasing 1.3.1 for today. Hopefully that gets resolved soon so we can release this hotfix.

@AlekseyEf

This comment has been minimized.

Copy link

commented Apr 12, 2016

Incorrect test above.

class MemUseHandler(tornado.web.RequestHandler):
    def get(self):
        response = dict(
            memory_used=resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
        )
        self.write(response)

"ru_maxrss maximum resident set size" (https://docs.python.org/2/library/resource.html#resource.getrusage)

import psutil

class MemUseHandler(tornado.web.RequestHandler):
    def get(self):
        process = psutil.Process()
        response = dict(
            memory_used=process.memory_info()[0]
        )
        self.write(response)

https://pythonhosted.org/psutil/#psutil.Process.memory_info

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants
You can’t perform that action at this time.