Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement cache meachanism to speed up module search #175

Merged
merged 7 commits into from Mar 6, 2018

Conversation

hltbra
Copy link
Collaborator

@hltbra hltbra commented Mar 6, 2017

This should fix the performance issues reported in issue #69.

I'd like to hear from other people that experienced the slowdowns to see if this pull request will fix their issues.

cc @Deimos

@coveralls
Copy link

coveralls commented Mar 6, 2017

Coverage Status

Coverage increased (+0.6%) to 96.456% when pulling b23129e on feature/speed-up-module-search into 88f6990 on master.

@Deimos
Copy link

Deimos commented Mar 7, 2017

Works without any modifications necessary now, results seem about the same as I mentioned in there with the previous version - about 0.25s duration on the first test, and then it's difficult to tell if there's any change to other ones. Seems to just add about 0.25s to the run time in total, from a few unscientific runs.

freezegun/api.py Outdated
if _get_module_attributes_hash(module) == module_hash:
return cached_attrs
else:
return _get_module_attributes(module)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to be setting these back on global_modules_cache to update the cache?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure.

freezegun/api.py Outdated
result = {}
for mod_name, module in list(sys.modules.items()):
# ignore this module
if mod_name == __name__:
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@spulec: in the _freeze_time.start method I saw a similar logic (if mod_name is None or module is None) but I didn't understand the None checks and didn't put them here. When are they None?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

answering myself: module is None when it's a freezegun module.

@coveralls
Copy link

Coverage Status

Coverage increased (+0.6%) to 96.474% when pulling 199c655 on feature/speed-up-module-search into 88f6990 on master.

1 similar comment
@coveralls
Copy link

coveralls commented Mar 16, 2017

Coverage Status

Coverage increased (+0.6%) to 96.474% when pulling 199c655 on feature/speed-up-module-search into 88f6990 on master.

@hltbra hltbra force-pushed the feature/speed-up-module-search branch from 199c655 to 8d5193e Compare March 16, 2017 20:00
@coveralls
Copy link

coveralls commented Mar 16, 2017

Coverage Status

Coverage increased (+0.6%) to 96.474% when pulling 8d5193e on feature/speed-up-module-search into 88f6990 on master.

@hltbra
Copy link
Collaborator Author

hltbra commented Mar 20, 2017

I will test this against moto later. I think there's a bug after running this against a work (private) project.

@nolanlum
Copy link

nolanlum commented May 9, 2017

Any update on merging this PR? I'm also interested in the speedups this provides, but I'm running into a strange bug with the module cache:

module = <AliasModule 'py.error' for 'py._error.error'>

    def _setup_module_cache(module):
        global _GLOBAL_MODULES_CACHE
        date_attrs = []
        all_module_attributes = _get_module_attributes(module)
        for attribute_name, attribute_value in all_module_attributes:
            if id(attribute_value) in _real_time_object_ids:
                date_attrs.append((attribute_name, attribute_value))
>       _GLOBAL_MODULES_CACHE[module.__name__] = (_get_module_attributes_hash(module), date_attrs)
E       AttributeError: __name__

/usr/local/lib/python2.7/site-packages/freezegun/api.py:85: AttributeError

for attribute_name, attribute_value in all_module_attributes:
if id(attribute_value) in _real_time_object_ids:
date_attrs.append((attribute_name, attribute_value))
_GLOBAL_MODULES_CACHE[module.__name__] = (_get_module_attributes_hash(module), date_attrs)
Copy link
Contributor

@azmeuk azmeuk Aug 29, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I encountered this bug:

local.virtualenv/lib/python2.7/site-packages/freezegun/api.py:557: in wrapper
    with self:
local.virtualenv/lib/python2.7/site-packages/freezegun/api.py:434: in __enter__
    return self.start()
local.virtualenv/lib/python2.7/site-packages/freezegun/api.py:490: in start
    module_attrs = _get_cached_module_attributes(mod_name, module)
local.virtualenv/lib/python2.7/site-packages/freezegun/api.py:96: in _get_cached_module_attributes
    global_modules_cache = _get_global_modules_cache()
local.virtualenv/lib/python2.7/site-packages/freezegun/api.py:53: in _get_global_modules_cache
    _setup_modules_cache()
local.virtualenv/lib/python2.7/site-packages/freezegun/api.py:75: in _setup_modules_cache
    _setup_module_cache(module)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

module = <AliasModule 'py.error' for 'py._error.error'>

    def _setup_module_cache(module):
        global _GLOBAL_MODULES_CACHE
        #if not hasattr(module, "__name__"):
        #    return
    
        date_attrs = []
        all_module_attributes = _get_module_attributes(module)
        for attribute_name, attribute_value in all_module_attributes:
            if id(attribute_value) in _real_time_object_ids:
                date_attrs.append((attribute_name, attribute_value))
>       _GLOBAL_MODULES_CACHE[module.__name__] = (_get_module_attributes_hash(module), date_attrs)
E       AttributeError: __name__

all_module_attributes = []
date_attrs = []
module     = <AliasModule 'py.error' for 'py._error.error'>

local.virtualenv/lib/python2.7/site-packages/freezegun/api.py:88: AttributeError

It seems ok with the following code:

def _setup_modules_cache():
    for mod_name, module in list(sys.modules.items()):
        if mod_name == __name__ or not mod_name or not module or not hasattr(module, "__name__"):
            continue

    ...

@azmeuk
Copy link
Contributor

azmeuk commented Aug 29, 2017

@hltbra @spulec I am very interested in improving the performances of freezegun. Is there something I can do to help with this PR?

nolanlum pushed a commit to lyft/freezegun that referenced this pull request Mar 1, 2018
@spulec spulec merged commit 652bb7d into master Mar 6, 2018
@spulec spulec deleted the feature/speed-up-module-search branch March 6, 2018 01:17
@spulec
Copy link
Owner

spulec commented Mar 6, 2018

Thanks!

This was referenced Mar 6, 2018


def _get_module_attributes_hash(module):
return '{0}-{1}'.format(id(module), hashlib.md5(','.join(dir(module)).encode('utf-8')).hexdigest())
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In line 63, you catch a TypeError when you call dir(module), here you don't. If a TypeError is raised there, that doesn't seem to prevent this function from being called. Is that an issue?
Why don't you pass the already existing all_module_attributes to this function and extract the names from that?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, fixed with 689a7bb.

freezegun/api.py Outdated
@@ -66,7 +66,7 @@ def _setup_modules_cache():
# FIXME: move this definition to be at the top-level
real_time_object_ids = set(id(obj) for obj in real_date_objects)
result = {}
for mod_name, module in sys.modules.items():
for mod_name, module in list(sys.modules.items()):

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's unnecessary - if sys.modules is dict , then sys.modules.items() is iterable containing pairs of key and value in both py2 and py3
casting it to list will not break it, but slow down a bit

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without creating the copy, I'm occasionally getting a RuntimeError: dictionary changed size during iteration, which makes sense since additional modules can get loaded.

Ideas?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would assume that the size of that collection is small enough so that casting to list once does not cause a relevant slowdown.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so commit message is misleading

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, the PR title is about an architectural optimization that saves seconds of run time, not microoptimizations that make a difference of a few microseconds.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant this particular commit, not entire PullRequest
Reading Make sys.modules.items() compatible with python 3 I expect some py3 adjustment, but I get runtime error prevention.
That was my point, sir :)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The commit message is not misleading. In python 2, items() returns a list (copy), which is what is needed here. But in python 3 it returns an iterator, which causes faulty behavior. So it is casted to list for python 3 support.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

@pzelnip
Copy link

pzelnip commented Mar 7, 2018

With 0.3.10 over 0.3.9 I'm seeing about a 16% speedup. Not bad.

@mariuszthecoder
Copy link

How do you measure speed improvement?

@pzelnip
Copy link

pzelnip commented Mar 9, 2018

I ran the tests and looked at the time that nose said it took to run them. Did that 4 times and took the average. Then uninstalled 0.3.9 & installed 0.3.10 and did the same. The difference between the two averages was about 16%.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

9 participants