PERF: Remove or defer calls to get_loc on large indices. #1504

Merged
merged 1 commit into from Sep 21, 2016

@ssanderson
Member

ssanderson commented Sep 21, 2016

Mitigation for #1503.

This was debugged primarily by adding the following lines to zipline/tests/__init__.py:

+import os
+from unittest import TestCase
+
+import psutil
+import humanize
+import __builtin__  # Python 2 name; the module is `builtins` on Python 3.
+
+pid = os.getpid()
+proc = psutil.Process(pid)
+
+
+def get_mem_usage():
+    # USS is the memory unique to this process; format it for humans.
+    return humanize.naturalsize(proc.memory_full_info().uss)
+
+# Expose as a builtin so gmem() can be called from anywhere while debugging.
+__builtin__.gmem = get_mem_usage
+
+
+real_doCleanups = TestCase.doCleanups
+
+OUTPUT_FILE = open('/home/ssanderson/pandas18_memory_usage.txt', 'w')
+
+
+def patched_doCleanups(self, *args, **kwargs):
+    # Log current memory usage and the test id each time a test cleans up.
+    OUTPUT_FILE.write("{} {}\n".format(get_mem_usage(), self))
+    OUTPUT_FILE.flush()
+    return real_doCleanups(self, *args, **kwargs)
+
+TestCase.doCleanups = patched_doCleanups
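
For anyone who wants to try the same trick without third-party packages, here is a rough Python 3 sketch of the patching pattern. It swaps in the standard library's `tracemalloc` for `psutil`, so it reports Python-level allocations rather than USS and the numbers are not comparable with the ones above. The names `install_memory_logger` and `run_demo` are made up for this example.

```python
import io
import tracemalloc
import unittest


def install_memory_logger(stream):
    """Wrap TestCase.doCleanups so each test logs traced memory to `stream`.

    Returns a function that restores the original doCleanups.
    """
    real_do_cleanups = unittest.TestCase.doCleanups

    def patched_do_cleanups(self, *args, **kwargs):
        # get_traced_memory() returns (current, peak) in bytes.
        current, _peak = tracemalloc.get_traced_memory()
        stream.write("{} {}\n".format(current, self))
        stream.flush()
        return real_do_cleanups(self, *args, **kwargs)

    unittest.TestCase.doCleanups = patched_do_cleanups

    def uninstall():
        unittest.TestCase.doCleanups = real_do_cleanups

    return uninstall


def run_demo():
    """Run two trivial tests with the logger installed; return the log lines."""

    class DemoTests(unittest.TestCase):
        def test_a(self):
            self.assertTrue(True)

        def test_b(self):
            self.assertEqual(1 + 1, 2)

    log = io.StringIO()
    tracemalloc.start()
    uninstall = install_memory_logger(log)
    try:
        suite = unittest.defaultTestLoader.loadTestsFromTestCase(DemoTests)
        # Send the runner's own output to a throwaway buffer.
        unittest.TextTestRunner(stream=io.StringIO()).run(suite)
    finally:
        uninstall()
        tracemalloc.stop()
    return log.getvalue().splitlines()
```

Calling `run_demo()` returns one log line per test, each holding a byte count followed by the test's string representation. Reading the log top to bottom shows which test the memory growth starts at.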
@coveralls

coveralls commented Sep 21, 2016

Coverage increased (+0.02%) to 86.606% when pulling e86fffc on pandas18-memory-stopgaps into 7441369 on master.

@@ -703,10 +697,17 @@ def _get_history_daily_window_data(
        return daily_data
    def _handle_history_out_of_bounds(self, bar_count):
    def _handle_minute_history_out_of_bounds(self, bar_count):
        first_trading_minute_loc = (

@jbredeche

jbredeche Sep 21, 2016

Member

Can we save this calculated value somewhere so that we don't have to do it again later in the simulation?

@jbredeche

jbredeche Sep 21, 2016

Member

maybe move this into a @lazyval?

@ssanderson

ssanderson Sep 21, 2016

Member

I thought about caching this like we were doing before, but we're only calculating this offset so that we can give a useful error message in an out-of-bounds exception. We don't expect to need it again. The only case where caching would help performance is someone making invalid history calls in a loop. Do you think it's still worth having the lazyval?
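
To make the trade-off concrete, here is a minimal sketch of the two options. The `lazyval` below is a simplified stand-in written for this example, not zipline's actual implementation, and `Portal`, its attributes, and the list-based lookup are hypothetical placeholders for the real class and the `get_loc` call.

```python
class lazyval(object):
    """Simplified compute-once descriptor (illustrative only)."""

    def __init__(self, func):
        self._func = func
        self._name = func.__name__

    def __get__(self, instance, owner):
        if instance is None:
            return self
        value = self._func(instance)
        # Cache on the instance; because this is a non-data descriptor,
        # later attribute lookups find the cached value first.
        instance.__dict__[self._name] = value
        return value


class Portal(object):
    """Toy class contrasting a cached lookup with an on-demand one."""

    def __init__(self, minutes, first_minute):
        self._minutes = minutes
        self._first_minute = first_minute
        self.lookups = 0  # counts how often the expensive search runs

    def _find_first_loc(self):
        self.lookups += 1
        # A linear scan stands in for an expensive index lookup.
        return self._minutes.index(self._first_minute)

    @lazyval
    def cached_first_loc(self):
        # Computed on first access, then held for the object's lifetime.
        return self._find_first_loc()

    def on_demand_first_loc(self):
        # Recomputed on every call; nothing is retained between calls.
        return self._find_first_loc()
```

With the cached version, repeated accesses cost one search but the result stays on the object for as long as it lives. The on-demand version pays for a search on every call, which only matters if the error path is hit repeatedly.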

@jbredeche

jbredeche Sep 21, 2016

Member

I'm not certain enough to say we should do it now. LGTM. Can always tweak it later.

@ssanderson ssanderson merged commit 50457e2 into master Sep 21, 2016

2 checks passed

continuous-integration/appveyor/pr AppVeyor build succeeded
continuous-integration/travis-ci/pr The Travis CI build passed

@ssanderson ssanderson deleted the pandas18-memory-stopgaps branch Sep 21, 2016
