Equity caching #830

mtydykov · 2015-11-11T16:39:25Z

Adding an extension to AssetFinder that caches all equities in memory and overrides lookup_symbol to look up equities in the cache rather than in the database. This is meant to improve performance for anything that needs to look up symbols for a large quantity of data.

I tested performance by doing a databazaar historical load of a 10k-line CSV file. The load took about 2 minutes without caching, and about 3 seconds with caching.
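
For readers skimming the thread, here is a minimal sketch of the general idea (load every equity once, then answer lookups from an in-memory dict). It is not the code in this PR: `CachedEquityLookup`, the namedtuple `Equity` stand-in, and its fields are all hypothetical, and the as-of-date rule is simplified.

```python
from collections import defaultdict, namedtuple
from datetime import date

# Hypothetical stand-in for zipline's Equity; the fields are assumptions.
Equity = namedtuple("Equity", ["sid", "symbol", "start_date", "end_date"])


class CachedEquityLookup(object):
    """Toy cache: index every equity once, then answer lookups from a dict."""

    def __init__(self, equities):
        # Group equities by symbol so a lookup is one dict access
        # instead of one database query per symbol.
        self._by_symbol = defaultdict(list)
        for equity in equities:
            self._by_symbol[equity.symbol].append(equity)

    def lookup_symbol(self, symbol, as_of_date):
        # Simplified rule (an assumption): keep equities that had started
        # trading by as_of_date and prefer the most recent start date.
        candidates = [
            e for e in self._by_symbol.get(symbol, [])
            if e.start_date <= as_of_date
        ]
        if not candidates:
            raise KeyError(symbol)
        return max(candidates, key=lambda e: e.start_date)


finder = CachedEquityLookup([
    Equity(1, "AAA", date(2000, 1, 1), date(2005, 1, 1)),
    Equity(2, "AAA", date(2010, 1, 1), date(2015, 1, 1)),
    Equity(3, "BBB", date(2001, 1, 1), date(2015, 1, 1)),
])
```

The trade-off is memory for speed: the whole equity table lives in the process, which is why the gain shows up on bulk loads that resolve many symbols.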

mtydykov · 2015-11-11T16:40:30Z

@kglowinski

jfkirk · 2015-11-11T17:46:35Z

zipline/assets/assets.py

+        for equity in self.hashed_equities:
+            fuzzy_symbol = equity['fuzzy_symbol']
+            if fuzzy_symbol not in self.fuzzy_symbol_hashed_equities:
+                self.fuzzy_symbol_hashed_equities[fuzzy_symbol] = []


Could this be simplified by using a setdefault?

self.fuzzy_symbol_hashed_equities.setdefault(fuzzy_symbol, []).append(asset)
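
A small standalone comparison of the two forms, in case it helps. The sample rows are made up, and since the diff hunk above is cut off after the `= []` line, the `append` that follows is my assumption about what the loop does next.

```python
# Hypothetical rows standing in for the cached equity records.
hashed_equities = [
    {"sid": 1, "fuzzy_symbol": "BRKA"},
    {"sid": 2, "fuzzy_symbol": "AAPL"},
    {"sid": 3, "fuzzy_symbol": "BRKA"},
]

# Explicit membership check, in the style of the diff.
explicit = {}
for equity in hashed_equities:
    fuzzy_symbol = equity["fuzzy_symbol"]
    if fuzzy_symbol not in explicit:
        explicit[fuzzy_symbol] = []
    explicit[fuzzy_symbol].append(equity)  # assumed continuation

# Same grouping with dict.setdefault: it returns the existing list,
# or inserts and returns a new empty one, in a single call.
grouped = {}
for equity in hashed_equities:
    grouped.setdefault(equity["fuzzy_symbol"], []).append(equity)
```

Both loops build the same dict of lists; `setdefault` just folds the check and the insert into one expression.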

jfkirk · 2015-11-11T18:38:48Z

I know nothing would make @llllllllll's day like a whatsnew entry

jfkirk · 2015-11-11T18:40:07Z

zipline/assets/assets.py

+        asset = Equity(**data)
+        return asset
+
+    def lookup_symbol(self, symbol, as_of_date, fuzzy=False):


This method seems to have a great deal of overlap with the lookup_symbol that it is overriding. Would it be possible to factor the actual selections out of both AssetFinders and have the main AssetFinder's lookup_symbol call those factored-out selectors, so that any change to the lookup_symbol logic only needs to happen in one place?

Sure, I think that should be possible.

Ok, I've refactored the selection logic in lookup_symbol so that AssetFinder gets candidates from the database and AssetFinderCachedEquities gets candidates from its dictionaries. Let me know if this kind of refactoring is what you had in mind.
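
To illustrate the shape of that refactoring, here is a rough sketch, not the actual diff. `BaseFinder`, `CachedFinder`, the row dicts, and the "latest start wins" rule are all hypothetical; only the method name `get_fuzzy_candidates` is taken from this PR.

```python
class BaseFinder(object):
    """Hypothetical stand-in for AssetFinder."""

    def __init__(self, rows):
        # Pretend this list is the equities table in the database.
        self._rows = rows

    def get_fuzzy_candidates(self, fuzzy_symbol):
        # Base class: scan the "database" on every call.
        return [r for r in self._rows if r["fuzzy_symbol"] == fuzzy_symbol]

    def lookup_symbol(self, fuzzy_symbol):
        # The selection logic lives only here; subclasses change where
        # candidates come from, not how the winner is chosen.
        candidates = self.get_fuzzy_candidates(fuzzy_symbol)
        if not candidates:
            raise LookupError(fuzzy_symbol)
        return max(candidates, key=lambda r: r["start"])


class CachedFinder(BaseFinder):
    """Hypothetical stand-in for AssetFinderCachedEquities."""

    def __init__(self, rows):
        super(CachedFinder, self).__init__(rows)
        # Build the in-memory index once, up front.
        self._fuzzy_cache = {}
        for row in rows:
            self._fuzzy_cache.setdefault(row["fuzzy_symbol"], []).append(row)

    def get_fuzzy_candidates(self, fuzzy_symbol):
        # Subclass: answer from the dict instead of scanning.
        return self._fuzzy_cache.get(fuzzy_symbol, [])


rows = [
    {"sid": 1, "fuzzy_symbol": "AAA", "start": 2000},
    {"sid": 2, "fuzzy_symbol": "AAA", "start": 2010},
    {"sid": 3, "fuzzy_symbol": "BBB", "start": 2001},
]
```

With this layout `lookup_symbol` is defined once, and both finders return the same result for the same input because only the candidate source differs.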

mtydykov · 2015-11-11T22:10:12Z

Done

jfkirk · 2015-11-12T15:08:33Z

zipline/assets/assets.py

+        asset = Equity(**data)
+        return asset
+
+    def get_fuzzy_candidates(self, fuzzy_symbol):


Since these methods are internal API, you can prefix them with _, as in

def _get_fuzzy_candidates(self, fuzzy_symbol):

so that it is clear, when working with an AssetFinder, that these methods are not meant to be accessed directly.

jfkirk · 2015-11-12T15:36:38Z

Once Jenkins passes, LGTM 👍

richafrank · 2015-11-12T15:51:37Z

zipline/assets/assets.py

@@ -678,3 +730,113 @@ class AssetConvertible(with_metaclass(ABCMeta)):

 class NotAssetConvertible(ValueError):
    pass
+
+
+class AssetFinderCachedEquities(AssetFinder):


Do you think we should name this with something more specific to Equities, since the behavior it adds to the base AssetFinder doesn't apply to all Assets?

@richafrank Are you thinking something like "EquityFinder"? I was trying to think of other names, but nothing really jumped out at me as this object does still support all of the non-Equity functions of the AssetFinder.

Oh you know what - I totally missed the word Equities in the name as it currently stands. I think it's a good name as is.

mtydykov assigned jfkirk Nov 11, 2015

jfkirk reviewed Nov 11, 2015

jfkirk reviewed Nov 12, 2015

richafrank reviewed Nov 12, 2015

Maya Tydykov added 2 commits November 12, 2015 11:02

ENH: add extension to AssetFinder that caches all equities in memory and uses that cache in lookup_symbol.

d0cb5bd

DOC: add whatsnew entry for AssetFinderCachedEquities.

df492ec

mtydykov force-pushed the equity_caching branch from 7f88bc7 to df492ec Compare November 12, 2015 16:04

mtydykov pushed a commit that referenced this pull request Nov 12, 2015

Merge pull request #830 from quantopian/equity_caching

1fe4dfe

Equity caching

mtydykov merged commit 1fe4dfe into master Nov 12, 2015

mtydykov deleted the equity_caching branch November 12, 2015 19:01
