Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Equity caching #830

Merged
merged 2 commits into from
Nov 12, 2015
Merged

Equity caching #830

merged 2 commits into from
Nov 12, 2015

Conversation

mtydykov
Copy link
Contributor

Adding an extension to AssetFinder that caches all equities and overrides lookup_symbol to look up equities in the cache rather than in the database. This is meant to improve performance for anything that needs to lookup symbols for a large quantity of data.

I tested performance by doing a databazaar historical load of a 10k-line csv file. The load took about 2 minutes without caching, and about 3 seconds with caching.

@mtydykov
Copy link
Contributor Author

@kglowinski

for equity in self.hashed_equities:
fuzzy_symbol = equity['fuzzy_symbol']
if fuzzy_symbol not in self.fuzzy_symbol_hashed_equities:
self.fuzzy_symbol_hashed_equities[fuzzy_symbol] = []
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this be simplified by using a setdefault?

self.fuzz_symbol_hashed_equities.setdefault(fuzzy_symbol, []).append(asset)

@jfkirk
Copy link
Contributor

jfkirk commented Nov 11, 2015

I know nothing would make @llllllllll 's day like a whatsnew entry

asset = Equity(**data)
return asset

def lookup_symbol(self, symbol, as_of_date, fuzzy=False):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method seems to have a great deal of overlap with the lookup_symbol that it is overriding. Would it be possible to factor out the actual selections from both AssetFinders, and have the main AssetFinder's lookup_symbol call those factored-out selectors, so then any changes to the lookup_symbol logic need only happen at one level?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I think that should be possible.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I've refactored the selection logic in lookup_symbol so that AssetFinder gets candidates from the database and AssetFinderCachedEquities gets candidates from its dictionaries. Let me know if this kind of refactoring is what you had in mind.

@mtydykov
Copy link
Contributor Author

Done

asset = Equity(**data)
return asset

def get_fuzzy_candidates(self, fuzzy_symbol):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since these methods are internal API, you can prepend them with _ as in

def _get_fuzzy_candidates(self, fuzzy_symbol):

so that it is clear, when working with an AssetFinder, that these methods are not meant to be accessed directly.

@jfkirk
Copy link
Contributor

jfkirk commented Nov 12, 2015

Once Jenkins passes, LGTM 👍

@@ -678,3 +730,113 @@ class AssetConvertible(with_metaclass(ABCMeta)):

class NotAssetConvertible(ValueError):
pass


class AssetFinderCachedEquities(AssetFinder):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think we should name this with something more specific to Equities, since the behavior it adds to the base AssetFinder doesn't apply to all Assets?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@richafrank Are you thinking something like "EquityFinder"? I was trying to think of other names, but nothing really jumped out at me as this object does still support all of the non-Equity functions of the AssetFinder.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh you know what - I totally missed the word Equities in the name as it currently stands. I think it's a good name as is.

mtydykov pushed a commit that referenced this pull request Nov 12, 2015
@mtydykov mtydykov merged commit 1fe4dfe into master Nov 12, 2015
@mtydykov mtydykov deleted the equity_caching branch November 12, 2015 19:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants