-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Equity caching #830
Equity caching #830
Conversation
for equity in self.hashed_equities: | ||
fuzzy_symbol = equity['fuzzy_symbol'] | ||
if fuzzy_symbol not in self.fuzzy_symbol_hashed_equities: | ||
self.fuzzy_symbol_hashed_equities[fuzzy_symbol] = [] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could this be simplified by using a setdefault?
self.fuzz_symbol_hashed_equities.setdefault(fuzzy_symbol, []).append(asset)
I know nothing would make @llllllllll 's day like a whatsnew entry |
asset = Equity(**data) | ||
return asset | ||
|
||
def lookup_symbol(self, symbol, as_of_date, fuzzy=False): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method seems to have a great deal of overlap with the lookup_symbol that it is overriding. Would it be possible to factor out the actual selections from both AssetFinders, and have the main AssetFinder's lookup_symbol call those factored-out selectors, so then any changes to the lookup_symbol logic need only happen at one level?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, I think that should be possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I've refactored the selection logic in lookup_symbol so that AssetFinder gets candidates from the database and AssetFinderCachedEquities gets candidates from its dictionaries. Let me know if this kind of refactoring is what you had in mind.
Done |
asset = Equity(**data) | ||
return asset | ||
|
||
def get_fuzzy_candidates(self, fuzzy_symbol): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since these methods are internal API, you can prepend them with _ as in
def _get_fuzzy_candidates(self, fuzzy_symbol):
so that it is clear, when working with an AssetFinder, that these methods are not meant to be accessed directly.
Once Jenkins passes, LGTM 👍 |
@@ -678,3 +730,113 @@ class AssetConvertible(with_metaclass(ABCMeta)): | |||
|
|||
class NotAssetConvertible(ValueError): | |||
pass | |||
|
|||
|
|||
class AssetFinderCachedEquities(AssetFinder): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think we should name this with something more specific to Equities, since the behavior it adds to the base AssetFinder doesn't apply to all Assets?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@richafrank Are you thinking something like "EquityFinder"? I was trying to think of other names, but nothing really jumped out at me as this object does still support all of the non-Equity functions of the AssetFinder.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh you know what - I totally missed the word Equities in the name as it currently stands. I think it's a good name as is.
…and uses that cache in lookup_symbol.
7f88bc7
to
df492ec
Compare
Adding an extension to AssetFinder that caches all equities and overrides lookup_symbol to look up equities in the cache rather than in the database. This is meant to improve performance for anything that needs to lookup symbols for a large quantity of data.
I tested performance by doing a databazaar historical load of a 10k-line csv file. The load took about 2 minutes without caching, and about 3 seconds with caching.