New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve price performance for 1000+ asset universes. #2108

Merged
merged 5 commits into from Feb 14, 2018

Conversation

Projects
None yet
3 participants
@ehebert
Member

ehebert commented Feb 13, 2018

Improve performance for algorithms which consider and/or hold positions for 1000+ assets, especially liquid assets.

PERF: Optimize price for liquid assets.

When using price, the call to last_traded_dt for every value retrieved
became a noticeable bottleneck in algorithms which used over 1000 assets.

Instead of calling last_traded_dt before every retrieval of a close for the
price field, assume that close will retrieve a non-empty value, and then
forward fill if it is empty.

This change optimizes for the case of a tradeable universe which is predominately
composed of liquid assets.

EDIT: (The PR originally had a change to which column is used for the minute bar reader's last_traded_dt, but it was removed; to be implemented in a separate PR.)

ehebert added some commits Feb 13, 2018

PERF: Use one field when getting the last traded dt.
Most often, when requesting the `last_traded_dt`, it is for retrieving the
forward filled price, which is based on the `close` column.

Instead of using `volume` to get the `last_traded_dt`, use the `close`. This
should particularly help workloads which are retrieving the `price` for many
assets, by reducing the number of I/O operations and making it less likely that
the desired `close` blocks stay in the page cache, since the `volume` and
`close` blocks would not be competing for space.
PERF: Optimize price for liquid assets.
When using `price`, the call to `last_traded_dt` for every value retrieved
became a noticeable bottleneck in algorithms which used over 1000 assets.

Instead of calling `last_traded_dt` before every retrieval of a `close` for the
`price` field, assume that `close` will retrieve a non-empty value, and then
forward fill if it is empty.

This change optimizes for the case of a tradeable universe which is predominately
composed of liquid assets.

@ehebert ehebert requested a review from ssanderson Feb 13, 2018

@ssanderson

@ehebert took a pass here. I think "use price instead of volume" change is trickier than expected, so I'd probably recommend splitting it out into a separate PR. For the "don't seek if we don't have to" change, it looks correct to me, but I think we could simplify the logic here quite a bit.

return np.nan
else:
return 0
else:

This comment has been minimized.

@ssanderson

ssanderson Feb 13, 2018

Member

I don't think this needs to be nested in an else: block. Every path out of the not ffill block returns a value.

This comment has been minimized.

@ehebert

ehebert Feb 13, 2018

Member

Good point. I'll remove the nesting.

if column == 'volume':
return 0
else:
if column != 'volume':

This comment has been minimized.

@ssanderson

ssanderson Feb 13, 2018

Member

This logic is duplicated here and in the not ffill branch. Could we convert this into a function call or dict lookup?

This comment has been minimized.

@@ -1140,7 +1140,7 @@ def get_last_traded_dt(self, asset, dt):
return self._pos_to_minute(minute_pos)

def _find_last_traded_position(self, asset, dt):
volumes = self._open_minute_file('volume', asset)
volumes = self._open_minute_file('close', asset)

This comment has been minimized.

@ssanderson

ssanderson Feb 13, 2018

Member

Seems like volumes is no longer the right name for this variable...

This comment has been minimized.

@ssanderson

ssanderson Feb 13, 2018

Member

Ditto self._known_zero_volume_dict.

This comment has been minimized.

@ssanderson

ssanderson Feb 13, 2018

Member

In general, it seems like there's some other cleanup we need to do for this change. I'd say we should probably break this up into a separate change, esp. since it seems from preliminary testing like avoiding unnecessary backwards-seeks gets us most of the perf win?

This comment has been minimized.

@ssanderson

ssanderson Feb 13, 2018

Member

For example, I think this is also used for last_traded, where it seems plausible that we should actually be using nonzero volume to define this.

# Optimize the best case scenario of a liquid asset
# returning a valid price.
result = reader.get_value(asset.sid, dt, column)
if column != 'volume':

This comment has been minimized.

@ssanderson

ssanderson Feb 13, 2018

Member

Since the body of all of these checks is the same, could we coalesce all the conditionals here into something like:

    is_valid = (
        (column != 'volume' and not pd.isnull(result)) or
        (result == 0)
    )
    if is_valid:
        return result

Better yet, since we don't support ffill for non-price fields, could this just be:

assert column == 'price'  # might be 'close'?
if not pd.isnull(result):
    return result

This comment has been minimized.

@ehebert

ehebert Feb 13, 2018

Member

You are right. We only ever ffill the close, turning it into price. Removing the checks for volume once the function reaches the ffill is True state.

@ehebert

This comment has been minimized.

Member

ehebert commented Feb 13, 2018

@ehebert took a pass here. I think "use price instead of volume" change is trickier than expected, so I'd probably recommend splitting it out into a separate PR. For the "don't seek if we don't have to" change, it looks correct to me, but I think we could simplify the logic here quite a bit.

I agree, I'll remove that commit from this PR.

@coveralls

This comment has been minimized.

coveralls commented Feb 13, 2018

Coverage Status

Coverage decreased (-0.003%) to 87.567% when pulling 484f8ac on improve-price-performance into 4002a03 on master.

@ssanderson

This comment has been minimized.

Member

ssanderson commented Feb 14, 2018

LGTM

@ehebert ehebert merged commit ba60392 into master Feb 14, 2018

2 checks passed

continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details

@ehebert ehebert deleted the improve-price-performance branch Feb 14, 2018

jdricklefs added a commit that referenced this pull request Feb 14, 2018

Improve price performance for 1000+ asset universes. (#2108)
PERF: Optimize price for liquid assets.

When using `price`, the call to `last_traded_dt` for every value retrieved
became a noticeable bottleneck in algorithms which used over 1000 assets.

Instead of calling `last_traded_dt` before every retrieval of a `close` for the
`price` field, assume that `close` will retrieve a non-empty value, and then
forward fill if it is empty.

This change optimizes for the case of a tradeable universe which is predominately
composed of liquid assets.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment