
handle_market_close fires before daily trades #241

Closed
bencpeters opened this Issue Nov 14, 2013 · 8 comments

bencpeters commented Nov 14, 2013

I've been experimenting with zipline to backtest a strategy that uses a custom DataSource that gives me a feed of 2-4 intraday timestamps for algorithm evaluation and possible portfolio re-balancing. Using daily emission_rate and minute data_frequency, and relying on the default benchmark data, the handle_market_close events fire before any of that day's calls to handle_data and the subsequent order execution, etc. This happens regardless of whether instant_fill is set to True or False.

The reason this is happening is that the default benchmark timeseries uses zipline's "daily"-style timestamps (midnight of the current day UTC), which is before US market intraday stamped events. When the event stream gets sorted, the benchmark events thus get processed before that day's intraday events, leading to get_message and then handle_market_close firing and rolling over self.todays_performance before the intraday events get added to it. It looks to me like this is a supported configuration, because of this snippet from zipline/finance/performance/tracker.py:

            if (
                self.sim_params.data_frequency == 'minute'
                and
                self.sim_params.emission_rate == 'daily'
            ):
                # Minute data benchmarks should have a timestamp of market
                # close, so that calculations are triggered at the right time.
                # However, risk module uses midnight as the 'day'
                # marker for returns, so adjust back to midnight.
                midnight = event.dt.replace(
                    hour=0,
                    minute=0,
                    second=0,
                    microsecond=0)
            else:
                midnight = event.dt

However, the default benchmarks still come back with the midnight time slices. Maybe this is a weird configuration, and there's some better way to achieve this (or I should be providing my own custom benchmark if I'm doing intraday data events like this), but I think it could be fixed very easily with something along these lines: 927ee7f
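
To make the ordering problem concrete, here's a small standalone illustration (pure pandas; the 16:00 US/Eastern close and the re-stamping helper are just for illustration, not the actual patch):

import pandas as pd

# A benchmark event stamped at midnight UTC sorts *before* every US-market
# intraday event for the same session, so the tracker sees it first.
benchmark_dt = pd.Timestamp('2013-11-14 00:00:00', tz='UTC')
intraday_dt = pd.Timestamp('2013-11-14 15:30:00', tz='US/Eastern').tz_convert('UTC')
assert benchmark_dt < intraday_dt

def stamp_with_market_close(dt, close_hour=16):
    # Illustrative only: re-stamp the midnight-UTC session label with an
    # assumed 16:00 US/Eastern close on the same calendar day.
    day = pd.Timestamp(dt.strftime('%Y-%m-%d'), tz='US/Eastern')
    return (day + pd.Timedelta(hours=close_hour)).tz_convert('UTC')

# After re-stamping, the benchmark event sorts after the intraday events.
assert stamp_with_market_close(benchmark_dt) > intraday_dt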

Thoughts?

ehebert commented Nov 15, 2013

@bencpeters, very astute. For some more context, the data_frequency='minute' / emission_rate='daily' combination is analogous to the current 'minute' backtest mode on Quantopian.

Your patch does look like a potential solution.

Another possibility is to modify the daily data source so that it takes a 'mark_with_close' option that replaces the timestamps with the market close at the data source level.
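
Roughly what I'm picturing (purely a sketch - the event objects and the close lookup here are illustrative, not existing zipline code):

class DailyBenchmarkSource(object):
    """Sketch of a daily source with an opt-in 'mark_with_close' flag.

    `events` is any iterable of objects with a mutable `dt` attribute, and
    `get_close_for_day` stands in for whatever calendar lookup we would
    actually use - neither is existing zipline API.
    """

    def __init__(self, events, mark_with_close=False, get_close_for_day=None):
        self.events = events
        self.mark_with_close = mark_with_close
        self.get_close_for_day = get_close_for_day

    def __iter__(self):
        for event in self.events:
            if self.mark_with_close:
                # Replace the midnight-UTC stamp with that session's close
                # so the event sorts after the day's intraday trade events.
                event.dt = self.get_close_for_day(event.dt)
            yield event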

twiecki commented Nov 15, 2013

Agreed, very observant @bencpeters.

I do think @ehebert's suggested fix is a little cleaner, as we should be able to do away with the whole if-clause.

Is the data coming from Yahoo Finance midnight-timed? We could also just change that function to provide correct times and have the backtester assume daily data is timestamped for that particular day.

bencpeters commented Dec 10, 2013

The data coming from yahoo finance is mapped according to this:

_BENCHMARK_MAPPING = {
    # Need to add 'symbol'
    'volume': (int, 'Volume'),
    'open': (float, 'Open'),
    'close': (float, 'Close'),
    'high': (float, 'High'),
    'low': (float, 'Low'),
    'adj_close': (float, 'Adj Close'),
    'date': (partial(date_conversion, date_pattern='%Y-%m-%d'), 'Date')
}

The raw CSV from Yahoo just has dates for timestamps. Daily returns are calculated in get_benchmark_returns (zipline/data/benchmarks.py) and stored as a native list. Then, in zipline/data/loader.py, an iteration goes through the list of benchmark returns and wraps each one in a DailyReturn object (from zipline/protocol.py). The constructor for DailyReturn carefully zeros out the hour/min/sec of the timestamp, yielding the behavior described above.

Thus, the easiest way to fix this might just be to modify the default behaviour of the DailyReturn class constructor to replace timestamps with exchange closing timestamps rather than UTC midnight. If that sounds good, I'll throw together a pull request with that change?

bencpeters commented Dec 10, 2013

Never mind - I was looking at an old version of the code. DailyReturn is now a namedtuple rather than a full class, and it looks like pandas' normalize_date gets called to zero out timestamps to midnight.

I'd propose a fix similar to what I described in the last comment: add something in either load_market_data or get_benchmark_returns to normalize benchmark timestamps from the midnight-UTC daily ones to the currently configured exchange close. Since get_benchmark_returns is called by dump_benchmarks, which in turn saves benchmarks to a CSV file for caching, it's probably best to apply the market-close transformation in load_market_data rather than in one of the lower-level functions - that way the cached benchmark data isn't affected by differences in exchange time configuration.
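
Roughly, the transformation in load_market_data could look something like this (a sketch only - the close_times mapping stands in for however we'd look up each session's close; it's not existing zipline code):

import pandas as pd

def stamp_benchmarks_with_close(benchmark_returns, close_times):
    # `benchmark_returns` is assumed to be a Series indexed by midnight-UTC
    # session dates (as it comes out of the cache); `close_times` maps each
    # of those dates to the exchange close for that session.
    new_index = pd.DatetimeIndex([close_times[dt] for dt in benchmark_returns.index])
    return pd.Series(benchmark_returns.values, index=new_index)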

ehebert commented Dec 10, 2013

@bencpeters I concur that load_market_data seems like the right place to wire this functionality in.

Converting the data on each load from the cache may end up being an inefficiency, but one that I think is livable while we make sure the behavior is correct when using the closing timestamps.

Would love to see a PR, thanks again for doing the investigation into this.

jkp commented Dec 31, 2013

I ran into this problem while trying to fix another bug with emission_rate / data_frequency set to minute. In attempting a fix I found a problem with the proposed workaround: the benchmark data may well stretch back beyond the first trading day in the current environment.

I'm not sure it's correct to hack this into load_market_data - if it's done that way, there probably needs to be an extra argument to change the behaviour so that it returns filtered and adjusted data (based on the trading days of the TradingEnvironment and the closing time for each one).
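
For example, something like this behind the extra argument (sketch only - trading_days and get_close are placeholders for whatever the TradingEnvironment actually exposes):

import pandas as pd

def filter_and_adjust_benchmarks(benchmark_returns, trading_days, get_close):
    # Drop benchmark rows that fall outside the TradingEnvironment's
    # trading days, then re-stamp the survivors with each day's close.
    filtered = benchmark_returns[benchmark_returns.index.isin(trading_days)]
    new_index = pd.DatetimeIndex([get_close(day) for day in filtered.index])
    return pd.Series(filtered.values, index=new_index)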

Thoughts?

jkp commented Jan 2, 2014

I did some more testing and found that we do need to detect data_frequency or things break for the other case. I added a unit test to cover the change and keep things in shape in the future.

twiecki added a commit that referenced this issue Jan 30, 2014

BUG: adjust benchmark events to match market hours
Previously benchmark events were emitted at 0:00 on the day the
benchmark related to: in 'minute' emission mode this meant that
the benchmarks were emitted before any intra-day trades were
processed.

See: #241

twiecki commented Jan 30, 2014

OK, this is merged with 07e25ae. Thanks for fixing this @jkp and @bencpeters

@twiecki closed this Jan 30, 2014
