Support for '1m' History Frequency. #345

Merged
merged 6 commits into from Jun 5, 2014

Contributor

@ssanderson ssanderson commented Jun 4, 2014

Overhauls HistoryContainer in prep to support '1m' frequency.

Major changes:

  • Methods/variables referring to "day" have been renamed/generalized.
    • current_day_panel became buffer_panel, which is now a RollingPanel
    • prior_day_panel became a dictionary mapping Frequency objects to
      "digest panels", which are instances of RollingPanel.
  • Hard-coded daily rollover replaced with a notion of a "current window" for
    each unique frequency managed by the panel.
    • When the end of the current window is reached for a given frequency, we
      compute an aggregate bar (code refers to this as a "digest"), which is
      appended to a panel associated with that frequency.
    • Window rollover dates are managed by a pair of dictionaries,
      cur_window_starts and cur_window_closes. The Frequency class is
      responsible for computing window bounds based on the open/close of the
      previous window.
  • Semantic change to the open_price field: open_price now always
    contains the price of the first trade occurring in the given window.
    Previously it contained the price of the first minute in the window,
    returning NaN if the security happened not to trade in the first minute.
  • price is now the only field that can be forward-filled.
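
For readers unfamiliar with the "digest" idea, here is a minimal sketch of how a window of minute bars might be collapsed into one aggregate bar, including the new open_price semantics. It is not the code from this PR: the function name `digest_bars` and the column names other than open_price are assumptions made for illustration.

```python
import numpy as np
import pandas as pd


def digest_bars(minute_bars):
    """Collapse one window of minute bars into a single aggregate bar.

    Hypothetical helper: `minute_bars` is assumed to be a DataFrame indexed
    by minute with open_price, high, low, close_price and volume columns.
    """
    opens = minute_bars["open_price"].dropna()
    closes = minute_bars["close_price"].dropna()
    return pd.Series({
        # Price of the first trade in the window, even when the security
        # did not trade in the window's first minute.
        "open_price": opens.iloc[0] if len(opens) else np.nan,
        "high": minute_bars["high"].max(),
        "low": minute_bars["low"].min(),
        "close_price": closes.iloc[-1] if len(closes) else np.nan,
        "volume": minute_bars["volume"].sum(),
    })


minutes = pd.date_range("2014-06-04 13:31", periods=3, freq="min", tz="UTC")
window = pd.DataFrame(
    {
        # No trade in the first minute of the window.
        "open_price": [np.nan, 10.0, 10.5],
        "high": [np.nan, 10.2, 10.6],
        "low": [np.nan, 9.9, 10.4],
        "close_price": [np.nan, 10.1, 10.5],
        "volume": [0, 100, 50],
    },
    index=minutes,
)
digest = digest_bars(window)
```

Under the old behaviour described above, open_price for this window would have been NaN because the first minute had no trade; in this sketch it is the first traded price.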
Contributor Author

@ssanderson ssanderson commented Jun 4, 2014

Hrm. It looks like GitHub is ordering my commits by time rather than by the actual commit sequence. (I reordered the last two commits during rebase.)

Contributor

@twiecki twiecki commented Jun 4, 2014

Is this up for review? I checked it out locally but I'm getting some test failures:

======================================================================
FAIL: test_history_container_0_test_daily_open_close (tests.test_history.TestHistoryContainer)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/travis/miniconda/envs/testenv/lib/python2.7/site-packages/nose_parameterized/parameterized.py", line 233, in <lambda>
    standalone_func = lambda *a: func(*(a + p.args), **p.kwargs)
  File "/home/travis/build/quantopian/zipline/tests/test_history.py", line 205, in test_history_container
    expected[spec.key_str][update_count],
  File "/home/travis/miniconda/envs/testenv/lib/python2.7/site-packages/numpy/testing/utils.py", line 718, in assert_array_equal
    verbose=verbose, header='Arrays are not equal')
  File "/home/travis/miniconda/envs/testenv/lib/python2.7/site-packages/numpy/testing/utils.py", line 644, in assert_array_compare
    raise AssertionError(msg)
AssertionError:
Arrays are not equal

(mismatch 66.6666666667%)
 x: array([[nan],
       [nan],
       [10.0]], dtype=object)
 y: array([[ nan],
       [ nan],
       [ 10.]])
>>  raise AssertionError('\nArrays are not equal\n\n(mismatch 66.6666666667%)\n x: array([[nan],\n       [nan],\n       [10.0]], dtype=object)\n y: array([[ nan],\n       [ nan],\n       [ 10.]])')

======================================================================
FAIL: test_history_container_1_test_multiple_fields_and_sids (tests.test_history.TestHistoryContainer)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/travis/miniconda/envs/testenv/lib/python2.7/site-packages/nose_parameterized/parameterized.py", line 233, in <lambda>
    standalone_func = lambda *a: func(*(a + p.args), **p.kwargs)
  File "/home/travis/build/quantopian/zipline/tests/test_history.py", line 205, in test_history_container
    expected[spec.key_str][update_count],
  File "/home/travis/miniconda/envs/testenv/lib/python2.7/site-packages/numpy/testing/utils.py", line 718, in assert_array_equal
    verbose=verbose, header='Arrays are not equal')
  File "/home/travis/miniconda/envs/testenv/lib/python2.7/site-packages/numpy/testing/utils.py", line 644, in assert_array_compare
    raise AssertionError(msg)
AssertionError:
Arrays are not equal

(mismatch 66.6666666667%)
 x: array([[nan, nan],
       [nan, nan],
       [0.0, 0.0]], dtype=object)
 y: array([[ nan,  nan],
       [ nan,  nan],
       [  0.,   0.]])
>>  raise AssertionError('\nArrays are not equal\n\n(mismatch 66.6666666667%)\n x: array([[nan, nan],\n       [nan, nan],\n       [0.0, 0.0]], dtype=object)\n y: array([[ nan,  nan],\n       [ nan,  nan],\n       [  0.,   0.]])')

======================================================================
FAIL: test_history_container_2_test_illiquid_prices (tests.test_history.TestHistoryContainer)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/travis/miniconda/envs/testenv/lib/python2.7/site-packages/nose_parameterized/parameterized.py", line 233, in <lambda>
    standalone_func = lambda *a: func(*(a + p.args), **p.kwargs)
  File "/home/travis/build/quantopian/zipline/tests/test_history.py", line 205, in test_history_container
    expected[spec.key_str][update_count],
  File "/home/travis/miniconda/envs/testenv/lib/python2.7/site-packages/numpy/testing/utils.py", line 718, in assert_array_equal
    verbose=verbose, header='Arrays are not equal')
  File "/home/travis/miniconda/envs/testenv/lib/python2.7/site-packages/numpy/testing/utils.py", line 644, in assert_array_compare
    raise AssertionError(msg)
AssertionError:
Arrays are not equal

(mismatch 66.6666666667%)
 x: array([[nan],
       [nan],
       [10.0]], dtype=object)
 y: array([[ nan],
       [ nan],
       [ 10.]])
>>  raise AssertionError('\nArrays are not equal\n\n(mismatch 66.6666666667%)\n x: array([[nan],\n       [nan],\n       [10.0]], dtype=object)\n y: array([[ nan],\n       [ nan],\n       [ 10.]])')

======================================================================
FAIL: test_history_container_3_test_mixed_frequencies (tests.test_history.TestHistoryContainer)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/travis/miniconda/envs/testenv/lib/python2.7/site-packages/nose_parameterized/parameterized.py", line 233, in <lambda>
    standalone_func = lambda *a: func(*(a + p.args), **p.kwargs)
  File "/home/travis/build/quantopian/zipline/tests/test_history.py", line 205, in test_history_container
    expected[spec.key_str][update_count],
  File "/home/travis/miniconda/envs/testenv/lib/python2.7/site-packages/numpy/testing/utils.py", line 718, in assert_array_equal
    verbose=verbose, header='Arrays are not equal')
  File "/home/travis/miniconda/envs/testenv/lib/python2.7/site-packages/numpy/testing/utils.py", line 644, in assert_array_compare
    raise AssertionError(msg)
AssertionError:
Arrays are not equal

(mismatch 50.0%)
 x: array([[nan],
       [0.0]], dtype=object)
 y: array([[ nan],
       [  0.]])
>>  raise AssertionError('\nArrays are not equal\n\n(mismatch 50.0%)\n x: array([[nan],\n       [0.0]], dtype=object)\n y: array([[ nan],\n       [  0.]])')

Contributor Author

@ssanderson ssanderson commented Jun 4, 2014

@twiecki interesting. Looks like Travis has the same failures as well, but the tests are passing locally for me. It looks like a numpy/pandas versioning mismatch, because the issue is that I'm using a pandas testing utility to check that two dataframes are equal, and in your version that method is treating NaNs as unequal, whereas in mine it's treating them as equal. I get the following in my Zipline virtualenv:

(qexec)[~/quantopian/qexec/zipline_repo]@(one_minute_history_for_realz:6d85a8d5dc255dcb7e88d6fc2eb15ebc3594009a)$ pip freeze | grep pandas
pandas==0.12.0
(qexec)[~/quantopian/qexec/zipline_repo]@(one_minute_history_for_realz:6d85a8d5dc255dcb7e88d6fc2eb15ebc3594009a)$ pip freeze | grep numpy
numpy==1.8.1
Contributor

@twiecki twiecki commented Jun 4, 2014

I stepped into the failing test and it seems it's a datetime index issue. I'll post the index soon.

Contributor Author

@ssanderson ssanderson commented Jun 4, 2014

@twiecki I upgraded my pandas from 0.12.0 to 0.14.0 and I'm now seeing the failure locally. Looking at the Travis log, I think we're just installing the latest pandas there as well.

Contributor Author

@ssanderson ssanderson commented Jun 4, 2014

So, there's a date error in the expected results for test_daily_open_close, but fixing that still yields this error, because the testing method I'm using actually only checks the values of the underlying numpy arrays, ignoring the indices. The underlying failure is still that NaN is being treated as unequal to NaN inside numpy.testing.assert_array_equal when given a pandas 0.14.0 frame, which isn't the case for pandas 0.12.0.
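
To illustrate the point about indices being ignored, here is a small sketch, assuming a current pandas/numpy install rather than the versions in this thread. The frames and dates are made up. It uses float columns, so it sidesteps the object-dtype NaN problem and only shows that a values-only comparison cannot see a wrong date.

```python
import numpy as np
import pandas as pd

expected = pd.DataFrame(
    {"price": [np.nan, 10.0]},
    index=pd.to_datetime(["2014-06-03", "2014-06-04"]),
)
# Same values, but the second date is wrong (a data-entry style mistake).
actual = pd.DataFrame(
    {"price": [np.nan, 10.0]},
    index=pd.to_datetime(["2014-06-03", "2014-06-05"]),
)

# Does not raise: only the underlying arrays are compared, so the
# mismatched index goes unnoticed.
np.testing.assert_array_equal(actual.values, expected.values)

# The index has to be checked separately to notice the wrong date.
index_matches = actual.index.equals(expected.index)
```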

Contributor

@twiecki twiecki commented Jun 4, 2014

Yeah, that's quite annoying. I just discovered pandas.util.testing that
e.g. has an assert_equal() for data structures. Perhaps that one handles
this better?


Contributor Author

@ssanderson ssanderson commented Jun 4, 2014

Yeah, I'm going to switch over to using assert_frame_equal. It caught
another data-entry bug in my test suite, and it handles the NaNs the way we
want it to.
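
A minimal sketch of the NaN handling in question, assuming a recent pandas where the helper is importable from `pandas.testing` (in the 2014-era releases discussed here it lived in `pandas.util.testing`). The frame contents are invented for illustration.

```python
import numpy as np
import pandas as pd
from pandas.testing import assert_frame_equal

index = pd.to_datetime(["2014-06-02", "2014-06-03", "2014-06-04"])
expected = pd.DataFrame({"price": [np.nan, np.nan, 10.0]}, index=index)
actual = expected.copy()

# Elementwise == follows IEEE 754, where NaN == NaN is False, so a naive
# "all cells equal" check fails even though the frames are identical.
naive_equal = bool((actual == expected).all().all())

# assert_frame_equal treats NaNs in matching positions as equal and
# therefore does not raise here.
assert_frame_equal(actual, expected)
```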


Contributor Author

@ssanderson ssanderson commented Jun 4, 2014

Updated the last commit with test fixes.

ssanderson added 6 commits May 30, 2014
Adds a suite of new functions for querying data from the trading calendar.

These include:
      `previous_trading_day`
      `minutes_for_days_in_range` (minutely version of `days_in_range`)
      `previous_open_and_close` (inverse of `next_open_and_close`)
      `next_market_minute`
      `previous_market_minute`
      `open_close_window` (get a range of opens/closes with slicing semantics)
      `market_minute_window` (get a range of minutes with slicing semantics)

Also refactors `test_finance` to move `TradingEnvironment` tests into their own
TestCase.
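
As a rough illustration of what a lookup like `previous_trading_day` has to do, here is a standalone sketch over a sorted `DatetimeIndex` of trading days. It is an assumption about one possible approach, not the implementation added in this commit; the signature and the sample calendar are hypothetical.

```python
import pandas as pd


def previous_trading_day(trading_days, day):
    """Return the last trading day strictly before `day`, or None.

    Hypothetical standalone version: `trading_days` is assumed to be a
    sorted DatetimeIndex of valid trading days.
    """
    # Leftmost insertion point: everything before it is strictly earlier.
    idx = trading_days.searchsorted(day)
    return trading_days[idx - 1] if idx > 0 else None


# Friday, Monday, Tuesday (the weekend is not in the calendar).
days = pd.DatetimeIndex(["2014-05-30", "2014-06-02", "2014-06-03"])
monday_prev = previous_trading_day(days, pd.Timestamp("2014-06-02"))
sunday_prev = previous_trading_day(days, pd.Timestamp("2014-06-01"))
first_prev = previous_trading_day(days, pd.Timestamp("2014-05-30"))
```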
Overhauls `HistoryContainer` in prep for support of more than one frequency.

Major changes:

   - Methods/variables referring to "day" have been renamed/generalized.
     - `current_day_panel` became `buffer_panel`, which is now a `RollingPanel`
     - `prior_day_panel` became a dictionary mapping `Frequency` objects to
       "digest panels", which are instances of `RollingPanel`.

   - Hard-coded daily rollover replaced with a notion of a "current window" for
     each unique frequency managed by the panel.

     - When the end of the current window is reached for a given frequency, we
       compute an aggregate bar (code refers to this as a "digest"), which is
       appended to a panel associated with that frequency.

     - Window rollover dates are managed by a pair of dictionaries,
       `cur_window_starts` and `cur_window_closes`.  The `Frequency` class is
       responsible for computing window bounds based on the open/close of the
       previous window.

   - Semantic change to the `open_price` field: `open_price` now always
     contains the price of the first trade occurring in the given window.
     Previously it contained the price of the first minute in the window,
     returning NaN if the security happened not to trade in the first minute.
Fixes an issue where, if `ffill=False`, `get_history` would return nans for
every entry in the history frame except the last one.
Updates `HistoryContainer.roll` to handle cases where no data is present for
the period being rolled.

We now only forward-fill the `price` field when `ffill` is specified.
Also adds a length-1 HistorySpec to the test.
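
The "only `price` is forward-filled" rule can be sketched as below. This is an illustrative assumption, not the code in these commits: the helper `apply_ffill`, the `FORWARD_FILLABLE` set, and the sample frame (one column per sid) are hypothetical.

```python
import numpy as np
import pandas as pd

# Assumption for this sketch: `price` is the only forward-fillable field.
FORWARD_FILLABLE = frozenset(["price"])


def apply_ffill(frame, field, ffill):
    """Forward-fill `frame` only if requested and the field allows it."""
    if ffill and field in FORWARD_FILLABLE:
        return frame.ffill()
    return frame


minutes = pd.date_range("2014-06-04 13:31", periods=3, freq="min", tz="UTC")
raw = pd.DataFrame(
    {1: [10.0, np.nan, 10.5], 2: [np.nan, 20.0, np.nan]},
    index=minutes,
)

prices = apply_ffill(raw, "price", ffill=True)    # gaps filled forward
volumes = apply_ffill(raw, "volume", ffill=True)  # left untouched
```

Leading NaNs stay NaN in the filled frame, since there is no earlier value to carry forward.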
@ssanderson ssanderson merged commit fba649d into master Jun 5, 2014
1 check passed
continuous-integration/travis-ci The Travis CI build passed
@ssanderson ssanderson deleted the one_minute_history_for_realz branch Jun 5, 2014