New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

dynamic history #412

Merged
merged 2 commits into from Nov 3, 2014

Conversation

Projects
None yet
3 participants
@llllllllll
Member

llllllllll commented Oct 16, 2014

Previously, all specs had to be pre-allocated by using the 'add_history'
function. This is now no longer required and instead serves as a hint to
the HistoryContainer to pre-allocate the space for the given spec.

History can grow by increasing the length for a frequency, adding a
frequency, or adding a field. It can grow with any combination of
these.

@llllllllll llllllllll force-pushed the not-so-simple-transforms branch from 1f1d7bc to 2db166e Oct 16, 2014

@twiecki

This comment has been minimized.

Contributor

twiecki commented Oct 16, 2014

+1 on replacing daily_at_midnight with data_frequency.

@llllllllll llllllllll force-pushed the not-so-simple-transforms branch from 2db166e to fcf7652 Oct 21, 2014

@llllllllll llllllllll added Needs Review and removed In Progress labels Oct 21, 2014

@llllllllll llllllllll force-pushed the not-so-simple-transforms branch from fcf7652 to a0967fd Oct 21, 2014

@llllllllll llllllllll added In Progress and removed Needs Review labels Oct 21, 2014

@llllllllll llllllllll force-pushed the not-so-simple-transforms branch 3 times, most recently from c5ae828 to 9a6e5d7 Oct 22, 2014

@llllllllll llllllllll added Needs Review and removed In Progress labels Oct 22, 2014

@@ -451,3 +452,33 @@ def __repr__(self):
emission_rate=self.emission_rate,
first_open=self.first_open,
last_close=self.last_close)
def with_environment(asname='env'):

This comment has been minimized.

@ssanderson
@@ -124,33 +171,44 @@ class HistoryContainer(object):
Entry point for the algoscript is the result of `get_history`.
"""
VALID_FIELDS = {
'price', 'open_price', 'volume', 'high', 'low',

This comment has been minimized.

@ssanderson

ssanderson Oct 22, 2014

Member

should close_price be here as well?

This comment has been minimized.

@llllllllll
pd.Panel(
items=self.items,
minor_axis=self.minor_axis,
major_axis=range(delta),

This comment has been minimized.

@ssanderson

ssanderson Oct 22, 2014

Member

This should probably be np.arange.

Resizes the buffer to hold a new window with a new cap_multiple.
If cap_multiple is None, then the old cap_multiple is used.
"""
if window == self.window and cap_multiple == self.cap_multiple:

This comment has been minimized.

@ssanderson

ssanderson Oct 22, 2014

Member

Are there valid cases where this condition is met?

This comment has been minimized.

@llllllllll

llllllllll Oct 23, 2014

Member

To simplify this, I am just removing the notion of changing cap_multiple, I am also removing the check here. The assumption will be that the caller has verified that a resize needs to happen.

return
self.window = window
self.cap_multiple = cap_multiple or self.cap_multiple

This comment has been minimized.

@ssanderson

ssanderson Oct 22, 2014

Member

When would we want to reset the cap_multiple?

@@ -116,6 +163,9 @@ def _roll_data(self):
self.date_buf[:self.window] = self.date_buf[-self.window:]
self.pos = self.window
def __len__(self):

This comment has been minimized.

@ssanderson

ssanderson Oct 22, 2014

Member

Not sure how I feel about putting a len on RollingPanel. For comparison, the base pandas Panel has no len, presumably because it's not obvious what axis the length should refer to.

This comment has been minimized.

@llllllllll

llllllllll Oct 23, 2014

Member

This just seemed nicer than reaching in for the window size.

This comment has been minimized.

@ssanderson

ssanderson Oct 23, 2014

Member

My inclination would still be to put this on a normal method rather than the magic __len__. One worry I might have is that it could be ambiguous from a user perspective whether len refers to the window length or the length of the actual buffer.

This comment has been minimized.

@llllllllll

llllllllll Oct 23, 2014

Member

sounds good, I just made a window_length property, this lets people inspect this value without worrying about mutation. I also renamed window to _window to, The proper way to adjust the window would be through a call to resize, so this avoids confusion.

@@ -61,6 +68,10 @@ def set_sids(self, sids):
self.minor_axis = _ensure_index(sids)
self.buffer = self.buffer.reindex(minor_axis=self.minor_axis)
def set_fields(self, fields):

This comment has been minimized.

@ssanderson

ssanderson Oct 22, 2014

Member

This feels like it should be set_items rather than set_fields. (I'm also having second thoughts about having renamed minor_axis to sids).

This comment has been minimized.

@llllllllll

llllllllll Oct 23, 2014

Member

how would you feel if I renamed set_fields to set_items and set_sids toset_minor_axis?

initial_sids,
initial_dt,
data_frequency,
before_init=True):

This comment has been minimized.

@ssanderson

ssanderson Oct 23, 2014

Member

This seems like it gets passed around all over the place. Can you explain what it's for?

This comment has been minimized.

@llllllllll

llllllllll Oct 23, 2014

Member

We can now create panels before initialize or after. This changes how they neeed to be lined up.

This comment has been minimized.

@ssanderson

ssanderson Oct 23, 2014

Member

Why do panels need to be aligned differently before vs after init?

This comment has been minimized.

@llllllllll

llllllllll Oct 23, 2014

Member

Before init, when a panel is created, initial_dt is where the next bar should be inserted; however, when a panel is constructed at runtime, the dt is the current datetime and thus must be "shifted" forward by one bar. Thinking through this better, I think it only affects the buffer_panel.

This comment has been minimized.

@ssanderson

ssanderson Oct 23, 2014

Member

It seems like knowing the context in which the panel is being created shouldn't be in the purview of this function. Is there a way to make it so that this is always just passed the dt of the next bar to be inserted?

This comment has been minimized.

@llllllllll

llllllllll Oct 23, 2014

Member

It actually seems more clear when I moved this responsibility to the caller.

This comment has been minimized.

@ssanderson

ssanderson Oct 29, 2014

Member

If we're keeping the logic behind this parameter, can we at least give it a clearer name? Calling a function named __init__ with a parameter named before_init is pretty confusing.

In general, for boolean parameters, I think its better for names the behavior it defines than the context in which it's expected to be used.

This comment has been minimized.

@llllllllll

llllllllll Oct 31, 2014

Member

I have named this to shift_digest, as it shift the digest panels by one bar. I have also written a docstring for the HistoryContainer explaining the reason for the various parameters, including the newly added one.

date_buf=date_buf,
)
if not before_init:

This comment has been minimized.

@ssanderson

ssanderson Oct 23, 2014

Member

What's happening here?

panel.date_buf[panel.pos] = pd.Timestamp(dt).value
panel.pos += 1
if not is_buffer:

This comment has been minimized.

@ssanderson

ssanderson Oct 23, 2014

Member

Am I correct in reading that window_starts and window_closes are meaningless if _is_buffer is True? Seems like this should be part of a different function if that's the case.

This comment has been minimized.

@llllllllll

llllllllll Oct 23, 2014

Member

I agree that I could refactor this into more helpers.

window_starts=None,
window_closes=None,
before_init=True,
env=None):

This comment has been minimized.

@ssanderson

ssanderson Oct 23, 2014

Member

This badly wants a docstring.

@llllllllll llllllllll force-pushed the not-so-simple-transforms branch 3 times, most recently from ab41ccc to 7d9daba Oct 23, 2014

source = RandomWalkSource(start=start, end=end)
self.assertIsNone(test_algo.history_container)
self.assertIsNotNone(test_algo.run(source))

This comment has been minimized.

@ssanderson

ssanderson Oct 29, 2014

Member

I'm really not a fan of our pattern of writing tests that just run an algorithm and assert it doesn't crash.

This comment has been minimized.

@llllllllll

llllllllll Oct 29, 2014

Member

I meant to add assertions about the new container, I will make sure that it is correctly formed.

return close
return pd.tslib.normalize_date(

This comment has been minimized.

@ssanderson

ssanderson Oct 29, 2014

Member

I think the comment on this function is longer true for daily mode.

This comment has been minimized.

@llllllllll

llllllllll Oct 29, 2014

Member

yup, I will update it

@property
def max_days(self):
if self.data_frequency == 'minute':

This comment has been minimized.

@ssanderson

ssanderson Oct 29, 2014

Member

This is probably more idiot-proof for the future if it's written as data_frequency != daily.

This comment has been minimized.

@llllllllll

llllllllll Oct 29, 2014

Member

This makes sense, especially if we are going to add more frequency types.

@with_environment()
def prev_bar(spec, env=None):

This comment has been minimized.

@ssanderson

ssanderson Oct 29, 2014

Member

This is defined twice.

This comment has been minimized.

@llllllllll

llllllllll Oct 31, 2014

Member

I turned this into a method on the frequency. This only depended on the spec's frequency's data, so that seemed like the sane place for it.

@@ -88,33 +88,127 @@ def freq_str_and_bar_count(history_spec):
return (history_spec.frequency.freq_str, history_spec.bar_count)
def group_by_frequency(history_specs):
@with_environment()
def prev_bar(spec, env=None):

This comment has been minimized.

@ssanderson

ssanderson Oct 29, 2014

Member

This is defined twice.

)
panels[freq] = rp
return panels, first_window_starts, first_window_closes
def create_buffer_panel(self, initial_dt):
def create_buffer_panel(self, initial_dt, before_init):

This comment has been minimized.

@ssanderson

ssanderson Oct 29, 2014

Member

I think before_init is unused here.

@llllllllll llllllllll force-pushed the not-so-simple-transforms branch 6 times, most recently from e86bbdc to f3e110c Oct 31, 2014

"""
if unit_str == 'd':
# Get the properly key'd datetime64 out of the pandas Timestamp
arr = _extract_values(

This comment has been minimized.

@ssanderson

ssanderson Nov 3, 2014

Member

Why are we calling _extract_values here but not below?

This comment has been minimized.

@llllllllll

llllllllll Nov 3, 2014

Member

I believe it was because open_close_window (incorrectly?) returned an array of pd.Timestamps, not datetime64[ns]

HistorySpec(2, '2d', 'open', True),
HistorySpec(2, '1d', 'open', False),
HistorySpec(5, '1m', 'open', True)]
def _extract_values(arr):

This comment has been minimized.

@ssanderson

ssanderson Nov 3, 2014

Member

I think this can just be written as `_extract_values = np.vectorize(attrgetter('value')). Though even that is super inefficient. I'm surprised there isn't a pandas/numpy method for this.

major_deque.append(date)
result = rp.get_current()
expected = pd.Panel(frames, items=list(major_deque),

This comment has been minimized.

@ssanderson

ssanderson Nov 3, 2014

Member

I'm not sure why this test changed.

@llllllllll llllllllll force-pushed the not-so-simple-transforms branch 3 times, most recently from 70a3799 to 2957601 Nov 3, 2014

ENH: Allows history to be dynamic and grow the container at runtime.
Previously, all specs had to be pre-allocated by using the 'add_history'
function. This is now no longer required and instead serves as a hint to
the HistoryContainer to pre-allocate the space for the given spec.

History can grow by increasing the length for a frequency, adding a
frequency, or adding a field. It can grow with any combination of
these.

HistoryContainer now is aware of the data_frequency of the algorithm,
and no longer uses the daily_at_midnight flag; instead, this is the
default behavior.

@llllllllll llllllllll force-pushed the not-so-simple-transforms branch from 2957601 to f8f7f2f Nov 3, 2014

@ssanderson

This comment has been minimized.

Member

ssanderson commented Nov 3, 2014

Assuming Travis passes, this looks good to me.

@llllllllll llllllllll merged commit ca1569f into master Nov 3, 2014

1 check passed

continuous-integration/travis-ci The Travis CI build passed
Details

@llllllllll llllllllll deleted the not-so-simple-transforms branch Nov 3, 2014

@llllllllll llllllllll removed the Needs Review label Nov 4, 2014

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment