
allows users to specify the cutoff time for data query in blaze loaders #947

Merged
7 commits merged from data-query-time into master on Jan 14, 2016

Conversation

@llllllllll (Contributor) commented Jan 7, 2016

This allows people to set their cutoff time to the time they will
actually execute 'before_trading_start'. Currently this is just passed
to the constructor of the loader; however, I would like to make this
managed by the algorithm simulation runner. This would help keep all of
the loaders in sync and lock 'before_trading_start's execution to the
time the data is queried for.
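A hypothetical sketch of the interface described above (a stand-in class, not zipline's actual loader): the cutoff is a time of day plus the timezone it is interpreted in, both handed to the loader's constructor.

    import datetime

    import pytz

    class StandInBlazeLoader:
        """Stand-in for a blaze loader; the real constructor may differ."""

        def __init__(self, data_query_time=None, data_query_tz=None):
            # The time of day after which newly learned data rolls to the
            # next trading day, and the timezone that time is measured in.
            self.data_query_time = data_query_time
            self.data_query_tz = data_query_tz

    # Illustrative values: query data as of 8:45 US/Eastern, roughly when
    # before_trading_start would run.
    loader = StandInBlazeLoader(
        data_query_time=datetime.time(8, 45),
        data_query_tz=pytz.timezone('US/Eastern'),
    )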

@llllllllll added the Pipeline label Jan 7, 2016
@llllllllll force-pushed the data-query-time branch 3 times, most recently from c68537a to 3756228 Jan 8, 2016
@richafrank self-assigned this Jan 8, 2016
df : pd.DataFrame
The dataframe to update. This needs a column named ``ts_field``.
time : datetime.time
The time to query before.

@richafrank (Member) commented Jan 8, 2016

What do you mean by "query" here?

@llllllllll (Author, Contributor) commented Jan 9, 2016

updating both docstrings to show:

    time : datetime.time
        The time of day to use as the cutoff point for new data. Data points
        that you learn about after this time will become available to your
        algorithm on the next trading day.
    tz : tzinfo
        The timezone to normalize your dates to before comparing against
        `time`.

@richafrank (Member) commented Jan 11, 2016

Nice - should we change the other uses of the word "query" throughout this?


dtidx = pd.DatetimeIndex(df.loc[:, ts_field], tz='utc')
dtidx_local_time = dtidx.tz_convert(tz)
to_roll_forward = dtidx_local_time.time > time

@richafrank (Member) commented Jan 8, 2016

Do we want to roll forward times gt or gte?

@llllllllll (Author, Contributor) commented Jan 9, 2016

I think we want inclusive; I will change and update the tests.

@llllllllll (Author, Contributor) commented Jan 9, 2016

Actually, I am reconsidering this. I think a sane default is to use midnight, and if this comparison is ge then things that are timestamped at exactly midnight get cut off.
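A minimal sketch of the roll-forward semantics under discussion, assuming a midnight cutoff in UTC and a strict ``>`` comparison (illustrative only, not the PR's exact code): the row stamped exactly at midnight keeps its date, while the later row rolls to the next day.

    import datetime

    import pandas as pd

    cutoff = datetime.time(0, 0)  # midnight
    df = pd.DataFrame({
        'timestamp': pd.to_datetime([
            '2016-01-07 00:00',  # exactly midnight: kept with a strict >
            '2016-01-07 09:30',  # after the cutoff: known the next day
        ]).tz_localize('utc'),
    })

    dtidx_local_time = pd.DatetimeIndex(df['timestamp']).tz_convert('utc')
    to_roll_forward = dtidx_local_time.time > cutoff

    # Add one day and truncate to the date, as in the quoted diff.
    df.loc[to_roll_forward, 'timestamp'] = (
        dtidx_local_time[to_roll_forward] + datetime.timedelta(days=1)
    ).normalize()
    print(df)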

data_query_tz,
inplace=True,
ts_field=TS_FIELD_NAME,
)

@richafrank (Member) commented Jan 8, 2016

Since we're astype-ing and normalizing the ts_field for both the expr and deltas, should we do it in a loop instead of duplicating those statements?
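A hedged sketch of that suggestion, with hypothetical ``expr_df`` and ``deltas_df`` frames standing in for the real ones:

    import pandas as pd

    # Hypothetical frames standing in for the expr and deltas dataframes.
    expr_df = pd.DataFrame({'timestamp': ['2016-01-07 09:30']})
    deltas_df = pd.DataFrame({'timestamp': ['2016-01-08 10:00']})

    # Do the shared conversion/normalization once, in a loop, instead of
    # duplicating the statements for each frame.
    for frame in (expr_df, deltas_df):
        frame['timestamp'] = pd.to_datetime(frame['timestamp'])
        # In the real loader, the astype and
        # normalize_data_query_time(frame, ..., inplace=True,
        # ts_field=TS_FIELD_NAME) steps would go here.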

# for all of the times that are greater than our query time add 1
# day and truncate to the date
df.loc[to_roll_forward, ts_field] = (
dtidx_local_time[to_roll_forward] + datetime.timedelta(days=1)

@richafrank (Member) commented Jan 8, 2016

Do we need to take into account the trading calendar?

@llllllllll (Author, Contributor) commented Jan 9, 2016

Nope, this is marking the date we would have knowledge of the event, so it does not need to align with the trading calendar. For example: things we learn about on Saturday become available to the algo on Monday.

@richafrank (Member) commented Jan 11, 2016

Makes sense.
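A small pandas sketch of llllllllll's point above (illustrative values only): a data point whose knowledge date lands on a weekend is forward-filled, so the algorithm first sees it on the next trading day.

    import pandas as pd

    knowledge = pd.Series(
        [1.0],
        index=pd.to_datetime(['2016-01-09']),  # Saturday: when we learn it
    )
    trading_days = pd.to_datetime(['2016-01-08', '2016-01-11'])  # Fri, Mon

    # Forward-fill onto the union of dates, then look only at trading days:
    # Friday is still NaN (unknown), Monday shows the value.
    available = knowledge.reindex(knowledge.index.union(trading_days)).ffill()
    print(available.loc[trading_days])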

@@ -121,20 +148,32 @@ def load_adjusted_array(self, columns, dates, assets, mask):
# range. It must all be null anyways.
lower = dates[0]

@richafrank (Member) commented Jan 8, 2016

Do we need to be normalizing lower?

@llllllllll (Author, Contributor) commented Jan 9, 2016

We ffill from the last known value anyway, so it shouldn't matter.

@llllllllll force-pushed the data-query-time branch 2 times, most recently from 0d07681 to 68f0f7e Jan 9, 2016
df = df.copy()

dtidx = pd.DatetimeIndex(df.loc[:, ts_field], tz='utc')
dtidx_local_time = dtidx.tz_convert(tz)

@richafrank (Member) commented Jan 11, 2016

Would there be much of a perf gain to skip all the conversions if tz == 'utc'?

@llllllllll (Author, Contributor) commented Jan 12, 2016

I don't think so; tz_convert shouldn't affect the raw representation of the dates:

In [5]: big = pd.date_range('1700', '2016', tz='utc')

In [7]: %timeit big.tz_convert('US/Eastern')
The slowest run took 8.15 times longer than the fastest. This could mean that an intermediate result is being cached 
100000 loops, best of 3: 8.25 µs per loop

In [8]: len(big)
Out[8]: 115417
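For reference, my understanding of why that conversion is cheap: ``tz_convert`` only swaps the timezone metadata; the underlying UTC nanosecond values are untouched, which a quick check confirms.

    import pandas as pd

    idx = pd.date_range('2016-01-01', periods=5, freq='D', tz='utc')
    converted = idx.tz_convert('US/Eastern')

    # The raw int64 (UTC nanosecond) values are identical; only the tz
    # attribute differs.
    assert (idx.asi8 == converted.asi8).all()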
@llllllllll (Author, Contributor) commented Jan 12, 2016

Updated based on in-person discussion.

llllllllll added 4 commits Jan 7, 2016
allows users to specify the cutoff time for data query in blaze loaders
@llllllllll force-pushed the data-query-time branch from f437e86 to b87e15b Jan 13, 2016
lower -= datetime.timedelta(days=1)
if time is not None:
return normalize_data_query_time(
lower - datetime.timedelta(days=1),

@richafrank (Member) commented Jan 13, 2016

Is this further subtraction intentional?

@llllllllll (Author, Contributor) commented Jan 13, 2016

nope

@llllllllll force-pushed the data-query-time branch from b87e15b to 271f7e6 Jan 13, 2016
The timezone to normalize your dates to before comparing against
`time`.
"""
# Subtract one day to grab things that happend on the first day we are

@richafrank (Member) commented Jan 13, 2016

typo "happend"

ValueError
Raised when only one of the arguments is None.
"""
if (data_query_time is None) ^ (data_query_tz is None):

@richafrank (Member) commented Jan 13, 2016

I think I would probably use !=, since this is on booleans, but bitwise xor works too.

@llllllllll (Author, Contributor) commented Jan 13, 2016

I think the xor more accurately describes the check that I am trying to do. I want to fail if exactly one is true.

@llllllllll (Author, Contributor) commented Jan 13, 2016

also, __xor__ is overridden so it will return a bool.

@richafrank (Member) commented Jan 13, 2016

I totally follow and can see cases for both. Feel free to leave it.
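A minimal standalone sketch of the check being discussed (a hypothetical helper, not necessarily the PR's exact function or error message): it fails exactly when one of the two arguments is None.

    import datetime

    import pytz

    def check_data_query_args(data_query_time, data_query_tz):
        # True only when exactly one of the two is None, which is the
        # invalid combination: both must be given together or both omitted.
        if (data_query_time is None) ^ (data_query_tz is None):
            raise ValueError(
                'data_query_time and data_query_tz must both be None'
                ' or both be non-None'
            )

    check_data_query_args(None, None)                    # ok: feature off
    check_data_query_args(
        datetime.time(8, 45), pytz.timezone('US/Eastern'),
    )                                                     # ok: both given
    try:
        check_data_query_args(datetime.time(8, 45), None)
    except ValueError as e:
        print(e)                                          # exactly one given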

@llllllllll force-pushed the data-query-time branch from 271f7e6 to f7d051b Jan 13, 2016
@richafrank (Member) commented Jan 13, 2016

Looks like a test is still failing.

@llllllllll (Author, Contributor) commented Jan 13, 2016

merging on pass

@llllllllll force-pushed the data-query-time branch from 08f250c to a3dbf75 Jan 13, 2016
llllllllll added a commit that referenced this pull request Jan 14, 2016
allows users to specify the cutoff time for data query in blaze loaders
@llllllllll merged commit 7115b85 into master Jan 14, 2016
1 check passed: continuous-integration/travis-ci/pr (The Travis CI build passed)
@llllllllll deleted the data-query-time branch Jan 14, 2016