New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adds support for different typed adjusted arrays and adds an EarningsCalendar loader #905

Merged
merged 14 commits into from Dec 11, 2015

Conversation

Projects
None yet
2 participants
@llllllllll
Member

llllllllll commented Dec 8, 2015

  • Moves most of AdjustedArray back into Python. The window iterator is
    the only part that's performance-intensive.
  • Adds a bootleg templating system for creating specialized versions of
    AdjustedArrayWindow for each concrete type we care about.
  • Adds support for differently dtyped terms in pipeline. This allows us
    to use datetime64s which are needed in the EarningsCalendar.
  • Adds EarningsCalendar dataset for the next and previous earnings
    announcements in pipeline.
  • Adds in memory loader for EarningsCalendar.
  • Adds blaze loader for EarningsCalendar.

@llllllllll llllllllll force-pushed the refactor-adjusted-array branch from 7357421 to 8220d1e Dec 9, 2015

ssanderson and others added some commits Nov 18, 2015

ENH: Adds support for different typed adjusted arrays and adds an
EarningsCalendar loader.

- Moves most of AdjustedArray back into Python. The window iterator is
  the only part that's performance-intensive.

- Adds a bootleg templating system for creating specialized versions of
  AdjustedArrayWindow for each concrete type we care about.

- Adds support for differently dtyped terms in pipeline. This allows us
  to use datetime64s which are needed in the EarningsCalendar.

- Adds EarningsCalendar dataset for the next and previous earnings
  announcements in pipeline.

- Adds in memory loader for EarningsCalendar.

- Adds blaze loader for EarningsCalendar.
BUG: Fix hardcoded type repr in test.
Types repr differently in py2 vs py3.
timestamp=filtered[TS_FIELD_NAME].max(),
).timestamp.min(),
pd.Timestamp,
**self._odo_kwargs or {}

This comment has been minimized.

@ssanderson

ssanderson Dec 10, 2015

Member

I don't think we need this check here. We default to an empty dict in the constructor.

(expr[TS_FIELD_NAME] <= dates[-1])
],
pd.DataFrame,
**self._odo_kwargs or {}

This comment has been minimized.

@ssanderson

ssanderson Dec 10, 2015

Member

Same note as above.


gb = raw.groupby(SID_FIELD_NAME)
if self._has_ts:
def mkseries(idx, raw_loc=raw.loc):

This comment has been minimized.

@ssanderson

ssanderson Dec 10, 2015

Member

Is this just caching in the locals for performance?

This comment has been minimized.

@llllllllll

llllllllll Dec 10, 2015

Member

yeah, the function is only consumed within this scope so I figure this is safe.

self._has_ts = has_ts = TS_FIELD_NAME in dshape.measure.dict
if not has_ts:
# This field is optional.
expected_fields - {TS_FIELD_NAME}

This comment has been minimized.

@ssanderson

ssanderson Dec 10, 2015

Member

I don't think this does anything...

ssanderson added some commits Dec 10, 2015

MAINT: Simplify BlazeEarningsCalendarLoader.
Removes codepaths that add implicit timestamps.

@llllllllll llllllllll force-pushed the refactor-adjusted-array branch from 90a7181 to 4963b2c Dec 10, 2015

message = e.exception.args[0]
expected = (
"Don't know how to compute datetime64[ns] + datetime64[ns].\n"
"Arithmetic operators are only supported on Factors of dtype "

This comment has been minimized.

@llllllllll

llllllllll Dec 10, 2015

Member

Should this say addition instead. Subtraction is supported, no?

This comment has been minimized.

@llllllllll

llllllllll Dec 10, 2015

Member

also, is there a test for datetime - datetime :: timedelta or is this not yet supported?

This comment has been minimized.

@ssanderson

ssanderson Dec 10, 2015

Member

It's not supported yet, mostly because I couldn't think of a real use-case for it.

This comment has been minimized.

@ssanderson

ssanderson Dec 10, 2015

Member

(The error message is currently an accurate reflection of the actual behavior.)

This comment has been minimized.

@llllllllll

llllllllll Dec 10, 2015

Member

would you want to do something like:

mask = EarningsCalendar.next_earnings.latest() - today < timedelta(5)

This comment has been minimized.

@ssanderson

ssanderson Dec 10, 2015

Member

You probably don't want calendar date arithmetic there though. I added BusinessDaysUntil{Next,Previous}Earnings for that use case.

This comment has been minimized.

@ssanderson

ssanderson Dec 10, 2015

Member

And I'm not sure yet how/if we want to do today as a symbolic object.

self.anchor = window_length + offset
self.max_anchor = data.shape[0]

if len(self.adjustment_indices) > 0:

This comment has been minimized.

@llllllllll

llllllllll Dec 10, 2015

Member

this is duplicated in the __next__. Can we pull this out into a next_adj func?

# Purely for readability. There aren't C-level declarations for these types.
ctypedef object Int64Index_t
ctypedef object DatetimeIndex_t
ctypedef object Timestamp_t

# Adjustment kinds.
MULTIPLY = 0

This comment has been minimized.

@llllllllll

llllllllll Dec 10, 2015

Member

Could we define these with cpdef enum AdjustmentKind: for clarity. We can then change the usages of int adjustment_kind to AdjustmentKind adjustment_kind. This compiles the same but ls more clear in the source.

array.
"""
cdef type type_ = choose_adjustment_type(adjustment_kind, value)
return type_.from_assets_and_dates(

This comment has been minimized.

@llllllllll

llllllllll Dec 10, 2015

Member

do we have the same issue with kwargs here?

This comment has been minimized.

@ssanderson

ssanderson Dec 10, 2015

Member

Nope. Yay Cython!

This comment has been minimized.

@llllllllll
window_length = 0
dtype = float64_dtype

def _compute(self, arrays, dates, assets, mask):

This comment has been minimized.

@llllllllll

llllllllll Dec 10, 2015

Member

since this is the same func as prev but with the operatands flipped, do you think it makes sense to share the setup code path? They are pretty short so idk if it is worth.

This comment has been minimized.

@ssanderson

ssanderson Dec 10, 2015

Member

I thought about that while writing this and ended up just pulling out the functionality that's in busday_count_mask_NaT. I kind of want to use these as examples of how to write factors that work with dates, so I think it's ok if they're a bit verbose/explicit.

`2 + Factor()` has the same behavior as `2.0 + Factor()`.
"""
if isinstance(argvalue, Number):
return float64(argvalue)

This comment has been minimized.

@llllllllll

llllllllll Dec 10, 2015

Member

should we warn in the case of a possible loss of precisio? For example argvalue=Fraction(1, 3) or 2 ** (sys.float_info.mant_dig + 1) + 1

This comment has been minimized.

@ssanderson

ssanderson Dec 10, 2015

Member

If numpy doesn't do this, then I'm not sure we should be in the business of doing it either.

This comment has been minimized.

@ssanderson

ssanderson Dec 10, 2015

Member
In [2]: float64(Fraction(1, 3))
Out[2]: 0.33333333333333331

This comment has been minimized.

@llllllllll

llllllllll Dec 10, 2015

Member

Okay, just a thought. Also, numpy should totally warn about this.

edit: really it should be Fractional.__float__ and int.__float__ that warn, nvm

@@ -231,7 +274,8 @@ def _compute(self, windows, dates, assets, mask):
"""
# TODO: Make mask available to user's `compute`.
compute = self.compute
out = full(mask.shape, nan, dtype=self.dtype)
missing_value = self.missing_value
out = full(mask.shape, missing_value, dtype=self.dtype)

This comment has been minimized.

@llllllllll

llllllllll Dec 10, 2015

Member

full_like(mask, ...)

TypeError: foo() expected an argument with dtype 'int64' for argument 'x', but got dtype 'float64' instead. # noqa
"""
if _pos:
raise TypeError("expect_dtypes() only takes keyword arguments.")

This comment has been minimized.

@llllllllll

llllllllll Dec 10, 2015

Member

why do we accept *args just to raise? This will raise a type error without accepting _pos.

This comment has been minimized.

@ssanderson

ssanderson Dec 10, 2015

Member

This is how the other expect_* functions work as well. I did this to provide a clearer error message I think, since I kept writing them passing positionally and getting confused when it didn't work.

I could see the argument for removing the *pos, but I don't think that that change belongs here.

This comment has been minimized.

@llllllllll

llllllllll Dec 10, 2015

Member

Okay, I just think that it is not generally good practice to alias standard language errors. We can make this change is a different pr.

def display_bad_value(value):
# If the bad value has a dtype, but it's wrong, show the dtype name.
if hasattr(value, 'dtype'):
return value.dtype.name

This comment has been minimized.

@llllllllll

llllllllll Dec 10, 2015

Member

there are some cases where this will raise an attribute error, maybe check hasattr(value, 'dtype') and hasattr(value.dtype, 'name')

"""
Get the default fill value for `dtype`.
"""
return _FILLVALUE_DEFAULTS[dtype]

This comment has been minimized.

@llllllllll

llllllllll Dec 10, 2015

Member

why don't consumers just use this dict directly?

This comment has been minimized.

@ssanderson

ssanderson Dec 10, 2015

Member

I can imagine the implementation becoming more complex in the future. The fact that it's just a simple dict lookup right now is an implementation detail.

@@ -603,3 +605,28 @@ def wrapped(*args, **kwargs):

return wrapped
return dec


def assert_timestamp_equal(left, right, compare_nat_equal=True, msg=""):

This comment has been minimized.

@llllllllll

llllllllll Dec 10, 2015

Member

this is no longer used

@llllllllll

This comment has been minimized.

Member

llllllllll commented Dec 10, 2015

other than the small comments your changes look good to merge.

@ssanderson

This comment has been minimized.

Member

ssanderson commented Dec 10, 2015

This is ready to merge here. Waiting on downstream compat changes.

ssanderson added a commit that referenced this pull request Dec 11, 2015

Merge pull request #905 from quantopian/refactor-adjusted-array
Adds support for different typed adjusted arrays and adds an EarningsCalendar loader

@ssanderson ssanderson merged commit ebb4fbd into master Dec 11, 2015

1 of 2 checks passed

continuous-integration/travis-ci/pr The Travis CI build is in progress
Details
Scrutinizer 133 updated code elements
Details

@ssanderson ssanderson deleted the refactor-adjusted-array branch Dec 11, 2015

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment