New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Factor-to-factor correlations and regressions #1307

Merged
merged 1 commit into from Jul 19, 2016

Conversation

Projects
None yet
3 participants
@dmichalowicz
Contributor

dmichalowicz commented Jun 28, 2016

Opening in favor of #1300

  • Allow correlations/regressions to be computed between two 2D factors by doing computations asset-wise
  • Moves all correlation/regression related tests from test_engine.py to test_statistical.py
@coveralls

This comment has been minimized.

coveralls commented Jun 28, 2016

Coverage Status

Coverage decreased (-0.03%) to 82.956% when pulling 70a4e6d on sid-to-sid-2 into 43f4c3a on master.

@dmichalowicz dmichalowicz force-pushed the sid-to-sid-2 branch from 70a4e6d to 54f434e Jun 28, 2016

@coveralls

This comment has been minimized.

coveralls commented Jun 28, 2016

Coverage Status

Coverage increased (+0.03%) to 83.012% when pulling 54f434e on sid-to-sid-2 into c89e957 on master.

@coveralls

This comment has been minimized.

coveralls commented Jun 28, 2016

Coverage Status

Coverage increased (+0.03%) to 83.014% when pulling 8d4faba on sid-to-sid-2 into c89e957 on master.

@dmichalowicz dmichalowicz force-pushed the sid-to-sid-2 branch from 8d4faba to 63a7111 Jun 29, 2016

@coveralls

This comment has been minimized.

coveralls commented Jun 29, 2016

Coverage Status

Coverage increased (+0.03%) to 83.014% when pulling 63a7111 on sid-to-sid-2 into c89e957 on master.

@dmichalowicz dmichalowicz force-pushed the sid-to-sid-2 branch from 63a7111 to 06b7789 Jul 5, 2016

@coveralls

This comment has been minimized.

coveralls commented Jul 5, 2016

Coverage Status

Coverage increased (+0.03%) to 83.031% when pulling 06b7789 on sid-to-sid-2 into 459366c on master.

@coveralls

This comment has been minimized.

coveralls commented Jul 5, 2016

Coverage Status

Coverage increased (+0.03%) to 83.031% when pulling 761ac81 on sid-to-sid-2 into 459366c on master.

@@ -626,7 +632,9 @@ def rank(self, method='ordinal', ascending=True, mask=NotSpecified):
return Rank(self, method=method, ascending=ascending, mask=mask)
@expect_types(
target=Slice, correlation_length=int, mask=(Filter, NotSpecifiedType),
target=(FactorProxy, LoadableTerm, Slice),

This comment has been minimized.

@dmichalowicz

dmichalowicz Jul 6, 2016

Contributor

Note to fix: If the wrong type of term is given here, expect_types says something like "this parameter was expected to be of type FactorProxy or..." which is confusing since we actually mean Factor

This comment has been minimized.

@dmichalowicz

dmichalowicz Jul 6, 2016

Contributor

Also, thinking this should be BoundColumn instead of LoadableTerm?

@expect_types(
target=Slice, correlation_length=int, mask=(Filter, NotSpecifiedType),
)
@expect_types(correlation_length=int, mask=(Filter, NotSpecifiedType))

This comment has been minimized.

@dmichalowicz

dmichalowicz Jul 7, 2016

Contributor

Omitting the type check here for target and instead relying on the type checking done on target in RollingPearson, because we now allow Factors, Slices and BoundColumns.

This comment has been minimized.

@ssanderson

ssanderson Jul 18, 2016

Member

You might want to throw an expect_types of Term for target here?

@coveralls

This comment has been minimized.

coveralls commented Jul 7, 2016

Coverage Status

Coverage increased (+0.03%) to 83.031% when pulling bb7f392 on sid-to-sid-2 into 459366c on master.

@coveralls

This comment has been minimized.

coveralls commented Jul 13, 2016

Coverage Status

Coverage increased (+0.02%) to 83.025% when pulling def9ec3 on sid-to-sid-2 into 459366c on master.

# spearmanr returns the R-value and the P-value.
out[i] = spearmanr(factor_data[:, i], slice_data_column)[0]
def compute(self, today, assets, out, base_data, target_data):
if target_data.shape[1] > 1:

This comment has been minimized.

@ssanderson

ssanderson Jul 18, 2016

Member

Cleaner here might be:

target_date = np.broadcast_to(target_data, base_data.shape)
for i in range(len(out)):
    ...

The same pattern could be used pretty much everywhere in this file.

# pearsonr returns the R-value and the P-value.
out[i] = pearsonr(factor_data[:, i], slice_data_column)[0]
def compute(self, today, assets, out, base_data, target_data):
if target_data.shape[1] > 1:

This comment has been minimized.

@ssanderson

ssanderson Jul 18, 2016

Member

See note below re: cleaning this up via broacasting.

@@ -765,6 +759,10 @@ def linear_regression(self, target, regression_length, mask=NotSpecified):
Parameters
----------
target : zipline.pipeline.Term with a numeric dtype

This comment has been minimized.

@ssanderson

ssanderson Jul 18, 2016

Member

target appears twice in this docstring

)
class StatisticalBuiltInsTestCase(WithTradingEnvironment, ZiplineTestCase):

This comment has been minimized.

@dmichalowicz

dmichalowicz Jul 18, 2016

Contributor

This whole test suite has just been moved from test_engine, there is no logic change here.

assert_frame_equal(regression_results, expected_regression_results)
# Make sure we cannot call the linear regression method on factors or
# slices of dtype `datetime64[ns]`.

This comment has been minimized.

@dmichalowicz

dmichalowicz Jul 18, 2016

Contributor

Note to self: make this a separate test?

)
@parameter_space(correlation_length=[1, 2, 3, 4])
def test_factor_correlation_methods_two_factors(self, correlation_length):

This comment has been minimized.

@dmichalowicz

dmichalowicz Jul 18, 2016

Contributor

This test and test_factor_regression_method_two_factors below are the only new tests for the factor-to-factor computations. @ssanderson Previously I had tried to factor out some of the shared code here with test_correlation_factors above, but that proved to be too confusing for what it was worth, so this test contains similar logic to the pre-existing tests. It might be worth a glance, but there's not much new here.

This comment has been minimized.

@ssanderson

ssanderson Jul 18, 2016

Member

This looks fine to me. I agree that it'd be nice to clean up a bit, but I don't think that's the most efficient use of your time.

# out to the same shape as `base_data`, then compute column-wise. This
# is efficient because each column of the broadcasted array only refers
# to a single memory location.
target_data = broadcast_arrays(target_data, base_data)[0]

This comment has been minimized.

@dmichalowicz

dmichalowicz Jul 19, 2016

Contributor

@ssanderson I'm using broadcast_arrays here (and below) because broadcast_to was not introduced until numpy version 1.10.0

@dmichalowicz dmichalowicz force-pushed the sid-to-sid-2 branch from c9058e8 to 0df579d Jul 19, 2016

@dmichalowicz dmichalowicz force-pushed the sid-to-sid-2 branch from 0df579d to a8486c5 Jul 19, 2016

@coveralls

This comment has been minimized.

coveralls commented Jul 19, 2016

Coverage Status

Coverage increased (+0.01%) to 83.691% when pulling a8486c5 on sid-to-sid-2 into dffdf0a on master.

@coveralls

This comment has been minimized.

coveralls commented Jul 19, 2016

Coverage Status

Coverage increased (+0.01%) to 83.691% when pulling a8486c5 on sid-to-sid-2 into dffdf0a on master.

@dmichalowicz dmichalowicz merged commit b364b0f into master Jul 19, 2016

2 checks passed

continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details

@dmichalowicz dmichalowicz deleted the sid-to-sid-2 branch Jul 19, 2016

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment