New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH Pyfolio integration #227
Conversation
Oh, I forgot. To properly compute the portfolio returns I needed to know the length of the periods. Currently they are just integers and it is not known the units of those integers. So I changed the period column names from a simple integer to a pd.Timedelta string representation (1D, 5H, etc) |
I'll go through pep8 issues tomorrow. |
9ce1247
to
2511db4
Compare
Ok, finished with pep8 issues. This PR is ready |
alphalens/performance.py
Outdated
Period over which to calculate the turnover | ||
if 'period' is a string it must follow pandas.Timedelta constructor | ||
format (.e.g '1 days', '1D', '30m', '3h', '1D1h', etc) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
.e.g
-> e.g.
This looks great! |
The PR is ready to be merged but I expect some issues with pyfolio as I haven't done extensive testing. Anyway I don't expect regressions with existing functionalities and I actually expect better compatibility with non-daily factors and event studies. |
ac5e19d
to
7090c26
Compare
@twiecki I am going to merge this if you agree |
alphalens/utils.py
Outdated
|
||
def timedelta_to_string(timedelta): | ||
""" | ||
Utility that converts a pandas.Timedelta to a string rappresentation |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
representation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks :)
return columns[valid_columns] | ||
|
||
|
||
def timedelta_to_string(timedelta): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does that really not exist inside pandas?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately no, the one in pandas gives back the verbose string. e.g. '1 days 00:00:00', which doesn't suit the column headers
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
got it.
7090c26
to
b0e8e19
Compare
Does this change existing behavior? |
No, all scenarios that were working before are working now + the current behaviour does a better job in handling factors with gaps (when values are not computed every day, e.g. event study) and intraday factors. The only user level visible change is the period column names that are now named "1D", "3D" (1 day, 3 days or "1H", 3H" for 1 hour, 3 hours, etc. ) instead of "1", "3". |
OK, great. And users can now also have arbitrary periods, like hours. I haven't reviewed the code carefully but I'm fine merging this as an experimental feature. |
Actually, let's see if @richafrank has someone who could do a code review. |
Actually arbitrary periods are possible even without this PR, for example a daily factor with 1 hours period or a weekly factor (computed on Monday for example) with a period of 3 days. With this kind of factors the results are computed correctly even though the turnover analysing is not so interesting, but the cumulative returns are wrong and this prevent the pyfolio analysis, this is why I fixed it in this PR and, as I was already fixing the code, I made sure the cumulative returns is correct with factors not computed at a fixed frequency, e.g. event study |
b0e8e19
to
969235d
Compare
@luca-s @twiecki this looks neat! I had a couple questions around the UX:
|
You are right @abhijeetkalyan , the name is confusing a little. The reason is that I initially thought to return |
969235d
to
3d1a6db
Compare
I added an example NB and also I made |
3d1a6db
to
e5bb76b
Compare
7b90c26
to
8469794
Compare
I also added a warning in
I believe this PR is now ready to be merged |
48b6b50
to
2eb6372
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wow, this is awesome! I'm a big fan of the movement from integers to Timedelta strings for period. Is it possible to keep the default values on the period
parameters? I'm not sure what makes sense as a default period, '1D', maybe?
Also, it looks like https://github.com/quantopian/alphalens/blob/master/alphalens/examples/tear_sheet_walk_through.ipynb uses many of the methods that have interface changes to. The notebook should probably be updated to use the new form of the methods.
alphalens/performance.py
Outdated
@@ -68,7 +68,7 @@ def src_ic(group): | |||
grouper.append('group') | |||
|
|||
ic = factor_data.groupby(grouper).apply(src_ic) | |||
ic.columns = pd.Int64Index(ic.columns) | |||
ic.columns = pd.Index(ic.columns) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since ic.columns
is already an index, this line and https://github.com/quantopian/alphalens/pull/227/files#diff-39e40ed256ddc7619de160850cb6f6a2R125 can probably be removed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess you are right :)
alphalens/utils.py
Outdated
|
||
|
||
def std_conversion(period_std): | ||
def std_conversion(period_std, one_period_len): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
one_period_len
isn't in the docstring. Looks like it should be a string in a Timedelta
format.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you are right!
@prsutherland thanks for reminding me about the walk through NB, I completely forgot about that. Regarding the default values on the |
ff3f129
to
4784baa
Compare
805c9f1
to
0518cb0
Compare
Unless there are other comments I am going to merge this PR soon. I only need to finish updating the tear_sheet_walk_through NB. |
Most of Alphalens functions are now factor frequency agnostic (the factor can be computed daily, weekly, randomly) and they don't assume forward returns periods to be multiple of the frequency at which a factor is computed, but there are still some functions that rely on those assumptions. This commit addresses those limitations. The main function which requires a fix is 'cumulative_returns', which assumes returns to be computed at a specific frequency (daily, weekly, etc) without gaps (e.g. an event which can happend randomly doesn't work well with current implementation) and the return periods must be multiple of factor computation frequency. In order to remove those limitation from 'cumulative_returns' the period length for which the returns are computed is required and for this reason is now included in the forward columns name, which now contains the period lenght (e.g. 1 day, is now '1D', 3 days is now '3D', 2 hours '2h' and so on). Other small fixed are included in this commit which are related to the removal of the above assumptions. This commit also add tests for performance.cumulative_returns.
0518cb0
to
bf7c1a8
Compare
Make sure the NB reflect the new API
Looks great, @luca-s. Thanks! |
Issue #225
Added performance.create_pyfolio_input, which create input suitable for pyfolio. Intended use:
Also, I greatly improved the function that computes the cumulative returns as that is the base on which the pyfolio integration feature is built on. Mainly I removed the legacy assumptions which required the factor to be computed at specific frequency (daily, weekly etc) and also the periods had to be multiple of this frequency (2 days, 4 days etc). The function is now able to properly compute returns when there are gaps in the factor data or when we analyze an intraday factor.