Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH Pyfolio integration #227

Merged
merged 6 commits into from Dec 19, 2017
Merged

Conversation

luca-s
Copy link
Collaborator

@luca-s luca-s commented Dec 7, 2017

Issue #225

Added performance.create_pyfolio_input, which create input suitable for pyfolio. Intended use:

factor_data = alphales.utils.get_clean_factor_and_forward_returns(factor, prices)

pf_returns = alphales.performance.create_pyfolio_input(factor_data,'1D',
                     long_short=True,
                     group_neutral=False,
                     quantiles=None,
                     groups=None)

pyfolio.tears.create_returns_tear_sheet(pf_returns)

Also, I greatly improved the function that computes the cumulative returns as that is the base on which the pyfolio integration feature is built on. Mainly I removed the legacy assumptions which required the factor to be computed at specific frequency (daily, weekly etc) and also the periods had to be multiple of this frequency (2 days, 4 days etc). The function is now able to properly compute returns when there are gaps in the factor data or when we analyze an intraday factor.

@luca-s
Copy link
Collaborator Author

luca-s commented Dec 7, 2017

Oh, I forgot. To properly compute the portfolio returns I needed to know the length of the periods. Currently they are just integers and it is not known the units of those integers. So I changed the period column names from a simple integer to a pd.Timedelta string representation (1D, 5H, etc)

@luca-s
Copy link
Collaborator Author

luca-s commented Dec 7, 2017

I'll go through pep8 issues tomorrow.

@luca-s
Copy link
Collaborator Author

luca-s commented Dec 8, 2017

Ok, finished with pep8 issues. This PR is ready

Period over which to calculate the turnover
if 'period' is a string it must follow pandas.Timedelta constructor
format (.e.g '1 days', '1D', '30m', '3h', '1D1h', etc)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

.e.g -> e.g.

@twiecki
Copy link
Contributor

twiecki commented Dec 8, 2017

This looks great!

@luca-s
Copy link
Collaborator Author

luca-s commented Dec 8, 2017

The PR is ready to be merged but I expect some issues with pyfolio as I haven't done extensive testing. Anyway I don't expect regressions with existing functionalities and I actually expect better compatibility with non-daily factors and event studies.

@luca-s luca-s changed the title Pyfolio integration ENH Pyfolio integration Dec 10, 2017
@luca-s luca-s force-pushed the pyfolio_integration branch 5 times, most recently from ac5e19d to 7090c26 Compare December 12, 2017 09:41
@luca-s
Copy link
Collaborator Author

luca-s commented Dec 12, 2017

@twiecki I am going to merge this if you agree


def timedelta_to_string(timedelta):
"""
Utility that converts a pandas.Timedelta to a string rappresentation
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

representation

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks :)

return columns[valid_columns]


def timedelta_to_string(timedelta):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does that really not exist inside pandas?

Copy link
Collaborator Author

@luca-s luca-s Dec 12, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately no, the one in pandas gives back the verbose string. e.g. '1 days 00:00:00', which doesn't suit the column headers

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

got it.

@twiecki
Copy link
Contributor

twiecki commented Dec 12, 2017

Does this change existing behavior?

@luca-s
Copy link
Collaborator Author

luca-s commented Dec 12, 2017

No, all scenarios that were working before are working now + the current behaviour does a better job in handling factors with gaps (when values are not computed every day, e.g. event study) and intraday factors.

The only user level visible change is the period column names that are now named "1D", "3D" (1 day, 3 days or "1H", 3H" for 1 hour, 3 hours, etc. ) instead of "1", "3".

@twiecki
Copy link
Contributor

twiecki commented Dec 12, 2017

OK, great. And users can now also have arbitrary periods, like hours.

I haven't reviewed the code carefully but I'm fine merging this as an experimental feature.

@twiecki
Copy link
Contributor

twiecki commented Dec 12, 2017

Actually, let's see if @richafrank has someone who could do a code review.

@luca-s
Copy link
Collaborator Author

luca-s commented Dec 12, 2017

Actually arbitrary periods are possible even without this PR, for example a daily factor with 1 hours period or a weekly factor (computed on Monday for example) with a period of 3 days. With this kind of factors the results are computed correctly even though the turnover analysing is not so interesting, but the cumulative returns are wrong and this prevent the pyfolio analysis, this is why I fixed it in this PR and, as I was already fixing the code, I made sure the cumulative returns is correct with factors not computed at a fixed frequency, e.g. event study

@abhijeetkalyan
Copy link

@luca-s @twiecki this looks neat! I had a couple questions around the UX:

  • The name confused me a bit. When I saw create_pyfolio_input, I expected it to have positions and transactions as well. Since the function's output is the return stream, should we change its name to clarify that it's providing only returns, and not all of the other inputs that pyfolio consumes?
  • If this is still an experimental feature as @twiecki mentioned in ENH Pyfolio integration #227 (comment), should we add some kind of warning to inform users of that fact?

@luca-s
Copy link
Collaborator Author

luca-s commented Dec 13, 2017

You are right @abhijeetkalyan , the name is confusing a little. The reason is that I initially thought to return positions and transactions as well but I haven't managed to do that yet. I am actively working on this so my plan is to improve create_pyfolio_input to make it returns positions and transactions in another PR. I hope to push that PR soon together with an example NB. If I manage to do that before we create a new tag I would avoid the warning, otherwise if we create a tag and release a new version of Alphalens before I finish working on this feature I will add a warning message. Sounds like a plan?

@luca-s
Copy link
Collaborator Author

luca-s commented Dec 14, 2017

I added an example NB and also I made create_pyfolio_input returns the benchmark returns too (as the factor universe mean returns).

@luca-s luca-s force-pushed the pyfolio_integration branch 3 times, most recently from 7b90c26 to 8469794 Compare December 14, 2017 12:50
@luca-s
Copy link
Collaborator Author

luca-s commented Dec 14, 2017

I also added a warning in create_pyfolio_input docstring:

WARNING: this API is still in experimental phase and input/output
paramenters might change in the future

I believe this PR is now ready to be merged

Copy link

@prsutherland prsutherland left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow, this is awesome! I'm a big fan of the movement from integers to Timedelta strings for period. Is it possible to keep the default values on the period parameters? I'm not sure what makes sense as a default period, '1D', maybe?

Also, it looks like https://github.com/quantopian/alphalens/blob/master/alphalens/examples/tear_sheet_walk_through.ipynb uses many of the methods that have interface changes to. The notebook should probably be updated to use the new form of the methods.

@@ -68,7 +68,7 @@ def src_ic(group):
grouper.append('group')

ic = factor_data.groupby(grouper).apply(src_ic)
ic.columns = pd.Int64Index(ic.columns)
ic.columns = pd.Index(ic.columns)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since ic.columns is already an index, this line and https://github.com/quantopian/alphalens/pull/227/files#diff-39e40ed256ddc7619de160850cb6f6a2R125 can probably be removed.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess you are right :)



def std_conversion(period_std):
def std_conversion(period_std, one_period_len):

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one_period_len isn't in the docstring. Looks like it should be a string in a Timedelta format.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you are right!

@luca-s
Copy link
Collaborator Author

luca-s commented Dec 14, 2017

@prsutherland thanks for reminding me about the walk through NB, I completely forgot about that. Regarding the default values on the period parameters I am not sure... If I find a sensible default I would restore them.
The movement from integers to Timedelta strings for period was needed to carry information that otherwise would have been lost but some thorough testing is now required because this change could lead to some hidden and non-obvious bugs. Sometimes I think get_clean_factor_and_forward_returns should return an object where we can store all the information we want together with factor_data instead of trying to save the information we need inside the DataFrame column names...but for now let's keep it this way.

@luca-s luca-s force-pushed the pyfolio_integration branch 4 times, most recently from ff3f129 to 4784baa Compare December 19, 2017 09:32
@luca-s luca-s force-pushed the pyfolio_integration branch 2 times, most recently from 805c9f1 to 0518cb0 Compare December 19, 2017 20:55
@luca-s
Copy link
Collaborator Author

luca-s commented Dec 19, 2017

Unless there are other comments I am going to merge this PR soon. I only need to finish updating the tear_sheet_walk_through NB.

Most of Alphalens functions are now factor frequency agnostic (the
factor can be computed daily, weekly, randomly) and they don't assume
forward returns periods to be multiple of the frequency at which a
factor is computed, but there are still some functions that rely on
those assumptions. This commit  addresses those limitations.

The main function which requires a fix is 'cumulative_returns', which
assumes returns to be computed at a specific frequency (daily, weekly,
etc) without  gaps (e.g. an event which can happend randomly doesn't
work well with current implementation) and the return periods must be
multiple of factor computation frequency.

In order to remove those limitation from 'cumulative_returns' the period
length for which the returns are computed is required and for this
reason is now included in the forward columns name, which now contains
the period lenght (e.g. 1 day, is now '1D', 3 days is now '3D', 2 hours
'2h' and so on).

Other small fixed are included in this commit which are related to
the removal of the above assumptions.

This commit also add tests for performance.cumulative_returns.
Make sure the NB reflect the new API
@twiecki
Copy link
Contributor

twiecki commented Dec 19, 2017

Looks great, @luca-s. Thanks!

@twiecki twiecki merged commit b2b4667 into quantopian:master Dec 19, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants