New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Risk tear sheet #391

Merged
merged 57 commits into from Jul 11, 2017

Conversation

Projects
None yet
4 participants
@eigenfoo
Contributor

eigenfoo commented Jun 12, 2017

Risk tear sheet: currently plots style factor weighted exposures, volume exposures, sector exposures.

To add: P&L attribution.

@eigenfoo

This comment has been minimized.

Contributor

eigenfoo commented Jun 13, 2017

Basic risk plots added, with docstrings. Unit tests performed. Attached are screenshots illustrating an example of a risk tear sheet.

EDIT - uploaded latest example risk tear sheet 7/8/17

download

@eigenfoo eigenfoo requested review from twiecki and gusgordon Jun 13, 2017

@eigenfoo eigenfoo self-assigned this Jun 13, 2017

eigenfoo added some commits Jun 13, 2017

@eigenfoo eigenfoo changed the title from Risk tear sheet to Risk, perf attrib tear sheet Jun 22, 2017

2017-04-05 311.0 206.0
caps : pd.DataFrame
Daily Morningstar sector code per asset

This comment has been minimized.

@gusgordon

gusgordon Jun 22, 2017

Contributor

Comment needs fixing

This comment has been minimized.

@eigenfoo

eigenfoo Jun 22, 2017

Contributor

Fixed!

2017-04-05 -0.90132 1.13981
sector : pd.DataFrame
Daily Morningstar sector code per asset

This comment has been minimized.

@gusgordon

gusgordon Jun 22, 2017

Contributor

What if someone passes non-morningstar data? Maybe there's a way where they can pass strings instead of numbers, then we can plot the strings if so? Could also make that the default, then convert numbers to strings on our end before calling the tear sheet.

This comment has been minimized.

@eigenfoo

eigenfoo Jun 22, 2017

Contributor

Made functions so that users can optionally pass in a sector_dict that lists custom sector codes (keys, e.g. ints or strings) to sector names (values, must be strings). If no sector_dict is passed in, it defaults to Morningstar sector mappings.

return longed_threshold, shorted_threshold, grossed_threshold
def plot_volume_exposures_longshort(longed_threshold, shorted_threshold,

This comment has been minimized.

@gusgordon

gusgordon Jun 22, 2017

Contributor

Should this belong in the risk tear sheet, and not somewhere else? I'm also curious about why the number of shares transacted is useful to us, since typically, dollars traded is much more useful.

This comment has been minimized.

@eigenfoo

eigenfoo Jun 22, 2017

Contributor

You might have missed the meeting: we take the number of shares held at the end of the day, divided by the total volume. That gives us a number per stock per day. We find the 10th percentile (i.e. the most illiquid) and plot that as a time series. So it is a measure of how exposed our portfolio is to illiquid stocks.

return ax
def plot_volume_exposures_gross(grossed_threshold, percentile, ax=None):

This comment has been minimized.

@gusgordon

gusgordon Jun 22, 2017

Contributor

Same for this - I might have missed discussion where you guys talked about volume exposures?

This comment has been minimized.

@eigenfoo

eigenfoo Jun 22, 2017

Contributor

See comment above

ax.plot(tot_sfe.index, tot_sfe, label=factor_name)
avg = tot_sfe.mean()
ax.axhline(avg, linestyle='-.', label='Mean = {:.3}'.format(avg))
ax.axhline(0, color='k', linestyle='-')

This comment has been minimized.

@gusgordon

gusgordon Jun 22, 2017

Contributor

I'm still not convinced the best way to plot these is zoomed in with the 0 line bolded. Although the zoom is definitely nice, we aren't typically concerned with micro-movements of the signal and are more interesting in swings above of below zero. I think having a more standardized axis will be less confusing, and allow us to go through things more quickly. Do Jonathan or Jess prefer it the way you have it?

This comment has been minimized.

@eigenfoo

eigenfoo Jun 22, 2017

Contributor

I can center the zero line, but I don't think sharing y axes will be useful: the scales can be off by an order of magnitude, maybe even more.

Percentile to use when computing and plotting volume exposures.
- Defaults to 10th percentile
'''

This comment has been minimized.

@gusgordon

gusgordon Jun 22, 2017

Contributor

You should also pass positions through check_intraday, like here: https://github.com/quantopian/pyfolio/blob/master/pyfolio/tears.py#L603

Otherwise, for strategies that have no end-of-day holdings, your tear sheet will probably not be able to plot much.

That also requires passing transactions and returns to create_risk_tear_sheet :)

This comment has been minimized.

@eigenfoo

eigenfoo Jun 22, 2017

Contributor

Done! Also did this for create_simple_tear_sheet too...

@eigenfoo

This comment has been minimized.

Contributor

eigenfoo commented Jun 22, 2017

Several other comments:

  • Make legends slightly opaque in the stackplots, so we can see the colors associated with the sectors/market caps
  • Modify create_risk_tear_sheet so that the only required input is the positions: if nothing else is passed, simply plot nothing.
  • Make short line in 10th percentile of proportion of volume plot red, not green: easier to see.

eigenfoo added some commits Jun 22, 2017

@eigenfoo

This comment has been minimized.

Contributor

eigenfoo commented Jun 23, 2017

Made edits from Gus' feedback: removed the 'Date' x axis labels, and deleted perf_attrib.py. The PR is now ready to merge with master so that the risk tear sheet is up and running. The plan is to add performance attribution at a later date.

@eigenfoo

This comment has been minimized.

Contributor

eigenfoo commented Jul 7, 2017

Finally done! Ready to merge @gusgordon

@twiecki

This comment has been minimized.

Contributor

twiecki commented Jul 7, 2017

I'm not quite done yet.

@eigenfoo

This comment has been minimized.

Contributor

eigenfoo commented Jul 7, 2017

Sorry, didn't realize. Will hold off on merging until further feedback

@plotting_context
def create_risk_tear_sheet(positions,
style_factor_panel=None,
sectors=None,

This comment has been minimized.

@eigenfoo

eigenfoo Jul 9, 2017

Contributor

Bug: what if the input data do not have the same DateTimeIndexes? It is very likely that the risk metrics data will start before and end later than the positions data. If so, the tear sheet produces misleading graphs and artificially deflated means as below. I.e. the analysis returns 0 until all data inputs have non-zero/non-nan values.

One fix is to only analyze/plot over the time range where all data inputs are non-zero/non-nan. I.e. in create_risk_tear_sheet, mask all data inputs to the intersection of their DateTimeIndexes. @gusgordon @twiecki thoughts?

screen shot 2017-07-08 at 9 42 56 pm

@twiecki

This comment has been minimized.

Contributor

twiecki commented Jul 10, 2017

Some more general thoughts:

  • Before we had utility functions in e.g. positions.py and plotting functions in plotting.py. Here we are grouping the math and plotting functions into the same file. I feel like we should stay consistent. Our options are:
    1. Move all plotting functions into plotting.py
    2. Remove plotting.py and move things into the other files
  • Data format: There main problem we need to solve is that this PR uses pandas.Panel which is deprecated (https://pandas.pydata.org/pandas-docs/stable/dsintro.html#panel). Thus, at a minimum we should change the Panels to multiindex dataframes. Once we have done this we could go one step further and convert all data to long (I realize this is a major refactor) where dt and sid are the index and columns are position, sector, cap, momentum, reversal, shares_held, energy etc. In that world we only have a single data structure to be passed internally, the risk tear sheet could check for the existence of certain columns and plotting functions extract a list of columns they want to plot. We have ultimately switched to the same format in alphalens under a much higher cost because we have done so much later in the game, but it definitely was worth it, as this format is also much easier to extend. I also think we will want to do something similar for the risk model output as well as for the PnL attribution, so the earlier we bite the bullet the better.

Thoughts @georgh0021, @gusgordon?

@eigenfoo

This comment has been minimized.

Contributor

eigenfoo commented Jul 10, 2017

As of right now we have three issues that need addressing:

  1. Non-identical dt indexes produce misleading graphs (pretty major bug)
  2. Moving code around into risk.py and plotting.py (for consistency)
  3. Data format: a) deprecating panels and b) changing to long format (for extendability)

1 and 3a have straightforward solutions: those will be my tasks over the next few days.

Wrt 2, I agree we need to stay consistent, but I think it makes more sense to do away with plotting.py and move all plotting functions into their respective modules (reasons below). If we decide on this approach, nothing needs to be done for this PR, but we'll need to open a new one to break up plotting.py.

  • pyfolio is modularized into separate tear sheets because that's how the user will consume the final product: tear sheet by tear sheet. It doesn't make sense to me to have the functionality of, say, the returns tear sheet split between two different modules.
  • pyfolio plots much more than it computes: plotting.py clocks in at around 2000 lines of code, whereas most other modules barely make 500. Refactoring so that plotting functions immediately follow their corresponding compute functions would make pyfolio much easier to read.

Wrt 3b, the spirit of long data goes way beyond the risk tear sheet: another long term PR would be to refactor all pyfolio code (i.e. from the returns tear sheet up) so that it's long data in, long data out. This will probably have to wait a while.

@gusgordon?

@gusgordon

This comment has been minimized.

Contributor

gusgordon commented Jul 10, 2017

@twiecki @georgh0021 Agree on removing plotting.py and using the new data format. I vote we should merge this PR (once any loose ends are cleaned up), then we can open another PR where all of that refactoring is done. Do you guys think that makes sense?

@twiecki

This comment has been minimized.

Contributor

twiecki commented Jul 10, 2017

Agreed on plotting.py.

I don't agree on 3b. I don't think other parts of pyfolio need to be converted to long, there is only returns (series), positions (wide), and transactions (long), these all seem fine. It's only affecting this PR.

@eigenfoo

This comment has been minimized.

Contributor

eigenfoo commented Jul 10, 2017

Makes sense. I'll see if I can fix 1) by the end of today, and then we can open up a separate PR to refactor. @twiecki does that sound good?

@twiecki

This comment has been minimized.

Contributor

twiecki commented Jul 10, 2017

OK, that's fine.

eigenfoo added some commits Jul 10, 2017

@eigenfoo

This comment has been minimized.

Contributor

eigenfoo commented Jul 10, 2017

Fixed problem 1; ready to merge as soon as builds finish @gusgordon @twiecki

@gusgordon

This comment has been minimized.

Contributor

gusgordon commented Jul 11, 2017

@georgh0021 test failed again for the same reason. I would see if you can reproduce it on your computer. Make sure your branch is up to date and rebased with this one (you could just pull pyfolio into a new folder and checkout this branch).

@eigenfoo

This comment has been minimized.

Contributor

eigenfoo commented Jul 11, 2017

@twiecki there was a bug in test_timeseries.py caused by pandas data reader moving from Yahoo to Google: basically, there was only one drawdown in a test that tested for overlapping drawdowns. It's fixed now and ready to merge (for real this time)

@gusgordon gusgordon merged commit 86f25de into master Jul 11, 2017

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details

@eigenfoo eigenfoo deleted the risk_tear_sheet branch Jul 11, 2017

@eigenfoo eigenfoo referenced this pull request Aug 14, 2017

Closed

New release for pyfolio #414

1 of 4 tasks complete
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment