ENH: Use ddof=0 and nanstd, remove skipped test #36

Closed

analicia
Contributor

Empyrical is not generally used on a sample drawn from a larger population. Change the delta degrees of freedom (ddof) to 0, because that is how the standard deviation of a full population is calculated. Regression tests have been updated to reflect the change in method.

Use nanstd instead of np.std for consistency.
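For readers following along, here is a minimal sketch of what the two changes mean in NumPy terms. The `returns` array is made-up illustration data, not taken from Empyrical's tests, and the variable names are hypothetical.

```python
import numpy as np

# Hypothetical daily returns with one missing observation (illustration only).
returns = np.array([0.01, -0.02, np.nan, 0.015, 0.005])

# np.nanstd ignores NaNs; n is the count of non-NaN values (4 here).
pop_std = np.nanstd(returns, ddof=0)     # divides by n      (population formula)
sample_std = np.nanstd(returns, ddof=1)  # divides by n - 1  (Bessel's correction)

# Plain np.std does not skip NaNs, so the result is NaN.
plain_std = np.std(returns)

print(pop_std, sample_std, plain_std)
```

The two `nanstd` results differ only by the factor sqrt(n / (n - 1)), which is what the rest of this thread is debating.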

There was a flapping test without sound logic that is being removed as well.

@twiecki
Contributor

twiecki commented Nov 23, 2016

Why would we have the whole population here? There are only ever samples.

@ssanderson

@twiecki this came up in the context of adding a Volatility built-in Factor in Zipline.

The question I'm unsure of here is what we're considering as "the population" when we're calculating the volatility of a returns timeseries. It's straightforward that we should use ddof=1 if, for example, we were estimating the volatility of returns of all assets by taking a random sample of 1000 assets.

In this case, however, one could argue that we have "the whole population", in the sense that we have a data point for every day that we're interested in. In particular, for returns, I think the question is whether the mean of the daily returns for some asset can ever deviate from the mean of the returns measured at some finer granularity, since the source of bias corrected by using (n - 1) in the denominator here is that we have to estimate population variance using an estimate of the population mean.
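The source of the bias mentioned above can be checked with a small simulation. This is only a sketch under assumed conditions: i.i.d. normal draws with a known mean of 0 and variance of 1, which is not a claim about how real returns behave. The names `var_ddof0`, `var_ddof1`, and `var_known_mean` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n, trials, true_var = 5, 200_000, 1.0

# Each row is one small sample of size n from a process with known mean 0.
samples = rng.normal(loc=0.0, scale=np.sqrt(true_var), size=(trials, n))

# Average variance estimate across trials, using the *sample* mean.
var_ddof0 = samples.var(axis=1, ddof=0).mean()  # expected near (n - 1) / n * true_var
var_ddof1 = samples.var(axis=1, ddof=1).mean()  # expected near true_var

# Dividing by n is unbiased when the *true* mean (0 here) is used instead.
var_known_mean = (samples ** 2).mean(axis=1).mean()

print(var_ddof0, var_ddof1, var_known_mean)
```

Dividing by n is only biased when the mean itself is estimated from the same data, which is why the discussion turns on whether the mean of the returns is known or estimated.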

@CaptainKanuk @mmargenot your thoughts here would also be helpful.

@ssanderson

Another way of putting the above is to observe that daily returns aren't samples from an underlying continuous returns series, they're aggregations of an underlying continuous series. This is in contrast, for example, with estimating the standard deviation of a stock's price by looking at end-of-day data, which amounts to taking regular samples from a much larger population. In the latter case, I think ddof=1 would be clearly justified.
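A rough illustration of the "aggregation, not sampling" point, assuming log returns (which add across time; simple returns compound instead) and a hypothetical session of 390 one-minute bars with simulated data:

```python
import numpy as np

rng = np.random.default_rng(1)
days, bars_per_day = 20, 390  # hypothetical: 390 one-minute bars per session

# Simulated intraday log returns (illustration only).
minute_log_returns = rng.normal(0.0, 1e-4, size=(days, bars_per_day))

# A daily log return is the sum of that day's intraday log returns.
daily_log_returns = minute_log_returns.sum(axis=1)

# The mean daily return is fully determined by the finer-grained series.
mean_daily = daily_log_returns.mean()
mean_from_minutes = minute_log_returns.mean() * bars_per_day

print(mean_daily, mean_from_minutes)
```

Under these assumptions the two means agree up to floating-point error, so the mean over the observed window cannot differ between granularities. Whether that observed-window mean is the quantity of interest is a separate question that the sketch does not settle.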

@CaptainKanuk

Before we start, this is a good wiki page. https://en.wikipedia.org/wiki/Bessel's_correction

My take is that it actually depends on how the user interprets the result of the computation, which is tricky. If we are trying to measure/estimate the mean daily return over a fixed time period and have data for that entire time period (no NaNs), then we have all the data for the statistic we are trying to measure and it should be ddof=0 (not applying Bessel's correction).

If we are trying to measure the inherent volatility of an asset, then, as Max pointed out in Slack, we don't actually know the overall mu of the data generating process (DGP) which drives the returns. As such, we don't know the population mu and are just sampling. This is, in my experience, the more common interpretation of vol, and it is one that people often use without realizing it. Another case is missing data: if some elements are missing, then we don't have the full population and should use ddof=1.

I say ddof=1 for the following reasons, but agree it isn't clear cut.

  • Because people will more often assume that the resulting std is indicative of some underlying asset property and therefore require ddof=1.

  • Between the two, ddof=1 always produces the larger estimate of volatility and risk, and is therefore the safer choice in my opinion.
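To put a size on the difference between the two choices: for the same data, the ddof=1 result equals the ddof=0 result times sqrt(n / (n - 1)). The sketch below uses a hypothetical helper `bessel_ratio` and illustrative window lengths (5, 21, and 252 observations); neither comes from Empyrical.

```python
import numpy as np

def bessel_ratio(n):
    """Ratio of std with ddof=1 to std with ddof=0 for n observations."""
    return np.sqrt(n / (n - 1))

# Illustrative window lengths: roughly a week, a month, and a year of trading days.
ratios = {n: bessel_ratio(n) for n in (5, 21, 252)}

# Cross-check the closed form against NumPy on simulated data.
x = np.random.default_rng(2).normal(size=252)
empirical = np.std(x, ddof=1) / np.std(x, ddof=0)

for n, r in ratios.items():
    print(n, r)
print(empirical)
```

The ratio is always above 1 and shrinks as n grows, so the choice matters most for short windows. At 252 observations the two estimates differ by about 0.2%.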

@mmargenot

Seconding Delaney, mainly on the premise that any more granular return series will still be a sample of the DGP. We can't know the true movement of the underlying without divine inspiration.

@analicia analicia closed this Nov 28, 2016
@analicia analicia deleted the ddof-0 branch November 28, 2016 15:22
5 participants