robust.mad is not being computed correctly or uses a non-standard definition; it returns the median #658
The statsmodels median absolute deviation function robust.mad is not correct, in my opinion. It returns the median. The function stand_mad returns the MAD. Maybe this is intentional.
Here is some example code with its output.
import numpy as np
print 'mu, sigma ', mu, sigma
print 'statsmodel MAD: ', sm.mad(x)
x = np.asarray(x)
c = Gaussian.ppf(3.0/4.0)
print 'MAD to rms ratio: ', c, 1.0/c
m = np.median(x)
print 'my MAD: ', mymad
mu, sigma 100.0 15.0
mu and sigma are the parameters of the Gaussian test data: mean 100 and sigma = 15.0.
mad returns the median
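To make the difference concrete, here is a minimal sketch of the two definitions being compared. It is not the statsmodels source: the function names `mad_about_zero` and `mad_about_median` are hypothetical, the sample size is an assumption, and it uses only the Python standard library instead of numpy so that it runs without extra dependencies.

```python
import random
from statistics import NormalDist, median

# Consistency constant: 75th percentile of the standard normal, about 0.6745.
C = NormalDist().inv_cdf(0.75)


def mad_about_zero(x):
    """Median of |x| divided by C; the center is assumed to be 0 (hypothetical name)."""
    return median(abs(v) for v in x) / C


def mad_about_median(x):
    """Median of |x - median(x)| divided by C; the standalone MAD (hypothetical name)."""
    x = list(x)
    m = median(x)
    return median(abs(v - m) for v in x) / C


random.seed(0)
mu, sigma = 100.0, 15.0
x = [random.gauss(mu, sigma) for _ in range(10000)]  # sample size is an assumption

# Centered on zero, the result follows median(x) / C rather than sigma.
print('about zero:  ', mad_about_zero(x))
# Centered on the sample median, the result is close to sigma for Gaussian data.
print('about median:', mad_about_median(x))
```

For data with a mean far from zero, the zero-centered version is just the scaled median of the data, which would explain the behavior reported above.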
I propose that stand_mad is deprecated and mad is replaced with the code
mad is used when
std_mad is the usual standalone definition, when we don't have an assumption on
*) actually, after checking, I'm not sure because it's applied to
I don't know if there are better names; Skipper implemented this from Huber's book.
Two bug reports mean we need to do something here.
I had a statsmodels.stats.robust_descriptive in the works (before I got distracted with release problems) that focuses on robust measures for skew and kurtosis, but could/should also get a robust standard deviation.
(barely related: a collection of measures comparing two arrays http://statsmodels.sourceforge.net/devel/tools.html#measure-for-fit-performance-eval-measures )
Probably we just combine both via a keyword argument of mad (with a deprecation warning). E.g., like in R,
and we change RLM internally to mad(x, 0) or whatever.
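A rough sketch of what such a combined function might look like, assuming a `center` keyword as the second argument. The signature, the default, and the wording of the warning are all hypothetical and are not taken from statsmodels; the standard library is used instead of numpy to keep the example dependency-free.

```python
import warnings
from statistics import NormalDist, median

# Consistency constant: 75th percentile of the standard normal, about 0.6745.
C = NormalDist().inv_cdf(0.75)


def mad(x, center=None, c=C):
    """Hypothetical combined MAD: median(|x - center|) / c.

    center=None centers on the sample median (the standalone definition);
    a fixed number such as 0 gives the zero-centered version.
    """
    x = list(x)
    if center is None:
        center = median(x)
    return median(abs(v - center) for v in x) / c


def stand_mad(x, c=C):
    """Deprecated alias in this sketch; forwards to mad with the median as center."""
    warnings.warn("stand_mad is deprecated; use mad(x) instead",
                  DeprecationWarning, stacklevel=2)
    return mad(x, c=c)
```

With this shape, an internal caller that wants the zero-centered scale can write `mad(x, 0)`, while `mad(x)` gives the usual standalone definition.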
The keyword wouldn't go to RLM. The user doesn't need to control this. I was just thinking internally to change the call to