# robust.mad is not being computed correctly or is non-standard definition; it returns the median #658

Closed
opened this issue Feb 10, 2013 · 10 comments

Projects
None yet
3 participants

### richardgmcmahon commented Feb 10, 2013

Hello,

The statsmodel median absolute deviation function robust.mad in not correct in my opinion. It returns the median. The function stand_mad returns the mad. Maybe this is intentional.

Here is some example code with its output.

import numpy as np
import statsmodels.robust.scale as sm
from scipy.stats import norm as Gaussian

mu=100.0
sigma=15.0
x = np.random.normal(mu, sigma, 10000)

print 'mu, sigma ', mu, sigma

# x = np.asarray(x)

c = Gaussian.ppf(3.0/4.0)

print 'MAD to rms ratio: ', c, 1.0/c

m = np.median(x)
mymad = np.median(np.fabs(m - x) / c)

print 'median(x), median(x)/c: ', np.median(x), np.median(x)/c

mu, sigma 100.0 15.0
MAD to rms ratio: 0.674489750196 1.48260221851
median(x), median(x)/c: 100.035163825 148.312355816

mu, sigma are the test gaussian data with mean 100 and sigma=15.0

I propose that stand_mad is deprecated and mad is replaced with the code

Member

### josef-pkt commented Feb 10, 2013

 My reading: mad is used when `mu` is assumed to be zero, The main current use is in RLM applied to the residuals, when the mean is automatically zero *), `mad` is currently the default for RLM std_mad is the usual standalone definition, when we don't have an assumption on `mu`. *) actually, after checking, I'm not sure because it's applied to `resid` not the weighted residuals `wresid`. I don't know if there are better names, Skipper implemented this from Huber's book.
Member

### jseabold commented Feb 10, 2013

 These functions were leftover from Jonathan's work IIRC. I'll have to check on the proposal vs. what we need for RLM, but I'm sympathetic to the confusion.

Closed

Member

### josef-pkt commented Oct 9, 2013

 2 bug reports means we need to do something here I had a statsmodels.stats.robust_descriptive in work (before I got distracted with release problems), that focuses on robust measures for skew and kurtosis, but could/should also get robust standard deviation. (barely related: a collection of measures comparing two arrays http://statsmodels.sourceforge.net/devel/tools.html#measure-for-fit-performance-eval-measures I don't remember why it ended up in tools )
Member

### jseabold commented Oct 9, 2013

 A PR to change this to standard (non-confusing) usage would be very welcome.
Member

### josef-pkt commented Oct 9, 2013

 Looking at the code, we already have `stand_mad` Subtracting the median is the only difference between `mad` and `stand_mad`, AFAICS.
Member

### jseabold commented Oct 9, 2013

 Yes. `stand_mad` is what you should use if want traditional MAD on an existing univariate random variable. Our `mad` is used internally on the residuals in RLM for a consistent estimator for the std error of the regression IIRC (and is also what's used by default in rlm in MASS in R and likely in Huber's book for RLM). Probably we just combine both via a keyword argument of mad (with a deprecation warning). E.g., like in R, mad(x, center=np.median) and we change RLM internally to mad(x, 0) or whatever. (where's np.nanmedian?)
Member

### josef-pkt commented Oct 9, 2013

 We need to check if the call to RLM doesn't get messier with an extra keyword. Improving the docstring of `mad` to point to `stand_mad` and to mention the difference might be enough to direct users to the function that they want. ?
Member

### jseabold commented Oct 9, 2013

 The keyword wouldn't go to RLM. The user doesn't need to control this. I was just thinking internally to change the call to `mad(resid, center=0)`. And change `mad` to take a variable, a center, and a correction number in addition to just the variable and correction keyword now. Then we just get rid of stand_mad.
Member

### josef-pkt commented Oct 9, 2013

 Yes, looks fine. I just checked: RLM uses a string `'mad'` or `'stand_mad'` which is independent of the call signature for the mad function.
Member

### jseabold commented Oct 9, 2013

 Yeah, I'm just removing the ability to use stand_mad.

### jseabold added a commit to jseabold/statsmodels that referenced this issue Oct 9, 2013

``` REF: Deprecate stand_mad. Add center keyword to mad. Closes statsmode… ```
`…ls#658.`
``` 9415057 ```

### jseabold added a commit to jseabold/statsmodels that referenced this issue Oct 10, 2013

``` REF: Deprecate stand_mad. Add center keyword to mad. Closes statsmode… ```
`…ls#658.`
``` dd5cb3e ```

### jseabold added a commit that referenced this issue Oct 23, 2013

``` Merge pull request #1107 from jseabold/robust-mad-refactor ```
`REF: Deprecate stand_mad. Add center keyword to mad. Closes #658.`
``` ea61f86 ```

### PierreBdR pushed a commit to PierreBdR/statsmodels that referenced this issue Sep 2, 2014

``` REF: Deprecate stand_mad. Add center keyword to mad. Closes statsmode… ```
`…ls#658.`
``` 4a9b24f ```

### PierreBdR pushed a commit to PierreBdR/statsmodels that referenced this issue Sep 2, 2014

``` Merge pull request statsmodels#1107 from jseabold/robust-mad-refactor ```
`REF: Deprecate stand_mad. Add center keyword to mad. Closes statsmodels#658.`
``` 7db00f5 ```