Skip to content

HTTPS clone URL

Subversion checkout URL

You can clone with
or
.
Download ZIP
A collection of robust statistical tests in Julia
Julia C
branch: master
Failed to load latest commit information.
src Update RobustStats.jl
test
.gitignore
LICENSE
README.md Add a Bitdeli badge to README
REQUIRE Update REQUIRE

README.md

RobustStats

Most functions in this file are robust statistical methods based on the R package WRS (R-Forge repository) by Rand Wilcox. Only a handful of functions are included at this point. More will be added soon.

In order to use the functions, Rmath, Stats, Dataframes, Distributions, and Winston are required. They can be installed by invoking Pkg.add("packagename").

Examples

#Set up a sample dataset:
x=[1.672064, 0.7876588, 0.317322, 0.9721646, 0.4004206, 1.665123, 3.059971, 0.09459603, 1.27424, 3.522148, 
   0.8211308, 1.328767, 2.825956, 0.1102891, 0.06314285, 2.59152, 8.624108, 0.6516885, 5.770285, 0.5154299]

julia> mean(x)     #the mean of this dataset
1.853401259

1. tmean: trimmed mean

julia> tmean(x)            #20% trimming by default
1.2921802666666669

julia> tmean(x, tr=0)      #no trimming; the same as the output of mean()
1.853401259

julia> tmean(x, tr=0.3)    #30% trimming
1.1466045875000002

julia> tmean(x, tr=0.5)    #50% trimming, which gives you the median of the dataset.
1.1232023

2. winval: winsorize data

julia> winval(x)           #20% winsorization; can be changed via the named argument `tr`.
20-element Any Array:
 1.67206 
 0.787659
 0.400421
 0.972165
 ...
 0.651689
 2.82596 
 0.51543 

3. winmean: winsorized mean

julia> winmean(x)          #20% winsorization; can be changed via the named argument `tr`.
1.4205834800000001

4. winvar: winsorized variance

julia> winvar(x)           #20% winsorization; can be changed via the named argument `tr`.
0.998659015947531

5. trimse: estimated standard error of the gamma trimmed mean

julia> trimse(x)           #20% winsorization; can be changed via the named argument `tr`.       
0.3724280347984342

6. trimci: 1-alpha confidence interval for the trimmed mean

Can be used for paired groups if x consists of the difference scores of two paired groups.

julia> trimci(x)                 #20% winsorization; can be changed via the named argument `tr`.       
1-alpha confidence interval for the trimmed mean

Degrees of freedom:   11
Estimate:             1.292180
Statistic:            3.469611
Confidence interval:  0.472472       2.111889
p value:              0.005244

7. stein1: Stein's method

julia> stein1(x, 1)     
41

8. stein2: the second stage of Stein's method.

julia> srand(1)         #set seed
julia> x2=rnorm(21);    #suppose additional 21 data points were collected.
julia> stein2(x, x2)
df:        19
estimate:  0.8441066102423266
confint:   [-3.65043, 5.33865]
statistic: 2.516970580783766
crit:      2.093024054408309
pval:      0.020975586092444765

9. idealf: the ideal fourths:

julia> idealf(x)
The ideal fourths
Lower quartile:  0.448341
Upper quartile:  2.728274

10. pbvar: percentage bend midvariance

julia> pbvar(x)
2.0009575278957623

11. bivar: biweight midvariance

julia> bivar(x)
1.588527039652727

12. tauloc: tau measure of location

(see Yohai and Zamar (JASA, 83, 406-413))

julia> tauloc(x)       #the named argument `cval` is 4.5 by default.
1.2696674487629664

13. tauvar: the tau measure of scale

(see Yohai and Zamar (JASA, 1988, 83, 406-413) and Maronna and Zamar (Technometrics, 2002, 44, 307-317))

julia> tauvar(x)
1.5300804969271078

14. outbox: outlier detection

using a modified boxplot rule based on the ideal fourths; when the named argument mbox is set to true, a modification of the boxplot rule suggested by Carling (2000) is used

julia> outbox(x)     
Outlier detection method using 
the ideal-fourths based boxplot rule

Outlier ID:         17                                                        
Outlier value:      8.62411
Number of outliers: 1
Non-outlier ID:     1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 18, 19, 20

15. msmedse:

Computing standard error of the median using a method recommended by McKean and Shrader (1984)

julia> msmedse(x)
0.4708261134886094

16. binomci:

Compute a 1-alpha confidence interval for p, the probability of success for a binomial dist. using Pratt's method y is a vector of 1's and 0's, x is the number of successes observed among n trials.

julia> binomci(2, 10)           #when number of success and number of total trials are provided. By default alpha=.05
phat:                0.2000
confidence interval: 0.0274   0.5562
Sample size          10


julia> trials=[1, 0, 1, 1, 0, 0, 1, 0, 0, 1, 1, 0]
julia> binomci(trials, 0.01)    #trial results are provided in array form consisting of 1's and 0's. Alpha=.01
phat:    0.5
confint: [0.176811, 0.849496]
n:       12

17. akerd:

Compute adaptive kernel density estimate for univariate data (See Silverman, 1986)

julia> akerd(x, title="Lognormal Distribution", xlab="x", ylab="Density")

plot

julia> srand(10)
julia> x3=rnorm(100, 1, 2);
julia> akerd(x3, title="Normal Distribution; mu=1, sd=2", xlab="x", ylab="Density", color="red", plottype="dash")

plot

18. indirectTest:

This function is adapted from Andrew Hayes' SPSS macro, which evaluates indirect/mediation effects via percentile bootstrap and the Sobel Test.

julia> srand(1)
julia> m = randn(20);        #Mediator
julia> srand(2)             
julia> y = randn(20) + 2.0;  #Outcome variable

julia> indirectTest(y, x, m)   #5000 bootstrap samples by default
TESTS OF INDIRECT EFFECT

Sample size: 20
Number of bootstrap samples: 5000

DIRECT AND TOTAL EFFECTS
          Estimate  Std.Error     t value   Pr(>|t|)
b(YX):    0.125248   0.087729    1.427664   0.170508
b(MX):    0.082156   0.106198    0.773611   0.449202
b(YM.X):  0.140675   0.089359    1.574264   0.133852
b(YX.M): -0.187775   0.195111   -0.962402   0.349338

INDIRECT EFFECT AND SIGNIFICANCE USING NORMAL DISTRIBUTION
          Estimate  Std.Error     z score      CI lo      CI hi  Pr(>|z|) 
Sobel:   -0.015427   0.019161   -0.805126  -0.052981   0.022128  0.420747

BOOTSTRAP RESULTS OF INDIRECT EFFECT
          Estimate  Std.Error       CI lo      CI hi    P value 
Effect:  -0.048598   0.084280   -0.298647   0.019394   0.420800

if we add plotit=true, we get a kernel density plot of the effects derived from bootstrap samples:

plot

18. sint()

Computing the confidence interval for the median.

julia> sint(x)
Confidence interval for the median

 Confidence interval:  0.547483       2.375232

19. acbinomci()

Compute a 1-alpha confidence interval for p, the probability of success for a binomial dist using a generalization of the Agresti-Coull method that was studied by Brown, Cai DasGupta.

julia> acbinomci(2, 10)           #when number of success and number of total trials are provided. By default alpha=.05
phat:                0.2000
confidence interval: 0.0459   0.5206
Sample size          10

21. stein1_tr()

Extension of Stein's method based on the trimmed mean.

julia> stein1_tr(x, 0.2)
89

22. stein2_tr()

Extension of the 2nd stage of Stein's method based on the trimmed mean.

julia> stein2_tr(x, x2)
Extension of the 2nd stage of Stein's method based on the trimmed mean

Statistic:            3.154993
Critical value:       2.200985

This function is able to handle multiple dependent groups. Suppose that the original dataset contained 4 dependent groups and the sample size is 5 (xold), and we collected more data (xnew).

julia> srand(2)
julia> xnew = rand(6, 4)
julia> xold = reshape(x, 5, 4)
julia> stein2_tr(xold, xnew)

Extension of the 2nd stage of Stein's method based on the trimmed mean

 Statistic             Group 1  Group 2  Statistic
                             1        2  -0.933245
                             1        3   0.291014
                             1        4  -0.949618
                             2        3   1.212577
                             2        4  -0.608510
                             3        4  -0.649426
 Critical value:       10.885867

23. sintv2()

Confidence interval for the median using the Hettmansperger-Sheather interpolation method.

julia> sintv2(x)
Confidence interval for the median
using the Hettmansperger-Sheather interpolation method

 Confidence interval:  0.547483       2.375232
 p value:              0.000100

24. onestep()

Compute one-step M-estimator of location using Huber's Psi. The default bending constant is 1.28.

julia> onestep(x)
1.384070801414857

25. onesampb()

Compute a bootstrap, .95 confidence interval for the measure of location corresponding to the argument est. By default, a one-step M-estimator of location based on Huber's Psi is used. The default number of bootstrap samples is nboot=2000. nv=0 when computing a p-value.

julia> onesampb(x)
Estimate:             1.384071
Confidence interval:  0.760978       2.378872
p value:             < 10e-16

26. mom()

Compute MOM-estimator of location. The default bending constant is 2.24

julia> mom(x)
1.2596462322222222

27. momci()

Compute the bootstrap .95 confidence interval for the MOM-estimator of location based on Huber's Psi.

julia> momci(x, nboot=2)
Bootstrap .95 confidence interval for the MOM-
estimator of location based on Huber's Psi

 Confidence interval:  0.652212       1.684510

28. cnorm()

Create contaminated normal distributions.

julia> srand(1)
julia> c = cnorm(2000)
julia> akerd(c)

plot

29. trimpb()

Compute a 1-alpha confidence interval for a trimmed mean. The default number of bootstrap samples is nboot=2000. win is the amount of Winsorizing before bootstrapping when WIN=true.

plotit=true gives a plot of the bootstrap values

  • pop=1 results in the expected frequency curve.
  • pop=2 kernel density estimate NOT IMPLEMENTED
  • pop=3 boxplot NOT IMPLEMENTED
  • pop=4 stem-and-leaf NOT IMPLEMENTED
  • pop=5 histogram
  • pop=6 adaptive kernel density estimate.

fr controls the amount of smoothing when plotting the bootstrap values via the function rdplot. fr=NaN means the function will use fr=.8 (When plotting bivariate data, rdplot uses fr=.6 by default.)

julia> trimpb(x, plotit=true, pop=6)
Compute a 1-alpha confidence interval for a trimmed mean using the bootstrap percentile method.

 Confidence interval:  0.708236       2.253489
 p value:             < 10e-16

plot

30. trimcibt()

Compute a 1-alpha confidence interval for the trimmed mean using a bootstrap percentile t method. The default amount of trimming is tr=.2. side=true, indicates the symmetric two-sided method. side=false yields an equal-tailed confidence interval. NOTE: p.value is reported when side=true only.

julia> trimcibt(x, nboot=5000, plotit=true)
Bootstrap .95 confidence interval for the trimmed mean
using a bootstrap percentile t method

 Estimate:             1.292180
 Statistic:            3.469611
 Confidence interval:  0.292162       2.292199
 p value:              0.022600

plot

31. pcorb()

Compute a .95 confidence interval for Pearson's correlation coefficient. This function uses an adjusted percentile bootstrap method that gives good results when the error term is heteroscedastic.

julia> pcorb(x, y)
Estimate:             0.318931
Confidence interval: -0.106467       0.663678

32. yuend()

Compare the trimmed means of two dependent random variables using the data in x and y. The default amount of trimming is 20%.

julia> srand(3)
julia> y2 = randn(20)+3; 
julia> yuend(x, y2)

Comparing the trimmed means of two dependent variables.

Sample size:          20
Degrees of freedom:   11
Estimate:            -1.547776
Standard error:       0.460304
Statistic:           -3.362507
Confidence interval: -2.560898      -0.534653
p value:              0.006336

32. t1way()

A heteroscedastic one-way ANOVA for trimmed means using a generalization of Welch's method. When tr=0, the function conducts a heteroscedastic 1-way ANOVA without trimming.

There are two ways to specify data for the function:

  • A two dimensional array where each column represents a group. Hence, a 10 X 3 array means that a one way ANOVA will be performed on 3 groups. This also means equal sample sizes amongst the groups.
  • A vector containing the outcome variable and another vector containing group information.

Two examples are shown below to demonstrate the two ways of specifying data:

#Data in two dimensional array form
#Prepare data
julia> srand(12)
julia> m2 = reshape(sort(randn(30)), 10, 3); 

julia> t1way(m2)
Heteroscedastic one-way ANOVA for trimmed means
using a generalization of Welch's method.

Sample size:          10   10   10  
Degrees of freedom:   2.00   8.25  
Statistic:            20.955146
p value:              0.000583


#Data in vector form
julia> srand(12)
julia> m3 = sort(randn(30));
julia> group = rep(1:3, [8,12,10]);   #Unequal sample sizes: n1 = 8, n2 = 12, n3 = 10

julia> t1way(m3, group)
Heteroscedastic one-way ANOVA for trimmed means
using a generalization of Welch's method.

Sample size:          8   12   10  
Degrees of freedom:   2.00   8.11  
Statistic:            28.990510
p value:              0.000202

Bitdeli Badge

Something went wrong with that request. Please try again.