If ols_test_outlier() does not find any outliers, it returns largest positive residual instead of largest absolute residual #177

Closed
iankelk opened this issue Apr 17, 2021 · 0 comments
iankelk commented Apr 17, 2021

Brief description of the problem:

When ols_test_outlier() finds no outlier, it should display the observation with the largest studentized residual in absolute value. Instead, it displays only the largest positive value, because the search never takes the absolute value of the studentized residuals. I found the problematic line in the source: https://github.com/rsquaredacademy/olsrr/blob/master/R/ols-outlier-test.R

Problem

Because this line uses only max(), it misses negative residuals that are larger in magnitude:
31: out <- data_bon[data_bon$stud_resid == max(data_bon$stud_resid), ]

Fix 1

If I change this line to use abs() on both sides of the comparison, it works.

31: out <- data_bon[abs(data_bon$stud_resid) == max(abs(data_bon$stud_resid)), ]

Fix 2

A more concise fix uses which.max(), which returns the index of the first maximum and therefore always selects a single row, even when there are ties:

31: out <- data_bon[which.max(abs(data_bon$stud_resid)), ]
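The difference between the three selections can be seen on a small toy example. This is only a sketch: the data frame below is a hypothetical stand-in for the package's internal data_bon (the obs column and the residual values are made up for illustration), not the real object built inside ols_test_outlier().

```r
# Hypothetical stand-in for data_bon: one row per observation.
# The obs column and the values are illustrative assumptions.
data_bon <- data.frame(
  obs        = 1:5,
  stud_resid = c(0.4, 2.1, -3.3, -0.7, 1.2)
)

# Original selection: picks the largest positive residual only.
orig <- data_bon[data_bon$stud_resid == max(data_bon$stud_resid), ]

# Fix 1: compare absolute values; returns every row tied for the maximum.
fix1 <- data_bon[abs(data_bon$stud_resid) == max(abs(data_bon$stud_resid)), ]

# Fix 2: which.max() returns the first index of the maximum, so one row.
fix2 <- data_bon[which.max(abs(data_bon$stud_resid)), ]

orig$obs  # 2
fix1$obs  # 3
fix2$obs  # 3
```

The two fixes agree whenever the largest absolute residual is unique. They differ only on exact ties, where Fix 1 returns all tied rows and Fix 2 returns the first.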

library(faraway)
library(olsrr)
#> Registered S3 methods overwritten by 'car':
#>   method                          from
#>   influence.merMod                lme4
#>   cooks.distance.influence.merMod lme4
#>   dfbeta.influence.merMod         lme4
#>   dfbetas.influence.merMod        lme4
#> 
#> Attaching package: 'olsrr'
#> The following object is masked from 'package:faraway':
#> 
#>     hsb
#> The following object is masked from 'package:datasets':
#> 
#>     rivers
f <- lm(stack.loss ~ ., data = stackloss)
## No outlier is significant after the Bonferroni correction, so the largest
## residual should be displayed, but only the largest positive value is found
ols_test_outlier(f)
#>   studentized_residual unadjusted_p_val bonferroni_p_val
#> 4             2.051797        0.0569287         1.195503
## Residual at index found is largest positive residual
rstudent(f)[4]
#>        4 
#> 2.051797
## Actual maximum absolute value of residual is at index=21
index <- which.max(abs(rstudent(f)))
index
#> 21 
#> 21
## Actual largest studentized residual
rstudent(f)[index]
#>        21 
#> -3.330493

Created on 2021-04-17 by the reprex package (v1.0.0)
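The same comparison can be sketched with base R alone, without loading olsrr. The data frame here is again a hypothetical stand-in for data_bon, built directly from rstudent(); the real function may construct it differently.

```r
# Base R only; stackloss ships with the datasets package.
f <- lm(stack.loss ~ ., data = stackloss)

# Hypothetical stand-in for the package's internal data_bon table.
# Row names are inherited from the names of rstudent(f).
data_bon <- data.frame(stud_resid = rstudent(f))

# Selection equivalent to the current behaviour (largest positive value).
pos_only <- data_bon[which.max(data_bon$stud_resid), , drop = FALSE]

# Selection proposed in Fix 2 (largest absolute value).
abs_max <- data_bon[which.max(abs(data_bon$stud_resid)), , drop = FALSE]

rownames(pos_only)  # "4"
rownames(abs_max)   # "21"
```

drop = FALSE keeps the result a one-row data frame; without it, indexing a single-column data frame would collapse to a bare number and lose the observation label.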
