Flagging outliers in R #2
Hi Ben,
Thank you, this is a very thorough and helpful answer! I'll let you know in class if I run into any trouble with your instructions.
Thanks again!
Jess
-----
Jessica Miller
MSc candidate
Aquatic Behavioural Ecology Lab (ABEL)
Department of Psychology, Neuroscience & Behaviour
McMaster University
Hamilton, ON L8S 4K1
Phone: 905 525-9140 ext 26037
Fax: 905 529-6225
On Jan 16, 2017, at 4:50 PM, Ben Bolker <***@***.***> wrote:
If you just run summary(), R will tell you (among other things) the minimum and maximum values, as well as the number of NA values, if any. (Here I'm using summary() just for the mpg column of the built-in mtcars data set; summary(mtcars) will give you the summaries for every column.)
```r
summary(mtcars$mpg)
#    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
#   10.40   15.42   19.20   20.09   22.80   33.90
```
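If you want the numbers themselves rather than the printed summary (for example, to use in a later check), a small base-R sketch: `range()` returns the minimum and maximum, and `sum(is.na())` counts missing values. The variable names here are just illustrative, and since `mtcars` has no missing values the NA counts are all zero.

```r
# Extract the pieces of summary() that matter for range checking
x <- mtcars$mpg
mpg_range <- range(x, na.rm = TRUE)  # c(minimum, maximum), ignoring NAs
n_missing <- sum(is.na(x))           # number of NA values in the column
print(mpg_range)
print(n_missing)

# The NA count for every column at once
na_counts <- colSums(is.na(mtcars))
print(na_counts)
```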
If you have a range of expected values in mind, you can use filter() (from the dplyr package) to select just the rows that fall outside it. In this case I'm looking for mpg values outside the range (12, 32):
```r
library(dplyr)
badrows <- (mtcars %>%
    filter(mpg < 12 | mpg > 32)
)
```
(I'm putting parentheses around the whole expression here so Jonathan doesn't yell at me.)
I can look at this filtered data set by clicking on the little spreadsheety-looking icon next to it in the Data section of RStudio's Environment pane.
If I'm working in the console and only want to look at a few columns, I can quickly select() them:
```r
(badrows %>%
    select(mpg, cyl, disp))
#    mpg cyl  disp
# 1 10.4   8 472.0
# 2 10.4   8 460.0
# 3 32.4   4  78.7
# 4 33.9   4  71.1
```
In fact, the data viewer in RStudio has a Filter button that lets me do this interactively ...
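Since the goal is to flag values and then either remove or mask them, here is a minimal base-R sketch that keeps every row and adds a logical flag column instead of filtering. The names `cars`, `mpg_flag`, `lower`, and `upper` are illustrative choices, not anything from the thread; in dplyr the same flag could be added with `mutate()`.

```r
# Flag (rather than drop) mpg values outside the expected range (12, 32)
lower <- 12
upper <- 32
cars <- mtcars
# !is.na() keeps missing values from turning into NA flags
cars$mpg_flag <- !is.na(cars$mpg) & (cars$mpg < lower | cars$mpg > upper)

# Inspect the flagged rows; base-R indexing keeps the row names (car models)
print(cars[cars$mpg_flag, c("mpg", "cyl", "disp")])

# Option 1: remove the flagged rows
cars_removed <- cars[!cars$mpg_flag, ]

# Option 2: mask the flagged values as NA, keeping the rest of each row
cars_masked <- cars
cars_masked$mpg[cars_masked$mpg_flag] <- NA
```

Masking keeps the other columns of those rows available for analysis, but downstream calls such as `mean()` will then need `na.rm = TRUE`.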
Ben mentioned in class today that when examining your data, there are ways to get R to flag any data values that it identifies as outliers or that you designate as outside an expected value range. Are there any resources that elaborate on that? I think it would be really helpful to be able to flag and then remove or mask values that you’ve identified as outliers.
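For values that R itself identifies as outliers (rather than values outside a range you choose), one common convention is the boxplot rule: a point is flagged if it lies more than 1.5 times the interquartile range beyond the hinges. Base R's `boxplot.stats()` returns such points in its `out` component, with the multiplier set by `coef`. The sketch below wraps this in a hypothetical helper, `flag_outliers()`, and applies it to a few continuous `mtcars` columns; the rule is a convention, not a test, and it makes little sense for 0/1 columns such as `am`.

```r
# Hypothetical helper: TRUE where x is one of the values boxplot.stats()
# classes as outlying (beyond coef * IQR from the hinges; coef = 1.5 default)
flag_outliers <- function(x, coef = 1.5) {
  out <- boxplot.stats(x, coef = coef)$out  # the outlying values themselves
  !is.na(x) & x %in% out
}

# Logical matrix: one column per variable, TRUE = flagged
flags <- sapply(mtcars[c("mpg", "hp", "wt", "qsec")], flag_outliers)

# How many values were flagged in each column
print(colSums(flags))

# Which cars have a flagged horsepower value?
print(rownames(mtcars)[flags[, "hp"]])
```

Each column of `flags` can then be used to remove or mask values in the same way as a flag built from a fixed range.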