Weird mean() error in beginning of the dplyr episode #558
Comments
hi @rkclement! Your diagnosis is correct: you can't compute the mean of a data frame object.
mean(iris)
# [1] NA
# Warning message:
# In mean.default(iris) : argument is not numeric or logical: returning NA
If you import the data with read.csv(), you get a data frame:
gapminder <- read.csv("data/gapminder_data.csv")
class(gapminder)
# [1] "data.frame"
When you select a single column with square brackets, base R simplifies the result to a vector unless you pass drop=FALSE:
class(gapminder[gapminder$continent == "Africa", "gdpPercap"])
# [1] "numeric"
class(gapminder[gapminder$continent == "Africa", "gdpPercap", drop=FALSE])
# [1] "data.frame"
The tidyverse is designed to make data frames the data structure of choice, and strives to limit surprises caused by coercions. Therefore, tibbles never do this type of coercion by default.
gapminder_ti <- read_csv("data/gapminder_data.csv")
class(gapminder_ti[gapminder_ti$continent == "Africa", "gdpPercap"])
# [1] "tbl_df" "tbl" "data.frame" # this is the class of a tibble
How can you calculate the mean then? Any of the following works:
mean(gapminder_ti[gapminder_ti$continent == "Africa", "gdpPercap", drop=TRUE])
# [1] 2193.755
mean(gapminder_ti[["gdpPercap"]][gapminder_ti$continent == "Africa"])
# [1] 2193.755
library(dplyr)
gapminder_ti %>%
  filter(continent == "Africa") %>%
  pull(gdpPercap) %>%
  mean()
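For anyone who wants to reproduce the difference without the CSV file, here is a minimal sketch. The toy data and the names `df`, `tb` and `is_africa` are made up for illustration and are not part of the lesson; it assumes the tibble package is installed.

```r
library(tibble)

# Hypothetical toy data standing in for data/gapminder_data.csv
df <- data.frame(
  continent = c("Africa", "Africa", "Asia"),
  gdpPercap = c(1000, 3000, 5000)
)
tb <- as_tibble(df)

is_africa <- df$continent == "Africa"

# Base data frame: a single selected column is simplified to a vector
class_df <- class(df[is_africa, "gdpPercap"])

# Tibble: the same call keeps a one-column tibble
class_tb <- class(tb[is_africa, "gdpPercap"])

# Extracting the column with [[ ]] first gives a numeric vector in both cases
mean_df <- mean(df[["gdpPercap"]][is_africa])
mean_tb <- mean(tb[["gdpPercap"]][is_africa])

print(class_df)
print(class_tb)
print(c(mean_df, mean_tb))
```

Extracting with `[[ ]]` (or `$`) before filtering behaves the same for both classes, so it is probably the safest form to teach if learners may end up with either read.csv() or read_csv().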
I have taken to showing students read_csv() instead of read.csv() when we start working with the gapminder data. Today I ran into a strange error and want to know if others have seen this and have any idea how old it is (I think this code worked back in February 2019).
If I read in the gapminder data with:
gapminder <- read_csv("data/gapminder_data.csv")
When I get to the beginning of the dplyr lesson and try to run:
mean(gapminder[gapminder$continent == "Africa", "gdpPercap"])
I get the following warning, and the result is NA:
Warning message:
In mean.default(gapminder[gapminder$continent == "Africa", "gdpPercap"]) : argument is not numeric or logical: returning NA
There are no NAs in the data, as confirmed by:
sum(is.na(gapminder[gapminder$continent == "Africa", "gdpPercap"]))
Interestingly, if I read the data with read.csv() the problem disappears:
gapminder <- read.csv("data/gapminder_data.csv")
mean(gapminder[gapminder$continent == "Africa", "gdpPercap"])
Also, if I skip the square-bracket subsetting altogether:
gapminder <- read_csv("data/gapminder_data.csv")
mean(gapminder$gdpPercap)
the problem also disappears.
Finally, though this seems to be related to reading in the data as a tibble, the following also does not work:
gapminder <- read_csv("data/gapminder_data.csv")
mean(as.data.frame(gapminder[gapminder$continent == "Africa", "gdpPercap"]))
As best I can tell, this happens because subsetting a tibble with brackets gives you another tibble, whereas subsetting a single column of a data frame with brackets gives you a vector. However, I can't even use as.vector() to fix this, i.e. this doesn't work:
as.vector(gapminder[gapminder$continent == "Africa", "gdpPercap"], mode = 'numeric')
Is there any way around this issue, or can you just not effectively use brackets to subset tibbles?
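A possible explanation for the as.data.frame() attempt, sketched on made-up toy data (the names `tb`, `sub` and `converted` are hypothetical, and the tibble package is assumed to be installed): converting a one-column tibble gives a one-column data frame, which mean() still cannot average, so the column has to be extracted explicitly.

```r
library(tibble)

# Hypothetical toy data standing in for the gapminder tibble
tb <- tibble(
  continent = c("Africa", "Africa", "Asia"),
  gdpPercap = c(1000, 3000, 5000)
)

# Bracket subsetting keeps a one-column tibble
sub <- tb[tb$continent == "Africa", "gdpPercap"]

# as.data.frame() changes the class but not the shape:
# the result is still a data frame with a single column
converted <- as.data.frame(sub)
still_df <- is.data.frame(converted)

# mean() on a data frame returns NA with a warning (suppressed here)
mean_of_df <- suppressWarnings(mean(converted))

# Pulling the column out with [[ ]] yields a plain numeric vector
values <- converted[[1]]
mean_of_values <- mean(values)

print(still_df)
print(mean_of_df)
print(mean_of_values)
```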