Weird mean() error in beginning of the dplyr episode #558
I have taken to showing students read_csv() instead of read.csv() when we start working with the gapminder data. Today I ran into a strange error and want to know if others have seen this and have any idea how old it is (I think this code worked back in February 2019).
If I read in the gapminder data with:
When I get to the beginning of the dplyr lesson and try to run:
I get the following error:
There are no NAs in the data, as confirmed by:
Interestingly, if I read the data with
Also, if I just do (i.e. without the square brackets for subsetting):
Finally, though this seems to be related to reading in the data as a tibble, the following also does not work:
As best I can tell, this is related to the fact that a tibble, when you use brackets to subset, gives you another tibble out, whereas a data frame, when you use brackets to subset, gives you a vector. However, I can't even use
Is there any way around this issue, or can you just not effectively use brackets to subset tibbles?
Your diagnosis is correct. You can't compute the mean of a data frame object:
mean(iris)
# [1] NA
# Warning message:
# In mean.default(iris) : argument is not numeric or logical: returning NA
If you import the data with read.csv(), you get a data frame:
gapminder <- read.csv("data/gapminder_data.csv")
class(gapminder)
# [1] "data.frame"
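When you select a single column with brackets, a data frame coerces the result to a vector unless you pass drop=FALSE:
class(gapminder[gapminder$continent == "Africa", "gdpPercap"])
# [1] "numeric"
class(gapminder[gapminder$continent == "Africa", "gdpPercap", drop=FALSE])
# [1] "data.frame"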
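For readers without the gapminder file at hand, the same behaviour can be sketched in base R with a small made-up data frame. The `toy` object and its values are assumptions for illustration only, not the real data:

```r
# Toy data frame with made-up values (not the real gapminder data)
toy <- data.frame(
  continent = c("Africa", "Africa", "Asia"),
  gdpPercap = c(1000, 3000, 5000),
  stringsAsFactors = FALSE
)

is_africa <- toy$continent == "Africa"

# Selecting a single column drops to a plain numeric vector by default
as_vector <- toy[is_africa, "gdpPercap"]
class(as_vector)

# drop = FALSE keeps the result as a one-column data frame
as_df <- toy[is_africa, "gdpPercap", drop = FALSE]
class(as_df)

# mean() works on the vector: (1000 + 3000) / 2
mean(as_vector)

# mean() on the data frame returns NA (the warning is silenced here)
suppressWarnings(mean(as_df))
```

The only difference between the two subsets is the `drop` argument, which is why the same bracket expression can succeed or fail depending on the object it is applied to.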
The tidyverse is designed to make data frames the data structure of choice, and it strives to limit surprises caused by coercion. Therefore, tibbles never do this type of coercion by default.
library(readr)
gapminder_ti <- read_csv("data/gapminder_data.csv")
class(gapminder_ti[gapminder_ti$continent == "Africa", "gdpPercap"])
# [1] "tbl_df"     "tbl"        "data.frame"
# this is the class of a tibble
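A minimal sketch of the same check on a hand-built tibble, assuming the tibble package is installed. The `toy_ti` object and its values are hypothetical stand-ins for the gapminder data:

```r
library(tibble)

# Toy tibble with made-up values (not the real gapminder data)
toy_ti <- tibble(
  continent = c("Africa", "Africa", "Asia"),
  gdpPercap = c(1000, 3000, 5000)
)

# Bracket subsetting of a tibble returns a tibble, even for one column
sub <- toy_ti[toy_ti$continent == "Africa", "gdpPercap"]
class(sub)

# The result is a 2 x 1 table rather than a numeric vector,
# which is why mean() cannot use it directly
is.numeric(sub)
dim(sub)
```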
How can you calculate the mean then?
mean(gapminder_ti[gapminder_ti$continent == "Africa", "gdpPercap", drop=TRUE])
# [1] 2193.755
mean(gapminder_ti[["gdpPercap"]][gapminder_ti$continent == "Africa"])
# [1] 2193.755
library(dplyr)
gapminder_ti %>%
  filter(continent == "Africa") %>%
  pull(gdpPercap) %>%
  mean()
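The three approaches above should give the same number. A small sketch that checks this on a made-up tibble, assuming tibble and dplyr are installed (`toy_ti` and its values are hypothetical, not the gapminder data):

```r
library(tibble)
library(dplyr)

# Toy tibble with made-up values (not the real gapminder data)
toy_ti <- tibble(
  continent = c("Africa", "Africa", "Asia"),
  gdpPercap = c(1000, 3000, 5000)
)

is_africa <- toy_ti$continent == "Africa"

# 1. Bracket subsetting with drop = TRUE to get a vector back
m_drop <- mean(toy_ti[is_africa, "gdpPercap", drop = TRUE])

# 2. Extract the column as a vector with [[ ]], then filter the vector
m_dbl <- mean(toy_ti[["gdpPercap"]][is_africa])

# 3. dplyr pipeline: filter rows, pull the column out as a vector
m_pipe <- toy_ti %>%
  filter(continent == "Africa") %>%
  pull(gdpPercap) %>%
  mean()

# All three should agree
c(m_drop, m_dbl, m_pipe)
```

Which form to teach is a matter of taste. The `[[ ]]` version behaves the same for data frames and tibbles, so it may be the least surprising choice when a lesson mixes read.csv() and read_csv().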