Weird mean() error in beginning of the dplyr episode #558

rkclement · 2019-08-29T19:29:13Z

I have taken to showing students read_csv() instead of read.csv() when we start working with the gapminder data. Today I ran into a strange error and want to know if others have seen this and have any idea how old it is (I think this code worked back in February 2019).

If I read in the gapminder data with:
gapminder <- read_csv("data/gapminder_data.csv")

When I get to the beginning of the dplyr lesson and try to run:
mean(gapminder[gapminder$continent == "Africa", "gdpPercap"])

I get the following warning (and NA as the result):
Warning message:
In mean.default(gapminder[gapminder$continent == "Africa", "gdpPercap"]) :
  argument is not numeric or logical: returning NA

There are no NAs in the data, as confirmed by:
sum(is.na(gapminder[gapminder$continent == "Africa", "gdpPercap"]))

Interestingly, if I read the data with read.csv() the problem disappears:
gapminder <- read.csv("data/gapminder_data.csv")
mean(gapminder[gapminder$continent == "Africa", "gdpPercap"])

Also, if I just do the following (i.e. without the square brackets for subsetting):
gapminder <- read_csv("data/gapminder_data.csv")
mean(gapminder$gdpPercap)
the problem also disappears.

Finally, though this seems to be related to reading in the data as a tibble, the following also does not work:
gapminder <- read_csv("data/gapminder_data.csv")
mean(as.data.frame(gapminder[gapminder$continent == "Africa", "gdpPercap"]))

As best I can tell, this is related to the fact that subsetting a tibble with brackets gives you another tibble, whereas subsetting a data frame with brackets down to a single column gives you a vector. However, I can't even use as.vector() to fix this, i.e. this doesn't work:
as.vector(gapminder[gapminder$continent == "Africa", "gdpPercap"], mode = 'numeric')

Is there any way around this issue, or can you just not effectively use brackets to subset tibbles?

fmichonneau · 2019-09-06T14:02:04Z

Hi @rkclement!

Your diagnosis is correct: you can't compute the mean of a data frame object:

mean(iris)
# [1] NA
# Warning message:
# In mean.default(iris) : argument is not numeric or logical: returning NA

If you import gapminder using read.csv, the gapminder object is of class data.frame:

gapminder <- read.csv("data/gapminder_data.csv")
class(gapminder)
# [1] "data.frame"

When you evaluate gapminder[gapminder$continent == "Africa", "gdpPercap"], R coerces the result to a vector. Indeed, the [ function for data frames has an easily overlooked argument called drop, which defaults to TRUE when a single column is selected and automatically converts a 1-column data frame into an atomic vector.

class(gapminder[gapminder$continent == "Africa", "gdpPercap"])
# [1] "numeric"

class(gapminder[gapminder$continent == "Africa", "gdpPercap", drop=FALSE])
# [1] "data.frame"
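If you don't have the gapminder file at hand, the same behavior can be reproduced with a built-in dataset. This is only a minimal sketch, using iris as a stand-in for gapminder; the variable names setosa, as_vector and as_frame are made up for illustration:

```r
# iris ships with R and is a plain data.frame (a stand-in for gapminder here)
setosa <- iris$Species == "setosa"

# One column selected: drop defaults to TRUE, so the result is a vector
as_vector <- iris[setosa, "Sepal.Length"]
class(as_vector)
# [1] "numeric"

# drop = FALSE keeps the result as a one-column data frame
as_frame <- iris[setosa, "Sepal.Length", drop = FALSE]
class(as_frame)
# [1] "data.frame"

# mean() works on the vector
mean(as_vector)
# [1] 5.006
```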

The tidyverse is designed to make data frames the data structure of choice, and strives to limit surprises caused by coercion. Therefore, tibbles never do this type of coercion by default.

gapminder_ti <- read_csv("data/gapminder_data.csv")
class(gapminder_ti[gapminder_ti$continent == "Africa", "gdpPercap"])
# [1] "tbl_df"     "tbl"        "data.frame" # this is the class of a tibble
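This is also why wrapping the subset in as.data.frame() doesn't help: the result is still a one-column data frame, not a vector. A small sketch, assuming the tibble package is installed and using a hypothetical toy table (toy, with made-up values) in place of gapminder:

```r
library(tibble)

# Hypothetical toy data standing in for gapminder
toy <- tibble(
  continent = c("Africa", "Africa", "Asia"),
  gdpPercap = c(100, 300, 500)
)

sub <- toy[toy$continent == "Africa", "gdpPercap"]

# Converting to a base data frame still leaves a one-column data frame
sub_df <- as.data.frame(sub)
class(sub_df)
# [1] "data.frame"

# Extracting the column itself gives a numeric vector that mean() accepts
class(sub_df[["gdpPercap"]])
# [1] "numeric"
```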

How can you calculate the mean then?

you can use drop = TRUE:

mean(gapminder_ti[gapminder_ti$continent == "Africa", "gdpPercap", drop=TRUE])
# [1] 2193.755

you can extract the list elements from the data frame:

mean(gapminder_ti[["gdpPercap"]][gapminder_ti$continent == "Africa"])
# [1] 2193.755

you can use the tidyverse functions:

library(dplyr)
gapminder_ti %>%
  filter(continent == "Africa") %>%
  pull(gdpPercap) %>%
  mean()
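
To check that the three approaches agree without needing the CSV file, here is a rough sketch on a hypothetical toy tibble (toy and its values are made up for illustration; dplyr is assumed to be installed):

```r
library(dplyr)

# Hypothetical toy data standing in for gapminder
toy <- tibble(
  continent = c("Africa", "Africa", "Asia"),
  gdpPercap = c(100, 300, 500)
)

# 1. Ask the tibble to drop to a vector explicitly
by_drop <- mean(toy[toy$continent == "Africa", "gdpPercap", drop = TRUE])

# 2. Extract the column as a vector first, then subset it
by_extract <- mean(toy[["gdpPercap"]][toy$continent == "Africa"])

# 3. Use dplyr verbs and pull() the column out as a vector
by_pipe <- toy %>%
  filter(continent == "Africa") %>%
  pull(gdpPercap) %>%
  mean()

c(by_drop, by_extract, by_pipe)
# [1] 200 200 200
```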

jcoliver closed this as completed Sep 26, 2019

zkamvar added a commit that referenced this issue Mar 15, 2021

Merge pull request #558 from zkamvar/znk-fix-550

091d31a

Add catch for None type code block in lesson_check
