Example of converting a factor into a character vector in Lesson 5 is a bit awkward #313

Open
doktor-nick opened this Issue Oct 13, 2017 · 1 comment


doktor-nick commented Oct 13, 2017

In the Factors section of Lesson 5, adding a row with a new coat value, tortoiseshell, fails because tortoiseshell is not yet a level of the factor cats$coat. One solution given is to add the level to the factor first, i.e.
levels(cats$coat) <- c(levels(cats$coat), 'tortoiseshell')
which makes sense, since cats$coat is naturally categorical and so is naturally a factor.
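
For reference, a minimal sketch of that fix. It assumes a three-row cats data frame rebuilt by hand from the lesson's data; the data.frame() call and the new row's values are my assumptions, not the lesson's code.

```r
# Assumed reconstruction of the lesson's cats data (not the lesson's own code)
cats <- data.frame(
  coat = factor(c("calico", "black", "tabby")),
  weight = c(2.1, 5.0, 3.2),
  likes_string = c(1L, 0L, 1L)
)

# Add the new level first, so the value is valid for the factor
levels(cats$coat) <- c(levels(cats$coat), "tortoiseshell")

# The new row (illustrative values) now binds without producing an NA
cats <- rbind(cats, list("tortoiseshell", 3.3, 1L))
levels(cats$coat)
```

Doing the two steps in this order keeps cats$coat a factor throughout, which is the point of this solution.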

The other solution, converting cats$coat to a character vector, makes less sense: cats$coat is naturally a factor, so the conversion is not a good way to change the data just so that adding a new row to cats raises no error.

The intent appears to be to show how as.character() works as well as how to add a level to a factor. One suggestion: if a cat 'name' column were added to the cats data in Lessons 4 and 5, i.e. if feline-data.csv contained:

name, coat, weight, likes_string
Pris, calico, 2.1, 1
Ada, black, 5, 0
Coco, tabby, 3.2, 1

then the distinction between factor data (cats$coat) and character data (cats$name) might be made more clearly, and there would be less confusion caused by turning naturally categorical data into character data.
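
A rough sketch of how that could look, assuming the CSV above is read with stringsAsFactors = FALSE and coat is then converted explicitly. The inline csv_text (with the spaces after the commas dropped) and the fourth cat "Luna" are made up for illustration.

```r
# Inline stand-in for the proposed feline-data.csv (assumption: no spaces after commas)
csv_text <- "name,coat,weight,likes_string
Pris,calico,2.1,1
Ada,black,5,0
Coco,tabby,3.2,1"

cats <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# coat is categorical, so make it a factor; name stays free text
cats$coat <- factor(cats$coat)
class(cats$name)
class(cats$coat)

# A new name needs no preparation, while coat must still be an existing level
cats <- rbind(cats, list("Luna", "black", 4.0, 1L))  # "Luna" is hypothetical
```

Here name is the natural place to show character data, and coat never has to be converted away from a factor.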

[p.s. Thanks for these great lessons, I've really enjoyed teaching them!]

naupaka commented Oct 17, 2017

Hi @doktor-nick, thanks for logging the issue. I think you've landed on something that definitely needs fixing.

I think some might argue that it's best to keep things as character vectors for as long as possible (the whole stringsAsFactors = FALSE discussion, e.g. https://twitter.com/hadleywickham/status/624349074636976128).

As I understand it the tidyverse approach to this issue is to not convert anything into a factor unless absolutely necessary. So it could be reasonable to turn the column into a character vector, but at the same time, I see your point about that allowing people to become careless with, e.g. typos.
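
To make the typo trade-off concrete, here is a small sketch using two made-up vectors (coat_chr and coat_fct are illustrative names, not from the lesson):

```r
coat_chr <- c("calico", "black", "tabby")
coat_fct <- factor(coat_chr)

# A character vector accepts a misspelled value silently
coat_chr[2] <- "blakc"

# A factor rejects it: R warns about an invalid factor level and stores NA
coat_fct[2] <- "blakc"

coat_chr
is.na(coat_fct[2])
```

So the factor catches the typo (as a warning plus an NA), while the character vector keeps it without complaint.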

I also just noticed as I was looking over it that the failed rbind() is never cleaned up, so you get this output, where the 4th row in the coat column has the value NA:

str(cats)
# 'data.frame':	5 obs. of  4 variables:
# $ coat        : Factor w/ 4 levels "black","calico",..: 2 1 3 NA 4
# $ weight      : num  2.1 5 3.2 3.3 3.3
# $ likes_string: int  1 0 1 1 1
# $ age         : num  4 5 8 9 9

Seems like that should be fixed too.
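
One possible cleanup, sketched under the assumption that cats is in exactly the state shown in the str() output above (rebuilt by hand here so the snippet is self-contained):

```r
# Rebuild the state from the str() output above: row 4 is the failed rbind()
cats <- data.frame(
  coat = factor(
    c("calico", "black", "tabby", NA, "tortoiseshell"),
    levels = c("black", "calico", "tabby", "tortoiseshell")
  ),
  weight = c(2.1, 5, 3.2, 3.3, 3.3),
  likes_string = c(1L, 0L, 1L, 1L, 1L),
  age = c(4, 5, 8, 9, 9)
)

# Drop the row whose coat is NA, then renumber the rows
cats <- cats[!is.na(cats$coat), ]
rownames(cats) <- NULL
str(cats)
```

na.omit(cats) would also work here, but filtering on coat makes it explicit which column the bad row came from.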

I think one way to solve this is to get rid of the as.character bit altogether - I'm a little hesitant for us to add more content (e.g. a bit on cat names) in a section that is already a little into the weeds for beginners.
