Replace `typeof` with `class` in episode 4 #350

Open
davebridges opened this Issue Feb 26, 2018 · 5 comments

7 participants
@davebridges
Contributor

davebridges commented Feb 26, 2018

In lesson 04-data-structures-part1, under data types, the learner is taught to use typeof to determine the structure of data. This can be confusing because typeof on a factor returns "integer", not "factor" as one might expect. I suggest modifying this lesson to use class instead, which returns "factor" for a factor. This may be less confusing to the new learner. For example:

> cats <- data.frame(coat = c('calico','black','tabby'))
> class(cats$coat)
[1] "factor"
> typeof(cats$coat)
[1] "integer"
@jcoliver

Collaborator

jcoliver commented Feb 27, 2018

@davebridges Thanks for that suggestion. I don't think the lesson ever uses typeof on cats$coat, but learners certainly might try it and be confused when they see that a factor has type integer. In the Factor section, there is one call to typeof on a factor, but little actual discussion. Even though the lesson introduces both typeof and class, it doesn't seem to discuss the difference between the two explicitly. See also the discussion at #294. Thoughts, @naupaka @mawds?
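To make the typeof/class distinction mentioned above concrete, here is a minimal sketch. The data frame is a hypothetical stand-in for the lesson's cats data, and `stringsAsFactors = TRUE` is set explicitly as an assumption, because R 4.0.0 and later no longer convert strings to factors by default.

```r
# Hypothetical data frame modelled on the lesson's cats example.
# stringsAsFactors = TRUE is an assumption made explicit here: since R 4.0.0
# data.frame() keeps strings as character unless told otherwise.
cats <- data.frame(
  coat = c("calico", "black", "tabby"),
  weight = c(2.1, 5.0, 3.2),
  stringsAsFactors = TRUE
)

# typeof() reports how the values are stored internally;
# class() reports the attribute that decides how the object behaves.
types <- data.frame(
  column = names(cats),
  typeof = vapply(cats, typeof, character(1)),
  class = vapply(cats, function(x) class(x)[1], character(1)),
  row.names = NULL
)
print(types)
```

For the factor column the two functions disagree (integer storage, factor class), while for the numeric column they report "double" and "numeric" respectively.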

@jcoliver jcoliver added the discussion label Feb 27, 2018

@mawds

Collaborator

mawds commented Mar 1, 2018

I agree it's confusing.

I wonder whether using stringsAsFactors=FALSE when reading the csv files at the start of the lesson might make it easier to introduce the different types of data? That way, cats$weight + cats$coat really is an attempt to add a number and a string, and the same holds when using feline-data_v2.csv with the broken weight column.

typeof() would then always return something intuitive until we get to the factors section, where the reason why typeof(AFACTOR) is integer can be properly explained.

It would mean introducing stringsAsFactors = FALSE without being able to explain it properly though.
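A rough sketch of what this proposal would look like in practice. The inline CSV text is a hypothetical stand-in for the lesson's data file, and the object names `cats_chr` and `cats_fct` are made up for illustration.

```r
# Hypothetical inline CSV standing in for the lesson's feline data file.
csv_text <- "coat,weight,likes_string
calico,2.1,1
black,5.0,0
tabby,3.2,1"

# Read the same data twice: once keeping strings as character vectors,
# once converting them to factors.
cats_chr <- read.csv(text = csv_text, stringsAsFactors = FALSE)
cats_fct <- read.csv(text = csv_text, stringsAsFactors = TRUE)

print(typeof(cats_chr$coat))  # character storage, as a learner would expect
print(typeof(cats_fct$coat))  # integer storage, because the column is a factor

# Adding a number to a character vector is an error; capture the message
# instead of stopping the script.
result <- tryCatch(
  cats_chr$weight + cats_chr$coat,
  error = function(e) conditionMessage(e)
)
print(result)
```

With the factor version the same addition does not stop with an error; it emits a warning and returns NA values, which is arguably harder to explain early in the lesson.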

@naupaka

Member

naupaka commented Mar 2, 2018

Yes, agreed, factors are a little weird. However, my inclination is to leave the lesson as it is. It is true that if people go poking around on their own, they may get confused, but the pace of a workshop is usually rapid enough that there's not too much of that going on. Also, the fact that typeof(my_factor) returns integer is explicitly addressed later on in the lesson when factors are introduced.

@raynamharris

Contributor

raynamharris commented Jul 6, 2018

To bring this issue back up, I'm inclined to agree with @davebridges.

The section where typeof() is explored is quite long code-wise. There are 5 examples using typeof(), but the types themselves aren't explained in detail. (I had to google double.)

typeof(3.14)
typeof(1L) # The L suffix forces the number to be an integer, since by default R uses float numbers
typeof(1+1i)
typeof(TRUE)
typeof('banana')

I feel like the question "why doesn't cats$weight + cats$coat work" could be best answered with an example that further explores the data frame they created:

class(cats$coat)
class(cats$weight)
@ahadjixenofontos

ahadjixenofontos commented Jul 19, 2018

I agree with naupaka that this is a learning opportunity. Factors are so central to R that understanding them thoroughly would be beneficial. When typeof(cats$coat) returns "integer", that's a first brush with factor innards, which will be further reinforced later where factors are discussed in more detail.
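For readers curious about those factor innards, a small sketch follows. The vector `coat` is a hypothetical example, not taken from the lesson files.

```r
# Hypothetical factor built from three coat colours.
coat <- factor(c("calico", "black", "tabby"))

# The distinct labels are stored once, sorted alphabetically by default.
print(levels(coat))

# Each element is stored as an integer index into those levels,
# which is why typeof() reports "integer".
codes <- as.integer(coat)
print(codes)

# unclass() strips the factor class and shows the raw integer vector
# together with its levels attribute.
print(unclass(coat))

# Indexing the levels by the codes recovers the original labels.
labels <- levels(coat)[codes]
print(labels)
```

Seen this way, class() describes what the object means to the user, while typeof() describes how R happens to store it.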
