From 319a774756cdf1e3c4bb7a9d1632ce7c2f8c6f61 Mon Sep 17 00:00:00 2001
From: GitHub Actions <actions@github.com>
Date: Fri, 12 May 2023 11:51:25 +0000
Subject: [PATCH] differences for PR #829

---
 05-data-structures-part2.md     | 254 +++++---------------------------
 fig/06-rmd-generate-figures.sh  |   0
 fig/12-plyr-generate-figures.sh |   0
 md5sum.txt                      |   2 +-
 4 files changed, 40 insertions(+), 216 deletions(-)
 mode change 100755 => 100644 fig/06-rmd-generate-figures.sh
 mode change 100755 => 100644 fig/12-plyr-generate-figures.sh
diff --git a/05-data-structures-part2.md b/05-data-structures-part2.md
index 76451a448..05669423b 100644
--- a/05-data-structures-part2.md
+++ b/05-data-structures-part2.md
@@ -10,8 +10,6 @@ source: Rmd
 - Add and remove rows or columns.
 - Remove rows with `NA` values.
 - Append two data frames.
-- Understand what a `factor` is.
-- Convert a `factor` to a `character` vector and vice versa.
 - Display basic properties of data frames including size and class of the columns, names, and first few rows.
 
 ::::::::::::::::::::::::::::::::::::::::::::::::::
@@ -122,12 +120,7 @@ newRow <- list("tortoiseshell", 3.3, TRUE, 9)
 cats <- rbind(cats, newRow)
 ```
 
-```{.warning}
-Warning in `[<-.factor`(`*tmp*`, ri, value = "tortoiseshell"): invalid factor
-level, NA generated
-```
-
-Looks like our attempt to use the `rbind()` function returns a warning.  Recall that, unlike errors, warnings do not necessarily stop a function from performing its intended action.  You can confirm this by taking a look at the `cats` data frame.
+Let's confirm that our new row was added correctly. 
 
 
 ```r
@@ -135,98 +128,17 @@ cats
 ```
 
 ```{.output}
-    coat weight likes_string age
-1 calico    2.1            1   2
-2  black    5.0            0   3
-3  tabby    3.2            1   5
-4   <NA>    3.3            1   9
-```
-
-Notice that not only did we successfully add a new row, but there is `NA` in the column *coats* where we expected "tortoiseshell" to be.  Why did this happen?
-
-## Factors
-
-For an object containing the data type `factor`, each different value represents what is called a `level`. In our case, the `factor` "coat" has 3 levels: "black", "calico", and "tabby". R will only accept values that match one of the levels. If you add a new value, it will become `NA`.
-
-The warning is telling us that we unsuccessfully added "tortoiseshell" to our
-*coat* factor, but 3.3 (a numeric), TRUE (a logical), and 9 (a numeric) were
-successfully added to *weight*, *likes\_string*, and *age*, respectively, since
-those variables are not factors. To successfully add a cat with a
-"tortoiseshell" *coat*, add "tortoiseshell" as a possible *level* of the factor:
-
-
-```r
-levels(cats$coat)
-```
-
-```{.output}
-[1] "black"  "calico" "tabby" 
-```
-
-```r
-levels(cats$coat) <- c(levels(cats$coat), "tortoiseshell")
-cats <- rbind(cats, list("tortoiseshell", 3.3, TRUE, 9))
-```
-
-Alternatively, we can change a factor into a character vector; we lose the
-handy categories of the factor, but we can subsequently add any word we want to the
-column without babysitting the factor levels:
-
-
-```r
-str(cats)
-```
-
-```{.output}
-'data.frame':	5 obs. of  4 variables:
- $ coat        : Factor w/ 4 levels "black","calico",..: 2 1 3 NA 4
- $ weight      : num  2.1 5 3.2 3.3 3.3
- $ likes_string: int  1 0 1 1 1
- $ age         : num  2 3 5 9 9
-```
-
-```r
-cats$coat <- as.character(cats$coat)
-str(cats)
-```
-
-```{.output}
-'data.frame':	5 obs. of  4 variables:
- $ coat        : chr  "calico" "black" "tabby" NA ...
- $ weight      : num  2.1 5 3.2 3.3 3.3
- $ likes_string: int  1 0 1 1 1
- $ age         : num  2 3 5 9 9
+           coat weight likes_string age
+1        calico    2.1            1   2
+2         black    5.0            0   3
+3         tabby    3.2            1   5
+4 tortoiseshell    3.3            1   9
 ```
 
-:::::::::::::::::::::::::::::::::::::::  challenge
-
-## Challenge 1
-
-Let's imagine that 1 cat year is equivalent to 7 human years.
-
-1. Create a vector called `human_age` by multiplying `cats$age` by 7.
-2. Convert `human_age` to a factor.
-3. Convert `human_age` back to a numeric vector using the `as.numeric()` function. Now divide it by 7 to get the original ages back. Explain what happened.
-
-:::::::::::::::  solution
-
-## Solution to Challenge 1
-
-1. `human_age <- cats$age * 7`
-2. `human_age <- factor(human_age)`. `as.factor(human_age)` works just as well.
-3. `as.numeric(human_age)` yields `1 2 3 4 4` because factors are stored as integers (here, 1:4), each of which is associated with a label (here, 28, 35, 56, and 63). Converting the factor to a numeric vector gives us the underlying integers, not the labels. If we want the original numbers, we need to convert `human_age` to a character vector (using `as.character(human_age)`) and then to a numeric vector (why does this work?). This comes up in real life when we accidentally include a character somewhere in a column of a .csv file supposed to only contain numbers, and set `stringsAsFactors=TRUE` when we read in the data.
-  
-  
-
-:::::::::::::::::::::::::
-
-::::::::::::::::::::::::::::::::::::::::::::::::::
 
 ## Removing rows
 
-We now know how to add rows and columns to our data frame in R—but in our
-first attempt to add a "tortoiseshell" cat to the data frame we have accidentally
-added a garbage row:
+We now know how to add rows and columns to our data frame in R. Now let's learn to remove rows. 
 
 
 ```r
@@ -238,11 +150,10 @@ cats
 1        calico    2.1            1   2
 2         black    5.0            0   3
 3         tabby    3.2            1   5
-4          <NA>    3.3            1   9
-5 tortoiseshell    3.3            1   9
+4 tortoiseshell    3.3            1   9
 ```
 
-We can ask for a data frame minus this offending row:
+We can ask for a data frame minus the last row:
 
 
 ```r
@@ -250,11 +161,10 @@ cats[-4, ]
 ```
 
 ```{.output}
-           coat weight likes_string age
-1        calico    2.1            1   2
-2         black    5.0            0   3
-3         tabby    3.2            1   5
-5 tortoiseshell    3.3            1   9
+    coat weight likes_string age
+1 calico    2.1            1   2
+2  black    5.0            0   3
+3  tabby    3.2            1   5
 ```
 
 Notice the comma with nothing after it to indicate that we want to drop the entire fourth row.
@@ -262,27 +172,6 @@ Notice the comma with nothing after it to indicate that we want to drop the enti
 Note: we could also remove both new rows at once by putting the row numbers
 inside of a vector: `cats[c(-4,-5), ]`
 
-Alternatively, we can drop all rows with `NA` values:
-
-
-```r
-na.omit(cats)
-```
-
-```{.output}
-           coat weight likes_string age
-1        calico    2.1            1   2
-2         black    5.0            0   3
-3         tabby    3.2            1   5
-5 tortoiseshell    3.3            1   9
-```
-
-Let's reassign the output to `cats`, so that our changes will be permanent:
-
-
-```r
-cats <- na.omit(cats)
-```
 
 ## Removing columns
 
@@ -298,7 +187,7 @@ cats[,-4]
 1        calico    2.1            1
 2         black    5.0            0
 3         tabby    3.2            1
-5 tortoiseshell    3.3            1
+4 tortoiseshell    3.3            1
 ```
 
 Notice the comma with nothing before it, indicating we want to keep all of the rows.
@@ -316,7 +205,7 @@ cats[,!drop]
 1        calico    2.1            1
 2         black    5.0            0
 3         tabby    3.2            1
-5 tortoiseshell    3.3            1
+4 tortoiseshell    3.3            1
 ```
 
 We will cover subsetting with logical operators like `%in%` in more detail in the next episode. See the section [Subsetting through other logical operations](06-data-subsetting.Rmd)
@@ -334,15 +223,15 @@ cats
 ```
 
 ```{.output}
-            coat weight likes_string age
-1         calico    2.1            1   2
-2          black    5.0            0   3
-3          tabby    3.2            1   5
-5  tortoiseshell    3.3            1   9
-11        calico    2.1            1   2
-21         black    5.0            0   3
-31         tabby    3.2            1   5
-51 tortoiseshell    3.3            1   9
+           coat weight likes_string age
+1        calico    2.1            1   2
+2         black    5.0            0   3
+3         tabby    3.2            1   5
+4 tortoiseshell    3.3            1   9
+5        calico    2.1            1   2
+6         black    5.0            0   3
+7         tabby    3.2            1   5
+8 tortoiseshell    3.3            1   9
 ```
 
 But now the row names are unnecessarily complicated. We can remove the rownames,
@@ -413,7 +302,7 @@ now let's use those skills to digest a more realistic dataset. Let's read in the
 
 
 ```r
-gapminder <- read.csv("data/gapminder_data.csv", stringsAsFactors = TRUE)
+gapminder <- read.csv("data/gapminder_data.csv")
 ```
 
 :::::::::::::::::::::::::::::::::::::::::  callout
@@ -429,19 +318,21 @@ gapminder <- read.csv("data/gapminder_data.csv", stringsAsFactors = TRUE)
 
 ```r
 download.file("https://raw.githubusercontent.com/swcarpentry/r-novice-gapminder/gh-pages/_episodes_rmd/data/gapminder_data.csv", destfile = "data/gapminder_data.csv")
-gapminder <- read.csv("data/gapminder_data.csv", stringsAsFactors = TRUE)
+gapminder <- read.csv("data/gapminder_data.csv")
 ```
 
 - Alternatively, you can also read in files directly into R from the Internet by replacing the file paths with a web address in `read.csv`. One should note that in doing this no local copy of the csv file is first saved onto your computer. For example,
 
 
 ```r
-gapminder <- read.csv("https://raw.githubusercontent.com/swcarpentry/r-novice-gapminder/gh-pages/_episodes_rmd/data/gapminder_data.csv", stringsAsFactors = TRUE)
+gapminder <- read.csv("https://raw.githubusercontent.com/swcarpentry/r-novice-gapminder/gh-pages/_episodes_rmd/data/gapminder_data.csv")
 ```
 
 - You can read directly from excel spreadsheets without
   converting them to plain text first by using the [readxl](https://cran.r-project.org/package=readxl) package.
   
+- The `stringsAsFactors` argument tells `read.csv` whether to read strings as factors or as character vectors. Since R 4.0.0, strings are read in as characters by default; in earlier versions of R, they were read in as factors by default. For more information, see the call-out in [the previous episode](https://swcarpentry.github.io/r-novice-gapminder/04-data-structures-part1.html#callout2).
+  
 
 ::::::::::::::::::::::::::::::::::::::::::::::::::
 
@@ -455,15 +346,15 @@ str(gapminder)
 
 ```{.output}
 'data.frame':	1704 obs. of  6 variables:
- $ country  : Factor w/ 142 levels "Afghanistan",..: 1 1 1 1 1 1 1 1 1 1 ...
+ $ country  : chr  "Afghanistan" "Afghanistan" "Afghanistan" "Afghanistan" ...
  $ year     : int  1952 1957 1962 1967 1972 1977 1982 1987 1992 1997 ...
  $ pop      : num  8425333 9240934 10267083 11537966 13079460 ...
- $ continent: Factor w/ 5 levels "Africa","Americas",..: 3 3 3 3 3 3 3 3 3 3 ...
+ $ continent: chr  "Asia" "Asia" "Asia" "Asia" ...
  $ lifeExp  : num  28.8 30.3 32 34 36.1 ...
  $ gdpPercap: num  779 821 853 836 740 ...
 ```
 
-An additional method for examining the structure of gapminder is to use the `summary` function. This function can be used on various objects in R. For data frames, `summary` yields a numeric, tabular, or descriptive summary of each column. Factor columns are summarized by the number of items in each level, numeric or integer columns by the descriptive statistics (quartiles and mean), and character columns by its length, class, and mode.
+An additional method for examining the structure of gapminder is to use the `summary` function. This function can be used on various objects in R. For data frames, `summary` yields a numeric, tabular, or descriptive summary of each column. Numeric or integer columns are summarized by descriptive statistics (quartiles and mean), and character columns by their length, class, and mode.
 
 
 ```r
@@ -471,74 +362,8 @@ summary(gapminder$country)
 ```
 
 ```{.output}
-             Afghanistan                  Albania                  Algeria 
-                      12                       12                       12 
-                  Angola                Argentina                Australia 
-                      12                       12                       12 
-                 Austria                  Bahrain               Bangladesh 
-                      12                       12                       12 
-                 Belgium                    Benin                  Bolivia 
-                      12                       12                       12 
-  Bosnia and Herzegovina                 Botswana                   Brazil 
-                      12                       12                       12 
-                Bulgaria             Burkina Faso                  Burundi 
-                      12                       12                       12 
-                Cambodia                 Cameroon                   Canada 
-                      12                       12                       12 
-Central African Republic                     Chad                    Chile 
-                      12                       12                       12 
-                   China                 Colombia                  Comoros 
-                      12                       12                       12 
-         Congo Dem. Rep.               Congo Rep.               Costa Rica 
-                      12                       12                       12 
-           Cote d'Ivoire                  Croatia                     Cuba 
-                      12                       12                       12 
-          Czech Republic                  Denmark                 Djibouti 
-                      12                       12                       12 
-      Dominican Republic                  Ecuador                    Egypt 
-                      12                       12                       12 
-             El Salvador        Equatorial Guinea                  Eritrea 
-                      12                       12                       12 
-                Ethiopia                  Finland                   France 
-                      12                       12                       12 
-                   Gabon                   Gambia                  Germany 
-                      12                       12                       12 
-                   Ghana                   Greece                Guatemala 
-                      12                       12                       12 
-                  Guinea            Guinea-Bissau                    Haiti 
-                      12                       12                       12 
-                Honduras          Hong Kong China                  Hungary 
-                      12                       12                       12 
-                 Iceland                    India                Indonesia 
-                      12                       12                       12 
-                    Iran                     Iraq                  Ireland 
-                      12                       12                       12 
-                  Israel                    Italy                  Jamaica 
-                      12                       12                       12 
-                   Japan                   Jordan                    Kenya 
-                      12                       12                       12 
-         Korea Dem. Rep.               Korea Rep.                   Kuwait 
-                      12                       12                       12 
-                 Lebanon                  Lesotho                  Liberia 
-                      12                       12                       12 
-                   Libya               Madagascar                   Malawi 
-                      12                       12                       12 
-                Malaysia                     Mali               Mauritania 
-                      12                       12                       12 
-               Mauritius                   Mexico                 Mongolia 
-                      12                       12                       12 
-              Montenegro                  Morocco               Mozambique 
-                      12                       12                       12 
-                 Myanmar                  Namibia                    Nepal 
-                      12                       12                       12 
-             Netherlands              New Zealand                Nicaragua 
-                      12                       12                       12 
-                   Niger                  Nigeria                   Norway 
-                      12                       12                       12 
-                    Oman                 Pakistan                   Panama 
-                      12                       12                       12 
-                 (Other) 
-                     516 
+   Length     Class      Mode 
+     1704 character character 
 ```
 
 Along with the `str` and `summary` functions, we can examine individual columns of the data frame with our `typeof` function:
@@ -557,7 +382,7 @@ typeof(gapminder$country)
 ```
 
 ```{.output}
-[1] "integer"
+[1] "character"
 ```
 
 ```r
@@ -565,7 +390,7 @@ str(gapminder$country)
 ```
 
 ```{.output}
- Factor w/ 142 levels "Afghanistan",..: 1 1 1 1 1 1 1 1 1 1 ...
+ chr [1:1704] "Afghanistan" "Afghanistan" "Afghanistan" "Afghanistan" ...
 ```
 
 We can also interrogate the data frame for information about its dimensions;
@@ -726,7 +551,7 @@ Check out `?source` to find out more.
 
 ```r
 download.file("https://raw.githubusercontent.com/swcarpentry/r-novice-gapminder/gh-pages/_episodes_rmd/data/gapminder_data.csv", destfile = "data/gapminder_data.csv")
-gapminder <- read.csv(file = "data/gapminder_data.csv", stringsAsFactors = TRUE)
+gapminder <- read.csv(file = "data/gapminder_data.csv")
 ```
 
 To run the script and load the data into the `gapminder` variable:
@@ -745,7 +570,7 @@ source(file = "scripts/load-gapminder.R")
 ## Challenge 5
 
 Read the output of `str(gapminder)` again;
-this time, use what you've learned about factors, lists and vectors,
+this time, use what you've learned about lists and vectors,
 as well as the output of functions like `colnames` and `dim`
 to explain what everything that `str` prints out for gapminder means.
 If there are any parts you can't interpret, discuss with your neighbors!
@@ -756,7 +581,7 @@ If there are any parts you can't interpret, discuss with your neighbors!
 
 The object `gapminder` is a data frame with columns
 
-- `country` and `continent` are factors.
+- `country` and `continent` are character vectors.
 - `year` is an integer vector.
 - `pop`, `lifeExp`, and `gdpPercap` are numeric vectors.
 
@@ -770,7 +595,6 @@ The object `gapminder` is a data frame with columns
 - Use `rbind()` to add a new row to a data frame.
 - Remove rows from a data frame.
 - Use `na.omit()` to remove rows from a data frame with `NA` values.
-- Use `levels()` and `as.character()` to explore and manipulate factors.
 - Use `str()`, `summary()`, `nrow()`, `ncol()`, `dim()`, `colnames()`, `rownames()`, `head()`, and `typeof()` to understand the structure of a data frame.
 - Read in a csv file using `read.csv()`.
 - Understand what `length()` of a data frame represents.
diff --git a/fig/06-rmd-generate-figures.sh b/fig/06-rmd-generate-figures.sh
old mode 100755
new mode 100644
diff --git a/fig/12-plyr-generate-figures.sh b/fig/12-plyr-generate-figures.sh
old mode 100755
new mode 100644
diff --git a/md5sum.txt b/md5sum.txt
index 1fd6625d8..b76f0de61 100644
--- a/md5sum.txt
+++ b/md5sum.txt
@@ -7,7 +7,7 @@
 "episodes/02-project-intro.Rmd" "c476f54478c2eaa5102fabe3182f506c" "site/built/02-project-intro.md" "2023-05-03"
 "episodes/03-seeking-help.Rmd" "d24c310b8f36930e70379458f3c93461" "site/built/03-seeking-help.md" "2023-05-03"
 "episodes/04-data-structures-part1.Rmd" "5ec938f71a9cec633cef9329d214c3a0" "site/built/04-data-structures-part1.md" "2023-05-03"
-"episodes/05-data-structures-part2.Rmd" "7669c29de6184a1df7185bffd307c938" "site/built/05-data-structures-part2.md" "2023-05-03"
+"episodes/05-data-structures-part2.Rmd" "ce32e1f2d223079ccea65af1bdf40157" "site/built/05-data-structures-part2.md" "2023-05-12"
 "episodes/06-data-subsetting.Rmd" "5d4ce8731ab37ddea81874d63ae1ce86" "site/built/06-data-subsetting.md" "2023-05-03"
 "episodes/07-control-flow.Rmd" "5f13e849ea80a6c0c6bffbcc035c1e37" "site/built/07-control-flow.md" "2023-05-03"
 "episodes/08-plot-ggplot2.Rmd" "cda76ccacc08449cb54675ba99577894" "site/built/08-plot-ggplot2.md" "2023-05-03"