Commit

resubmission info
rudeboybert committed Oct 7, 2021
1 parent 3e12aba commit f2e558a
Showing 3 changed files with 28 additions and 3 deletions.
2 changes: 2 additions & 0 deletions CRAN-RELEASE
@@ -0,0 +1,2 @@
This package was submitted to CRAN on 2021-10-05.
Once it is accepted, delete this file and tag the release (commit 3e12aba).
23 changes: 23 additions & 0 deletions cran-comments.md
@@ -1,3 +1,26 @@
## Resubmission

This is a resubmission to address the following comments:

Found the following (possibly) invalid URLs:

URL: https://bit.ly/2uD3ls6 (moved to
https://raw.githubusercontent.com/rudeboybert/fivethirtyeight/master/data-raw/bechdel/movies.csv)
From: inst/doc/tame.html
Status: 301
Message: Moved Permanently
URL: https://bit.ly/2vg8gTf (moved to
https://raw.githubusercontent.com/fivethirtyeight/data/master/flying-etiquette-survey/flying-etiquette.csv)
From: inst/doc/tame.html
Status: 301
Message: Moved Permanently
URL: https://bit.ly/2vgRFiw (moved to
https://raw.githubusercontent.com/fivethirtyeight/data/master/births/US_births_1994-2003_CDC_NCHS.csv)
From: inst/doc/tame.html
Status: 301
Message: Moved Permanently


## Test environments

* local macOS install, R 4.1.0
6 changes: 3 additions & 3 deletions vignettes/tame.Rmd
@@ -257,7 +257,7 @@ By advocating for the shielding of novices from such pre-processing, we are not

Whereas the first three subpoints relating to data frame and variable naming are more cosmetic in nature, the fourth principle relates to an R-specific issue faced by novices. Variable names that include spaces require different treatment than those that do not, specifically the use of tick marks when referring to them. While this is a topic R users eventually have to learn, we argue that it is not an immediate priority.
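
To make the point about tick marks concrete, here is a minimal sketch in base R. The data frame `survey` and its values are hypothetical toy data, not the FiveThirtyEight survey; the only assumption is that a column name contains spaces.

```r
# Hypothetical toy data with made-up values; check.names = FALSE keeps the
# space-containing column name exactly as written
survey <- data.frame(
  `children under 18` = c("Yes", "No", "No"),
  baby = c("rude", "not rude", "rude"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# A name with spaces has to be wrapped in backticks every time it is used
with_backticks <- survey$`children under 18`

# After renaming to a syntactic name, the backticks are no longer needed
names(survey)[1] <- "children_under_18"
without_backticks <- survey$children_under_18
```

Both extractions return the same vector; the difference is only in how much syntax a novice has to know before getting there.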

As an example of the importance of preprocessing variable names, consider the data corresponding to the FiveThirtyEight article "41 Percent Of Fliers Think You’re Rude If You Recline Your Seat" [@fliers_538] where survey respondents were asked, among other things, if 1) they considered it rude to bring a baby on a flight and 2) whether or not they had children under the age of 18. The raw data corresponding to this article is saved in CSV format on FiveThirtyEight's GitHub repository page <https://github.com/fivethirtyeight> and can be accessed via the shortened bit.ly link <https://bit.ly/2vg8gTf>. We load and save this in a data frame `flying_raw` and look at the first 5 variable names:
As an example of the importance of preprocessing variable names, consider the data corresponding to the FiveThirtyEight article "41 Percent Of Fliers Think You’re Rude If You Recline Your Seat" [@fliers_538] where survey respondents were asked, among other things, if 1) they considered it rude to bring a baby on a flight and 2) whether or not they had children under the age of 18. The raw data corresponding to this article is saved in CSV format on FiveThirtyEight's GitHub repository page <https://github.com/fivethirtyeight> and can be accessed [online](https://raw.githubusercontent.com/fivethirtyeight/data/master/flying-etiquette-survey/flying-etiquette.csv). We load and save this in a data frame `flying_raw` and look at the first 5 variable names:

```{r, eval=FALSE}
library(readr)
@@ -306,7 +306,7 @@ ggplot(flying, aes(x = children_under_18, fill = baby)) +

Although recent advances such as the `lubridate` package have made wrangling dates much easier, performing such tasks can still be very challenging for students who have not had experience [@lubridate]. In datasets where only a numerical variable indicating the year exists, it can be argued no pre-processing is necessary. However, when a month and/or day variables exist along with a year variable, we argue that pre-processing should be done. Specifically, they should be combined and converted to `Date` objects. This allows for easy creation of time series plots with well formatted x-axes and for performing of basic date arithmetic.
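
As a rough sketch of the conversion described above, the following uses base R only. The data frame `births_toy`, its column names, and its birth counts are hypothetical and made up for illustration; `lubridate` offers equivalent helpers but is not required here.

```r
# Hypothetical toy rows in a year/month/day layout; the birth counts are made up
births_toy <- data.frame(
  year = c(1999, 1999, 1999),
  month = c(1, 1, 2),
  date_of_month = c(1, 2, 1),
  births = c(100, 110, 120)
)

# Combine the three numeric columns into a single Date column
births_toy$date <- as.Date(
  ISOdate(births_toy$year, births_toy$month, births_toy$date_of_month)
)

# Basic date arithmetic: number of days between consecutive rows
days_between <- as.numeric(diff(births_toy$date))

# Day-of-week labels come from the Date itself (output is locale-dependent)
births_toy$day_of_week <- weekdays(births_toy$date)
```

With a single `date` column in place, a time series plot only needs `date` on the x-axis, and the day of the week is available as a label rather than as a number between 1 and 7.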

As an example of the importance of preprocessing dates, consider the data corresponding to the FiveThirtyEight article "Some People Are Too Superstitious To Have A Baby On Friday The 13th" [@baby13_538] of the number of daily births in the United States between 1994 and 2003. The raw data corresponding to this article is saved in CSV format on FiveThirtyEight's GitHub repository page <https://github.com/fivethirtyeight/> and can be accessed via the shortened bit.ly link <https://bit.ly/2vgRFiw>. We load this data, filter for only those rows corresponding to 1999 births, and save this in a data frame `US_births_1999_raw`. The raw data is saved in a format that makes it difficult for novices to create a time series plot. Furthermore, people do not typically think of the day of the week (Sunday, Monday, etc) in terms of a numerical value between 1 and 7.
As an example of the importance of preprocessing dates, consider the data corresponding to the FiveThirtyEight article "Some People Are Too Superstitious To Have A Baby On Friday The 13th" [@baby13_538] of the number of daily births in the United States between 1994 and 2003. The raw data corresponding to this article is saved in CSV format on FiveThirtyEight's GitHub repository page <https://github.com/fivethirtyeight/> and can be accessed [online](https://raw.githubusercontent.com/fivethirtyeight/data/master/births/US_births_1994-2003_CDC_NCHS.csv). We load this data, filter for only those rows corresponding to 1999 births, and save this in a data frame `US_births_1999_raw`. The raw data is saved in a format that makes it difficult for novices to create a time series plot. Furthermore, people do not typically think of the day of the week (Sunday, Monday, etc) in terms of a numerical value between 1 and 7.

```{r, eval=FALSE}
library(readr)
@@ -355,7 +355,7 @@ As an example of the importance of preprocessing categorical variables, consider
knitr::include_graphics("images/hickey-bechdel-11.png")
```

We now reconstruct this graphic using the data provided by FiveThirtyEight using both the raw data saved in CSV format on FiveThirtyEight's GitHub repository page <https://github.com/fivethirtyeight> (accessible via the shortened bit.ly link <https://bit.ly/2uD3ls6>) and using the pre-processed version in the `fivethirtyeight` R package. In both cases, we discretize the `year` variable into 5-year bins using the vector `year_bins` and plot a stacked barplot of proportions.
We now reconstruct this graphic using the data provided by FiveThirtyEight using both the raw data saved in CSV format on FiveThirtyEight's GitHub repository page <https://github.com/fivethirtyeight> which can be accessed [online](https://raw.githubusercontent.com/rudeboybert/fivethirtyeight/master/data-raw/bechdel/movies.csv) and using the pre-processed version in the `fivethirtyeight` R package. In both cases, we discretize the `year` variable into 5-year bins using the vector `year_bins` and plot a stacked barplot of proportions.
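
One way the binning step could look, as a sketch under stated assumptions: `toy_years` is a hypothetical vector of release years, and `toy_bins` is a shortened, five-label stand-in for `year_bins`. The use of `cut()` here is an illustration and may differ from the vignette's own code.

```r
# Hypothetical release years, chosen only to exercise the bin edges
toy_years <- c(1970, 1974, 1975, 1983, 1994)

# Shortened stand-in for year_bins: one label per 5-year interval
toy_bins <- c("'70-'74", "'75-'79", "'80-'84", "'85-'89", "'90-'94")

# Breaks at 1969, 1974, ..., 1994 give the right-closed intervals
# (1969, 1974], (1974, 1979], ..., matching the labels one-to-one
five_year <- cut(toy_years, breaks = seq(1969, 1994, by = 5), labels = toy_bins)

# Proportion of toy observations falling in each bin
bin_props <- prop.table(table(five_year))
```

Because `cut()` returns a factor whose levels follow the order of the labels, the bins keep their chronological order when used as the x-axis of a stacked barplot.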

```{r, eval=FALSE}
year_bins <- c("'70-'74", "'75-'79", "'80-'84", "'85-'89", "'90-'94",