Export and document data #29

maurolepore · 2018-02-28T16:22:35Z

This issue describes the process by which I export and document data. You can track this process searching for commits tagged with the number of this issue (#29).

* Remove information that is better recorded via git. Git knows who does what and when. So it's best to avoid recording this info as comments because it is too easy for such comments to become out of sync with changes in the code. * Describe file more generally so the comment remains up to date. Notice that the description refers to 4 tables; later a comment refers to 5 tables; and finally only 3 tables are exported. So it is best to not refer to how many tables we expect because this may continue to change. * Remove references to table numbers; instead rely on table names. This is a clearer way to refer to tables because it identifies tables in a unique way (by name as opposed to by name and number). Also, numbers become out of date too easily, as tables are added, removed or their order is changed. * Group code by table, not by task. This makes it easier to read and maintain the code.

maurolepore · 2018-02-28T18:19:15Z

@gonzalezeb,

The code that exports data contains this chunk.

# eliminate rows where fam or sp is unknown #use unique(allo_main$species)
master <- subset(master, family != "Unkown")
# chnage name of "equation" column to "equation_form"

1. Do you still want to filter `master` to exclude rows where `sp == "!Unknown"`?

Your comment suggest so, but the code doesn't do that. Please update to remove ambiguity. If you express your goal in code you need not to repeate your goal in a comment.

This example may help you write the code you want:

dfm <- data.frame(a = 1:3, b = 3:1)
subset(dfm, a >= 2 & b <=2)
#>   a b
#> 2 2 2
#> 3 3 1

2. Do you still want to rename the column `equation` as `equation_form`?

Althought the comment suggests this, the code doesn't do it. Please solve this ambiguity.

This already donne by SHA `0e985e0`

maurolepore · 2018-02-28T18:57:50Z

@gonzalezeb,

For safety I've changed the script that exports data to subset the master dataset not by column position but by column name. Please review and ensure that the selected columns are the ones you intended.

Subsetting objects by position is too risky to do it in a sript; it's OK for interactive use but not for programming because the order of the columns can change too easily and the script will be blind to such change. For programming it is important to subset data by column name. If the names change the scrpit will flag the change with an error. This is desirable because it makes the program safer.

Consider the following example:

dfm <- data.frame(x = 1, y = 2, z = 3)

# Say that you have something like this
selected_names <- names(dfm)[c(1, 3)]

# You can get a vector of names with `datapasta::dpasta()`
datapasta::dpasta(selected_names)

# This is the result
c("x", "z")

# Now you can use this vector to subset `dfm`
dfm_cols <- c("x", "z")
dfm[dfm_cols]

maurolepore · 2018-02-28T21:17:02Z

@gonzalezeb,

Is the name of the file "allotemp_main.csv" expresive enough? I feel that "temp" is a sufix that may be missleading -- suggesting the data is not important but temporal. I like "_main" although I think you should pick only one of either "master" or "main". I suggest the name "allodb_master_data".

…ith use_data().

See for example ggplot2 at https://github.com/tidyverse/ggplot2/tree/master/data-raw. Notice that each data-raw/filename.csv is cleaned in data-raw/filename.R and exported to data/ via use_data(); then each exported dataset is documented in R/data.R

…-raw/data.Rmd.

maurolepore · 2018-03-01T13:02:17Z

@gonzalezeb

Please review data-raw/data.Rmd. Some of the notes use informal abreviations (e.g. sp). Could you please rewrite, now considering that the document is not for personal use but to be shared with users? I think it is OK to abreviate words that mean a concept, for example, it is OK to use sp. to abreviate the concept species. But to refer not to the concept "species" but to an R object called species I suggest you spell the word out and type species (and best is to use code syntax as I do here).

…-raw/data.Rmd.

maurolepore self-assigned this Feb 28, 2018

maurolepore assigned gonzalezeb Feb 28, 2018

maurolepore added the question label Feb 28, 2018

maurolepore added a commit that referenced this issue Feb 28, 2018

#29: Allow reproducible data-export by reading master and exporting w…

6df24d8

…ith use_data().

maurolepore added a commit that referenced this issue Feb 28, 2018

#29: Rename data_master.R to conform with convension in data-raw/

5508445

maurolepore added a commit that referenced this issue Mar 1, 2018

#29: Centralize gathering, cleaning and exporting of raw data in data…

07f5e37

…-raw/data.Rmd.

maurolepore added a commit that referenced this issue Mar 1, 2018

#29: Centralize gathering, cleaning and exporting of raw data in data…

ec61533

…-raw/data.Rmd.

maurolepore added a commit that referenced this issue Mar 6, 2018

#29: DOn't remove stems of unknown family.

0e985e0

gonzalezeb closed this as completed Sep 12, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Export and document data #29

Export and document data #29

maurolepore commented Feb 28, 2018 •

edited

Loading

maurolepore commented Feb 28, 2018 •

edited

Loading

maurolepore commented Feb 28, 2018 •

edited

Loading

maurolepore commented Feb 28, 2018 •

edited

Loading

maurolepore commented Mar 1, 2018 •

edited

Loading

Export and document data #29

Export and document data #29

Comments

maurolepore commented Feb 28, 2018 • edited Loading

maurolepore commented Feb 28, 2018 • edited Loading

1. Do you still want to filter master to exclude rows where sp == "!Unknown"?

2. Do you still want to rename the column equation as equation_form?

This already donne by SHA 0e985e0

maurolepore commented Feb 28, 2018 • edited Loading

maurolepore commented Feb 28, 2018 • edited Loading

maurolepore commented Mar 1, 2018 • edited Loading

maurolepore commented Feb 28, 2018 •

edited

Loading

maurolepore commented Feb 28, 2018 •

edited

Loading

1. Do you still want to filter `master` to exclude rows where `sp == "!Unknown"`?

2. Do you still want to rename the column `equation` as `equation_form`?

This already donne by SHA `0e985e0`

maurolepore commented Feb 28, 2018 •

edited

Loading

maurolepore commented Feb 28, 2018 •

edited

Loading

maurolepore commented Mar 1, 2018 •

edited

Loading