---
title: "Data cleaning"
output: github_document
---

I explicitly use this package to teach data cleaning, so I have refactored my old cleaning code into several scripts. I also include them as compiled Markdown reports. Caveat: these are realistic cleaning scripts! Not the highly polished ones people write with 20/20 hindsight :) I wouldn't necessarily clean the data the same way again (and I would download more recent data!), but at this point there is great value in reproducing the data I've been using for ~5 years.

Cleaning history

* 2010: The first time I documented cleaning this dataset. I started with
  delimited files I exported from Excel. Not present in this repo.
* 2014: I re-cleaned the data and (mostly) forced myself to pull it straight
  out of the spreadsheets. Used the `gdata` package. It was kind of painful, due to encoding and other issues. See the scripts in this state in [v0.1.0](https://github.com/jennybc/gapminder/tree/v0.1.0/data-raw).
* 2015: I revisited the cleaning and switched to `readxl`. This was much less painful. Present day.

```{r results='asis', echo = FALSE, warning = FALSE}
library(tidyverse)
library(stringr)
library(knitr)
library(here)

# List the files in data-raw and split each name into number, slug, extension
x <- tibble(fls = list.files(here("data-raw"))) %>%
  mutate(fls_basename = basename(fls)) %>%
  separate(fls_basename, c("script", "slug", "ext"), "[_\\.]")

# Keep numbered scripts and their R, Markdown, and TSV companions.
# The extension pattern is anchored so that, e.g., "Rmd" does not match "R".
x <- x %>%
  filter(script %>% str_detect("^[0-9]+"),
         ext %>% str_detect("^(R|r|md|tsv)$")) %>%
  select(-slug)

# One row per script number, with its files nested in a list-column
y <- x %>%
  group_by(script) %>%
  nest()

# Turn a vector of file names into comma-separated Markdown links
collapse_md_links <- function(x) {
  x %>% {
    paste0("[", ., "](", ., ")")
  } %>%
    paste(collapse = ", ")
}

# Build one table row: links to the script, its report, and its TSV output
jfun <- function(z) {
  tibble(
    r_script = z$fls[z$ext == "R"] %>% collapse_md_links(),
    notebook = z$fls[z$ext == "md"] %>% collapse_md_links(),
    tsv = z$fls[z$ext == "tsv"] %>% collapse_md_links()
  )
}

y$data %>%
  map_df(jfun) %>%
  kable()
```
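
The table above relies on a `<number>_<slug>.<ext>` file-naming pattern. As a rough illustration of how such names split apart, here is a minimal base R sketch. The file names are hypothetical, not the actual contents of `data-raw`, and it simplifies the chunk above by dropping any name that does not split into exactly three pieces.

```r
# Hypothetical file names following the `<number>_<slug>.<ext>` pattern
fls <- c("01_alpha.R", "01_alpha.md", "01_alpha.tsv", "02_beta.R", "README.Rmd")

# Split each name on "_" or "." into its pieces
pieces <- strsplit(fls, "[_.]")

# Keep names that split into exactly three pieces and start with digits
keep <- lengths(pieces) == 3 & grepl("^[0-9]+", fls)
parts <- do.call(rbind, pieces[keep])
colnames(parts) <- c("script", "slug", "ext")

files <- data.frame(fls = fls[keep], parts, stringsAsFactors = FALSE)

# Group the file names by script number, as the table does
split(files$fls, files$script)
```

`README.Rmd` has no numeric prefix and only two pieces, so it is left out. The remaining files are grouped under `01` and `02`.
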
```{r eval = FALSE, echo = FALSE}
devtools::session_info()
```