Addition of ISO Country Codes (Char and Num) to gapminder data objects #16

Closed
wants to merge 3 commits into from
R/gapminder.R: 6 changes (4 additions, 2 deletions)
@@ -3,14 +3,16 @@
#' Excerpt of the Gapminder data on life expectancy, GDP per capita, and
#' population by country.
#'
-#' @format The main data frame \code{gapminder} has 1704 rows and 6 variables:
+#' @format The main data frame \code{gapminder} has 1704 rows and 8 variables:
#' \describe{
#' \item{country}{factor with 142 levels}
#' \item{continent}{factor with 5 levels}
#' \item{year}{ranges from 1952 to 2007 in increments of 5 years}
#' \item{lifeExp}{life expectancy at birth, in years}
#' \item{pop}{population}
-#' \item{gdpPercap}{GDP per capita (US$, inflation-adjusted)}
+#' \item{gdpPercap}{GDP per capita}
+#' \item{isoChar}{ISO alpha-3 country code}
+#' \item{isoNum}{ISO numeric-3 country code}
#' }
#'
#' The supplemental data frame \code{\link{gapminder_unfiltered}} was not
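The two columns documented above come from ISO 3166-1, where an alpha-3 code is three upper-case letters and a numeric code is written as three digits with leading zeros (Afghanistan is `004`, Albania `008`). Since the PR fills `isoNum` via `countrycode(..., "iso3n")`, it is presumably stored as a number, which drops that padding. A minimal base-R sketch of checking both formats and restoring the padded form; the data frame `iso_demo` and the column `isoNum3` are hypothetical, though the codes in it are the real ISO values:

```r
# Hypothetical rows; the ISO 3166-1 alpha-3 and numeric codes are real.
iso_demo <- data.frame(
  country = c("Afghanistan", "Albania", "Argentina", "United States"),
  isoChar = c("AFG", "ALB", "ARG", "USA"),
  isoNum  = c(4, 8, 32, 840),
  stringsAsFactors = FALSE
)

# Alpha-3 codes: exactly three upper-case letters.
alpha_ok <- grepl("^[A-Z]{3}$", iso_demo$isoChar)

# Numeric codes: non-missing whole numbers that fit in three digits.
num_ok <- !is.na(iso_demo$isoNum) &
  iso_demo$isoNum == round(iso_demo$isoNum) &
  iso_demo$isoNum >= 1 & iso_demo$isoNum <= 999

stopifnot(all(alpha_ok), all(num_ok))

# Restore the zero-padded three-character form used by ISO 3166-1.
iso_demo$isoNum3 <- sprintf("%03d", as.integer(iso_demo$isoNum))
print(iso_demo$isoNum3)  # "004" "008" "032" "840"
```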
README.Rmd: 34 changes (19 additions, 15 deletions)
@@ -15,16 +15,18 @@ set.seed(1)

Excerpt from the [Gapminder](http://www.gapminder.org/data/) data. The main object in this package is the `gapminder` data frame or "tibble". There are other goodies, such as the data in tab delimited form, a larger unfiltered dataset, and premade color schemes for the countries and continents.

-The `gapminder` data frames include six variables, ([Gapminder.org documentation page](http://www.gapminder.org/data/documentation/)):
-
-| variable | meaning |
-|:------------|:-------------------------|
-| country | |
-| continent | |
-| year | |
-| lifeExp | life expectancy at birth |
-| pop | total population |
-| gdpPercap | per-capita GDP |
+The `gapminder` data frames include eight variables ([Gapminder.org documentation page](http://www.gapminder.org/data/documentation/)):
+
+| variable | meaning |
+| :------------ | :------------------------- |
+| country | |
+| continent | |
+| year | |
+| lifeExp | life expectancy at birth |
+| pop | total population |
+| gdpPercap | per-capita GDP |
+| isoChar | ISO alpha-3 country code |
+| isoNum | ISO numeric-3 country code |

Per-capita GDP (Gross domestic product) is given in units of [international dollars](http://en.wikipedia.org/wiki/Geary%E2%80%93Khamis_dollar), "a hypothetical unit of currency that has the same purchasing power parity that the U.S. dollar had in the United States at a given point in time" -- 2005, in this case.

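One practical payoff of the new columns is a join key that does not depend on spelling: gapminder writes names such as "Korea, Rep." and "Cote d'Ivoire", which another source may spell differently. A minimal base-R sketch, assuming a hypothetical second table `other` that already carries alpha-3 codes (its country names and its `source_row` column are made up for illustration):

```r
# Names as spelled in gapminder, with their ISO alpha-3 codes.
gap <- data.frame(
  country = c("Korea, Rep.", "Cote d'Ivoire", "Albania"),
  isoChar = c("KOR", "CIV", "ALB"),
  stringsAsFactors = FALSE
)

# Hypothetical second source that spells two of the names differently.
other <- data.frame(
  name       = c("South Korea", "Ivory Coast", "Albania"),
  isoChar    = c("KOR", "CIV", "ALB"),
  source_row = 1:3,
  stringsAsFactors = FALSE
)

# Joining on the name only matches the one spelling both sources share.
by_name <- merge(gap, other, by.x = "country", by.y = "name")
nrow(by_name)  # 1

# Joining on the ISO code matches every country.
by_code <- merge(gap, other, by = "isoChar")
nrow(by_code)  # 3
```

`dplyr::inner_join(gap, other, by = "isoChar")` would give the same three matches; base `merge()` is used here only to keep the sketch dependency-free.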
@@ -59,7 +61,7 @@ gapminder %>%
filter(year == 2007) %>%
group_by(continent) %>%
summarise(lifeExp = median(lifeExp))

library("ggplot2")
ggplot(gapminder, aes(x = continent, y = lifeExp)) +
geom_boxplot(outlier.colour = "hotpink") +
@@ -132,6 +134,8 @@ Description:
- `pop`: population
- `gdpPercap`: GDP per capita
- `lifeExp`: life expectancy
- `isoChar`: ISO alpha-3 country code
- `isoNum`: ISO numeric-3 country code

There are 12 rows for each country in `gapminder`, i.e. complete data for 1952, 1957, ..., 2007.

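If the PR's columns are adopted, the 12-rows-per-country property pairs naturally with two checks on the codes: each country should carry exactly one code, and no code should belong to two countries. A base-R sketch on a hypothetical two-country panel (`panel` stands in for `gapminder`; only the survey years and ISO codes are real):

```r
# Hypothetical two-country panel in the gapminder layout (1952, 1957, ..., 2007).
panel <- data.frame(
  country = rep(c("Afghanistan", "Albania"), each = 12),
  year    = rep(seq(1952, 2007, by = 5), times = 2),
  isoChar = rep(c("AFG", "ALB"), each = 12),
  stringsAsFactors = FALSE
)

# Every country should contribute one row per survey year.
rows_per_country <- table(panel$country)
twelve_rows <- all(rows_per_country == 12)

# Each country should carry a single code ...
codes_per_country <- tapply(panel$isoChar, panel$country,
                            function(x) length(unique(x)))
one_code_each <- all(codes_per_country == 1)

# ... and no code should be shared by two countries.
pairs <- unique(panel[, c("country", "isoChar")])
codes_unique <- !anyDuplicated(pairs$isoChar)

stopifnot(twelve_rows, one_code_each, codes_unique)
```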
@@ -154,13 +158,13 @@ If you want to practice importing from file, various tab delimited files are inc
* [`gapminder.tsv`](inst/gapminder.tsv): the same dataset available via `library("gapminder"); gapminder`
* [`gapminder-unfiltered.tsv`](inst/gapminder-unfiltered.tsv): the larger dataset available via `library("gapminder"); gapminder_unfiltered`.
* [`continent-colors.tsv`](inst/continent-colors.tsv) and [`country-colors.tsv`](inst/country-colors.tsv): color schemes

Here in the source, these delimited files can be found:

* in the [`inst/`](inst) sub-directory

Once you've installed the `gapminder` package they can be found locally and used like so:

```{r}
gap_tsv <- system.file("gapminder.tsv", package = "gapminder")
gap_tsv <- read.delim(gap_tsv)
@@ -177,4 +181,4 @@ gap_bigger_tsv %>% # Bhutan IS here though! :)

## License

-Gapminder's data is released under the Creative Commons Attribution 3.0 Unported license. See their [terms of use](https://docs.google.com/document/pub?id=1POd-pBMc5vDXAmxrpGjPLaCSDSWuxX6FLQgq5DhlUhM).
+Gapminder's data is released under the Creative Commons Attribution 3.0 Unported license. See their [terms of use](https://docs.google.com/document/pub?id=1POd-pBMc5vDXAmxrpGjPLaCSDSWuxX6FLQgq5DhlUhM).
README.md: 26 changes (15 additions, 11 deletions)
@@ -6,16 +6,18 @@ gapminder

Excerpt from the [Gapminder](http://www.gapminder.org/data/) data. The main object in this package is the `gapminder` data frame or "tibble". There are other goodies, such as the data in tab delimited form, a larger unfiltered dataset, and premade color schemes for the countries and continents.

-The `gapminder` data frames include six variables, ([Gapminder.org documentation page](http://www.gapminder.org/data/documentation/)):
-
-| variable | meaning |
-|:----------|:-------------------------|
-| country | |
-| continent | |
-| year | |
-| lifeExp | life expectancy at birth |
-| pop | total population |
-| gdpPercap | per-capita GDP |
+The `gapminder` data frames include eight variables ([Gapminder.org documentation page](http://www.gapminder.org/data/documentation/)):
+
+| variable | meaning |
+|:----------|:---------------------------|
+| country | |
+| continent | |
+| year | |
+| lifeExp | life expectancy at birth |
+| pop | total population |
+| gdpPercap | per-capita GDP |
+| isoChar | ISO alpha-3 country code |
+| isoNum | ISO numeric-3 country code |

Per-capita GDP (Gross domestic product) is given in units of [international dollars](http://en.wikipedia.org/wiki/Geary%E2%80%93Khamis_dollar), "a hypothetical unit of currency that has the same purchasing power parity that the U.S. dollar had in the United States at a given point in time" -- 2005, in this case.

@@ -64,7 +66,7 @@ gapminder %>%
## 3 Asia 72.3960
## 4 Europe 78.6085
## 5 Oceania 80.7195

library("ggplot2")
ggplot(gapminder, aes(x = continent, y = lifeExp)) +
geom_boxplot(outlier.colour = "hotpink") +
@@ -149,6 +151,8 @@ Description:
- `pop`: population
- `gdpPercap`: GDP per capita
- `lifeExp`: life expectancy
- `isoChar`: ISO alpha-3 country code
- `isoNum`: ISO numeric-3 country code

There are 12 rows for each country in `gapminder`, i.e. complete data for 1952, 1957, ..., 2007.

data-raw/10_add-iso-codes-to-data.R: 49 changes (49 additions, 0 deletions)
@@ -0,0 +1,49 @@
#' ---
#' date: "`r format(Sys.Date())`"
#' output:
#' html_document:
#' keep_md: TRUE
#' ---
#'

suppressPackageStartupMessages(library(dplyr))
library(countrycode)
load(file = file.path("..", "data", "gapminder.rdata"))


# Modifying `gapminder.rdata`
## Using the `countrycode` package, we can add ISO Alpha-3 and ISO Numeric-3
## country codes for each observation
gapminder <- gapminder %>%
  mutate(isoChar = countrycode(as.character(country), "country.name", "iso3c"),
         isoNum  = countrycode(as.character(country), "country.name", "iso3n"))

## Is there an ISO code matched to each observation?
sum(is.na(gapminder$isoNum))   # 0 (no non-matches)
sum(is.na(gapminder$isoChar))  # 0 (no non-matches)

## Writing data
save(gapminder, file = file.path("..", "data", "gapminder.rdata"))
write.table(gapminder, file = file.path("..", "inst", "gapminder.tsv"),
            quote = FALSE, sep = "\t", row.names = FALSE)

# Modifying `gapminder_unfiltered.rdata`
load(file = file.path("..", "data", "gapminder_unfiltered.rdata"))

## Adding ISO codes
gapminder_unfiltered <- gapminder_unfiltered %>%
  mutate(isoChar = countrycode(as.character(country),
                               "country.name", "iso3c"),
         isoNum  = countrycode(as.character(country),
                               "country.name", "iso3n"))

## Is there an ISO code matched to each observation?
sum(is.na(gapminder_unfiltered$isoNum))   # 0 (no non-matches)
sum(is.na(gapminder_unfiltered$isoChar))  # 0 (no non-matches)

## Writing data
save(gapminder_unfiltered, file = file.path("..", "data",
                                            "gapminder_unfiltered.rdata"))
write.table(gapminder_unfiltered,
            file = file.path("..", "inst", "gapminder-unfiltered.tsv"),
            quote = FALSE, sep = "\t", row.names = FALSE)
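Two details of the script above are worth a sketch. Its NA checks only count unmatched rows (the comments record a count of zero), so if a later data refresh introduces a name that `countrycode` cannot match, listing the offending names is more informative than a count. And `write.table()` writes row names by default without a header field for them, so every data line has one more field than the header; `row.names = FALSE` avoids that. A base-R sketch of both, using hypothetical rows in which "Atlantis" stands in for an unmatched name, and a hypothetical helper `unmatched_countries()`:

```r
# Hypothetical rows standing in for gapminder after the mutate() step;
# "Atlantis" plays the part of a name countrycode could not match.
df <- data.frame(
  country = c("Albania", "Atlantis", "Albania"),
  year    = c(2002L, 2002L, 2007L),
  isoChar = c("ALB", NA, "ALB"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: list the distinct country names whose code is missing.
unmatched_countries <- function(data, code_col) {
  sort(unique(as.character(data$country[is.na(data[[code_col]])])))
}
unmatched <- unmatched_countries(df, "isoChar")
print(unmatched)  # "Atlantis"

# Default write.table(): row names are written, but the header has no field for them.
with_rn <- tempfile(fileext = ".tsv")
write.table(df, file = with_rn, quote = FALSE, sep = "\t")
print(count.fields(with_rn, sep = "\t"))  # 3 4 4 4

# With row.names = FALSE every line has one field per column.
no_rn <- tempfile(fileext = ".tsv")
write.table(df, file = no_rn, quote = FALSE, sep = "\t", row.names = FALSE)
print(count.fields(no_rn, sep = "\t"))  # 3 3 3 3
```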
data-raw/README.md: 1 change (1 addition, 0 deletions)
@@ -8,6 +8,7 @@ Cleaning history
- 2010: The first time I documented cleaning this dataset. I started with delimited files I exported from Excel. Not present in this repo.
- 2014: I re-cleaned the data and (mostly) forced myself to pull it straight out of the spreadsheets. Used the `gdata` package. It was kind of painful, due to encoding and other issues. See the scripts in this state in [v0.1.0](https://github.com/jennybc/gapminder/tree/v0.1.0/data-raw).
- 2015: I revisited the cleaning and switched to `readxl`. This was much less painful. Present day.
- 2016: Added ISO country codes.

| r\_script | notebook | tsv |
|:------------------------------------------------------------------------|:--------------------------------------------------------------------------|:-----------------------------------------------------------------------------------------------------|
Binary file modified data/gapminder.rdata
Binary file modified data/gapminder_unfiltered.rdata