
Load CPS microdata into R using the Census API

License: MIT (see LICENSE.md). The repository also contains a separate LICENSE file.

matt-saenz/cpsR


cpsR

Project Status: Active. The project has reached a stable, usable state and is being actively developed.

Overview

Load Current Population Survey (CPS) microdata into R using the Census Bureau Data API, including basic monthly CPS and CPS ASEC microdata.

Note: This product uses the Census Bureau Data API but is not endorsed or certified by the Census Bureau.

For a Python version of this package, check out PyCPS.

Installation

To install cpsR, run the following code:

install.packages("cpsR")

To install the development version of cpsR, run the following code:

# install.packages("devtools")
devtools::install_github("matt-saenz/cpsR")

Census API key

To use cpsR functions, you must supply a Census API key in one of two ways:

  1. Using the key argument (manually)
  2. Using environment variable CENSUS_API_KEY (automatically)
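One way to picture how the two options relate is the sketch below. `resolve_key()` is a hypothetical helper written for illustration, not a cpsR function, and the precedence it shows (an explicit key wins, otherwise the env var is used) is an assumption about typical behavior rather than a description of the package's internals.

```r
# Hypothetical helper, not part of cpsR: prefer an explicitly supplied key,
# otherwise fall back to env var CENSUS_API_KEY
resolve_key <- function(key = NULL) {
  if (is.null(key)) {
    # Sys.getenv() returns "" when the variable is not set
    key <- Sys.getenv("CENSUS_API_KEY")
  }
  if (!nzchar(key)) {
    stop("No Census API key found; pass `key` or set CENSUS_API_KEY", call. = FALSE)
  }
  key
}
```

With a helper like this, a call such as `get_basic(year = 2021, month = 9, vars = "prtage", key = resolve_key())` would fail early with a readable message when no key is available.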

Using the environment variable (env var, for short) CENSUS_API_KEY is strongly recommended for two reasons:

  1. Saves you from having to copy-paste your key around
  2. Allows you to avoid including your key in scripts

Keep your key secret: if you plan to share your code with others (as in the example below), do not include your key in your scripts.

You can set up env var CENSUS_API_KEY in two steps:

First, open your .Renviron file. You can do so by running:

# install.packages("usethis")
usethis::edit_r_environ()

Second, add your Census API key to your .Renviron file like so:

CENSUS_API_KEY='your_key_here'

This enables cpsR functions to look up your key automatically by running:

Sys.getenv("CENSUS_API_KEY")
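R reads .Renviron at startup, so a key added during a running session is not visible until you restart R or reload the file. The sketch below shows the reload step on a throwaway file with a made-up variable name (`CPSR_DEMO_KEY`), so it does not touch your real settings. For your actual file you would call `readRenviron("~/.Renviron")`, assuming the default user-level location.

```r
# Write a throwaway .Renviron-style file (demo only; the variable name is made up)
demo_renviron <- tempfile(fileext = ".Renviron")
writeLines("CPSR_DEMO_KEY='your_key_here'", demo_renviron)

# Load the file into the current session, as R does with .Renviron at startup
loaded <- readRenviron(demo_renviron)

# The surrounding quotes are stripped when the file is read
demo_key <- Sys.getenv("CPSR_DEMO_KEY")

# An empty string here would mean the variable was not picked up
nzchar(demo_key)
```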

Example

library(cpsR)
library(dplyr)
library(purrr)


# Simple use of the basic monthly CPS

sep21 <- get_basic(
  year = 2021,
  month = 9,
  vars = c("prpertyp", "prtage", "pemlr", "pwcmpwgt")
)

sep21
#> # A tibble: 103,858 × 4
#>    prpertyp prtage pemlr pwcmpwgt
#>       <int>  <int> <int>    <dbl>
#>  1        2     80     5    1361.
#>  2        2     85     5    1411.
#>  3        2     80     5    4619.
#>  4        2     80     5    4587.
#>  5        2     42     1    3677.
#>  6        2     42     1    3645.
#>  7        1      9    -1       0 
#>  8        2     41     1    3652.
#>  9        2     32     7    4117.
#> 10        2     67     1    2479.
#> # ℹ 103,848 more rows

# Weighted employment-population ratio for the population age 16 and over
sep21 %>%
  filter(prpertyp == 2 & prtage >= 16) %>%
  summarize(
    pop16plus = sum(pwcmpwgt),
    employed = sum(pwcmpwgt[pemlr %in% 1:2])
  ) %>%
  mutate(epop_ratio = employed / pop16plus)
#> # A tibble: 1 × 3
#>    pop16plus   employed epop_ratio
#>        <dbl>      <dbl>      <dbl>
#> 1 261765646. 154025931.      0.588


# Pulling multiple years of CPS ASEC microdata

asec <- map_dfr(2020:2021, get_asec, vars = c("h_year", "marsupwt"))

count(asec, h_year, wt = marsupwt)
#> # A tibble: 2 × 2
#>   h_year          n
#>    <int>      <dbl>
#> 1   2020 325268182.
#> 2   2021 326195440.
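To try the weighted employment-population calculation from the first example without an API key, the sketch below repeats the same logic in base R on a small made-up data frame. The rows and weights are invented for illustration, and the comments assume the usual CPS coding in which `prpertyp == 2` marks adult civilian household members and `pemlr` values 1 and 2 mark employed people.

```r
# Made-up rows that mimic the columns returned by get_basic() in the example
toy <- data.frame(
  prpertyp = c(2, 2, 2, 1, 2),
  prtage   = c(30, 45, 70, 9, 15),
  pemlr    = c(1, 2, 5, -1, 1),
  pwcmpwgt = c(1000, 2000, 1500, 0, 500)
)

# Keep adult civilians age 16 and over (drops the child and the 15-year-old)
adults <- toy[toy$prpertyp == 2 & toy$prtage >= 16, ]

# Sum weights rather than counting rows, since each row represents many people
pop16plus <- sum(adults$pwcmpwgt)
employed <- sum(adults$pwcmpwgt[adults$pemlr %in% 1:2])

epop_ratio <- employed / pop16plus
```

Here the weighted population is 4500 and the weighted employed count is 3000, giving a ratio of about 0.667. Counting rows instead of summing weights would give the same answer only if every respondent carried an equal weight.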