# Households by AMI

This script estimates the number of Colorado households in each area median income (AMI) band from PUMS microdata, retrieved from IPUMS with the `ipumsr` package. Median family incomes (MFI) are collected separately as provided by HUD.

In [1]:
# NOTE: To load data, you must download both the extract's data and the DDI
# and also set the working directory to the folder with these files (or change the path below).
if (!require("ipumsr")) stop("Reading IPUMS data into R requires the ipumsr package. It can be installed using the following command: install.packages('ipumsr')")
library(tidyverse)
library(srvyr)
library(tidycensus)
library(duckplyr)

# get hud IL data
library(hudr)
hud_key <- Sys.getenv("HUD_API_KEY")


Loading required package: ipumsr

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

Attaching package: 'srvyr'


The following object is masked from 'package:stats':

    filter


[1m[22mThe [34mduckplyr[39m packag

Next, we pull the Census median family income (MFI) for Colorado for each year with `tidycensus`.

In [2]:
# 2020 is omitted: the Census Bureau did not release standard 1-year ACS estimates for that year.
acs_years <- c(2005:2019, 2021:2023)


In [3]:
acs_mfi <- map(
  acs_years,
  ~ get_acs(
    geography = "state",
    variables = "B19113_001",
    state = "CO",
    year = .x,
    survey = "acs1"
  ) |>
    mutate(year = .x, .before = 1)
) |>
  bind_rows()


Getting data from the 2005 1-year ACS

The 1-year ACS provides data for geographies with populations of 65,000 and greater.

[1m[22mThe [34mduckplyr[39m package is configured to fall back to [34mdplyr[39m when it encounters an
incompatibility. Fallback events can be collected and uploaded for analysis to
guide future development. By default, no data will be collected or uploaded.
[36mℹ[39m A fallback situation just occurred. The following information would have been
  recorded:
  {"version":"0.4.1","message":"No translation for function
  `ifelse`.","name":"mutate","x":{"...1":"character","...2":"character","...3":"character","...4":"character","...5":"numeric"},"args":{"dots":{"...4":"ifelse(...4
  == \"<character>\", \"<character>\",
  \"<character>\")"},".by":"NULL",".keep":["all","used","unused","none"]}}
→ Run `duckplyr::fallback_sitrep()` to review the current settings.
→ Run `Sys.setenv(DUCKPLYR_FALLBACK_COLLECT = 1)` to enable fallback logging,
  and `Sys.setenv(DUCKPLY

duckplyr: materializing, review details with duckplyr::last_rel()


Getting data from the 2006 1-year ACS

The 1-year ACS provides data for geographies with populations of 65,000 and greater.



duckplyr: materializing, review details with duckplyr::last_rel()


Getting data from the 2007 1-year ACS

The 1-year ACS provides data for geographies with populations of 65,000 and greater.



duckplyr: materializing, review details with duckplyr::last_rel()


Getting data from the 2008 1-year ACS

The 1-year ACS provides data for geographies with populations of 65,000 and greater.



duckplyr: materializing, review details with duckplyr::last_rel()


Getting data from the 2009 1-year ACS

The 1-year ACS provides data for geographies with populations of 65,000 and greater.



duckplyr: materializing, review details with duckplyr::last_rel()


Getting data from the 2010 1-year ACS

The 1-year ACS provides data for geographies with populations of 65,000 and greater.



duckplyr: materializing, review details with duckplyr::last_rel()


Getting data from the 2011 1-year ACS

The 1-year ACS provides data for geographies with populations of 65,000 and greater.



duckplyr: materializing, review details with duckplyr::last_rel()


Getting data from the 2012 1-year ACS

The 1-year ACS provides data for geographies with populations of 65,000 and greater.



duckplyr: materializing, review details with duckplyr::last_rel()


Getting data from the 2013 1-year ACS

The 1-year ACS provides data for geographies with populations of 65,000 and greater.



duckplyr: materializing, review details with duckplyr::last_rel()


Getting data from the 2014 1-year ACS

The 1-year ACS provides data for geographies with populations of 65,000 and greater.



duckplyr: materializing, review details with duckplyr::last_rel()


Getting data from the 2015 1-year ACS

The 1-year ACS provides data for geographies with populations of 65,000 and greater.



duckplyr: materializing, review details with duckplyr::last_rel()


Getting data from the 2016 1-year ACS

The 1-year ACS provides data for geographies with populations of 65,000 and greater.



duckplyr: materializing, review details with duckplyr::last_rel()


Getting data from the 2017 1-year ACS

The 1-year ACS provides data for geographies with populations of 65,000 and greater.



duckplyr: materializing, review details with duckplyr::last_rel()


Getting data from the 2018 1-year ACS

The 1-year ACS provides data for geographies with populations of 65,000 and greater.



duckplyr: materializing, review details with duckplyr::last_rel()


Getting data from the 2019 1-year ACS

The 1-year ACS provides data for geographies with populations of 65,000 and greater.



duckplyr: materializing, review details with duckplyr::last_rel()


Getting data from the 2021 1-year ACS

The 1-year ACS provides data for geographies with populations of 65,000 and greater.



duckplyr: materializing, review details with duckplyr::last_rel()


Getting data from the 2022 1-year ACS

The 1-year ACS provides data for geographies with populations of 65,000 and greater.



duckplyr: materializing, review details with duckplyr::last_rel()


Getting data from the 2023 1-year ACS

The 1-year ACS provides data for geographies with populations of 65,000 and greater.



duckplyr: materializing, review details with duckplyr::last_rel()


In [4]:
acs_samples <- get_sample_info("usa") |>
  filter(str_detect(name, pattern = "us20\\d{2}a")) |>
  pull(name)


duckplyr: materializing, review details with duckplyr::last_rel()


In [None]:
if (file.exists("usa_00064.xml")) {
  acs_00_23 <- read_ipums_ddi("usa_00064.xml") |>
    read_ipums_micro()
} else {
  define_extract_micro(
    collection = "usa",
    description = "ACS 1 year samples in Colorado of income variables, all samples since 2000",
    samples = acs_samples,
    variables = list(
      var_spec("STATEFIP", case_selections = "08"),
      "COUNTYFIP",
      "PUMA",
      "NUMPREC",
      "CPI99",
      "OWNERSHP",
      "HHTYPE",
      "HHINCOME",
      "INCTOT",
      "INCWAGE",
      "INCBUS00",
      "INCSS",
      "INCWELFR",
      "INCINVST",
      "INCSUPP",
      "INCOTHER",
      "INCEARN",
      "POVERTY",
      "CBPOVERTY",
      "FAMUNIT",
      "FAMSIZE",
      "FTOTINC"
    )
  ) |>
    submit_extract() |>
    wait_for_extract() |>
    download_extract() |>
    read_ipums_micro()
}


Use of data from IPUMS USA is subject to conditions including that users should cite the data appropriately. Use command `ipums_conditions()` for more details.



duckplyr: materializing, review details with duckplyr::last_rel()

In [8]:
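acs_amis <- as_duckplyr_df(acs_00_23) |>
  # attach the statewide MFI for each survey year
  left_join(
    acs_mfi |>
      select(YEAR = year, mfi = estimate)
  ) |>
  mutate(
    hhsize = NUMPREC,
    # household-size adjustment relative to a 4-person household:
    # 10% less per person below four, 8% more per person above four
    hhadj = case_when(
      hhsize < 4 ~ 1 - (4 - hhsize) * 0.1,
      hhsize > 4 ~ 1 + (hhsize - 4) * 0.08,
      .default = 1
    ),
    mfi_hh = mfi * hhadj,
    pct_mfi = HHINCOME / mfi_hh,
    # note: rows with a missing pct_mfi (no MFI matched for that YEAR)
    # fall through to .default
    ami_group = case_when(
      pct_mfi <= 0.3 ~ "less than or equal to 30% of MFI",
      pct_mfi < 0.5 ~ "30% to 50% of MFI",
      pct_mfi < 0.8 ~ "50% to 80% of MFI",
      pct_mfi < 1 ~ "80% to 99% of MFI",
      .default = "100% AMI or greater"
    )
  )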


duckplyr: materializing, review details with duckplyr::last_rel()
[1m[22mJoining with `by = join_by(YEAR)`
[1m[22mJoining with `by = join_by(YEAR)`


In [44]:
acs_amis |> 
  filter(YEAR >= 2005) |> 
  glimpse()

Rows: 1,001,521
Columns: 39
$ YEAR      <int> 2005, 2005, 2005, 2005, 2005, 2005, 2005, 2005, 2005, 2005, 2005, 2005, 2005, 2005, 2005, 2005, 2005, 2005, 2005, 2005, 2005, 2005, 2005, 2005, 2005, 2005, 2005, 2005, 2005, 2005, 2005, 2005, 2005, 2005, 2005, 2005, 2005, 2005, 2005, 2005, 2…
$ SAMPLE    <int+lbl> 200501, 200501, 200501, 200501, 200501, 200501, 200501, 200501, 200501, 200501, 200501, 200501, 200501, 200501, 200501, 200501, 200501, 200501, 200501, 200501, 200501, 200501, 200501, 200501, 200501, 200501, 200501, 200501, 200501, 20050…
$ SERIAL    <dbl> 191400, 191400, 191401, 191402, 191403, 191404, 191404, 191405, 191405, 191406, 191407, 191407, 191407, 191408, 191409, 191410, 191410, 191412, 191413, 191414, 191414, 191414, 191415, 191415, 191415, 191415, 191415, 191416, 191416, 191416, 1…
$ CBSERIAL  <dbl> 1, 1, 65, 126, 215, 231, 231, 239, 239, 492, 587, 587, 587, 634, 758, 7

In [12]:
acs_amis_srvy <- acs_amis |> 
  zap_labels() |> 
  as_survey_design(
    ids = CLUSTER,
    strata = STRATA,
    weights = HHWT,
    nest=TRUE
  )

In [45]:
acs_amis_duck <- as_duckplyr_df(acs_amis)

In [46]:
ami_sort <- tibble(
  ami_group = c(
    "less than or equal to 30% of MFI",
    "30% to 50% of MFI",
    "50% to 80% of MFI",
    "80% to 99% of MFI",
    "100% AMI or greater"
  ),
  ami_sort = c(1, 2, 3, 4, 5)
)

In [47]:
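# Keep one record per household (PERNUM == 1) in households under the 1970
# definition (GQ == 1), dropping N/A (9999999) and non-positive household incomes.
acs_amis_duck |>
  filter(PERNUM == 1 & GQ == 1 & HHINCOME != 9999999 & HHINCOME > 0) |>
  View()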

In [48]:
ami_groups_by_year <- acs_amis_duck |>
  filter(PERNUM == 1 & GQ == 1 & HHINCOME != 9999999 & HHINCOME > 0) |>
  # weighted household count per year and AMI group
  summarize(total_hh = sum(HHWT), .by = c(YEAR, ami_group)) |>
  pivot_wider(names_from = YEAR, values_from = total_hh) |>
  # order rows from the lowest to the highest AMI group
  left_join(ami_sort) |>
  arrange(ami_sort) |>
  select(-ami_sort) |>
  # append the MFI for each year as a final row
  bind_rows(
    acs_mfi |>
      select(YEAR = year, estimate) |>
      pivot_wider(names_from = YEAR, values_from = estimate) |>
      mutate(ami_group = "median family income")
  )


[1m[22mJoining with `by = join_by(ami_group)`
duckplyr: materializing, review details with duckplyr::last_rel()
duckplyr: materializing, review details with duckplyr::last_rel()
duckplyr: materializing, review details with duckplyr::last_rel()
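
As a cross-check on the weighted summary above, the same group-by-year table can be built in base R with `xtabs()`, which sums a weight column over the cells of a cross-tabulation. This is a sketch on a made-up toy data frame (`toy` and its values are assumptions, not rows from the extract); it uses an ordered factor in place of the separate sort table.

```r
# Made-up household records: one row per household, HHWT is the household weight.
toy <- data.frame(
  YEAR = c(2022, 2022, 2022, 2023, 2023),
  ami_group = c(
    "30% to 50% of MFI", "30% to 50% of MFI", "50% to 80% of MFI",
    "30% to 50% of MFI", "50% to 80% of MFI"
  ),
  HHWT = c(100, 50, 80, 120, 90)
)

# Fix the row order with factor levels instead of joining a sort key.
toy$ami_group <- factor(
  toy$ami_group,
  levels = c("30% to 50% of MFI", "50% to 80% of MFI")
)

# Sum of household weights for each AMI group (rows) and year (columns).
wide <- xtabs(HHWT ~ ami_group + YEAR, data = toy)
print(wide)
```

Summing `HHWT` gives point estimates only; standard errors would need the survey design object built earlier with `as_survey_design()`.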


In [38]:
max(acs_amis_duck$HHINCOME)

<labelled<double>[1]>: Total household income 
[1] 9999999

Labels:
   value label
 9999999  N/A 
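
The maximum above is the N/A code rather than a real income, which is why the summary cells filter out `9999999` before counting. A small base R sketch of the same masking on made-up values (the vector below is illustrative, not taken from the extract):

```r
# Made-up household incomes; 9999999 is the N/A code shown in the labels above.
hhincome <- c(52000, 9999999, 0, 131000)

# Same condition as the notebook's filter: drop N/A and non-positive incomes.
valid <- hhincome != 9999999 & hhincome > 0

# Maximum over real incomes only.
max_valid <- max(hhincome[valid])
print(max_valid)
```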

In [50]:
write_csv(ami_groups_by_year, "ami_groups_by_year.csv")