Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

filter() with database object failing #173

Open
jamgreen opened this issue Jun 17, 2024 · 0 comments
Open

filter() with database object failing #173

jamgreen opened this issue Jun 17, 2024 · 0 comments

Comments

@jamgreen
Copy link

I am working with ACS PUMS data. I am able to use srvyr successfully working with my dataframe in memory, but wanted to experiment with working from a database. Basic calculations are fine, but filter() fails with the following error:

Adding missing grouping variables: state_nameError indplyr::filter(): ℹ In argument: as.logical(svy$variables$__SRVYR_SUBSET_VAR__). ℹ In group 1: state_name = "Delaware". Caused by error: ! ..1 must be of size 45424 or 1, not size 2969715.

Here is the set up:

library(dplyr)
library(srvyr)
library(RSQLite)
library(data.table)

pums <- fread(here::here("data/pums/usa_00091.csv.gz"), 
                          colClasses = list(character = c("STATEFIP", "PUMA", "METRO", 
                                                          "OCC", "OCCSOC", "EDUC", "EDUCD",
                                                          "DEGFIELD", "DEGFIELDD"))) |> 
  janitor::clean_names("snake")

xwalk <- read_csv(here::here("data/crosswalks/UIUC_pums_occupation_codes.csv"), col_types = "ccc") |> 
  janitor::clean_names("snake")

state_fips <- tidycensus::fips_codes |> 
  distinct(state_code, state_name)

pums <- pums |> 
  left_join(xwalk, by = c("occ" = "pums_occ_code")) |> 
  mutate(artist_dummy = if_else(!is.na(pums_occ_group_title), "Artist", "Non-Artist")) |> 
  left_join(state_fips, by = c("statefip" = "state_code"))

pums <- pums |> 
  mutate(educ_cat = case_when(educ == "0" ~ "No schooling",
                              educ == "1" ~ "Up to grade 4",
                              educ == "2" ~ "Grades 5-8",
                              educ == "3" ~ "Grade 9",
                              educ == "4" ~ "Grade 10",
                              educ == "5" ~ "Grade 11",
                              educ == "6" ~ "Grade 12",
                              educ == "7" ~ "1 year of college",
                              educ == "8" ~ "2 years of college",
                              educ == "9" ~ "3 years of college",
                              educ == "10" ~ "4 years of college",
                              educ == "11" ~ "5 years of college",
                              educ == "99" ~ "Missing Education"))

mydb <- dbConnect(RSQLite::SQLite(), "data/pums/pca_pums.sqlite")
dbWriteTable(conn = mydb, name = "pums", pums, overwrite = TRUE)

pums_db <- tbl(src = mydb, "pums")

mydb_srvy <- pums_db |> 
  as_survey_rep(
    weight = perwt,
    repweights = matches("repwtp[0-9]+") ,
    scale = 4 / 80,
    rscales = rep(1 , 80),
    mse = TRUE,
    type = "JK1", 
    variables = -c(matches("^repwtp"))
  )

mydb_srvy |>
 filter(artist_dummy == "Artist") |> 
group_by(state_name) |> survey_count()

I am unclear as to what I'm doing incorrectly or if there is some kind of conflict I am unaware of. Any clarification on this would be greatly appreciated.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant