# Processing Input Data

This notebook processes the raw data used to generate visualizations for the Clovis, NM City Council presentation. It writes CSVs that are easy to load with Pandas and plot with Matplotlib in the next notebook.

Unlike most Jupyter Notebooks, this notebook is meant to be run with an R kernel rather than a Python kernel, because the input files are Stata .dta files, which R can read easily with the haven package.
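
As a minimal sketch of how haven handles .dta files, the snippet below writes a small table to a temporary Stata file and reads it back. The table `toy` and its values are invented for illustration and are not part of the project data; only `write_dta()` and `read_dta()` come from haven.

```r
library(haven)

# Hypothetical toy table standing in for a real Stata input file.
toy <- data.frame(
  countyfip = c(35009, 35001),
  emp       = c(100, 250)
)

# Write the table to a temporary .dta file, then read it back.
toy_path <- tempfile(fileext = ".dta")
write_dta(toy, toy_path)

# read_dta() returns a tibble; as.data.frame() converts it to a plain data frame,
# which is the same pattern the cells in this notebook use.
toy_read <- as.data.frame(read_dta(toy_path))

str(toy_read)
```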

In [1]:
#This cell imports the R libraries that will be used to process the data
library(haven)
library(tidyverse) 

"package 'tidyverse' was built under R version 3.6.3"
-- Attaching packages ------------------------------------------------------------------------------- tidyverse 1.3.1 --

v ggplot2 3.3.5     v purrr   0.3.4
v tibble  3.1.6     v dplyr   1.0.8
v tidyr   1.2.0     v stringr 1.4.0
v readr   2.1.2     v forcats 0.5.1

"package 'purrr' was built under R version 3.6.3"
"package 'forcats' was built under R version 3.6.3"
-- Conflicts ---------------------------------------------------------------------------------- tidyverse_conflicts() --
x dplyr::filter() masks stats::filter()
x dplyr::lag()    masks stats::lag()



In [2]:
#This cell reads the HMS smoke days data for U.S. counties and keeps only the rows that contain
#information for Curry County, NM. It also drops all extraneous fields

county_smoke_annual <- as.data.frame(read_dta("raw_data/county_smoke_annual.dta"))
#The FIPS code for Curry County, NM is 35009
target_FIPS_code <- 35009
county_smoke_curry <- filter(county_smoke_annual, COUNTY10 == target_FIPS_code)

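#Keep only the four fields needed downstream
curry_NM_hms_data <- data.frame(
    county_fips = county_smoke_curry['COUNTY10'],
    rfrnc_yr = county_smoke_curry['rfrnc_yr'],
    num_smoke_days = county_smoke_curry['hms_deep_1'],
    pm25 = county_smoke_curry['pm25']
)

#data.frame() keeps the original column name of each one-column data frame it is given,
#so the output names are set explicitly here
colnames(curry_NM_hms_data) <- c("county_fips", "rfrnc_yr", "num_smoke_days", "pm2.5")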

write.csv(curry_NM_hms_data, "processed_input_data/curry_NM_hms_data.csv", row.names=FALSE)
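
The filter-and-rename step in the cell above could also be expressed as a single dplyr pipeline. The sketch below runs on a made-up table, `toy_smoke`, whose values are invented for illustration; only the column names mirror the ones used in this notebook.

```r
library(dplyr)

# Toy stand-in for county_smoke_annual; all values are invented.
toy_smoke <- data.frame(
  COUNTY10   = c(35009, 35009, 35001),
  rfrnc_yr   = c(2010, 2011, 2010),
  hms_deep_1 = c(3, 5, 2),
  pm25       = c(6.1, 7.4, 5.9),
  extra      = c("a", "b", "c")
)

# Keep the Curry County rows, then select and rename the needed columns in one step.
# select(new_name = old_name) renames while selecting.
toy_curry <- toy_smoke %>%
  filter(COUNTY10 == 35009) %>%
  select(
    county_fips    = COUNTY10,
    rfrnc_yr,
    num_smoke_days = hms_deep_1,
    `pm2.5`        = pm25
  )

print(toy_curry)
```

This form avoids the separate `colnames()` call, at the cost of relying on dplyr rather than base R for the column selection.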

In [3]:
#This cell reads the QWI data for U.S. counties, both aggregated as a total for each county/quarter
#and broken out by age group, and keeps only the rows that contain information
#for Curry County, NM.

qwi_agegrp_county_quarterly <- as.data.frame(read_dta("raw_data/qwi_agegrp_county_quarterly.dta"))

qwi_county_quarterly <- as.data.frame(read_dta("raw_data/qwi_county_quarterly.dta"))

curry_qwi_county_quarterly <- filter(qwi_county_quarterly, countyfip == target_FIPS_code)
curry_qwi_agegrp_county_quarterly <- filter(qwi_agegrp_county_quarterly, countyfip == target_FIPS_code)


write.csv(curry_qwi_county_quarterly, "processed_input_data/curry_qwi_county_quarterly.csv",
          row.names = FALSE)
write.csv(curry_qwi_agegrp_county_quarterly, "processed_input_data/curry_qwi_agegrp_county_quarterly.csv", 
          row.names = FALSE)
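
One way to sanity-check an exported file is to read it back and compare it with the table that was written. The sketch below does this on a made-up table, `toy_qwi`, written to a temporary file; the column names `year` and `quarter` and all values are assumptions for illustration, not taken from the QWI files.

```r
# Toy stand-in for a filtered QWI table; values and most column names are invented.
toy_qwi <- data.frame(
  countyfip = c(35009, 35009),
  year      = c(2010, 2010),
  quarter   = c(1, 2)
)

# Write to a temporary CSV the same way the cell above does.
toy_csv <- tempfile(fileext = ".csv")
write.csv(toy_qwi, toy_csv, row.names = FALSE)

# Read the file back in.
toy_back <- read.csv(toy_csv)

# With row.names = FALSE the file holds only the data columns,
# so the names and row count should match the original table.
stopifnot(identical(names(toy_back), names(toy_qwi)))
stopifnot(nrow(toy_back) == nrow(toy_qwi))
```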