Social Categorizations in CRAN Data
Simon P. Couch
The Comprehensive R Archive Network (CRAN) now hosts over 15,000 R packages, many of which provide datasets containing demographic information about people. How do the choices that R users make in constructing datasets reflect their understandings of social categories? Are these decisions patterned by discipline? Rather than viewing datasets as “raw” or unargumentative, I argue that the way we name columns, describe those columns in codebooks, construct the categories with which subjects must identify (or to which they are assigned), and assign numerical (and, by extension, ordinal) values to those categories is a process of argumentation. Using CRAN data, I attempt to characterize how sex, gender, race, and ethnicity are conceptualized by this influential portion of the R community, and I draw on intersectional feminist literature to reflect on best practices for encoding social divisions more thoughtfully.
This project began as a term paper for SOC 326 (Science and Social Difference) at Reed College in Fall 2019, taught by Dr. Kjersten Whittington.
The full paper is available in this repository as Couch_Social_Divisions_2019.pdf.