
ssa_national set is not gender balanced #9

Closed
bmschmidt opened this issue Jun 25, 2014 · 3 comments
Labels
bug

Comments

@bmschmidt
Contributor

@bmschmidt bmschmidt commented Jun 25, 2014

`method = "ssa"` could probably use some adjustment to compensate for something I noticed in the Social Security dataset: before about 1918, it is about two-thirds women.

I assume this has to do with who was eligible for benefits when the program was created in the late 1930s: either the men born around 1900 were dead, or, more likely, they weren't eligible for spousal survivor benefits or something similar.

For years around 1900, every ratio this method produces distorts the female percentage of a name. For example, in 1901 Merle has 91 women and 52 men counted. But since 69% of that year's sample is female, the male count should be about 2.2x higher (0.69 / 0.31 ≈ 2.2), which gives the right prediction: Merle is male, not female.
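Written out, the Merle arithmetic looks like this. It assumes the adjustment scales each year's male counts by that year's female-to-male ratio $k_y = p_y / (1 - p_y)$, where $p_y$ is the female share of the sample in year $y$. This is one way to formalize the comment, not necessarily how the package implements it:

```latex
\[
k_{1901} = \frac{p_{1901}}{1 - p_{1901}} = \frac{0.69}{0.31} \approx 2.23
\]
\[
\hat{m} = 52 \times 2.23 \approx 116 > 91 = f
\qquad\Rightarrow\qquad
\frac{f}{f + \hat{m}} \approx \frac{91}{207} \approx 0.44
\]
```

Without the adjustment, the proportion female is $91 / 143 \approx 0.64$, so the uncorrected data predicts the wrong gender.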

An illustrative plot:

```r
library(dplyr)
library(ggplot2)

gender::ssa_national %>%
  group_by(year) %>%
  summarize(ratio = sum(female) / sum(female + male)) %>%
  ggplot(aes(x = year, y = ratio)) +
  geom_line() +
  labs(title = "Percentage of the set that is women")
```

Illustrative chart

@lmullen
Member

@lmullen lmullen commented Jun 25, 2014

Thanks for pointing this out, @bmschmidt. I'll have to take it into account.

@lmullen
Member

@lmullen lmullen commented Jul 16, 2014

First stab at a solution with justification for the reasoning: http://rpubs.com/lmullen/gender-imbalance-ssa

lmullen added a commit that referenced this issue Jul 22, 2014
This commit adds a function that calculates correction factors for skewed
gender ratios in the SSA data (as explained in #9).

This commit doesn't pass all the tests when the `gender` function is passed a
data frame of values and the `years` parameter is a logical. This was a bad
design for the function and it is unnecessary because `Map()` and other
functional programming methods in R can let the user do this in more sensible
ways. The next release will incorporate breaking changes to the way that the
`gender` function works, so these tests will be fixed there.

Fixes #9
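The correction-factor function mentioned in the commit is not shown in this thread. The following is a minimal base-R sketch of the idea. It assumes a data frame with the `name`, `year`, `female`, and `male` columns used in the plotting snippet. The functions `correction_factors()` and `corrected_prop_female()` are hypothetical names for illustration, not the package's API.

```r
# Hypothetical sketch, not the gender package's actual implementation.
# Assumes `counts` has columns name, year, female, male (like ssa_national).

# Per-year factor to multiply male counts by, so that each year's sample is
# treated as if it were 50/50. Equals p / (1 - p), where p is the female share.
correction_factors <- function(counts) {
  totals <- aggregate(cbind(female, male) ~ year, data = counts, FUN = sum)
  data.frame(year = totals$year, factor = totals$female / totals$male)
}

# Proportion female for one name in one year after scaling its male count
# by that year's correction factor.
corrected_prop_female <- function(counts, factors, name, year) {
  row <- counts[counts$name == name & counts$year == year, ]
  k <- factors$factor[factors$year == year]
  row$female / (row$female + row$male * k)
}
```

With a full data set, you would compute the factors once. Then, in the spirit of the commit message, you could look up several name/year pairs with `Map()`, e.g. `Map(function(n, y) corrected_prop_female(counts, factors, n, y), names, years)`, rather than giving the lookup function a special data-frame mode.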
@lmullen
Member

@lmullen lmullen commented Jul 22, 2014

Fixed on develop.

@lmullen lmullen closed this Jul 22, 2014
2 participants