add "babies" dataset referenced in EOCE 6.1 of ISRS #4
Oh... never mind.
Upon further review, these data sets do not appear to be the same. In particular, the … Can anyone shed some light on this?
I'm afraid there is a data provenance issue here, and I have not been able to trace the origin any further than what is stated in the package's help file.
I think this could be a good opportunity to update the ncbirths dataset using data from 2019. I'm pretty sure all my wrangling code would work on the new data; the only tricky piece is that exact ages are redacted in the public-facing data, so only age ranges are available. Maybe someone has a smart idea for how to impute ages, or we could simply assign a random age within each range.
I think it's a great idea to have the updated datasets here, @AmeliaMN! We use the existing dataset in current books, so I'd hesitate to replace it. Instead, we can put a note in the docs clarifying the provenance issue and suggesting the newer version. Once it's out of the most recent editions of all the books, we could consider deprecating it. I wonder about naming; how about … Also, ages, hmm... The first question that comes to mind is: do we need ages at all? I don't have a great suggestion for imputation but could look up an appropriate method. Drawing from a random distribution over the range should be straightforward.
That's fair, and I have seen other textbooks do similar things. For example, …
Given that the age bands are not very wide, I wouldn't be opposed to a random draw from a uniform distribution over each band. We can place the data prep code in the …
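A minimal sketch of the uniform-draw idea, written in Python for illustration. It assumes each mother's age arrives as a band label such as `"20-24"`; that label format and the helper names `parse_band` and `impute_ages` are hypothetical, and the actual layout of the public natality files may differ. A fixed seed keeps the imputed ages identical every time the dataset is regenerated.

```python
import random


def parse_band(label):
    """Parse a hypothetical age-band label such as '20-24' into (20, 24)."""
    low, high = label.split("-")
    return int(low), int(high)


def impute_ages(bands, seed=2019):
    """Replace each (low, high) band with an integer age drawn uniformly
    at random from low..high, inclusive of both endpoints."""
    rng = random.Random(seed)  # fixed seed so the data prep is reproducible
    return [rng.randint(low, high) for low, high in bands]


if __name__ == "__main__":
    # Hypothetical band labels standing in for the redacted age field
    labels = ["15-19", "20-24", "25-29", "35-39"]
    ages = impute_ages([parse_band(s) for s in labels])
    for label, age in zip(labels, ages):
        print(label, "->", age)
```

Drawing integers with `randint` keeps the imputed column on the same scale as a whole-year age variable; `rng.uniform(low, high + 1)` would give a continuous draw instead if that were preferred.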
I started working on a PR and realized a couple of things: (1) it seems the most recent natality data are from 2014, and (2) probably the reason the …
I think that's perfectly fine!
Unless I am missing it, this is neither the `births` nor the `ncbirths` data set.

Side note: Is it necessary to have both `births` and `ncbirths`?