add "babies" dataset referenced in EOCE 6.1 of ISRS #4

beanumber · 2015-09-30T14:14:07Z

Unless I am missing it, this is neither the births nor the ncbirths data set.

Side note: Is it necessary to have both births and ncbirths?

beanumber · 2015-09-30T14:18:07Z

Oh... never mind.
It seems that this is the same as mosaicData::Gestation.
But then shouldn't this package import mosaicData?

beanumber · 2015-09-30T14:30:53Z

Upon further review, these data sets do not appear to be the same. In particular, the parity and smoke variables are handled differently. In babies they are binary, but in mosaicData::Gestation they are not.

Can anyone illuminate?

mine-cetinkaya-rundel · 2017-09-10T02:40:39Z

I'm afraid there is a data provenance issue here and I have not been able to track down the origin any further than what is stated in the help file of the package.

mine-cetinkaya-rundel · 2020-11-24T01:09:44Z

More on this at https://twitter.com/AmeliaMN/status/1331037890382467076?s=20

@AmeliaMN @hardin47

AmeliaMN · 2020-11-24T17:46:01Z

I feel like this could be a good opportunity to update the ncbirths dataset, using data from 2019. Pretty sure all my wrangling code would work on the new data. The only tricky piece is that real ages are redacted in the public-facing data, so the only thing available is age ranges. Maybe someone has a smart idea for how to impute some ages, or we could just randomly assign an age in the range.

mine-cetinkaya-rundel · 2020-11-24T18:56:43Z

I think it's a great idea to have the updated datasets here @AmeliaMN! We use the existing dataset in current books so I'd hesitate to replace it -- we can put a note in the docs clarifying the provenance issue as well as suggesting using the newer version. Once it's out of all the most recent editions of the books we could consider deprecating it.

I wonder about naming, how about ncbirths19?

Also, ages, hm... First question that comes to mind is, do we have to have ages? I don't have a great suggestion for imputing but could look up an appropriate method. Selecting from a random distribution in the range should be straightforward.

AmeliaMN · 2020-11-24T20:55:06Z

That's fair, and I have seen other textbooks do similar things. For example, Stat2Data::BaseballTimes vs Stat2Data::BaseballTimes2017 or Lock5Data::HollywoodMovies vs Lock5Data::HollywoodMovies2011. I think age was a nice variable because it is numeric, and this dataset gets used in places like the inference for numeric data lab (that's probably an old link, just easy to put my hands on) and your exploring NC births lab.

mine-cetinkaya-rundel · 2020-11-24T21:00:51Z

Given that the age bands are not very wide, I wouldn't be opposed to a random draw from a uniform in that range. We can place the data prep code in the data-raw folder. I'd be happy to do this based on your work or a PR is good too, whichever you prefer!
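For illustration only, the uniform draw described above could look roughly like the sketch below. It is written in Python rather than as the package's actual data-raw code, and the range labels, the `impute_age` helper, and the seed are all hypothetical; the real natality files may encode age bands differently.

```python
import random


def impute_age(age_range, rng):
    """Draw one integer age uniformly from an inclusive 'lo-hi' range label."""
    lo, hi = (int(part) for part in age_range.split("-"))
    # randint includes both endpoints, so every age in the band is equally likely
    return rng.randint(lo, hi)


# Fixed seed (arbitrary choice) so the data prep is reproducible
rng = random.Random(2014)

# Hypothetical age-band labels, not the actual coding in the public data
age_ranges = ["20-24", "25-29", "30-34"]

imputed = [impute_age(label, rng) for label in age_ranges]
```

Each imputed value stays inside its original band, so the banded summaries are unchanged while the variable becomes numeric.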

AmeliaMN · 2020-11-25T21:09:33Z

I started working on a PR and realized a couple of things: 1. it seems like the most recent natality data is from 2014, and 2. the reason the ncbirths data was from 2004 is probably that it was the last year the data included state information! So, I could make a births14 dataset that would have random babies born in 2014, but they wouldn't necessarily be from North Carolina.

mine-cetinkaya-rundel · 2020-11-25T21:46:07Z

I think that's perfectly fine!

beanumber mentioned this issue Oct 1, 2015

added babies data set #5

Merged

mine-cetinkaya-rundel closed this as completed Sep 10, 2017

mine-cetinkaya-rundel reopened this Nov 24, 2020

AmeliaMN mentioned this issue Nov 25, 2020

adding births14 data #33

Merged

mine-cetinkaya-rundel added a commit to AmeliaMN/openintro that referenced this issue Feb 19, 2021

Mention births14 in other births data, closes OpenIntroStat#4

6847cde

mine-cetinkaya-rundel closed this as completed in 7578e42 Feb 19, 2021

mine-cetinkaya-rundel added a commit that referenced this issue Feb 19, 2021

Updated docs for data provenance, addresses #4

f1ec257

npaterno mentioned this issue May 8, 2023

Ims issue 4 #68

Merged
