In [None]:
from quilt.data.iana import birthData
import numpy as np
import pandas as pd

%matplotlib inline

# BirthData

This notebook contains natality data from the [National Center for Health Statistics](http://www.nber.org/data/vital-statistics-natality-data.html). Specifically, it looks at simple patterns in a reduced set of the 2016 dataset. The data have been made available as the quilt object [iana/birthData](https://quiltdata.com/package/iana/birthData), which can be accessed from Python 3 as follows:
```
from quilt.data.iana import birthData
birthDF = birthData.birthDF()
```
This is the original dataframe, which is probably too large to manipulate efficiently on mybinder.org (~4M rows). There is also a derived dataframe, `birthDFr`, consisting of about 20 columns of the original file. Here is a summary of the included columns.

  * `dob_mm`: month of birth (1..12)
  * `dob_tt`: time of birth, HHMM where the hour is NOT zero padded (e.g. `736` is 07:36) [9999 N/A]
  * `dob_wk`: birth day of week
    - `1`: Sunday
    - `2`: Monday
    - ...
    - `7`: Saturday
  * `bfacil`: Birth Facility
    - `1`: Hospital
    - `2`: Freestanding Birth Center
    - `3`: Home (intended)
    - `4`: Home (not intended)
    - `5`: Home (unknown if intended)
    - `6`: Clinic/Doctor's office
    - `7`: Other
    - `9`: Unknown
  * `mager`: Mother's single years of age (integer years)
    - `10-12`: Age in years
    - `50`: 50+
  * `mbrace`: Bridged race mother
    - `1`: White
    - `2`: Black
    - `3`: American Indian or Alaskan Native
    - `4`: Asian or Pacific Islander
    - `0`: Other / Not classified
  * `fagercomb`: Father's combined age
    - `9-98`: Father's combined age in years
    - `99`: Unknown or not stated
  * `fbrace`: Bridged race father
    - `1`: White
    - `2`: Black
    - `3`: American Indian or Alaskan Native
    - `4`: Asian or Pacific Islander
    - `9`: Unknown / Not stated.
  * `ilop_r11`: Interval since last other pregnancy recode 11
    - `0`: Zero to 3 months (plural delivery)
    - `1`: 4 to 11 months
    - `2`: 12 to 17 months
    - `3`: 18 to 23 months
    - `4`: 24 to 35 months
    - `5`: 36 to 47 months
    - `6`: 48 to 59 months
    - `7`: 60 to 71 months
    - `8`: 72 months and over
    - `88`: Not applicable (1st natality event)
    - `99`: Unknown or not stated
  * `bmi`: Body Mass Index
    - `13.0-69.9`: Body Mass Index
    - `99.9`: Unknown or not stated
  * `ld_indl`: Induction of labour
    - `Y`: Yes
    - `N`: No
    - `U`: Unknown or not stated
  * `ld_augm`: Augmentation of labour
    - `Y`: Yes
    - `N`: No
    - `U`: Unknown or not stated
  * `ld_anes`: Anesthesia
    - `Y`: Yes
    - `N`: No
    - `U`: Unknown or not stated
  * `me_rout`: Final Route and Method of delivery
    - `1`: Spontaneous
    - `2`: Forceps
    - `3`: Vacuum
    - `4`: Cesarean
    - `9`: Unknown or not stated
  * `me_trial`: Trial of labour attempted (if cesarean)
    - `Y`: Yes
    - `N`: No
    - `X`: Not applicable
    - `U`: Unknown or not stated
  * `attend`: Attendant at Birth
    - `1`: Doctor of Medicine (MD)
    - `2`: Doctor of Osteopathy (DO)
    - `3`: Certified Nurse Midwife (CNM)
    - `4`: Other Midwife
    - `5`: Other
    - `9`: Unknown or not stated
  * `pay`: Payment source for delivery
    - `1`: Medicaid
    - `2`: Private Insurance
    - `3`: Self-Pay
    - `4`: Indian Health Service
    - `5`: CHAMPUS/TRICARE
    - `6`: Other government (federal, state, local)
    - `8`: Other
    - `9`: Unknown
  * `sex`: Sex of infant
    - `M`: Male
    - `F`: Female
  * `dbwt`: Birth weight - detail in grams
    - `227-8166`: Number of grams
    - `9999`: Not stated.
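
The codes above are easier to read in tables and plots once they are mapped to labels. Below is a minimal sketch using `Series.map` with lookup dictionaries transcribed from the summary (only `dob_wk` and `me_rout` here). It assumes the codes are stored as integers, as the `== 1` comparisons later in the notebook suggest; `decode` and the sample frame are hypothetical stand-ins for `birthDFr`.

```python
import pandas as pd

# Lookup tables copied from the column summary (a subset of the columns).
DOB_WK_LABELS = {1: "Sunday", 2: "Monday", 3: "Tuesday", 4: "Wednesday",
                 5: "Thursday", 6: "Friday", 7: "Saturday"}
ME_ROUT_LABELS = {1: "Spontaneous", 2: "Forceps", 3: "Vacuum",
                  4: "Cesarean", 9: "Unknown or not stated"}


def decode(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: return a copy of df with readable *_label columns."""
    out = df.copy()
    out["dob_wk_label"] = out["dob_wk"].map(DOB_WK_LABELS)
    out["me_rout_label"] = out["me_rout"].map(ME_ROUT_LABELS)
    return out


# Tiny hand-made frame standing in for birthDFr (integer codes assumed).
sample = pd.DataFrame({"dob_wk": [1, 4, 7], "me_rout": [1, 4, 9]})
print(decode(sample))
```

Codes missing from a dictionary come out as `NaN`, which makes unexpected values easy to spot.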
 
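Even this reduced dataset is quite large. If you run into problems, try dropping columns you don't need with `birthDF = birthDF.drop(columns=['column'])`. Note that `birthDF.drop('column')` on its own would try to drop a *row* labelled `'column'`, and `drop` returns a new dataframe rather than modifying `birthDF`.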
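
A minimal sketch of trimming memory on a small stand-in frame: drop unneeded columns with `columns=`, then downcast a small integer code column. The `int8` downcast assumes the column has no missing values; `memory_usage(deep=True)` reports bytes per column, which helps decide what to drop.

```python
import pandas as pd

# Small stand-in frame; in the notebook this would be birthDF.
df = pd.DataFrame({
    "dob_wk": [1, 2, 3],
    "bmi": [22.5, 99.9, 31.0],
    "pay": [1, 2, 9],
})

# drop() defaults to axis=0 (row labels), so name the columns explicitly
# and reassign, since drop() returns a new frame.
df = df.drop(columns=["bmi", "pay"])

# Day-of-week codes 1..7 fit in int8 (assumes no NaN in the column).
df["dob_wk"] = df["dob_wk"].astype("int8")

print(df.dtypes)
print(df.memory_usage(deep=True))
```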

The [NBER site](http://www.nber.org/data/vital-statistics-natality-data.html) includes a description file [desc.txt](http://www.nber.org/natality/2016/desc/natl2016/desc.txt) which contains basic information on the columns. More information, including definitions of the keys and values, is available (as a PDF) in the [2016 natality Dataset User Guide](http://www.nber.org/natality/2016/UserGuide2016.pdf).

In [None]:
birthDF = birthData.birthDFr()

The `dob_tt` column "sort of" looks like a timestamp, so let's convert it to something pandas can use as a timestamp. Here is the process:
  
  1. Remove invalid entries (9999 = "not stated")
  2. Zero pad the strings so that e.g. 736 becomes 0736
  3. Run the result through `pd.to_datetime`
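
The three steps can be tried first on a few hand-made strings to see what each one does. This is a sketch on made-up values, assuming `dob_tt` is stored as strings (as the comparison with `'9999'` in the next cell implies); it uses `Series.where` for step 1 instead of `.loc` assignment, which gives the same result here.

```python
import pandas as pd

# Made-up raw dob_tt strings: "736" means 07:36, "9999" means not stated.
raw = pd.Series(["736", "1405", "2359", "9999"])

cleaned = raw.where(raw != "9999")             # 1. "9999" -> NaN
padded = cleaned.str.pad(4, fillchar="0")      # 2. "736" -> "0736" (pads on the left by default)
times = pd.to_datetime(padded, format="%H%M")  # 3. parse hours/minutes; NaN -> NaT

print(padded.tolist())
print(times)
```

Because no date is given, pandas fills in 1900-01-01; only the `.dt.hour` and `.dt.minute` parts are meaningful.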
  

In [None]:
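# 1. "9999" means "not stated": replace with NaN (np.NaN was removed in NumPy 2.0; use np.nan)
birthDF.loc[birthDF['dob_tt'] == '9999', 'dob_tt'] = np.nan
# 2. Left-pad to four characters, e.g. "736" -> "0736"
birthDF['dob_tt'] = birthDF['dob_tt'].str.pad(4, fillchar='0')
# 3. Parse as hours and minutes; NaN entries become NaT
birthDF['dob_tt'] = pd.to_datetime(birthDF['dob_tt'], format="%H%M")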

## Time of Birth

First, we will look at the number of births in each hour of the day. Before conversion, `dob_tt` held values up to 2359; once zero padded, the first two digits are the hour and the last two the minute, so `.dt.hour` gives the hour of birth.
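
The grouping in the next cell simply leaves out hours that have no births, which can matter for small subsets. A sketch of an equivalent count that keeps all 24 hours, on made-up parsed times (NaT standing in for "not stated"):

```python
import pandas as pd

# Made-up parsed birth times; None becomes NaT (time not stated).
times = pd.to_datetime(pd.Series(["0736", "0710", "1405", None]), format="%H%M")

# dropna() removes NaT before extracting the hour; reindex(range(24))
# keeps empty hours as zero-height bars instead of dropping them.
per_hour = (
    times.dropna()
         .dt.hour
         .value_counts()
         .reindex(range(24), fill_value=0)
)
print(per_hour.head(8))
# per_hour.plot(kind="bar", figsize=(10, 10)) draws the same kind of chart.
```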

In [None]:
birthDF["dob_tt"].groupby(birthDF["dob_tt"].dt.hour).count().plot(
    kind="bar", 
    figsize=(10, 10)
)

## Day of the Week

Next, let's look at births on each day of the week. You can spot the weekend pretty easily (Sunday = 1, Saturday = 7).
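
`hist` with `bins=7` happens to put each code 1..7 in its own bin. Counting the codes directly and labelling them by name is a slightly more explicit alternative. A sketch on made-up codes, assuming `dob_wk` holds integers 1 (Sunday) to 7 (Saturday) as in the column summary:

```python
import pandas as pd

DAY_NAMES = ["Sunday", "Monday", "Tuesday", "Wednesday",
             "Thursday", "Friday", "Saturday"]  # dob_wk codes 1..7

# Made-up codes standing in for birthDF['dob_wk'].
dob_wk = pd.Series([1, 2, 2, 3, 7, 2])

# Count each code, make sure all seven days appear, then label by name.
per_day = dob_wk.value_counts().reindex(range(1, 8), fill_value=0)
per_day.index = DAY_NAMES
print(per_day)
# per_day.plot(kind="bar") gives one bar per named day.
```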

In [None]:
birthDF.hist('dob_wk', bins=7, figsize=(10, 10))

### Route 1 Births

"Route 1" (`me_rout == 1`) corresponds to spontaneous delivery, eh ... naturally. Let's restrict to route 1 births and see what that does to the day-of-the-week distribution.
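
Raw counts shrink as we filter, so comparing the *shape* of two distributions is easier with shares that sum to 1. A sketch on a made-up frame using `value_counts(normalize=True)`; `day_shares` is a hypothetical helper, and integer codes are assumed.

```python
import pandas as pd

# Made-up rows: dob_wk (1=Sunday..7=Saturday), me_rout (1=spontaneous, 4=cesarean).
df = pd.DataFrame({
    "dob_wk":  [1, 2, 2, 3, 7, 2, 4, 4],
    "me_rout": [1, 4, 4, 1, 1, 4, 1, 1],
})


def day_shares(frame: pd.DataFrame) -> pd.Series:
    """Hypothetical helper: fraction of births on each day code 1..7."""
    return (frame["dob_wk"].value_counts(normalize=True)
                           .reindex(range(1, 8), fill_value=0.0))


comparison = pd.DataFrame({
    "all": day_shares(df),
    "route_1": day_shares(df[df["me_rout"] == 1]),
})
print(comparison)
# comparison.plot(kind="bar") overlays both shapes on one chart.
```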

In [None]:
birthDF[birthDF['me_rout'] == 1].hist('dob_wk', bins=7, figsize=(10, 10))

### Non-induced Births

Now, what happens if we further restrict ourselves to route 1 births which were _not_ induced (`ld_indl == 'N'`)?
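
The filters in this and the following cells repeat the same pattern of `&`-combined equality tests. A small hypothetical helper can build that mask from keyword arguments; this is a sketch on a made-up frame, not part of the original notebook.

```python
import pandas as pd


def subset(df: pd.DataFrame, **conditions) -> pd.DataFrame:
    """Hypothetical helper: keep rows where every named column equals its value,
    e.g. subset(df, me_rout=1, ld_indl="N")."""
    mask = pd.Series(True, index=df.index)
    for column, value in conditions.items():
        mask &= (df[column] == value)
    return df[mask]


# Made-up rows standing in for birthDF.
df = pd.DataFrame({
    "me_rout": [1, 1, 4, 1],
    "ld_indl": ["N", "Y", "N", "N"],
    "dob_wk":  [2, 3, 4, 5],
})
spontaneous_not_induced = subset(df, me_rout=1, ld_indl="N")
print(spontaneous_not_induced)
# spontaneous_not_induced.hist('dob_wk', bins=7, figsize=(10, 10)) mirrors the next cell.
```

Adding `attend=1`, `attend=2`, and so on gives the attendant-specific subsets with one short call each.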

In [None]:
birthDF[(birthDF['me_rout'] == 1) & 
        (birthDF['ld_indl'] == 'N')].hist('dob_wk', bins=7, figsize=(10, 10))

That looks a bit flatter. What happens if we split these births by "Attendant at Birth" (`attend`)?
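
Besides one histogram per attendant, all attendants can be compared side by side in a single table with `pd.crosstab`, normalized so each row sums to 1. A sketch on made-up rows; the short attendant labels are an assumption abbreviating the column summary, and integer codes are assumed.

```python
import pandas as pd

# Short labels for the attend codes in the column summary (abbreviations assumed).
ATTEND_LABELS = {1: "MD", 2: "DO", 3: "CNM", 4: "Other Midwife", 5: "Other"}

# Made-up rows standing in for birthDF.
df = pd.DataFrame({
    "me_rout": [1, 1, 1, 1, 1, 4],
    "ld_indl": ["N", "N", "N", "N", "Y", "N"],
    "attend":  [1, 1, 3, 3, 1, 2],
    "dob_wk":  [2, 3, 1, 7, 4, 5],
})

# Same filter as the cells below: spontaneous and not induced.
sel = df[(df["me_rout"] == 1) & (df["ld_indl"] == "N")]

# Rows: attendant; columns: day code 1..7; each row sums to 1.
shares = pd.crosstab(sel["attend"].map(ATTEND_LABELS), sel["dob_wk"],
                     normalize="index")
shares = shares.reindex(columns=range(1, 8), fill_value=0.0)
print(shares.round(2))
```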

### Attendant at Birth

#### Doctor of Medicine

In [None]:
birthDF[(birthDF['me_rout'] == 1) & 
        (birthDF['ld_indl'] == 'N') &
        (birthDF['attend'] == 1)].hist('dob_wk', bins=7, figsize=(10, 10))

#### Doctor of Osteopathy

In [None]:
birthDF[(birthDF['me_rout'] == 1) & 
        (birthDF['ld_indl'] == 'N') &
        (birthDF['attend'] == 2)].hist('dob_wk', bins=7, figsize=(10, 10))

#### Certified Nurse Midwife

In [None]:
birthDF[(birthDF['me_rout'] == 1) & 
        (birthDF['ld_indl'] == 'N') &
        (birthDF['attend'] == 3)].hist('dob_wk', bins=7, figsize=(10, 10))

#### Other Midwife

In [None]:
birthDF[(birthDF['me_rout'] == 1) & 
        (birthDF['ld_indl'] == 'N') &
        (birthDF['attend'] == 4)].hist('dob_wk', bins=7, figsize=(10, 10))

#### Other

In [None]:
birthDF[(birthDF['me_rout'] == 1) & 
        (birthDF['ld_indl'] == 'N') &
        (birthDF['attend'] == 5)].hist('dob_wk', bins=7, figsize=(10, 10))