# Think Stats 2 Chapter 1: Exploratory Data Analysis

Think Stats 2 was written by Allen B. Downey

Chapter 1: Working with survey data from the [National Survey of Family Growth (NSFG)](http://cdc.gov/nchs/nsfg.htm)

**The Question: Do women's first babies tend to be born late?**

In [1]:
import nsfg
import pandas as pd
import numpy as np

pd.options.display.max_rows = 400
pd.options.display.max_columns = 244

Below, we load the survey data into a pandas DataFrame.

**Throughout the chapter, we will be using the following columns**:
* **caseid** is the integer ID of the respondent.
* **prglngth** is the integer duration of the pregnancy in weeks.
* **outcome** is an integer code for the outcome of the pregnancy. The code 1 indicates a live birth.
* **pregordr** is a pregnancy serial number; for example, the code for a respondent's first pregnancy is 1, for the second pregnancy is 2, and so on.
* **birthord** is a serial number for live births; the code for a respondent's first child is 1, and so on. For outcomes other than live birth, this field is blank.
* **birthwgt_lb** and **birthwgt_oz** contain the pounds and ounces parts of the birth weight of the baby.
* **agepreg** is the mother's age at the end of the pregnancy.
* **finalwgt** is the statistical weight associated with the respondent. It is a floating-point value that indicates the number of people in the U.S. population this respondent represents.
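As a quick illustration of working with only these columns, the sketch below builds a tiny stand-in DataFrame and selects the columns listed above. The rows are made up for illustration and are not NSFG records, and the names `toy`, `cols`, `subset`, and `live` are hypothetical.

```python
import pandas as pd

# Made-up rows for illustration only; not real NSFG records.
toy = pd.DataFrame({
    "caseid": [1, 1, 2],
    "prglngth": [39, 40, 8],
    "outcome": [1, 1, 4],
    "pregordr": [1, 2, 1],
    "birthord": [1.0, 2.0, None],      # blank when the outcome is not a live birth
    "birthwgt_lb": [7.0, 8.0, None],
    "birthwgt_oz": [4.0, 0.0, None],
    "agepreg": [24.5, 27.0, 31.2],
    "finalwgt": [3000.0, 3000.0, 5200.5],
    "other": [0, 0, 0],                # stands in for the many unused columns
})

cols = ["caseid", "prglngth", "outcome", "pregordr", "birthord",
        "birthwgt_lb", "birthwgt_oz", "agepreg", "finalwgt"]
subset = toy[cols]

# Live births are the rows with outcome code 1.
live = subset[subset.outcome == 1]
print(live)
```

With the real data, the same selection would be applied to `df` rather than `toy`.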

In [2]:
# ReadFemPreg reads the NSFG pregnancy data files and returns a DataFrame
df = nsfg.ReadFemPreg()
df.sample(10, random_state=1)

Unnamed: 0,caseid,pregordr,howpreg_n,howpreg_p,moscurrp,nowprgdk,pregend1,pregend2,nbrnaliv,multbrth,cmotpreg,prgoutcome,cmprgend,flgdkmo1,cmprgbeg,ageatend,hpageend,gestasun_m,gestasun_w,wksgest,mosgest,dk1gest,dk2gest,dk3gest,bpa_bdscheck1,bpa_bdscheck2,bpa_bdscheck3,babysex,birthwgt_lb,birthwgt_oz,lobthwgt,babysex2,birthwgt_lb2,birthwgt_oz2,lobthwgt2,babysex3,birthwgt_lb3,birthwgt_oz3,lobthwgt3,cmbabdob,kidage,hpagelb,birthplc,paybirth1,paybirth2,paybirth3,knewpreg,trimestr,ltrimest,priorsmk,postsmks,npostsmk,getprena,bgnprena,pnctrim,lpnctri,workpreg,workborn,didwork,matweeks,weeksdk,matleave,matchfound,livehere,alivenow,cmkidied,cmkidlft,lastage,wherenow,legagree,parenend,anynurse,fedsolid,frsteatd_n,frsteatd_p,frsteatd,quitnurs,ageqtnur_n,ageqtnur_p,ageqtnur,matchfound2,livehere2,alivenow2,cmkidied2,cmkidlft2,lastage2,wherenow2,legagree2,parenend2,anynurse2,fedsolid2,frsteatd_n2,frsteatd_p2,frsteatd2,quitnurs2,ageqtnur_n2,ageqtnur_p2,ageqtnur2,matchfound3,livehere3,alivenow3,cmkidied3,cmkidlft3,lastage3,wherenow3,legagree3,parenend3,anynurse3,fedsolid3,frsteatd_n3,frsteatd_p3,frsteatd3,quitnurs3,ageqtnur_n3,ageqtnur_p3,ageqtnur3,cmlastlb,cmfstprg,cmlstprg,cmintstr,cmintfin,cmintstrop,cmintfinop,cmintstrcr,cmintfincr,evuseint,stopduse,whystopd,whatmeth01,whatmeth02,whatmeth03,whatmeth04,resnouse,wantbold,probbabe,cnfrmno,wantbld2,timingok,toosoon_n,toosoon_p,wthpart1,wthpart2,feelinpg,hpwnold,timokhp,cohpbeg,cohpend,tellfath,whentell,tryscale,wantscal,whyprg1,whyprg2,whynouse1,whynouse2,whynouse3,anyusint,prglngth,outcome,birthord,datend,agepreg,datecon,agecon,fmarout5,pmarpreg,rmarout6,fmarcon5,learnprg,pncarewk,paydeliv,lbw1,bfeedwks,maternlv,oldwantr,oldwantp,wantresp,wantpart,cmbirth,ager,agescrn,fmarital,rmarital,educat,hieduc,race,hispanic,hisprace,rcurpreg,pregnum,parity,insuranc,pubassis,poverty,laborfor,religion,metro,brnout,yrstrus,prglngth_i,outcome_i,birthord_i,datend_i,agepreg_i,datecon_i,agecon_i,fmarout5_i,pmarpreg_i,rmarout6_i,fmarcon5_i,learnprg_i,pncarewk_i,paydeliv_i,lbw1_i,bfeedwks_i,maternlv_i,oldwantr_i,oldwantp_i,wantresp_i,wantpart_i,ager_i,fmarital_i,rmarital_i,educat_i,hieduc_i,race_i,hispanic_i,hisprace_i,rcurpreg_i,pregnum_i,parity_i,insuranc_i,pubassis_i,poverty_i,laborfor_i,religion_i,metro_i,basewgt,adj_mod_basewgt,finalwgt,secu_p,sest,cmintvw,totalwgt_lb
6485,5885,2,,,,,6.0,,1.0,,,1.0,1218.0,,1208.0,,,10.0,0.0,43.0,10.0,,,,0.0,,,1.0,9.0,8.0,,,,,,,,,,1218.0,16.0,21.0,1.0,3.0,,,2.0,,,2.0,1.0,1.0,1.0,3.0,,,1.0,5.0,2.0,,,,1.0,,,,,,,,,5.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,1218.0,1186.0,1218.0,1186.0,1218.0,1218.0,1234.0,,,,,,,,,,5.0,5.0,,,,,,,,4.0,1.0,5,,,,1.0,1.0,0.0,0.0,,,4.0,,,1,43,1,2.0,1218.0,20.41,1208,1958,1.0,2.0,1.0,1,2.0,3.0,4.0,2.0,995.0,3.0,5,5,5,5,973,21,21,1,1,9,5,2,2,2,2,2,2,3,1,128,7,3,3,5,,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2064.778326,2179.151111,3065.411867,1,61,,9.5
11431,10510,1,,,,,6.0,,1.0,,,1.0,1093.0,,1084.0,,,0.0,40.0,40.0,9.0,,,,0.0,,,2.0,7.0,10.0,,,,,,,,,,1093.0,143.0,17.0,,,,,,,,,,,,,,,,,,,,,1.0,,,,,,,,,1.0,,1.0,1.0,1.0,,6.0,1.0,6.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,1156.0,1093.0,1175.0,1057.0,1093.0,,,,,1.0,1.0,1.0,,,,,,,,,,2.0,,,1.0,,,1,2.0,5.0,1.0,,,,,,,,,,5,40,1,1.0,1093.0,16.83,1084,1608,1.0,2.0,1.0,5,,,,2.0,26.0,,2,2,2,2,891,28,28,1,1,12,9,2,2,2,2,3,2,3,1,64,1,1,1,5,,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3195.720437,6133.206275,10315.439634,2,10,,7.625
12355,11426,2,,,,,6.0,,1.0,,,1.0,1106.0,,1097.0,,,0.0,40.0,40.0,9.0,,,,0.0,,,2.0,6.0,15.0,,,,,,,,,,1106.0,125.0,27.0,,,,,,,,,,,,,,,,,,,,,5.0,1.0,,,,,,,,5.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,1194.0,1099.0,1194.0,1099.0,1106.0,,,,,5.0,,,,,,,1.0,,,,,2.0,,,1.0,,,1,2.0,5.0,1.0,,,,,,,,,,5,40,1,1.0,1106.0,25.33,1097,2458,1.0,2.0,1.0,5,,,,2.0,995.0,,2,2,2,2,802,35,35,3,4,17,12,1,2,3,2,3,2,2,2,289,1,3,2,5,,0,0,0,0,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1806.209036,2140.312939,3275.748953,2,57,,6.9375
2711,2385,1,,,,,6.0,,1.0,,,1.0,1180.0,,1173.0,,,7.0,0.0,30.0,7.0,,,,0.0,,,1.0,3.0,10.0,,,,,,,,,,1180.0,54.0,19.0,1.0,3.0,,,4.0,,,0.0,5.0,,1.0,6.0,,,5.0,,,,,,5.0,1.0,,,,,,,,1.0,,3.0,2.0,1.0,,2.0,1.0,2.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,1180.0,1180.0,1180.0,1171.0,1180.0,1180.0,1234.0,,,1.0,5.0,,4.0,,,,,1.0,,,,1.0,2.0,2.0,,2.0,,1,1.0,5.0,5.0,1.0,1.0,,,,,,,,5,30,1,1.0,1180.0,15.83,1173,1525,5.0,1.0,6.0,5,4.0,6.0,4.0,1.0,9.0,0.0,3,3,3,3,990,20,20,5,6,10,6,2,2,2,2,1,1,4,1,100,1,3,2,5,,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3612.950932,6536.613066,9195.0536,2,20,,3.625
2257,2017,3,,,,,3.0,,,,1202.0,2.0,1202.0,0.0,1201.0,,25.0,0.0,4.0,4.0,1.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,1177.0,1160.0,1202.0,1177.0,1202.0,1202.0,1232.0,,,,,,7.0,11.0,20.0,,,1.0,,,,1.0,4.0,2.0,,1.0,1.0,1,1.0,1.0,1.0,1.0,1.0,3.0,0.0,1.0,2.0,,,,1,4,2,,1202.0,24.91,1201,2483,5.0,1.0,6.0,5,,,,,,,3,3,3,3,903,27,27,1,1,17,12,2,2,2,2,3,1,3,1,127,3,4,2,5,,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,4505.403774,5349.521752,8997.360634,2,23,,
3095,2740,1,,,,,4.0,,,,1075.0,2.0,1075.0,0.0,1070.0,,20.0,5.0,0.0,22.0,5.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,1075.0,1181.0,990.0,1075.0,,,,,5.0,,,,,,,5.0,1.0,,,,2.0,,,1.0,,,1,2.0,1.0,1.0,1.0,1.0,,,,,,,,5,22,5,,1075.0,26.41,1070,2600,5.0,1.0,5.0,5,,,,,,,2,2,2,2,758,39,39,1,1,12,9,2,2,2,2,3,0,1,2,100,1,3,1,5,,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,4871.099223,5280.741932,9636.825952,2,38,,
10833,9979,3,,,,,5.0,,1.0,,,1.0,1203.0,,1194.0,,,9.0,0.0,39.0,9.0,,,,0.0,,,1.0,6.0,12.0,,,,,,,,,,1203.0,32.0,34.0,1.0,1.0,,,5.0,,,0.0,5.0,,1.0,6.0,,,1.0,1.0,,16.0,,16.0,5.0,1.0,,,,,,,,1.0,,2.0,1.0,2.0,,9.0,1.0,9.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,1203.0,1045.0,1203.0,1191.0,1203.0,,,,,,,,,,,,1.0,,,,,2.0,,,1.0,,10.0,1,2.0,,,1.0,1.0,10.0,10.0,,,,,,1,39,1,1.0,1203.0,34.66,1194,3391,1.0,2.0,1.0,1,5.0,6.0,2.0,2.0,39.0,1.0,2,2,2,2,787,37,37,1,1,19,13,2,2,2,1,4,1,2,2,400,1,4,1,5,,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,4873.016667,6465.617461,11799.105305,1,15,,6.75
9179,8442,1,,,,,1.0,,,,1132.0,2.0,1132.0,1.0,1130.0,23.0,27.0,2.0,0.0,9.0,2.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,1168.0,1132.0,1168.0,1027.0,1132.0,,,,,1.0,1.0,1.0,,,,,,,,,,3.0,,,1.0,,,1,3.0,,,5.0,,,,,,,,,5,9,4,,1132.0,23.25,1130,2308,1.0,2.0,1.0,1,,,,,,,1,1,1,1,853,31,31,3,4,13,10,2,2,2,2,3,2,2,2,127,2,2,1,5,,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3408.342437,4225.73621,7331.652325,1,48,,
2226,2000,1,,,,,1.0,,,,1083.0,2.0,1083.0,0.0,1080.0,,32.0,3.0,0.0,13.0,3.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,1163.0,1083.0,1163.0,1072.0,1083.0,,,,,1.0,1.0,1.0,,,,,,,,,,2.0,,,1.0,,,1,2.0,,,1.0,1.0,,,,,,,,5,13,4,,1083.0,22.0,1080,2175,1.0,2.0,1.0,1,,,,,,,2,2,2,2,819,34,34,1,1,12,9,2,1,1,2,3,2,1,2,78,2,2,2,1,1989.0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3300.944073,3939.740507,4999.924677,2,46,,
13240,12194,1,,,,,6.0,,1.0,,,1.0,1030.0,,1021.0,,,0.0,40.0,40.0,9.0,,,,0.0,,,1.0,7.0,3.0,,,,,,,,,,1030.0,204.0,22.0,,,,,,,,,,,,,,,,,,,,,5.0,1.0,,,,,,,,1.0,,1.0,2.0,0.0,,1.0,2.0,0.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,1072.0,1030.0,1072.0,1002.0,1030.0,,,,,1.0,5.0,,3.0,,,,,1.0,,,,1.0,2.0,2.0,,3.0,,5,,5.0,5.0,5.0,,,,,,,,,5,40,1,1.0,1030.0,23.16,1021,2241,5.0,1.0,6.0,5,,,,2.0,1.0,,3,5,3,5,752,40,40,3,4,13,11,2,2,2,2,2,2,2,2,328,1,3,3,5,,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,13638.640857,16368.925374,27278.814136,2,74,,7.1875


In [3]:
df.columns

Index(['caseid', 'pregordr', 'howpreg_n', 'howpreg_p', 'moscurrp', 'nowprgdk',
       'pregend1', 'pregend2', 'nbrnaliv', 'multbrth',
       ...
       'laborfor_i', 'religion_i', 'metro_i', 'basewgt', 'adj_mod_basewgt',
       'finalwgt', 'secu_p', 'sest', 'cmintvw', 'totalwgt_lb'],
      dtype='object', length=244)

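The **CleanFemPreg** function cleans up the **agepreg** column and replaces the numeric codes that stand for missing answers with np.nan. Note that the **agepreg** values in the In [2] output (for example, 20.41) are already in years, and the same values come out divided by 100 (0.2041) after the call below. This suggests that **ReadFemPreg** had already applied the cleaning, so the explicit call is likely unnecessary and rescales **agepreg** a second time.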

In [4]:
nsfg.CleanFemPreg(df)
df.sample(10, random_state=1)

Unnamed: 0,caseid,pregordr,howpreg_n,howpreg_p,moscurrp,nowprgdk,pregend1,pregend2,nbrnaliv,multbrth,cmotpreg,prgoutcome,cmprgend,flgdkmo1,cmprgbeg,ageatend,hpageend,gestasun_m,gestasun_w,wksgest,mosgest,dk1gest,dk2gest,dk3gest,bpa_bdscheck1,bpa_bdscheck2,bpa_bdscheck3,babysex,birthwgt_lb,birthwgt_oz,lobthwgt,babysex2,birthwgt_lb2,birthwgt_oz2,lobthwgt2,babysex3,birthwgt_lb3,birthwgt_oz3,lobthwgt3,cmbabdob,kidage,hpagelb,birthplc,paybirth1,paybirth2,paybirth3,knewpreg,trimestr,ltrimest,priorsmk,postsmks,npostsmk,getprena,bgnprena,pnctrim,lpnctri,workpreg,workborn,didwork,matweeks,weeksdk,matleave,matchfound,livehere,alivenow,cmkidied,cmkidlft,lastage,wherenow,legagree,parenend,anynurse,fedsolid,frsteatd_n,frsteatd_p,frsteatd,quitnurs,ageqtnur_n,ageqtnur_p,ageqtnur,matchfound2,livehere2,alivenow2,cmkidied2,cmkidlft2,lastage2,wherenow2,legagree2,parenend2,anynurse2,fedsolid2,frsteatd_n2,frsteatd_p2,frsteatd2,quitnurs2,ageqtnur_n2,ageqtnur_p2,ageqtnur2,matchfound3,livehere3,alivenow3,cmkidied3,cmkidlft3,lastage3,wherenow3,legagree3,parenend3,anynurse3,fedsolid3,frsteatd_n3,frsteatd_p3,frsteatd3,quitnurs3,ageqtnur_n3,ageqtnur_p3,ageqtnur3,cmlastlb,cmfstprg,cmlstprg,cmintstr,cmintfin,cmintstrop,cmintfinop,cmintstrcr,cmintfincr,evuseint,stopduse,whystopd,whatmeth01,whatmeth02,whatmeth03,whatmeth04,resnouse,wantbold,probbabe,cnfrmno,wantbld2,timingok,toosoon_n,toosoon_p,wthpart1,wthpart2,feelinpg,hpwnold,timokhp,cohpbeg,cohpend,tellfath,whentell,tryscale,wantscal,whyprg1,whyprg2,whynouse1,whynouse2,whynouse3,anyusint,prglngth,outcome,birthord,datend,agepreg,datecon,agecon,fmarout5,pmarpreg,rmarout6,fmarcon5,learnprg,pncarewk,paydeliv,lbw1,bfeedwks,maternlv,oldwantr,oldwantp,wantresp,wantpart,cmbirth,ager,agescrn,fmarital,rmarital,educat,hieduc,race,hispanic,hisprace,rcurpreg,pregnum,parity,insuranc,pubassis,poverty,laborfor,religion,metro,brnout,yrstrus,prglngth_i,outcome_i,birthord_i,datend_i,agepreg_i,datecon_i,agecon_i,fmarout5_i,pmarpreg_i,rmarout6_i,fmarcon5_i,learnprg_i,pncarewk_i,paydeliv_i,lbw1_i,bfeedwks_i,maternlv_i,oldwantr_i,oldwantp_i,wantresp_i,wantpart_i,ager_i,fmarital_i,rmarital_i,educat_i,hieduc_i,race_i,hispanic_i,hisprace_i,rcurpreg_i,pregnum_i,parity_i,insuranc_i,pubassis_i,poverty_i,laborfor_i,religion_i,metro_i,basewgt,adj_mod_basewgt,finalwgt,secu_p,sest,cmintvw,totalwgt_lb
6485,5885,2,,,,,6.0,,1.0,,,1.0,1218.0,,1208.0,,,10.0,0.0,43.0,10.0,,,,0.0,,,1.0,9.0,8.0,,,,,,,,,,1218.0,16.0,21.0,1.0,3.0,,,2.0,,,2.0,1.0,1.0,1.0,3.0,,,1.0,5.0,2.0,,,,1.0,,,,,,,,,5.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,1218.0,1186.0,1218.0,1186.0,1218.0,1218.0,1234.0,,,,,,,,,,5.0,5.0,,,,,,,,4.0,1.0,5,,,,1.0,1.0,0.0,0.0,,,4.0,,,1,43,1,2.0,1218.0,0.2041,1208,1958,1.0,2.0,1.0,1,2.0,3.0,4.0,2.0,995.0,3.0,5,5,5,5,973,21,21,1,1,9,5,2,2,2,2,2,2,3,1,128,7,3,3,5,,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2064.778326,2179.151111,3065.411867,1,61,,9.5
11431,10510,1,,,,,6.0,,1.0,,,1.0,1093.0,,1084.0,,,0.0,40.0,40.0,9.0,,,,0.0,,,2.0,7.0,10.0,,,,,,,,,,1093.0,143.0,17.0,,,,,,,,,,,,,,,,,,,,,1.0,,,,,,,,,1.0,,1.0,1.0,1.0,,6.0,1.0,6.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,1156.0,1093.0,1175.0,1057.0,1093.0,,,,,1.0,1.0,1.0,,,,,,,,,,2.0,,,1.0,,,1,2.0,5.0,1.0,,,,,,,,,,5,40,1,1.0,1093.0,0.1683,1084,1608,1.0,2.0,1.0,5,,,,2.0,26.0,,2,2,2,2,891,28,28,1,1,12,9,2,2,2,2,3,2,3,1,64,1,1,1,5,,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3195.720437,6133.206275,10315.439634,2,10,,7.625
12355,11426,2,,,,,6.0,,1.0,,,1.0,1106.0,,1097.0,,,0.0,40.0,40.0,9.0,,,,0.0,,,2.0,6.0,15.0,,,,,,,,,,1106.0,125.0,27.0,,,,,,,,,,,,,,,,,,,,,5.0,1.0,,,,,,,,5.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,1194.0,1099.0,1194.0,1099.0,1106.0,,,,,5.0,,,,,,,1.0,,,,,2.0,,,1.0,,,1,2.0,5.0,1.0,,,,,,,,,,5,40,1,1.0,1106.0,0.2533,1097,2458,1.0,2.0,1.0,5,,,,2.0,995.0,,2,2,2,2,802,35,35,3,4,17,12,1,2,3,2,3,2,2,2,289,1,3,2,5,,0,0,0,0,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1806.209036,2140.312939,3275.748953,2,57,,6.9375
2711,2385,1,,,,,6.0,,1.0,,,1.0,1180.0,,1173.0,,,7.0,0.0,30.0,7.0,,,,0.0,,,1.0,3.0,10.0,,,,,,,,,,1180.0,54.0,19.0,1.0,3.0,,,4.0,,,0.0,5.0,,1.0,6.0,,,5.0,,,,,,5.0,1.0,,,,,,,,1.0,,3.0,2.0,1.0,,2.0,1.0,2.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,1180.0,1180.0,1180.0,1171.0,1180.0,1180.0,1234.0,,,1.0,5.0,,4.0,,,,,1.0,,,,1.0,2.0,2.0,,2.0,,1,1.0,5.0,5.0,1.0,1.0,,,,,,,,5,30,1,1.0,1180.0,0.1583,1173,1525,5.0,1.0,6.0,5,4.0,6.0,4.0,1.0,9.0,0.0,3,3,3,3,990,20,20,5,6,10,6,2,2,2,2,1,1,4,1,100,1,3,2,5,,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3612.950932,6536.613066,9195.0536,2,20,,3.625
2257,2017,3,,,,,3.0,,,,1202.0,2.0,1202.0,0.0,1201.0,,25.0,0.0,4.0,4.0,1.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,1177.0,1160.0,1202.0,1177.0,1202.0,1202.0,1232.0,,,,,,7.0,11.0,20.0,,,1.0,,,,1.0,4.0,2.0,,1.0,1.0,1,1.0,1.0,1.0,1.0,1.0,3.0,0.0,1.0,2.0,,,,1,4,2,,1202.0,0.2491,1201,2483,5.0,1.0,6.0,5,,,,,,,3,3,3,3,903,27,27,1,1,17,12,2,2,2,2,3,1,3,1,127,3,4,2,5,,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,4505.403774,5349.521752,8997.360634,2,23,,
3095,2740,1,,,,,4.0,,,,1075.0,2.0,1075.0,0.0,1070.0,,20.0,5.0,0.0,22.0,5.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,1075.0,1181.0,990.0,1075.0,,,,,5.0,,,,,,,5.0,1.0,,,,2.0,,,1.0,,,1,2.0,1.0,1.0,1.0,1.0,,,,,,,,5,22,5,,1075.0,0.2641,1070,2600,5.0,1.0,5.0,5,,,,,,,2,2,2,2,758,39,39,1,1,12,9,2,2,2,2,3,0,1,2,100,1,3,1,5,,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,4871.099223,5280.741932,9636.825952,2,38,,
10833,9979,3,,,,,5.0,,1.0,,,1.0,1203.0,,1194.0,,,9.0,0.0,39.0,9.0,,,,0.0,,,1.0,6.0,12.0,,,,,,,,,,1203.0,32.0,34.0,1.0,1.0,,,5.0,,,0.0,5.0,,1.0,6.0,,,1.0,1.0,,16.0,,16.0,5.0,1.0,,,,,,,,1.0,,2.0,1.0,2.0,,9.0,1.0,9.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,1203.0,1045.0,1203.0,1191.0,1203.0,,,,,,,,,,,,1.0,,,,,2.0,,,1.0,,10.0,1,2.0,,,1.0,1.0,10.0,10.0,,,,,,1,39,1,1.0,1203.0,0.3466,1194,3391,1.0,2.0,1.0,1,5.0,6.0,2.0,2.0,39.0,1.0,2,2,2,2,787,37,37,1,1,19,13,2,2,2,1,4,1,2,2,400,1,4,1,5,,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,4873.016667,6465.617461,11799.105305,1,15,,6.75
9179,8442,1,,,,,1.0,,,,1132.0,2.0,1132.0,1.0,1130.0,23.0,27.0,2.0,0.0,9.0,2.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,1168.0,1132.0,1168.0,1027.0,1132.0,,,,,1.0,1.0,1.0,,,,,,,,,,3.0,,,1.0,,,1,3.0,,,5.0,,,,,,,,,5,9,4,,1132.0,0.2325,1130,2308,1.0,2.0,1.0,1,,,,,,,1,1,1,1,853,31,31,3,4,13,10,2,2,2,2,3,2,2,2,127,2,2,1,5,,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3408.342437,4225.73621,7331.652325,1,48,,
2226,2000,1,,,,,1.0,,,,1083.0,2.0,1083.0,0.0,1080.0,,32.0,3.0,0.0,13.0,3.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,1163.0,1083.0,1163.0,1072.0,1083.0,,,,,1.0,1.0,1.0,,,,,,,,,,2.0,,,1.0,,,1,2.0,,,1.0,1.0,,,,,,,,5,13,4,,1083.0,0.22,1080,2175,1.0,2.0,1.0,1,,,,,,,2,2,2,2,819,34,34,1,1,12,9,2,1,1,2,3,2,1,2,78,2,2,2,1,1989.0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3300.944073,3939.740507,4999.924677,2,46,,
13240,12194,1,,,,,6.0,,1.0,,,1.0,1030.0,,1021.0,,,0.0,40.0,40.0,9.0,,,,0.0,,,1.0,7.0,3.0,,,,,,,,,,1030.0,204.0,22.0,,,,,,,,,,,,,,,,,,,,,5.0,1.0,,,,,,,,1.0,,1.0,2.0,0.0,,1.0,2.0,0.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,1072.0,1030.0,1072.0,1002.0,1030.0,,,,,1.0,5.0,,3.0,,,,,1.0,,,,1.0,2.0,2.0,,3.0,,5,,5.0,5.0,5.0,,,,,,,,,5,40,1,1.0,1030.0,0.2316,1021,2241,5.0,1.0,6.0,5,,,,2.0,1.0,,3,5,3,5,752,40,40,3,4,13,11,2,2,2,2,2,2,2,2,328,1,3,3,5,,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,13638.640857,16368.925374,27278.814136,2,74,,7.1875
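To make the cleaning step concrete, here is a minimal sketch of the kind of work CleanFemPreg is described as doing. It is a simplified stand-in, not the code of the `nsfg` module. The function name `clean_sketch` and the toy rows are hypothetical. The treatment of 97, 98, and 99 as missing-answer codes and the storage of raw ages in hundredths of a year are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

def clean_sketch(df):
    """Return a cleaned copy; a simplified stand-in, not nsfg.CleanFemPreg."""
    out = df.copy()
    # Assumption: the raw age is stored in hundredths of a year.
    out["agepreg"] = out["agepreg"] / 100.0
    # Assumption: 97, 98 and 99 are special "no answer" codes.
    na_codes = [97, 98, 99]
    for col in ["birthwgt_lb", "birthwgt_oz"]:
        out[col] = out[col].replace(na_codes, np.nan)
    # Combine pounds and ounces into a single weight in pounds.
    out["totalwgt_lb"] = out["birthwgt_lb"] + out["birthwgt_oz"] / 16.0
    return out

# Made-up raw rows for illustration only.
raw = pd.DataFrame({
    "agepreg": [2041.0, 1683.0],
    "birthwgt_lb": [9.0, 99.0],
    "birthwgt_oz": [8.0, 10.0],
})
cleaned = clean_sketch(raw)
print(cleaned)
```

Returning a copy instead of modifying the DataFrame in place means that rerunning the cell cannot rescale **agepreg** twice, because `raw` is left unchanged.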


## 1.7 Validating Data

It is important to validate the data at the beginning of an analysis. We may introduce errors while importing the data, or we may misinterpret some of its fields.

We will validate our data by making some simple calculations and comparing the results with the values published in the NSFG codebook.

In [5]:
df.outcome.value_counts().sort_index()

1    9148
2    1862
3     120
4    1921
5     190
6     352
Name: outcome, dtype: int64
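A comparison like this can also be done programmatically. The sketch below is one possible approach, with hypothetical names (`expected`, `observed`, `mismatches`). The `expected` counts are simply copied from the output above and stand in for the codebook table; in a real check they would be typed in from the codebook itself. A synthetic Series replaces `df.outcome` so the example runs without the NSFG files.

```python
import numpy as np
import pandas as pd

# Counts per outcome code, copied from the output above. In a real check
# these would be entered from the codebook table instead.
expected = {1: 9148, 2: 1862, 3: 120, 4: 1921, 5: 190, 6: 352}

# Stand-in for df.outcome so the example runs without the NSFG files.
outcome = pd.Series(
    np.repeat(list(expected.keys()), list(expected.values())),
    name="outcome",
)

observed = outcome.value_counts().sort_index().to_dict()

# Collect every code whose observed count differs from the expected one.
mismatches = {
    code: (observed.get(code), count)
    for code, count in expected.items()
    if observed.get(code) != count
}
print(mismatches or "all outcome counts match")
```

With the real data, `outcome` would be replaced by `df.outcome`, and a non-empty `mismatches` dictionary would point to an import or interpretation problem.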

In [6]:
df.birthwgt_lb.value_counts().sort_index()

0.0        8
1.0       40
2.0       53
3.0       98
4.0      229
5.0      697
6.0     2223
7.0     3049
8.0     1889
9.0      623
10.0     132
11.0      26
12.0      10
13.0       3
14.0       3
15.0       1
Name: birthwgt_lb, dtype: int64

## 1.8 Interpretation

Let's take a look at the sequence of pregnancy outcomes for a few respondents. The function **MakePregMap** maps each respondent's **caseid** to the row indices of that respondent's pregnancies.

In [16]:
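caseid = 10229
# preg_map maps each caseid to the row indices of that respondent's pregnancies
preg_map = nsfg.MakePregMap(df)
indices = preg_map[caseid]
# outcome codes for this respondent's pregnancies
df.outcome[indices].values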

array([4, 4, 4, 4, 4, 4, 1])

The cell above shows the pregnancy history of respondent 10229: after six miscarriages (outcome code 4), she had a live birth (outcome code 1).
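A mapping like the one MakePregMap returns can be sketched with a `defaultdict`. This is an illustrative version under assumptions, not the source of the `nsfg` module. The function name `make_preg_map` and the toy rows are hypothetical.

```python
from collections import defaultdict

import pandas as pd

def make_preg_map(df):
    """Map each caseid to the list of row labels of its pregnancies."""
    preg_map = defaultdict(list)
    for index, caseid in df["caseid"].items():
        preg_map[caseid].append(index)
    return preg_map

# Made-up rows for illustration only; not real NSFG records.
toy = pd.DataFrame({
    "caseid": [1, 2, 2, 1, 2],
    "outcome": [1, 4, 4, 1, 1],
})
toy_map = make_preg_map(toy)
print(toy_map[2])                             # row labels for respondent 2
print(toy.loc[toy_map[2], "outcome"].values)  # her outcomes in row order
```

Because the row labels are appended in the order they appear, the outcomes for each respondent come back in the same order as the rows of the DataFrame.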