## Chapter 13 Exercises
### DSC 530
### Holly Figueroa

In [1]:
from __future__ import print_function

import pandas
import numpy as np

import thinkplot
import thinkstats2
import survival

**Exercise 13-1**  
The variable cmdivorcx contains the date of divorce for the respondent's first marriage, if applicable, encoded in century-months. Compute the duration of marriages that have ended in divorce, and the duration so far of marriages that are ongoing. Estimate the hazard and survival functions for the duration of marriage. Use resampling to take sampling weights into account, and plot data from several resamples to visualize sampling error. Consider dividing the respondents into groups by decade of birth, and possibly by age at first marriage.
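
As a small illustration of the century-month encoding used in this exercise, the sketch below converts a code to a calendar year and month and computes a duration in years. The helper names `cm_to_year_month` and `duration_years` and the example dates are hypothetical and only serve the illustration; they are not part of the NSFG data or the `survival` module.

```python
# Century-months count the months elapsed since December 1899,
# so century-month 1 is January 1900.

def cm_to_year_month(cm):
    """Convert a century-month code to a (year, month) tuple."""
    year = 1900 + (cm - 1) // 12
    month = (cm - 1) % 12 + 1
    return year, month


def duration_years(cm_start, cm_end):
    """Length of the interval between two century-month codes, in years."""
    return (cm_end - cm_start) / 12.0


# Hypothetical respondent: married June 1995, divorced June 2001.
cm_married = (1995 - 1900) * 12 + 6
cm_divorced = (2001 - 1900) * 12 + 6

print(cm_to_year_month(cm_married))             # (1995, 6)
print(duration_years(cm_married, cm_divorced))  # 6.0
```

The same subtraction, divided by 12, is what produces a marriage length in years from two century-month columns.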

In [2]:
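def CleanData(resp):
    # Add marriage-duration and birth-decade columns to resp, in place.

    # 9998 and 9999 are special codes, not dates, so treat them as missing
    resp['cmdivorcx'] = resp.cmdivorcx.replace([9998, 9999], np.nan)

    # notdivorced is 1 when no divorce date is recorded
    resp['notdivorced'] = resp.cmdivorcx.isnull().astype(int)
    # Durations in years: marriage_len runs from marriage to divorce,
    # current_len runs from marriage to the interview date
    resp['marriage_len'] = (resp.cmdivorcx - resp.cmmarrhx) / 12.0
    resp['current_len'] = (resp.cmintvw - resp.cmmarrhx) / 12.0

    # Compute decade of birth (century-months count from Dec 1899)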
    month0 = pandas.to_datetime('1899-12-15')
    dates = [month0 + pandas.DateOffset(months=cm) 
             for cm in resp.cmbirth]
    resp['decade'] = (pandas.DatetimeIndex(dates).year - 1900) // 10

In [3]:
# Load the 2002 and 2010 NSFG respondent files (resp6, resp7) and add the
# marriage and divorce columns, then keep only ever-married respondents
resp6 = survival.ReadFemResp2002()
resp7 = survival.ReadFemResp2010()

CleanData(resp6)
CleanData(resp7)

married6 = resp6[resp6.evrmarry==1]
married7 = resp7[resp7.evrmarry==1]
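
The resampling step in the next cell relies on `thinkstats2.ResampleRowsWeighted`. As a rough sketch of the idea, and not the library's actual implementation, weighted resampling can be written with `DataFrame.sample`. The function name `resample_rows_weighted`, the toy data, and the assumption that the sampling weights live in a column called `finalwgt` are illustrative only.

```python
import pandas as pd


def resample_rows_weighted(df, column='finalwgt', seed=None):
    """Draw len(df) rows with replacement, each row chosen with
    probability proportional to its value in `column`."""
    sample = df.sample(n=len(df), replace=True,
                       weights=df[column], random_state=seed)
    return sample.reset_index(drop=True)


# Toy data (hypothetical): the last respondent represents far more people
# than the others, and the zero-weight respondent can never be drawn.
toy = pd.DataFrame({'caseid': [1, 2, 3, 4, 5],
                    'finalwgt': [100.0, 100.0, 100.0, 700.0, 0.0]})

sample = resample_rows_weighted(toy, seed=0)
print(sample.caseid.value_counts())
```

Repeating this draw many times and re-estimating the curve on each sample is what gives the spread of lines used to visualize sampling error.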


In [4]:
def ResampleDivorceCurve(resps):
    # Plot the pooled marriage survival curve for several weighted resamples.

    for _ in range(41):
        samples = [thinkstats2.ResampleRowsWeighted(resp) 
                   for resp in resps]
        sample = pandas.concat(samples, ignore_index=True)
        # Estimate and plot one survival curve per resample
        _, sf = EstimateSurvival(sample)
        thinkplot.Plot(sf, color='#225EA8', alpha=0.1)

    thinkplot.Show(xlabel='years',
                   axis=[0, 28, 0, 1])


def ResampleDivorceCurveByDecade(resps):
    # Plot marriage survival curves for each birth cohort, one set per weighted resample.
    
    for i in range(41):
        samples = [thinkstats2.ResampleRowsWeighted(resp) 
                   for resp in resps]
        sample = pandas.concat(samples, ignore_index=True)
        groups = sample.groupby('decade')
        if i == 0:
            survival.AddLabelsByDecade(groups, alpha=0.7)

        EstimateSurvivalByDecade(groups, alpha=0.1)

    thinkplot.Save(root='survival7',
                   xlabel='years',
                   axis=[0, 28, 0, 1])


def EstimateSurvivalByDecade(groups, **options):
    # Plot one survival curve per birth-decade group and print each group's size.

    thinkplot.PrePlot(len(groups))
    for name, group in groups:
        print(name, len(group))
        _, sf = EstimateSurvival(group)
        thinkplot.Plot(sf, **options)


def EstimateSurvival(resp):
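    # Estimate the hazard and survival functions for marriage duration.
    # Divorced respondents contribute complete durations; the rest are ongoing (censored).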

    complete = resp[resp.notdivorced == 0].marriage_len.dropna()
    ongoing = resp[resp.notdivorced == 1].current_len.dropna()

    hf = survival.EstimateHazardFunction(complete, ongoing)
    sf = hf.MakeSurvival()

    return hf, sf
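
To make the estimation step more concrete, here is a minimal, self-contained sketch of a Kaplan-Meier style estimate from complete and ongoing durations. It is an illustration of the general technique under the assumption that `survival.EstimateHazardFunction` follows the same idea; the names `estimate_hazard` and `make_survival` and the toy durations are hypothetical and are not the module's API.

```python
from collections import Counter


def estimate_hazard(complete, ongoing):
    """Kaplan-Meier style hazard: for each duration t, the fraction of
    cases still at risk at t whose event happens at t."""
    hist_complete = Counter(complete)
    hist_ongoing = Counter(ongoing)
    ts = sorted(set(hist_complete) | set(hist_ongoing))

    at_risk = len(complete) + len(ongoing)
    hazard = {}
    for t in ts:
        ended = hist_complete[t]
        censored = hist_ongoing[t]
        hazard[t] = ended / float(at_risk)
        # Both ended and censored cases leave the at-risk pool after t
        at_risk -= ended + censored
    return hazard


def make_survival(hazard):
    """S(t) is the running product of (1 - hazard) up to and including t."""
    survival_curve = {}
    s = 1.0
    for t in sorted(hazard):
        s *= 1.0 - hazard[t]
        survival_curve[t] = s
    return survival_curve


# Toy durations in years (hypothetical): three divorces, two ongoing marriages
complete = [2, 5, 5]
ongoing = [3, 8]

hf = estimate_hazard(complete, ongoing)
sf = make_survival(hf)
for t in sorted(sf):
    print(t, hf[t], sf[t])
```

Ongoing marriages never count as events, but they stay in the at-risk pool until their current duration, which is how censored cases still inform the estimate.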


In [5]:
ResampleDivorceCurveByDecade([married6, married7])

5 513
6 4147
7 3815
8 1174
9 11
5 503
6 4167
7 3779
8 1206
9 5
5 510
6 4199
7 3799
8 1141
9 11
5 497
6 4268
7 3695
8 1194
9 6
5 493
6 4149
7 3845
8 1164
9 9
5 523
6 4194
7 3797
8 1134
9 12
5 524
6 4137
7 3781
8 1210
9 8
5 525
6 4157
7 3816
8 1156
9 6
5 529
6 4171
7 3827
8 1127
9 6
5 556
6 4185
7 3758
8 1154
9 7
5 495
6 4223
7 3724
8 1206
9 12
5 518
6 4130
7 3823
8 1177
9 12
5 546
6 4107
7 3807
8 1190
9 10
5 475
6 4241
7 3796
8 1133
9 15
5 476
6 4242
7 3722
8 1210
9 10
5 504
6 4241
7 3752
8 1153
9 10
5 492
6 4182
7 3812
8 1169
9 5
5 523
6 4174
7 3785
8 1171
9 7
5 490
6 4203
7 3849
8 1109
9 9
5 489
6 4102
7 3899
8 1166
9 4
5 497
6 4226
7 3770
8 1162
9 5
5 535
6 4158
7 3787
8 1172
9 8
5 488
6 4136
7 3853
8 1173
9 10
5 482
6 4138
7 3825
8 1205
9 10
5 496
6 4247
7 3791
8 1118
9 8
5 483
6 4202
7 3811
8 1151
9 13
5 498
6 4121
7 3858
8 1180
9 3
5 489
6 4141
7 3859
8 1162
9 9
5 505
6 4276
7 3673
8 1197
9 9
5 510
6 4159
7 3772
8 1208
9 11
5 518
6 4183
7 3792
8 1158
9 9
5 488
6 4222
7 3759
8 1177

<Figure size 576x432 with 0 Axes>

**Conclusion**  
After running this code and examining the generated plot (PDF attached separately), it appears that the survival of marriages, and therefore the likelihood of divorce, varies by birth cohort. Older cohorts show a lower probability of divorce at a given marriage duration than younger cohorts. This result matches common public perception, since it was a popular social topic and concern in the 1980s and 1990s. The plot also illustrates an overall decrease in marriage survival over time, with older cohorts faring better within that trend. After approximately 25 years, the cohort curves fan out to between 40 and 60 percent, which suggests that a marriage has roughly even odds of lasting that long.  