# Are 2nd Pregnancies usually more successful?
My initial question (how are age and first-pregnancy success correlated?)
 was too much for now, so I focus on comparing means first.

Sample
- Women who had at least 2 pregnancies -> look at resp data
  - NUMPREGS (96 - 97): all ids where numpregs >= 2
- Pairing of the first and second pregnancy, excluding abortions, from preg data in nsfg
  - OUTCOME (277 - 277): SUCCESS: 1 (live birth) / NOT-SUCCESS: 3 (stillbirth), 4 (miscarriage), 5 (ectopic)
  - PREGORDR (13 - 14): look at pregordr = 1 and pregordr = 2
 
Check:
- Is the success rate of first pregnancies lower or higher than that of second pregnancies, and by how much?


Get the pregnancy and respondent data, with code from:
    Allen B. Downey
    MIT License: https://opensource.org/licenses/MIT

In [7]:
from os.path import basename, exists
def download(url):
    filename = basename(url)
    if not exists(filename):
        from urllib.request import urlretrieve

        local, _ = urlretrieve(url, filename)
        print("Downloaded " + local)

download("https://github.com/AllenDowney/ThinkStats2/raw/master/code/2002FemPreg.dct")
download(
    "https://github.com/AllenDowney/ThinkStats2/raw/master/code/2002FemPreg.dat.gz"
)
download("https://github.com/AllenDowney/ThinkStats2/raw/master/code/2002FemResp.dct")
download("https://github.com/AllenDowney/ThinkStats2/raw/master/code/2002FemResp.dat.gz")

import numpy as np
import nsfg
preg = nsfg.ReadFemPreg()
resp = nsfg.ReadFemResp()

In [8]:
# Verify the preg data by comparing outcomes with codebook
def sortedValueCounts(series):
    return series.value_counts().sort_index()
    
sortedValueCounts(preg.outcome)

outcome
1    9148
2    1862
3     120
4    1921
5     190
6     352
Name: count, dtype: int64

Matches the codebook, so the import was successful!
Next, check the resp data by comparing numpregs with the codebook. Note that 98 means refused, *not* 98 pregnancies.
This also matches the codebook.

In [9]:
sortedValueCounts(resp.numpregs)

numpregs
0.0     2600
1.0     1267
2.0     1432
3.0     1110
4.0      611
5.0      305
6.0      150
7.0       80
8.0       40
9.0       21
10.0       9
11.0       3
12.0       2
14.0       2
19.0       1
98.0       7
Name: count, dtype: int64
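
Because 98 is a code for "refused" rather than a count, one option is to turn it into a missing value before filtering, so that a plain `>= 2` comparison cannot pick it up by accident. A minimal sketch on a made-up series (the data and the names `cleaned` and `at_least_two` are illustrative, not taken from the NSFG files):

```python
import numpy as np
import pandas as pd

# Made-up stand-in for resp.numpregs; 98 is the "refused" code
numpregs = pd.Series([0, 1, 2, 3, 98, 2], dtype=float)

# Replace the sentinel with NaN so it no longer looks like a count
cleaned = numpregs.replace(98, np.nan)

# Comparisons against NaN evaluate to False, so refused answers drop out
at_least_two = cleaned >= 2

print(cleaned[at_least_two])
```

With this approach the filter needs only the `>= 2` condition, at the cost of an extra cleaning step.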

Next, we get our sample from the data

In [10]:
# Get a list of all respondent IDs (caseid) for whom numpregs is >= 2
mult_preg_resp = resp[(resp.numpregs >= 2) & (resp.numpregs != 98)] 
mult_preg_ids = mult_preg_resp.caseid
# Verify the counts
sortedValueCounts(mult_preg_resp.numpregs)

numpregs
2.0     1432
3.0     1110
4.0      611
5.0      305
6.0      150
7.0       80
8.0       40
9.0       21
10.0       9
11.0       3
12.0       2
14.0       2
19.0       1
Name: count, dtype: int64

In [12]:
# Get the pregnancies that are from women who've had multiple pregnancies
one_preg_of_mult = preg[preg.caseid.isin(mult_preg_ids)]

# Verify that women who had one pregnancy are not in the list
single_preg_ids = resp[resp.numpregs == 1].caseid
is_in_mult = one_preg_of_mult.caseid.isin(single_preg_ids)
if(is_in_mult.any()):
    print("ERROR IN SAMPLE!")
else:
    print("All good")

All good


In [46]:
# Keep only the first and second pregnancy of each woman with multiple pregnancies.
# Pairs where either pregnancy was aborted or is unfinished are skipped in the loop below.

first_or_second_preg = one_preg_of_mult[(one_preg_of_mult.pregordr == 1) | (one_preg_of_mult.pregordr == 2)]

first_preg_successes= [] # Stores 1s for successful pregnancy, 0 for unsuccessful
second_preg_successes= [] # Stores 1s for successful pregnancy, 0 for unsuccessful


n_women = 0
n_first_success = 0
n_second_success = 0
n_1st_s_2nd_f = 0
n_1st_f_2nd_s = 0
n_both_f = 0
n_both_s = 0

for case_id in first_or_second_preg.caseid:
    id_preg = first_or_second_preg[first_or_second_preg.caseid == case_id]
    first_id_preg = id_preg[id_preg.pregordr == 1]
    first_outcome = first_id_preg.outcome.values[0]

    second_id_preg = id_preg[id_preg.pregordr == 2]
    second_outcome = second_id_preg.outcome.values[0]

    # Skip the pair if either pregnancy was an induced abortion (2) or is a current pregnancy (6)
    if first_outcome in (2, 6) or second_outcome in (2, 6):
        continue

    # Record success (live birth) as True, anything else as False
    first_preg_successes.append(first_outcome == 1)
    second_preg_successes.append(second_outcome == 1)

    # Update counts
    n_women+=1
    if first_outcome == 1:
        n_first_success+=1          
        if second_outcome == 1:
            n_second_success+=1
            n_both_s += 1
        else:
            n_1st_s_2nd_f += 1
    else:    # first fail 
        if second_outcome == 1:
            n_1st_f_2nd_s += 1
            n_second_success += 1
        else :
            n_both_f+=1
        

(n_women, n_first_success, n_second_success)

(5638, 4504, 4684)
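
The loop above walks over `first_or_second_preg.caseid`, which holds one entry per pregnancy row, so a woman with both a first and a second pregnancy row is visited more than once. The rates are unaffected, but the counts are per row rather than per woman. A minimal sketch of a per-woman pairing, assuming a toy table with made-up values in place of the NSFG data (the names `toy`, `pairs`, and `success` are illustrative):

```python
import pandas as pd

# Toy stand-in for the pregnancy table; all values are made up
toy = pd.DataFrame({
    "caseid":   [1, 1, 2, 2, 3, 3, 3, 4, 4],
    "pregordr": [1, 2, 1, 2, 1, 2, 3, 1, 2],
    "outcome":  [1, 1, 4, 1, 1, 2, 1, 3, 4],
})

# Keep first and second pregnancies, then put them side by side:
# one row per woman, one column per pregnancy order
first_two = toy[toy.pregordr.isin([1, 2])]
pairs = first_two.pivot(index="caseid", columns="pregordr", values="outcome")
pairs.columns = ["first", "second"]

# Drop women missing either pregnancy, and pairs containing an
# induced abortion (2) or a current pregnancy (6)
pairs = pairs.dropna()
pairs = pairs[~pairs.isin([2, 6]).any(axis=1)]

# True where the pregnancy ended in a live birth (outcome 1)
success = pairs == 1
n_women = len(success)

print(n_women, success["first"].mean(), success["second"].mean())
print(pd.crosstab(success["first"], success["second"]))
```

Here each woman contributes exactly one row, so `n_women` is a count of women and the cross-tabulation gives the four success/failure combinations directly.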

Calculate the mean success rate of first and second pregnancies

In [57]:
# print counts
print("Sample size: %d" %(n_women))
print("Two successful births: %2.2f%%, Two unsuccessful births: %2.2f%%. " %((n_both_s / n_women)*100, (n_both_f / n_women)*100))
print("First birth successful only: %2.2f%%, second birth successful only: %2.2f%%" %((n_1st_s_2nd_f/n_women)*100, (n_1st_f_2nd_s/n_women)*100))

# print means
print("\nMean success rate of first pregnancies: %1.4f. std dev is: %2.4f" %(n_first_success/n_women, np.std(first_preg_successes, dtype = np.float32)))
print("Mean success rate of second pregnancies: %1.4f. std dev is: %2.4f" %(n_second_success/n_women, np.std(second_preg_successes, dtype = np.float32)))


from scipy import stats
# Perform a two-tailed Welch's t-test (unequal variances).
# Note: ttest_ind treats the two samples as independent.
t_statistic, p_value = stats.ttest_ind(first_preg_successes, second_preg_successes, equal_var=False)

# Print results
print(f"\nT-statistic: {t_statistic:.4f}")
print(f"p-value: {p_value:.7f}")




Sample size: 5638
Two successful births: 68.32%, Two unsuccessful births: 5.36%. 
First birth successful only: 11.56%, second birth successful only: 14.76%

Mean success rate of first pregnancies: 0.7989. std dev is: 0.4008
Mean success rate of second pregnancies: 0.8308. std dev is: 0.3749

T-statistic: -4.3672
p-value: 0.0000127
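
As a sanity check, the standard deviations and the t-statistic can be recomputed from the three counts printed earlier, because a 0/1 variable with mean p has variance p(1 - p). This is a sketch of the arithmetic behind Welch's statistic, not a replacement for `stats.ttest_ind`; the variable names are illustrative.

```python
import math

# Counts taken from the output above: (n_women, n_first_success, n_second_success)
n = 5638
first_ok, second_ok = 4504, 4684

p1, p2 = first_ok / n, second_ok / n

# Population standard deviation of a 0/1 variable (what np.std returns by default)
std1 = math.sqrt(p1 * (1 - p1))
std2 = math.sqrt(p2 * (1 - p2))

# Welch's t uses sample variances (ddof=1), hence the n / (n - 1) factor
var1 = p1 * (1 - p1) * n / (n - 1)
var2 = p2 * (1 - p2) * n / (n - 1)
t = (p1 - p2) / math.sqrt(var1 / n + var2 / n)

print(f"std: {std1:.4f}, {std2:.4f}")
print(f"t: {t:.4f}")
```

The results should land close to the standard deviations and T-statistic reported above, which suggests the printed figures are internally consistent.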


## Basic Analysis
CAUTION: I am still learning statistics - this is not meant as a definitive answer but rather a speculative one based on practicing these skills.

The means show that in this study, among women whose first two pregnancies were both completed and not aborted, second pregnancies were more successful than first ones: 83.08% vs 79.89% ended in a live birth rather than in a stillbirth, miscarriage, or ectopic pregnancy. The two rates are close to each other, both near 80%, and differ by about 3.2 percentage points.
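
Since the same women supply both the first and the second pregnancy, the two sets of outcomes are paired. One common way to compare paired yes/no outcomes is McNemar's test, which looks only at the discordant pairs (success in one pregnancy but not the other). A minimal sketch with the continuity correction, using hypothetical counts rather than the study's numbers; the function name `mcnemar` is an illustration, not a library call:

```python
import math

def mcnemar(b, c):
    """McNemar's chi-square test with continuity correction.

    b: pairs where only the first pregnancy succeeded
    c: pairs where only the second pregnancy succeeded
    Returns (chi2, p_value) for 1 degree of freedom.
    """
    chi2 = (abs(b - c) - 1) ** 2 / (b + c)
    # Survival function of a chi-square with 1 dof: erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical discordant counts, for illustration only
chi2, p_value = mcnemar(b=30, c=45)
print(f"chi2 = {chi2:.3f}, p = {p_value:.3f}")
```

In the notebook, the inputs would correspond to `n_1st_s_2nd_f` and `n_1st_f_2nd_s`, ideally counted once per woman.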

This is in line with what is written on the UK's [National Childbirth Trust page](https://www.nct.org.uk/pregnancy/having-another-baby/second-baby-will-i-have-different-birth-experience-your-questions-answered) on second pregnancies, although that page describes how straightforward the birth is rather than the live-birth rate:
>The good news is that second births are much more likely to be straightforward. If you’re planning to give birth at a midwifery unit or birth centre, you have a 97% chance of a straightforward birth, compared with 76% last time.