**Exercise 2.4** Using the variable totalwgt_lb, investigate whether first babies are lighter or heavier than others. Compute Cohen’s d to quantify the difference between the groups. How does it compare to the difference in pregnancy length?

**IMPORTANT!** [ThinkStats2](https://github.com/AllenDowney/ThinkStats2) should be cloned into `../..` relative to this directory.

In [1]:
import sys
import os
import math

import numpy as np
import pandas as pd

In [6]:
import thinkstats2_path  # local helper (not shown here) that makes the ThinkStats2 modules importable
import nsfg
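The contents of `thinkstats2_path` are not shown in this notebook. A minimal sketch of what such a helper could do, assuming the clone lives at `../../ThinkStats2` and that `nsfg.py` sits in its `code/` subdirectory, is to prepend that directory to `sys.path`. The constant name is hypothetical, and this only handles the import. `nsfg` may still need its data files to be reachable from the working directory.

```python
import os
import sys

# Hypothetical location: assumes ThinkStats2 was cloned into ../.. relative to this notebook.
THINKSTATS2_CODE = os.path.abspath(os.path.join("..", "..", "ThinkStats2", "code"))

# Prepend once so `import nsfg` resolves to the ThinkStats2 copy.
if THINKSTATS2_CODE not in sys.path:
    sys.path.insert(0, THINKSTATS2_CODE)
```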

In [12]:
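def cohens_d(g1, g2):
    """Compute Cohen's d effect size for two groups.

    Args:
      g1, g2: pandas Series, or DataFrames with the same columns.

    Returns:
      float, or a Series of per-column effect sizes for DataFrames.
    """
    mean1 = g1.mean()
    mean2 = g2.mean()

    # pandas .var() skips NaN and uses ddof=1 (sample variance).
    var1 = g1.var()
    var2 = g2.var()

    # Note: len() counts every row, including rows where the value is NaN.
    n1 = len(g1)
    n2 = len(g2)

    # Pooled variance: each group's variance weighted by its size.
    v = (var1 * n1 + var2 * n2) / (n1 + n2)

    # np.sqrt (unlike math.sqrt) also works element-wise when v is a Series.
    return (mean1 - mean2) / np.sqrt(v)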
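The function divides the difference in means by a pooled standard deviation whose variance is weighted by group size, n1·var1 + n2·var2 over n1 + n2 (the same weighting used in Think Stats). Many textbooks instead weight by n − 1 and divide by n1 + n2 − 2. The sketch below compares the two on synthetic data; both helper names are hypothetical and the normal samples are made up purely for illustration. With thousands of observations per group, the two versions should agree to several decimal places.

```python
import numpy as np
import pandas as pd

def cohens_d_weighted(g1, g2):
    """Pooled variance weighted by n, as in the notebook's cohens_d (hypothetical name)."""
    n1, n2 = len(g1), len(g2)
    pooled_var = (n1 * g1.var() + n2 * g2.var()) / (n1 + n2)
    return (g1.mean() - g2.mean()) / np.sqrt(pooled_var)

def cohens_d_textbook(g1, g2):
    """Pooled variance weighted by n - 1 (hypothetical helper for comparison)."""
    n1, n2 = len(g1), len(g2)
    pooled_var = ((n1 - 1) * g1.var() + (n2 - 1) * g2.var()) / (n1 + n2 - 2)
    return (g1.mean() - g2.mean()) / np.sqrt(pooled_var)

# Synthetic groups, loosely shaped like birth weights in pounds (illustrative only).
rng = np.random.default_rng(0)
a = pd.Series(rng.normal(7.2, 1.4, size=4000))
b = pd.Series(rng.normal(7.3, 1.4, size=4500))
print(cohens_d_weighted(a, b), cohens_d_textbook(a, b))
```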


In [7]:
# Load pregnancy data:
preg = nsfg.ReadFemPreg()
# Keep only live births:
live = preg[preg.outcome == 1]
# Split into first babies and all others:
firsts = live[live.birthord == 1]
others = live[live.birthord != 1]

In [18]:
firsts_totalwgt_lb_mean = firsts.totalwgt_lb.mean()
others_totalwgt_lb_mean = others.totalwgt_lb.mean()
print("Avg weight (lbs): firsts={:.2f}, others={:.2f}, diff={:.2f}".format(
    firsts_totalwgt_lb_mean, others_totalwgt_lb_mean, firsts_totalwgt_lb_mean - others_totalwgt_lb_mean))
print("Std (lbs): firsts={:.3f}, others={:.3f}".format(
    firsts.totalwgt_lb.std(), others.totalwgt_lb.std()))
print("Cohen's d(firsts, others) for birth weight={:.6f}".format(
    cohens_d(firsts.totalwgt_lb, others.totalwgt_lb)))

Avg weight (lbs): firsts=7.20, others=7.33, diff=-0.12
Std (lbs): firsts=1.421, others=1.394
Cohen's d(firsts, others) for birth weight=-0.088673
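One caveat for the result above: `mean()` and `var()` skip NaN, but `len()` in `cohens_d` counts every row, so if a column has missing values (as NSFG birth weights can), the group weights in the pooled variance are slightly off. A minimal NaN-aware sketch drops missing values first; the name `cohens_d_nonnull` and the toy data are hypothetical, and applying it to `totalwgt_lb` could shift the printed d slightly.

```python
import numpy as np
import pandas as pd

def cohens_d_nonnull(g1, g2):
    """Hypothetical NaN-aware variant: group sizes count only non-missing values."""
    g1, g2 = g1.dropna(), g2.dropna()
    n1, n2 = len(g1), len(g2)
    pooled_var = (n1 * g1.var() + n2 * g2.var()) / (n1 + n2)
    return (g1.mean() - g2.mean()) / np.sqrt(pooled_var)

x = pd.Series([1.0, 2.0, 3.0, np.nan, np.nan])
y = pd.Series([2.0, 4.0, 6.0])
print(len(x), x.count())  # 5 3

# Same formula, but with len() counting the NaN rows, for contrast.
len_based = (x.mean() - y.mean()) / np.sqrt(
    (len(x) * x.var() + len(y) * y.var()) / (len(x) + len(y)))
print(cohens_d_nonnull(x, y), len_based)  # ≈ -1.265 vs ≈ -1.372
```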


**Conclusion:** First babies are lighter on average, by about 0.12 lb (roughly 2 oz), but the effect size is very small: |d| ≈ 0.089 lies between the "very small" (0.01) and "small" (0.2) thresholds listed on [Wikipedia](https://en.wikipedia.org/wiki/Effect_size#Cohen's_d).
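The rule-of-thumb labels can be applied mechanically. The sketch below uses the thresholds from the Wikipedia table (very small 0.01, small 0.2, medium 0.5, large 0.8, very large 1.2, huge 2.0); the function name is hypothetical, and the "negligible" label for |d| < 0.01 is an assumption, since the table does not name that range.

```python
def describe_effect(d):
    """Label |d| with the rule-of-thumb thresholds from the Wikipedia table (hypothetical helper)."""
    thresholds = [
        (2.0, "huge"),
        (1.2, "very large"),
        (0.8, "large"),
        (0.5, "medium"),
        (0.2, "small"),
        (0.01, "very small"),
    ]
    # Check from the largest cutoff down; the sign of d only indicates direction.
    for cutoff, label in thresholds:
        if abs(d) >= cutoff:
            return label
    return "negligible"  # assumed label for |d| < 0.01

print(describe_effect(-0.088673))  # very small
```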

In [19]:
firsts_prglngth_mean = firsts.prglngth.mean()
others_prglngth_mean = others.prglngth.mean()
print("Avg pregnancy length (weeks): firsts={:.2f}, others={:.2f}, diff={:.2f}".format(
    firsts_prglngth_mean, others_prglngth_mean, firsts_prglngth_mean - others_prglngth_mean))
print("Cohen's d(firsts, others) for pregnancy length={:.6f}".format(
    cohens_d(firsts.prglngth, others.prglngth)))

Avg pregnancy length (weeks): firsts=38.60, others=38.52, diff=0.08
Cohen's d(firsts, others) for pregnancy length=0.028879


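**Conclusion:** First-baby pregnancies are slightly longer on average (the opposite direction from birth weight), but the effect is very small by Cohen's d (0.029) and negligible in practical terms: 0.08 weeks ≈ 0.56 days ≈ 13 hours. In magnitude, the effect size for birth weight (|d| ≈ 0.089) is about three times that for pregnancy length, though both fall well below the 0.2 "small" threshold.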
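The comparison the exercise asks for reduces to a little arithmetic on the printed values. The numbers below are copied by hand from the outputs above, so they carry that rounding.

```python
# Effect sizes and mean difference as printed by the cells above.
d_weight = -0.088673
d_length = 0.028879
diff_weeks = 0.08

# Compare magnitudes only; the signs point in opposite directions.
ratio = abs(d_weight) / abs(d_length)
diff_hours = diff_weeks * 7 * 24

print(f"|d_weight| / |d_length| = {ratio:.2f}")      # 3.07
print(f"{diff_weeks} weeks = {diff_hours:.1f} hours")  # 13.4
```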