**Exercise:** In Section 9.3, we simulated the null hypothesis by permutation; that is, we treated the observed values as if they represented the entire population, and randomly assigned the members of the population to the two groups.

An alternative is to use the sample to estimate the distribution for the population, then draw a random sample from that distribution. This process is called resampling. There are several ways to implement resampling, but one of the simplest is to draw a sample with replacement from the observed values, as in Section 9.10.

Write a class named `DiffMeansResample` that inherits from `DiffMeansPermute` and overrides `RunModel` to implement resampling, rather than permutation.

Use this model to test the differences in pregnancy length and birth weight. How much does the model affect the results?

In [1]:
from __future__ import print_function, division
%matplotlib inline
import numpy as np
import random
import thinkstats2
import thinkplot

In [2]:
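class UnimplementedMethodException(Exception):
    """Raised when a method that should be overridden is called."""

class HypothesisTest(object):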

    def __init__(self, data):
        self.data = data
        self.MakeModel()
        self.actual = self.TestStatistic(data)

    def PValue(self, iters=1000):
        self.test_stats = [self.TestStatistic(self.RunModel()) 
                           for _ in range(iters)]

        count = sum(1 for x in self.test_stats if x >= self.actual)
        return count / iters

    def TestStatistic(self, data):
        raise UnimplementedMethodException()

    def MakeModel(self):
        pass

    def RunModel(self):
        raise UnimplementedMethodException()

class DiffMeansPermute(HypothesisTest):

    def TestStatistic(self, data):
        group1, group2 = data
        test_stat = abs(group1.mean() - group2.mean())
        return test_stat

    def MakeModel(self):
        group1, group2 = self.data
        self.n, self.m = len(group1), len(group2)
        self.pool = np.hstack((group1, group2))

    def RunModel(self):
        np.random.shuffle(self.pool)
        data = self.pool[:self.n], self.pool[self.n:]
        return data

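class DiffMeansResample(DiffMeansPermute):

    def RunModel(self):
        # Resample each group, with replacement, from its own observations.
        # Note that this does not draw from the pooled data in self.pool.
        group1, group2 = self.data
        sample1 = self.Resample(group1)
        sample2 = self.Resample(group2)
        data = sample1, sample2
        return data

    @staticmethod
    def Resample(xs):
        # Draw len(xs) values from xs with replacement.
        return np.random.choice(xs, len(xs), replace=True)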

In [3]:
import first

live, firsts, others = first.MakeFrames()
pl_data = firsts.prglngth.values, others.prglngth.values
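# Caution: if totalwgt_lb contains NaN, the group means computed from these
# arrays are NaN as well; calling .dropna() before .values avoids that.
w_data = firsts.totalwgt_lb.values, others.totalwgt_lb.values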

per_pl_ht = DiffMeansPermute(pl_data)
per_pl_PVal = per_pl_ht.PValue()
print('Permutation pregnancy length p-value: {}'.format(per_pl_PVal))
res_pl_ht = DiffMeansResample(pl_data)
res_pl_PVal = res_pl_ht.PValue()
print('Resampling pregnancy length p-value: {}\n'.format(res_pl_PVal))

per_w_ht = DiffMeansPermute(w_data)
per_w_PVal = per_w_ht.PValue()
print('Permutation birth weight p-value: {}'.format(per_w_PVal))
res_w_ht = DiffMeansResample(w_data)
res_w_PVal = res_w_ht.PValue()
print('Resampling birth weight p-value: {}'.format(res_w_PVal))


Permutation pregnancy length p-value: 0.169
Resampling pregnancy length p-value: 0.512

Permutation birth weight p-value: 0.0
Resampling birth weight p-value: 0.0


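The resampling model yields a much higher p-value for pregnancy length (0.512, versus 0.169 for permutation), whereas both models report a p-value of 0.0 for birth weight. With 1000 iterations, 0.0 means that no simulated difference was as large as the observed one, so at best p < 0.001.

These numbers say more about the implementation than about the choice of model. `DiffMeansResample.RunModel` resamples each group from its own observations, so the simulated differences are centered on the observed difference rather than on zero, and about half of them (or more) are at least as large as it. A p-value near 0.5 is the expected outcome of this procedure and is not evidence about the null hypothesis; simulating the null hypothesis calls for drawing both groups from the pooled sample. The birth weight results also need caution: `w_data` is built without dropping missing values, and if `totalwgt_lb` contains any NaN, the group means and the test statistic are NaN, every `x >= self.actual` comparison is False, and `PValue` returns 0.0 regardless of the model.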
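
For comparison, resampling under the null hypothesis can be sketched by drawing both groups, with replacement, from the pooled observations. The block below is a minimal, self-contained sketch under stated assumptions: the class name `DiffMeansResamplePooled` is hypothetical, the data are synthetic rather than the NSFG columns, and the NaN filtering is an added precaution. It has not been run against the results above, so it says nothing about what the p-values for pregnancy length or birth weight would be.

```python
import numpy as np


class DiffMeansResamplePooled(object):
    """Hypothetical sketch: resample both groups from the pooled data."""

    def __init__(self, data, seed=None):
        # Drop NaN up front so that means and comparisons are well defined.
        group1, group2 = [np.asarray(g, dtype=float) for g in data]
        group1 = group1[~np.isnan(group1)]
        group2 = group2[~np.isnan(group2)]
        self.n, self.m = len(group1), len(group2)
        self.pool = np.hstack((group1, group2))
        self.rng = np.random.default_rng(seed)
        self.actual = self.TestStatistic((group1, group2))

    def TestStatistic(self, data):
        group1, group2 = data
        return abs(group1.mean() - group2.mean())

    def RunModel(self):
        # Under the null hypothesis both groups come from one distribution,
        # which is estimated here by the pooled sample.
        sample1 = self.rng.choice(self.pool, self.n, replace=True)
        sample2 = self.rng.choice(self.pool, self.m, replace=True)
        return sample1, sample2

    def PValue(self, iters=1000):
        stats = [self.TestStatistic(self.RunModel()) for _ in range(iters)]
        count = sum(1 for x in stats if x >= self.actual)
        return count / iters


# Synthetic data (an assumption for illustration): two groups whose means
# differ by about 1, with one missing value added to the first group.
rng = np.random.default_rng(0)
a = np.append(rng.normal(10.0, 1.0, 200), np.nan)
b = rng.normal(9.0, 1.0, 200)

test = DiffMeansResamplePooled((a, b), seed=1)
p = test.PValue()
print('Pooled resampling p-value (synthetic data): {}'.format(p))
```

Because both simulated groups are drawn from the same pool, the simulated differences are centered on zero, so the p-value measures how often chance alone produces a difference as large as the observed one. Unlike permutation, drawing with replacement lets the same observation appear more than once or not at all, which is the sense in which the sample is treated as an estimate of the population distribution.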