You now know how to define and estimate parameters given a model. But the question remains: how reasonable is it to observe your data if a model is true? This question is addressed by hypothesis tests. They are the icing on the inference cake. After completing this chapter, you will be able to carefully construct and test hypotheses using hacker statistics.



In [2]:
from IPython.display import Image


Image(url= "../images/FormulatingAndSimulatingAHypothesis/1.png", width=400)

In [3]:
Image(url= "../images/FormulatingAndSimulatingAHypothesis/2.png", width=400)

In [4]:
Image(url= "../images/FormulatingAndSimulatingAHypothesis/3.png", width=400)

In [5]:
Image(url= "../images/FormulatingAndSimulatingAHypothesis/4.png", width=400)

In [6]:
Image(url= "../images/FormulatingAndSimulatingAHypothesis/6.png", width=400)

<h4>Exercise</h4>
<h2>Generating a permutation sample</h2>
In the video, you learned that permutation sampling is a great way to simulate the hypothesis that two variables have identical probability distributions. This is often a hypothesis you want to test, so in this exercise, you will write a function to generate a permutation sample from two data sets.

Remember, a permutation sample of two arrays with n1 and n2 entries, respectively, is constructed by concatenating the arrays, scrambling the contents of the concatenated array, and then taking the first n1 entries as the permutation sample of the first array and the last n2 entries as the permutation sample of the second array.

<h4>Instructions</h4>
<ul>
    <li>Concatenate the two input arrays into one using np.concatenate(). Be sure to pass in data1 and data2 as a single tuple argument, (data1, data2).</li>
    <li>Use np.random.permutation() to permute the concatenated array.</li>
    <li>Store the first len(data1) entries of permuted_data as perm_sample_1 and the last len(data2) entries of permuted_data as perm_sample_2. In practice, this can be achieved by using :len(data1) and len(data1): to slice permuted_data.</li>
    <li>Return perm_sample_1 and perm_sample_2.</li>
</ul>
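The steps above can be sketched as follows. This is a minimal version assuming NumPy is imported as np; the two small arrays in the usage lines are made up purely for illustration.

```python
import numpy as np


def permutation_sample(data1, data2):
    """Generate a permutation sample from two data sets."""
    # Concatenate the data sets (both arrays passed as one tuple argument)
    data = np.concatenate((data1, data2))

    # Scramble the concatenated array
    permuted_data = np.random.permutation(data)

    # Split it back into two arrays with the original lengths
    perm_sample_1 = permuted_data[:len(data1)]
    perm_sample_2 = permuted_data[len(data1):]

    return perm_sample_1, perm_sample_2


# Usage with small, made-up arrays
a = np.array([1.0, 2.0, 3.0])
b = np.array([10.0, 20.0])
s1, s2 = permutation_sample(a, b)
print(s1, s2)
```

Every value from the two inputs ends up in exactly one of the two outputs; only the assignment of values to groups is shuffled.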

<h4>Exercise</h4>
<h2>Visualizing permutation sampling</h2>
To help see how permutation sampling works, in this exercise you will generate permutation samples and look at them graphically.

We will use the Sheffield Weather Station data again, this time considering the monthly rainfall in June (a dry month) and November (a wet month). We expect these might be differently distributed, so we will take permutation samples to see how their ECDFs would look if they were identically distributed.

The data are stored in the NumPy arrays rain_june and rain_november.

As a reminder, permutation_sample() has a function signature of permutation_sample(data_1, data_2) with a return value of permuted_data[:len(data_1)], permuted_data[len(data_1):], where permuted_data = np.random.permutation(np.concatenate((data_1, data_2))).

<h4>Instructions</h4>
<ul>
    <li>Write a for loop to generate 50 permutation samples, compute their ECDFs, and plot them.</li>
    <li>Generate a permutation sample pair from rain_june and rain_november using your permutation_sample() function.</li>
    <li>Generate the x and y values for the ECDF of each of the two permutation samples using your ecdf() function.</li>
    <li>Plot the ECDF of the first permutation sample (x_1 and y_1) as dots. Do the same for the second permutation sample (x_2 and y_2).</li>
    <li>Generate x and y values for ECDFs for the rain_june and rain_november data and plot the ECDFs using respectively the keyword arguments color='red' and color='blue'.</li>
    <li>Label your axes, set a 2% margin, and show your plot.</li>
</ul>
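A minimal sketch of the plotting loop described above. The real rain_june and rain_november arrays are not reproduced here, so the code substitutes made-up gamma-distributed values (an assumption, not the Sheffield data), and ecdf() is re-implemented with the usual sorted-x, y = i/n construction. Drawing the permutation ECDFs with a low alpha is one way to get the overlapping red and blue "haze".

```python
import numpy as np
import matplotlib.pyplot as plt


def ecdf(data):
    """Compute x, y values for an ECDF of a 1D array."""
    n = len(data)
    x = np.sort(data)
    y = np.arange(1, n + 1) / n
    return x, y


def permutation_sample(data1, data2):
    """Generate a permutation sample from two data sets."""
    permuted_data = np.random.permutation(np.concatenate((data1, data2)))
    return permuted_data[:len(data1)], permuted_data[len(data1):]


# Made-up stand-ins for the rainfall arrays (NOT the real Sheffield data)
np.random.seed(42)
rain_june = np.random.gamma(shape=4.0, scale=12.0, size=100)
rain_november = np.random.gamma(shape=6.0, scale=14.0, size=100)

for _ in range(50):
    # Generate a permutation sample pair
    perm_sample_1, perm_sample_2 = permutation_sample(rain_june, rain_november)

    # Compute ECDFs of both permutation samples
    x_1, y_1 = ecdf(perm_sample_1)
    x_2, y_2 = ecdf(perm_sample_2)

    # Plot the permutation-sample ECDFs as faint dots
    plt.plot(x_1, y_1, marker='.', linestyle='none', color='red', alpha=0.02)
    plt.plot(x_2, y_2, marker='.', linestyle='none', color='blue', alpha=0.02)

# Overlay the ECDFs of the (made-up) observed data
x_june, y_june = ecdf(rain_june)
x_nov, y_nov = ecdf(rain_november)
plt.plot(x_june, y_june, marker='.', linestyle='none', color='red')
plt.plot(x_nov, y_nov, marker='.', linestyle='none', color='blue')

# Label axes, set a 2% margin, and show the plot
plt.margins(0.02)
plt.xlabel('monthly rainfall (mm)')
plt.ylabel('ECDF')
plt.show()
```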





In [7]:
Image(url= "../images/FormulatingAndSimulatingAHypothesis/7.png", width=400)

Great work! Notice that the permutation samples' ECDFs overlap and give a purple haze. None of the ECDFs from the permutation samples overlap with the observed data, suggesting that the hypothesis is not commensurate with the data. June and November rainfall are not identically distributed.

In [3]:
Image(url= "../images/IntroductionToHypothesisTesting/8.png", width=400)

In [4]:
Image(url= "../images/IntroductionToHypothesisTesting/9.png", width=400)

In [5]:
Image(url= "../images/IntroductionToHypothesisTesting/10.png", width=400)

In [6]:
Image(url= "../images/IntroductionToHypothesisTesting/11.png", width=400)

In [7]:
Image(url= "../images/IntroductionToHypothesisTesting/12.png", width=400)

In [10]:
Image(url= "../images/IntroductionToHypothesisTesting/13.png", width=400)

<h4>Exercise</h4>
<h2>Generating permutation replicates</h2>
As discussed in the video, a permutation replicate is a single value of a statistic computed from a permutation sample. Just as the draw_bs_reps() function you wrote in Chapter 2 generates bootstrap replicates, a similar function, draw_perm_reps(), is useful for generating permutation replicates. You will write this function in this exercise.

The function has call signature draw_perm_reps(data_1, data_2, func, size=1). Importantly, func must be a function that takes two arrays as arguments. In most circumstances, func will be a function you write yourself.

<h4>Instructions</h4>
<ul>
    <li>Define a function with this signature: draw_perm_reps(data_1, data_2, func, size=1).
        <ul>
            <li>Initialize an array to hold the permutation replicates using np.empty().</li>
            <li>Write a for loop to:
                 <ul>
                    <li>Compute a permutation sample using your permutation_sample() function</li>
                    <li>Pass the samples into func() to compute the replicate and store the result in your array of replicates.</li>
                </ul>
            </li>
        </ul>
    </li>
    <li>Return the array of replicates.</li>
</ul>

In [9]:
import numpy as np


def draw_perm_reps(data_1, data_2, func, size=1):
    """Generate multiple permutation replicates."""

    # Initialize array of replicates: perm_replicates
    perm_replicates = np.empty(size)

    for i in range(size):
        # Generate permutation sample (permutation_sample() is the function
        # written in the "Generating a permutation sample" exercise)
        perm_sample_1, perm_sample_2 = permutation_sample(data_1, data_2)

        # Compute the test statistic
        perm_replicates[i] = func(perm_sample_1, perm_sample_2)

    return perm_replicates

<h4>Exercise</h4>
<h2>Look before you leap: EDA before hypothesis testing</h2>

Kleinteich and Gorb (Sci. Rep., 4, 5225, 2014) performed an interesting experiment with South American horned frogs. They held a plate connected to a force transducer, along with a bait fly, in front of the frogs. They then measured the impact force and adhesive force of the frog's tongue when it struck the target.

Frog A is an adult and Frog B is a juvenile. The researchers measured the impact force of 20 strikes for each frog. In the next exercise, we will test the hypothesis that the two frogs have the same distribution of impact forces. But remember, it is important to do EDA first! Let's make a bee swarm plot of the data. They are stored in a pandas DataFrame, df, where column ID is the identity of the frog and column impact_force is the impact force in newtons (N).

<h4>Instructions</h4>
<ul>
    <li>Use sns.swarmplot() to make a bee swarm plot of the data by specifying the x, y, and data keyword arguments.</li>
    <li>Label your axes.</li>
    <li>Show the plot.</li>
</ul>



Eyeballing it, it does not look like they come from the same distribution. Frog A, the adult, has three or four very hard strikes, and Frog B, the juvenile, has a couple of weak ones. However, with only 20 samples it might be too difficult to tell by eye whether they have different distributions, so we should proceed with the hypothesis test.

<h4>Exercise</h4>

<h2>Permutation test on frog data</h2>
The average strike force of Frog A was 0.71 newtons (N), and that of Frog B was 0.42 N, for a difference of 0.29 N. It is possible the frogs strike with the same force and this observed difference was by chance. You will compute the probability of getting at least a 0.29 N difference in mean strike force under the hypothesis that the distributions of strike forces for the two frogs are identical. We use a permutation test with a test statistic of the difference of means to test this hypothesis.

For your convenience, the data are stored in the arrays force_a and force_b.

<h4>Instructions</h4>

<ul>
    <li>Define a function with call signature diff_of_means(data_1, data_2) that returns the difference in means between two data sets, mean of data_1 minus mean of data_2.</li>
    <li>Use this function to compute the empirical difference of means that was observed in the frogs.</li>
    <li>Draw 10,000 permutation replicates of the difference of means.</li>
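    <li>Compute the p-value as the fraction of permutation replicates that are greater than or equal to the empirical difference of means.</li>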
    <li>Print the p-value.</li>
</ul>
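The steps above might look like the following sketch. force_a and force_b are not reproduced in this section, so the code uses made-up normal samples whose means match the reported 0.71 N and 0.42 N (the spreads are invented); the printed p-value will therefore not match the one below. permutation_sample() and draw_perm_reps() are repeated so the block runs on its own.

```python
import numpy as np


def permutation_sample(data1, data2):
    """Generate a permutation sample from two data sets."""
    permuted_data = np.random.permutation(np.concatenate((data1, data2)))
    return permuted_data[:len(data1)], permuted_data[len(data1):]


def draw_perm_reps(data_1, data_2, func, size=1):
    """Generate multiple permutation replicates."""
    perm_replicates = np.empty(size)
    for i in range(size):
        perm_sample_1, perm_sample_2 = permutation_sample(data_1, data_2)
        perm_replicates[i] = func(perm_sample_1, perm_sample_2)
    return perm_replicates


def diff_of_means(data_1, data_2):
    """Difference in means of two arrays: mean(data_1) - mean(data_2)."""
    return np.mean(data_1) - np.mean(data_2)


# Made-up stand-ins for force_a and force_b (NOT the published measurements)
np.random.seed(42)
force_a = np.random.normal(loc=0.71, scale=0.3, size=20)
force_b = np.random.normal(loc=0.42, scale=0.15, size=20)

# Empirical difference of means
empirical_diff_means = diff_of_means(force_a, force_b)

# Draw 10,000 permutation replicates of the difference of means
perm_replicates = draw_perm_reps(force_a, force_b, diff_of_means, size=10000)

# p-value: fraction of replicates at least as large as the observed difference
p = np.sum(perm_replicates >= empirical_diff_means) / len(perm_replicates)
print('p-value =', p)
```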

p-value = 0.0063

The p-value tells you that there is about a 0.6% chance that you would get a difference of means at least as large as the one observed in the experiment if the frogs' strike forces were identically distributed. A p-value below 0.01 is often said to be "statistically significant," but: warning! warning! warning! You have computed a p-value; it is a number. I encourage you not to distill it to a yes-or-no phrase. p = 0.006 and p = 0.000000006 are both said to be "statistically significant," but they are definitely not the same!

In [9]:
Image(url= "../images/IntroductionToHypothesisTesting/14.png", width=400)

In [11]:
Image(url= "../images/IntroductionToHypothesisTesting/15.png", width=400)

In [12]:
Image(url= "../images/IntroductionToHypothesisTesting/16.png", width=400)

In [13]:
Image(url= "../images/IntroductionToHypothesisTesting/17.png", width=400)

In [14]:
Image(url= "../images/IntroductionToHypothesisTesting/18.png", width=400)

In [15]:
Image(url= "../images/IntroductionToHypothesisTesting/19.png", width=400)

In [16]:
Image(url= "../images/IntroductionToHypothesisTesting/20.png", width=400)

In [17]:
Image(url= "../images/IntroductionToHypothesisTesting/21.png", width=400)

<h4>Exercise</h4>
<h2>A one-sample bootstrap hypothesis test</h2>

Another juvenile frog, Frog C, was studied, and you want to see if Frog B and Frog C have similar impact forces. Unfortunately, you do not have Frog C's impact forces available, but you know they have a mean of 0.55 N. Because you don't have the original data, you cannot do a permutation test, and you cannot assess the hypothesis that the forces from Frog B and Frog C come from the same distribution. You will therefore test another, less restrictive hypothesis: the mean strike force of Frog B is equal to that of Frog C.

To set up the bootstrap hypothesis test, you will take the mean as your test statistic. Remember, your goal is to calculate the probability of getting a mean impact force less than or equal to the one observed for Frog B, assuming the hypothesis that the true mean of Frog B's impact forces equals that of Frog C. To simulate this hypothesis, you first translate all of Frog B's data so that their mean is 0.55 N: subtract the mean force of Frog B from each of Frog B's measurements and add the mean force of Frog C. This leaves other properties of Frog B's distribution, such as the variance, unchanged.

<h4>Instructions</h4>
<ul>
    <li>Translate the impact forces of Frog B such that its mean is 0.55 N.</li>
    <li>Use your draw_bs_reps() function to take 10,000 bootstrap replicates of the mean of your translated forces.</li>
    <li>Compute the p-value by finding the fraction of your bootstrap replicates that are less than or equal to the observed mean impact force of Frog B. Note that the variable of interest here is force_b.</li>
    <li>Print your p-value.</li>
</ul>
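A sketch of the one-sample bootstrap test, assuming draw_bs_reps(data, func, size=1) behaves like the Chapter 2 function (the version here is a re-implementation, not the original). force_b is again a made-up stand-in with its mean matched to the reported 0.42 N, so the printed p-value is illustrative only.

```python
import numpy as np


def bootstrap_replicate_1d(data, func):
    """Compute func() of one bootstrap sample (resampling with replacement)."""
    return func(np.random.choice(data, size=len(data)))


def draw_bs_reps(data, func, size=1):
    """Draw `size` bootstrap replicates of func(data)."""
    bs_replicates = np.empty(size)
    for i in range(size):
        bs_replicates[i] = bootstrap_replicate_1d(data, func)
    return bs_replicates


# Made-up stand-in for force_b (NOT the published measurements)
np.random.seed(42)
force_b = np.random.normal(loc=0.42, scale=0.15, size=20)

# Translate Frog B's forces so their mean is Frog C's known mean, 0.55 N
translated_force_b = force_b - np.mean(force_b) + 0.55

# 10,000 bootstrap replicates of the mean of the translated forces
bs_replicates = draw_bs_reps(translated_force_b, np.mean, size=10000)

# p-value: fraction of replicates <= the observed (untranslated) mean of Frog B
p = np.sum(bs_replicates <= np.mean(force_b)) / len(bs_replicates)
print('p =', p)
```

Translating by a constant shifts the mean but leaves the spread untouched, which is why the test only addresses the mean.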

p-value = 0.0046

Great work! The low p-value suggests that the data are inconsistent with the null hypothesis that Frog B and Frog C have the same mean impact force.

<h4>Exercise</h4>
<h2>A two-sample bootstrap hypothesis test for difference of means</h2>
We now want to test the hypothesis that Frog A and Frog B have the same mean impact force, but not necessarily the same distribution. This hypothesis also cannot be tested with a permutation test, because a permutation test simulates identical distributions.

To do the two-sample bootstrap test, we shift both arrays to have the same mean, since we are simulating the hypothesis that their means are, in fact, equal. We then draw bootstrap samples out of the shifted arrays and compute the difference in means. This constitutes a bootstrap replicate, and we generate many of them. The p-value is the fraction of replicates with a difference in means greater than or equal to what was observed.

The objects forces_concat and empirical_diff_means are already in your namespace.

<h4>Instructions</h4>
<ul>
    <li>Compute the mean of all forces (from forces_concat) using np.mean().</li>
    <li>Generate shifted data sets for both force_a and force_b such that the mean of each is the mean of the concatenated array of impact forces.</li>
    <li>Generate 10,000 bootstrap replicates of the mean for each of the two shifted arrays.</li>
    <li>Compute the bootstrap replicates of the difference of means by subtracting the replicates of the shifted impact force of Frog B from those of Frog A.</li>
    <li>Compute and print the p-value from your bootstrap replicates.</li>
</ul>
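A sketch of the two-sample bootstrap test under the same assumptions as before: made-up force_a and force_b (means matched to the reported values, spreads invented) and a compact re-implementation of draw_bs_reps(). Because forces_concat and empirical_diff_means are only available inside the exercise, the block computes them itself.

```python
import numpy as np


def draw_bs_reps(data, func, size=1):
    """Draw `size` bootstrap replicates of func(data)."""
    return np.array([func(np.random.choice(data, size=len(data)))
                     for _ in range(size)])


# Made-up stand-ins for the frog data (NOT the published measurements)
np.random.seed(42)
force_a = np.random.normal(loc=0.71, scale=0.3, size=20)
force_b = np.random.normal(loc=0.42, scale=0.15, size=20)
forces_concat = np.concatenate((force_a, force_b))
empirical_diff_means = np.mean(force_a) - np.mean(force_b)

# Mean of all forces
mean_force = np.mean(forces_concat)

# Shift both arrays so each has the pooled mean (simulating equal means)
force_a_shifted = force_a - np.mean(force_a) + mean_force
force_b_shifted = force_b - np.mean(force_b) + mean_force

# 10,000 bootstrap replicates of the mean for each shifted array
bs_replicates_a = draw_bs_reps(force_a_shifted, np.mean, size=10000)
bs_replicates_b = draw_bs_reps(force_b_shifted, np.mean, size=10000)

# Bootstrap replicates of the difference of means (Frog A minus Frog B)
bs_replicates = bs_replicates_a - bs_replicates_b

# p-value: fraction of replicates >= the observed difference of means
p = np.sum(bs_replicates >= empirical_diff_means) / len(bs_replicates)
print('p-value =', p)
```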

Nice work! You got a result similar to the one from the permutation test. Nonetheless, remember that it is important to think carefully about what question you want to ask. Are you only interested in the mean impact force, or in the distribution of impact forces?