1.  Consider an experiment consisting of 4 Bernoulli trials, each with
    the same probability p of success (0=failure, 1=success). The
    outcome is binomial(n,p) with n = 4 and p = .26. List all the
    possible outcomes of this experiment (0000, 0010, etc.) and compute
    using the formula the binomial probabilities associated with each of
    them. \[6\]


In [2]:
from itertools import product
from math import factorial  # needed by binomial(..., combine=True)

def binomial(p, n, k, combine=False):
    '''p (float): probability of success
    n (int): number of trials
    k (int): number of successes
    combine (bool): combine probabilities for permutations of sequence
    '''
    if combine:
        n_choose_k = factorial(n)/(factorial(k)*factorial(n-k))
        return n_choose_k*binomial(p, n, k)
    else:
        return (p**k) * (1-p)**(n-k)

for outcome in product([0,1], repeat=4):
    k = sum(outcome) # number of successes
    print(outcome, '--', round(binomial(.26, 4, k), 3))

(0, 0, 0, 0) -- 0.3
(0, 0, 0, 1) -- 0.105
(0, 0, 1, 0) -- 0.105
(0, 0, 1, 1) -- 0.037
(0, 1, 0, 0) -- 0.105
(0, 1, 0, 1) -- 0.037
(0, 1, 1, 0) -- 0.037
(0, 1, 1, 1) -- 0.013
(1, 0, 0, 0) -- 0.105
(1, 0, 0, 1) -- 0.037
(1, 0, 1, 0) -- 0.037
(1, 0, 1, 1) -- 0.013
(1, 1, 0, 0) -- 0.037
(1, 1, 0, 1) -- 0.013
(1, 1, 1, 0) -- 0.013
(1, 1, 1, 1) -- 0.005
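
As a quick check on the listing above, the 16 per-sequence probabilities can be grouped by number of successes. Each group should contain C(4, k) sequences, the group totals should reproduce the binomial(4, .26) probabilities, and everything should sum to 1. A minimal sketch, assuming Python 3.8+ for `math.comb`; the names `sequence_prob` and `pmf` are illustrative and not part of the original cell:

```python
from itertools import product
from math import comb, isclose

p, n = 0.26, 4

def sequence_prob(outcome, p):
    """Probability of one specific 0/1 sequence of independent trials."""
    k = sum(outcome)
    return p**k * (1 - p)**(len(outcome) - k)

# Add up the probabilities of all sequences that share the same number of successes.
pmf = {k: 0.0 for k in range(n + 1)}
for outcome in product([0, 1], repeat=n):
    pmf[sum(outcome)] += sequence_prob(outcome, p)

for k, prob in pmf.items():
    # Each group total should equal C(n, k) * p^k * (1-p)^(n-k).
    assert isclose(prob, comb(n, k) * p**k * (1 - p)**(n - k))
    print(k, comb(n, k), round(prob, 4))

print('total:', round(sum(pmf.values()), 10))
```

The per-sequence values answer the question as posed; the grouped totals are what `binomial(..., combine=True)` returns.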


In [10]:
round(.5**19, 9)

1.907e-06

In [19]:
for k in range(9, 12):
    print(f'{binomial(0.5, 11, k, combine=True):.10f}')

0.0268554688
0.0053710938
0.0004882812


2.  American football games traditionally begin with a coin toss to
    determine who gets the first kickoff. In the 2015 season, the New
    England Patriots won 19 out of 26 tosses. Test the null hypothesis
    that fair coins were used (with a 2-tailed test). Do all
    calculations by hand, but you can double-check your answer with
    Excel, SPSS, or some other software. \[6\]


In [18]:
from math import factorial

prob = .5
num_trials = 26       # coin tosses in the 2015 season
observed_value = 19   # tosses won by the Patriots

as_or_more_extreme = sum(binomial(prob, num_trials, x, combine=True)
                         for x in range(observed_value, num_trials+1))

print(f'p-value (one-tailed): {as_or_more_extreme:.5f}')
print(f'p-value (two-tailed): {as_or_more_extreme*2:.5f}')

p-value (one-tailed): 0.01448
p-value (two-tailed): 0.02896


At α = .05, there is sufficient evidence to reject the null hypothesis that the
coin tosses were fair.
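
Because the problem asks for a hand calculation, it can help to redo the tail sum in exact integer arithmetic. The sketch below assumes the figures in the problem statement (19 wins in 26 tosses, fair coin). It doubles the upper tail for the two-tailed p-value, which is valid here because the null distribution is symmetric at p = .5. `upper_tail_count` is an illustrative name.

```python
from fractions import Fraction
from math import comb

n_tosses, wins = 26, 19

# Number of the 2**26 equally likely toss sequences with at least 19 wins.
upper_tail_count = sum(comb(n_tosses, k) for k in range(wins, n_tosses + 1))

one_tailed = Fraction(upper_tail_count, 2**n_tosses)
two_tailed = 2 * one_tailed  # symmetric null distribution, so double the tail

print(f'sequences in upper tail: {upper_tail_count}')
print(f'p-value (one-tailed): {float(one_tailed):.5f}')
print(f'p-value (two-tailed): {float(two_tailed):.5f}')
```

Working with the integer count first keeps the hand calculation easy to audit: only the final division by 2**26 involves rounding.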

3.  Tape a penny and a quarter together with clear tape so that the
    quarter is "heads" and the penny is "tails." Flip this new 26¢ coin
    50 times, each time recording the outcome (heads=1, tails=0). Is
    there evidence to suggest that the "coin" is not fair? In the course
    of your investigation, use Excel's BINOM.DIST function and SPSS.
    Note any agreements and disagreements among the different software
    applications. \[6\]


In [None]:
# I flipped a student union button 50 times because I could not find a penny
# (only 30 of the outcomes are recorded in the string below)
observations = '001111011111111110010000111111'
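
One way to run the same test in Python, alongside Excel and SPSS, is to count the heads in the recorded string and sum the exact binomial tails. This is only a sketch: it uses the 30 outcomes that appear in the cell above (the full 50 are not shown), assumes a fair-coin null of p = .5, and doubles the smaller tail for the two-tailed p-value. The helper `two_tailed_binomial_p` is hypothetical, not a library function.

```python
from math import comb

def two_tailed_binomial_p(successes, trials):
    """Exact two-tailed p-value for a fair coin (p = .5): double the smaller tail, capped at 1."""
    upper = sum(comb(trials, k) for k in range(successes, trials + 1)) / 2**trials
    lower = sum(comb(trials, k) for k in range(0, successes + 1)) / 2**trials
    return min(1.0, 2 * min(upper, lower))

observations = '001111011111111110010000111111'  # outcomes as recorded (heads=1, tails=0)
heads = observations.count('1')
flips = len(observations)
p_value = two_tailed_binomial_p(heads, flips)

print(f'{heads} heads in {flips} flips')
print(f'p-value (two-tailed): {p_value:.4f}')
```

Excel's `BINOM.DIST` returns a single cumulative or point probability, so a two-tailed value has to be assembled from it in much the same way as here.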

4.  You hypothesize that the frequency of left-handed and right-handed
    people is unequal in the population. To test this hypothesis you
    collect data from 19 people and observe 3 lefties. How probable was
    an outcome at least this extreme (in either direction) if the
    proportions are genuinely equal in the population? Obtain an exact
    p-value from a binomial distribution. Also conduct a χ² test of
    goodness of fit (by hand, with and without a correction for
    continuity), and compare the results to those you obtained using the
    binomial distribution. Precise p-values can be obtained for χ²
    tests using any of a number of online utilities, for instance this
    one: http://www.graphpad.com/quickcalcs/PValue1.cfm. \[6\]


In [None]:
prob = .5
num_trials = 19
observed_value = 19-3  # 3 lefties is as extreme as 16 righties; sum the upper tail

as_or_more_extreme = sum(binomial(prob, num_trials, x, combine=True)
                         for x in range(observed_value, num_trials+1))
print(f'p-value (one-tailed): {as_or_more_extreme:.3f}')
print(f'p-value (two-tailed): {as_or_more_extreme*2:.3f}')

In [20]:
def chi_square(observeds_expecteds, correct_continuity=False):
    '''observeds_expecteds iter(tuple(int|float)): iterable of observed-expected tuples
    correct_continuity (bool): whether to apply Yates' correction for continuity
    '''
    if correct_continuity:
        return sum(
            (((abs(o-e)-0.5)**2) / e)
            for o, e in observeds_expecteds
        )
    
    else:
        return sum(
            ((o-e)**2 / e)
            for o, e in observeds_expecteds
        )

# (observed, expected) pairs: 16 right-handed and 3 left-handed out of 19
count_data = ((16, 9.5), (3, 9.5))

chi_test_statistic = chi_square(count_data, correct_continuity=False)
corrected_chi = chi_square(count_data, correct_continuity=True)

print(f'The chi-squared test statistic is {chi_test_statistic:.3f}')
print(f'The corrected chi-squared test statistic is {corrected_chi:.3f}')

The chi-squared test statistic is 8.895
The corrected chi-squared test statistic is 7.579


The p-value for the uncorrected chi-squared test (df = 1) is .0029; for the corrected test it is .0059.
The exact two-tailed binomial p-value (about .0044) falls between them: the uncorrected p-value is lower and the corrected p-value is higher. All three tests provide sufficient evidence at α = .05 to reject the null hypothesis that left- and right-handed people are equally frequent in the population.
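
The chi-squared p-values quoted above can be reproduced without an online calculator. With 1 degree of freedom, a chi-squared variate is the square of a standard normal, so its upper-tail probability is erfc(sqrt(x / 2)). The sketch below recomputes both statistics from the Problem 4 counts (16 and 3 observed, 9.5 expected each); `chi2_sf_df1` is an illustrative helper and is valid only for df = 1.

```python
from math import erfc, sqrt

def chi2_sf_df1(x):
    """Upper-tail probability of a chi-squared variate with 1 degree of freedom."""
    return erfc(sqrt(x / 2))

observed = (16, 3)
expected = (9.5, 9.5)

# Pearson statistic, without and with Yates' continuity correction.
uncorrected = sum((o - e)**2 / e for o, e in zip(observed, expected))
corrected = sum((abs(o - e) - 0.5)**2 / e for o, e in zip(observed, expected))

print(f'uncorrected: chi2 = {uncorrected:.3f}, p = {chi2_sf_df1(uncorrected):.4f}')
print(f'corrected:   chi2 = {corrected:.3f}, p = {chi2_sf_df1(corrected):.4f}')
```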

5.  Using enumeration, derive the exact sampling distribution of the
    number of runs when N = 7 observations, when m = 4 and n = 3. What
    is the p-value associated with observing 3 or fewer runs? \[5\]
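
A minimal sketch of the enumeration, assuming the two kinds of observations are coded 1 (m = 4 of them) and 0 (n = 3 of them) and that all C(7, 4) = 35 arrangements are equally likely under the null hypothesis. The names `count_runs` and `distribution` are illustrative.

```python
from collections import Counter
from itertools import combinations, groupby

m, n = 4, 3
N = m + n

def count_runs(seq):
    """Number of maximal blocks of identical adjacent values."""
    return sum(1 for _ in groupby(seq))

# Each arrangement is determined by which positions hold the m ones.
distribution = Counter()
for ones in combinations(range(N), m):
    seq = [1 if i in ones else 0 for i in range(N)]
    distribution[count_runs(seq)] += 1

total = sum(distribution.values())
for runs in sorted(distribution):
    print(runs, distribution[runs], round(distribution[runs] / total, 4))

p_three_or_fewer = sum(c for r, c in distribution.items() if r <= 3) / total
print(f'P(runs <= 3) = {p_three_or_fewer:.4f}')
```

Enumerating positions with `combinations` visits each distinct arrangement exactly once, which avoids the duplicate orderings that permuting the seven symbols directly would produce.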



6.  Using SPSS, conduct a runs test of the null hypothesis that the
    following sequence of numbers is randomly ordered \[5\]:
    001111100001111011000010000001110000
    

In [22]:
from statsmodels.sandbox.stats.runs import runstest_1samp
data = [int(i) for i in '01010101000010101010010100101010101101011010101011']

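# statsmodels applies the 0.5 continuity correction only when the sample size
# is below 50, so both calls give the same result for these 50 observations
corrected = runstest_1samp(data, correction=True)
uncorrected = runstest_1samp(data, correction=False)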

print(f'Uncorrected:\nZ-score = {uncorrected[0]:.3f} p-value = {uncorrected[1]:.10f}')
print(f'Corrected:\nZ-score = {corrected[0]:.3f} p-value = {corrected[1]:.10f}')

Uncorrected:
Z-score = 4.591 p-value = 0.0000044047
Corrected:
Z-score = 4.591 p-value = 0.0000044047
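
For comparison with SPSS, the large-sample runs test can also be computed directly for the 36-digit sequence printed in the problem statement. This sketch uses the usual normal approximation, with expected runs 2·n1·n0/N + 1 and variance 2·n1·n0·(2·n1·n0 − N) / (N²·(N − 1)), and no continuity correction. The two-tailed p-value comes from `math.erfc`. Variable names are illustrative, and software that applies a continuity correction or an exact test may report slightly different values.

```python
from itertools import groupby
from math import erfc, sqrt

sequence = '001111100001111011000010000001110000'

n1 = sequence.count('1')
n0 = sequence.count('0')
N = n1 + n0
runs = sum(1 for _ in groupby(sequence))  # blocks of identical adjacent digits

expected_runs = 2 * n1 * n0 / N + 1
variance = 2 * n1 * n0 * (2 * n1 * n0 - N) / (N**2 * (N - 1))
z = (runs - expected_runs) / sqrt(variance)
p_two_tailed = erfc(abs(z) / sqrt(2))  # equals 2 * (1 - Phi(|z|))

print(f'n1 = {n1}, n0 = {n0}, runs = {runs}')
print(f'expected runs = {expected_runs:.2f}, variance = {variance:.2f}')
print(f'Z-score = {z:.3f} p-value = {p_two_tailed:.4f}')
```

A negative Z-score here means fewer runs than expected under random ordering, that is, the digits cluster.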


7.  The following data represent measurement occasions and scores for
    one individual on hourly repeated administrations of Preacher's
    Irritability Scale (Schedule Y). Fit a simple linear regression
    model to these data. In regression, the errors are assumed
    independent. Yet with repeated measures on a single instrument, this
    assumption is questionable because adjacent errors are often
    correlated. Use an appropriate test of randomness to determine
    whether there is evidence for autocorrelation of residuals. Use a
    1-tailed test at α = .05 \[6\]