# Assignment 4 Solutions

In performing a two-sample t-test, there are two distinct situations to consider:

1.  The two population variances are equal to one another (the population means may still differ), so the two sample variances can be pooled into a single estimate.
2.  The two population variances are not assumed to be equal.
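For situation 1, the two sample variances are pooled. These are the formulas implemented by `sem_eq` and `ndof_eq` in the code below:

$s_p^2 = \frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2}$

$SEM = s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$

$df = n_1 + n_2 - 2$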

For this assignment, the textbook always assumes situation 2.

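In that case, the standard error of the difference between the sample means (SEM) and the approximate number of degrees of freedom (the Welch-Satterthwaite formula) are calculated as follows. The `ndof_neq` function below truncates the degrees of freedom to an integer, i.e. rounds them down: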

$SEM = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$

$df = \frac{ \left( \frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}    \right)^2 }{\frac{ \left(\frac{s_1^2}{n_1}\right)^2   }{n_1-1} + \frac{ \left(\frac{s_2^2}{n_2}\right)^2   }{n_2-1}}$
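As a quick numerical check of the two formulas above, the sketch below evaluates them in plain Python for the first case computed in Question 1 (n1 = 6, n2 = 8, s1 = 5, s2 = 6.4). The helper name `welch_sem_df` is hypothetical and made up for this illustration; it returns the unrounded df so the effect of rounding down is visible.

```python
import math

def welch_sem_df(n1, n2, s1, s2):
    """Return (SEM, df) for the unequal-variance two-sample t test (hypothetical helper)."""
    v1 = s1**2 / n1  # squared standard error of sample 1's mean
    v2 = s2**2 / n2  # squared standard error of sample 2's mean
    sem = math.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation, left unrounded
    df = (v1 + v2)**2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return sem, df

sem, df = welch_sem_df(6, 8, 5, 6.4)
print("SEM = %.4f, df = %.4f, df rounded down = %d" % (sem, df, math.floor(df)))
```

With these inputs the unrounded df is just under 12, so rounding down gives 11; rounding to the nearest integer would give 12 instead.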

In [54]:
import numpy as np
import scipy.stats as stats
import matplotlib.pyplot as plt

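def sem_neq(n1,n2,s1,s2):
    """Standard error of xbar1 - xbar2 when the variances are not assumed equal."""
    sm = np.sqrt(s1**2/n1+s2**2/n2)
    return float(sm)

def ndof_neq(n1,n2,s1,s2):
    """Welch-Satterthwaite degrees of freedom, truncated (rounded down) to an integer."""
    v1 = s1**2/n1
    v2 = s2**2/n2
    dof = (v1+v2)**2/(v1**2/(n1-1)+v2**2/(n2-1))
    return int(dof)

def sem_eq(n1,n2,s1,s2):
    """Standard error of xbar1 - xbar2 using the pooled standard deviation (equal variances)."""
    sp = np.sqrt(((n1-1)*s1**2+(n2-1)*s2**2)/(n1+n2-2))
    sm = sp*np.sqrt(1.0/n1+1.0/n2)
    return float(sm)

def ndof_eq(n1,n2,s1,s2):
    """Degrees of freedom for the pooled test (s1 and s2 are unused; kept for a uniform signature)."""
    dof = n1+n2-2
    return int(dof)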

# Question 1

Determine the number of degrees of freedom for the two-sample t test or CI in each of the following situations. 
(Exact integer answers required.)

(a) m = 6, n = 8, s1 = 5, s2 = 6.4


(b) m = 6, n = 22, s1 = 4.9, s2 = 5.5


(c) m = 8, n = 11, s1 = 2, s2 = 6.4


(d) m = 10, n = 22, s1 = 3.7, s2 = 6.5

In [55]:
# (a)
n1, n2, s1, s2 = 6, 8, 5, 6.4
print(ndof_neq(n1,n2,s1,s2))

# (b)
n1, n2, s1, s2 = 6, 22, 4.9, 5.5
print(ndof_neq(n1,n2,s1,s2))

# (c)
n1, n2, s1, s2 = 8, 11, 2, 6.4
print(ndof_neq(n1,n2,s1,s2))

# (d)
n1, n2, s1, s2 = 10, 22, 3.7, 6.5
print(ndof_neq(n1,n2,s1,s2))


11
8
12
28


# Question 2

Let $\mu_1$ and $\mu_2$ denote the true average densities for two different types of brick.

Assuming normality of the two density distributions, test $H_0: \mu_1 - \mu_2 = 0$ versus $H_a: \mu_1 - \mu_2 \neq 0$ using the following data: m = 6, $\bar{x}$ = 23.56, s1 = 0.169, n = 5, $\bar{y}$ = 20.66, and s2 = 0.233.

(Use α = 0.05. Give ν to exact integer and t to 2 decimal places.)

In [56]:
n1 = 6
xbar1 = 23.56
s1 = 0.169

n2 = 5
xbar2 = 20.66
s2 = 0.233

alpha = 0.05

df = ndof_neq(n1,n2,s1,s2)
sm = sem_neq(n1,n2,s1,s2)

tvalue = (xbar1 - xbar2) / sm

print("Degrees of Freedom = %d" % (df))
print ("T value = %0.2f" % tvalue)

tdist = stats.t(df)
# Two-sided alternative (mu1 - mu2 != 0): double the tail area and split alpha between the two tails
pvalue = 2.0*tdist.cdf(-np.abs(tvalue))
tcritical = tdist.ppf(1 - alpha/2)

print("P-Value = %0.7f" % pvalue)
print ("T Critical = %0.3f" % tcritical)

if (pvalue < alpha):
    print ("Reject the null hypothesis ... P-value is less than alpha")
else:
    print ("Fail to reject the null hypothesis ... P-value is greater than alpha")

# Python stats package (two-sided by default)
print()
print("Python stats package result (unequal variances):")
t, pVal = stats.ttest_ind_from_stats(xbar1,s1,n1,xbar2,s2,n2,equal_var=False)
print ("T value = %0.2f" % t)
print("P-Value = %0.6f" % pVal)

Degrees of Freedom = 7
T value = 23.21
P-Value = 0.0000001
T Critical = 2.365
Reject the null hypothesis ... P-value is less than alpha

Python stats package result (unequal variances):
T value = 23.21
P-Value = 0.000000


# Question 3

Quantitative noninvasive techniques are needed for routinely assessing symptoms of peripheral neuropathies, such as carpal tunnel syndrome (CTS). An article reported on a test that involved sensing a tiny gap in an otherwise smooth surface by probing with a finger; this functionally resembles many work-related tactile activities, such as detecting scratches or surface defects. When finger probing was not allowed, the sample average gap detection threshold for m = 7 normal subjects was 1.7 mm, and the sample standard deviation was 0.52; for n = 10 CTS subjects, the sample mean and sample standard deviation were 2.57 and 0.82, respectively. Does this data suggest that the true average gap detection threshold for CTS subjects exceeds that for normal subjects? State and test the relevant hypotheses using a significance level of .01. (Give answers accurate to 2 decimal places.)

In [57]:
n1 = 7
xbar1 = 1.7
s1 = 0.52

n2 = 10
xbar2 = 2.57
s2 = 0.82

alpha = 0.01

# H0: mu2 - mu1 = 0 versus Ha: mu2 - mu1 > 0 (sample 2 = CTS subjects): an upper-tailed test
df = ndof_neq(n1,n2,s1,s2)
sm = sem_neq(n1,n2,s1,s2)

tvalue = (xbar2 - xbar1) / sm

print("Degrees of Freedom = %d" % (df))
print ("T value = %0.2f" % tvalue)

tdist = stats.t(df)

# Upper-tailed P-value: area to the right of the observed t
pvalue = tdist.sf(tvalue)

# One-sided critical value: all of alpha sits in the upper tail
tcritical = tdist.ppf(1 - alpha)

print("P-Value = %0.6f" % pvalue)
print ("T Critical Value = %0.2f" % tcritical)

if (pvalue < alpha):
    print ("Reject the null hypothesis ... P-value is less than alpha")
else:
    print ("Fail to reject the null hypothesis ... P-value is greater than alpha")

# For comparison only: the two-sided version of the same test
pvalue_two = 2.0*tdist.sf(np.abs(tvalue))

print()
print("Two-sided comparison (not the test asked for):")
if (pvalue_two < alpha):
    print ("Reject the null hypothesis ... P-value is less than alpha")
else:
    print ("Fail to reject the null hypothesis ... P-value is greater than alpha")

Degrees of Freedom = 14
T value = 2.67
P-Value = 0.009081
T Critical Value = 2.62
Reject the null hypothesis ... P-value is less than alpha

Two-sided comparison (not the test asked for):
Fail to reject the null hypothesis ... P-value is greater than alpha
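As a cross-check on the manual upper-tailed test, SciPy's `ttest_ind_from_stats` accepts an `alternative` argument (assumed here to be SciPy 1.6 or later). Passing the CTS group first with `alternative='greater'` tests Ha: μ_CTS > μ_normal directly, using the same summary statistics as the cell above. SciPy keeps the unrounded Welch degrees of freedom (about 14.9 here), so its P-value should come out slightly below the 0.009081 obtained with df = 14.

```python
from scipy import stats

# CTS group first, so 'greater' means mu_CTS > mu_normal
res = stats.ttest_ind_from_stats(2.57, 0.82, 10,   # CTS: mean, sd, n
                                 1.7, 0.52, 7,     # normal: mean, sd, n
                                 equal_var=False, alternative='greater')
print("T value = %0.2f, P-Value = %0.6f" % (res.statistic, res.pvalue))
print("Reject H0" if res.pvalue < 0.01 else "Fail to reject H0")
```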


# Question 4

The slant shear test is widely accepted for evaluating the bond of resinous repair materials to concrete; it utilizes cylinder specimens made of two identical halves bonded at 30°. For 12 specimens prepared using wire-brushing, the sample mean shear strength (N/mm²) and sample standard deviation were 18.59 and 1.78, respectively, whereas for 12 hand-chiseled specimens, the corresponding values were 23.75 and 3.99. Does the true average strength appear to be different for the two methods of surface preparation? Test the relevant hypotheses using a significance level of .05. (Give ν to exact integer and t to 2 decimal places.)

In [58]:
n1 = 12
xbar1 = 18.59
s1 = 1.78

n2 = 12
xbar2 = 23.75
s2 = 3.99

alpha = 0.05

# Welch (unequal variances) two-sided test of H0: mu1 - mu2 = 0 versus Ha: mu1 - mu2 != 0
df = ndof_neq(n1,n2,s1,s2)
sm = sem_neq(n1,n2,s1,s2)

tvalue = (xbar1 - xbar2) / sm

print("Degrees of Freedom = %d" % (df))
print ("T value = %0.2f" % tvalue)

tdist = stats.t(df)

pvalue = 2.0*tdist.cdf(-np.abs(tvalue))

tlow = tdist.ppf(alpha/2)
thigh = tdist.ppf(1 - alpha/2)

print("P-Value = %0.6f" % pvalue)
print ("T Critical Values = %0.2f, %0.2f" % (tlow,thigh))

if (pvalue < alpha):
    print ("Reject the null hypothesis ... P-value is less than alpha")
else:
    print ("Fail to reject the null hypothesis ... P-value is greater than alpha")

# Python stats package (uses the unrounded Welch degrees of freedom, so its P-value differs slightly)
print()
print("Python stats package result (unequal variances):")
t, pVal = stats.ttest_ind_from_stats(xbar1,s1,n1,xbar2,s2,n2,equal_var=False)
print ("T value = %0.2f" % t)
print("P-Value = %0.6f" % pVal)

# Now, let us compare with the equal variance (pooled) assumption
df = ndof_eq(n1,n2,s1,s2)
sm = sem_eq(n1,n2,s1,s2)

tvalue = (xbar1 - xbar2) / sm

print()
print("Pooled test (equal variances):")
print("Degrees of Freedom = %d" % (df))
print ("T value = %0.2f" % tvalue)

tdist = stats.t(df)

pvalue = 2.0*tdist.cdf(-np.abs(tvalue))

tlow = tdist.ppf(alpha/2)
thigh = tdist.ppf(1 - alpha/2)

print("P-Value = %0.6f" % pvalue)
print ("T Critical Values = %0.2f, %0.2f" % (tlow,thigh))

if (pvalue < alpha):
    print ("Reject the null hypothesis ... P-value is less than alpha")
else:
    print ("Fail to reject the null hypothesis ... P-value is greater than alpha")

print()
print("Python stats package result (equal variances):")
t, pVal = stats.ttest_ind_from_stats(xbar1,s1,n1,xbar2,s2,n2,equal_var=True)
print ("T value = %0.2f" % t)
print("P-Value = %0.6f" % pVal)

Degrees of Freedom = 15
T value = -4.09
P-Value = 0.000963
T Critical Values = -2.13, 2.13
Reject the null hypothesis ... P-value is less than alpha

Python stats package result (unequal variances):
T value = -4.09
P-Value = 0.000938

Pooled test (equal variances):
Degrees of Freedom = 22
T value = -4.09
P-Value = 0.000483
T Critical Values = -2.07, 2.07
Reject the null hypothesis ... P-value is less than alpha

Python stats package result (equal variances):
T value = -4.09
P-Value = 0.000483
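In the Question 4 output, the Welch and pooled tests give t values of the same magnitude (4.09) and differ only in their degrees of freedom. That is expected: with n1 = n2 = n, the pooled variance is (s1² + s2²)/2, so the pooled SEM is sqrt((s1² + s2²)/2 · 2/n) = sqrt((s1² + s2²)/n), exactly the unequal-variance SEM. The small sketch below checks this numerically; the function names are hypothetical and only mirror `sem_neq` and `sem_eq`.

```python
import math

def sem_welch(n1, n2, s1, s2):
    # Unequal-variance standard error of xbar1 - xbar2
    return math.sqrt(s1**2 / n1 + s2**2 / n2)

def sem_pooled(n1, n2, s1, s2):
    # Pooled-variance standard error of xbar1 - xbar2
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    return math.sqrt(sp2 * (1.0 / n1 + 1.0 / n2))

# Question 4 statistics: equal sample sizes, so the two standard errors agree
print(sem_welch(12, 12, 1.78, 3.99), sem_pooled(12, 12, 1.78, 3.99))

# Question 3 statistics: unequal sample sizes, so they generally differ
print(sem_welch(7, 10, 0.52, 0.82), sem_pooled(7, 10, 0.52, 0.82))
```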


# Question 5

Consider the accompanying data on breaking load (kg/25 mm width) for various fabrics in both an unabraded condition and an abraded condition. With D = unabraded load minus abraded load, use the paired t test to test $H_0: \mu_D = 0$ versus $H_a: \mu_D > 0$ at significance level .01. (Give answers accurate to 2 decimal places.)

In [59]:
u = np.array([31.6, 55.0, 56.5, 38.7, 41.1, 48.8, 27.0, 49.8])  # unabraded
a = np.array([28.7, 20.0, 48.9, 34.5, 39.2, 52.5, 26.9, 46.5])  # abraded

diff = u - a
mu = 0
alpha = 0.01

df = len(diff) - 1
tdist = stats.t(df)

# Upper-tailed test: all of alpha sits in the upper tail
tcritical = tdist.ppf(1-alpha)
print ("Critical t-value = %0.2f" % tcritical)

# alternative='greater' returns the one-sided P-value for Ha: mu_D > 0 (SciPy 1.6 or later)
t, pVal = stats.ttest_1samp(diff, mu, alternative='greater')
print ("T-Value = %0.2f, P-Value = %0.3f" % (t,pVal))
if (pVal < alpha):
    print ("Reject the null hypothesis ... P-value is less than alpha")
else:
    print ("Fail to reject the null hypothesis ... P-value is greater than alpha")

Critical t-value = 3.00
T-Value = 1.51, P-Value = 0.087
Fail to reject the null hypothesis ... P-value is greater than alpha
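The paired t test is the same computation as a one-sample t test on the differences, and SciPy also exposes it directly as `ttest_rel`. This sketch confirms that both routes agree for the fabric data above (the `alternative` argument assumes SciPy 1.6 or later):

```python
import numpy as np
from scipy import stats

u = np.array([31.6, 55.0, 56.5, 38.7, 41.1, 48.8, 27.0, 49.8])  # unabraded
a = np.array([28.7, 20.0, 48.9, 34.5, 39.2, 52.5, 26.9, 46.5])  # abraded

# Paired test of Ha: mu_D > 0, with D = unabraded - abraded
paired = stats.ttest_rel(u, a, alternative='greater')
# Equivalent one-sample test on the differences
onesamp = stats.ttest_1samp(u - a, 0.0, alternative='greater')

print("ttest_rel:   t = %0.2f, P = %0.3f" % (paired.statistic, paired.pvalue))
print("ttest_1samp: t = %0.2f, P = %0.3f" % (onesamp.statistic, onesamp.pvalue))
```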


# Question 6

Data on the modulus of elasticity obtained 1 minute after loading in a certain configuration and 4 weeks after loading for the same lumber specimens is presented here.

Calculate and interpret an upper confidence bound for the true average difference between 1-minute modulus and 4-week modulus; first check the plausibility of any necessary assumptions. (Use α = 0.05. Round your answer to the nearest whole number.)

The data for this question is stored in a local file called A4Q6.csv

In [60]:
import pandas as pd

df = pd.read_csv('A4Q6.csv')
df.head()

|   | Observation | 1 min | 4 weeks | Difference |
|---|---|---|---|---|
| 0 | 1 | 10424 | 9352 | 1072 |
| 1 | 2 | 16620 | 13250 | 3370 |
| 2 | 3 | 17300 | 14720 | 2580 |
| 3 | 4 | 15137 | 12386 | 2751 |
| 4 | 5 | 12970 | 10120 | 2850 |


In [61]:
diff = df.Difference
xbar = diff.mean()
sem = stats.sem(diff)
dof = len(diff)-1

alpha = 0.05
# A one-sided 95% upper bound equals the upper endpoint of a two-sided 90% (1 - 2*alpha) interval
cl = 1 - 2*alpha

c_interval = stats.t.interval(cl,dof,loc=xbar,scale=sem)
print ("%0.0f" % c_interval[1])

2774
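The cell above relies on the identity that a one-sided 100(1 - α)% upper bound, x̄ + t(α, n-1)·s/√n, equals the upper endpoint of a two-sided 100(1 - 2α)% interval. The sketch below checks that identity. Because the full A4Q6.csv data is not reproduced here, it uses only the five differences printed by `df.head()`, purely for illustration, so its bound is not the answer to the question. It also runs a Shapiro-Wilk test as one rough way to check the normality assumption the question asks about (a normal probability plot is the other common choice).

```python
import numpy as np
from scipy import stats

# Only the five differences shown by df.head(); illustration, not the full data set
d = np.array([1072, 3370, 2580, 2751, 2850], dtype=float)

alpha = 0.05
n = len(d)
xbar = d.mean()
se = stats.sem(d)  # s / sqrt(n), with ddof=1

# One-sided upper bound: xbar + t_{alpha, n-1} * s / sqrt(n)
upper_one_sided = xbar + stats.t.ppf(1 - alpha, n - 1) * se

# Upper endpoint of the two-sided 100(1 - 2*alpha)% interval
upper_two_sided = stats.t.interval(1 - 2 * alpha, n - 1, loc=xbar, scale=se)[1]

print("%.1f %.1f" % (upper_one_sided, upper_two_sided))

# Rough normality check (Shapiro-Wilk has little power for tiny samples)
print("Shapiro-Wilk P-value = %.3f" % stats.shapiro(d)[1])
```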


# Question 7

Give as much information as you can about the P-value of the F test in each of the following situations. (Give answers accurate to 3 decimal places.)

(a) ν1 = 5, ν2 = 10, upper-tailed test, f = 3.33

(b) ν1 = 5, ν2 = 10, upper-tailed test, f = 10.48

(c) ν1 = 5, ν2 = 10, two-tailed test, f = 3.33

(d) ν1 = 5, ν2 = 10, lower-tailed test, f = 5.64

(e) ν1 = 40, ν2 = 20, upper-tailed test, f = 3.86

In [62]:
def fpvalue(fvalue,dof1,dof2,test):
    """Print the P-value of an F test with dof1 (numerator) and dof2 (denominator) degrees of freedom."""
    fdist = stats.f(dof1,dof2)

    if test == "upper":
        # P(F >= f), whether f is above or below 1
        pvalue = fdist.sf(fvalue)
    elif test == "lower":
        # P(F <= f)
        pvalue = fdist.cdf(fvalue)
    elif test == "two":
        # Double the smaller of the two tail areas
        pvalue = 2.0*min(fdist.cdf(fvalue), fdist.sf(fvalue))
    else:
        raise ValueError("test must be 'upper', 'lower' or 'two'")

    print ("Pvalue = %0.3f" % (pvalue))

In [63]:
fpvalue(fvalue= 3.33, dof1= 5, dof2= 10, test= "upper")

fpvalue(fvalue= 10.48, dof1= 5, dof2= 10, test= "upper")

fpvalue(fvalue= 3.33, dof1= 5, dof2= 10, test= "two")

fpvalue(fvalue= 5.64, dof1= 5, dof2= 10, test= "lower")

fpvalue(fvalue= 3.86, dof1= 40, dof2= 20, test= "upper")




Pvalue = 0.050
Pvalue = 0.001
Pvalue = 0.100
Pvalue = 0.990
Pvalue = 0.001


# Question 8

As the population ages, there is increasing concern about accident-related injuries to the elderly. An article reported on an experiment in which the maximum lean angle—the furthest a subject is able to lean and still recover in one step—was determined for both a sample of younger females (21-29 years) and a sample of older females (67-81 years). The following observations are consistent with summary data given in the article.

YF: 34, 26, 32, 36, 36, 32, 35, 35, 36, 31

OF: 17, 15, 22, 12, 15

Carry out a test at significance level .10 to see whether the population standard deviations for the two age groups are different (normal probability plots support the necessary normality assumption). (Give answer accurate to 2 decimal places.)

In [64]:
yf = np.array([34, 26, 32, 36, 36, 32, 35, 35, 36, 31])
of = np.array([17, 15, 22, 12, 15])

In [65]:
n1 = len(yf)
n2 = len(of)

dof1 = n1-1
dof2 = n2-1

s1 = yf.std(ddof=1)
s2 = of.std(ddof=1)

xbar1 = yf.mean()
xbar2 = of.mean()

fvalue = s1**2/s2**2

print(fvalue)

0.7307380373073804
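The cell above stops at the F statistic. One way to finish the two-sided test at α = 0.10 is sketched below: the P-value doubles the smaller tail area of the F distribution with (n1 - 1, n2 - 1) = (9, 4) degrees of freedom, and H0 (equal population standard deviations) is rejected when P < α. The data arrays are copied from the cell above.

```python
import numpy as np
from scipy import stats

yf = np.array([34, 26, 32, 36, 36, 32, 35, 35, 36, 31])  # younger females
of = np.array([17, 15, 22, 12, 15])                      # older females
alpha = 0.10

# F statistic: ratio of sample variances (ddof=1 gives s^2)
fvalue = yf.var(ddof=1) / of.var(ddof=1)
fdist = stats.f(len(yf) - 1, len(of) - 1)

# Two-sided P-value: double the smaller tail, capped at 1
pvalue = min(1.0, 2.0 * min(fdist.cdf(fvalue), fdist.sf(fvalue)))

print("f = %0.2f, P-Value = %0.3f" % (fvalue, pvalue))
print("Reject H0" if pvalue < alpha else "Fail to reject H0")
```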
