In [60]:
from scipy.stats import norm, t
import numpy as np

<hr style="color: #009933; border: solid 1px">
<span style="color: #009933;">unknown std, $s$</span>

# <span style="color: #2455C3">Eating with friends</span>

You and your friends want to go out to eat, but you don't want to pay a lot. You decide to go to either Gettysburg or Wilma. You look online and find the average meal prices at 18 restaurants in Gettysburg and 14 restaurants in Wilma. You want to know whether there is a statistically significant difference between the meal prices in these two areas of town.

## <span style="color: #85100F">Solution</span>

#### <span style="color: #85100F">1. Stating the hypothesis</span>

$$\begin{array}{cl}
H_0: & \mu_G = \mu_W\\
 & \text{there is no significant difference in prices}\\
H_A: & \mu_G \neq \mu_W\\
 & \text{there is a significant difference in prices}
\end{array}$$

#### <span style="color: #85100F;">2. Analyzing sample data</span>

In [61]:
# known data
# two-sample two-tailed test
alpha = 0.05

In [74]:
# samples - restaurant meal prices in each area
gettysburg = np.array([9,5,6,11,8,5,7,13,12,13,9,8,10,6,11,9,7,12])
wilma = np.array([11,10,12,9,8,13,14,15,12,11,13,8,9,11])

In [75]:
# sample mean, sample standard deviation (ddof=1), and sample size
xbarG = gettysburg.mean()
sG = gettysburg.std(ddof=1)
nG = len(gettysburg)
print("xbarG={:0.3f},  sG={:0.3f}, nG={:0.3f}".format(xbarG, sG, nG))
xbarW = wilma.mean()
sW = wilma.std(ddof=1)
nW = len(wilma)
print("xbarW={:0.3f}, sW={:0.3f}, nW={:0.3f}".format(xbarW, sW, nW))

xbarG=8.944,  sG=2.645, nG=18.000
xbarW=11.143, sW=2.179, nW=14.000


In [90]:
# pooled variance
SSG = sum((gettysburg - gettysburg.mean())**2)
SSW = sum((wilma - wilma.mean())**2)
Sp_q = (SSG+SSW)/(nG+nW-2)
Sp_q

6.0219576719576731

In [98]:
np.sqrt(Sp_q)

2.4539677406106368

In [91]:
SE = np.sqrt(Sp_q/nG + Sp_q/nW)
SE

0.87446728795816642
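
For reference, the two cells above follow the usual pooled-variance formulas. They are written out here with the sample values; the sums of squares are rounded to two decimals, so the last digits are approximate:

$$\begin{array}{rcl}
S_p^2 & = & \dfrac{SS_G + SS_W}{n_G + n_W - 2} = \dfrac{118.94 + 61.71}{18 + 14 - 2} \approx 6.022\\
SE & = & \sqrt{\dfrac{S_p^2}{n_G} + \dfrac{S_p^2}{n_W}} = \sqrt{\dfrac{6.022}{18} + \dfrac{6.022}{14}} \approx 0.874
\end{array}$$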

#### <span style="color: #85100F;">3. Test statistic calculation</span>

In [92]:
t_score = (xbarG-xbarW) / SE
t_score

-2.5140022144749081

#### <span style="color: #85100F;">4. Critical point determination</span>

In [93]:
dof = nG + nW - 2
# lower-tail critical value (the observed t_score is negative)
t_critical = t.ppf(alpha/2, dof)
t_critical

-2.0422724563012382
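
The decision rule for a two-tailed test can be sketched as a comparison of absolute values. This is a minimal sketch, assuming the rounded test statistic from step 3; the names `t_upper` and `reject_null` are illustrative and not part of the original notebook.

```python
from scipy.stats import t

alpha = 0.05
dof = 30           # nG + nW - 2 = 18 + 14 - 2
t_score = -2.514   # rounded value from step 3 (assumption: rounding is harmless here)

# Upper critical value; by symmetry the lower one is -t_upper
t_upper = t.ppf(1 - alpha / 2, dof)

# Two-tailed rule: reject H0 when |t| exceeds the critical value
reject_null = abs(t_score) > t_upper
print("|t| = {:0.3f}, critical = {:0.3f}, reject H0: {}".format(
    abs(t_score), t_upper, reject_null))
```

Comparing absolute values avoids having to pick the lower or upper critical value depending on the sign of the test statistic.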

#### <span style="color: #85100F;">5. Results interpretation</span>

<span style="color: #009933;">The <b>null hypothesis is rejected with $p < 0.05$</b>, which means that <b>there is a significant difference between the meal prices</b></span>

1) <u>Descriptive statistics:</u>
    $$\begin{array}{ccc}
    \bar{x}_G & = & 8.944\\
    \bar{x}_W & = & 11.143\\
    s_G & = & 2.645 \\
    s_W & = & 2.179 \\
    S_p & = & 2.45
    \end{array}$$

2) <u>Inferential statistics:</u>

t(30)=-2.51, p=.018, two-tailed<br>
Confidence interval on the mean difference in meal prices between Gettysburg and Wilma<br>
95% CI = (-3.98 to -0.41)

3) <u>Effect size measures:</u>

* Cohen's d = -0.90
* R^2 = .17

So <b>17%</b> of the <b>variability in meal prices</b> can be <b>attributed to the area</b>.

In [94]:
# P-value
P_value = 2*t.cdf(t_score, dof)
print("P-value = {:0.3f}".format(P_value))

P-value = 0.018
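
As a cross-check on the manual calculation, the same pooled-variance test is available as `scipy.stats.ttest_ind`. This is a sketch, not part of the original solution; the names `t_check` and `p_check` are illustrative, and the arrays are repeated so the cell runs on its own.

```python
import numpy as np
from scipy.stats import ttest_ind

gettysburg = np.array([9, 5, 6, 11, 8, 5, 7, 13, 12, 13, 9, 8, 10, 6, 11, 9, 7, 12])
wilma = np.array([11, 10, 12, 9, 8, 13, 14, 15, 12, 11, 13, 8, 9, 11])

# equal_var=True requests the pooled-variance (Student) two-sample t-test,
# which is the test computed by hand above; the p-value is two-tailed
t_check, p_check = ttest_ind(gettysburg, wilma, equal_var=True)
print("t = {:0.3f}, p = {:0.3f}".format(t_check, p_check))
```

The values should agree with `t_score` and `P_value` computed step by step.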


In [95]:
# CI
t_char = t.ppf(1-alpha/2, dof)
me = t_char * SE
print("95% CI = ({:0.2f} to {:0.2f})".format((xbarG-xbarW)-me, (xbarG-xbarW)+me))

95% CI = (-3.98 to -0.41)


In [96]:
# Cohen's d
d = (xbarG-xbarW) / np.sqrt(Sp_q)
print("d = {:0.2f}".format(d))

d = -0.90


In [97]:
# R^2
r_squared = t_score**2 / (t_score**2 + (dof))
print("R^2 = {:0.2f}".format(r_squared))

R^2 = 0.17
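
Written out with the rounded sample values, the two effect size measures computed above are:

$$\begin{array}{rcl}
d & = & \dfrac{\bar{x}_G - \bar{x}_W}{S_p} = \dfrac{8.944 - 11.143}{2.454} \approx -0.90\\
R^2 & = & \dfrac{t^2}{t^2 + df} = \dfrac{(-2.514)^2}{(-2.514)^2 + 30} \approx 0.17
\end{array}$$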


# <span style="color: #2455C3">ADHD good behaviors</span>

A researcher examined the effects of two different incentives to improve behavior in two groups of boys diagnosed with ADHD. The researcher randomly assigned 10 boys to each group. The boys earned points for every time they engaged in good behavior (e.g., raising a hand to ask a question, helping a fellow student, completing assigned work). Points could be exchanged for different incentives. For one group, the points could be exchanged for extra time at recess. For the other group, the points could be exchanged for prizes from the school gift shop (e.g., pencils, small toys). The researcher measured the number of good behaviors in a single 20-minute class period. The mean for the recess group was 10 and the mean for the prize group was 7. The standard error was 0.94 and the pooled standard deviation was 2.33.

## <span style="color: #85100F">Solution</span>

#### <span style="color: #85100F">1. Stating the hypothesis</span>

$$\begin{array}{cl}
H_0: & \mu_{\text{recess}} = \mu_{\text{prize}}\\
 & \text{there will not be a difference in the number}\\
 & \text{of good behaviors between the two incentive conditions}\\
H_A: & \mu_{\text{recess}} \neq \mu_{\text{prize}}
\end{array}$$

#### <span style="color: #85100F;">2. Analyzing sample data</span>

In [46]:
# known data
# two-sample two-tailed test
alpha = 0.05
# recess -> 1
xbar1 = 10.0
n1 = 10
# prize -> 2
xbar2 = 7.0
n2 = 10
SE = 0.94
Sp = 2.33

#### <span style="color: #85100F;">3. Test statistic calculation</span>

In [45]:
t_score = (xbar1-xbar2) / SE
t_score

3.191489361702128

#### <span style="color: #85100F;">4. Critical point determination</span>

In [43]:
dof = n1 + n2 - 2
t_critical = t.ppf(1-alpha/2, dof)
t_critical

2.1009220402409601

#### <span style="color: #85100F;">5. Results interpretation</span>

<span style="color: #009933;">The <b>null hypothesis is rejected with $p < 0.01$</b>, which means that <b>there is a significant difference in the number of good behaviors between the two incentive conditions</b>. Because the recess group had the higher mean and the effect is large ($d=1.29$), <b>recess is the more effective incentive</b>.</span>

1) <u>Descriptive statistics:</u>
    $$\begin{array}{lcl}
    \bar{x}_{\text{recess}} & = & 10.0\\
    \bar{x}_{\text{prize}} & = & 7.0\\
    S_P & = & 2.33
    \end{array}$$

2) <u>Inferential statistics:</u>

t(18)=3.19, p=.005, two-tailed<br>
Confidence interval on the mean difference in good behaviors between the recess incentive and the prize incentive<br>
95% CI = (1.03 to 4.97)

3) <u>Effect size measures:</u>

* Cohen's d = 1.29
* R^2 = .36

So <b>36%</b> of the <b>variability in the number of good behaviors</b> can be <b>attributed to the incentive condition</b>.

In [54]:
# P-value
P_value = 2*t.sf(t_score, dof)
print("P-value = {:0.3f}".format(P_value))

P-value = 0.005


In [55]:
# CI
t_char = t.ppf(1-alpha/2, dof)
me = t_char * SE
print("95% CI = ({:0.2f} to {:0.2f})".format((xbar1-xbar2)-me, (xbar1-xbar2)+me))

95% CI = (1.03 to 4.97)


In [56]:
# Cohen's d
d = (xbar1-xbar2) / Sp
print("d = {:0.2f}".format(d))

d = 1.29


In [57]:
# R^2
r_squared = t_score**2 / (t_score**2 + (dof))
print("R^2 = {:0.2f}".format(r_squared))

R^2 = 0.36
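
Both problems repeat the same steps once the mean difference, standard error, pooled standard deviation, and degrees of freedom are known. Those steps could be collected into one helper, as in the sketch below. The function `summarize_two_sample` and the `Summary` tuple are hypothetical names introduced only for illustration; the formulas are the ones used in the cells above.

```python
from collections import namedtuple
from scipy.stats import t

# Hypothetical container for the reported quantities
Summary = namedtuple("Summary", "t_score p_value ci_low ci_high cohen_d r_squared")

def summarize_two_sample(mean_diff, se, pooled_sd, dof, alpha=0.05):
    """Two-tailed, two-sample t-test summary from precomputed statistics."""
    t_score = mean_diff / se
    # Two-tailed p-value; abs() makes it work for either sign of t
    p_value = 2 * t.sf(abs(t_score), dof)
    # Margin of error for the (1 - alpha) confidence interval
    margin = t.ppf(1 - alpha / 2, dof) * se
    cohen_d = mean_diff / pooled_sd
    r_squared = t_score**2 / (t_score**2 + dof)
    return Summary(t_score, p_value, mean_diff - margin, mean_diff + margin,
                   cohen_d, r_squared)

# ADHD example: recess mean 10, prize mean 7, SE 0.94, pooled SD 2.33, dof 18
adhd = summarize_two_sample(mean_diff=10.0 - 7.0, se=0.94, pooled_sd=2.33, dof=18)
print(adhd)
```

Using `t.sf(abs(t_score), dof)` keeps the p-value calculation identical for positive and negative test statistics, so the same helper covers the restaurant example as well.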
