# Homework 1: Review of Inferential Statistics & Introduction to Model Comparison

## PSYC5501: Experimental Design & Statistics

### Seung Kim, 29 September 2020


**1. A set of reading scores for fourth grade students has a mean of 25 and a standard deviation of 5. A set of scores for ninth grade students has a mean of 30 and a standard deviation of 10. Assume that the distributions are normal.**


**A. What percentage of the fourth graders score better than the average ninth grader?**
    

**Solution.** The average reading score for fourth graders is 25 with a standard deviation of 5; the average score for ninth graders is 30, which lies exactly one standard deviation above the mean of the fourth graders' distribution. This means that the fourth graders with $z > 1$ scored better than the average ninth grader.

Or, more mechanically,

$$z = \frac{30-25}{5} = 1$$

Looking at the z-table, we see that the area of the right tail ('better than') bounded by $z = 1$ is 0.1587. Therefore, the percentage of fourth graders who score better than the average ninth grader is 15.9%. We can use R to double-check:

In [1]:
pnorm(30, mean=25, sd=5, lower.tail=FALSE) 

**B. What percentage of ninth graders score worse than the average fourth grader?**

**Solution.** The average reading score for ninth graders is 30 with a standard deviation of 10; the average score for fourth graders is 25. A score of 25 lies half a standard deviation below the mean of the ninth graders' distribution. This means that the ninth graders with $z < -0.5$ scored worse than the average fourth grader.

More mechanically:

$$z = \frac{25-30}{10} = -0.5$$

Looking at the z-table again, we see that the area of the left tail ('worse than') bounded by $z = -0.5$ is 0.3085. Therefore, the percentage of ninth graders who score worse than the average fourth grader is 30.9%. We can again use R to double-check:

In [2]:
pnorm(25, mean=30, sd=10, lower.tail=TRUE)
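
The same two tail areas can also be obtained by standardizing first and then looking up the standard normal, which mirrors the z-table route. This is only a minimal sketch; the variable names (`z_a`, `z_b`, `p_a`, `p_b`) are illustrative and not part of the assignment.

```r
# Standardize each comparison score, then take the tail area on the standard normal
z_a <- (30 - 25) / 5    # average ninth grader on the fourth-grade scale
z_b <- (25 - 30) / 10   # average fourth grader on the ninth-grade scale

p_a <- pnorm(z_a, lower.tail = FALSE)  # upper tail: fourth graders scoring better
p_b <- pnorm(z_b)                      # lower tail: ninth graders scoring worse

round(c(part_A = p_a, part_B = p_b), 4)
```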

**2. Describe the relationships among a population distribution, a sample distribution, and a sampling distribution. What does the Central Limit Theorem tell us about the sampling distribution for a given population?**

**Solution.** 

* The **population distribution** is the distribution of every individual score in the population under study. The population distribution is rarely available in full, and may take many shapes (though most that are relevant to social sciences are assumed to be normal).
* The **sample distribution** is the distribution of individual scores within *one* sample. Given a large enough sample size, the sample's shape and statistics tend to mirror the population's shape and parameters, including mean and variance. Samples always come with sampling error, however: unless all scores in the population are the same or our sample is equal to the population (we somehow manage to get every single individual), the sample statistics will almost always differ from the population parameters.
* The **sampling distribution** is the distribution of a statistic gathered from multiple samples of the same size drawn from a population. We are most concerned with the sampling distribution of the mean (the distribution of means of repeated samples from the same population) because, according to the **Central Limit Theorem**, it approaches a normal distribution as the sample size increases, regardless of the shape of the population distribution (provided the population has finite variance). Its mean equals the population mean $\mu$, and its standard deviation (the standard error) is $\sigma/\sqrt{n}$. This is advantageous because, given a large enough sample size, we can use the normality of the sampling distribution to make inferences about the probability of obtaining a sample mean even if the population is non-normal.
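
A small simulation can make the Central Limit Theorem concrete. The sketch below assumes a hypothetical skewed population (exponential with mean 1 and SD 1) that is not part of the assignment; the seed, the number of replications, and the helper names `draw_sample_mean` and `skewness` are likewise arbitrary choices for illustration.

```r
set.seed(1)  # arbitrary seed, for reproducibility only

# Hypothetical skewed population: exponential with mean 1 and SD 1
draw_sample_mean <- function(n) mean(rexp(n, rate = 1))

# Simple moment-based skewness (0 for a symmetric distribution)
skewness <- function(x) mean(((x - mean(x)) / sd(x))^3)

# Sampling distributions of the mean for two sample sizes (10,000 samples each)
means_n5  <- replicate(10000, draw_sample_mean(5))
means_n50 <- replicate(10000, draw_sample_mean(50))

# Centers stay near the population mean; spread shrinks roughly like 1/sqrt(n);
# skewness moves toward 0 as n grows
rbind(
  n5  = c(mean = mean(means_n5),  sd = sd(means_n5),  skew = skewness(means_n5)),
  n50 = c(mean = mean(means_n50), sd = sd(means_n50), skew = skewness(means_n50))
)
```

Calling `hist(means_n5)` and `hist(means_n50)` afterwards should show the second histogram looking noticeably more bell-shaped than the first.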

**3. A recently admitted class of graduate students at a large state university has a mean GRE verbal score of 650 with a standard deviation of 50. The scores are reasonably normally distributed. Five students have parents who happen to be on the board of trustees, and these students were admitted with a mean GRE score of 550. Should the local newspaper editor write a scathing editorial about favoritism? Assume that you are only interested in whether this group of students has a lower GRE than the university average.**

**Solution.** We would like to see whether the five well-connected students belong to the population of recently admitted graduate students or a different population with a lower GRE average. We can articulate the hypotheses as follows:

* $H_0: \mu = 650$
* $H_1: \mu < 650$ (not $\neq$ because we're only concerned whether the sample is lower)

~~Since we know both the population mean and standard deviation, we can perform a $z$-test; and since you specified that we only care whether this group has a lower GRE than the average, we go with a one-tailed test.~~

~~Even from just looking at the facts, we can see that the sample mean of 550 is two standard deviations lower than the population mean of 650. We can calculate the $z$-score mechanically just to be thorough: $z = \frac{550-650}{50} = -2$.~~

We need to use the standard error here, because we are testing the mean of a sample of five students rather than a single student's score. The sampling distribution of the mean for $n=5$ has a standard error of $\sigma/\sqrt{n} = 50/\sqrt{5} \approx 22.36$, so

$$z = \frac{550-650}{50/\sqrt{5}} \approx -4.47$$

This $z$ is beyond the range of most $z$-tables; the area of the lower tail (the $p$-value) bounded by $z=-4.47$ is less than $0.0001$. This means that, if the five students came from the $\mu=650$ population, a sample mean as low as or lower than 550 would occur with a probability of less than 0.01%. That is far more extreme than the (arbitrary) $\alpha=0.05$ threshold, and it would remain so even if we were to increase the rigor by doing a two-tailed test ($\alpha/2 = 0.025$). So, we have a solid basis for rejecting $H_0$: the five well-connected students are from a population with a mean that is lower than 650. The local newspaper should go ham on them.
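
Because the question concerns the mean of five students, the test statistic divides by the standard error $\sigma/\sqrt{n}$ rather than by $\sigma$ itself. The sketch below works through that calculation in R; the variable names are illustrative only.

```r
mu_0       <- 650   # population mean under H0
sigma      <- 50    # population standard deviation
n_students <- 5     # size of the well-connected group
x_bar      <- 550   # observed sample mean

se_mean <- sigma / sqrt(n_students)   # standard error of the mean
z_obs   <- (x_bar - mu_0) / se_mean   # z-statistic for the sample mean
p_lower <- pnorm(z_obs)               # one-tailed (lower-tail) p-value

c(se = se_mean, z = z_obs, p = p_lower)
```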

**4. Provide a precise definition of the conditional probability that you obtained in #3.**

**Solution.** The $p$-value we obtained in #3 is the probability of observing a sample mean of 550 or less in a sample of five students, given that the population from which the sample was collected has a mean of 650 (i.e., given that $H_0$ is true).

$$p = P(\bar{X}\leq550 \mid \mu=650)$$

**5. Can repressed anger lead to higher blood pressure? In a hypothetical study, 6 college students with very high repressed anger scores (derived from a series of questionnaires taken in an introductory psychology class) are called in to have their blood pressure measured. The systolic blood pressure readings for the 6 students are 115, 118, 127, 129, 135, and 126. If the mean systolic blood pressure in the population is 120, can you conclude that repressed anger is associated with higher blood pressure? Carry out a one-sample t-test to address this question. Use α=.05 and conduct a two-tailed test. Be sure to state the null and alternative hypotheses.**

**Solution.** We begin by articulating our hypotheses. We want to find out whether students with very high repressed anger scores have higher blood pressures than the general population of students. 

* $H_0: \mu = 120$ 
* $H_1: \mu \neq 120$ (not $>$ because this is a two-tailed test)

Then we can carry out a one-sample, two-tailed t-test as follows:

In [3]:
X = c(115, 118, 127, 129, 135, 126) # sample

In [4]:
mu = 120                 # population mean
s = sd(X)                # StDev of sample
se = s/sqrt(length(X))   # SE of sampling distribution

In [5]:
t = (mean(X) - mu)/se
t

Since $\alpha = 0.05$, $\text{df}=n-1=5$, and we are conducting a two-tailed test, our critical value is $t_{0.025, 5} = 2.571$. That is, we consider the mean sample blood pressure to be significantly different from the mean population blood pressure if our observed $t > 2.571$ or $t < -2.571$. Since $t=1.667$ does not satisfy either of these criteria, we fail to reject $H_0$: this sample does not give us sufficient evidence to conclude that repressed anger is associated with higher blood pressure.
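
The critical value does not have to come from a printed table; `qt` returns it directly. The sketch below repeats the decision rule in code, re-entering the six readings so that it runs on its own (the names `alpha`, `t_crit`, `t_obs`, and `p_two` are illustrative).

```r
X     <- c(115, 118, 127, 129, 135, 126)  # the six blood pressure readings
alpha <- 0.05

# Two-tailed test: put alpha/2 in each tail
t_crit <- qt(1 - alpha / 2, df = length(X) - 1)

# Observed t-statistic against the population mean of 120
t_obs <- (mean(X) - 120) / (sd(X) / sqrt(length(X)))

# Reject H0 only if |t_obs| exceeds the critical value
abs(t_obs) > t_crit

# Two-tailed p-value computed from the t distribution
p_two <- 2 * pt(abs(t_obs), df = length(X) - 1, lower.tail = FALSE)
c(t_crit = t_crit, t_obs = t_obs, p = p_two)
```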

We can confirm using R's built-in `t.test` function. The $p$-value (0.1565) is less extreme than $\alpha =0.05$:

In [6]:
t.test(X, mu=120)


	One Sample t-test

data:  X
t = 1.6667, df = 5, p-value = 0.1565
alternative hypothesis: true mean is not equal to 120
95 percent confidence interval:
 117.2883 132.7117
sample estimates:
mean of x 
      125 


**6. Use the general linear model approach to answer the same question that is asked in #5. Be sure to fully write out the full and restricted models, and show all of your steps.**

**Solution.** The GLM approach compares two models, one 'complex' (full) and one 'simple' (restricted), to see whether the simpler model has a substantially worse fit than the complex one. The full model estimates the mean from the data, which costs one parameter, while the restricted model fixes the mean at the value specified by $H_0$ and estimates nothing (*I think it would be nice to have this explained more*).

* Full: $Y_i = \bar{Y} + \epsilon_i$
* Restricted: $Y_i = \mu + \epsilon_i$


We have six scores: 115, 118, 127, 129, 135, and 126, of which the sample mean is 125. The population mean is 120. So, we can set up the following complex (full) and simple (restricted) models:

* Full: $Y_i = 125 + \epsilon_i$
* Restricted: $Y_i = 120 + \epsilon_i$

The goal is to see whether these two differ significantly enough. To do this, we need the sums of individual squared errors from both models and their degrees of freedom. First, let's get the sums:

In [7]:
# Full model
e_F = X-mean(X)
E_F = sum(e_F^2)
E_F

In [8]:
# Restricted model
e_R = X-mu
E_R = sum(e_R^2)
E_R

So the sum of squared errors is 270 for the full model and 420 for the restricted model. Now, let's get the degrees of freedom:

In [9]:
n = length(X)
df_F = n-1   # full model estimates one parameter (the mean)
df_R = n     # restricted model estimates no parameters
cat('Full DF:', df_F, '\n') 
cat('Restricted DF:',df_R)

Full DF: 5 
Restricted DF: 6

Now we have all the pieces we need to calculate the $F$-statistic. We'll calculate according to the following formula:

$$F = \frac{(E_R-E_F)/(df_R-df_F)}{E_F/df_F}$$

In [10]:
num = (E_R-E_F)/(df_R-df_F)
denom = (E_F/df_F)
F = num/denom
F

OK, so we have $F=2.778$. Since $df_{num}=1$ and $df_{denom}=5$, we look at the $F$-table to find $F_{crit}(1, 5)=6.6079$. Our observed $2.778$ falls well short of this critical value. So we do not have the evidence to claim that the more complex model performs significantly better than the simpler model. Just to check, we can calculate the $p$-value for $F=2.778$:

In [11]:
pf(F, df_R-df_F, df_F, lower.tail = FALSE)

This is indeed larger than $\alpha=0.05$, and equals the $p$-value we obtained in #5.
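
The critical value quoted from the $F$-table can also be computed with `qf`. The sketch below is self-contained: it plugs in the sums of squared errors reported above (420 and 270) instead of relying on earlier cells, and the names `F_crit` and `F_obs` are illustrative.

```r
# Upper 5% point of the F distribution with 1 and 5 degrees of freedom
F_crit <- qf(0.95, df1 = 1, df2 = 5)

# Observed F from the sums of squared errors reported above
F_obs <- ((420 - 270) / (6 - 5)) / (270 / 5)

c(F_crit = F_crit, F_obs = F_obs)
F_obs > F_crit   # reject the restricted model only if TRUE

# With one numerator df, the F critical value is the squared two-tailed t critical value
all.equal(F_crit, qt(0.975, df = 5)^2)
```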

**7. How does the test statistic you computed in #6 relate to the one you computed in #5?**

**Solution.** The $F$-statistic in #6 is the square of the $t$-statistic in #5. We can quickly check to see that the square root of the $F$-statistic is equal to the $t$-statistic we observed (1.667):

In [12]:
all.equal(t, sqrt(F)) # checks out!
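
The relationship is not a coincidence of these particular numbers. A short derivation, writing $\mu_0$ for the hypothesized mean (120 here; the subscript is notation added for this sketch), shows why. The restricted model's error splits into the full model's error plus a term for the distance between the sample mean and $\mu_0$:

$$E_R = \sum_{i=1}^{n}(Y_i-\mu_0)^2 = \sum_{i=1}^{n}(Y_i-\bar{Y})^2 + n(\bar{Y}-\mu_0)^2 = E_F + n(\bar{Y}-\mu_0)^2$$

Since $df_R - df_F = 1$ and $E_F/df_F = E_F/(n-1) = s^2$, the $F$-statistic becomes

$$F = \frac{(E_R-E_F)/(df_R-df_F)}{E_F/df_F} = \frac{n(\bar{Y}-\mu_0)^2}{s^2} = \left(\frac{\bar{Y}-\mu_0}{s/\sqrt{n}}\right)^2 = t^2$$

With our data, $n(\bar{Y}-\mu_0)^2 = 6 \times 5^2 = 150$ and $s^2 = 270/5 = 54$, so $F = 150/54 \approx 2.778 = (5/3)^2$.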

**8. Use R to carry out the analyses from #5 and #6. Paste your code and your output.**

**Solution.** See #5 and #6. 
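
As an optional cross-check that goes beyond what the question asks, the same test can be run through `lm`, which fits the general linear model directly. This sketch assumes the usual trick of subtracting the hypothesized mean so that the intercept tests $\mu = 120$; the names `y` and `fit` are illustrative.

```r
X  <- c(115, 118, 127, 129, 135, 126)  # the six blood pressure readings
mu <- 120                              # hypothesized population mean

# Shift the scores so that H0 corresponds to an intercept of zero
y   <- X - mu
fit <- lm(y ~ 1)   # intercept-only model

# The intercept row holds the estimate (mean(X) - mu), its SE, t, and two-tailed p
coef(summary(fit))
```

The intercept's t value should match the one-sample $t$ from #5, and its square should match the $F$ from #6.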

**9. A researcher is interested in assessing the effectiveness of a prenatal care intervention on newborns’ birthweight. Adolescent pregnant women – who tend to have low birthweight infants – are identified and invited to participate in an experiment. Those who wish to participate are randomly assigned to one of two groups: an experimental group or a control group. Women in the experimental group participate in the prenatal care program, whereas women in the control group do not. After their babies were born, the following data were collected (DV = Newborn’s birthweight in lbs):**

In [13]:
Experimental <- c(5.6, 7.7, 8.1, 7.6, 8.8, 7.5, 6.6, 8.4, 7.2, 7.5)
Control <- c(5.6, 6.4, 5.6, 5.9, 6.6, 7.4, 6.4, 7.0, 4.8, 4.3)
Infants <- data.frame(Experimental, Control)
Infants

| Experimental | Control |
|---|---|
| 5.6 | 5.6 |
| 7.7 | 6.4 |
| 8.1 | 5.6 |
| 7.6 | 5.9 |
| 8.8 | 6.6 |
| 7.5 | 7.4 |
| 6.6 | 6.4 |
| 8.4 | 7.0 |
| 7.2 | 4.8 |
| 7.5 | 4.3 |


**Using the GLM model comparison approach, test whether the treatment program results in significantly different birthweights for the newborns. You may use R to do all of the calculations (e.g., to calculate means, $E_F$, and $E_R$), but be sure to write out the full and restricted models and show all necessary steps. Show any relevant code and output.**

**Solution.** Okay, the two-group approach is a little different. For the full model, we have to estimate the population mean for each group (so two), but for the restricted model, we assume that both groups come from the same population, and estimate only one population mean. The mean is 7.5 for the Experimental group, 6 for the Control group, and 6.75 for the grand mean. This gives us the following models:

* Full: $Y_{ij} = \mu_j+\epsilon_{ij}$ where $\mu_1=7.5$ and $\mu_2=6$,
* Restricted: $Y_{ij} = 6.75 + \epsilon_{ij}$

So, let's first get the sums of individual squared errors for both models ($E_F$ and $E_R$). 

In [14]:
# Full model
e_F = c((Experimental-mean(Experimental)), (Control-mean(Control)))
E_F = sum(e_F^2)
E_F

In [15]:
# Restricted model
all = c(Experimental, Control)
e_R = all-mean(all)
E_R = sum(e_R^2)
E_R

Then, we can get the degrees of freedom for both models.

In [16]:
N = length(all)
df_F = N-2 # estimated two parameters: through mean(Experimental) and mean(Control)
df_R = N-1 # estimated only one parameter: through mean(all)
cat('Full DF:', df_F, '\n') 
cat('Restricted DF:',df_R)

Full DF: 18 
Restricted DF: 19

Cool. Now we can put them all together!

In [17]:
num = (E_R-E_F)/(df_R-df_F)
denom = (E_F/df_F)
F = num/denom
F

Looking up $F_{crit}(1, 18)$, we obtain 4.4139. Our experimental $F=12.88$ is *much* more extreme than 4.4139. The full model does perform much better than the restricted model. So, we have a pretty solid basis for rejecting the null hypothesis—prenatal care in adolescent mothers does improve newborns' birth weights. Just to check, we can calculate the $p$-value, which is much smaller than $\alpha=0.05$:

In [18]:
pf(F, df_R-df_F, df_F, lower.tail = FALSE)
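
The hand calculation above can be cross-checked by fitting both models with `lm` and comparing them with `anova`. This is a sketch, not the required method for the assignment; the names `birthweight`, `group`, `restricted`, `full`, and `cmp` are illustrative, and the data are re-entered so the block runs on its own.

```r
Experimental <- c(5.6, 7.7, 8.1, 7.6, 8.8, 7.5, 6.6, 8.4, 7.2, 7.5)
Control      <- c(5.6, 6.4, 5.6, 5.9, 6.6, 7.4, 6.4, 7.0, 4.8, 4.3)

# Long format: one outcome vector plus a factor marking group membership
birthweight <- c(Experimental, Control)
group       <- factor(rep(c("Experimental", "Control"), each = 10))

restricted <- lm(birthweight ~ 1)      # one grand mean for everyone
full       <- lm(birthweight ~ group)  # a separate mean for each group

# The RSS column holds E_R and E_F; the second row holds F and its p-value
cmp <- anova(restricted, full)
cmp
```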

**10. Use R to carry out an independent-samples t-test using the data from #9. What is the relationship between the test statistic you computed in #9 and the test statistic you computed in #10? Show any relevant code and output.**

**Solution**. Doing a t-test gives us a $t$-statistic of 3.5891, and gives us the same $p$-value (0.002097) that we obtained in the GLM comparison in #9. 

In [19]:
t.test(Experimental, Control, alternative = "two.sided", var.equal = TRUE)


	Two Sample t-test

data:  Experimental and Control
t = 3.5891, df = 18, p-value = 0.002097
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 0.6219587 2.3780413
sample estimates:
mean of x mean of y 
      7.5       6.0 


The $t$-statistic is still the square root of the $F$-statistic we obtained in #9 (the very small error coming from the significant figures cutoff):

In [20]:
all.equal(3.5891, sqrt(F), tolerance = 1e-4) # tolerance absorbs the rounding of the printed t