# Chapter 11. Null Hypothesis Significance Testing

## Contents
### 11.1 NHST for the bias of a coin
### 11.2 Prior knowledge about the coin
### 11.3 Confidence interval and highest density interval
### 11.4 Multiple comparisons
### 11.5 What a sampling distribution is good for
## -----------------------------------------------------------------------------------------------------------------------------

  In null hypothesis significance testing (NHST), the goal of inference is to decide whether a particular value of a model parameter can be rejected.
  
  For example, we might want to know whether a coin is fair, which in NHST becomes the question of whether we can reject the hypothesis that the bias of the coin has the specific value 0.5.
  To make the logic of NHST concrete, suppose we have a coin that we want to test for fairness. We decide that we will conduct an experiment wherein we flip the coin N = 26 times, and we observe how many times it comes up heads. If the coin is fair, it should usually come up heads about 13 times out of 26 flips. Only rarely will it come up with far fewer or far more than 13 heads.
  Suppose we now conduct our experiment: We flip the coin N = 26 times and we happen to observe z = 8 heads. All we need to do is figure out the probability of getting that few heads if the coin were truly fair. If the probability of getting so few heads is sufficiently tiny, then we doubt that the coin is truly fair.
Notice that this reasoning depends on the notion of repeating the intended experiment, because we are computing the probability of getting 8 heads if we were to repeat an experiment with N = 26. In other words, we are figuring out the probability of getting 8 heads relative to the space of all possible outcomes when N = 26. Why do we restrict consideration to N = 26? Because that was the intention of the experimenter.

  

  The problem with NHST is that the interpretation of the observed outcome depends on the space of possible outcomes when the experiment is repeated. Why is that a problem? Because the definition of the space of possible outcomes depends on the intentions of the experimenter. 
   ## Intention!!!
   If the experimenter intended to flip the coin exactly N = 26 times, then the space of possibilities is all samples with N = 26. But if the experimenter intended to flip the coin for one minute (and merely happened to make 26 flips during that time) then the space of possibilities is all samples that could occur when flipping the coin for one minute. Some of those possibilities would have N = 26, but some would have N = 23, and some would have N = 32, etc. On the other hand, the experimenter might have intended to flip the coin until observing 8 heads, and it just happened to take 26 flips to get there. In this case, the space of possibilities is all samples that have the 8th head as the last flip. Notice that for any of those intended experiments (fixed N, fixed time, or fixed z), the actually-observed data are the same: z = 8 and N = 26. But the probability of the observed
data is different relative to each experiment space. The space of possibilities is determined by what the experimenter had in mind while flipping the coin.

  Do the observed data depend on what the experimenter had in mind? 
  ### We certainly hope not! 
  A good experiment is founded on the principle that the data are insulated from the experimenter’s intentions. The coin “knows” only that it was flipped 26 times, regardless of what the experimenter had in mind while doing the flipping. Therefore our conclusion about the coin should not depend on what the experimenter had in mind while flipping it. This chapter explains some of the gory details of NHST, to bring mathematical rigor to the above comments, and to bring rigor mortis to NHST. You’ll see how NHST is committed to the notion that the covert intentions of the experimenter are crucial to interpreting the data, even though the data are not supposed to be influenced by the covert intentions of the experimenter.



## 11.1 NHST for the bias of a coin
### 11.1.1 When the experimenter intends to fix N
> ☞ Binomial 

Now for some of the mathematical details of NHST. Suppose we intend to flip a coin N = 26 times and we happen to observe z = 8 heads. This result seems to suggest that the coin is biased, because the result is less than the 13 heads that we would expect to get from a fair coin. But someone who is skeptical about the claim that the coin is biased, i.e., a defender of the null hypothesis that the coin is fair, would argue that the seemingly biased result could have happened merely by chance from a genuinely fair coin. Because a “false alarm” (Type I error), i.e., rejection of a null hypothesis when it is really true, is considered to be very costly in scientific practice, we decide that we will only reject the null hypothesis if the probability that it could generate the result is very small, conventionally less than 5%. In other words, to reject the null hypothesis, we need to show that the probability of getting something as extreme as z = 8, when N = 26, is less than 5% (the significance level).

<figure id="fig.redline0" style="float: none"><img src="files/fig1.png"><figcaption> 
</figcaption></figure>

What is the probability of getting a particular number of heads when N is fixed? The
answer is provided by the binomial probability distribution, which states that the probability
of getting z heads out of N flips is

<figure id="fig.redline0" style="float: none"><img src="files/eq1.png"><figcaption> 
</figcaption></figure>

  Thus, the overall probability of getting z heads in N flips is the probability of any
particular sequence of z heads in N flips times the number of ways of choosing z slots from
among the N possible flips. The product appears in Equation 11.1. An illustration of a
binomial probability distribution is provided in the right panel of Figure 11.1, for N = 26
and θ = .5. Notice that the abscissa ranges from z = 0 to z = 26, because in N = 26 flips it
is possible to get anywhere from no heads to all heads.
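
The product described above can be checked numerically. The following is a minimal sketch, assuming Equation 11.1 has the standard binomial form (number of ways of choosing z slots times the probability of one particular sequence); the helper name `binom_prob` is introduced here for illustration only and is compared against R's built-in `dbinom`.

```r
# Binomial probability of z heads in N flips with bias theta:
# (number of ways to place z heads) * (probability of one such sequence)
binom_prob <- function(z, N, theta) {
  choose(N, z) * theta^z * (1 - theta)^(N - z)
}

N <- 26
theta <- 0.5
z <- 0:N   # every possible number of heads, from none to all

p_manual  <- binom_prob(z, N, theta)
p_builtin <- dbinom(z, size = N, prob = theta)

# The two computations should agree, and the probabilities should sum to 1
all.equal(p_manual, p_builtin)
sum(p_manual)
```

Evaluating `binom_prob(8, 26, 0.5)` gives the probability of exactly 8 heads in 26 flips of a fair coin.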

## Bernoulli -> Binomial


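**The binomial probability distribution in Figure 11.1 is also called a sampling distribution**, because it describes how the outcome z varies across repeated samples of N flips under a hypothesized θ. (It should not be confused with an empirical distribution, which is built from data that were actually observed.)

- Sampling with replacement (e.g., bootstrapping)
- Sampling without replacement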




  Our goal is to determine whether the probability of getting the observed
result, z = 8, is tiny enough that we can reject the null hypothesis. By using the binomial probability formula in Equation 11.1, we determine that the probability of getting exactly z = 8 heads in N = 26 flips is 2.3%.

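In [11]:
# Probability of exactly z = 8 heads in N = 26 flips of a fair coin
dbinom(8, size = 26, prob = 0.5)

In [12]:
# Simulate 1000 fixed-N experiments: number of heads in 26 flips of a fair coin
x <- rbinom(1000, size = 26, prob = 0.5)
x

In [13]:
# Histogram of the simulated head counts (shading lines at density = 10)
hist(x, density = 10)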

<figure id="fig.redline0" style="float: none"><img src="files/hist.png"><figcaption> 
</figcaption></figure>

However, a fair coin will rarely produce exactly any one particular value of z: even the most probable outcome, z = 13, occurs with a probability of only about 15.5% when N = 26. Therefore, instead of determining the probability of getting exactly the result z under the null hypothesis, we determine the probability of getting z or a result even more extreme relative to what we would expect.

The reason for considering more extreme outcomes is this: If we would reject the null hypothesis because the result z is too far from what we would expect, then any other potential result that has an even more extreme value would also cause us to reject the null hypothesis.

**Therefore we want to know the probability of getting the actual outcome or an outcome more extreme relative to what we expect. This total probability is referred to as “the p value”. If this p value is less than a critical amount, then we reject the null hypothesis.**

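> ☞ The p value is the probability, under the null hypothesis set by the researcher and assumed to be true, of obtaining a test statistic as rare or extreme as the one observed. The lower the computed p value, the stronger the evidence in the sample data for rejecting the null hypothesis.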



The **critical probability (significance level)** is conventionally set to 5%.
> **In other words, we will reject the null hypothesis whenever the total probability of the observed z or an outcome more extreme is less than 5%.**
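
The decision rule above can be sketched in a few lines of R for the fixed-N intention. This is an illustrative sketch rather than the book's own code; it assumes a two-tailed test that places 2.5% in each tail, and the variable names are chosen here for readability.

```r
N <- 26         # intended (fixed) number of flips
z <- 8          # observed number of heads
theta0 <- 0.5   # value of the bias under the null hypothesis
alpha <- 0.05   # conventional critical probability

# One-tailed p value: probability of z or fewer heads if the coin is fair
p_lower <- pbinom(z, size = N, prob = theta0)

# Two-tailed test: the observed tail must hold less than alpha / 2
reject <- p_lower < alpha / 2

p_lower
reject
```

The one-tailed probability comes out at roughly 3.8%, which exceeds 2.5%, so under this intention the hypothesis of a fair coin is not rejected.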


### 11.1.2 When the experimenter intends to fix z
> ☞ Negative binomial

Suppose that the experimenter did not intend to stop flipping when N flips were reached.
Instead, the intention was to stop when z heads were reached. This scenario can happen
in many real-life situations.



For example, widgets on an assembly line can be checked for defects until z defective widgets are identified. In this situation, z is fixed in advance and N is the random variable. Instead of talking about the probability of getting z heads out of N flips, we talk about the probability of taking N flips to get z heads.




What is the probability of taking N flips to get z heads? To answer this question, consider this: We know that the N-th flip is the z-th head, because that is what signalled us to stop flipping. Therefore the previous N − 1 flips had z − 1 heads in some random sequence.



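The probability of getting z − 1 heads in the first N − 1 flips is given by the binomial distribution, and the probability that the last flip comes up heads is θ. Therefore, the probability that it takes N flips to get z heads is the product of those two probabilities: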

<figure id="fig.redline0" style="float: none"><img src="files/eq2.png"><figcaption> 
</figcaption></figure>

<figure id="fig.redline0" style="float: none"><img src="files/eq3.png"><figcaption> 
</figcaption></figure>

<figure id="fig.redline0" style="float: none"><img src="files/Fiq7.png"><figcaption> 
</figcaption></figure>

<figure id="fig.redline0" style="float: none"><img src="files/fig2.png"><figcaption> 
</figcaption></figure>

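In [14]:
# Probability that the 8th head arrives on flip 26 of a fair coin.
# dnbinom counts failures (tails) before the size-th success: 26 - 8 = 18 tails.
dnbinom(26 - 8, size = 8, prob = 0.5)

In [15]:
# Simulate 1000 fixed-z experiments: total flips needed to reach 8 heads
y <- 8 + rnbinom(1000, size = 8, prob = 0.5)
y

In [16]:
hist(y)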

<figure id="fig.redline0" style="float: none"><img src="files/nb.png"><figcaption> 
</figcaption></figure>

Figure 11.2 shows an example of this probability distribution. This distribution is sometimes
called the **“negative binomial”**. Notice that values of N start at z and rise to infinity,
because it takes at least z flips to get z heads, and it might take a huge number of flips to
finally get the z-th head.
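
To see how the fixed-z intention changes the verdict on the same data, the tail probability can be computed from the negative binomial distribution. This is a hedged sketch, not the book's code: it assumes a two-tailed test with 2.5% in each tail, and it relies on the fact that R's `dnbinom` and `pnbinom` count failures (tails) before the `size`-th success (head), so N flips correspond to N − z tails.

```r
z <- 8          # intended (fixed) number of heads
N_obs <- 26     # flips it actually took
theta0 <- 0.5   # value of the bias under the null hypothesis

# Probability that the z-th head arrives exactly on flip N_obs
p_exact <- dnbinom(N_obs - z, size = z, prob = theta0)

# One-tailed p value: probability of needing N_obs or more flips,
# i.e. of seeing more than N_obs - z - 1 tails before the z-th head
p_upper <- pnbinom(N_obs - z - 1, size = z, prob = theta0, lower.tail = FALSE)

reject <- p_upper < 0.05 / 2

p_exact
p_upper
reject
```

Here the tail probability is roughly 2.2%, which is below 2.5%, so the null hypothesis is rejected. For comparison, under the fixed-N intention the corresponding one-tailed probability, `pbinom(8, 26, 0.5)`, is roughly 3.8% and the null hypothesis is not rejected, even though the observed data are identical.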

### 11.1.4 Bayesian analysis

The Bayesian interpretation of data does not depend on the covert intentions of the data collector.
In general, for data that are independent across trials, the probability of the conjoint
set of data is simply the product of the probabilities of the individual outcomes. 



The likelihood function captures everything we assume to influence the data. In the case of the coin, we assume that
the bias of the coin is the only influence on its outcome, and that the flips are independent.
The Bernoulli likelihood function completely captures those assumptions.
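
One way to make this concrete is to compare the likelihood of the observed data under the two stopping intentions. The sketch below is illustrative only: it evaluates both likelihoods on a grid of θ values and, purely as an example, combines them with a beta(11, 11) prior. The grid resolution and variable names are assumptions made here.

```r
z <- 8
N <- 26
theta <- seq(0.001, 0.999, length.out = 999)   # grid of candidate biases

# Likelihood of the observed data under each stopping intention
lik_fixed_N <- dbinom(z, size = N, prob = theta)       # stop at N flips
lik_fixed_z <- dnbinom(N - z, size = z, prob = theta)  # stop at the z-th head

# The two differ only by a constant factor that does not involve theta
ratio <- lik_fixed_N / lik_fixed_z
range(ratio)

# With the same prior, the normalized grid posteriors coincide
prior  <- dbeta(theta, 11, 11)
post_N <- lik_fixed_N * prior / sum(lik_fixed_N * prior)
post_z <- lik_fixed_z * prior / sum(lik_fixed_z * prior)
all.equal(post_N, post_z)
```

Because both likelihoods are proportional to θ^z (1 − θ)^(N − z), the constant cancels when the posterior is normalized, which is why the stopping intention leaves the Bayesian conclusion unchanged.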




**In summary, the NHST analysis and conclusion depend on the covert intentions of the
experimenter, because those intentions define the space of all possible (unobserved) data.**


This dependence of the analysis on the experimenter’s intentions conflicts with the opposite
assumption that the experimenter’s intentions have no effect on the observed data. 



**The Bayesian analysis does not depend on the space of possible unobserved data. The Bayesian
analysis operates only with the actual data obtained.**

## 11.2 Prior knowledge about the coin


Suppose that we are not flipping a coin, but we are flipping a flat-headed
nail. In a social science setting, this is like asking a survey question about left or right
handedness of the respondent, which we know is far from 50/50, as opposed to asking a
survey question about male or female sex of the respondent, which we know is close to
50/50. When we flip the nail, it can land with its point touching the ground (which I’ll call
tails) or it can land balanced on its head with its point sticking up (which I’ll call heads).
We believe, just by looking at the nail and our previous experience with nails, that it will
not come up heads and tails equally often. Indeed, with its narrow head, the nail will very
probably come to rest with its point touching the ground, i.e., “tails”. In other words, we
have a strong prior belief that the nail is tail-biased. Suppose we flip the nail 26 times and
it comes up heads on 8 flips. Is the nail “fair”? Would we use it to determine who gets to
kick off at the Superbowl?


When the prior is known: coin

When the prior is unknown: nail

### 11.2.1 NHST analysis

The NHST analysis does not care if we are flipping coins or nails. The analysis proceeds
the same way as before. To determine whether the nail is biased, we first declare the experimenter’s
intentions and then compute the probability of getting 8 heads or fewer if the
nail were fair. As we saw in the previous section, if we declare that the intention was to
flip the nail 26 times, then an outcome of 8 heads means we do not reject the hypothesis
that the nail is fair. Let me say that again: We have a nail for which we have a strong prior
belief that it is tail biased. We flip the nail 26 times, and find it comes up heads 8 times.


**We conclude, therefore, that we cannot reject the null hypothesis that the nail can come up
heads or tails 50/50. Huh? This is a nail we’re talking about. How can you not reject the
null hypothesis?**
> ☞ Cannot be analyzed!

### 11.2.2 Bayesian analysis 


**The Bayesian statistician starts the analysis with an expression of the prior knowledge.**
We know **from prior experience** that the narrow-headed nail is biased to show tails, so we
express that knowledge in a prior. In a scientific setting, the prior is established by appealing
to **publicly accessible and reputable previous research.**

> ☞ With an empirical prior, the analysis becomes possible!

**The differing inferences for a coin and a nail make good intuitive sense. Our posterior
beliefs about the bias of the object should depend on our prior knowledge of the object: 8
heads in 26 flips of narrow-headed nail should leave us with a different opinion than 8 heads
in 26 flips of a coin.**

## * Beta distribution?

### Beta distribution
A beta distribution is a probability density over the interval [0, 1], which makes it a natural way to describe beliefs about a bias θ. Formally, a beta distribution has two parameters, called a and b, and the density itself is defined as
<figure id="fig.redline0" style="float: none"><img src="files/5.png"><figcaption> 
</figcaption></figure>
where B(a, b) is simply a normalizing constant that ensures that the beta
density integrates to 1.0, as all probability density functions must. In other words, the
normalizer for the beta distribution is B(a, b), i.e., the beta function.

<figure id="fig.redline0" style="float: none"><img src="files/Beta_distribution.png"><figcaption> 
</figcaption></figure>

#### Application: Bayesian inference

The use of Beta distributions in Bayesian inference is due to the fact that they provide a family of conjugate prior probability distributions for binomial (including Bernoulli) and geometric distributions. The domain of the beta distribution can be viewed as a probability, and in fact the beta distribution is often used to describe the distribution of a probability value p.
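
Conjugacy means the posterior can be written down without any integration: a beta(a, b) prior combined with z heads in N flips gives a beta(a + z, b + N − z) posterior. The sketch below illustrates this for the coin and the nail. The helper `update_beta` is a name introduced here, and the beta(2, 20) prior for the nail is a hypothetical choice standing in for a strong tail-biased belief.

```r
# Conjugate update: beta(a, b) prior + z heads in N flips
# gives a beta(a + z, b + N - z) posterior
update_beta <- function(a, b, z, N) {
  c(a = a + z, b = b + N - z)
}

z <- 8
N <- 26

post_coin <- update_beta(11, 11, z, N)  # prior centred on 0.5 (coin)
post_nail <- update_beta(2, 20, z, N)   # hypothetical tail-biased prior (nail)

# Posterior means, a / (a + b), differ although the data are the same
mean_coin <- unname(post_coin["a"] / sum(post_coin))
mean_nail <- unname(post_nail["a"] / sum(post_nail))

post_coin
post_nail
c(coin = mean_coin, nail = mean_nail)
```

The same 8 heads in 26 flips leave the posterior for the nail concentrated at lower values of θ than the posterior for the coin, which is the intuitive result described above.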


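Bayesian inference is a method of statistical inference that derives the posterior probability of a quantity from its prior probability and additional observations. It is grounded in Bayesian probability theory, which treats the quantity being inferred as a random variable and estimates that variable's probability distribution.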

## --------------------------------------------------------------------------------------------------------------

#### 11.2.2.1 Priors are overt and should influence

Some people might assert that prior beliefs are just as **mysterious as the experimenter’s
intentions. But this assertion is wrong. Prior beliefs are not capricious and idiosyncratic.
Prior beliefs are overt, explicitly debated, and consensual.**

> ☞ Bayesian inference with a prior is not an analysis that incorporates the experimenter's intention!


Some people might wonder, if subjective priors are allowed for Bayesian analyses, then why not allow subjective intentions for NHST? 


**Because the subjective intentions in the data collector’s mind do not influence the data and therefore should not influence the analysis. Subjective prior beliefs, on the other hand, are not about how beliefs influence the data,
but about how the data influence beliefs: Prior beliefs are the starting point from which we move in the light of new data.**

> ☞ Neither subjective intentions nor the prior influence the data. The prior, however, is a belief about new data to come ??

## 11.3 Confidence interval and highest density interval
### 11.3.1 NHST confidence interval


The primary goal of NHST is determining whether a particular “null” value of a parameter
can be rejected. One can also ask what range of parameter values would not be rejected.
This range of non-rejectable parameter values is called the **confidence interval.**

(There are different ways of defining an NHST confidence interval; this one is conceptually the most
general and coherent with NHST precepts.) 


The 95% confidence interval consists of all values of θ that would not be rejected by a (two-tailed) significance test that allows 5% false alarms.




For example, in a previous section we found that θ = .5 would not be rejected when
z = 8 and N = 26, for a flipper who intended to stop when N = 26. The question is, which
other values of θ would we not reject? Figure 11.4 shows the sampling distribution for
different values of θ. The upper row shows the case of θ = 0.144, for which the sampling
distribution has z = 8 snug against the upper rejection tail. In fact, if θ is nudged any smaller,
the rejection tail includes z = 8, which means that smaller values of θ can be rejected. The
lower row of Figure 11.4 shows the case of θ = 0.517, for which the sampling distribution
has z = 8 snug against the lower rejection tail. If θ is nudged any larger, the rejection tail
includes z = 8, which means that larger values of θ can be rejected. 




**In summary, the range
of θ values we would not reject is θ ∈ [.144, .517]. This is the 95% confidence interval
when z = 8 and N = 26, for a flipper who intended to stop when N = 26.**

<figure id="fig.redline0" style="float: none"><img src="files/fig4.png"><figcaption> 
</figcaption></figure>

We can also determine the confidence interval for the experimenter who intended to
stop **when z = 8.** Figure 11.5 shows the sampling distribution for different values of θ. The upper row shows the case of θ = 0.144, for which the sampling distribution has N = 26
snug against the lower rejection tail. 


In fact, if θ is nudged any smaller, the rejection tail includes N = 26, which means that smaller values of θ can be rejected. The lower row of Figure 11.5 shows the case of θ = 0.493, for which the sampling distribution has N = 26
snug against the upper rejection tail. If θ is nudged any larger, the rejection tail includes N = 26, which means that larger values of θ can be rejected. In summary, the range of θ values we would not reject is θ ∈ [.144, .493]. This is the 95% confidence interval when z = 8 and N = 26, for a flipper who intended to stop when z = 8.

<figure id="fig.redline0" style="float: none"><img src="files/fig5.png"><figcaption> 
</figcaption></figure>

We have just seen that the NHST confidence interval depends on the covert intentions
of the experimenter. When the intention was to stop when N = 26, then the range of
biases that would not be rejected is θ ∈ [.144, .517]. But when the intention was to stop
when z = 8, then the range of biases that would not be rejected is θ ∈ [.144, .493] (the
fact that the lower ends of the confidence intervals are the same is merely an accidental
coincidence for this case). The confidence interval depends on the experimenter’s intention
because those intentions dictate the space of possible unobserved data relative to which the
actually observed data are judged. If the experimenter had other intentions, such as flipping for a fixed duration, then the confidence interval would be yet something different. Thus,
the interpretation of the NHST confidence interval is as convoluted as the interpretation of
NHST itself, because the confidence interval is merely the significance test conducted at
every candidate value of θ.
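
The idea of running the significance test at every candidate value of θ can be sketched by solving for the values of θ at which the observed data sit exactly on the edge of a 2.5% rejection tail. This is an illustrative sketch under that two-tailed assumption, not the procedure used to produce the figures; `solve_theta` is a helper name introduced here, and it uses base R's `uniroot`.

```r
z <- 8
N <- 26
alpha <- 0.05

# Find the theta at which a given tail probability equals alpha / 2
solve_theta <- function(tail_prob) {
  uniroot(function(th) tail_prob(th) - alpha / 2,
          interval = c(1e-6, 1 - 1e-6), tol = 1e-10)$root
}

# Intention: fixed N. The random variable is the number of heads.
ci_fixed_N <- c(
  lower = solve_theta(function(th) pbinom(z - 1, N, th, lower.tail = FALSE)),  # P(heads >= z)
  upper = solve_theta(function(th) pbinom(z, N, th))                           # P(heads <= z)
)

# Intention: fixed z. The random variable is the number of flips.
# pnbinom counts tails before the z-th head, i.e. flips - z.
ci_fixed_z <- c(
  lower = solve_theta(function(th) pnbinom(N - z, z, th)),                         # P(flips <= N)
  upper = solve_theta(function(th) pnbinom(N - z - 1, z, th, lower.tail = FALSE))  # P(flips >= N)
)

round(ci_fixed_N, 3)
round(ci_fixed_z, 3)
```

If the sketch matches the construction in the text, the rounded limits should land close to the intervals [.144, .517] and [.144, .493] quoted above, differing only in the upper limit.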


**The confidence interval tells us something about the probability of extreme unobserved
data values that we might have gotten if we repeated the experiment according to the covert
intentions of the experimenter. But the confidence interval tells us little about the believability
of any particular θ value, which is what we want to know.**

> ☞ The significance level serves as the criterion for the decision.

### 11.3.2 Bayesian HDI

A concept in Bayesian inference that is somewhat analogous to the NHST confidence interval
is the **highest density interval (HDI)**, which was introduced in Section 3.3.5, p. 34.




Let’s consider the HDI when we flip a coin and observe z = 8 and N = 26. Suppose we
have a prior informed by the fact that the coin appears to be authentic, which we express
here, for illustrative purposes, as a beta(θ|11, 11) distribution. The left side of Figure 11.3
shows that the 95% HDI goes from θ = 0.261 to θ = 0.533. These limits span the 95% most
believable values of the bias. Moreover, the posterior density shows exactly how believable
each bias is. In particular, we can see that θ = .5 is within the 95% HDI, which we might
use as a criterion if we are forced to categorically declare whether or not fairness is credible.
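
The HDI limits can be approximated directly from the posterior. The sketch below assumes the conjugate update of the beta(11, 11) prior with z = 8 and N = 26, which gives a beta(19, 29) posterior, and finds the narrowest interval that holds 95% of the posterior mass. The function `hdi_beta` is a helper written for this illustration, not part of base R.

```r
# Posterior for a beta(11, 11) prior with z = 8 heads in N = 26 flips
a_post <- 11 + 8
b_post <- 11 + (26 - 8)

# For a unimodal density, the HDI is the narrowest interval
# that contains the requested probability mass.
hdi_beta <- function(a, b, mass = 0.95) {
  width <- function(p_low) qbeta(p_low + mass, a, b) - qbeta(p_low, a, b)
  p_low <- optimize(width, interval = c(0, 1 - mass))$minimum
  c(lower = qbeta(p_low, a, b), upper = qbeta(p_low + mass, a, b))
}

hdi <- hdi_beta(a_post, b_post)
round(hdi, 3)

# Is the fair-coin value inside the interval?
unname(hdi["lower"] <= 0.5 && 0.5 <= hdi["upper"])
```

The limits should come out close to the 0.261 and 0.533 reported in the text, with θ = .5 inside the interval.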

<figure id="fig.redline0" style="float: none"><img src="files/1.png"><figcaption> 
</figcaption></figure>

<figure id="fig.redline0" style="float: none"><img src="files/2.png"><figcaption> 
</figcaption></figure>

### There are at least three advantages of the HDI over an NHST confidence interval.


**First, the HDI has a direct interpretation in terms of the believabilities of values of θ.** The HDI
is explicitly about p(θ|D), which is exactly what we want to know. The NHST confidence
interval, on the other hand, has no direct relationship with what we want to know; there’s
no clear relationship between the probability of rejecting the value θ and the believability
of θ. 
> ☞ The concept of likelihood

**Second, the HDI has no dependence on the intention of the experimenter during data
collection**, because the likelihood has no dependence on the intention of the experimenter
during data collection. The NHST confidence interval, in contrast, tells us about probabilities
of data relative to what might have been if we replicated the experimenter’s covert
intentions. 



**Third, the HDI is responsive to the analyst’s prior beliefs, as it should be.** The
Bayesian analysis indicates how much the new data should alter our beliefs. The prior beliefs
are overt and publicly decided. The NHST analysis, on the contrary, is ignorant of,
and unresponsive to, the accumulated prior knowledge of the scientific community.

## 11.5 What a sampling distribution is good for
I hope to have made it clear that sampling distributions aren’t as useful as posterior distributions
for making inferences about hypotheses from a set of observed data. The reason is that
sampling distributions tell us the probabilities of possible data if we run an intended experiment
given a particular hypothesis, rather than the believabilities of possible hypotheses
given that we have a particular set of data. Nevertheless, sampling distributions are appropriate
and useful for other applications. Two of those applications are described in the
following sections.

### 11.5.1 Planning an experiment

Until this point in the book, I have emphasized analysis of data that have already been
obtained. But a crucial part of conducting research is planning the study before actually
obtaining the data. When planning research, we have some hypothesis about how the world might be, and we want to gather data that will inform us about the viability of that hypothesis.
> ☞ Survey sampling theory
> e.g., surveying the number of smokers among middle and high school students
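
As one example of how a sampling distribution helps with planning, the sketch below estimates how often a fixed-N experiment would reject the null hypothesis if the coin really had some other bias. The value `theta_true = 0.3` is a hypothetical choice made only for illustration, and the two-tailed rejection rule with 2.5% in each tail is an assumption carried over from the earlier sections.

```r
N <- 26
theta_null <- 0.5
theta_true <- 0.3   # hypothetical "true" bias, chosen only for illustration

# Outcomes that would lead to rejection under the null (2.5% in each tail)
z_all <- 0:N
in_lower_tail <- pbinom(z_all, N, theta_null) < 0.025
in_upper_tail <- pbinom(z_all - 1, N, theta_null, lower.tail = FALSE) < 0.025
reject_z <- z_all[in_lower_tail | in_upper_tail]

# Power: probability that data generated with theta_true fall in that region
power <- sum(dbinom(reject_z, N, theta_true))

# False alarm rate actually achieved when the null is true
false_alarm <- sum(dbinom(reject_z, N, theta_null))

reject_z
power
false_alarm
```

Repeating the calculation for larger values of N shows how many flips would be needed before the experiment has a good chance of detecting the hypothesized bias.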


### 11.5.2 Exploring model predictions (posterior predictive check)
A Bayesian analysis only indicates the relative veracities of the various parameter values or
models under consideration. The posterior distribution only tells us which parameter values
are **relatively less bad than the others. The posterior does not tell us whether the least bad
parameter values are actually any good.**
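
A posterior predictive check asks whether data simulated from the credible parameter values look like the data that were actually observed. The sketch below is a minimal illustration, assuming the beta(11, 11) prior used earlier in the chapter with z = 8 and N = 26; the seed and the number of draws are arbitrary choices.

```r
set.seed(1)   # arbitrary seed, for reproducibility only

z_obs <- 8
N <- 26
a_post <- 11 + z_obs         # posterior beta parameters, assuming
b_post <- 11 + (N - z_obs)   # a beta(11, 11) prior

# 1. Draw credible bias values from the posterior
theta_draws <- rbeta(10000, a_post, b_post)

# 2. For each draw, simulate a replicated experiment of N flips
z_rep <- rbinom(length(theta_draws), size = N, prob = theta_draws)

# 3. Locate the observed count within the predicted distribution
prop_at_or_below <- mean(z_rep <= z_obs)
prop_at_or_below
quantile(z_rep, c(0.025, 0.975))
```

If the observed count fell far out in the tails of the simulated counts, that would suggest that even the least bad parameter values describe the data poorly. For a model this simple the check is not very demanding; it becomes more informative with richer models.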