In [1]:
from math import comb
import numpy as np

**Exercise 11.2. [Purpose: To determine NHST CIs, and notice that they depend on the experimenter's intention.]** We continue with the scenario of the previous exercise: A dichotomous outcome, with $N = 45$ and $z=3$. 

**(A)** If the intention is to stop when $N=45$, what is the 95% CI?

*Hints*: Try this continuation of the R script from the previous exercise:

```
for ( theta in seq( 0.170 , 0.190 , 0.001) ) {
   show( c(
      theta ,
      2*sum( choose(N, lowTailZ) * theta^lowTailZ * (1-theta)^(N - lowTailZ) )
   ))
}

highTailZ = z:N
for ( theta in seq( 0.005 , 0.020 , 0.001) ) {
   show( c(
      theta ,
      2*sum( choose(N, highTailZ) * theta^highTailZ * (1-theta)^(N - highTailZ) )
   ))
}
```

Explain carefully what the code does and what it means!

In [2]:
N = 45 
theta = 1 / 6

# Probability of each possible count z under theta = 1/6 with N fixed
# (binomial sampling distribution).
dist = np.array([comb(N, z) * theta ** z * (1 - theta) ** (N - z) for z in range(N + 1)])

# Sort the values of z by increasing probability and remove the least probable
# ones, which come from both tails, until the removed probability mass reaches
# at least 0.05.
# Because the binomial distribution is unimodal, I assume the 95% CI runs from
# the minimum to the maximum z value left after the while loop.
mass = 0
i = 0
values = list(range(N + 1))
indexes = np.argsort(dist)
while mass < 0.05:
    mass = mass + dist[indexes[i]]
    values.remove(indexes[i])
    i = i + 1
print([min(values), max(values)])

[4, 12]


Notice that $z=3$ is outside the 95% CI, which here is expressed as a range of $z$ values under $\theta = 1/6$. If we add `print(indexes[i])` to the `while` loop above, we see that $z=3$ is the last value to be removed, and that removing it makes the excluded probability mass jump from about 0.04 to about 0.07. With this procedure we would reject the null hypothesis for $z=3$ under this sampling intention, as we already did in the first exercise.
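The R hint takes a different route: it scans candidate values of $\theta$ and, for each one, computes a two-tailed $p$ value for the observed $z = 3$ with $N = 45$ fixed. The CI is then the range of $\theta$ values whose $p$ value is at least 0.05. A minimal Python sketch of that scan follows. The helper names `binom_pmf` and `p_two_tailed_fixed_n` and the grid spacing are illustrative assumptions, not part of the original script, and `lowTailZ` is assumed to be `0:z` as in the previous exercise.

```python
from math import comb

N, z = 45, 3  # fixed number of flips and observed number of heads


def binom_pmf(k, n, theta):
    """Binomial probability of k heads in n flips."""
    return comb(n, k) * theta ** k * (1 - theta) ** (n - k)


def p_two_tailed_fixed_n(theta, z=z, n=N):
    """Two-tailed p value of the observed z when n is fixed.

    Mirrors the R hint: double the low tail (0..z) when theta lies above the
    observed proportion z/n, and double the high tail (z..n) otherwise.
    """
    if theta > z / n:
        tail = sum(binom_pmf(k, n, theta) for k in range(0, z + 1))
    else:
        tail = sum(binom_pmf(k, n, theta) for k in range(z, n + 1))
    return min(1.0, 2 * tail)


# Scan candidate theta values (the 0.001 spacing is an assumption) and keep
# those that would not be rejected at the 0.05 level.
grid = [i / 1000 for i in range(1, 1000)]
accepted = [t for t in grid if p_two_tailed_fixed_n(t) >= 0.05]
ci_fixed_n = (min(accepted), max(accepted))

print(ci_fixed_n)
print(p_two_tailed_fixed_n(1 / 6))
```

Unlike the cell above, this scan reports a range of $\theta$ values. The last line prints the two-tailed $p$ value at $\theta = 1/6$, which can be compared with the result of the previous exercise.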

**(B)** If the intention is to stop when $z = 3$, what is the 95% CI? Is the CI the same as for stopping when $N = 45$?

*Hint*: Modify the R script of the previous part for use with stopping at $z$, like the second part of the previous exercise.

In [3]:
N = 45 
theta = 1 / 6

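# Same procedure as in part (A), but each binomial term is weighted by z / N,
# the factor that appears in the sampling distribution for stopping at z.
# Note: summed over z with N fixed, these weights add up to theta (about 0.167),
# not to 1, so the 0.05 threshold below is an absolute mass, not a 5% share.
dist = np.array([z / N * comb(N, z) * theta ** z * (1 - theta) ** (N - z) for z in range(N + 1)])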

mass = 0
i = 0
values = list(range(N + 1))
indexes = np.argsort(dist)
while mass < 0.05:
    mass = mass + dist[indexes[i]]
    values.remove(indexes[i])
    i = i + 1
print([min(values), max(values)])

[6, 10]


The 95% CI is narrower than in the previous case, so it is not the same as for stopping when $N = 45$. This agrees with the results of the first exercise, in which the $p$ value under the second sampling intention was much lower than under the first.
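For the intention to stop at $z = 3$, the random quantity is the number of flips $N$, which can take any value from $z$ upward. A minimal sketch of the corresponding $\theta$ scan is given below, assuming the sampling distribution $p(N \mid z, \theta) = \frac{z}{N}\binom{N}{z}\theta^z(1-\theta)^{N-z}$ (the same $z/N$ weighting used in the cell above) and treating as the tail all outcomes whose proportion $z/N$ is at least as far from $\theta$ as the observed $3/45$. The function names and the grid are illustrative assumptions.

```python
from math import comb

z, N_obs = 3, 45  # fixed number of heads and observed number of flips


def stop_at_z_pmf(n, z, theta):
    """Probability that the z-th head arrives on flip n (n >= z)."""
    return z / n * comb(n, z) * theta ** z * (1 - theta) ** (n - z)


def p_two_tailed_fixed_z(theta, z=z, n_obs=N_obs):
    """Two-tailed p value of the observed n_obs when z is fixed."""
    if theta > z / n_obs:
        # At least as extreme: proportion <= z/n_obs, that is n >= n_obs.
        tail = 1 - sum(stop_at_z_pmf(n, z, theta) for n in range(z, n_obs))
    else:
        # At least as extreme: proportion >= z/n_obs, that is n <= n_obs.
        tail = sum(stop_at_z_pmf(n, z, theta) for n in range(z, n_obs + 1))
    return min(1.0, 2 * tail)


# Scan candidate theta values (the 0.001 spacing is an assumption) and keep
# those that would not be rejected at the 0.05 level.
grid = [i / 1000 for i in range(1, 1000)]
accepted = [t for t in grid if p_two_tailed_fixed_z(t) >= 0.05]
ci_fixed_z = (min(accepted), max(accepted))

print(ci_fixed_z)
print(p_two_tailed_fixed_z(1 / 6))
```

Running the same scan with the fixed-$N$ tails and comparing the two intervals shows directly whether the two intentions lead to the same CI.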