In [None]:
%matplotlib inline
%load_ext rpy2.ipython
import matplotlib.pyplot as plt
import numpy as np
from numpy import mean, sqrt, std, fabs
from scipy.stats import t as tdist

# stats60 specific
from scipy.stats import norm as ndist
from code import roulette
from code.week1 import normal_curve
from code.probability import ProbabilitySpace, BoxModel, Normal
from code.week7 import studentT_curve
figsize = (8,8)


## Testing hypotheses 

### Is online gaming fair?

* If you play online poker or roulette, how do you know it’s fair?
* I placed 10 bets on RED.
* These are the results: [0,0,1,0,1,0,0,1,0,1].
* Is the game fair?

### Example (continued)

- I *observed* 4 successes.
- If the roulette wheel is fair, I would expect to see $10 \times 18/ 38 \approx 4.7$ successes, give or take $\sqrt{10} \times \sqrt{18/38 \times 20/38} \approx 1.6$.
- In standardized units, my observed 4 successes converts to $$\frac{4 - 4.7}{1.6} \approx -0.4$$
- This seems reasonable: it is small relative to 1 and could just be a chance error. After all, chance errors are typically of size about 1 in standardized units.

### Example (continued)

- I placed 10 more identical bets, with results [1,0,1,1,1,1,0,1,0,0]. So, in 20 bets, I have observed 10 successes.
- If the roulette game is fair, I would expect to see 
$$20 \times 18/ 38=9.5$$ successes, give or take $$\sqrt{20} \times \sqrt{18/38 \times 20/38}=2.2$$
- In standardized units, my observed 10 successes converts to $$\frac{10 - 9.5}{2.2} = 0.2$$
- Again, this seems reasonable: 0.2 is small relative to 1 and could just be a chance error.
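The box-model arithmetic on the last two slides can be checked in a few lines of Python. This is a sketch assuming only NumPy; `sum_z_score` is a hypothetical helper of ours, not part of the course's `code` package. It uses unrounded values, so its output can differ from the slides in the second decimal.

```python
import numpy as np

def sum_z_score(successes, n, p):
    """Expected value, SE and z-score for the number of 1s in n draws
    (with replacement) from a 0-1 box whose fraction of 1s is p."""
    expected = n * p                          # n x average of the box
    se = np.sqrt(n) * np.sqrt(p * (1 - p))    # sqrt(n) x SD of the box
    z = (successes - expected) / se
    return expected, se, z

p_red = 18 / 38    # chance a bet on RED wins on a fair wheel

for n, successes in [(10, 4), (20, 10)]:
    expected, se, z = sum_z_score(successes, n, p_red)
    print("%d bets: expected %.1f, give or take %.1f; z = %.2f"
          % (n, expected, se, z))
```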

### What can I conclude?

- Based on my bets on RED, I cannot conclude that the roulette wheel is rigged.
- It is possible that other kinds of bets would not match their expected number of successes as well. I can only draw a conclusion based on my 20 bets on RED.
- In statistical terms, the statement "the roulette game is rigged" is called a **hypothesis.**
- In this case, we call it the **alternative hypothesis**.
- Alternative to what? It is an alternative to the **null hypothesis**, which is "the roulette game is fair."

## Null and alternative hypotheses

- The naming of the hypotheses corresponds to an "innocent until proven guilty" approach.
- Since our observations (in standardized units) seem attributable to chance variation, we cannot declare the null hypothesis to be false. In other words, we cannot reject the null hypothesis.
- In legalese, "there is reasonable doubt as to the guilt of the roulette game, so we do not convict."

### A different scenario

- What if the second 10 results were [0,0,0,0,0,0,0,0,0,0]? Then, we would have observed only 4 successes in 20 draws from our 0-1 box.
- In standardized units, this 4 converts to 
$$ \frac{4 - 9.5}{2.2} \approx -2.5$$
- This would be a little suspect...
- In fact, if the roulette wheel were fair, there would be only about a 1.4% chance (using the continuity correction) of getting so few successes in 20 bets.
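A sketch comparing the 1.4% figure with two alternatives, assuming SciPy is available: the normal approximation without the continuity correction, and the exact binomial chance of 4 or fewer successes. The variable names are ours.

```python
from math import sqrt
from scipy.stats import binom, norm

n, p = 20, 18 / 38          # 20 bets on RED at a fair wheel
expected = n * p
se = sqrt(n * p * (1 - p))

# Normal approximation to P(4 or fewer successes)
z_raw = (4 - expected) / se       # no continuity correction
z_cc = (4.5 - expected) / se      # "4 or fewer" ends at 4.5 on the histogram
approx_raw = norm.cdf(z_raw)
approx_cc = norm.cdf(z_cc)

# Exact binomial chance of 4 or fewer successes
exact = binom.cdf(4, n, p)

print("z (corrected) = %.2f" % z_cc)
print("normal, no correction: %.4f" % approx_raw)
print("normal, corrected:     %.4f" % approx_cc)
print("exact binomial:        %.4f" % exact)
```

With these numbers the exact binomial chance comes out a little below the continuity-corrected approximation, while the uncorrected approximation is noticeably smaller than both.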

### What are the chances?

In [None]:
%%capture
normal_fig = plt.figure(figsize=figsize)
ax = normal_curve()
interval = np.linspace(-4,-2.2, 101)
ax.fill_between(interval, 0*interval, ndist.pdf(interval),
                hatch='+', color='green', alpha=0.5)
ax.set_title('The green area is %0.1f%%' % (100 * ndist.cdf(-2.2)), fontsize=20, color='green')


In [None]:
normal_fig

## $Z$-scores

- In this hypothesis test, the quantity
$$
Z = \frac{\text{observed} - \text{expected}}{\text{SE(observed)}}
$$
is called a **$z$-score**.

- The quantities **expected, SE(observed)** are computed **assuming the null hypothesis is true.**

- The $z$-score measures how many standardized units the **observed** value is from what is expected if the null hypothesis were true.

## $P$-values

- The chance we computed is the chance, if the roulette game were fair, that we would observe a standardized value less than our observed standardized value of -2.2 (this is -2.5 before the continuity correction).
- In general, if we test a null hypothesis with some observed data or observed test statistic, the $P$-value is the chance, assuming the null hypothesis is true, of observing a test statistic at least as extreme as the one we actually observed.
* When chances are computed using a $z$-score, the test is called a **$z$ test.**
* **Note:** the $\bbox[5px,border:2px solid orange]{P-value}$ is random, since it is computed from the data!
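To see why the $P$-value is random, one can replay the 20 bets several times at a fair wheel and recompute it each time. A minimal simulation sketch, assuming NumPy and SciPy; the seed and the number of replays (10) are arbitrary choices of ours.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)            # arbitrary seed, for reproducibility
n, p = 20, 18 / 38                        # 20 bets on RED at a FAIR wheel (H0 true)
expected, se = n * p, np.sqrt(n * p * (1 - p))

p_values = []
for _ in range(10):
    bets = rng.binomial(1, p, size=n)     # 1 = win, 0 = loss
    successes = bets.sum()
    # chance of this few successes or fewer, with continuity correction
    z = (successes + 0.5 - expected) / se
    p_values.append(norm.cdf(z))

print(np.round(p_values, 3))
```

Every replay uses the same fair wheel, yet the printed $P$-values differ from run to run.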


## More about $P$-values

* **The $P$-value is NOT the chance that the null hypothesis is correct!**
  
* Why not?
     - The online roulette game is fair or it is not. The null hypothesis is true or it is false.
     - If the $P$-value were the chance that the null hypothesis is correct, this chance would be random...
         
* In Bayesian statistics, one *can* compute the chance that the null hypothesis is correct, but we have not addressed this methodology yet.
   
* The book uses two thresholds:
    - If the $P$-value is less than 5%, the result is *statistically significant*.
    - If the $P$-value is less than 1%, the result is called *highly significant*.

## A Bayesian calculation

- Suppose we declare, before seeing data: "the probability the success rate on RED is 18/38 is 70%, and the probability the success rate on RED is 12/38 is 30%".
- Call these two hypotheses $H_0, H_1$ and we have just said $P(\text{$H_0$ is true})=0.7, P(\text{$H_1$ is true}) = 0.3.$

- Suppose now we observe 4 successes in 20 bets, but we do not know whether they came from the fair game, $H_0$, or the unfair game, $H_1$.

- Bayes' rule says
   $$
   \begin{aligned}
   P(\text{$H_0$ is true} | \text{4 out of 20 successes})
   &= 
   \frac{0.7 \times \binom{20}{4} (\frac{18}{38})^4 (\frac{20}{38})^{16}}{0.7 \times \binom{20}{4} (\frac{18}{38})^4 (\frac{20}{38})^{16} + 0.3 \times \binom{20}{4} (\frac{12}{38})^4 (\frac{26}{38})^{16}} \\
   & \qquad = 15\%.
   \end{aligned}
   $$
  
- Are the chances above random or not random?
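The 15% can be reproduced with `scipy.stats.binom`; the binomial coefficient $\binom{20}{4}$ appears in every term, so it cancels. A sketch assuming SciPy; the variable names are ours.

```python
from scipy.stats import binom

prior_H0, prior_H1 = 0.7, 0.3
like_H0 = binom.pmf(4, 20, 18 / 38)   # chance of 4 successes in 20 bets if fair
like_H1 = binom.pmf(4, 20, 12 / 38)   # chance of 4 successes in 20 bets if rigged

# Bayes' rule: prior x likelihood, normalized over the two hypotheses
posterior_H0 = prior_H0 * like_H0 / (prior_H0 * like_H0 + prior_H1 * like_H1)
print("P(H0 | 4 out of 20) = %.3f" % posterior_H0)
```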


## A second testing scenario

* Suppose now we want to test the efficacy of a new drug for lowering blood pressure.
* Our study design is: we will treat a large patient population with the drug and measure each patient's blood pressure before and after taking the drug.
* One way to conclude that the drug is effective is to show that blood pressure has decreased, that is, that the average difference (after minus before) is negative.

## Setting up the test

* We could set this up as drawing from a box of *differences in blood pressure*.
* The *null hypothesis*, $H_0$ is: "the average difference is zero."
* The *alternative hypothesis*, $H_a$, is: "the average difference in the box is less than zero."
* Sometimes, people will test the alternative, $H_a$: "the average difference in the box is not zero."
* We test the null with observed data by estimating the average difference and converting to standardized units.

### Sample of blood pressures

In [None]:
class BloodPressure(ProbabilitySpace):

    alpha = 0.1
    ptsize = 5
    sample_ptsize = 60

    def __init__(self, draw_fig=True):
        self.npop, self.ndraw = 5000, 50
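        # box of differences: integers from -15 to 0 inclusive
        # (np.random.random_integers is deprecated; randint excludes its upper bound)
        self.box = BoxModel(np.random.randint(-15, 1, size=(self.npop,)),
                            replace=True)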
        self.X = (np.mgrid[0:1:10j,0:1:5j].reshape((2,50)) +
                  np.random.sample((2,50)) * 0.05)
        self.BG = np.random.sample((self.npop,2))
        self.X = self.X.T
        self.draw_fig = draw_fig
        if draw_fig:
            self.draw()

    def draw(self, color={'R':'red','B':'blue'}):
        self.figure.clf()
        ax, X, BG = self.axes, self.X, self.BG
        ax.scatter(BG[:,0], BG[:,1], s=self.ptsize, color='gray', alpha=self.alpha)
        ax.set_xticks([]);    ax.set_xlim([-0.1,1.1])
        ax.set_yticks([]);    ax.set_ylim([-0.1,1.1])

    @property
    def figure(self):
        if not hasattr(self, "_figure"):
            self._figure = plt.figure(figsize=figsize)
        self._axes = self._figure.gca()
        return self._figure

    @property
    def axes(self):
        self.figure
        return self._axes
    
    def draw_sample_pts(self, bgcolor={'R':'red','B':'blue'},
                        color={'R':'red','B':'blue'}):
        self.draw(color=bgcolor)
        ax, X, sample = self.axes, self.X, self._sample
        # label each of the ndraw sampled differences at its plotting position
        for i in range(self.ndraw):
            ax.text(X[i,0], X[i,1], '%d' % sample[i], color='red')
        ax.set_title("average(sample)=%0.1f, SD(sample)=%0.1f" % (np.mean(sample), np.std(sample)), fontsize=15)
        return self.figure

    def trial(self, bgcolor={'R':'red','B':'blue'},
              color={'R':'red','B':'blue'}):
        self._sample = self.box.sample(self.ndraw)
        self.outcome = np.mean(self._sample), np.std(self._sample)
        if self.draw_fig:
            self.draw_sample_pts(color=color, bgcolor=bgcolor)
        return self.outcome

    
BP = BloodPressure()


In [None]:
np.random.seed(0)
print(BP.trial())
BP.figure

## Evaluating the test

- Our observed average is $-7.0$. We estimate its SE to be $4.5 / \sqrt{50} = 0.64$.
- In standardized units, our observed average converts to $$\frac{-7.0 - 0}{0.64}  \approx - 11$$
- The $P$-value is essentially 0: there is virtually no chance that a standard normal would ever be so small.

- We reject the null hypothesis $H_0$ and conclude $H_a$: "the average difference of the box is negative."
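A sketch of this one-sided $z$ test from the summary statistics, assuming SciPy. The SD of 4.5 is the value quoted on the slide; rerunning the simulation above will give different numbers.

```python
from math import sqrt
from scipy.stats import norm

avg, sd, n = -7.0, 4.5, 50   # observed average, sample SD and sample size from the slide
se = sd / sqrt(n)            # estimated SE of the average
z = (avg - 0) / se           # H0: the average difference in the box is 0
p_value = norm.cdf(z)        # Ha: average difference is negative, so use the lower tail
print("SE = %.2f, z = %.1f, P-value = %.1e" % (se, z, p_value))
```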

In [None]:
4.5 / sqrt(50)  # estimated SE of the average, using the sample SD of 4.5 quoted above

## Example (continued)

* Suppose that our null hypothesis had been different. We might have begun with the null hypothesis $H_0$: "the average decrease in blood pressure is 7 mm Hg" (average difference $-7$ mm Hg), with alternative $H_a$: "the average decrease in blood pressure is not 7 mm Hg".
* How do we test this hypothesis?
* Well, under this null hypothesis our observed average converts to $$\frac{-7.0 -(- 7)}{0.64}  \approx 0$$

* This test is **two-sided**: the alternative does not specify whether the average difference is greater than or less than $-7$.

### What are the chances?

In [None]:
%%capture
normal_fig2 = plt.figure(figsize=figsize)
ax = normal_curve()
interval = np.linspace(-4,-0.6, 101)
ax.fill_between(interval, 0*interval, ndist.pdf(interval),
                hatch='+', color='green', alpha=0.5)
interval = np.linspace(0.6,4, 101)
ax.fill_between(interval, 0*interval, ndist.pdf(interval),
                hatch='+', color='green', alpha=0.5)

ax.set_title('The green area is %0.1f%%' % (2 * 100 * ndist.sf(0.6)), fontsize=20, color='green')



In [None]:
normal_fig2


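The shaded area beyond $\pm 0.6$ is 55%. Our observed $z$-score, about 0, is even closer to 0, so its two-sided $P$-value is even larger and we cannot reject $H_0: \text{average(difference)}=-7$.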
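A small helper for the two-sided $P$-value, assuming SciPy; `two_sided_p` is our own hypothetical name.

```python
from scipy.stats import norm

def two_sided_p(z):
    """Area under the standard normal curve at least |z| away from 0."""
    return 2 * norm.sf(abs(z))

print(two_sided_p(0.0))   # our z-score for H0: average difference = -7
print(two_sided_p(0.6))   # the shaded area in the figure, about 0.55
```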

## How small should the chances be?

* In our examples so far, it has been fairly clear which of the null or alternative is more believable.
* In practice, we must decide on a threshold for the $P$-value below which we reject $H_0$.
* A common choice is a threshold of 5%. We call this threshold the *level* or *size* of the test.
* The book declares a $P$-value of $5\%$ or less to be "significant" and $1\%$ or less to be "highly significant."

## Rejection rule

* Knowing the null and alternative hypotheses and the size of the test, we can define a *rejection rule*.
* For example, suppose the size is 5%, and $$\begin{aligned}
       H_0 &: \text{average difference is 0 mm Hg} \\
       H_a &: \text{average difference is negative}
     \end{aligned}$$
* Then, we reject $H_0$ if our $z$ statistic is less than $-1.65$.
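The cutoffs used in these rules come from the standard normal curve. A sketch with `scipy.stats.norm.ppf`, which returns the point with a given area to its left; the book rounds these to 1.65 and 2.

```python
from scipy.stats import norm

level = 0.05
cut_negative = norm.ppf(level)           # Ha negative: reject if z < this
cut_positive = norm.ppf(1 - level)       # Ha positive: reject if z > this
cut_two_sided = norm.ppf(1 - level / 2)  # Ha two-sided: reject if |z| > this
print(round(cut_negative, 3), round(cut_positive, 3), round(cut_two_sided, 3))
```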

### One sided test (alternative negative)

In [None]:
%%capture
normal_fig3 = plt.figure(figsize=figsize)
ax = normal_curve()
interval = np.linspace(-4,-1.65, 101)
ax.fill_between(interval, 0*interval, ndist.pdf(interval),
                hatch='+', color='green', alpha=0.5)
ax.set_title('The green area is %0.0f%%' % (100 * ndist.cdf(-1.65)), fontsize=20, color='green')


In [None]:
normal_fig3


5% rejection rule when the alternative is negative.

## Blood pressure continued

* Suppose the alternative is now $H_a$: "the average difference is not 0 mm Hg".
* Do we reject $H_0$: "the average difference is 0 mm Hg"?
* In standardized units, our observed average converts to $$\frac{-7.0 - 0}{0.64}  \approx - 11$$
* Now, -11 is extremely unlikely under $H_0$, but it is quite plausible for some values of the average difference allowed under $H_a$.
* We reject this $H_0$ when the $z$-score is large in absolute value, so here we reject $H_0$.

### Two sided test

In [None]:
%%capture
normal_fig4 = plt.figure(figsize=figsize)
ax = normal_curve()
interval = np.linspace(-4,-2, 101)
ax.fill_between(interval, 0*interval, ndist.pdf(interval),
                hatch='+', color='green', alpha=0.5)
interval = np.linspace(2,4, 101)
ax.fill_between(interval, 0*interval, ndist.pdf(interval),
                hatch='+', color='green', alpha=0.5)

ax.set_title('The green area is %0.0f%%' % (2 * 100 * ndist.cdf(-2)), fontsize=20, color='green')


In [None]:
normal_fig4


5% rejection rule when the alternative does not specify the sign (i.e. the difference could be positive or negative).

## Blood pressure again 

* Suppose the alternative is now $H_a$: "the average difference is positive".
* Do we reject $H_0$: "the average difference is 0 mm Hg"?
* In standardized units, our observed average converts to $$\frac{-7.0 - 0}{0.64}  \approx - 11$$
* While -11 is extremely unlikely under $H_0$, it is even more unlikely under $H_a$.
* It seems reasonable to conclude that neither $H_0$ nor $H_a$ is true.
* But, since we want to be able to conclude $H_a$, we should only reject this $H_0$ when the $z$-score is large and positive, so here we do not reject.

### One sided test (alternative positive)

In [None]:
%%capture
normal_fig5 = plt.figure(figsize=figsize)
ax = normal_curve()
interval = np.linspace(1.65,4, 101)
ax.fill_between(interval, 0*interval, ndist.pdf(interval),
                hatch='+', color='green', alpha=0.5)
ax.set_title('The green area is %0.0f%%' % (100 * ndist.sf(1.65)), fontsize=20, color='green')


In [None]:
normal_fig5


5% rejection rule when the alternative is positive.

## One-sided vs. two-sided

* Suppose we want to be able to conclude a one-sided alternative like $H_a$: "the average difference in blood pressure is less than -7 mm Hg".
* Then, we can take the null hypothesis to be $H_0$: "the average difference in blood pressure is greater than or equal to -7 mm Hg". We reject for $z$-scores that are negative and large in absolute value.
* On the other hand, suppose we want to be able to conclude a two-sided alternative like $H_a$: "the average difference in blood pressure is not -7 mm Hg".
* Then, we can take the null hypothesis to be $H_0$: "the average difference in blood pressure is equal to -7 mm Hg". We reject for $z$-scores that are large in absolute value.


## Normal approximation and hypothesis tests

* Suppose a normal approximation holds for $\bbox[5px,border:2px solid orange]{\widehat{\theta}}$, i.e. $E(\widehat{\theta}) \approx \bbox[5px,border:2px solid blue]{\theta}$ and $\widehat{\theta}-\theta$ follows a normal curve with an SE we can approximate.

* Then, we can test the null hypothesis $H_0:  \theta=\theta_0$ against $H_a:  \theta \neq \theta_0$ (or any variation of one-sided vs. two-sided).
* For instance, our first null hypothesis was $\theta_0=0$. In the second, $\theta_0=-7$.
* The test statistic for testing $H_0: \theta=\theta_0$ is
   $$z = \frac{\bbox[5px,border:2px solid orange]{\widehat{\theta}} - \bbox[5px,border:2px solid blue]{\theta_0}}{\text{SE}(\bbox[5px,border:2px solid orange]{\widehat{\theta}})}
   $$
   
* We call $z$ a $Z$-statistic or a $Z$-score.
* If $H_0$ is true, then $z$ approximately follows the standard normal curve.

## Normal approximation and hypothesis tests

* If $H_0$ is not true, then $Z$ usually does not follow the standard normal curve. (If it did, the test could not tell $H_0$ and $H_a$ apart: a very poor test.)
* It may follow a normal curve with mean $\neq 0$.
* The logic of the hypothesis test is as follows: if $H_0$ is true, then our observed test statistic should be a "typical value" under $H_0$; an atypical value is evidence against $H_0$.
* The $P$-value depends on what $H_a$ is.
* It is often easier to use the rejection rule instead of the $P$-value.
* For null hypotheses like $H_0:\theta \leq \theta_0$ and $H_0:\theta \geq \theta_0$ we compute the *same $z$-score*; whether we reject depends on its sign. We reject $H_0:\theta \leq \theta_0$ only for large positive $z$, and $H_0:\theta \geq \theta_0$ only for large negative $z$.
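The three cases can be collected in one function. A sketch assuming SciPy; `z_test` is our own hypothetical name, and its `alternative` strings simply mirror the ones SciPy's `ttest_1samp` accepts. Comparing the $P$-value with the level is equivalent to a rejection rule with the exact cutoffs (1.645 and 1.96) rather than the rounded 1.65 and 2.

```python
from scipy.stats import norm

def z_test(thetahat, theta0, se, alternative, level=0.05):
    """Return (z, P-value, reject) for testing H0 about theta at the given level.

    alternative: 'less'      (Ha: theta < theta0),
                 'greater'   (Ha: theta > theta0),
                 'two-sided' (Ha: theta != theta0).
    """
    z = (thetahat - theta0) / se
    if alternative == 'less':
        p_value = norm.cdf(z)             # lower tail only
    elif alternative == 'greater':
        p_value = norm.sf(z)              # upper tail only
    elif alternative == 'two-sided':
        p_value = 2 * norm.sf(abs(z))     # both tails
    else:
        raise ValueError("unknown alternative: %r" % (alternative,))
    return z, p_value, p_value < level

# Blood pressure example: observed average -7.0, SE 0.64, theta0 = 0
for alt in ['less', 'two-sided', 'greater']:
    print(alt, z_test(-7.0, 0.0, 0.64, alt))
```

For the blood pressure data ($z \approx -11$) this rejects $H_0$ for the `'less'` and `'two-sided'` alternatives but not for `'greater'`, matching the three slides above.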


### One sided test (alternative negative)

In [None]:
normal_fig3


5% rejection rule for $H_0:\theta \geq \theta_0, H_a: \theta < \theta_0.$

### One sided test (alternative positive)

In [None]:
normal_fig5


5% rejection rule for $H_0:\theta \leq \theta_0, H_a: \theta > \theta_0.$

### Two sided test

In [None]:
normal_fig4


5% rejection rule for $H_0: \theta = \theta_0, H_a: \theta \neq \theta_0.$

## Interpretation of 5% rejection rules

- Call the rejection rules 
$$
\begin{aligned}
R^+ &= [\theta_0 + 1.65 \cdot SE(\hat{\theta}), \infty) \\
R^- &= (-\infty, \theta_0 - 1.65 \cdot SE(\hat{\theta})] \\
R^{\pm} &= (-\infty, \theta_0 - 2 \cdot SE(\hat{\theta})] \cup [\theta_0 + 2 \cdot SE(\hat{\theta}), \infty)
\end{aligned}
$$
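- So, $R^+$ corresponds to the pair $H_0: \theta \leq \theta_0, H_a: \theta > \theta_0$; similarly, $R^-$ corresponds to $H_0: \theta \geq \theta_0, H_a: \theta < \theta_0$, and $R^{\pm}$ to $H_0: \theta = \theta_0, H_a: \theta \neq \theta_0$.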
- The rejection rules are set up so that, for instance,
$$
\begin{aligned}
P(\hat{\theta} \in R^+) \leq 5\%, &\qquad \text{if $H_0: \theta \leq \theta_0$ is true.} \\
P(\hat{\theta} \in R^-) \leq 5\%, &\qquad \text{if $H_0: \theta \geq \theta_0$ is true.} \\
P(\hat{\theta} \in R^{\pm}) \approx 5\%, &\qquad \text{if $H_0: \theta = \theta_0$ is true.}
\end{aligned}
$$

- In other words, the rejection regions are set up so that, when $H_0$ is true, there is at most about a 5% chance of rejecting it, i.e. of declaring a false positive.

- Here's an illustration of the $R^+$ rejection region. 

In [None]:
def Rplus(thetahat, theta0, SE_thetahat):
    return np.greater(thetahat, theta0 + 1.65 * SE_thetahat)
theta = Normal(2, 3)
thetahat = theta.trial()
thetahat, Rplus(thetahat, 2, 3), 2 + 1.65*3

Now, let's generate some data where $H_0$ is true.

In [None]:
Tsample = theta.sample(10000)
mean(Rplus(Tsample, 2, 3))

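The null hypothesis $H_0: \theta \leq \theta_0$ is also true if the mean is below $\theta_0$. Our `Tsample` above was drawn with mean $\theta = 2$, which sits exactly on the boundary $\theta = \theta_0 = 2$.

If we keep the same sample but test with $\theta_0=4$, then $H_0: \theta \leq 4$ is still true, and the rejection rate should be below 5%.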

In [None]:
mean(Rplus(Tsample, 4, 3))
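The simulated rates can also be checked exactly with the normal curve. A sketch assuming, as the `Normal(2, 3)` cell above appears to, that $\hat{\theta}$ is normal with mean $\theta = 2$ and SE 3; it uses `scipy.stats.norm` in place of the course's `code.probability` module, and `rplus_rate` is our own hypothetical name.

```python
from scipy.stats import norm

theta, se = 2.0, 3.0    # true mean and SE of thetahat (assumed to match Normal(2, 3))

def rplus_rate(theta0):
    """Exact chance that thetahat lands in R+ = [theta0 + 1.65 SE, infinity)."""
    return norm.sf(theta0 + 1.65 * se, loc=theta, scale=se)

rate_boundary = rplus_rate(2.0)   # theta = theta0: about 5%
rate_inside = rplus_rate(4.0)     # theta < theta0, H0 still true: smaller
rate_two_sided = 2 * norm.sf(2)   # R+- with cutoff 2 SE when theta = theta0
print(rate_boundary, rate_inside, rate_two_sided)
```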

## One-sided vs. two-sided

- The book often reports the one-sided $P$ values. 
- There is no problem with this, in principle, UNLESS you choose
$H_0$ (or, respectively $H_a$) AFTER seeing the $Z$ score.
- Most examples in the book (and solutions) seem to do this.
- The problem is that the 5% false positive rate above roughly doubles to 10%, so you are declaring more false positives than you think...
- Let's also define the test for $R^-$ and see what happens
when we decide which test to use after snooping at the sign of the data.

In [None]:
def Rminus(thetahat, theta0, SE_thetahat):
    return np.less(thetahat, theta0 - 1.65 * SE_thetahat)
mean(Rminus(Tsample, 2, 3))

In [None]:
def snooping_test(thetahat, theta0, SE_thetahat):
    # pick the one-sided test AFTER looking at the data:
    # R+ if thetahat lands above theta0, R- if it lands below
    test_pos = np.greater(thetahat, theta0) & Rplus(thetahat, theta0, SE_thetahat)
    test_neg = np.less(thetahat, theta0) & Rminus(thetahat, theta0, SE_thetahat)
    return test_pos | test_neg

mean(snooping_test(Tsample, 2, 3))
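The simulated snooping rate can be compared with an exact calculation: snooping rejects whenever $\hat{\theta}$ lands in either one-sided region, so when $\theta = \theta_0$ its false positive rate is the sum of the two tail areas. A sketch assuming SciPy:

```python
from scipy.stats import norm

one_sided = norm.sf(1.65)    # chance thetahat lands in R+ (equally, R-) when theta = theta0
snooping = 2 * one_sided     # R+ or R-, chosen after seeing the sign: both tails count
print("one-sided: %.3f, snooping: %.3f" % (one_sided, snooping))
```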

## Relation between hypothesis tests and confidence intervals

* Which values are reasonable?
* Well, -7.0 is certainly a reasonable value if the true average difference were -7 because our $z$ score would be 0.
* Hence, we would not reject $H_0$: "the average difference is -7" if we observed a sample average of -7.
* The set of all values $\theta$ for which we would not reject $H_0$: "the average difference is $\theta$" at level 5% is approximately the standard 95% confidence interval!
* Therefore, one can test $H_0$: "the average difference is 0" by checking whether 0 is in the confidence interval.
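The correspondence between the two-sided test and the confidence interval can be checked numerically. A sketch in plain Python using the blood pressure summary (average $-7.0$, SE $0.64$) and the book's cutoff of 2; `rejects` is our own hypothetical helper name.

```python
avg, se = -7.0, 0.64                   # blood pressure summary: average and its SE
ci = (avg - 2 * se, avg + 2 * se)      # approximate 95% confidence interval

def rejects(theta0):
    """Two-sided test of H0: average difference = theta0, cutoff |z| > 2."""
    return abs((avg - theta0) / se) > 2

for theta0 in [0.0, -5.0, -7.0, -8.0]:
    inside = ci[0] <= theta0 <= ci[1]
    print(theta0, "inside CI" if inside else "outside CI",
          "-> reject" if rejects(theta0) else "-> do not reject")
```

Each value of $\theta_0$ is rejected exactly when it falls outside the interval.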

## Testing fairness via a confidence interval

- Let's go back to our roulette example. Suppose we make an additional 10 bets and win 3 more times, for a total of 13 successes in 30 bets.
- An approximate 95% confidence interval for the true RED success rate (fair or not), based on our 30 bets, is $$ \frac{13}{30} \pm 2 \times \sqrt{\frac{13}{30} \times \frac{17}{30} \times \frac{1}{30}} \approx 0.43 \pm 0.18$$
- (This assumes the online roulette game is doing independent trials, though not necessarily fair trials.)
- The success rate for RED in the fair model is $18/38 \approx 0.47$.
- We see that 0.47 is within our 95% confidence interval. Therefore, we would not reject $H_0$: "the roulette table is fair" at level 5%.
* **Note:** we should ensure that we have enough trials so the normal approximation holds. 
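The interval arithmetic above, as a sketch in plain Python (standard library only; the variable names are ours):

```python
from math import sqrt

successes, n = 13, 30
p_hat = successes / n                     # observed success rate on RED
se = sqrt(p_hat * (1 - p_hat) / n)        # estimated SE of the sample proportion
ci = (p_hat - 2 * se, p_hat + 2 * se)     # approximate 95% confidence interval

p_fair = 18 / 38                          # success rate for RED on a fair wheel
print("CI: (%.2f, %.2f)" % ci)
print("fair rate inside CI:", ci[0] <= p_fair <= ci[1])
```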

## Tests and confidence intervals for small samples

* Our tests (and confidence intervals) have so far relied on normal approximations (i.e. we have used the normal table, A-104 in the book, to compute all chances).
* If the sample size is small, the normal approximations may not be very good.
* If the sample size is small, we can sometimes get good confidence intervals using something called a $T$ statistic.
* The formula for the $T$ statistic is almost identical to that of the $z$ statistic; it is the *chances* that can be quite different.

## Tests and confidence intervals for small samples

* Suppose the Gauss model holds 

           measurement = true value + chance error

* **And, the histogram of the error box is not too different from a normal probability histogram or curve!**
  
* Then, there are very good confidence intervals even for very small samples.
* If the histogram of the error box is exactly a normal probability histogram, then these tests and confidence intervals are *exact*.

## The $T$ statistic

* Suppose we observed only 5 blood pressure changes: [-4,-6,-8,-2,-1].
* The average is -4.2 mm Hg, and the SD of the list is 2.6 mm Hg.
* Our usual $z$ score to test $H_0$: average difference $\geq 0$ against $H_a$: average difference $<0$ $${ z = \frac{-4.2}{2.6 / \sqrt{5}} \approx -3.7}$$
* The $T$ statistic replaces the SD of the list with SD$^+$ of the list which is 2.9 mm Hg. 
* The $T$ statistic is $${ \bbox[5px,border:2px solid orange]{ T} = \frac{-4.2}{2.9 / \sqrt{5}} \approx -3.3}$$
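These numbers can be reproduced with NumPy, where `std()` divides by $n$ (SD) and `std(ddof=1)` divides by $n-1$ (SD$^+$). A sketch:

```python
import numpy as np

diffs = np.array([-4, -6, -8, -2, -1])   # blood pressure changes (mm Hg)
n = len(diffs)
avg = diffs.mean()
sd = diffs.std()              # SD of the list: divides by n
sd_plus = diffs.std(ddof=1)   # SD+: divides by n - 1

z = avg / (sd / np.sqrt(n))        # null value of the average difference is 0
T = avg / (sd_plus / np.sqrt(n))
print("average %.1f, SD %.2f, SD+ %.2f, z %.2f, T %.2f" % (avg, sd, sd_plus, z, T))
```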

## What’s different about the $T$ statistic?

* For one thing, it uses $\text{SD}^+$ instead of $\text{SD}$.
* Why does it use $\text{SD}^+$?
* For small samples, $\text{SD}^+$ is a better estimate of SD(box) than SD.
* Unfortunately, though, the $T$ statistic does not follow the normal curve. This is the biggest difference.

## Computing the chances for the $T$ test

* It *almost* follows the normal curve, and for large samples it gets closer and closer.
* For each sample size, there is a different curve, or probability histogram.
* These curves are indexed by what we call *degrees of freedom*.
* In this example, the degrees of freedom are $n-1 = 5-1 = 4$.
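A sketch of how much the degrees of freedom matter here, assuming SciPy (with `t` imported as `tdist`, as in the notebook's first cell). It compares one-sided $P$-values for $T \approx -3.28$ and the two-sided 5% cutoffs.

```python
from scipy.stats import norm
from scipy.stats import t as tdist

T, df = -3.28, 4    # T statistic from the 5 blood pressure changes; df = n - 1

# One-sided P-values for Ha: average difference < 0
p_normal = norm.cdf(T)           # what the normal curve would (wrongly) give
p_student = tdist.cdf(T, df)     # Student's T curve with 4 degrees of freedom

# Two-sided 5% cutoffs
cut_normal = norm.ppf(0.975)
cut_student = tdist.ppf(0.975, df)

print("P-value: normal %.4f, T_4 %.4f" % (p_normal, p_student))
print("cutoff:  normal %.2f, T_4 %.2f" % (cut_normal, cut_student))
```

The $T_4$ cutoff (about 2.78) is well beyond the normal cutoff (1.96); with 20 degrees of freedom it would be about 2.09.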

### Student’s $T$

In [None]:
%%capture
df=4
normal_fig6 = plt.figure(figsize=figsize)
ax = normal_fig6.gca()
normal_curve(ax=ax, label='Normal', color='blue', alpha=0.)
studentT_curve(ax=ax, label='$T_{%d}$' % df, color='green', alpha=0., df=df)
ax.set_title('Comparison of normal curve to $T_{%d}$' % df, fontsize=15)
ax.legend()

In [None]:
normal_fig6

### Student’s $T$

In [None]:
%%capture
df = 4 
normal_fig7 = plt.figure(figsize=figsize)
ax = normal_curve(alpha=0., color='blue')
interval = np.linspace(-4,ndist.ppf(0.025), 101)
ax.fill_between(interval, 0*interval, ndist.pdf(interval),
                hatch='+', color='blue', alpha=0.5)
interval = np.linspace(ndist.ppf(0.975),4, 101)
ax.fill_between(interval, 0*interval, ndist.pdf(interval),
                hatch='+', color='blue', alpha=0.5)

studentT_curve(ax=ax, alpha=0., color='green', df=df)
interval = np.linspace(-4,tdist.ppf(0.025, df), 101)
ax.fill_between(interval, 0*interval, tdist.pdf(interval, df),
                hatch='+', color='green', alpha=0.2)
interval = np.linspace(tdist.ppf(0.975, df),4, 101)
ax.fill_between(interval, 0*interval, tdist.pdf(interval, df),
                hatch='+', color='green', alpha=0.2)



In [None]:
normal_fig7


Comparison of two-sided 5% rejection rules (normal curve vs. $T_4$), df=4

### Student’s $T$

In [None]:
%%capture
df=20
normal_fig8 = plt.figure(figsize=figsize)
ax = normal_fig8.gca()
normal_curve(ax=ax, label='Normal', color='blue', alpha=0.)
studentT_curve(ax=ax, label='$T_{%d}$' % df, color='green', alpha=0., df=df)
ax.set_title('Comparison of normal curve to $T_{%d}$' % df, fontsize=15)
ax.legend()



In [None]:
normal_fig8


Comparison with normal curve, degrees of freedom = 20

### Student’s $T$

In [None]:
%%capture
df = 20
normal_fig9 = plt.figure(figsize=figsize)
ax = normal_curve(alpha=0., color='blue')
interval = np.linspace(-4,ndist.ppf(0.025), 101)
ax.fill_between(interval, 0*interval, ndist.pdf(interval),
                hatch='+', color='blue', alpha=0.5)
interval = np.linspace(ndist.ppf(0.975),4, 101)
ax.fill_between(interval, 0*interval, ndist.pdf(interval),
                hatch='+', color='blue', alpha=0.5)

studentT_curve(ax=ax, alpha=0., color='green', df=df)
interval = np.linspace(-4,tdist.ppf(0.025, df), 101)
ax.fill_between(interval, 0*interval, tdist.pdf(interval, df),
                hatch='+', color='green', alpha=0.2)
interval = np.linspace(tdist.ppf(0.975, df),4, 101)
ax.fill_between(interval, 0*interval, tdist.pdf(interval, df),
                hatch='+', color='green', alpha=0.2)



In [None]:
normal_fig9


Comparison of two-sided 5% rejection rules (normal curve vs. $T_{20}$), df=20