# Hypothesis Testing: Mean
- When sigma is known

## Hypothesis Testing
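- Left-tailed test: the rejection region is the area in the left tail
    - H1 states that the mean is less than the value in H0
- Right-tailed test: the rejection region is the area in the right tail
    - H1 states that the mean is greater than the value in H0
- Two-tailed test: the rejection region is split between the left and right tails
    - H1 states that the mean is not equal to the value in H0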

### Level of Significance
|       Test          | Quantile (significance = 0.05) | Critical z-value |
| :-------------------|:------------------------------:|:----------------:|
| Left-tailed test    |z(0.05)                         | -1.64            |
| Right-tailed test   |z(0.95)                         | 1.64             |
| Two-tailed test     |z(0.025) , z(0.975)             | -1.96, 1.96      |
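
The critical values in the table can be reproduced with `qnorm`. This is a minimal sketch; the names `alpha`, `crit.left`, `crit.right`, and `crit.two` are illustrative and do not appear elsewhere in these notes.

```r
# Significance level used in the table
alpha <- 0.05

# qnorm() returns the z-value with the given area to its left
crit.left  <- qnorm(alpha)                      # left-tailed cutoff
crit.right <- qnorm(1 - alpha)                  # right-tailed cutoff
crit.two   <- qnorm(c(alpha/2, 1 - alpha/2))    # two-tailed cutoffs

round(c(left = crit.left, right = crit.right), 3)   # -1.645, 1.645
round(crit.two, 2)                                  # -1.96, 1.96
```

The one-tailed cutoff is 1.645 to three decimals, which the table shows as 1.64.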

## Method of Testing
1. Classical approach:
    - Compute the z-value of the sample (the test statistic)
    - Compute the critical z-value for the level of significance
    - Compare the two z-values
2. p-value approach:
    - Compute the p-value of the sample
    - The level of significance is already a probability, so it needs no conversion
    - Compare the p-value with the level of significance
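
Both approaches lead to the same decision, because comparing z-values and comparing tail areas are two views of the same inequality. The sketch below checks this for a right-tailed test. The helper `right.tailed.decisions` and the z-values passed to it are hypothetical and chosen only for illustration.

```r
# Hypothetical helper: decide a right-tailed z-test by both approaches
right.tailed.decisions <- function(z, alpha) {
  # Classical approach: compare the test statistic with the critical z-value
  classical <- z > qnorm(alpha, lower.tail = FALSE)
  # p-value approach: compare the right-tail area with the significance level
  p.value <- pnorm(z, lower.tail = FALSE) < alpha
  c(classical = classical, p.value = p.value)
}

# Illustrative test statistics: one below and one above the cutoff
right.tailed.decisions(z = 1.39, alpha = 0.05)   # both FALSE
right.tailed.decisions(z = 2.10, alpha = 0.05)   # both TRUE
```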

## Example 1: (Sigma is Known) SAT Exam: Right-tailed test
- Statement:
    - Students with 4 years of high school English score higher on the SAT: right-tailed test
- Population statistics:
    - Mean = 515
    - Sigma = 114
- Sample statistics:
    - Sample size, n = 40
    - Sample Mean = 540
- Level of Significance = 0.05
- Hypothesis
    - H0: u0 = 515
    - H1: u1 > 515 (the sample mean is 540)

#### Solution: Classical approach:

In [1]:
# Population Stats:
pop.mean <- 515
sigma <- 114

# Sample Stats:
n <- 40
sample.mean <- 540
sample.sd <- sigma/sqrt(n) # standard error of the mean: 18.02

# Find the sample z-value (test statistic)
sample.z <- (sample.mean - pop.mean)/sample.sd # 1.39

# Find the critical z-value for the significance level (right tail)
sig <- 0.05
sig.z <- qnorm(sig, lower.tail=FALSE) # 1.64

# Compare z-values
sample.z > sig.z # FALSE: cannot reject the null hypothesis

#### Solution: p-value approach:

In [2]:
# Population Stats:
pop.mean <- 515
sigma <- 114

# Sample Stats:
n <- 40
sample.mean <- 540
sample.sd <- sigma/sqrt(n) #18.02
 
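# Find the p-value of the sample (right-tail area)
sample.z <- (sample.mean - pop.mean)/sample.sd # 1.39
sample.p <- pnorm(sample.z, lower.tail=FALSE) # 0.083

# Compare the p-value with the significance level:
sig <- 0.05
sample.p < sig # FALSE: cannot reject the null hypothesis

- The p-value is not significant: 0.0827 > 0.05
- A sample mean this large is likely enough to occur by chance when H0 is true, so the null hypothesis cannot be rejected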

## Example 2: (Sigma is Known) Phone Bill: Two-tailed test
- Statement:
    - Today’s average phone bill is not the same as in 2004: Two-tailed test
- Population statistics:
    - Mean = 50.64
    - Sigma = 18.49
- Sample statistics:
    - Sample size, n = 12
    - Sample Mean = 65.01
- Level of Significance = 0.05
- Hypothesis
    - H0: u0 = 50.64
    - H1: u1 is not equal to 50.64

#### Solution: Classical approach:

In [3]:
# Population Stats:
pop.mean <- 50.64
sigma <- 18.49

# Sample Stats:
n <- 12
sample.mean <- 65.01
sample.sd <- sigma/sqrt(n) #5.34

# Find Sample z-value
sample.z <- (sample.mean - pop.mean)/sample.sd #2.69

# Find the critical z-values: the test is two-tailed, so split the significance level in half
sig <- 0.05
sig.z.right <- qnorm(sig/2, lower.tail=FALSE) # 1.96
sig.z.left <- qnorm(sig/2, lower.tail=TRUE) # -1.96

# Check whether the sample z-value falls in either rejection region
sample.z > sig.z.right || sample.z < sig.z.left # TRUE: can reject the null hypothesis

#### Solution: p-value approach:

In [4]:
# Population Stats:
pop.mean <- 50.64
sigma <- 18.49

# Sample Stats:
n <- 12
sample.mean <- 65.01
sample.sd <- sigma/sqrt(n) #5.34
 
# Find the tail probabilities of the sample z-value
sample.z <- (sample.mean - pop.mean)/sample.sd # 2.69
sample.p.right <- pnorm(sample.z, lower.tail=FALSE) # 0.0035
sample.p.left <- pnorm(sample.z, lower.tail=TRUE) # 0.9964

# Multiply the smaller tail probability (the right tail in this case) by 2
p <- c(sample.p.left, sample.p.right)
p.value <- min(p)*2 # 0.0071

# Check if the p-value is less than the significance level:
sig <- 0.05
p.value < sig # TRUE: reject the null hypothesis

- If H0 were true, a sample mean this far from the population mean would occur with probability of only 0.0071: reject the null hypothesis
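
Because the normal distribution is symmetric, the two-tailed p-value can also be written in one step as twice the area beyond `-abs(z)`. The sketch below reuses the Example 2 numbers and checks that this shortcut agrees with doubling the smaller tail. The names `z`, `p.two.sided`, and `p.min.tail` are illustrative.

```r
# Example 2 inputs
pop.mean <- 50.64
sigma <- 18.49
n <- 12
sample.mean <- 65.01

# Test statistic
z <- (sample.mean - pop.mean) / (sigma / sqrt(n))

# Shortcut: twice the lower-tail area at -|z|
p.two.sided <- 2 * pnorm(-abs(z))

# Approach used in the example: twice the smaller of the two tails
p.min.tail <- 2 * min(pnorm(z), pnorm(z, lower.tail = FALSE))

all.equal(p.two.sided, p.min.tail)   # TRUE
```

The shortcut works regardless of the sign of `z`, so there is no need to check which tail is smaller.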

<hr>

# Hypothesis Testing: Mean
- When sigma is NOT known
- Replace the z-statistic with the t-statistic
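
To see what the switch from z to t changes, the sketch below compares right-tailed critical values at a significance of 0.05. The degrees of freedom are illustrative choices, and `t.crit` and `z.crit` are names introduced here.

```r
alpha <- 0.05

# Illustrative degrees of freedom, from a small sample to a very large one
df <- c(5, 11, 39, 1000)

# Right-tailed critical values under the t and the normal distributions
t.crit <- qt(alpha, df, lower.tail = FALSE)
z.crit <- qnorm(alpha, lower.tail = FALSE)

data.frame(df = df, t.crit = round(t.crit, 3), z.crit = round(z.crit, 3))
```

The t cutoff is always larger than the z cutoff and shrinks toward it as the degrees of freedom grow, so the t-test demands stronger evidence when the sample is small.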

## Example 3: (Sigma is NOT Known) Number of cigarettes smoked a day: Left-tailed test
- Statement:
    - Do retired people smoke fewer than 18.1 cigarettes a day?
- Population statistics:
    - Mean = 18.1
    - Sigma = NOT known
- Sample statistics:
    - Sample size, n = 40
    - Sample Mean = 16.8
    - Sample SD = 4.7
- Level of Significance = 0.1
- Hypothesis
    - H0: u0 = 18.1
    - H1: u1 < 18.1 (the sample mean is 16.8)

#### Solution: Classical approach (t-statistics):

In [5]:
# Population Stats:
pop.mean <- 18.1
 
# Sample Stats:
n <- 40
degree.freedom <- n - 1
sample.mean <- 16.8
sample.sd <- 4.7
 
# Find Sample t-value
sample.t <- (sample.mean - pop.mean)/(sample.sd/sqrt(n)) #-1.75
 
# Find Significance t-value
sig <- 0.1
sig.t <- qt(sig, degree.freedom, lower.tail=TRUE) #-1.304
 
# Compare t-values
sample.t < sig.t #TRUE: Can reject null hypothesis

#### Solution: p-value approach:

In [6]:
# Population Stats:
pop.mean <- 18.1
 
# Sample Stats:
n <- 40
degree.freedom <- n - 1
sample.mean <- 16.8
sample.sd <- 4.7
 
# Find p-value of Sample
sample.t <- (sample.mean - pop.mean)/(sample.sd/sqrt(n)) #-1.75
sample.p <- pt(sample.t, degree.freedom, lower.tail=TRUE) #0.044

# Check if the p-value is less than the significance level:
sig <- 0.1
sample.p < sig # TRUE: reject the null hypothesis

- p-value: 0.044 < 0.10: reject the null hypothesis
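
The steps in Example 3 can be wrapped in a small function that works from summary statistics. This is a sketch, not a replacement for R's built-in `t.test`, which needs the raw observations. The helper `one.sample.t` and its argument names are hypothetical.

```r
# Hypothetical helper: one-sample t-test from summary statistics
one.sample.t <- function(sample.mean, sample.sd, n, pop.mean,
                         alternative = c("two.sided", "less", "greater")) {
  alternative <- match.arg(alternative)
  df <- n - 1
  # Test statistic: distance from the hypothesized mean in standard errors
  t.stat <- (sample.mean - pop.mean) / (sample.sd / sqrt(n))
  # Tail area that matches the alternative hypothesis
  p.value <- switch(alternative,
    less      = pt(t.stat, df),
    greater   = pt(t.stat, df, lower.tail = FALSE),
    two.sided = 2 * pt(-abs(t.stat), df)
  )
  list(t = t.stat, df = df, p.value = p.value)
}

# Example 3: left-tailed test at a significance of 0.1
result <- one.sample.t(16.8, 4.7, 40, 18.1, alternative = "less")
result$p.value < 0.1   # TRUE: reject the null hypothesis
```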

<hr>

# Type 1 and Type 2 Errors

|     Conclusion      | H0 is TRUE          |  H1 is TRUE        |
|:-------------------:|:-------------------:|:------------------:|
| Do NOT Reject       |Correct Conclusion   | Type 2 Error       |
| Reject              |Type 1 Error         | Correct Conclusion |

## Level of Significance
- The level of significance 𝛼 is the probability of making a Type 1 error
- Common levels of significance and the corresponding confidence levels:
    - 0.01 : 99% confidence
    - 0.05 : 95% confidence
    - 0.10 : 90% confidence
- If 𝛼 is very small (for example, 0.01):
    - The probability of making a Type 1 error is reduced
    - For a fixed sample size, 𝛽 (the probability of making a Type 2 error) becomes larger
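
The trade-off between the two error types can be made concrete with the Example 1 setup (right-tailed z-test, population mean 515, sigma 114, n = 40). Computing the Type 2 error probability requires a specific true mean under H1. The value 540 used below is an assumption made only for illustration, and `true.mean`, `se`, `cutoff`, and `beta` are names introduced here.

```r
# Example 1 setup
pop.mean <- 515
sigma <- 114
n <- 40
se <- sigma / sqrt(n)   # standard error of the mean

# Assumption for illustration: the true mean under H1
true.mean <- 540

# Significance levels to compare
alpha <- c(0.10, 0.05, 0.01)

# Sample means above this cutoff lead to rejecting H0
cutoff <- pop.mean + qnorm(alpha, lower.tail = FALSE) * se

# Type 2 error: the sample mean falls below the cutoff even though H1 is true
beta <- pnorm(cutoff, mean = true.mean, sd = se)

data.frame(alpha = alpha, beta = round(beta, 3))
```

Under these assumptions, `beta` rises as `alpha` falls, because a stricter cutoff makes it harder to reject H0 even when H1 is true.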