# Fundamentals of Hypothesis Testing:
- Classical approach:
    1. Based on the significance level, find the left and/or right critical z-values: qnorm() function
    2. Find the sample z-value: (Sample.mean - Pop.mean)/Std.error, where Std.error = Pop.sd/sqrt(n) (stored as sample.sd in the code below)
    3. Reject the null hypothesis if the sample z-value is less than the left critical z-value or higher than the right critical z-value
- p-value approach:
    1. Find the sample z-value: (Sample.mean - Pop.mean)/Std.error
    2. Find the p-value based on the z-value: pnorm() function
    3. Reject the null hypothesis if the p-value is less than the significance level
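
The two procedures above can be sketched as one small helper. This is only an illustration: `z_test_two_tailed` and its return fields are hypothetical names that do not appear elsewhere in this notebook, and the sketch assumes a two-tailed test on a z-value that has already been computed.

```r
# Hypothetical helper: two-tailed z-test decision by both approaches
z_test_two_tailed <- function(z, sig = 0.05) {
  # Classical approach: critical values that cut off sig/2 in each tail
  z.left  <- qnorm(sig / 2)
  z.right <- qnorm(sig / 2, lower.tail = FALSE)
  reject.classical <- z < z.left || z > z.right

  # p-value approach: double the smaller tail probability
  p.value <- 2 * min(pnorm(z), pnorm(z, lower.tail = FALSE))
  reject.p <- p.value < sig

  list(critical = c(z.left, z.right), p.value = p.value,
       reject.classical = reject.classical, reject.p = reject.p)
}

res <- z_test_two_tailed(2.21, sig = 0.05)
res$reject.classical  # TRUE
res$reject.p          # TRUE
```

Both approaches always reach the same decision, because the p-value falls below the significance level exactly when the z-value lies beyond a critical value.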

## Example 1: What is the p-value if, in a two-tailed hypothesis test, z = -1.38?
- In a two-tailed test, find the tail probability on the left and on the right of z, then multiply the smaller of the two by 2

In [1]:
# Given:
z <- -1.38

# Find left and right p-values
p.left <- pnorm(z) #0.084
p.right <- pnorm(z, lower.tail=FALSE) #0.92

# Multiply the smaller tail probability (left in this case) by 2
p <- c(p.left, p.right)
p.value <- min(p)*2
p.value
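
Because the standard normal distribution is symmetric, the same two-tailed p-value can be obtained in one step. The snippet below is a minimal sketch of that shortcut; the names `p.long` and `p.short` are introduced here only for the comparison.

```r
z <- -1.38

# Long form used above: double the smaller tail probability
p.long <- 2 * min(pnorm(z), pnorm(z, lower.tail = FALSE))

# Shortcut by symmetry: double the area below -|z|
p.short <- 2 * pnorm(-abs(z))

all.equal(p.long, p.short)  # TRUE
round(p.short, 3)           # 0.168
```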

## Example 2: What decision will be made if z = 2.21? 
- Two-tailed hypothesis test 
- Significance level of 0.05

#### Classical approach: Reject if the sample z is higher than the right-tail critical z-value or lower than the left-tail critical z-value

In [2]:
# Sample Stats:
sample.z <- 2.21
 
# Since it's a two-tailed z-test, split the significance level in half:
sig <- 0.05
sig.left <- sig/2 #0.025
sig.right <- sig/2 #0.025
 
# Find critical z-values
sig.z.left <- qnorm(sig.left) #-1.96
sig.z.right <- qnorm(sig.right, lower.tail=FALSE) #1.96
 
# Check to see if Sample z-value falls in either rejection region (tail)
sample.z > sig.z.right #TRUE
sample.z < sig.z.left #FALSE
 
sample.z > sig.z.right || sample.z < sig.z.left #TRUE

- Since the sample z-value is higher than the critical z-value of the right tail, Reject Null Hypothesis

#### p-value approach:

In [3]:
# Sample Stats:
sample.z <- 2.21
 
# Find p-value of Sample: Multiply the smaller tail probability (right in this case) by 2
sample.p.left <- pnorm(sample.z, lower.tail=TRUE) #0.986
sample.p.right <- pnorm(sample.z, lower.tail=FALSE) #0.0136
p <- c(sample.p.left, sample.p.right)
p.value <- min(p)*2
 
# Check if p-value is less than significance:
sig <- 0.05
p.value < sig #TRUE, Reject Null Hypothesis

- TRUE: Reject Null Hypothesis

## Example 3: (Sigma Known) – Amount of water in a 1-gallon bottle: Two-tail test
- Population Mean = 1
- Population SD = 0.02 gallons
- Sample size = 50
- Sample Mean = 0.995 gallon

### a) With significance of 0.01, Is there evidence that the mean amount is different from 1.0 gallon?
#### Classical approach using z-values:

In [4]:
# Population Stats:
pop.mean <- 1
sigma <- 0.02

# Sample Stats:
n <- 50
sample.mean <- 0.995
sample.sd <- sigma/(sqrt(n)) #0.0028, standard error of the sample mean

# Find Sample z-value
sample.z <- (sample.mean - pop.mean)/sample.sd #-1.77

# Find critical z-values
sig <- 0.01
sig.left <- sig/2 #0.005
sig.right <- sig/2 #0.005
 
sig.z.left <- qnorm(sig.left) #-2.58
sig.z.right <- qnorm(sig.right, lower.tail=FALSE) #2.58
 
# Check to see if Sample z-value falls in either rejection region (tail)
sample.z > sig.z.right || sample.z < sig.z.left #FALSE

- FALSE: CANNOT Reject Null Hypothesis. There is insufficient evidence that the mean amount is different from 1.0 gallon

#### p-value approach: Find p of the left and the right, compare to the significance

In [5]:
#Population Stats:
pop.mean <- 1
pop.sd <- 0.02

#Sample Stats:
n <- 50
sample.mean <- 0.995
sample.sd <- pop.sd/(sqrt(n)) #0.0028

# Find Sample z-value
sample.z <- (sample.mean - pop.mean)/sample.sd #-1.77
 
# Find p-value of Sample: Both tails, multiply the smaller tail probability (left in this case) by 2
sample.p.left <- pnorm(sample.z) #0.0385 
sample.p.right <- pnorm(sample.z, lower.tail=FALSE) #0.961
p <- c(sample.p.left, sample.p.right)
p.value <- min(p)*2 #0.077
 
# Check if p-value is less than significance:
sig <- 0.01
p.value < sig #FALSE, CANNOT Reject Null Hypothesis

- FALSE: CANNOT Reject Null Hypothesis. The p-value (0.077) is above 0.01, so a sample mean this far from 1.0 gallon could plausibly happen by chance

### b) Construct a 99% confidence interval estimate of the population mean amount of water per bottle

In [6]:
#Population Stats:
pop.mean <- 1
sigma <- 0.02

#Sample Stats:
n <- 50
sample.mean <- 0.995
sample.sd <- sigma/(sqrt(n)) #0.0028

# Find Sample z-value
sample.z <- (sample.mean - pop.mean)/sample.sd #-1.77
 
#Compute Margin of Error:
confidence <- 0.99
z <- abs(qnorm((1-confidence)/2)) #2.58, same as sig.z.right and sig.z.left
std.error <- sigma/sqrt(n)   
margin.error <- z * std.error #0.0073
 
lower.interval <- sample.mean - margin.error
upper.interval <- sample.mean + margin.error
paste(round(lower.interval, 4), "to", round(upper.interval, 4)) #0.9877 to 1.0023

- 99% confidence that the population mean amount of water per bottle is between 0.9877 and 1.0023 gallons
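
The interval in part b) and the two-tailed test in part a) are two views of the same calculation: at matching levels (99% confidence, 0.01 significance), the hypothesized mean lies inside the interval exactly when the test fails to reject. The sketch below checks this for the numbers of this example; `ci`, `reject` and `inside` are names introduced only for this illustration.

```r
pop.mean <- 1
sigma <- 0.02
n <- 50
sample.mean <- 0.995
sig <- 0.01

std.error <- sigma / sqrt(n)
z.crit <- qnorm(sig / 2, lower.tail = FALSE)

# 99% confidence interval for the population mean
ci <- sample.mean + c(-1, 1) * z.crit * std.error

# Two-tailed test decision at the same significance level
sample.z <- (sample.mean - pop.mean) / std.error
reject <- abs(sample.z) > z.crit

# The hypothesized mean is inside the interval exactly when we fail to reject
inside <- pop.mean >= ci[1] && pop.mean <= ci[2]
inside == !reject  # TRUE
```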

## Example 4: (Sigma NOT known) – Savings from buying online: Two-tail test
- Sample Size = 100
- Sample Mean savings = 58 dollars
- Sample SD = 55 dollars

### With Significance of 0.05,  Is there evidence that Population Mean savings is different from 50 dollars?

#### Classical approach using t-values:

In [7]:
# Population Stats:
pop.mean <- 50

# Sample Stats:
n <- 100
degree.freedom <- n - 1
sample.mean <- 58
sample.sd <- 55
 
# Using t-statistics: Find Sample t-value
sample.t <- (sample.mean - pop.mean)/(sample.sd/sqrt(n)) #1.45
 
# Using t-statistics: Find critical t-values
sig <- 0.05
sig.left <- sig/2 #0.025
sig.right <- sig/2 #0.025
 
sig.t.left <- qt(sig.left, degree.freedom, lower.tail=TRUE) #-1.984
sig.t.right <- qt(sig.right, degree.freedom, lower.tail=FALSE)#1.984
 
# Check to see if sample.t falls in either rejection region (tail)
sample.t > sig.t.right #FALSE
sample.t < sig.t.left #FALSE
 
sample.t > sig.t.right || sample.t < sig.t.left #FALSE, CANNOT Reject Null Hypothesis

- There is insufficient evidence that Population Mean savings is different from 50 dollars

#### p-value approach: Find p of the left and the right, compare to the significance

In [8]:
# Population Stats:
pop.mean <- 50

# Sample Stats:
n <- 100
degree.freedom <- n - 1
sample.mean <- 58
sample.sd <- 55
 
# Using t-statistics: Find Sample t-value
sample.t <- (sample.mean - pop.mean)/(sample.sd/sqrt(n)) #1.45

# Find p-value of Sample: Two-tail test, multiply the smaller tail probability (right in this case) by 2
sample.p.left <- pt(sample.t, degree.freedom, lower.tail=TRUE) #0.925
sample.p.right <- pt(sample.t, degree.freedom, lower.tail=FALSE) #0.074
p <- c(sample.p.left, sample.p.right)
p.value <- min(p)*2 #0.149

# Check if p-value is less than significance:
sig <- 0.05
p.value < sig #FALSE, CANNOT Reject Null Hypothesis

- FALSE, High enough probability (0.149) that this can happen by chance. CANNOT Reject Null Hypothesis
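
Base R's `t.test()` works on raw observations rather than on summary statistics, so the steps above can be wrapped in a small function when only the mean, SD and sample size are available. The following is a sketch under that assumption; `t_test_summary` and its arguments are hypothetical names, not an existing R function.

```r
# Hypothetical helper: one-sample t-test from summary statistics
t_test_summary <- function(sample.mean, sample.sd, n, pop.mean,
                           alternative = c("two.sided", "less", "greater")) {
  alternative <- match.arg(alternative)
  df <- n - 1
  t.stat <- (sample.mean - pop.mean) / (sample.sd / sqrt(n))
  p.value <- switch(alternative,
    two.sided = 2 * pt(-abs(t.stat), df),
    less      = pt(t.stat, df),
    greater   = pt(t.stat, df, lower.tail = FALSE))
  list(t = t.stat, df = df, p.value = p.value)
}

res <- t_test_summary(58, 55, 100, pop.mean = 50)
round(res$t, 2)        # 1.45
round(res$p.value, 3)  # 0.149
```

The `alternative` argument mirrors the naming used by `t.test()`, so the same helper also covers one-tail tests.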

## Example 5: (Sigma NOT Known) Improving Wait time at appointments: One-tail test
- Sample Size = 355
- Sample Mean = 23.05 minutes
- Sample SD = 16.83 minutes

### With Significance of 0.01, is there evidence that Population Mean wait time is less than 25 minutes?

#### Classical approach using t-values:

In [9]:
#Population Stats:
pop.mean <- 25

#Sample Stats:
n <- 355
degree.freedom <- n - 1
sample.mean <- 23.05
sample.sd <- 16.83

#Using t-statistics: Find Sample t-value
sample.t <- (sample.mean - pop.mean)/(sample.sd/sqrt(n)) #-2.18
 
# Using t-statistics: Find critical t-value (left tail only, no split)
sig <- 0.01
sig.t <- qt(sig, degree.freedom, lower.tail=TRUE) #-2.34
 
# Check to see if Sample t-value falls below the left-tail critical t-value
sample.t < sig.t #FALSE

- There is insufficient evidence that Population Mean wait time is less than 25 minutes

#### p-value approach: Find p of the left, compare to the significance

In [10]:
#Population Stats:
pop.mean <- 25

#Sample Stats:
n <- 355
degree.freedom <- n - 1
sample.mean <- 23.05
sample.sd <- 16.83

#Using t-statistics: Find Sample t-value
sample.t <- (sample.mean - pop.mean)/(sample.sd/sqrt(n)) #-2.18

# Find p-value of Sample: Lower Tail since looking for evidence of less
sample.p <- pt(sample.t, degree.freedom, lower.tail=TRUE) #0.0148
 
# Check if p-value is less than significance:
sig <- 0.01
sample.p < sig #FALSE, CANNOT Reject Null Hypothesis

- FALSE, the p-value (0.0148) is above the significance level of 0.01, so this result could plausibly happen by chance. CANNOT Reject Null Hypothesis
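
The decision in this example depends on the chosen significance level, since the p-value sits between 0.01 and 0.05. The sketch below repeats the calculation and compares the p-value against a few common levels; `sig.levels` is a name introduced only for this illustration.

```r
pop.mean <- 25
n <- 355
sample.mean <- 23.05
sample.sd <- 16.83

sample.t <- (sample.mean - pop.mean) / (sample.sd / sqrt(n))
sample.p <- pt(sample.t, n - 1)  # lower-tail p-value

# Decision at several common significance levels
sig.levels <- c(0.10, 0.05, 0.01)
data.frame(sig = sig.levels, reject = sample.p < sig.levels)
```

With these numbers the null hypothesis is rejected at 0.10 and 0.05 but not at 0.01, which is the level the question asks for.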

## Example 6: (Z Test for Proportion) In random sample of 400, 88 are defective:

#### a) What is the Sample Proportion?

In [11]:
sample.prop <- 88/400
sample.prop

#### b) If the Null Hypothesis is that 20% of samples are defective, what is the z-value?

In [12]:
# Population Stats:
p <- 0.20

# Sample Stats: n must be the sample size from this example
n <- 400
sample.sd <- sqrt(p * (1-p)/n) #0.02

# Find Sample z-value
sample.z <- (sample.prop - p)/sample.sd #1.00
sample.z

#### c) Two-tail hypothesis test that the Proportion does NOT equal 0.20, with Significance of 0.05

In [13]:
sig <- 0.05
sig.left <- sig/2 #0.025
sig.right <- sig/2 #0.025
 
sig.z.left <- qnorm(sig.left) #-1.96
sig.z.right <- qnorm(sig.right, lower.tail=FALSE) #1.96
 
sample.z > sig.z.right || sample.z < sig.z.left #FALSE, CANNOT Reject Null Hypothesis

- There is insufficient evidence that the Population Proportion is different from 0.20
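
The same decision can be reached with the p-value approach. The sketch below recomputes the z-value using n = 400 from the problem statement and doubles the tail area beyond it; `p0` and `std.error` are names introduced only for this illustration.

```r
n <- 400
sample.prop <- 88 / n
p0 <- 0.20

# Standard error of the proportion under the null hypothesis
std.error <- sqrt(p0 * (1 - p0) / n)
sample.z <- (sample.prop - p0) / std.error

# Two-tailed p-value
p.value <- 2 * pnorm(-abs(sample.z))
p.value < 0.05  # FALSE
```

The p-value is roughly 0.32, well above 0.05, which agrees with the classical result.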

## Example 7: (Z Test for Proportion) In a survey, 328 out of 801 plan on spending at least 1,000 dollars
### At 0.05 level of Significance, is there evidence that the proportion of people who plan on spending at least 1,000 dollars is different from 35%?

#### Solution: Classical approach:

In [14]:
#Population Stats:
pop.prop <- 0.35

#Sample Stats:
n <- 801
sample.prop <- 328/801 #0.409
sample.sd <- sqrt(pop.prop * (1-pop.prop)/n) #0.0168

# Find Sample z-value
sample.z <- (sample.prop - pop.prop)/sample.sd #3.529

# Find critical z-values: Since two-tailed, divide the significance level by 2
sig <- 0.05
sig.left <- sig/2 #0.025
sig.right <- sig/2 #0.025
 
sig.z.left <- qnorm(sig.left) #-1.96
sig.z.right <- qnorm(sig.right, lower.tail=FALSE) #1.96
 
# Check to see if Sample z-value falls in either rejection region (tail)
sample.z > sig.z.right || sample.z < sig.z.left #TRUE, Reject Null Hypothesis

- TRUE, There is evidence that the proportion of people who plan on spending at least 1,000 dollars is different from 35%

#### Solution: p-value approach:

In [15]:
#Population Stats:
pop.prop <- 0.35

#Sample Stats:
n <- 801
sample.prop <- 328/801 #0.409
sample.sd <- sqrt(pop.prop * (1-pop.prop)/n) #0.0168

# Find Sample z-value
sample.z <- (sample.prop - pop.prop)/sample.sd #3.529

# Find p-value of Sample: Both tails, multiply the smaller tail probability (right in this case) by 2
sample.p.left <- pnorm(sample.z, lower.tail=TRUE) #0.999
sample.p.right <- pnorm(sample.z, lower.tail=FALSE) #0.00021
p <- c(sample.p.left, sample.p.right)
p.value <- min(p)*2 #0.00042

# Check if p-value is less than significance:
sig <- 0.05
p.value < sig #TRUE, Reject Null Hypothesis
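
As a cross-check, the manual z-test can be compared with R's built-in `prop.test()`. Without the continuity correction (`correct = FALSE`), its chi-squared statistic should equal the square of the z-value and its p-value should match the two-tailed p-value. This is a sketch for verification only; `x` and `p.manual` are names introduced here.

```r
x <- 328
n <- 801
pop.prop <- 0.35

# Manual z-test, as in the cells above
sample.z <- (x / n - pop.prop) / sqrt(pop.prop * (1 - pop.prop) / n)
p.manual <- 2 * pnorm(-abs(sample.z))

# Built-in test without continuity correction
res <- prop.test(x, n, p = pop.prop, alternative = "two.sided", correct = FALSE)

all.equal(unname(res$statistic), sample.z^2)  # chi-squared statistic vs z squared
all.equal(res$p.value, p.manual)              # p-values agree
```

Note that `prop.test()` applies the continuity correction by default, so leaving out `correct = FALSE` gives a slightly larger p-value than the manual calculation.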