### Importing the necessary libraries

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import scipy.stats as stats

### Q 1. You are a manager of a Chinese restaurant. You want to determine whether the mean waiting time to place an order has changed in the past month from its previous population mean value of 4.5 minutes. State the null and alternative hypotheses.

The null hypothesis is that there is no change in the mean waiting time:

$H_0$: $\mu = 4.5$

The alternative hypothesis is that the mean waiting time has changed from its previous mean value of 4.5 minutes (in either direction):

$H_a$: $\mu \neq 4.5$

### Q 2. What is the p-value if, in a two-tailed hypothesis test, z-stat = +2.00?

In [13]:
# one way: twice the smaller of the two tail areas for z = 2
p_value = 2 * min(stats.norm.cdf(2), 1 - stats.norm.cdf(2))
p_value

0.04550026389635842

In [15]:
# another way: double the upper-tail p-value for z = 2 (valid because the standard normal is symmetric)
p_val = (1 - stats.norm.cdf(2))*2
p_val

0.04550026389635842
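Both cells rely on the symmetry of the standard normal distribution. As a sketch, the same calculation can be wrapped for any alternative hypothesis; the helper name `z_test_p_value` is hypothetical (it is not part of SciPy). It uses `stats.norm.sf`, the survival function `1 - cdf`, which avoids losing precision when subtracting from 1 far out in the tail:

```python
import scipy.stats as stats


def z_test_p_value(z, alternative="two-sided"):
    """Hypothetical helper: p-value of a z-statistic under a standard normal null."""
    if alternative == "two-sided":
        # double the tail area beyond |z|
        return 2 * stats.norm.sf(abs(z))
    if alternative == "greater":
        # right-tailed test: area above z
        return stats.norm.sf(z)
    if alternative == "less":
        # left-tailed test: area below z
        return stats.norm.cdf(z)
    raise ValueError(f"unknown alternative: {alternative!r}")


print(z_test_p_value(2.0))             # two-tailed test from Q 2
print(z_test_p_value(2.5, "greater"))  # right-tailed test from Q 3
```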

### Q 3. Samy, Product Manager of K2 Jeans, wants to launch a product line into a new market area. A survey of a random sample of 400 households in that market showed a mean income per household of 30000 rupees. The standard deviation, based on an earlier pilot study of households, is 8000 rupees. Samy strongly believes the product line will be adequately profitable only in markets where the mean household income is greater than 29000 rupees. Samy wants our help in deciding whether the product line should be introduced in the new market. Perform a statistical analysis (at a significance level of 0.05) and, based on it, draw a conclusion.

This is a hypothesis test for a population mean: a right-tailed one-sample z-test, used because the population standard deviation is treated as known (σ = 8000 rupees from the pilot study).

Null hypothesis: the mean household income is less than or equal to 29,000 rupees

$H_0$: $\mu \le 29000$

Alternative hypothesis: the mean household income is greater than 29,000 rupees

$H_a$: $\mu > 29000$

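$n = 400$ (sample size)

$\bar{x} = 30000$ (sample mean)

$\mu_0 = 29000$ (hypothesized mean)

$\sigma = 8000$ (population standard deviation)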


In [23]:
# z = (x_bar - mu_0) / (sigma / sqrt(n))

Z = (30000 - 29000) / (8000 / np.sqrt(400))
print('Z-stat is = {}'.format(Z))

Z-stat is = 2.5
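Written out with the sample mean $\bar{x} = 30000$, hypothesized mean $\mu_0 = 29000$, $\sigma = 8000$ and $n = 400$, the cell above evaluates the one-sample z-statistic; the standard error $\sigma/\sqrt{n}$ is 400 rupees:

$$
z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} = \frac{30000 - 29000}{8000/\sqrt{400}} = \frac{1000}{400} = 2.5
$$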


In [19]:
# right-tailed p-value: P(Z > 2.5) under the standard normal
p_val=1 - stats.norm.cdf(2.5)
p_val

0.006209665325776159

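Since the p-value (≈ 0.0062) is smaller than the significance level (0.05), we reject the null hypothesis and conclude that the mean household income in the new market is greater than 29,000 rupees. By Samy's criterion, the product line should be introduced in the new market.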
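An equivalent way to reach the same decision is the critical-value approach: for a right-tailed test at $\alpha = 0.05$, reject $H_0$ when $z$ exceeds the 95th percentile of the standard normal (about 1.645). A minimal sketch with the numbers from this question (the variable names are our own):

```python
import numpy as np
import scipy.stats as stats

n, x_bar, mu_0, sigma, alpha = 400, 30000, 29000, 8000, 0.05

z = (x_bar - mu_0) / (sigma / np.sqrt(n))  # one-sample z-statistic
z_crit = stats.norm.ppf(1 - alpha)         # right-tail critical value
p_value = stats.norm.sf(z)                 # right-tailed p-value
reject = z > z_crit                        # same decision as p_value < alpha

print(z, z_crit, p_value, reject)
```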

### Q 4. One-sample t-test 

The masses of a sample of N = 20 acorns from a forest subjected to acid rain from a coal power plant are m = [8.8, 6.6, 9.5, 11.2, 10.2, 7.4, 8.0, 9.6, 9.9, 9.0, 7.6, 7.4, 10.4, 11.1, 8.5, 10.0, 11.6, 10.7, 10.3, 7.0] g.

Does this sample provide enough evidence (alpha = 0.05) to say that the average mass of all acorns is different from 10 g?

**a) Formulate the null and alternate hypothesis**

Null hypothesis: the average mass of all acorns is equal to 10 g

$H_0$: $\mu = 10$

Alternative hypothesis: the average mass of all acorns is different from 10 g

$H_a$: $\mu \neq 10$

**b) Calculate the test-statistic and based on the p-value provide a conclusion.**

In [22]:
m = [8.8, 6.6, 9.5, 11.2, 10.2, 7.4, 8.0, 9.6, 9.9, 9.0, 7.6, 7.4, 10.4, 11.1, 8.5, 10.0, 11.6, 10.7, 10.3, 7.0]
mu = 10  # hypothesized population mean (g)

# two-sided one-sample t-test (the default alternative)
t, p = stats.ttest_1samp(m, popmean=mu)
print("tstat = ", t, ", p-value = ", p)

tstat =  -2.2491611580763973 , p-value =  0.03655562279112415


Since the p-value (≈ 0.037) is smaller than the significance level (0.05), we reject the null hypothesis and conclude that the average mass of all acorns is different from 10 g; the negative t-statistic shows the sample mean lies below 10 g.
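To see what `stats.ttest_1samp` computes, here is a sketch of the same test by hand: $t = (\bar{m} - \mu_0)/(s/\sqrt{N})$ with the sample standard deviation $s$ (`ddof=1`), and a two-sided p-value from the t distribution with $N - 1 = 19$ degrees of freedom. It should reproduce the values above up to floating-point rounding.

```python
import numpy as np
import scipy.stats as stats

m = np.array([8.8, 6.6, 9.5, 11.2, 10.2, 7.4, 8.0, 9.6, 9.9, 9.0,
              7.6, 7.4, 10.4, 11.1, 8.5, 10.0, 11.6, 10.7, 10.3, 7.0])
mu_0 = 10  # hypothesized mean (g)

n = m.size
se = m.std(ddof=1) / np.sqrt(n)                   # standard error using the sample SD
t_stat = (m.mean() - mu_0) / se                   # one-sample t-statistic
p_value = 2 * stats.t.sf(abs(t_stat), df=n - 1)   # two-sided p-value

print(t_stat, p_value)
```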

### Q 5. Independent (unpaired) two-sample t-test

The masses of N<sub>1</sub> = 20 acorns from oak trees upwind of a coal power plant and N<sub>2</sub> = 30 acorns from oak trees downwind of the same coal power plant are measured. Is the mass of acorns from downwind trees different from that of acorns from upwind trees at a significance level of 0.05? The sample sizes are not equal, but we will assume that the population variances of the two groups are equal.

#### Sample upwind:
x1 = [10.8, 10.0, 8.2, 9.9, 11.6, 10.1, 11.3, 10.3, 10.7, 9.7, 
      7.8, 9.6, 9.7, 11.6, 10.3, 9.8, 12.3, 11.0, 10.4, 10.4]

#### Sample downwind:
x2 = [7.8, 7.5, 9.5, 11.7, 8.1, 8.8, 8.8, 7.7, 9.7, 7.0, 
      9.0, 9.7, 11.3, 8.7, 8.8, 10.9, 10.3, 9.6, 8.4, 6.6,
      7.2, 7.6, 11.5, 6.6, 8.6, 10.5, 8.4, 8.5, 10.2, 9.2]

**a) Formulate the null and alternate hypothesis.**

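Null hypothesis: the mean mass of acorns from downwind trees equals that of acorns from upwind trees

$H_0$: $\mu_1 = \mu_2$

Alternative hypothesis: the mean mass of acorns from downwind trees differs from that of acorns from upwind trees

$H_a$: $\mu_1 \neq \mu_2$

where $\mu_1$ and $\mu_2$ are the population mean masses of upwind and downwind acorns, respectively.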

**b) Calculate the test-statistic and based on the p-value provide a conclusion.**

In [24]:
x1=[10.8,10.0,8.2,9.9,11.6,10.1,11.3,10.3,10.7,9.7,7.8,9.6,9.7,11.6,10.3,9.8,12.3,11.0,10.4,10.4]
x2=[7.8,7.5,9.5,11.7,8.1,8.8,8.8,7.7,9.7,7.0,9.0,9.7,11.3,8.7,8.8,10.9,10.3,9.6,8.4,6.6,7.2,7.6,11.5,6.6,8.6,10.5,8.4,8.5,10.2,9.2]


In [28]:
# import the required function
from scipy.stats import ttest_ind

# compute the t-statistic and the two-sided p-value
test_stat, p_value = ttest_ind(x1, x2, equal_var=True, alternative='two-sided')
# equal_var=True runs the pooled-variance Student's t-test, matching the task's equal-variance assumption
print('tstat =', test_stat , ', and the p-value is' , p_value)

tstat = 3.5981947686898033 , and the p-value is 0.0007560337478801464


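Since the p-value (≈ 0.00076) < 0.05, we reject $H_0$ and conclude that the mean mass of downwind acorns differs from that of upwind acorns; the positive t-statistic (upwind `x1` minus downwind `x2`) indicates that upwind acorns are heavier on average.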
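With `equal_var=True`, `ttest_ind` uses the pooled variance $s_p^2 = \frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2}$ and $n_1 + n_2 - 2 = 48$ degrees of freedom. A hand-rolled sketch of that formula, assuming the same `x1` (upwind) and `x2` (downwind) data, which should match the output above up to rounding:

```python
import numpy as np
import scipy.stats as stats

x1 = np.array([10.8, 10.0, 8.2, 9.9, 11.6, 10.1, 11.3, 10.3, 10.7, 9.7,
               7.8, 9.6, 9.7, 11.6, 10.3, 9.8, 12.3, 11.0, 10.4, 10.4])
x2 = np.array([7.8, 7.5, 9.5, 11.7, 8.1, 8.8, 8.8, 7.7, 9.7, 7.0,
               9.0, 9.7, 11.3, 8.7, 8.8, 10.9, 10.3, 9.6, 8.4, 6.6,
               7.2, 7.6, 11.5, 6.6, 8.6, 10.5, 8.4, 8.5, 10.2, 9.2])

n1, n2 = x1.size, x2.size
# pooled variance from the two sample variances (ddof=1)
sp2 = ((n1 - 1) * x1.var(ddof=1) + (n2 - 1) * x2.var(ddof=1)) / (n1 + n2 - 2)
se = np.sqrt(sp2 * (1 / n1 + 1 / n2))       # standard error of the difference in means
t_stat = (x1.mean() - x2.mean()) / se
df = n1 + n2 - 2
p_value = 2 * stats.t.sf(abs(t_stat), df)   # two-sided p-value

print(t_stat, df, p_value)
```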

### Q 6. Paired samples t-test

The average mass of acorns from the same N = 30 trees downwind of a power plant is measured before (x<sub>1</sub>) and after (x<sub>2</sub>) the power plant converts from burning coal to burning natural gas. Does the mass of the acorns change after the conversion from coal to natural gas at a significance level of 0.05? 

#### Sample before conversion to natural gas
x1 = np.array([10.8, 6.4, 8.3, 7.6, 11.4, 9.9, 10.6, 8.7, 8.1, 10.9,
      11.0, 11.8, 7.3, 9.6, 9.3, 9.9, 9.0, 9.5, 10.6, 10.3,
      8.8, 12.3, 8.9, 10.5, 11.6, 7.6, 8.9, 10.4, 10.2, 8.8])

#### Sample after conversion to natural gas
x2 = np.array([10.1, 6.9, 8.6, 8.8, 12.1, 11.3, 12.4, 9.3, 9.3, 10.8,
      12.4, 11.5, 7.4, 10.0, 11.1, 10.6, 9.4, 9.5, 10.0, 10.0,
      9.7, 13.5, 9.6, 11.6, 11.7, 7.9, 8.6, 10.8, 9.5, 9.6])


**a) Formulate the null and alternate hypothesis.**

Null hypothesis: the mean mass of the acorns is unchanged after the conversion from coal to natural gas

$H_0$: $\mu_1 = \mu_2$ (equivalently, the mean paired difference $\mu_d = \mu_1 - \mu_2 = 0$)

Alternative hypothesis: the mean mass of the acorns changes after the conversion from coal to natural gas

$H_a$: $\mu_1 \neq \mu_2$

**b) Calculate the test-statistic and based on the p-value provide a conclusion.**

In [29]:
x1 = np.array([10.8, 6.4, 8.3, 7.6, 11.4, 9.9, 10.6, 8.7, 8.1, 10.9, 11.0, 11.8, 7.3, 9.6, 9.3, 9.9, 9.0, 9.5, 10.6, 10.3, 8.8, 12.3, 8.9, 10.5, 11.6, 7.6, 8.9, 10.4, 10.2, 8.8])
x2 = np.array([10.1, 6.9, 8.6, 8.8, 12.1, 11.3, 12.4, 9.3, 9.3, 10.8, 12.4, 11.5, 7.4, 10.0, 11.1, 10.6, 9.4, 9.5, 10.0, 10.0, 9.7, 13.5, 9.6, 11.6, 11.7, 7.9, 8.6, 10.8, 9.5, 9.6])

In [32]:
# import the required function
from scipy.stats import ttest_rel

# compute the paired t-statistic and p-value;
# `alternative` is omitted because its default, 'two-sided', already matches H_a
test_stat, p_value = ttest_rel(x1, x2)
print('The p-value is ', p_value)

The p-value is  0.0005168689824684378


Since the p-value (≈ 0.0005) < 0.05, we reject the null hypothesis and conclude that the mean mass of the acorns changed after the power plant converted from burning coal to burning natural gas; on average, the acorns are 0.5 g heavier after the conversion.
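A paired t-test is a one-sample t-test on the per-tree differences. The sketch below, assuming the same `x1` (before) and `x2` (after) arrays, checks that `ttest_rel` and `ttest_1samp` applied to `d = x1 - x2` give the same statistic and p-value. Because `d` is "before minus after", the statistic from `ttest_rel(x1, x2)` is negative when the acorns get heavier.

```python
import numpy as np
from scipy.stats import ttest_1samp, ttest_rel

x1 = np.array([10.8, 6.4, 8.3, 7.6, 11.4, 9.9, 10.6, 8.7, 8.1, 10.9,
               11.0, 11.8, 7.3, 9.6, 9.3, 9.9, 9.0, 9.5, 10.6, 10.3,
               8.8, 12.3, 8.9, 10.5, 11.6, 7.6, 8.9, 10.4, 10.2, 8.8])
x2 = np.array([10.1, 6.9, 8.6, 8.8, 12.1, 11.3, 12.4, 9.3, 9.3, 10.8,
               12.4, 11.5, 7.4, 10.0, 11.1, 10.6, 9.4, 9.5, 10.0, 10.0,
               9.7, 13.5, 9.6, 11.6, 11.7, 7.9, 8.6, 10.8, 9.5, 9.6])

d = x1 - x2  # per-tree difference: before minus after
t_paired, p_paired = ttest_rel(x1, x2)
t_diff, p_diff = ttest_1samp(d, popmean=0)

print(d.mean(), t_paired, p_paired)
print(t_diff, p_diff)  # same statistic and p-value as the paired test
```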