## Statistics - Assignment
### Z-Distribution Hypothesis Testing - Questions 11 - 24

In [1]:
import numpy as np

In [2]:
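# Function to calculate z-stats for single sample
def calc_z_statistics_one_sample(sample):
    # Computes x_bar / (s / sqrt(n)), i.e. the z statistic for H0: mean = 0.
    # To test against another hypothesized mean, subtract it from the sample first.
    n = len(sample)

    print(f"Sample size: n = {n}")

    x_bar = np.mean(sample)
    print(f"Sample Mean: x_bar = {round(x_bar, 2)}")

    # np.std defaults to ddof=0 (divides by n, not n - 1)
    s = np.std(sample)

    print(f"Sample Standard Deviation: s = {round(s, 2)}")

    z_statistic = x_bar / (s / np.sqrt(n))

    print("z_statistic = ", round(z_statistic, 2))

    return z_statistic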

In [3]:
# Function to calculate z-stats for two samples
def calc_z_statistics_two_samples(sample1, sample2):
    # Returns the magnitude |x1_bar - x2_bar| / SE. abs() discards the sign,
    # so the direction of the difference has to be read from the sample means.
    n1 = len(sample1)
    n2 = len(sample2)

    print(f"Sample size: n1 = {n1} and sample size: n2 = {n2}")

    x1_bar = np.mean(sample1)
    x2_bar = np.mean(sample2)

    print(f"Sample Mean: x1_bar = {round(x1_bar, 2)} and x2_bar = {round(x2_bar, 2)}, diff_x1x2 = {round(abs(x1_bar - x2_bar), 2)}")

    # np.std defaults to ddof=0 (divides by n, not n - 1)
    s1 = np.std(sample1)
    s2 = np.std(sample2)

    print(f"Sample Standard Deviation: s1 = {round(s1, 2)} and s2 = {round(s2, 2)}")

    # Standard error of the difference between the two sample means
    std_error = np.sqrt((s1 ** 2) / n1 + (s2 ** 2) / n2)
    z_statistic = abs(x1_bar - x2_bar) / std_error

    print("z_statistic = ", round(z_statistic, 2))

    return z_statistic

In [4]:
# Function to calculate z-stats for one proportion
def calc_z_statistics_one_proportion(sample, prop):
    pass
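
The one-proportion helper above is left as a stub. A minimal sketch of what it could compute is shown below, assuming `sample` is an array of 0/1 outcomes and `prop` is the proportion stated under $H_0$. The name `one_proportion_z` and the example data are hypothetical and are not used in the questions that follow.

```python
import numpy as np

def one_proportion_z(sample, prop):
    """z statistic for H0: true proportion == prop (hypothetical helper)."""
    sample = np.asarray(sample)
    n = len(sample)
    p_hat = sample.mean()                        # observed proportion of 1s
    std_error = np.sqrt(prop * (1 - prop) / n)   # standard error under H0
    return (p_hat - prop) / std_error

# Illustrative data only: 70 successes in 1000 trials, tested against 0.05
example = np.array([1] * 70 + [0] * 930)
z = one_proportion_z(example, 0.05)
print(round(z, 2))
```

The standard error uses the hypothesized proportion rather than the observed one, which is the usual choice for a one-proportion z-test.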

In [5]:
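# Function to calculate z-stats for two proportions
def calc_z_statistics_two_proportion(n1, n2, p1, p2):
    # n1, n2: sample sizes; p1, p2: observed sample proportions.
    # Uses the pooled proportion and returns (p1 - p2) / SE, rounded to 2 decimals.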

    print(f"Sample1 size and sample1 proportion: n1 = {n1}, p1 = {p1}")
    print(f"Sample2 size and sample2 proportion: n2 = {n2}, p2 = {p2}")     

    p = (n1*p1 + n2*p2)/(n1+n2)

    print(f"pooles proportion p = {round(p, 2)}")

    z_statistic = (p1 - p2)/ ( np.sqrt( p*(1-p)*(1/n1 + 1/n2) ) )

    print("z_statistic = ", round(z_statistic, 2))

    return (round(z_statistic, 2))

#### 11. A company wants to test if a new website layout leads to a higher conversion rate (percentage of visitors who make a purchase). They collect data from the old and new layouts to compare.

To generate the data use the following command:

```python
import numpy as np

# 50 purchases out of 1000 visitors
old_layout = np.array([1] * 50 + [0] * 950)

# 70 purchases out of 1000 visitors  
new_layout = np.array([1] * 70 + [0] * 930)
```

Apply z-test to find which layout is successful.

In [6]:
# 50 purchases out of 1000 visitors
old_layout = np.array([1] * 50 + [0] * 950)

# 70 purchases out of 1000 visitors  
new_layout = np.array([1] * 70 + [0] * 930)

p1 = 50/1000
p2 = 70/1000

z_statistic = calc_z_statistics_two_proportion(len(old_layout), len(new_layout), p1, p2)

Sample1 size and sample1 proportion: n1 = 1000, p1 = 0.05
Sample2 size and sample2 proportion: n2 = 1000, p2 = 0.07
pooles proportion p = 0.06
z_statistic =  -1.88


#### Solution

- There are two samples: old_layout and new_layout.
- For each layout, the conversion rate is given as the proportion of purchases out of total visitors.
- Both samples have 1000 observations, well above 30, so this problem can be solved using a two-proportion z-test.

#### Steps to perform Z test:

#### Step1:  (Framing Hypothesis)

  $H_0$: The new layout does not increase the conversion rate, $p_2 \le p_1$ \
  $H_A$: The new layout increases the conversion rate, $p_2 > p_1$ (left tailed test, because $Z$ is computed from $p_1 - p_2$)

#### Step2: (Significance Level)

$\alpha = 0.05$

#### Step3: (Calculate Z statistics)

  Old layout (Sample 1)

  Proportion of purchase $p_1$ = 50/1000 = 0.05 \
  sample size $n_1$ = 1000

  New layout (Sample 2)

  Proportion of purchase $p_2$ = 70/1000 = 0.07 \
  sample size $n_2$ = 1000

  We use the two proportion formula to calculate the z-test statistic:

  $Z = (p_1 - p_2) / \sqrt{p(1-p)(1/n_1 + 1/n_2)}$

  where $p_1$ and $p_2$ are the sample proportions, $n_1$ and $n_2$ are the sample sizes, and $p$ is the pooled proportion:

  $p = (p_1 n_1 + p_2 n_2)/(n_1 + n_2) = (0.05 \times 1000 + 0.07 \times 1000)/(1000 + 1000) = (50 + 70)/2000 = 0.06$

  $Z = (0.05 - 0.07)/\sqrt{0.06(1 - 0.06)(1/1000 + 1/1000)} = -0.02/\sqrt{0.0564 \times 0.002} = -0.02/0.01062 = -1.88$

  $Z_{statistics} = -1.88$


#### Step4: (Z critical)

$Z_{critical}^{0.05}$ from Z-table is -1.645

#### Step5: Conclusion

$Z_{statistics} < Z_{critical}$ (-1.88 < -1.645), so we reject the $H_0$ hypothesis, i.e., the new layout has a significantly higher conversion rate than the old layout at the 5% significance level.
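
As a cross-check on the Z-table lookup, the p-value of the left-tailed test can be computed directly. The sketch below uses `statistics.NormalDist` from the Python standard library and takes the rounded statistic of -1.88 as input; the variable names are illustrative.

```python
from statistics import NormalDist

alpha = 0.05
z_statistic = -1.88   # value reported by the two-proportion helper

# For a left-tailed test the p-value is P(Z <= z) under the standard normal
p_value = NormalDist().cdf(z_statistic)

print(f"p_value = {p_value:.4f}")
print("reject H0" if p_value < alpha else "fail to reject H0")
```

Comparing the p-value with $\alpha$ is equivalent to comparing the statistic with the critical value, so both routes lead to the same decision.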


####  12. A tutoring service claims that its program improves students' exam scores. A sample of students who participated in the program was taken, and their scores before and after the program were recorded.

Use the below code to generate samples of respective arrays of marks:

```python
before_program = np.array([75, 80, 85, 70, 90, 78, 92, 88, 82, 87])
after_program = np.array([80, 85, 90, 80, 92, 80, 95, 90, 85, 88])
```

Use z-test to find if the claims made by tutor are true or false.

In [7]:
before_program = np.array([75, 80, 85, 70, 90, 78, 92, 88, 82, 87])
after_program = np.array([80, 85, 90, 80, 92, 80, 95, 90, 85, 88])

z_statistic = calc_z_statistics_two_samples(before_program, after_program)

Sample size: n1 = 10 and sample size: n2 = 10
Sample Mean: x1_bar = 82.7 and x2_bar = 86.5, diff_x1x2 = 3.8
Sample Standard Deviation: s1 = 6.65 and s2 = 5.1
z_statistic =  1.43


#### Solution

- There are two samples of exam scores: before and after the program.
- Hence this problem can be solved using a two-sample z-test, treating the two arrays as independent samples.

#### Steps to perform Z test:

#### Step1:  (Framing Hypothesis)

  $H_0$: Exam scores do not improve after the program, $\mu_2 \le \mu_1$ \
  $H_A$: Exam scores improve after the program, $\mu_2 > \mu_1$ (left tailed test, because $Z$ is computed from $\bar x_1 - \bar x_2$)

#### Step2: (Significance Level)

$\alpha = 0.05$

#### Step3: (Calculate Z statistics)

  Scores before the program (Sample 1)

  Sample size $n_1$ = 10 \
  Sample Mean $\bar x_1$ = 82.7 \
  Sample Standard Deviation $s_1$ = 6.65

  Scores after the program (Sample 2)

  Sample size $n_2$ = 10 \
  Sample Mean $\bar x_2$ = 86.5 \
  Sample Standard Deviation $s_2$ = 5.1

  We use the two sample formula to calculate the z-test statistic:

  $Z = (\bar x_1 - \bar x_2) / \sqrt{s_1^2/n_1 + s_2^2/n_2}$

  $Z = (82.7 - 86.5)/\sqrt{6.65^2/10 + 5.1^2/10} = -3.8/\sqrt{4.422 + 2.601} = -3.8/\sqrt{7.023} = -3.8/2.65 = -1.43$

  $Z_{statistics} = -1.43$ (the helper function prints the magnitude, 1.43, because it takes the absolute difference of the means)


#### Step4: (Z critical)

$Z_{critical}^{0.05}$ from Z-table is -1.645

#### Step5: Conclusion

$Z_{statistics} > Z_{critical}$ (-1.43 > -1.645), so we fail to reject the $H_0$ hypothesis, i.e., this test does not provide sufficient evidence at the 5% significance level that scores improve after the program. The tutor's claim is not supported by this two-sample z-test.
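
The two arrays hold scores for the same students, so the observations are paired rather than independent. As an alternative to the two-sample formula used in this solution, the sketch below applies a one-sample z statistic to the per-student differences. This is a swapped-in technique shown for comparison only, and the names `diff`, `d_bar`, `s_d` and `z_paired` are illustrative.

```python
import numpy as np

before_program = np.array([75, 80, 85, 70, 90, 78, 92, 88, 82, 87])
after_program = np.array([80, 85, 90, 80, 92, 80, 95, 90, 85, 88])

# Per-student improvement; positive values mean the score went up
diff = after_program - before_program
n = len(diff)

d_bar = diff.mean()
s_d = diff.std()   # ddof=0, matching np.std as used in the helper functions

# One-sample z statistic for H0: mean difference = 0
z_paired = d_bar / (s_d / np.sqrt(n))
print(f"d_bar = {d_bar:.2f}, s_d = {s_d:.2f}, z_paired = {z_paired:.2f}")
```

The paired statistic is much larger in magnitude than the two-sample one because the differences vary far less than the raw scores do. This shows how sensitive the result is to the independence assumption. With only 10 students, a t-test would normally be preferred over a z-test in either case.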


#### 13. A pharmaceutical company wants to determine if a new drug is effective in reducing blood pressure. They conduct a study and record blood pressure measurements before and after administering the drug.

Use the below code to generate samples of respective arrays of blood pressure:

```python
before_drug = np.array([145, 150, 140, 135, 155, 160, 152, 148, 130, 138])
after_drug = np.array([130, 140, 132, 128, 145, 148, 138, 136, 125, 130])
```

Implement z-test to find if the drug really works or not.

In [8]:
before_drug = np.array([145, 150, 140, 135, 155, 160, 152, 148, 130, 138])
after_drug = np.array([130, 140, 132, 128, 145, 148, 138, 136, 125, 130])

z_statistic = calc_z_statistics_two_samples(before_drug, after_drug)

Sample size: n1 = 10 and sample size: n2 = 10
Sample Mean: x1_bar = 145.3 and x2_bar = 135.2, diff_x1x2 = 10.1
Sample Standard Deviation: s1 = 8.98 and s2 = 7.15
z_statistic =  2.78


#### Solution
 
 - There are two samples of blood pressure: Before and After the new drug
 - Hence this problem can be solved using two sample z-test

#### Steps to perform Z test:

#### Step1:  (Framing Hypothesis)

$H_0$: The new drug does not reduce blood pressure, $\mu_2 \ge \mu_1$ \
$H_A$: The new drug reduces blood pressure, $\mu_2 < \mu_1$ (right tailed test, because $Z$ is computed from $\bar x_1 - \bar x_2$)

#### Step2: (Significance Level)

$\alpha = 0.05$

#### Step3: (Calculate Z statistics)

  Blood pressure before the new drug (Sample 1)

  Sample size $n_1$ = 10 \
  Sample Mean $\bar x_1$ = 145.3 \
  Sample Standard Deviation $s_1$ = 8.98

  Blood pressure after the new drug (Sample 2)

  Sample size $n_2$ = 10 \
  Sample Mean $\bar x_2$ = 135.2 \
  Sample Standard Deviation $s_2$ = 7.15

  We use the two sample formula to calculate the z-test statistic:

  $Z = (\bar x_1 - \bar x_2) / \sqrt{s_1^2/n_1 + s_2^2/n_2}$

  $Z = (145.3 - 135.2)/\sqrt{8.98^2/10 + 7.15^2/10} = 10.1/\sqrt{80.6404/10 + 51.1225/10} = 10.1/\sqrt{8.064 + 5.112} = 10.1/3.630 = 2.78$

  $Z_{statistics} = 2.78$


#### Step4: (Z critical)

$Z_{critical}^{0.05}$ from Z-table is 1.645

#### Step5: Conclusion

$Z_{statistics} > Z_{critical}$ (2.78 > 1.645), so we reject the $H_0$ hypothesis, i.e., the reduction in blood pressure after the new drug is statistically significant at the 5% significance level.
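
The critical-value comparison in Steps 4 and 5 can also be done in code instead of with a Z-table. The sketch below is one possible helper. `z_decision` is a hypothetical name, and the critical value comes from `statistics.NormalDist.inv_cdf` in the Python standard library.

```python
from statistics import NormalDist

def z_decision(z_statistic, alpha=0.05, tail="right"):
    """Return (z_critical, reject_h0) for a one-tailed z-test.

    Hypothetical helper: tail="right" rejects for large z,
    tail="left" rejects for small (negative) z.
    """
    if tail == "right":
        z_critical = NormalDist().inv_cdf(1 - alpha)
        return z_critical, z_statistic > z_critical
    if tail == "left":
        z_critical = NormalDist().inv_cdf(alpha)
        return z_critical, z_statistic < z_critical
    raise ValueError("tail must be 'left' or 'right'")

# Right-tailed test with the statistic from this question
z_critical, reject = z_decision(2.78, alpha=0.05, tail="right")
print(f"z_critical = {z_critical:.3f}, reject H0: {reject}")
```

Passing `tail="left"` covers the left-tailed questions with the same function, which keeps the sign convention of the statistic and the rejection region in one place.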


#### 14. A customer service department claims that their average response time is less than 5 minutes. A sample of recent customer interactions was taken, and the response times were recorded.

Implement the below code to generate the array of response time:

```python
response_times = np.array([4.3, 3.8, 5.1, 4.9, 4.7, 4.2, 5.2, 4.5, 4.6, 4.4])
```

Implement a z-test to find whether the claim made by the customer service department is true or false.

In [9]:
response_times = np.array([4.3, 3.8, 5.1, 4.9, 4.7, 4.2, 5.2, 4.5, 4.6, 4.4])

# The helper divides the sample mean by its standard error, which tests against a mean of 0.
# Shift the data by the claimed 5 minutes so the statistic is measured against mu_0 = 5.
mu_0 = 5
z_statistic = calc_z_statistics_one_sample(response_times - mu_0)

Sample size: n = 10
Sample Mean: x_bar = -0.43
Sample Standard Deviation: s = 0.41
z_statistic =  -3.36


 #### Solution
 
 - There is only one sample, so this problem can be solved using a one-sample z-test against the claimed mean of 5 minutes.
 - The mean printed above is the shifted mean, $\bar{x} - \mu_0$ = 4.57 - 5 = -0.43.

#### Steps to perform Z test:

#### Step1:  (Framing Hypothesis)

  $H_0$: Average response time $\mu \ge 5$ \
  $H_A$: Average response time $\mu < 5$ (left tailed test)

#### Step2: (Significance Level)

$\alpha = 0.05$

#### Step3: (Calculate Z statistics)

  Customer Service Response time (Sample)

  Sample size n = 10 \
  Sample Mean  $\bar{x}$ = 4.57 \
  Sample Standard Deviation s = 0.405 (printed as 0.41) \
  Hypothesized Mean $\mu_0$ = 5

  We use the one sample formula to calculate the z-test statistic:

  $Z = (\bar{x} - \mu_0) / (s/\sqrt{n})$

  $Z = (4.57 - 5)/(0.405/\sqrt{10}) = -0.43/(0.405/3.162) = -0.43/0.1281 = -3.36$

  $Z_{statistics} = -3.36$


#### Step4: (Z critical)

$Z_{critical}^{0.05}$ from Z-table is -1.645

#### Step5: Conclusion

$Z_{statistics} < Z_{critical}$ (-3.36 < -1.645), so we reject the $H_0$ hypothesis, i.e., the average response time is significantly less than 5 minutes at the 5% significance level. The customer service claim is supported by this test.
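
`np.std` defaults to `ddof=0`, which divides by n, while the usual sample standard deviation divides by n - 1 (`ddof=1`). The sketch below measures the sample mean against the hypothesized 5 minutes under both settings, to show how much the choice matters for a sample of 10. The names `mu_0` and `results` are illustrative.

```python
import numpy as np

response_times = np.array([4.3, 3.8, 5.1, 4.9, 4.7, 4.2, 5.2, 4.5, 4.6, 4.4])
mu_0 = 5.0   # hypothesized mean response time in minutes
n = len(response_times)
x_bar = response_times.mean()

results = {}
for ddof in (0, 1):
    s = response_times.std(ddof=ddof)        # ddof=1 applies Bessel's correction
    z = (x_bar - mu_0) / (s / np.sqrt(n))
    results[ddof] = z
    print(f"ddof={ddof}: s = {s:.3f}, z = {z:.2f}")
```

With `ddof=1` the standard deviation is slightly larger, so the statistic is slightly smaller in magnitude. For this sample both values stay well below the left-tail critical value of about -1.645.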
