## Z-Test

### Example: Two-Sample Test for Proportions

Question: Is there a significant difference in the proportion of customers who purchased a product between two different marketing strategies, Strategy A and Strategy B?

- Null Hypothesis (H0): The proportions of customers who purchased a product are equal for Strategy A and Strategy B.
- Alternative Hypothesis (Ha): The proportions of customers who purchased a product are different for Strategy A and Strategy B.

Let's say we collected data from 200 customers for each strategy, and the results are as follows:

Strategy A: 
- Number of customers who purchased: 110
- Sample size: 200

Strategy B: 
- Number of customers who purchased: 130
- Sample size: 200

Now let's perform the two-sample test for proportions using the z-test:

1. State the null hypothesis (H0) and the alternative hypothesis (Ha):
   - H0: The proportions of customers who purchased a product are equal for Strategy A and Strategy B.
   - Ha: The proportions of customers who purchased a product are different for Strategy A and Strategy B.

2. Select the significance level (α), which determines the critical value(s) and the rejection region. Here we use α = 0.05, corresponding to a 95% confidence level.

3. Collect the sample data from the two groups:
   - Group A: Number of customers who purchased (x1) = 110, Sample size (n1) = 200.
   - Group B: Number of customers who purchased (x2) = 130, Sample size (n2) = 200.

4. Calculate the sample proportions:
   - p1 = x1 / n1 = 110 / 200 = 0.55
   - p2 = x2 / n2 = 130 / 200 = 0.65

5. Calculate the standard error of the difference in proportions (unpooled form):
   - SE = sqrt((p1 * (1 - p1) / n1) + (p2 * (1 - p2) / n2))
   - SE = sqrt((0.55 * 0.45 / 200) + (0.65 * 0.35 / 200)) = sqrt(0.002375) ≈ 0.04873

6. Calculate the test statistic (z-score):
   - z = (p1 - p2) / SE = (0.55 - 0.65) / 0.04873 ≈ -2.052

7. Determine the critical value(s) or the rejection region based on the chosen significance level and the type of test (two-tailed):
   - For a two-tailed test at α = 0.05, the critical z-values are -1.96 and 1.96.

8. Compare the test statistic with the critical value(s) or evaluate if it falls within the rejection region:
   - Since -2.052 falls outside the range of -1.96 to 1.96, it lies in the rejection region, so we reject the null hypothesis.

9. Make a decision:
   - Reject the null hypothesis. There is a significant difference in the proportion of customers who purchased a product between Strategy A and Strategy B.

10. Interpret the result in the context of the research question and the specific alternative hypothesis:
   - There is evidence to suggest that the marketing strategies, Strategy A and Strategy B, have different impacts on customer purchasing behavior. Strategy B appears to be more effective in generating product purchases compared to Strategy A.

This example is illustrative. A real analysis should also check the test's assumptions, such as random sampling, sample representativeness, and sample sizes large enough for the normal approximation to hold.
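The steps above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not a definitive implementation: the function name `two_proportion_ztest` and its return keys are hypothetical, and it assumes the same unpooled standard error as step 5.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_ztest(x1, n1, x2, n2, alpha=0.05):
    """Two-tailed z-test for two proportions (unpooled standard error)."""
    # Step 4: sample proportions
    p1, p2 = x1 / n1, x2 / n2
    # Step 5: standard error of the difference in proportions
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    # Step 6: test statistic
    z = (p1 - p2) / se
    # Step 7: critical value for a two-tailed test
    std_normal = NormalDist()  # mean 0, standard deviation 1
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Two-tailed p-value, plus the step 8-9 decision
    p_value = 2 * (1 - std_normal.cdf(abs(z)))
    return {
        "se": se,
        "z": z,
        "z_crit": z_crit,
        "p_value": p_value,
        "reject_h0": abs(z) > z_crit,
    }


# Strategy A: 110 of 200 purchased; Strategy B: 130 of 200 purchased
result = two_proportion_ztest(110, 200, 130, 200)
print(result)
```

The returned dictionary holds the standard error, z-score, critical value, p-value, and the reject/fail-to-reject decision, so each step of the walkthrough can be checked against the hand calculation.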

In [10]:
import scipy.stats as stats
import numpy as np

# Sample data (note: a different data set from the worked example above)
x1 = 19
n1 = 100
x2 = 62
n2 = 200

# Calculate the sample proportions
p1 = x1 / n1
p2 = x2 / n2

# Calculate the unpooled standard error of the difference in proportions
SE = np.sqrt((p1 * (1 - p1) / n1) + (p2 * (1 - p2) / n2))

# Calculate the test statistic (z-score)
z_statistic = (p1 - p2) / SE

# Calculate the p-value
p_value = 2 * (1 - stats.norm.cdf(np.abs(z_statistic)))

# Print the test statistic and p-value
print("Test Statistic (z-score):", z_statistic)
print("p-value:", p_value)


Test Statistic (z-score): -2.3495561349012988
p-value: 0.018795809724082124
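The cell above uses the unpooled standard error. Many textbooks instead pool the two samples when testing H0: p1 = p2, because under the null hypothesis both groups share one common proportion. The sketch below shows that variant on the same counts, using only the standard library; the names `p_pooled`, `se_pooled`, `z_pooled`, and `p_value_pooled` are illustrative and do not appear in the original cell.

```python
from math import sqrt
from statistics import NormalDist

# Same counts as the cell above
x1, n1 = 19, 100
x2, n2 = 62, 200

p1, p2 = x1 / n1, x2 / n2

# Pooled proportion: the common proportion implied by H0 (p1 == p2)
p_pooled = (x1 + x2) / (n1 + n2)

# Standard error of the difference under H0
se_pooled = sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))

# Test statistic and two-tailed p-value
z_pooled = (p1 - p2) / se_pooled
p_value_pooled = 2 * (1 - NormalDist().cdf(abs(z_pooled)))

print("Pooled z-score:", z_pooled)
print("Pooled p-value:", p_value_pooled)
```

The pooled and unpooled statistics usually differ only slightly. The pooled form is the conventional choice for a hypothesis test, while the unpooled form is the one normally used for a confidence interval on the difference.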


## T-Test (Unknown Population Variance)

In [4]:
import numpy as np
import scipy.stats as stats

# Given data (summary statistics for two independent samples)
sample_mean1 = 4
sample_mean2 = 5
sample_std1 = 2.9155
sample_std2 = 2.0976
n1 = 4
n2 = 5

# Degrees of freedom for the pooled two-sample t-test
df = n1 + n2 - 2

# Calculate pooled sample variance
squared_std1 = sample_std1 ** 2
squared_std2 = sample_std2 ** 2
pooled_variance = ((n1 - 1) * squared_std1 + (n2 - 1) * squared_std2) / (n1 + n2 - 2)

# Calculate t-score
t_score = (sample_mean1 - sample_mean2) / np.sqrt(pooled_variance * (1/n1 + 1/n2))

# Calculate the two-tailed p-value
p_value = 2 * (1 - stats.t.cdf(np.abs(t_score), df))

# Print the calculated values
print("Pooled Sample Variance:", pooled_variance)
print("t-score:", t_score)
print(f"p-value: {p_value:.3f}")


Pooled Sample Variance: 6.1571605414285715
t-score: -0.6007634523671481
p-value: 0.567
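The pooled test above assumes both populations share the same variance. When that assumption is in doubt, Welch's t-test is a common alternative. The sketch below is a swapped-in variant, not the method used in the cell above: it computes Welch's statistic and the Welch-Satterthwaite degrees of freedom from the same summary statistics, using only the standard library. The names `v1`, `v2`, `t_welch`, and `df_welch` are illustrative.

```python
from math import sqrt

# Same summary statistics as the cell above
mean1, std1, n1 = 4, 2.9155, 4
mean2, std2, n2 = 5, 2.0976, 5

# Estimated variance of each sample mean
v1 = std1 ** 2 / n1
v2 = std2 ** 2 / n2

# Welch's t-statistic: the two variances are not pooled
t_welch = (mean1 - mean2) / sqrt(v1 + v2)

# Welch-Satterthwaite approximation to the degrees of freedom
df_welch = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))

print("Welch t-score:", t_welch)
print("Welch degrees of freedom:", df_welch)
```

A two-tailed p-value could then be obtained by passing `abs(t_welch)` and `df_welch` to the same `stats.t.cdf` call used in the cell above. The degrees of freedom here are generally not an integer.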
