<a href="https://colab.research.google.com/github/gbiamgaurav/Hypothesis-testing-AB-Testing/blob/main/Hypothesis_Testing_using_Python.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Hypothesis Testing Techniques

This notebook walks through four common hypothesis tests, each evaluated at a significance level of 0.05:

* One-Sample t-Test
* Two-Sample t-Test
* Chi-Square Test
* ANOVA

## One-Sample t-Test

The one-sample t-test determines whether the mean of a single sample differs significantly from a known or hypothesized population mean. It compares the sample mean with that reference value and scales the difference by the variability within the sample. The null hypothesis is that the population mean behind the sample equals the reference value.

* Example: Suppose you have a sample of test scores from a class. You want to test if their average is significantly different from the national average of 70.

In [1]:
import scipy.stats as stats
import numpy as np

In [3]:
## sample data

sample_scores = np.array([65, 78, 67, 72, 74, 62, 76, 70, 68, 71])

## known population mean = 70

population_mean = 70

# Perform one-sample t-test

t_statistic, p_value = stats.ttest_1samp(sample_scores, population_mean)

print(f"t-statistic = {t_statistic}, p-value = {p_value}")

if p_value < 0.05:
  print("Reject the null hypothesis")
else:
  print("Fail to reject the null hypothesis")

t-statistic = 0.19097135526615505, p-value = 0.8527865916734706
Fail to reject the null hypothesis
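
To make the comparison described above concrete, the statistic can also be computed by hand as the difference between the sample mean and the reference mean, divided by the standard error. This is a minimal sketch, not how SciPy is implemented internally; the names `t_manual` and `p_manual` are illustrative and do not appear in the original notebook.

```python
import numpy as np
from scipy import stats

sample_scores = np.array([65, 78, 67, 72, 74, 62, 76, 70, 68, 71])
population_mean = 70

n = sample_scores.size
sample_mean = sample_scores.mean()

# ddof=1 gives the sample standard deviation (divides by n - 1)
sample_std = sample_scores.std(ddof=1)
standard_error = sample_std / np.sqrt(n)

# t = (sample mean - hypothesized mean) / standard error
t_manual = (sample_mean - population_mean) / standard_error

# Two-sided p-value from the t distribution with n - 1 degrees of freedom
p_manual = 2 * stats.t.sf(abs(t_manual), df=n - 1)

print(f"manual t-statistic = {t_manual}, manual p-value = {p_manual}")
```

Up to floating-point rounding, both values should agree with the `stats.ttest_1samp` result printed above.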


## Two-Sample t-Test

The two-sample t-test determines whether the means of two independent groups differ significantly. It weighs the difference in sample means against the variability within each group. The null hypothesis is that both groups have the same population mean.

* Example: Comparing the average heights of two different groups of plants treated with different fertilizers.

In [4]:
# Sample data: heights of plants with different fertilizers

heights_fertilizer1 = np.array([15, 16, 17, 14, 16, 15, 16, 17])
heights_fertilizer2 = np.array([14, 15, 15, 15, 16, 14, 15, 15])

In [5]:
# Perform two-sample t-test

t_statistic, p_value = stats.ttest_ind(heights_fertilizer1, heights_fertilizer2)

print(f"t_statistic = {t_statistic}, p_value = {p_value}")

if p_value < 0.05:
  print("Reject the null hypothesis")
else:
  print("Fail to reject the null hypothesis")

t_statistic = 2.032862543430305, p_value = 0.06148225337599243
Fail to reject the null hypothesis
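
By default, `stats.ttest_ind` assumes both groups share the same variance and pools the two sample variances. The sketch below reproduces that pooled calculation by hand and then runs Welch's t-test (`equal_var=False`), which drops the equal-variance assumption. The names `t_manual`, `p_manual`, and `welch` are illustrative and not part of the original notebook.

```python
import numpy as np
from scipy import stats

heights_fertilizer1 = np.array([15, 16, 17, 14, 16, 15, 16, 17])
heights_fertilizer2 = np.array([14, 15, 15, 15, 16, 14, 15, 15])

n1, n2 = heights_fertilizer1.size, heights_fertilizer2.size
var1 = heights_fertilizer1.var(ddof=1)  # sample variance of group 1
var2 = heights_fertilizer2.var(ddof=1)  # sample variance of group 2

# Pooled variance: the two sample variances weighted by their degrees of freedom
pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)
standard_error = np.sqrt(pooled_var * (1 / n1 + 1 / n2))

mean_diff = heights_fertilizer1.mean() - heights_fertilizer2.mean()
t_manual = mean_diff / standard_error

# Two-sided p-value with n1 + n2 - 2 degrees of freedom
p_manual = 2 * stats.t.sf(abs(t_manual), df=n1 + n2 - 2)
print(f"pooled: t = {t_manual}, p = {p_manual}")

# Welch's t-test: does not assume equal variances
welch = stats.ttest_ind(heights_fertilizer1, heights_fertilizer2, equal_var=False)
print(f"Welch:  t = {welch.statistic}, p = {welch.pvalue}")
```

Because both groups have the same size here, the pooled and Welch t-statistics coincide; only the degrees of freedom, and therefore the p-value, differ.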


## Chi-Square Test

The chi-square test of independence assesses whether two categorical variables are associated. It compares the observed frequencies in a contingency table with the frequencies expected if the variables were independent. The null hypothesis is that the variables are independent; the larger the chi-square statistic, the stronger the evidence against that hypothesis.

* Example: Testing if there is an association between gender (male/female) and preference for a new product (like/dislike).

In [6]:
# Rows: Gender, Columns: Product Preference
data = np.array([[30, 10],  # 30 males like, 10 dislike
                 [35, 5]])  # 35 females like, 5 dislike

In [8]:
# Perform chi-square test of independence
# (for 2x2 tables, SciPy applies Yates' continuity correction by default)

chi2_statistic, p_value, dof, expected = stats.chi2_contingency(data)

print(f"chi2_statistic = {chi2_statistic:.4f}, P-value = {p_value:.4f}, Degrees of Freedom = {dof}, Expected frequencies = {expected}")

if p_value < 0.05:
  print("Reject the null hypothesis")
else:
  print("Fail to reject the null hypothesis")

chi2_statistic = 1.3128, P-value = 0.2519, Degrees of Freedom = 1, Expected frequencies = [[32.5  7.5]
 [32.5  7.5]]
Fail to reject the null hypothesis
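
The expected frequencies and the chi-square statistic can be rebuilt from the row and column totals, which shows what "expected under independence" means in practice. This is a minimal sketch with illustrative names (`expected_manual`, `chi2_manual`, `p_manual`). It applies Yates' continuity correction in its simple form because `stats.chi2_contingency` uses that correction by default for 2x2 tables.

```python
import numpy as np
from scipy import stats

# Rows: gender, columns: product preference (like, dislike)
data = np.array([[30, 10],
                 [35, 5]])

row_totals = data.sum(axis=1)
col_totals = data.sum(axis=0)
grand_total = data.sum()

# Expected count under independence: row total * column total / grand total
expected_manual = np.outer(row_totals, col_totals) / grand_total

# Yates' correction shrinks each |observed - expected| by 0.5.
# This simple form is valid here because every difference is at least 0.5.
corrected_diff = np.abs(data - expected_manual) - 0.5
chi2_manual = np.sum(corrected_diff ** 2 / expected_manual)

# Degrees of freedom: (rows - 1) * (columns - 1)
dof_manual = (data.shape[0] - 1) * (data.shape[1] - 1)
p_manual = stats.chi2.sf(chi2_manual, dof_manual)

# Cross-check against SciPy
chi2_scipy, p_scipy, dof_scipy, expected_scipy = stats.chi2_contingency(data)
print(f"manual: chi2 = {chi2_manual:.4f}, p = {p_manual:.4f}, dof = {dof_manual}")
print(f"scipy:  chi2 = {chi2_scipy:.4f}, p = {p_scipy:.4f}, dof = {dof_scipy}")
```

Passing `correction=False` to `stats.chi2_contingency` would skip the continuity correction and give a somewhat larger statistic for the same table.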


## ANOVA (Analysis of Variance)

ANOVA tests whether the means of three or more groups differ. It compares the variance between the group means with the variance within each group and summarizes the ratio as an F-statistic. The null hypothesis is that all group means are equal; a significant result indicates that at least one group differs, but not which one.

* Example: Testing if three different diets have different effects on weight loss.

In [9]:
from scipy.stats import f_oneway

In [10]:
# Sample data: weight loss for three different diets

diet1 = np.array([2, 3, 1, 2, 2])
diet2 = np.array([4, 5, 4, 4, 5])
diet3 = np.array([5, 6, 7, 6, 5])

In [11]:
# Perform one-way ANOVA

f_statistic, p_value = f_oneway(diet1, diet2, diet3)

print(f"f_statistic = {f_statistic}, p_value = {p_value}")

if p_value < 0.05:
  print("Reject the null hypothesis")
else:
  print("Fail to reject the null hypothesis")

f_statistic = 36.933333333333294, p_value = 7.449718327740603e-06
Reject the null hypothesis
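
The F-statistic is the ratio of between-group to within-group variance, and it can be reproduced from the sums of squares. This is a minimal sketch with illustrative names (`ss_between`, `ss_within`, `f_manual`, `p_manual`) that are not part of the original notebook.

```python
import numpy as np
from scipy import stats

diet1 = np.array([2, 3, 1, 2, 2])
diet2 = np.array([4, 5, 4, 4, 5])
diet3 = np.array([5, 6, 7, 6, 5])

groups = [diet1, diet2, diet3]
all_values = np.concatenate(groups)
grand_mean = all_values.mean()

k = len(groups)              # number of groups
n_total = all_values.size    # total number of observations

# Between-group sum of squares: spread of the group means around the grand mean
ss_between = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups)

# Within-group sum of squares: spread of observations around their own group mean
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

ms_between = ss_between / (k - 1)
ms_within = ss_within / (n_total - k)

f_manual = ms_between / ms_within

# Upper-tail p-value from the F distribution with (k - 1, n_total - k) degrees of freedom
p_manual = stats.f.sf(f_manual, k - 1, n_total - k)

print(f"manual f_statistic = {f_manual}, manual p_value = {p_manual}")
```

Up to floating-point rounding, the result should agree with the `f_oneway` output above.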
