1. Hypothesis Testing in Statistics

Definition: Hypothesis testing is a statistical method used to make inferences about a population based on a sample of data. It involves formulating a null hypothesis (H₀) and an alternative hypothesis (H₁) and then using statistical tests to determine whether to reject or fail to reject the null hypothesis.
2. Null Hypothesis (H₀) vs. Alternative Hypothesis (H₁)

Null Hypothesis (H₀): A statement of no effect, no difference, or no relationship between variables. It's the default assumption.
Alternative Hypothesis (H₁): A statement that contradicts the null hypothesis. It suggests an effect, difference, or relationship.
3. Significance Level (α)

Definition: The probability of rejecting the null hypothesis when it is actually true.
Importance: It controls the risk of making a Type I error. Common significance levels are 0.05 (5%) and 0.01 (1%).
4. P-value

Definition: The probability, assuming the null hypothesis is true, of obtaining a test statistic at least as extreme as the one actually observed.
Interpretation:
If P-value ≤ α, reject the null hypothesis.
If P-value > α, fail to reject the null hypothesis.
5. Type I and Type II Errors

Type I Error: Rejecting the null hypothesis when it is actually true (false positive).
Type II Error: Failing to reject the null hypothesis when it is actually false (false negative).
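
As a rough illustration of the Type I error definition above, the sketch below draws many samples from a population in which H₀ is true and counts how often a one-sample t-test rejects it anyway. The seed, sample size, number of trials, and population parameters are arbitrary assumptions chosen for the example.

```python
import numpy as np
import scipy.stats as stats

rng = np.random.default_rng(0)  # fixed seed (arbitrary) so the run is reproducible
alpha = 0.05
n_trials = 2000
sample_size = 30

# H0 is true by construction: every sample comes from a population with mean 50
samples = rng.normal(loc=50, scale=10, size=(n_trials, sample_size))
_, p_values = stats.ttest_1samp(samples, 50, axis=1)

# Fraction of trials that wrongly reject H0: an estimate of the Type I error rate
type_i_rate = np.mean(p_values < alpha)
print("Estimated Type I error rate:", type_i_rate)
```

The estimate should land close to α, which is what it means for the significance level to control the Type I error rate.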
6. One-tailed vs. Two-tailed Tests

One-tailed Test: Tests for an effect in only one direction (e.g., greater than, less than).
Two-tailed Test: Tests for an effect in either direction (e.g., different from).
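
To make the distinction concrete, here is a small sketch that computes one-tailed and two-tailed p-values for the same test statistic. The Z-score of 1.8 is a hypothetical value picked for illustration.

```python
import scipy.stats as stats

z_score = 1.8  # hypothetical test statistic
alpha = 0.05

p_right = stats.norm.sf(z_score)          # one-tailed, H1: mean is greater
p_left = stats.norm.cdf(z_score)          # one-tailed, H1: mean is less
p_two = 2 * stats.norm.sf(abs(z_score))   # two-tailed, H1: mean is different

print("Right-tailed p-value:", p_right)
print("Left-tailed p-value:", p_left)
print("Two-tailed p-value:", p_two)
```

With this statistic the right-tailed test rejects at α = 0.05 while the two-tailed test does not, which is why the direction of the alternative hypothesis should be fixed before looking at the data.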
7. Z-test

Definition: A statistical test used to compare the mean of a sample to the mean of a known population with a known standard deviation.
Usage: When the population standard deviation is known and the sample size is large (usually n ≥ 30).
8. Z-score

Calculation: (Sample mean - Population mean) / (Population standard deviation / √sample size)
Interpretation: Represents the number of standard errors (population standard deviation / √sample size) by which the sample mean differs from the population mean.
9. T-distribution

Definition: A probability distribution similar to the normal distribution but with heavier tails, used when the population standard deviation is unknown.
Usage: When the population standard deviation is unknown and must be estimated from the sample. The heavier tails matter most when the sample size is small (usually n < 30).
10. Z-test vs. T-test

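Z-test: Uses the population standard deviation, so it applies only when that value is known; typically used with large samples.
T-test: Uses the sample standard deviation as an estimate; appropriate whenever the population standard deviation is unknown, and especially important for small samples.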
11. T-test

Definition: A statistical test used to compare means between groups when the population standard deviation is unknown.
Types: One-sample t-test, independent samples t-test, paired samples t-test.
12. Relationship between Z-test and T-test

As sample size increases, the t-distribution approaches the standard normal distribution (Z-distribution).
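
One way to see this convergence is to compare critical values. The sketch below, with an arbitrary choice of degrees of freedom, prints the two-tailed 5% critical value of the t-distribution next to the corresponding standard normal value.

```python
import scipy.stats as stats

# Two-tailed critical value at alpha = 0.05 for the standard normal distribution
z_crit = stats.norm.ppf(0.975)

# Degrees of freedom to compare (arbitrary illustrative values)
dfs = [5, 30, 100, 1000]
t_crits = [stats.t.ppf(0.975, df=df) for df in dfs]

for df, t_crit in zip(dfs, t_crits):
    print(f"df={df:>4}: t critical = {t_crit:.4f}, z critical = {z_crit:.4f}")
```

The t critical value is always larger than the z critical value and shrinks toward it as the degrees of freedom grow.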
13. Confidence Interval

Definition: A range of values, computed from sample data, constructed so that a stated percentage of such intervals (the confidence level, e.g., 95%) would contain the true population parameter under repeated sampling.
Interpretation: Provides a range of plausible values for the population parameter.
14. Margin of Error

Definition: The half-width of a confidence interval, i.e., the largest amount by which the sample statistic is expected to differ from the true population parameter at the stated confidence level.
Effect on Confidence Interval: A larger margin of error results in a wider confidence interval.
15. Bayes' Theorem

Usage: Used to update probabilities based on new evidence.
Significance: Provides a framework for probabilistic reasoning and inference.
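
A short worked example may help here. The sketch below applies Bayes' Theorem, P(A|B) = P(B|A) · P(A) / P(B), to a diagnostic test. The prevalence, sensitivity, and false positive rate are hypothetical numbers, not taken from any real test.

```python
# Hypothetical inputs for a diagnostic test
prior = 0.01                 # P(disease): prevalence before seeing the test result
sensitivity = 0.95           # P(positive | disease)
false_positive_rate = 0.05   # P(positive | no disease)

# Total probability of a positive result, P(positive)
evidence = sensitivity * prior + false_positive_rate * (1 - prior)

# Bayes' Theorem: P(disease | positive)
posterior = sensitivity * prior / evidence
print("P(disease | positive test):", posterior)
```

Under these assumed numbers the posterior is roughly 0.16: the positive result raises the probability from 1% to about 16%, far below the test's 95% sensitivity because the condition is rare.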
16. Chi-square Distribution

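Definition: The distribution of a sum of squared independent standard normal variables, characterized by its degrees of freedom. It is used to test hypotheses about categorical data and about variances.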
Usage: Chi-square goodness-of-fit test, chi-square test for independence.
17. Chi-square Goodness-of-fit Test

Definition: Tests whether observed frequencies in a categorical variable match expected frequencies.
18. F-distribution

Definition: A probability distribution used to compare variances between two or more groups.
Usage: ANOVA (Analysis of Variance).


19. Types of Hypothesis Tests

Hypothesis testing is a statistical method used to make inferences about a population based on a sample of data. It involves formulating a null hypothesis (a statement of no relationship or no difference between variables) and an alternative hypothesis, and then using data and statistical analysis to decide whether the evidence is strong enough to reject the null hypothesis.

There are several types of hypothesis tests, including:

One-Sample Tests: A one-sample hypothesis test compares a sample mean to a specified value (the hypothesized population mean) to determine whether the sample mean differs significantly from it.


20. ANOVA (Analysis of Variance)

Definition: A statistical method used to compare the means of three or more groups.
Types: One-way ANOVA, two-way ANOVA.
21. ANOVA Assumptions

Normality: The data within each group should be normally distributed.
Homogeneity of variances: The variance of the dependent variable should be equal across all groups.
Independence: Observations within and between groups should be independent.
22. Types of ANOVA Tests

One-way ANOVA: Compares means of multiple groups for a single independent variable.
Two-way ANOVA: Compares means of multiple groups for two independent variables and their interaction.
23. F-test

Definition: A statistical test used to compare the variances of two or more groups.
Relationship to Hypothesis Testing: In ANOVA, the F-test determines whether there are statistically significant differences between group means.


1. Z-test for Comparing Sample Mean to Population Mean

Python

import scipy.stats as stats

# Sample data
sample_mean = 50
population_mean = 48
population_std = 10
sample_size = 50

# Calculate Z-score
z_score = (sample_mean - population_mean) / (population_std / (sample_size ** 0.5))

# Calculate p-value
p_value = stats.norm.sf(abs(z_score)) * 2  # Two-tailed test

print("Z-score:", z_score)
print("P-value:", p_value)

# Interpret results
alpha = 0.05
if p_value < alpha:
    print("Reject the null hypothesis. Sample mean is significantly different from population mean.")
else:
    print("Fail to reject the null hypothesis. Sample mean is not significantly different from population mean.")
2. Simulate Data and Perform Hypothesis Testing

Python

import numpy as np
import scipy.stats as stats

# Simulate data from a normal distribution
population_mean = 50
population_std = 10
sample_size = 30
data = np.random.normal(loc=population_mean, scale=population_std, size=sample_size)

# Perform t-test (since population standard deviation is unknown)
t_stat, p_value = stats.ttest_1samp(data, population_mean)

print("T-statistic:", t_stat)
print("P-value:", p_value)

# Interpret results
alpha = 0.05
if p_value < alpha:
    print("Reject the null hypothesis. Sample mean is significantly different from population mean.")
else:
    print("Fail to reject the null hypothesis. Sample mean is not significantly different from population mean.")
3. One-sample Z-test

Python

import scipy.stats as stats

# Sample data
sample_mean = 50
population_mean = 48
population_std = 10
sample_size = 50

# Perform Z-test: standardize the sample mean using the standard error
z_stat = (sample_mean - population_mean) / (population_std / (sample_size ** 0.5))
p_value = stats.norm.sf(abs(z_stat)) * 2  # Two-tailed test

print("Z-statistic:", z_stat)
print("P-value:", p_value)

# Interpret results
alpha = 0.05
if p_value < alpha:
    print("Reject the null hypothesis. Sample mean is significantly different from population mean.")
else:
    print("Fail to reject the null hypothesis. Sample mean is not significantly different from population mean.")
4. Two-tailed Z-test with Visualization

Python

import numpy as np
import matplotlib.pyplot as plt
import scipy.stats as stats

# Sample data
sample_mean = 50
population_mean = 48
population_std = 10
sample_size = 50

# Calculate Z-score
z_score = (sample_mean - population_mean) / (population_std / (sample_size ** 0.5))

# Calculate p-value
p_value = stats.norm.sf(abs(z_score)) * 2  # Two-tailed test

# Visualize
x = np.linspace(-4, 4, 100)
plt.plot(x, stats.norm.pdf(x))
plt.axvline(x=z_score, color='red', linestyle='--')
plt.axvline(x=-z_score, color='red', linestyle='--')
plt.fill_between(x[x > abs(z_score)], 0, stats.norm.pdf(x[x > abs(z_score)]), alpha=0.2, color='red')
plt.fill_between(x[x < -abs(z_score)], 0, stats.norm.pdf(x[x < -abs(z_score)]), alpha=0.2, color='red')
plt.xlabel('Z-score')
plt.ylabel('Probability Density')
plt.title('Two-tailed Z-test')
plt.show()

# Interpret results
alpha = 0.05
if p_value < alpha:
    print("Reject the null hypothesis. Sample mean is significantly different from population mean.")
else:
    print("Fail to reject the null hypothesis. Sample mean is not significantly different from population mean.")
5. Type I and Type II Error Function

Python

import numpy as np
import matplotlib.pyplot as plt

def visualize_type_i_type_ii_errors(alpha, true_mean, sample_mean, population_std, sample_size):
    # ... (Visualization code for Type I and Type II errors) ...
    pass  # placeholder so the function definition is valid Python
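
The visualization itself is left out above. As a partial stand-in, the sketch below computes the two quantities such a plot would shade for a two-tailed Z-test: the Type I error rate (α) and the Type II error rate (β). The function name, its parameters (which differ from the signature above), and the example values are hypothetical.

```python
import scipy.stats as stats

def type_i_type_ii_error_rates(alpha, null_mean, true_mean, population_std, sample_size):
    """Return (alpha, beta) for a two-tailed Z-test of H0: mean == null_mean."""
    standard_error = population_std / (sample_size ** 0.5)
    z_crit = stats.norm.ppf(1 - alpha / 2)

    # How far the true mean sits from the null mean, in standard errors
    shift = (true_mean - null_mean) / standard_error

    # Beta: probability the Z-statistic falls inside the non-rejection region
    # [-z_crit, z_crit] when the true mean is true_mean
    beta = stats.norm.cdf(z_crit - shift) - stats.norm.cdf(-z_crit - shift)
    return alpha, beta

alpha, beta = type_i_type_ii_error_rates(
    alpha=0.05, null_mean=48, true_mean=50, population_std=10, sample_size=50
)
print("Type I error rate (alpha):", alpha)
print("Type II error rate (beta):", beta)
print("Power (1 - beta):", 1 - beta)
```

When the true mean equals the null mean, β reduces to 1 − α, which is a quick sanity check on the formula.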
6. Independent Samples T-test

Python

import scipy.stats as stats

# Sample data for two groups
group1 = [5, 10, 15, 20, 25]
group2 = [10, 15, 20, 25, 30]

# Perform independent samples t-test
t_stat, p_value = stats.ttest_ind(group1, group2)

print("T-statistic:", t_stat)
print("P-value:", p_value)

# Interpret results
alpha = 0.05
if p_value < alpha:
    print("Reject the null hypothesis. There is a significant difference between the means of the two groups.")
else:
    print("Fail to reject the null hypothesis. There is no significant difference between the means of the two groups.")
7. Paired Samples T-test

Python

import scipy.stats as stats

# Sample data for paired observations
before = [10, 12, 15, 18, 20]
after = [12, 14, 17, 20, 22]

# Perform paired samples t-test
t_stat, p_value = stats.ttest_rel(before, after)

print("T-statistic:", t_stat)
print("P-value:", p_value)

# Interpret results
alpha = 0.05
if p_value < alpha:
    print("Reject the null hypothesis. There is a significant difference between the before and after measurements.")
else:
    print("Fail to reject the null hypothesis. There is no significant difference between the before and after measurements.")
8. Z-test vs. T-test Comparison

Python

# ... (Simulate data and perform Z-test and T-test, compare results) ...
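
One possible way to fill in this comparison is sketched below: the same simulated sample is tested once with a Z-test (treating the population standard deviation as known) and once with a one-sample t-test. The seed, sample size, and means are assumptions made for the example.

```python
import numpy as np
import scipy.stats as stats

rng = np.random.default_rng(42)  # arbitrary seed for reproducibility
population_mean = 50   # value claimed under H0
population_std = 10    # treated as known for the Z-test only

# Small simulated sample whose true mean (52) differs slightly from H0
data = rng.normal(loc=52, scale=population_std, size=15)
n = data.size
sample_mean = data.mean()

# Z-test: uses the known population standard deviation
z_stat = (sample_mean - population_mean) / (population_std / np.sqrt(n))
z_p_value = 2 * stats.norm.sf(abs(z_stat))

# T-test: estimates the standard deviation from the sample
t_stat, t_p_value = stats.ttest_1samp(data, population_mean)

print("Z-statistic:", z_stat, "P-value:", z_p_value)
print("T-statistic:", t_stat, "P-value:", t_p_value)
```

The two statistics differ only in which standard deviation goes into the denominator; the t-test's p-value additionally comes from the heavier-tailed t-distribution with n − 1 degrees of freedom.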
9. Confidence Interval Calculation

Python

import scipy.stats as stats

# Sample data
sample_mean = 50
sample_std = 5
sample_size = 30
confidence_level = 0.95

# Calculate margin of error
margin_of_error = stats.t.ppf((1 + confidence_level) / 2, df=sample_size - 1) * (sample_std / (sample_size ** 0.5))

# Calculate confidence interval
lower_bound = sample_mean - margin_of_error
upper_bound = sample_mean + margin_of_error

print("Confidence Interval:", (lower_bound, upper_bound))
10. Margin of Error Calculation

Python

import scipy.stats as stats

# Sample data
sample_mean = 50
sample_std = 5
sample_size = 30
confidence_level = 0.95

# Calculate margin of error
margin_of_error = stats.t.ppf((1 + confidence_level) / 2, df=sample_size - 1) * (sample_std / (sample_size ** 0.5))

print("Margin of Error:", margin_of_error)
11. Bayesian Inference with PyMC3

Python

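import numpy as np
import pymc3 as pm

# Observed data (simulated here as an assumption; substitute real observations)
data = np.random.normal(loc=50, scale=5, size=30)

# Define the model
with pm.Model() as model:
    # Prior distribution for the mean
    mu = pm.Normal('mu', mu=50, sigma=10)
    # Likelihood (assuming normal distribution with known sigma=5)
    y_obs = pm.Normal('y_obs', mu=mu, sigma=5, observed=data)

    # Sample from the posterior distribution; return a MultiTrace so trace['mu'] works
    trace = pm.sample(1000, tune=1000, return_inferencedata=False)

# Extract posterior samples for the mean
posterior_samples = trace['mu']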

# Analyze posterior distribution (e.g., calculate credible intervals)
# ...
12. Chi-square Test for Independence

Python

import numpy as np
import scipy.stats as stats

# Observed frequencies in a contingency table
observed_freqs = np.array([[10, 20], [30, 15]])

# Perform chi-square test
chi2_stat, p_value, dof, expected_freqs = stats.chi2_contingency(observed_freqs)

print("Chi-square statistic:", chi2_stat)
print("P-value:", p_value)

# Interpret results
alpha = 0.05
if p_value < alpha:
    print("Reject the null hypothesis. There is a significant association between the two variables.")
else:
    print("Fail to reject the null hypothesis. There is no significant association between the two variables.")
13. Chi-square Goodness-of-fit Test

Python

import numpy as np
import scipy.stats as stats

# Observed frequencies
observed_freqs = np.array([10, 20, 30, 40])

# Expected frequencies under the null hypothesis
expected_freqs = np.array([25, 25, 25, 25])

# Perform chi-square goodness-of-fit test
chi2_stat, p_value = stats.chisquare(observed_freqs, f_exp=expected_freqs)

print("Chi-square statistic:", chi2_stat)
print("P-value:", p_value)

# Interpret results
alpha = 0.05
if p_value < alpha:
    print("Reject the null hypothesis. The observed frequencies do not fit the expected distribution.")
else:
    print("Fail to reject the null hypothesis. The observed frequencies fit the expected distribution.")
14. Simulate and Visualize Chi-square Distribution

Python

import numpy as np
import matplotlib.pyplot as plt
import scipy.stats as stats

# Degrees of freedom
df = 5

# Evaluate the chi-square probability density on a grid of x values
x = np.linspace(0, 20, 100)
y = stats.chi2.pdf(x, df)

# Plot the chi-square distribution
plt.plot(x, y)
plt.xlabel('Chi-square')
plt.ylabel('Probability Density')
plt.title('Chi-square Distribution (df={})'.format(df))
plt.show()
15. F-test for Comparing Variances

Python

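import numpy as np
import scipy.stats as stats

# Sample data for two groups
group1 = [5, 10, 15, 20, 25]
group2 = [10, 15, 20, 25, 30]

# Perform F-test: ratio of the sample variances
# (stats.f_oneway compares group means, not variances)
var1 = np.var(group1, ddof=1)
var2 = np.var(group2, ddof=1)
f_stat = var1 / var2
df1, df2 = len(group1) - 1, len(group2) - 1

# Two-tailed p-value from the F-distribution
tail = stats.f.sf(f_stat, df1, df2)
p_value = 2 * min(tail, 1 - tail)

print("F-statistic:", f_stat)
print("P-value:", p_value)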

# Interpret results
alpha = 0.05
if p_value < alpha:
    print("Reject the null hypothesis. The variances of the two groups are significantly different.")
else:
    print("Fail to reject the null hypothesis. The variances of the two groups are not significantly different.")
16. One-way ANOVA

Python

import scipy.stats as stats

# Sample data for multiple groups
group1 = [5, 10, 15, 20, 25]
group2 = [10, 15, 20, 25, 30]
group3 = [15, 20, 25, 30, 35]

# Perform one-way ANOVA
f_stat, p_value = stats.f_oneway(group1, group2, group3)

print("F-statistic:", f_stat)
print("P-value:", p_value)

# Interpret results
alpha = 0.05
if p_value < alpha:
    print("Reject the null hypothesis. There is a significant difference between the means of the groups.")
else:
    print("Fail to reject the null hypothesis. There is no significant difference between the means of the groups.")
17. Check ANOVA Assumptions

Python

# ... (Code to check normality, homogeneity of variances, and independence) ...
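
One possible way to fill in these checks is sketched below, using the Shapiro-Wilk test for normality and Levene's test for homogeneity of variances. The group data are reused from the one-way ANOVA example; the choice of these two tests is an assumption, and other tests would also be reasonable.

```python
import scipy.stats as stats

# Sample data for multiple groups (same values as the one-way ANOVA example)
groups = {
    "Group 1": [5, 10, 15, 20, 25],
    "Group 2": [10, 15, 20, 25, 30],
    "Group 3": [15, 20, 25, 30, 35],
}
alpha = 0.05

# Normality: Shapiro-Wilk test on each group (H0: the group is normally distributed)
shapiro_p_values = {}
for name, values in groups.items():
    _, p = stats.shapiro(values)
    shapiro_p_values[name] = p
    print(f"{name}: Shapiro-Wilk p-value = {p:.4f}")

# Homogeneity of variances: Levene's test (H0: all groups have equal variance)
levene_stat, levene_p = stats.levene(*groups.values())
print("Levene statistic:", levene_stat)
print("Levene p-value:", levene_p)

# Independence cannot be verified by a test on these numbers alone;
# it has to be justified by how the data were collected.
```

A p-value above α in these tests means the assumption is not contradicted by the data, not that it has been proven. With only five observations per group, the tests have little power.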
18. Two-way ANOVA

Python

# ... (Code to perform two-way ANOVA) ...
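
As a sketch of what this could look like, the code below computes a two-way ANOVA by hand for a balanced design, using NumPy for the sums of squares and the F-distribution from SciPy for p-values. The design (2 × 3 cells with 4 replicates), the effect sizes, and the seed are hypothetical. For real analyses a dedicated library routine would normally be used instead.

```python
import numpy as np
import scipy.stats as stats

rng = np.random.default_rng(1)  # arbitrary seed

# Hypothetical balanced design: a levels of factor A, b levels of factor B, n replicates
a, b, n = 2, 3, 4
effects_a = np.array([0.0, 3.0]).reshape(a, 1, 1)
effects_b = np.array([0.0, 1.0, 2.0]).reshape(1, b, 1)
data = 50 + effects_a + effects_b + rng.normal(scale=2.0, size=(a, b, n))

grand_mean = data.mean()
mean_a = data.mean(axis=(1, 2))   # one mean per level of A
mean_b = data.mean(axis=(0, 2))   # one mean per level of B
cell_mean = data.mean(axis=2)     # one mean per (A, B) cell

# Sums of squares for main effects, interaction, and error
ss_a = b * n * np.sum((mean_a - grand_mean) ** 2)
ss_b = a * n * np.sum((mean_b - grand_mean) ** 2)
ss_ab = n * np.sum((cell_mean - mean_a[:, None] - mean_b[None, :] + grand_mean) ** 2)
ss_error = np.sum((data - cell_mean[:, :, None]) ** 2)

df_a, df_b = a - 1, b - 1
df_ab = df_a * df_b
df_error = a * b * (n - 1)
ms_error = ss_error / df_error

results = {}
for name, ss, df in [("Factor A", ss_a, df_a), ("Factor B", ss_b, df_b), ("Interaction", ss_ab, df_ab)]:
    f_stat = (ss / df) / ms_error
    p_value = stats.f.sf(f_stat, df, df_error)
    results[name] = (f_stat, p_value)
    print(f"{name}: F = {f_stat:.3f}, p = {p_value:.4f}")
```

These formulas hold only when every cell has the same number of observations; unbalanced designs need a regression-based approach.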
19. Visualize F-distribution

Python

# ... (Code to simulate and visualize F-distribution) ...
20. One-way ANOVA with Boxplots

Python

# ... (Code to perform one-way ANOVA and create boxplots for visualization) ...
21. Simulate Data and Perform Hypothesis Testing

Python

# ... (Code to simulate data, perform hypothesis testing, and evaluate means) ...
22. Hypothesis Test for Population Variance

Python

# ... (Code to perform hypothesis test for population variance using chi-square distribution) ...
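
A minimal sketch of such a test follows, assuming normally distributed data and a hypothesized variance of 100. The test statistic is (n − 1)·s² / σ₀², which follows a chi-square distribution with n − 1 degrees of freedom under H₀. The simulated data and seed are assumptions for the example.

```python
import numpy as np
import scipy.stats as stats

rng = np.random.default_rng(7)  # arbitrary seed
data = rng.normal(loc=50, scale=10, size=25)  # simulated sample

hypothesized_variance = 100  # sigma_0 squared under H0 (assumed value)
n = data.size
sample_variance = data.var(ddof=1)

# Chi-square test statistic for a single population variance
chi2_stat = (n - 1) * sample_variance / hypothesized_variance

# Two-tailed p-value: double the smaller tail probability
lower_tail = stats.chi2.cdf(chi2_stat, df=n - 1)
p_value = 2 * min(lower_tail, 1 - lower_tail)

print("Sample variance:", sample_variance)
print("Chi-square statistic:", chi2_stat)
print("P-value:", p_value)

alpha = 0.05
if p_value < alpha:
    print("Reject the null hypothesis. The population variance differs from the hypothesized value.")
else:
    print("Fail to reject the null hypothesis. No significant difference from the hypothesized variance.")
```

This test is sensitive to departures from normality, so the result should be read with that assumption in mind.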
23. Z-test for Comparing Proportions

Python

# ... (Code to perform Z-test for comparing proportions between two datasets or groups) ...
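
One way this could be written is shown below: a two-proportion Z-test using the pooled proportion for the standard error. The success counts and group sizes are hypothetical.

```python
import numpy as np
import scipy.stats as stats

# Hypothetical counts: successes and sample sizes in two groups
successes1, n1 = 45, 100
successes2, n2 = 30, 100

p1 = successes1 / n1
p2 = successes2 / n2

# Pooled proportion under H0 (both groups share the same true proportion)
p_pool = (successes1 + successes2) / (n1 + n2)
standard_error = np.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))

z_stat = (p1 - p2) / standard_error
p_value = 2 * stats.norm.sf(abs(z_stat))  # Two-tailed test

print("Z-statistic:", z_stat)
print("P-value:", p_value)

alpha = 0.05
if p_value < alpha:
    print("Reject the null hypothesis. The two proportions are significantly different.")
else:
    print("Fail to reject the null hypothesis. The two proportions are not significantly different.")
```

The normal approximation behind this test is usually considered reasonable only when each group has enough successes and failures (a common rule of thumb is at least 5 to 10 of each).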
24. F-test for Comparing Variances

Python

# ... (Code to perform F-test for comparing variances of two datasets, interpret and visualize results) ...
25. Chi-square Goodness-of-fit Test with Simulated Data

Python

# ... (Code to perform Chi-square goodness-of-fit test with simulated data and analyze results) ...
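
A possible version of this exercise is sketched below: simulate rolls of a fair six-sided die, count the outcomes, and test them against a uniform expectation. The number of rolls and the seed are arbitrary assumptions.

```python
import numpy as np
import scipy.stats as stats

rng = np.random.default_rng(3)  # arbitrary seed
n_rolls = 600

# Simulate rolls of a fair die (faces 1 through 6)
rolls = rng.integers(1, 7, size=n_rolls)

# Observed count of each face; drop the unused index 0
observed_freqs = np.bincount(rolls, minlength=7)[1:]

# Expected counts under H0: every face is equally likely
expected_freqs = np.full(6, n_rolls / 6)

chi2_stat, p_value = stats.chisquare(observed_freqs, f_exp=expected_freqs)

print("Observed frequencies:", observed_freqs)
print("Chi-square statistic:", chi2_stat)
print("P-value:", p_value)

alpha = 0.05
if p_value < alpha:
    print("Reject the null hypothesis. The observed frequencies do not fit the expected distribution.")
else:
    print("Fail to reject the null hypothesis. The observed frequencies are consistent with the expected distribution.")
```

Because the data really are generated from a fair die, a rejection here would be a Type I error, which should happen in about 5% of seeds at α = 0.05.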

26. One-way ANOVA with Boxplots

Python

import scipy.stats as stats
import matplotlib.pyplot as plt

# Sample data for multiple groups
group1 = [5, 10, 15, 20, 25]
group2 = [10, 15, 20, 25, 30]
group3 = [15, 20, 25, 30, 35]

# Perform one-way ANOVA
f_stat, p_value = stats.f_oneway(group1, group2, group3)

# Create boxplots
plt.boxplot([group1, group2, group3], labels=['Group 1', 'Group 2', 'Group 3'])
plt.ylabel('Values')
plt.title('Boxplot of Groups')
plt.show()

# Interpret results
alpha = 0.05
if p_value < alpha:
    print("Reject the null hypothesis. There is a significant difference between the means of the groups.")
else:
    print("Fail to reject the null hypothesis. There is no significant difference between the means of the groups.")
27. Simulate Data and Perform Hypothesis Testing

Python

import numpy as np
import scipy.stats as stats

# Simulate data from a normal distribution (e.g., for a one-sample t-test)
population_mean = 50
population_std = 10
sample_size = 30
data = np.random.normal(loc=population_mean, scale=population_std, size=sample_size)

# Perform t-test
t_stat, p_value = stats.ttest_1samp(data, population_mean)

# Interpret results
alpha = 0.05
if p_value < alpha:
    print("Reject the null hypothesis. Sample mean is significantly different from population mean.")
else:
    print("Fail to reject the null hypothesis. Sample mean is not significantly different from population mean.")