### 1. What is hypothesis testing in statistics?

**Answer:**
Hypothesis testing is a statistical method that allows us to make inferences or educated guesses about a population based on a sample. It involves testing an assumption (hypothesis) using sample data and determining whether to reject or fail to reject the assumption.


### 2. What is the null hypothesis, and how does it differ from the alternative hypothesis?

**Answer:**
The null hypothesis (H0) is the default assumption that there is no effect or no difference. The alternative hypothesis (H1 or Ha) is the competing claim that there is an effect or difference. Hypothesis testing evaluates how strong the evidence against the null hypothesis is; it can lead us to reject H0 in favor of H1, but it never proves H1 outright.

### 3. What is the significance level in hypothesis testing, and why is it important?

**Answer:**
The significance level (alpha) is the probability of rejecting the null hypothesis when it is actually true (Type I error). Common values are 0.05 or 0.01. It sets the threshold for determining whether a result is statistically significant.
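
One way to see what alpha means is to simulate many experiments in which the null hypothesis is true and count how often it is rejected anyway. The sketch below is illustrative only: the seed, the number of experiments, and the sample size are arbitrary assumptions, not values from the questions above.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable
alpha = 0.05
n_experiments, n = 2000, 30     # assumed sizes, chosen only for illustration

# Every row is one experiment in which H0 (true mean = 0) really holds
samples = rng.normal(loc=0.0, scale=1.0, size=(n_experiments, n))
_, p_values = stats.ttest_1samp(samples, popmean=0.0, axis=1)

# Fraction of experiments that wrongly reject the true H0
type1_rate = np.mean(p_values < alpha)
print("Observed Type I error rate:", type1_rate)
```

The observed rate should land close to alpha, which is exactly the long-run false-rejection rate the significance level is meant to control.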

### 4. What does a P-value represent in hypothesis testing?

**Answer:**
The P-value is the probability of obtaining test results at least as extreme as the observed results, under the assumption that the null hypothesis is true. It quantifies the evidence against the null hypothesis.
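
The phrase "at least as extreme" is easiest to see with a discrete example. The sketch below assumes a hypothetical coin-flip experiment (20 flips, 15 heads) and computes the two-sided P-value directly from the binomial distribution under the null hypothesis that the coin is fair.

```python
from scipy.stats import binom

n_flips, heads = 20, 15  # hypothetical experiment

# H0: the coin is fair (p = 0.5). "At least as extreme" means 15 or more
# heads or, by symmetry, 5 or fewer heads.
p_upper = binom.sf(heads - 1, n_flips, 0.5)         # P(X >= 15)
p_lower = binom.cdf(n_flips - heads, n_flips, 0.5)  # P(X <= 5)
p_value = p_upper + p_lower
print("Two-sided P-value:", p_value)
```

The result is roughly 0.041, so at a 0.05 significance level this hypothetical outcome would count as evidence against a fair coin.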


### 5. How do you interpret the P-value in hypothesis testing?

**Answer:**
A P-value below the chosen significance level (commonly 0.05) indicates evidence against the null hypothesis and leads to its rejection. A P-value above that level means the data are consistent with the null hypothesis, so we fail to reject it. This does not prove that the null hypothesis is true.



### 6. What are Type 1 and Type 2 errors in hypothesis testing?

**Answer:**
Type 1 error occurs when we reject a true null hypothesis (a false positive). Type 2 error happens when we fail to reject a false null hypothesis (a false negative). Alpha is the probability of a Type 1 error, and beta is the probability of a Type 2 error; the power of a test is 1 - beta.
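
The trade-off between the two error types can be made concrete by computing the Type 2 error probability for a specific test. The following is a minimal sketch for an upper-tailed one-sample Z-test; the function name `z_test_power` and the numbers passed to it are assumptions made for illustration.

```python
from scipy.stats import norm

def z_test_power(effect, sigma, n, alpha=0.05):
    """Power of an upper-tailed one-sample Z-test.

    effect is the true mean minus the mean assumed under H0.
    """
    z_crit = norm.ppf(1 - alpha)          # rejection threshold under H0
    shift = effect / (sigma / n ** 0.5)   # how far the true mean shifts Z
    beta = norm.cdf(z_crit - shift)       # P(fail to reject | H0 is false)
    return 1 - beta

print("Power:", z_test_power(effect=5, sigma=10, n=25))
```

When `effect` is 0 the null hypothesis is true, and the function returns alpha itself, which ties the two error rates together.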

### 7. What is the difference between a one-tailed and a two-tailed test in hypothesis testing?

**Answer:**
A one-tailed test looks for an effect in one direction (greater or lesser), while a two-tailed test checks for any difference regardless of direction. Two-tailed tests are more conservative because alpha is split between both tails, so a more extreme test statistic is needed to reject the null hypothesis.
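
The same test statistic can lead to different conclusions depending on the number of tails. The sketch below uses a hypothetical Z statistic of 1.8 and compares the two P-values.

```python
from scipy.stats import norm

z = 1.8  # hypothetical test statistic

p_one_tailed = norm.sf(z)           # H1: the mean is greater than the null value
p_two_tailed = 2 * norm.sf(abs(z))  # H1: the mean differs in either direction
print("One-tailed P-value:", p_one_tailed)
print("Two-tailed P-value:", p_two_tailed)
```

At alpha = 0.05 the one-tailed test rejects the null hypothesis while the two-tailed test does not.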


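### 8. What is the Z-test, and when is it used in hypothesis testing?

**Answer:**
The Z-test is used when the population standard deviation is known, or when the sample is large enough (commonly n > 30) that the sample mean is approximately normal and the sample standard deviation is a reliable estimate. It is typically applied to compare a sample mean to a hypothesized population mean.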


### 9. How do you calculate the Z-score, and what does it represent in hypothesis testing?
**Answer:**
Z-score = (X̄ - μ) / (σ / √n), where X̄ is the sample mean, μ is the population mean under the null hypothesis, σ is the population standard deviation, and n is the sample size. It tells us how many standard errors (σ / √n) the sample mean lies from the hypothesized population mean.
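
The formula translates directly into code. The function below is a minimal sketch, and the input values are hypothetical numbers chosen so the arithmetic is easy to check by hand.

```python
import math

def z_score(sample_mean, pop_mean, pop_sd, n):
    """Z = (sample mean - population mean) / (sigma / sqrt(n))."""
    standard_error = pop_sd / math.sqrt(n)
    return (sample_mean - pop_mean) / standard_error

# Standard error = 10 / sqrt(100) = 1, so Z = (52 - 50) / 1
z = z_score(sample_mean=52, pop_mean=50, pop_sd=10, n=100)
print("Z-score:", z)
```

A Z-score of 2 exceeds the two-tailed critical value of about 1.96 at alpha = 0.05, so this hypothetical sample would lead to rejecting the null hypothesis.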


### 10. What is the T-distribution, and when should it be used instead of the normal distribution?
**Answer:**
The T-distribution is used instead of the normal distribution when the population standard deviation is unknown and has to be estimated from the sample, which matters most for small samples (roughly n < 30). It has heavier tails than the normal distribution, and its shape depends on the degrees of freedom (n - 1 for a one-sample test).


### 11. What is the difference between a Z-test and a T-test?
**Answer:**
The Z-test is used when the population standard deviation is known, while the T-test is used when it is unknown and estimated from the sample. T-tests are more common for small sample sizes.


### 12. What is the T-test, and how is it used in hypothesis testing?
**Answer:**
A T-test checks whether means are statistically different. The one-sample T-test compares a sample mean to a known or hypothesized value, the independent T-test compares the means of two independent groups, and the paired T-test compares two measurements taken on the same subjects.
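
Each variant maps to a SciPy function. The sketch below runs all three on simulated data; the group means, spreads, and sample sizes are made-up values for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
before = rng.normal(70, 8, 25)         # hypothetical scores
after = before + rng.normal(2, 3, 25)  # same subjects, measured again
other_group = rng.normal(74, 8, 25)    # unrelated second group

one_sample = stats.ttest_1samp(before, popmean=70)  # sample mean vs. a fixed value
independent = stats.ttest_ind(before, other_group)  # two independent groups
paired = stats.ttest_rel(before, after)             # two measurements per subject

for name, res in [("One-sample", one_sample),
                  ("Independent", independent),
                  ("Paired", paired)]:
    print(f"{name}: t = {res.statistic:.3f}, p = {res.pvalue:.4f}")
```

Note that `ttest_ind` assumes equal variances by default; passing `equal_var=False` switches to Welch's T-test.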


### 13. What is the relationship between Z-test and T-test in hypothesis testing?
**Answer:**
Z-test and T-test both compare means, but T-test adjusts for small samples and unknown variance. As the sample size grows, the T-distribution approaches the normal (Z) distribution.
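
The convergence can be checked numerically by comparing two-tailed 95% critical values. The degrees of freedom below are arbitrary choices for illustration.

```python
from scipy.stats import norm, t

z_crit = norm.ppf(0.975)  # two-tailed 95% critical value for Z
t_crits = {df: t.ppf(0.975, df) for df in (5, 30, 100, 1000)}

print(f"Z critical value: {z_crit:.4f}")
for df, crit in t_crits.items():
    print(f"t critical value (df={df}): {crit:.4f}")
```

The T critical values shrink toward the Z value as the degrees of freedom grow, which is why the two tests give nearly identical answers for large samples.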


### 14. What is a confidence interval, and how is it used to interpret statistical results?
**Answer:**
A confidence interval is a range of plausible values for a population parameter, computed from sample data at a chosen confidence level (e.g., 95%). The level describes the procedure: if the sampling were repeated many times, about 95% of the intervals built this way would contain the true parameter. A narrower interval indicates a more precise estimate.
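
A 95% confidence interval for a mean can be computed from the sample mean, its standard error, and the T-distribution. This is a minimal sketch on simulated data; the population values used to generate the sample are assumptions.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
sample = rng.normal(100, 15, 40)  # hypothetical measurements

mean = sample.mean()
sem = stats.sem(sample)  # standard error of the mean (uses ddof=1)

# First argument is the confidence level, second the degrees of freedom
low, high = stats.t.interval(0.95, len(sample) - 1, loc=mean, scale=sem)
print(f"95% CI for the mean: ({low:.2f}, {high:.2f})")
```

The interval is centered on the sample mean, and its half-width is the margin of error.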

### 15. What is the margin of error, and how does it affect the confidence interval?
**Answer:**
The margin of error is the amount added to and subtracted from the sample estimate to form a confidence interval, so it is half the interval's width. A larger margin means less precision. It grows with the confidence level and with the variability of the data, and it shrinks as the sample size increases.
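
The effect of sample size and confidence level can be seen with a small calculation. The sketch assumes a known population standard deviation, and the helper name `moe_known_sigma` is hypothetical.

```python
from scipy.stats import norm

def moe_known_sigma(sigma, n, confidence=0.95):
    """Margin of error for a mean when sigma is known: z * sigma / sqrt(n)."""
    z = norm.ppf((1 + confidence) / 2)
    return z * sigma / n ** 0.5

for n in (25, 100, 400):
    print(f"n={n}: margin of error = {moe_known_sigma(sigma=10, n=n):.3f}")

print("99% level, n=100:", round(moe_known_sigma(10, 100, confidence=0.99), 3))
```

Quadrupling the sample size halves the margin of error, while raising the confidence level widens it.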


### 16. How is Bayes' Theorem used in statistics, and what is its significance?
**Answer:**
Bayes' Theorem is used to update the probability of a hypothesis based on new evidence. It's widely used in machine learning, spam filtering, and medical testing to improve predictions.


### 17. What is the Chi-square distribution, and when is it used?
**Answer:**
The Chi-square distribution is used in tests of independence and goodness of fit for categorical data. It is positively skewed and depends on degrees of freedom.


### 18. What is the Chi-square goodness of fit test, and how is it applied?
**Answer:**
The Chi-square goodness of fit test checks whether the observed frequency distribution matches an expected distribution. It uses the formula Σ((O-E)²/E) across categories.
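
The formula can be verified by computing the statistic by hand and comparing it with SciPy's result. The observed counts below are hypothetical, with a uniform expected distribution.

```python
import numpy as np
from scipy.stats import chisquare

observed = np.array([18, 22, 20, 40])  # hypothetical category counts
expected = np.array([25, 25, 25, 25])  # uniform expectation, same total

# Sum of (O - E)^2 / E across categories
manual_stat = np.sum((observed - expected) ** 2 / expected)
scipy_stat, p_value = chisquare(f_obs=observed, f_exp=expected)

print("Manual statistic:", manual_stat)
print("SciPy statistic: ", scipy_stat)
print("P-value:", p_value)
```

Both approaches give 12.32; the degrees of freedom are the number of categories minus one, here 3.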


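### 19. What is the F-distribution, and when is it used in hypothesis testing?
**Answer:**
The F-distribution is the distribution of the ratio of two independent variance estimates (each a chi-square variable divided by its degrees of freedom), so it is defined by two degrees-of-freedom parameters. It is used to compare two variances, and in ANOVA and regression analysis to assess overall significance.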


### 20. What is an ANOVA test, and what are its assumptions?
**Answer:**
ANOVA (Analysis of Variance) tests for differences among group means. Assumptions include normality, homogeneity of variance, and independence of observations.


### 21. What are the different types of ANOVA tests?
**Answer:**
Types of ANOVA:
- One-way ANOVA (one factor)
- Two-way ANOVA (two factors)
- Repeated Measures ANOVA (same subjects over time)


### 22. What is the F-test, and how does it relate to hypothesis testing?
**Answer:**
The F-test compares variances to determine if the variability between groups is greater than expected by chance. It underlies the ANOVA test.
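
To show how the F-test underlies ANOVA, the sketch below computes the F statistic by hand as the ratio of between-group to within-group mean squares and checks it against `scipy.stats.f_oneway`. The group means, spread, and sizes are assumed values for illustration.

```python
import numpy as np
from scipy.stats import f_oneway

rng = np.random.default_rng(3)
groups = [rng.normal(mu, 5, 20) for mu in (60, 65, 70)]  # hypothetical groups

grand_mean = np.mean(np.concatenate(groups))
k = len(groups)                        # number of groups
n_total = sum(len(g) for g in groups)  # total number of observations

# Variability of group means around the grand mean
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
# Variability of observations around their own group mean
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

f_manual = (ss_between / (k - 1)) / (ss_within / (n_total - k))
f_scipy, p_value = f_oneway(*groups)

print("Manual F:", f_manual)
print("SciPy F: ", f_scipy)
```

A large F means the group means differ by more than the within-group noise would explain.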

In [2]:
# Practical Part Starts Here


import numpy as np
import scipy.stats as stats
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
from statsmodels.stats.weightstats import ztest

# 1. Calculate margin of error for a given confidence level using sample data
def margin_of_error(data, confidence=0.95):
    n = len(data)
    se = stats.sem(data)
    moe = se * stats.t.ppf((1 + confidence) / 2., n-1)
    return moe

sample = np.random.normal(100, 10, 50)
print("\nMargin of Error (95%):", margin_of_error(sample))

Margin of Error (95%): 2.9733799671115837


In [1]:
# 2. Bayesian inference using Bayes' Theorem
def bayesian_inference(prior, likelihood, evidence):
    posterior = (prior * likelihood) / evidence
    return posterior

prior = 0.01
likelihood = 0.9
evidence = 0.05
posterior = bayesian_inference(prior, likelihood, evidence)
print("\nPosterior Probability:", posterior)


Posterior Probability: 0.18000000000000002


In [None]:


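# 3. Chi-square test for independence
# Contingency table: rows are genders, columns are smoking status
df = pd.DataFrame({'Smoker': [30, 20], 'Non-Smoker': [70, 80]},
                  index=['Male', 'Female'])
obs = df.to_numpy()
# Note: for 2x2 tables, chi2_contingency applies Yates' continuity correction by default
chi2, p, dof, expected = stats.chi2_contingency(obs)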
print("\nChi-square Test for Independence:")
print("Chi2 Statistic:", chi2)
print("P-value:", p)

In [None]:
# 4. Calculate expected frequencies for Chi-square test
print("Expected Frequencies:\n", expected)

In [None]:
# 5. Chi-square goodness-of-fit test
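observed = np.array([50, 30, 20])
# Expected counts must sum to the observed total; rounded values such as 33.33
# sum to 99.99, which recent SciPy versions reject with a ValueError
expected = np.full(3, observed.sum() / 3)
chi2_stat, p_val = stats.chisquare(f_obs=observed, f_exp=expected)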
print("\nChi-square Goodness of Fit:")
print("Chi2 Statistic:", chi2_stat)
print("P-value:", p_val)


In [None]:


# 6. Visualize the Chi-square distribution
x = np.linspace(0, 20, 1000)
y = stats.chi2.pdf(x, df=2)
plt.figure(figsize=(8, 5))
plt.plot(x, y)
plt.title("Chi-square Distribution (df=2)")
plt.xlabel("x")
plt.ylabel("Probability Density")
plt.show()


In [None]:


# 7. F-test for comparing variances
sample1 = np.random.normal(20, 5, 100)
sample2 = np.random.normal(22, 3, 100)
f_stat = np.var(sample1, ddof=1) / np.var(sample2, ddof=1)
df1, df2 = len(sample1)-1, len(sample2)-1
# Upper-tail (one-sided) p-value: tests whether sample1 is more variable than sample2
p_value = stats.f.sf(f_stat, df1, df2)
print("\nF-test:")
print("F Statistic:", f_stat)
print("P-value:", p_value)

In [None]:

# 8. ANOVA test to compare means between multiple groups
group1 = np.random.normal(60, 5, 30)
group2 = np.random.normal(65, 5, 30)
group3 = np.random.normal(70, 5, 30)
F_stat, p_val = stats.f_oneway(group1, group2, group3)
print("\nANOVA Test:")
print("F-statistic:", F_stat)
print("P-value:", p_val)

In [None]:

# 9. One-way ANOVA visualization with boxplots
data = [group1, group2, group3]
plt.figure(figsize=(8, 5))
plt.boxplot(data, labels=['Group1', 'Group2', 'Group3'])
plt.title("Boxplot of Groups (One-way ANOVA)")
plt.show()



In [None]:
# 10. Write a Python program to calculate the margin of error for a given confidence level using sample data
import numpy as np
from scipy.stats import norm

data = np.random.normal(50, 10, 100)
confidence = 0.95
mean = np.mean(data)
std_err = np.std(data, ddof=1) / np.sqrt(len(data))
z = norm.ppf((1 + confidence) / 2)
margin_error = z * std_err

print("Margin of Error:", margin_error)


In [None]:
# 11. Implement a Bayesian inference method using Bayes’ Theorem in Python and explain the process
# Prior, Likelihood, Evidence
prior = 0.01          # P(Disease)
sensitivity = 0.95    # P(Positive | Disease)
false_positive = 0.05 # P(Positive | No Disease)
evidence = (sensitivity * prior) + (false_positive * (1 - prior))

# Posterior
posterior = (sensitivity * prior) / evidence
print("Posterior Probability (P(Disease | Positive)):", posterior)

In [None]:

# 12. Perform a Chi-square test for independence between two categorical variables in Python
import pandas as pd
from scipy.stats import chi2_contingency

# Create a contingency table
data = [[30, 10], [20, 40]]
chi2, p, dof, expected = chi2_contingency(data)

print("Chi-square statistic:", chi2)
print("P-value:", p)


In [None]:



# 13. Write a Python program to calculate the expected frequencies for a Chi-square test based on observed data
observed = np.array([[10, 20], [20, 30]])
chi2, p, dof, expected = chi2_contingency(observed)

print("Expected Frequencies:\n", expected)

In [None]:

# 14. Perform a goodness-of-fit test using Python to compare the observed data to an expected distribution
from scipy.stats import chisquare

observed = [18, 22, 20, 40]
expected = [25, 25, 25, 25]

chi2_stat, p_value = chisquare(f_obs=observed, f_exp=expected)
print("Chi-square statistic:", chi2_stat)
print("P-value:", p_value)


In [None]:
# 15. Create a Python script to simulate and visualize the Chi-square distribution and discuss its characteristics
import matplotlib.pyplot as plt
from scipy.stats import chi2

df = 10  # degrees of freedom
x = np.linspace(0, 40, 500)
y = chi2.pdf(x, df)

plt.plot(x, y, label=f"Chi-square (df={df})")
plt.title("Chi-square Distribution")
plt.xlabel("X")
plt.ylabel("Density")
plt.grid(True)
plt.legend()
plt.show()



In [None]:
# 16. Implement an F-test using Python to compare the variances of two random samples
from scipy.stats import f

data1 = np.random.normal(70, 10, 30)
data2 = np.random.normal(75, 15, 30)

var1 = np.var(data1, ddof=1)
var2 = np.var(data2, ddof=1)
f_stat = var1 / var2
df1 = len(data1) - 1
df2 = len(data2) - 1

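# Two-sided p-value: f_stat can fall in either tail (here var1 < var2 is expected),
# so double the smaller tail area
p_value = 2 * min(f.cdf(f_stat, df1, df2), f.sf(f_stat, df1, df2))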

print("F-statistic:", f_stat)
print("P-value:", p_value)


In [None]:
# 17. Write a Python program to perform an ANOVA test to compare means between multiple groups and interpret the results
from scipy.stats import f_oneway

group1 = np.random.normal(70, 5, 30)
group2 = np.random.normal(75, 5, 30)
group3 = np.random.normal(80, 5, 30)

f_stat, p_value = f_oneway(group1, group2, group3)

print("F-statistic:", f_stat)
print("P-value:", p_value)

if p_value < 0.05:
    print("Significant difference between group means.")
else:
    print("No significant difference between group means.")

In [None]:
# 18. Perform a one-way ANOVA test using Python to compare the means of different groups and plot the results

plt.boxplot([group1, group2, group3], labels=["Group 1", "Group 2", "Group 3"])
plt.title("One-Way ANOVA Boxplot")
plt.ylabel("Scores")
plt.grid(True)
plt.show()

In [None]:
# 19. Write a Python function to check the assumptions (normality, independence, and equal variance) for ANOVA
from scipy.stats import shapiro, levene

def check_anova_assumptions(groups):
    for i, group in enumerate(groups):
        stat, p = shapiro(group)
        print(f"Group {i+1} Normality p-value:", p)
    stat, p = levene(*groups)
    print("Levene’s Equal Variance Test p-value:", p)

check_anova_assumptions([group1, group2, group3])

In [None]:
# 20. Perform a two-way ANOVA test using Python to study the interaction between two factors and visualize the results

import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

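# Cross the two factors so every Gender/Treatment combination occurs (15 rows each).
# Repeating both lists in lockstep would pair Male only with A and Female only with B,
# leaving the interaction impossible to estimate.
df = pd.DataFrame({
    'Score': np.random.normal(70, 10, 60),
    'Gender': np.repeat(['Male', 'Female'], 30),
    'Treatment': np.tile(['A', 'B'], 30)
})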

model = ols('Score ~ C(Gender) + C(Treatment) + C(Gender):C(Treatment)', data=df).fit()
anova_table = sm.stats.anova_lm(model, typ=2)
print(anova_table)

In [None]:
# 21. Write a Python program to visualize the F-distribution and discuss its use in hypothesis testing

x = np.linspace(0, 5, 1000)
df1, df2 = 5, 20
y = f.pdf(x, df1, df2)

plt.plot(x, y, label=f'F-distribution df1={df1}, df2={df2}')
plt.title("F-distribution")
plt.xlabel("F value")
plt.ylabel("Density")
plt.grid(True)
plt.legend()
plt.show()

In [None]:
# 22. Perform a one-way ANOVA test in Python and visualize the results with boxplots to compare group means
plt.boxplot([group1, group2, group3], labels=["Group 1", "Group 2", "Group 3"])
plt.title("One-Way ANOVA Boxplot")
plt.ylabel("Scores")
plt.grid(True)
plt.show()

In [None]:
# 23. Simulate random data from a normal distribution, then perform hypothesis testing to evaluate the means
from scipy.stats import ttest_1samp

data = np.random.normal(100, 10, 50)
t_stat, p_value = ttest_1samp(data, 100)
print("T-statistic:", t_stat)
print("P-value:", p_value)

In [None]:
# 24. Perform a hypothesis test for population variance using a Chi-square distribution and interpret the results

from scipy.stats import chi2

sample = np.random.normal(100, 15, 30)
sample_var = np.var(sample, ddof=1)
hypo_var = 15**2
n = len(sample)

chi2_stat = (n - 1) * sample_var / hypo_var
# Upper-tailed test: H0 variance = hypo_var, H1 variance > hypo_var
p_value = chi2.sf(chi2_stat, df=n - 1)

print("Chi-square stat:", chi2_stat)
print("P-value:", p_value)

In [None]:
# 25. Write a Python script to perform a Z-test for comparing proportions between two datasets or groups
from statsmodels.stats.proportion import proportions_ztest

successes = [40, 30]
samples = [100, 100]
z_stat, p_value = proportions_ztest(successes, samples)

print("Z-stat:", z_stat)
print("P-value:", p_value)

In [None]:
# 26. Implement an F-test for comparing the variances of two datasets, then interpret and visualize the results
data1 = np.random.normal(60, 10, 40)
data2 = np.random.normal(60, 20, 40)

var1 = np.var(data1, ddof=1)
var2 = np.var(data2, ddof=1)
f_stat = var1 / var2
df1, df2 = len(data1)-1, len(data2)-1
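# Two-sided p-value: double the smaller tail area, since f_stat may be below 1
p_value = 2 * min(f.cdf(f_stat, df1, df2), f.sf(f_stat, df1, df2))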

print("F-stat:", f_stat)
print("P-value:", p_value)

In [1]:
# 27. Perform a Chi-square test for goodness of fit with simulated data and analyze the results
from scipy.stats import chisquare

observed = [18, 20, 22, 40]
expected = [25, 25, 25, 25]

chi2_stat, p = chisquare(f_obs=observed, f_exp=expected)

print("Chi-square:", chi2_stat)
print("P-value:", p)

Chi-square: 12.32
P-value: 0.006363629995195272
