

---

### **01 What is hypothesis testing in statistics?**

Hypothesis testing is a statistical method used to make decisions about population parameters based on sample data. It evaluates whether there is enough evidence to reject a presumed statement (the **null hypothesis**) in favor of an **alternative hypothesis**.

---

### **02 What is the null hypothesis, and how does it differ from the alternative hypothesis?**

* **Null Hypothesis (H₀)**: Assumes no effect or no difference. It is the default or status quo.
* **Alternative Hypothesis (H₁ or Ha)**: Represents the research claim or the effect we are testing for.

**Example:**

* H₀: μ = 50 (mean is 50)
* Ha: μ ≠ 50 (mean is not 50)

---

### **03 What is the significance level in hypothesis testing, and why is it important?**

The **significance level (α)** is the threshold for rejecting the null hypothesis. Common values are **0.05** or **0.01**, which means a 5% or 1% chance of rejecting H₀ when it is actually true. It controls the **Type I error rate**.

---

### **04 What does a P-value represent in hypothesis testing?**

The **P-value** is the probability, computed under the assumption that the null hypothesis is true, of obtaining a test statistic at least as extreme as the one observed. It is not the probability that the null hypothesis is true.

---

### **05 How do you interpret the P-value in hypothesis testing?**

* If **P-value ≤ α**: Reject the null hypothesis (evidence supports H₁).
* If **P-value > α**: Do not reject the null hypothesis (insufficient evidence to support H₁).
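
The decision rule above can be sketched in a few lines. This is a minimal illustration, assuming a two-sided test on a z statistic; the helper name `decide` and the two example statistics (2.5 and 1.0) are hypothetical.

```python
from scipy.stats import norm


def decide(z_stat, alpha=0.05):
    """Return the two-sided p-value for a z statistic and the resulting decision."""
    p_value = 2 * norm.sf(abs(z_stat))  # sf is the upper-tail area, 1 - cdf
    decision = "reject H0" if p_value <= alpha else "fail to reject H0"
    return p_value, decision


# Hypothetical test statistics, chosen only to show both outcomes.
p_strong, decision_strong = decide(2.5)
p_weak, decision_weak = decide(1.0)
print(f"z=2.5: p={p_strong:.4f} -> {decision_strong}")
print(f"z=1.0: p={p_weak:.4f} -> {decision_weak}")
```

Note that "fail to reject H0" is not the same as accepting H0: it only means the sample did not provide enough evidence against it.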

---

### **06 What are Type I and Type II errors in hypothesis testing?**

* **Type I Error (α)**: Rejecting a true null hypothesis (false positive).
* **Type II Error (β)**: Failing to reject a false null hypothesis (false negative).
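
Both error rates can be estimated by simulation. The sketch below assumes a two-sided z-test with known σ; the sample size, the true mean used for the alternative (0.5), and the seed are arbitrary choices made for illustration.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable
alpha, n, sigma, n_trials = 0.05, 30, 1.0, 10_000


def rejection_rate(true_mean):
    """Fraction of simulated samples in which H0: mu = 0 is rejected."""
    samples = rng.normal(loc=true_mean, scale=sigma, size=(n_trials, n))
    z = samples.mean(axis=1) / (sigma / np.sqrt(n))
    p_values = 2 * norm.sf(np.abs(z))
    return np.mean(p_values <= alpha)


# H0 is true (mu = 0): every rejection is a Type I error.
type_i_rate = rejection_rate(0.0)

# H0 is false (mu = 0.5): every non-rejection is a Type II error.
type_ii_rate = 1 - rejection_rate(0.5)

print(f"Estimated Type I error rate:  {type_i_rate:.3f}")
print(f"Estimated Type II error rate: {type_ii_rate:.3f}")
```

The Type I rate should land close to the chosen α, while the Type II rate depends on the effect size and the sample size.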

---

### **07 What is the difference between a one-tailed and a two-tailed test in hypothesis testing?**

* **One-tailed test**: Tests for effect in one direction (e.g., H₁: μ > 50).
* **Two-tailed test**: Tests for any difference (e.g., H₁: μ ≠ 50).
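
The choice of tail changes the p-value for the same test statistic. A small sketch, assuming a z statistic of 1.8 (a hypothetical value):

```python
from scipy.stats import norm

z_stat = 1.8  # hypothetical test statistic

p_right = norm.sf(z_stat)          # one-tailed, H1: mu > mu0
p_left = norm.cdf(z_stat)          # one-tailed, H1: mu < mu0
p_two = 2 * norm.sf(abs(z_stat))   # two-tailed, H1: mu != mu0

print(f"Right-tailed p: {p_right:.4f}")
print(f"Left-tailed p:  {p_left:.4f}")
print(f"Two-tailed p:   {p_two:.4f}")
```

With α = 0.05, this statistic is significant in the right-tailed test but not in the two-tailed test, which is why the direction of the alternative should be fixed before looking at the data.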

---

### **08 What is the Z-test, and when is it used in hypothesis testing?**

A **Z-test** tests hypotheses about a mean when the population standard deviation σ is known. It requires the sample mean to be approximately normally distributed, which holds if the population is normal or if the sample is large (commonly n ≥ 30) so that the Central Limit Theorem applies.

---

### **09 How do you calculate the Z-score, and what does it represent in hypothesis testing?**

$$
Z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}
$$

* **Z** represents how many standard errors ($\sigma/\sqrt{n}$) the sample mean $\bar{x}$ lies from the hypothesized population mean $\mu$.
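
As a worked instance of the formula, assume a sample of n = 36 with mean 52, a hypothesized mean of 50, and a known σ of 6 (all hypothetical numbers picked to keep the arithmetic simple):

```python
import math

# Hypothetical inputs for illustration.
x_bar, mu_0, sigma, n = 52.0, 50.0, 6.0, 36

standard_error = sigma / math.sqrt(n)   # 6 / 6 = 1.0
z = (x_bar - mu_0) / standard_error     # (52 - 50) / 1.0 = 2.0
print(f"Standard error: {standard_error}, Z: {z}")
```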

---

### **10 What is the T-distribution, and when should it be used instead of the normal distribution?**

The **T-distribution** is used when:

* The population standard deviation is unknown and must be estimated from the sample.
* The sample size is small (n < 30), where the difference from the normal distribution is most noticeable.

Its heavier tails account for the additional variability introduced by estimating the population standard deviation.

---

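### **11 What is the difference between a Z-test and a T-test?**

* **Z-test**: Known population standard deviation; typically a large sample; the statistic is compared against the standard normal distribution.
* **T-test**: Unknown population standard deviation, estimated from the sample; matters most for small samples; the statistic is compared against the T-distribution.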

---

### **12 What is the T-test, and how is it used in hypothesis testing?**

A **T-test** compares sample means to a known value or to each other to determine if there's a significant difference. Types include:

* One-sample T-test
* Independent two-sample T-test
* Paired T-test

---

### **13 What is the relationship between Z-test and T-test in hypothesis testing?**

Both test means but differ in assumptions:

* T-test converges to Z-test as sample size increases.
* T-test handles more uncertainty due to estimating σ.
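
The convergence can be seen by comparing two-sided 95% critical values. This is a minimal sketch; the list of degrees of freedom is an arbitrary choice.

```python
from scipy.stats import norm, t

z_crit = norm.ppf(0.975)  # two-sided 95% critical value for the normal distribution

# Critical values of the T-distribution for increasing degrees of freedom.
t_crits = {df: t.ppf(0.975, df) for df in (5, 10, 30, 100, 1000)}

print(f"z critical value: {z_crit:.4f}")
for df, crit in t_crits.items():
    print(f"t critical value (df={df}): {crit:.4f}")
```

The T critical value is always larger than the normal one, reflecting the extra uncertainty from estimating σ, and the gap shrinks as the degrees of freedom grow.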

---

### **14 What is a confidence interval, and how is it used to interpret statistical results?**

A **confidence interval (CI)** gives a range of plausible values for a population parameter. A 95% CI is built by a procedure that captures the true value in about 95% of repeated samples; informally, we are 95% confident that the true value lies within the interval.
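
The "95%" refers to how often the interval-building procedure succeeds over many samples. The simulation below is a sketch of that idea, assuming normally distributed data and a T-based interval for the mean; the population parameters, sample size, and seed are hypothetical.

```python
import numpy as np
from scipy.stats import t

rng = np.random.default_rng(1)  # fixed seed so the run is repeatable
true_mean, sigma, n, n_trials = 100.0, 15.0, 50, 5000

# Draw many independent samples and build one 95% interval from each.
samples = rng.normal(true_mean, sigma, size=(n_trials, n))
means = samples.mean(axis=1)
std_errs = samples.std(axis=1, ddof=1) / np.sqrt(n)
t_crit = t.ppf(0.975, df=n - 1)
lower = means - t_crit * std_errs
upper = means + t_crit * std_errs

# Fraction of intervals that contain the true mean.
coverage = np.mean((lower <= true_mean) & (true_mean <= upper))
print(f"Empirical coverage: {coverage:.3f}")
```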

---

### **15 What is the margin of error, and how does it affect the confidence interval?**

The **margin of error** is half the width of the confidence interval. A smaller margin indicates more precision. It shrinks as the sample size grows and widens as the confidence level or the variability of the data increases.
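
A short sketch of how sample size and confidence level drive the margin of error, assuming a known standard deviation and a normal critical value. The helper `margin_of_error` and the input values are hypothetical.

```python
import math

from scipy.stats import norm


def margin_of_error(sigma, n, confidence=0.95):
    """Margin of error for a mean with known sigma, using a normal critical value."""
    z_crit = norm.ppf((1 + confidence) / 2)
    return z_crit * sigma / math.sqrt(n)


moe_100 = margin_of_error(15, 100)           # baseline
moe_400 = margin_of_error(15, 400)           # four times the sample size
moe_100_99 = margin_of_error(15, 100, 0.99)  # higher confidence level

print(f"n=100, 95%: {moe_100:.3f}")
print(f"n=400, 95%: {moe_400:.3f}")
print(f"n=100, 99%: {moe_100_99:.3f}")
```

Quadrupling the sample size halves the margin, because the margin scales with 1/√n.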

---

### **16 How is Bayes' Theorem used in statistics, and what is its significance?**

**Bayes' Theorem** updates the probability of a hypothesis based on new evidence. It’s essential in **Bayesian statistics**, allowing dynamic, evidence-based inference:

$$
P(H|E) = \frac{P(E|H)P(H)}{P(E)}
$$
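
As a worked example of the formula, consider a diagnostic test. All numbers here are hypothetical: a 1% prevalence (the prior), a 95% chance of a positive result when the condition is present, and a 5% false-positive rate.

```python
# Hypothetical inputs for illustration.
prior = 0.01                 # P(H): prevalence of the condition
p_pos_given_h = 0.95         # P(E|H): positive test when the condition is present
p_pos_given_not_h = 0.05     # P(E|not H): false-positive rate

# P(E) via the law of total probability.
evidence = p_pos_given_h * prior + p_pos_given_not_h * (1 - prior)

# P(H|E): probability of the condition given a positive test.
posterior = p_pos_given_h * prior / evidence
print(f"P(condition | positive test) = {posterior:.3f}")
```

Even with an accurate test, the posterior stays low here because the prior is small, which is the kind of updating Bayes' Theorem makes explicit.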

---

### **17 What is the Chi-square distribution, and when is it used?**

The **Chi-square distribution** is used for tests involving **categorical data** and **variances**, such as:

* Goodness-of-fit test
* Test of independence
* Test for variance

---

### **18 What is the Chi-square goodness of fit test, and how is it applied?**

It tests whether observed frequencies match the frequencies expected under a hypothesized distribution for categorical data. If the difference is statistically significant, we reject the hypothesis that the data follow that distribution.
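
The test statistic is the sum of (observed − expected)² / expected over the categories. The sketch below computes it by hand for a hypothetical die rolled 60 times and checks the result against `scipy.stats.chisquare`; the observed counts are made up for illustration.

```python
import numpy as np
from scipy.stats import chi2, chisquare

# Hypothetical counts from 60 rolls of a die; a fair die expects 10 per face.
observed = np.array([8, 12, 9, 11, 10, 10])
expected = np.full(6, 10.0)

# Manual computation of the statistic and its p-value.
chi2_stat = np.sum((observed - expected) ** 2 / expected)
dof = len(observed) - 1
p_manual = chi2.sf(chi2_stat, dof)

# The same test via SciPy.
chi2_scipy, p_scipy = chisquare(f_obs=observed, f_exp=expected)

print(f"Manual: chi2={chi2_stat:.3f}, p={p_manual:.4f}")
print(f"SciPy:  chi2={chi2_scipy:.3f}, p={p_scipy:.4f}")
```

The large p-value means these counts give no evidence against a fair die.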

---

### **19 What is the F-distribution, and when is it used in hypothesis testing?**

The **F-distribution** is used to compare two variances or in **ANOVA** to test if multiple groups have the same mean. It takes only non-negative values and is skewed right.

---

### **20 What is an ANOVA test, and what are its assumptions?**

**ANOVA (Analysis of Variance)** compares means across three or more groups. Assumptions:

* Independence of observations
* Normality
* Equal variances (homoscedasticity)

---

### **21 What are the different types of ANOVA tests?**

* **One-way ANOVA**: One independent variable
* **Two-way ANOVA**: Two independent variables
* **Repeated Measures ANOVA**: Same subjects under different conditions

---

### **22 What is the F-test, and how does it relate to hypothesis testing?**

An **F-test** compares variances or model fits. In ANOVA, it tests whether the variability between group means is larger than would be expected by chance, given the variability within groups.
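
For one-way ANOVA, the F statistic is the ratio of the between-group mean square to the within-group mean square. The sketch below computes it by hand on three small hypothetical groups and compares the result with `scipy.stats.f_oneway`.

```python
import numpy as np
from scipy.stats import f, f_oneway

# Hypothetical data: three groups of three observations each.
groups = [np.array([4.0, 5.0, 6.0]),
          np.array([6.0, 7.0, 8.0]),
          np.array([8.0, 9.0, 10.0])]

all_values = np.concatenate(groups)
grand_mean = all_values.mean()
k, n_total = len(groups), len(all_values)

# Between-group and within-group sums of squares.
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

# Mean squares and the F statistic.
ms_between = ss_between / (k - 1)
ms_within = ss_within / (n_total - k)
f_manual = ms_between / ms_within
p_manual = f.sf(f_manual, k - 1, n_total - k)

# The same test via SciPy.
f_scipy, p_scipy = f_oneway(*groups)

print(f"Manual: F={f_manual:.3f}, p={p_manual:.4f}")
print(f"SciPy:  F={f_scipy:.3f}, p={p_scipy:.4f}")
```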

---




In [None]:
# 01. Generate a random variable and display its value
import random
print("Random Variable:", random.randint(1, 100))

# 02. Discrete uniform distribution and PMF
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import randint
x = np.arange(1, 7)
pmf = randint.pmf(x, 1, 7)
plt.stem(x, pmf, basefmt=" ")
plt.title("Discrete Uniform Distribution PMF")
plt.xlabel("Value")
plt.ylabel("PMF")
plt.show()

# 03. Bernoulli distribution PMF (Bernoulli is discrete, so it has a PMF rather than a PDF)
from scipy.stats import bernoulli
def bernoulli_pmf(p):
    x = [0, 1]
    pmf = bernoulli.pmf(x, p)
    return dict(zip(x, pmf))
print(bernoulli_pmf(0.6))

# 04. Binomial distribution simulation
binom_data = np.random.binomial(10, 0.5, 1000)
plt.hist(binom_data, bins=range(12), align='left', rwidth=0.8)
plt.title("Binomial Distribution (n=10, p=0.5)")
plt.xlabel("Successes")
plt.ylabel("Frequency")
plt.show()

# 05. Poisson distribution visualization
from scipy.stats import poisson
mu = 3
x = np.arange(0, 10)
poisson_pmf = poisson.pmf(x, mu)
plt.stem(x, poisson_pmf, basefmt=" ")
plt.title("Poisson Distribution (mu=3)")
plt.xlabel("x")
plt.ylabel("PMF")
plt.show()

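# 06. CDF of discrete uniform distribution
x = np.arange(1, 7)  # reset x to the die faces; it still holds the Poisson grid from step 05
cdf = randint.cdf(x, 1, 7)
plt.step(x, cdf, where="post")  # a CDF jumps at each value and stays flat until the next one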
plt.title("CDF of Discrete Uniform Distribution")
plt.xlabel("Value")
plt.ylabel("CDF")
plt.grid(True)
plt.show()

# 07. Continuous uniform distribution
from scipy.stats import uniform
x = np.linspace(0, 1, 1000)
pdf = uniform.pdf(x, loc=0, scale=1)
plt.plot(x, pdf)
plt.title("Continuous Uniform Distribution")
plt.xlabel("x")
plt.ylabel("PDF")
plt.grid(True)
plt.show()

# 08. Normal distribution simulation
normal_data = np.random.normal(loc=0, scale=1, size=1000)
plt.hist(normal_data, bins=30, density=True, alpha=0.6)
plt.title("Histogram of Normal Distribution")
plt.xlabel("Value")
plt.ylabel("Density")
plt.show()

# 09. Z-scores and plot
def z_scores(data):
    mean = np.mean(data)
    std = np.std(data)
    return (data - mean) / std
zs = z_scores(normal_data)
plt.hist(zs, bins=30)
plt.title("Z-scores")
plt.xlabel("Z-score")
plt.ylabel("Frequency")
plt.show()

# 10. CLT using non-normal distribution
samples = [np.mean(np.random.exponential(scale=2, size=30)) for _ in range(1000)]
plt.hist(samples, bins=30, density=True, alpha=0.6)
plt.title("CLT from Exponential Distribution")
plt.xlabel("Sample Mean")
plt.ylabel("Density")  # the histogram is drawn with density=True
plt.show()

# 11. CLT with normal distribution
sample_means = [np.mean(np.random.normal(0, 1, 30)) for _ in range(1000)]
plt.hist(sample_means, bins=30, density=True)
plt.title("CLT with Normal Samples")
plt.xlabel("Sample Mean")
plt.ylabel("Density")
plt.show()

# 12. Standard normal distribution plot
x = np.linspace(-4, 4, 1000)
y = (1/np.sqrt(2*np.pi)) * np.exp(-0.5 * x**2)
plt.plot(x, y)
plt.title("Standard Normal Distribution")
plt.xlabel("x")
plt.ylabel("PDF")
plt.grid(True)
plt.show()

# 13. Binomial random variables and probabilities
from scipy.stats import binom  # binom was used without being imported
probs = binom.pmf(range(11), 10, 0.5)
plt.bar(range(11), probs)
plt.title("Binomial PMF (n=10, p=0.5)")
plt.xlabel("x")
plt.ylabel("Probability")
plt.show()

# 14. Z-score comparison
x_value = 1.2
z = (x_value - 0) / 1
print(f"Z-score of {x_value}: {z}")

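# 15. Hypothesis testing using Z-statistic
# (sigma is estimated from the sample here, so with n=30 the statistic is only approximately standard normal)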
sample = np.random.normal(5.2, 1, 30)
pop_mean = 5
z_stat = (np.mean(sample) - pop_mean) / (np.std(sample, ddof=1) / np.sqrt(len(sample)))
print(f"Z-statistic: {z_stat}")

# 16. Confidence interval
sample = np.random.normal(100, 15, 50)
mean = np.mean(sample)
std_err = np.std(sample, ddof=1) / np.sqrt(len(sample))
conf_int = (mean - 1.96 * std_err, mean + 1.96 * std_err)
print(f"95% Confidence Interval: {conf_int}")

# 17. CI for normal data
normal_sample = np.random.normal(0, 1, 100)
mean = np.mean(normal_sample)
stderr = np.std(normal_sample, ddof=1) / np.sqrt(100)
print(f"CI: ({mean - 1.96*stderr}, {mean + 1.96*stderr})")

# 18. PDF of normal distribution
from scipy.stats import norm
x = np.linspace(-4, 4, 1000)
pdf = norm.pdf(x)
plt.plot(x, pdf)
plt.title("PDF of Standard Normal Distribution")
plt.xlabel("x")
plt.ylabel("PDF")
plt.grid(True)
plt.show()

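# 19. Poisson CDF
x = np.arange(0, 10)  # Poisson support is the non-negative integers, not the continuous grid from step 18
cdf = poisson.cdf(x, mu=3)
plt.step(x, cdf, where="post")  # a CDF jumps at each value and stays flat until the next one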
plt.title("CDF of Poisson Distribution (mu=3)")
plt.xlabel("x")
plt.ylabel("CDF")
plt.grid(True)
plt.show()

# 20. Expected value of uniform
uniform_data = np.random.uniform(0, 1, 10000)
print("Expected value:", np.mean(uniform_data))

# 21. Compare std deviations
data1 = np.random.normal(0, 1, 100)
data2 = np.random.normal(0, 2, 100)
plt.hist(data1, alpha=0.5, label='std=1')
plt.hist(data2, alpha=0.5, label='std=2')
plt.legend()
plt.title("Comparison of Standard Deviations")
plt.show()

# 22. Range and IQR
data = np.random.normal(50, 10, 1000)
range_val = np.max(data) - np.min(data)
iqr = np.percentile(data, 75) - np.percentile(data, 25)
print(f"Range: {range_val}, IQR: {iqr}")

# 23. Z-score normalization
normalized = (data - np.mean(data)) / np.std(data)
plt.hist(normalized, bins=30)
plt.title("Z-score Normalized Data")
plt.show()

# 24. Skewness and kurtosis
from scipy.stats import skew, kurtosis
print(f"Skewness: {skew(data)}")
print(f"Kurtosis: {kurtosis(data)}")  # excess (Fisher) kurtosis: 0 for a normal distribution


In [None]:
# 01. Z-test for comparing sample mean to population mean
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm
sample = np.random.normal(102, 10, 50)
pop_mean = 100
sample_mean = np.mean(sample)
sample_std = np.std(sample, ddof=1)
se = sample_std / np.sqrt(len(sample))
z_score = (sample_mean - pop_mean) / se
p_value = 2 * (1 - norm.cdf(abs(z_score)))
print(f"Z-Score: {z_score}, P-Value: {p_value}")

# 02. Hypothesis testing with random data
sample = np.random.normal(50, 5, 100)
pop_mean = 52
z_score = (np.mean(sample) - pop_mean) / (np.std(sample, ddof=1) / np.sqrt(len(sample)))
p_value = 2 * (1 - norm.cdf(abs(z_score)))
print(f"Z-Test Result -> Z-score: {z_score}, P-value: {p_value}")

# 03. One-sample Z-test
def one_sample_z_test(data, pop_mean):
    sample_mean = np.mean(data)
    se = np.std(data, ddof=1) / np.sqrt(len(data))
    z = (sample_mean - pop_mean) / se
    p = 2 * (1 - norm.cdf(abs(z)))
    return z, p
sample = np.random.normal(98, 8, 40)
z, p = one_sample_z_test(sample, 100)
print(f"One-sample Z-test: Z={z}, P={p}")

# 04. Two-tailed Z-test with visualization
x_vals = np.linspace(-4, 4, 1000)
y_vals = norm.pdf(x_vals)
z = 2.2
plt.plot(x_vals, y_vals)
plt.fill_between(x_vals, 0, y_vals, where=(x_vals < -1.96) | (x_vals > 1.96), color='red', alpha=0.5)
plt.axvline(z, color='blue', linestyle='--')
plt.title("Two-tailed Z-test Decision Region")
plt.show()

# 05. Type 1 and Type 2 errors
x = np.linspace(-4, 4, 1000)
null = norm.pdf(x)         # distribution of the test statistic under H0
alt = norm.pdf(x, loc=2)   # distribution under one specific alternative
crit = 1.96                # one-sided cutoff, so the shaded Type I area is 0.025
plt.plot(x, null, label='H0: mu=0')
plt.plot(x, alt, label='H1: mu=2')
plt.fill_between(x, 0, null, where=(x > crit), color='red', alpha=0.5, label='Type I Error')
plt.fill_between(x, 0, alt, where=(x < crit), color='blue', alpha=0.5, label='Type II Error')
plt.legend()
plt.title("Type I and Type II Errors")
plt.show()

# 06. Independent T-test
from scipy.stats import ttest_ind
group1 = np.random.normal(100, 15, 30)
group2 = np.random.normal(105, 15, 30)
t_stat, p_val = ttest_ind(group1, group2)
print(f"Independent T-test: t={t_stat}, p={p_val}")

# 07. Paired T-test
from scipy.stats import ttest_rel
before = np.random.normal(100, 10, 30)
after = before + np.random.normal(2, 5, 30)
t_stat, p_val = ttest_rel(before, after)
plt.hist(before, alpha=0.5, label='Before')
plt.hist(after, alpha=0.5, label='After')
plt.legend()
plt.title("Paired Sample Comparison")
plt.show()
print(f"Paired T-test: t={t_stat}, p={p_val}")

# 08. Compare Z-test and T-test
from scipy.stats import ttest_1samp
sample = np.random.normal(50, 10, 30)
pop_mean = 52
z_score = (np.mean(sample) - pop_mean) / (np.std(sample, ddof=1) / np.sqrt(len(sample)))
z_p = 2 * (1 - norm.cdf(abs(z_score)))
t_stat, t_p = ttest_1samp(sample, pop_mean)
# The two statistics are identical here; only the reference distribution (normal vs. t) differs.
print(f"Z-test: Z={z_score}, p={z_p}; T-test: t={t_stat}, p={t_p}")

# 09. Confidence interval function
def confidence_interval(data, confidence=0.95):
    mean = np.mean(data)
    se = np.std(data, ddof=1) / np.sqrt(len(data))
    margin = norm.ppf((1 + confidence) / 2) * se
    return mean - margin, mean + margin
sample = np.random.normal(60, 8, 100)
print(f"Confidence Interval: {confidence_interval(sample)}")

# 10. Margin of error
sample = np.random.normal(100, 15, 50)
se = np.std(sample, ddof=1) / np.sqrt(len(sample))
margin_error = 1.96 * se
print(f"Margin of Error (95%): {margin_error}")

# 11. Bayesian Inference using Bayes' Theorem
def bayes_theorem(prior_A, prior_B, likelihood_A, likelihood_B):
    evidence = prior_A * likelihood_A + prior_B * likelihood_B
    posterior_A = (prior_A * likelihood_A) / evidence
    posterior_B = (prior_B * likelihood_B) / evidence
    return posterior_A, posterior_B
posterior = bayes_theorem(0.4, 0.6, 0.7, 0.2)
print(f"Posterior Probabilities: A={posterior[0]}, B={posterior[1]}")

# 12. Chi-square test for independence
import pandas as pd
from scipy.stats import chi2_contingency
data = pd.DataFrame([[10, 20], [20, 40]])
chi2, p, dof, expected = chi2_contingency(data)
print(f"Chi-square Test: chi2={chi2}, p={p}")

# 13. Expected frequencies
observed = np.array([[30, 10], [20, 40]])
_, _, _, expected = chi2_contingency(observed)
print("Expected Frequencies:\n", expected)

# 14. Goodness-of-fit test
observed = np.array([18, 22, 20, 25, 15])
expected = np.array([20] * 5)
from scipy.stats import chisquare
chi_stat, p_val = chisquare(f_obs=observed, f_exp=expected)
print(f"Goodness-of-fit: chi2={chi_stat}, p={p_val}")
