THEORY QUESTIONS


1️⃣ What is hypothesis testing in statistics?

Hypothesis testing is a statistical method used to make decisions about a population based on sample data. It involves formulating a null hypothesis and an alternative hypothesis, then using sample data to determine whether to reject the null hypothesis.



2️⃣ What is the null hypothesis, and how does it differ from the alternative hypothesis?

Null hypothesis (H₀): Assumes no effect or no difference; it's the default or status quo.

Alternative hypothesis (H₁ or Ha): States that an effect or difference exists; it is the claim the test seeks evidence for.



3️⃣ What is the significance level in hypothesis testing, and why is it important?

The significance level (α) is the probability of rejecting the null hypothesis when it is true (Type I error). Common values are 0.05 or 0.01. It defines the threshold for statistical significance.



4️⃣ What does a P-value represent in hypothesis testing?

The P-value is the probability of obtaining a test statistic at least as extreme as the one observed, assuming the null hypothesis is true.


5️⃣ How do you interpret the P-value in hypothesis testing?

P-value ≤ α: Reject the null hypothesis (evidence for alternative hypothesis).

P-value > α: Fail to reject the null hypothesis (insufficient evidence).


6️⃣ What are Type 1 and Type 2 errors in hypothesis testing?

Type 1 error: Rejecting H₀ when H₀ is true (false positive).

Type 2 error: Failing to reject H₀ when H₀ is false (false negative).
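
Both error rates can be estimated by simulation. The sketch below is illustrative only: the population values (null mean 50, σ = 10, n = 30), the alternative mean of 55, and the helper name rejection_rate are assumptions chosen for the example, not part of the question.

```python
import random
from statistics import NormalDist, fmean

def rejection_rate(true_mean, null_mean=50.0, sigma=10.0, n=30,
                   alpha=0.05, trials=5000, seed=0):
    """Fraction of simulated two-sided Z-tests that reject H0: mean == null_mean."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    std_err = sigma / n ** 0.5
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        z = (fmean(sample) - null_mean) / std_err
        if abs(z) > z_crit:
            rejections += 1
    return rejections / trials

# H0 is true: every rejection is a Type I error, so the rate should be near alpha.
type_1_rate = rejection_rate(true_mean=50.0)

# H0 is false (assumed true mean of 55): every non-rejection is a Type II error.
type_2_rate = 1 - rejection_rate(true_mean=55.0)

print("Estimated Type I error rate:", type_1_rate)
print("Estimated Type II error rate:", type_2_rate)
```

The Type I rate is controlled by α, while the Type II rate depends on the true effect size, σ, and n.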



7️⃣ What is the difference between a one-tailed and a two-tailed test in hypothesis testing?

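One-tailed test: Tests for deviation in one specified direction (e.g., H₁: μ > μ₀), so the rejection region lies in a single tail.

Two-tailed test: Tests for deviation in either direction (H₁: μ ≠ μ₀), so α is split between both tails.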
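
The choice of tail changes how a Z-statistic is converted to a P-value. A minimal sketch, using only the standard library; the function name z_p_value and the labels "greater", "less", and "two-sided" are illustrative assumptions.

```python
from statistics import NormalDist

def z_p_value(z, alternative="two-sided"):
    """P-value of a Z-statistic under the standard normal distribution."""
    std_normal = NormalDist()
    if alternative == "greater":      # H1: mu > mu0 (upper tail)
        return 1 - std_normal.cdf(z)
    if alternative == "less":         # H1: mu < mu0 (lower tail)
        return std_normal.cdf(z)
    if alternative == "two-sided":    # H1: mu != mu0 (both tails)
        return 2 * (1 - std_normal.cdf(abs(z)))
    raise ValueError("alternative must be 'greater', 'less', or 'two-sided'")

z = 1.96
for alt in ("greater", "less", "two-sided"):
    print(alt, round(z_p_value(z, alt), 4))
```

For the same z, the two-sided P-value is twice the smaller one-sided P-value.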

8️⃣ What is the Z-test, and when is it used in hypothesis testing?

Z-test is used when:

The population variance is known.

The sample size is large (n > 30), so the sample mean is approximately normally distributed.

It tests the mean of a sample against the population mean.

9️⃣ How do you calculate the Z-score, and what does it represent in hypothesis testing?

Z = (X̄ - μ) / (σ / √n)
It represents how many standard errors the sample mean is from the population mean.



🔟 What is the T-distribution, and when should it be used instead of the normal distribution?

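T-distribution is used when:

The sample size is small (n < 30).

The population variance is unknown and must be estimated from the sample.

It has heavier tails than the normal distribution and approaches it as the degrees of freedom (n - 1) increase.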

1️⃣1️⃣ What is the difference between a Z-test and a T-test?

Z-test: Known population variance, large sample.

T-test: Unknown population variance, small sample.



1️⃣2️⃣ What is the T-test, and how is it used in hypothesis testing?

A T-test compares the sample mean to a population mean or compares two sample means to see if they differ significantly.
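
As a rough illustration of the one-sample case, the sketch below computes the T-statistic by hand and compares it with scipy.stats.ttest_1samp. The eight measurements and the hypothesized mean of 50 are made-up values for the example.

```python
import math
from statistics import fmean, stdev
from scipy.stats import ttest_1samp

sample = [51, 49, 52, 53, 50, 48, 54, 52]  # made-up measurements
mu0 = 50                                   # hypothesized population mean

# t = (sample mean - mu0) / (sample std / sqrt(n)), with n - 1 degrees of freedom
t_manual = (fmean(sample) - mu0) / (stdev(sample) / math.sqrt(len(sample)))

result = ttest_1samp(sample, popmean=mu0)  # two-sided by default
t_stat, p_value = result.statistic, result.pvalue

print("Manual t:", t_manual)
print("SciPy t:", t_stat, "P-value:", p_value)
```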



1️⃣3️⃣ What is the relationship between Z-test and T-test in hypothesis testing?

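Both test means, but the T-test adjusts for small samples and unknown variance, while the Z-test assumes known variance and large samples. As the sample size grows, the t-distribution converges to the normal distribution, so the two tests give nearly identical results.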
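
The relationship shows up in the critical values. A small sketch, assuming a two-sided test at α = 0.05; the degrees of freedom listed are arbitrary choices for illustration.

```python
from scipy.stats import norm, t

z_crit = norm.ppf(0.975)  # two-sided 5% critical value for the Z-test
print(f"z critical value: {z_crit:.4f}")

# The t critical value is larger for small samples and shrinks toward z.
for df in (5, 10, 30, 100, 1000):
    t_crit = t.ppf(0.975, df)
    print(f"df = {df:4d}  t critical value: {t_crit:.4f}")
```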



1️⃣4️⃣ What is a confidence interval, and how is it used to interpret statistical results?

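A confidence interval is a range of values, computed from sample data, that is expected to contain the true population parameter with a stated level of confidence (e.g., 95%). A 95% level means that about 95% of intervals built this way from repeated samples would contain the true value.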
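
The repeated-sampling reading of the confidence level can be checked by simulation. The sketch below assumes a normal population with mean 50 and standard deviation 10 and samples of size 25; these values are arbitrary.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
true_mean, true_std, n, repeats = 50.0, 10.0, 25, 2000

# Each row is one simulated sample of size n.
samples = rng.normal(true_mean, true_std, size=(repeats, n))
means = samples.mean(axis=1)
sems = samples.std(axis=1, ddof=1) / np.sqrt(n)

# 95% t-based interval for each sample: mean +/- t_crit * standard error
t_crit = stats.t.ppf(0.975, df=n - 1)
lower = means - t_crit * sems
upper = means + t_crit * sems

coverage = np.mean((lower <= true_mean) & (true_mean <= upper))
print("Fraction of intervals containing the true mean:", coverage)
```

The printed fraction should land close to 0.95, which is what the confidence level refers to.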



1️⃣5️⃣ What is the margin of error, and how does it affect the confidence interval?

The margin of error is the half-width of the confidence interval (estimate ± margin of error). Larger margin → wider confidence interval → less precision.
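
A short sketch of how sample size and confidence level drive the margin of error. It assumes a known population standard deviation (so a z critical value applies); the function name margin_of_error and the numbers are illustrative.

```python
import math
from statistics import NormalDist

def margin_of_error(sigma, n, confidence=0.95):
    """z-based margin of error for a mean with known population std sigma."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * sigma / math.sqrt(n)

for n in (25, 100, 400):
    print(f"n = {n:3d}  margin of error: {margin_of_error(10, n):.3f}")

print("99% level, n = 100:", round(margin_of_error(10, 100, confidence=0.99), 3))
```

Quadrupling the sample size halves the margin, and raising the confidence level widens it.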



1️⃣6️⃣ How is Bayes' Theorem used in statistics, and what is its significance?

Bayes’ Theorem updates the probability of a hypothesis based on new evidence. It's important for incorporating prior knowledge in decision-making.
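
A worked example may help. The sketch below assumes a diagnostic-test scenario with a 1% prior, a 99% true-positive rate, and a 5% false-positive rate; the scenario and the function name posterior are illustrative assumptions. The evidence term P(B) is obtained from the law of total probability.

```python
def posterior(prior, likelihood, false_positive_rate):
    """P(A | B) from P(A), P(B | A), and P(B | not A)."""
    # Law of total probability: P(B) = P(B|A) P(A) + P(B|not A) P(not A)
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    return likelihood * prior / evidence

p = posterior(prior=0.01, likelihood=0.99, false_positive_rate=0.05)
print("P(condition | positive test):", p)
```

Even with an accurate test, the low prior keeps the posterior well below the 99% true-positive rate.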


1️⃣7️⃣ What is the Chi-square distribution, and when is it used?

Chi-square distribution is used for categorical data tests (e.g., goodness of fit, independence) and variance tests.


1️⃣8️⃣ What is the Chi-square goodness of fit test, and how is it applied?

It checks whether observed frequencies match expected frequencies for a categorical variable.


1️⃣9️⃣ What is the F-distribution, and when is it used in hypothesis testing?

F-distribution is used to compare variances (e.g., ANOVA, regression model significance).


2️⃣0️⃣ What is an ANOVA test, and what are its assumptions?

ANOVA (Analysis of Variance) tests whether means of multiple groups are equal.

Assumptions: Independent observations, normally distributed data within each group, and equal variances across groups.
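
To make the test concrete, the sketch below computes the one-way ANOVA F-statistic by hand and checks it against scipy.stats.f_oneway. The three groups are made-up data.

```python
import numpy as np
from scipy.stats import f_oneway

# Made-up measurements for three groups
groups = [
    np.array([20, 21, 19, 22, 20]),
    np.array([25, 27, 26, 24, 26]),
    np.array([30, 29, 31, 32, 30]),
]

k = len(groups)
n_total = sum(len(g) for g in groups)
grand_mean = np.mean(np.concatenate(groups))

# Between-group and within-group sums of squares
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

# F = mean square between / mean square within
f_manual = (ss_between / (k - 1)) / (ss_within / (n_total - k))

f_stat, p_value = f_oneway(*groups)
print("Manual F:", f_manual)
print("SciPy F:", f_stat, "P-value:", p_value)
```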



2️⃣1️⃣ What are the different types of ANOVA tests?


One-way ANOVA

Two-way ANOVA

Repeated measures ANOVA

2️⃣2️⃣ What is the F-test, and how does it relate to hypothesis testing?

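The F-test uses the ratio of two variance estimates as its test statistic. It is used to compare two variances directly or, in ANOVA, to compare group means by checking whether the between-group variance is large relative to the within-group variance.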
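
A minimal sketch of the two-variance case, assuming both samples come from normal populations (this F-test is sensitive to non-normality). The simulated data, the seed, and the two-sided P-value construction are choices made for the example.

```python
import numpy as np
from scipy.stats import f

rng = np.random.default_rng(1)
a = rng.normal(0, 1, 40)  # simulated sample with std 1
b = rng.normal(0, 3, 40)  # simulated sample with std 3

var_a = a.var(ddof=1)
var_b = b.var(ddof=1)

# Test statistic: ratio of the two sample variances
f_stat = var_b / var_a
df_num, df_den = len(b) - 1, len(a) - 1

# Two-sided P-value: double the smaller tail probability
p_value = 2 * min(f.cdf(f_stat, df_num, df_den), f.sf(f_stat, df_num, df_den))
print("F-statistic:", f_stat, "P-value:", p_value)
```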


PRACTICAL PART 1

In [None]:
1️⃣ Generate a random variable and display its value

import random
value = random.random()
print("Random value:", value)


In [None]:
2️⃣ Discrete uniform distribution and PMF

import matplotlib.pyplot as plt
import numpy as np

values = np.arange(1, 7)
pmf = np.ones_like(values) / len(values)

plt.stem(values, pmf)  # use_line_collection was removed in recent Matplotlib releases
plt.xlabel('Value')
plt.ylabel('PMF')
plt.title('Discrete Uniform Distribution PMF')
plt.show()


In [None]:
3️⃣ PMF of a Bernoulli distribution

from scipy.stats import bernoulli
import matplotlib.pyplot as plt

p = 0.5
x = [0, 1]
pmf = bernoulli.pmf(x, p)  # Bernoulli is discrete, so it has a PMF rather than a PDF

plt.stem(x, pmf)
plt.xlabel('x')
plt.ylabel('PMF')
plt.title('Bernoulli Distribution PMF')
plt.show()


In [None]:
4️⃣ Binomial distribution with n=10, p=0.5

from scipy.stats import binom
import matplotlib.pyplot as plt

n, p = 10, 0.5
x = np.arange(0, n+1)
pmf = binom.pmf(x, n, p)

plt.bar(x, pmf)
plt.xlabel('Number of Successes')
plt.ylabel('Probability')
plt.title('Binomial Distribution PMF')
plt.show()


In [None]:
5️⃣ Poisson distribution visualization

from scipy.stats import poisson

mu = 3
x = np.arange(0, 10)
pmf = poisson.pmf(x, mu)

plt.stem(x, pmf)  # use_line_collection was removed in recent Matplotlib releases
plt.xlabel('x')
plt.ylabel('PMF')
plt.title('Poisson Distribution')
plt.show()


In [None]:
6️⃣ CDF of a discrete uniform distribution

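# Rebuild the discrete uniform PMF; pmf and x currently hold the Poisson values from item 5
x = np.arange(1, 7)
pmf = np.ones(len(x)) / len(x)
cdf = np.cumsum(pmf)
plt.step(x, cdf, where='post')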
plt.xlabel('x')
plt.ylabel('CDF')
plt.title('CDF of Discrete Uniform Distribution')
plt.show()


In [None]:
7️⃣ Continuous uniform distribution

from scipy.stats import uniform

x = np.linspace(0, 1, 100)
pdf = uniform.pdf(x, 0, 1)

plt.plot(x, pdf)
plt.title('Continuous Uniform Distribution')
plt.xlabel('x')
plt.ylabel('PDF')
plt.show()



In [None]:
8️⃣ Simulate normal distribution + histogram

samples = np.random.normal(loc=0, scale=1, size=1000)
plt.hist(samples, bins=30, density=True, alpha=0.6, color='g')
plt.title('Histogram of Normal Distribution')
plt.show()


In [None]:
9️⃣ Calculate Z-scores and plot

from scipy.stats import zscore

data = np.random.normal(100, 15, 100)
zs = zscore(data)

plt.hist(zs, bins=30)
plt.title('Z-scores Histogram')
plt.show()


In [None]:
🔟 Implement Central Limit Theorem

sample_means = []
for _ in range(1000):
    sample = np.random.exponential(scale=2, size=30)
    sample_means.append(np.mean(sample))

plt.hist(sample_means, bins=30, density=True)
plt.title('CLT: Sample Means of Exponential Distribution')
plt.show()


In [None]:
1️⃣1️⃣ Simulate multiple samples from a normal distribution and verify CLT

import numpy as np
import matplotlib.pyplot as plt

means = [np.mean(np.random.normal(0, 1, 30)) for _ in range(1000)]
plt.hist(means, bins=30, density=True, alpha=0.7)
plt.title('Central Limit Theorem Verification')
plt.show()


In [None]:
1️⃣2️⃣ Standard normal distribution plot

from scipy.stats import norm

x = np.linspace(-4, 4, 100)
pdf = norm.pdf(x, 0, 1)

plt.plot(x, pdf)
plt.title('Standard Normal Distribution (mean=0, std=1)')
plt.show()


In [None]:
1️⃣3️⃣ Binomial distribution probabilities

from scipy.stats import binom

n, p = 10, 0.5
x = np.arange(0, n+1)
pmf = binom.pmf(x, n, p)

plt.bar(x, pmf)
plt.title('Binomial Distribution Probabilities')
plt.show()


In [None]:
1️⃣4️⃣ Z-score calculation for a data point

def calc_z_score(x, mean, std):
    return (x - mean) / std

z = calc_z_score(70, 50, 10)
print("Z-score:", z)


In [None]:
1️⃣5️⃣ Hypothesis testing using Z-statistics

from scipy.stats import norm

sample_mean = 52
pop_mean = 50
pop_std = 10
n = 30

z = (sample_mean - pop_mean) / (pop_std / np.sqrt(n))
p_value = norm.sf(z)  # upper-tail (one-sided) P-value, P(Z >= z); same as 1 - norm.cdf(z)
print("Z-statistic:", z, "P-value:", p_value)


In [None]:
1️⃣6️⃣ Confidence interval

import scipy.stats as st

data = np.random.normal(50, 10, 100)
conf_int = st.t.interval(0.95, len(data)-1, loc=np.mean(data), scale=st.sem(data))
print("95% Confidence Interval:", conf_int)


In [None]:
1️⃣7️⃣ Generate normal data + CI

data = np.random.normal(100, 15, 100)
mean = np.mean(data)
sem = st.sem(data)
ci = st.t.interval(0.95, len(data)-1, loc=mean, scale=sem)
print("Mean:", mean, "95% CI:", ci)


In [None]:
1️⃣8️⃣ PDF of normal distribution

x = np.linspace(60, 140, 100)
pdf = norm.pdf(x, 100, 15)

plt.plot(x, pdf)
plt.title('Normal Distribution PDF')
plt.show()


In [None]:
1️⃣9️⃣ CDF of Poisson distribution

from scipy.stats import poisson

x = np.arange(0, 10)
cdf = poisson.cdf(x, mu=3)

plt.step(x, cdf, where='post')
plt.title('Poisson CDF')
plt.show()


In [None]:
2️⃣0️⃣ Continuous uniform random variable + expected value

data = np.random.uniform(0, 10, 1000)
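sample_mean = np.mean(data)  # the sample mean estimates the expected value
expected = (0 + 10) / 2      # theoretical expected value of Uniform(a, b) is (a + b) / 2
print("Sample mean:", sample_mean, "Theoretical expected value:", expected)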


In [None]:
2️⃣1️⃣ Compare standard deviations of two datasets

data1 = np.random.normal(0, 1, 100)
data2 = np.random.normal(0, 2, 100)

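# ddof=1 gives the sample standard deviation (np.std defaults to the population formula)
print("Std dev data1:", np.std(data1, ddof=1))
print("Std dev data2:", np.std(data2, ddof=1))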


In [None]:
2️⃣2️⃣ Range & IQR

data = np.random.normal(0, 1, 100)
range_ = np.max(data) - np.min(data)
iqr = np.percentile(data, 75) - np.percentile(data, 25)
print("Range:", range_, "IQR:", iqr)


In [None]:
2️⃣3️⃣ Z-score normalization + visualize

z_scores = zscore(data)
plt.hist(z_scores, bins=30)
plt.title('Z-score Normalized Data')
plt.show()


In [None]:
2️⃣4️⃣ Skewness & kurtosis

from scipy.stats import skew, kurtosis

print("Skewness:", skew(data))
print("Excess kurtosis:", kurtosis(data))  # Fisher definition: a normal distribution gives 0


PRACTICAL PART 2

In [None]:
1️⃣ Perform a Z-test comparing sample mean to known population mean

from scipy.stats import norm
import numpy as np

sample = np.random.normal(50, 10, 30)
pop_mean = 50
pop_std = 10

z = (np.mean(sample) - pop_mean) / (pop_std / np.sqrt(len(sample)))
p_value = 2 * (1 - norm.cdf(abs(z)))

print("Z-statistic:", z)
print("P-value:", p_value)


In [None]:
2️⃣ Simulate data for hypothesis testing and compute P-value

sample = np.random.normal(50, 10, 30)
z = (np.mean(sample) - pop_mean) / (pop_std / np.sqrt(len(sample)))
p_value = 1 - norm.cdf(z)
print("Z-statistic:", z, "P-value:", p_value)


In [None]:
3️⃣ One-sample Z-test

Same as item 1: the one-sample Z-test compares the sample mean with the known population mean, using the known population standard deviation.

In [None]:
4️⃣ Two-tailed Z-test with visualization

import matplotlib.pyplot as plt

x = np.linspace(-4, 4, 1000)
plt.plot(x, norm.pdf(x))
plt.axvline(z, color='r', linestyle='--', label=f'z = {z:.2f}')
plt.axvline(-z, color='r', linestyle='--')
plt.fill_between(x, norm.pdf(x), where=(x < -abs(z)) | (x > abs(z)), color='red', alpha=0.3)
plt.legend()
plt.title('Two-tailed Z-test decision regions')
plt.show()


In [None]:
5️⃣ Visualize Type I and Type II errors

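A standard illustration overlays the sampling distributions of the test statistic under H₀ and H₁: the area of the H₀ curve beyond the critical value is α (Type I error), and the area of the H₁ curve short of the critical value is β (Type II error).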
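
A minimal sketch of such a plot for item 5, assuming a one-sided (upper-tail) Z-test at α = 0.05 and a true mean lying 3 standard errors above the null value. The effect size is an arbitrary choice for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

alpha = 0.05
effect = 3.0                          # assumed true mean under H1, in standard-error units
z_crit = norm.ppf(1 - alpha)          # upper-tail critical value
beta = norm.cdf(z_crit, loc=effect)   # P(fail to reject H0 | H1 is true)

x = np.linspace(-4, 7, 1000)
h0_pdf = norm.pdf(x, loc=0)           # sampling distribution under H0
h1_pdf = norm.pdf(x, loc=effect)      # sampling distribution under H1

plt.plot(x, h0_pdf, label='H0 sampling distribution')
plt.plot(x, h1_pdf, label='H1 sampling distribution')

# Type I error: H0 is true but the statistic falls in the rejection region.
plt.fill_between(x, h0_pdf, where=(x >= z_crit), color='red', alpha=0.4,
                 label=f'Type I error (alpha = {alpha:.2f})')
# Type II error: H1 is true but the statistic falls short of the critical value.
plt.fill_between(x, h1_pdf, where=(x < z_crit), color='blue', alpha=0.3,
                 label=f'Type II error (beta = {beta:.3f})')

plt.axvline(z_crit, color='k', linestyle='--', label=f'critical value = {z_crit:.2f}')
plt.xlabel('Test statistic (z)')
plt.ylabel('Density')
plt.title('Type I and Type II errors')
plt.legend()
plt.show()
```

Moving the critical value to the right shrinks α but enlarges β, which is the trade-off the plot is meant to show.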


In [None]:
6️⃣ Independent T-test

from scipy.stats import ttest_ind

group1 = np.random.normal(100, 15, 50)
group2 = np.random.normal(102, 15, 50)

t_stat, p_val = ttest_ind(group1, group2)
print("T-statistic:", t_stat, "P-value:", p_val)


In [None]:
7️⃣ Paired sample T-test

from scipy.stats import ttest_rel

before = np.random.normal(100, 10, 30)
after = before + np.random.normal(1, 5, 30)

t_stat, p_val = ttest_rel(before, after)
print("Paired T-statistic:", t_stat, "P-value:", p_val)


In [None]:
8️⃣ Compare Z-test and T-test

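# Z-test (assuming the population std of 15 is known for both groups)
z = (np.mean(group1) - np.mean(group2)) / np.sqrt((15**2/50)+(15**2/50))
print("Z-stat:", z)

# T-test on the same groups (recomputed: t_stat was overwritten by the paired test in item 7)
t_stat, p_val = ttest_ind(group1, group2)
print("T-stat:", t_stat)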


In [None]:
9️⃣ Confidence interval for sample mean

import scipy.stats as st
data = np.random.normal(50, 10, 30)
conf_int = st.t.interval(0.95, len(data)-1, loc=np.mean(data), scale=st.sem(data))
print("95% Confidence Interval:", conf_int)


In [None]:
🔟 Margin of error

moe = st.t.ppf(0.975, len(data)-1) * st.sem(data)
print("Margin of Error:", moe)


In [None]:
1️⃣1️⃣ Bayesian inference using Bayes' theorem

def bayes(p_a, p_b_given_a, p_b):
    return (p_b_given_a * p_a) / p_b

posterior = bayes(0.01, 0.99, 0.05)
print("Posterior probability:", posterior)


In [None]:
1️⃣2️⃣ Chi-square test for independence

from scipy.stats import chi2_contingency

# 2x2 contingency table; the rows are exactly proportional, so chi2 is 0 and the P-value is 1
data = [[10, 20], [20, 40]]
chi2, p, dof, expected = chi2_contingency(data)
print("Chi2 statistic:", chi2, "P-value:", p)


In [None]:
1️⃣3️⃣ Expected frequencies for Chi-square

print("Expected frequencies:\n", expected)


In [None]:
1️⃣4️⃣ Goodness-of-fit test

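observed = [16, 18, 16, 14, 12, 12]
# Expected counts must sum to the observed total (88); recent SciPy versions raise a ValueError otherwise
expected = [sum(observed) / len(observed)] * len(observed)
chi2, p = st.chisquare(f_obs=observed, f_exp=expected)
print("Chi2:", chi2, "P-value:", p)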
