Question 1: What is hypothesis testing in statistics?

Answer:
Hypothesis testing is a statistical method used to make decisions or draw conclusions about a population based on sample data. It helps determine whether there is enough evidence to support a certain belief or claim (called a hypothesis). The process involves comparing sample data with what we would expect if a specific claim about the population were true.

Question 2: What is the null hypothesis, and how does it differ from the alternative hypothesis?

Answer:

The null hypothesis (H₀) is a statement that there is no effect, no difference, or that a parameter equals a specific value. It is the default assumption.

The alternative hypothesis (H₁ or Ha) is a statement that contradicts the null hypothesis, suggesting that there is an effect or difference.

Difference:
H₀ represents the status quo and is assumed true until the data provide enough evidence against it, while H₁ represents the claim the researcher is looking for evidence to support.

Question 3: Explain the significance level in hypothesis testing and its role in deciding the outcome of a test.

Answer:
The significance level (α) is the probability of rejecting the null hypothesis when it is actually true, that is, the probability of a Type I error.
Common values are 0.05, 0.01, and 0.10.

Role:

If the p-value ≤ α → reject H₀

If the p-value > α → fail to reject H₀

It sets the threshold for how strong the evidence must be to reject the null hypothesis.
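
The decision rule above can be written as a small helper. This is a minimal sketch; the function name `decide` and the example p-value of 0.03 are illustrative assumptions, not part of the original notes.

```python
def decide(p_value, alpha=0.05):
    """Apply the decision rule: reject H0 when the p-value is <= alpha."""
    if not 0 < alpha < 1:
        raise ValueError("alpha must be between 0 and 1")
    return "reject H0" if p_value <= alpha else "fail to reject H0"

# The same p-value can lead to different decisions at different alpha levels.
for alpha in (0.10, 0.05, 0.01):
    print(f"alpha = {alpha:.2f}: {decide(0.03, alpha)}")
```

A smaller α demands stronger evidence, so a p-value of 0.03 is enough to reject H₀ at α = 0.05 but not at α = 0.01.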

Question 4: What are Type I and Type II errors? Give examples of each.

Answer:

Type I Error (False Positive): Rejecting H₀ when it is true.
Example: Concluding that a new medicine works when it actually does not.

Type II Error (False Negative): Failing to reject H₀ when it is false.
Example: Concluding that a medicine does not work when it actually does.
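
Both error rates can be estimated by simulation. The sketch below assumes a two-sided Z-test with known σ = 1, samples of size 30, and a hypothetical true mean of 50.3 for the case where H₀ is false; all of these values are illustrative choices.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable
alpha, sigma, n, trials = 0.05, 1.0, 30, 5000
z_crit = norm.ppf(1 - alpha / 2)  # two-sided critical value

def rejection_rate(true_mean, mu0=50.0):
    """Fraction of simulated samples in which the Z-test rejects H0: mu = mu0."""
    samples = rng.normal(true_mean, sigma, size=(trials, n))
    z = (samples.mean(axis=1) - mu0) / (sigma / np.sqrt(n))
    return np.mean(np.abs(z) > z_crit)

# H0 is true here, so every rejection is a Type I error.
type1_rate = rejection_rate(true_mean=50.0)
# H0 is false here, so every non-rejection is a Type II error.
type2_rate = 1 - rejection_rate(true_mean=50.3)

print("Estimated Type I error rate:", type1_rate)
print("Estimated Type II error rate:", type2_rate)
```

The estimated Type I error rate should land close to α, while the Type II error rate depends on how far the true mean is from the hypothesized value and on the sample size.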

Question 5: Difference between Z-test and T-test — When to use each?

Answer:

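Feature	Z-test	T-test
Population standard deviation (σ)	Known	Unknown (estimated by the sample standard deviation s)
Sample size	Typically large (n ≥ 30, rule of thumb)	Typically small (n < 30), but valid for any n
Reference distribution	Standard normal	t-distribution with n − 1 degrees of freedom
Use case	Compare sample mean to population mean	Same, but when σ is not known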

Use Z-test when:

Population standard deviation is known

Sample size is large

Use T-test when:

Population standard deviation is unknown

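Sample size is small (with σ unknown, the t-test also remains valid for large samples, where it gives nearly the same result as the Z-test)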
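
The practical difference between the two tests shows up in their critical values. A minimal sketch, assuming a two-sided test at α = 0.05; the sample sizes are arbitrary illustrations.

```python
from scipy.stats import norm, t

alpha = 0.05
z_crit = norm.ppf(1 - alpha / 2)  # standard normal critical value
print(f"z critical value: {z_crit:.3f}")

# The t critical value depends on the degrees of freedom (n - 1).
for n in (5, 10, 30, 100):
    t_crit = t.ppf(1 - alpha / 2, df=n - 1)
    print(f"n = {n:3d}: t critical value = {t_crit:.3f}")
```

The t critical value is larger than the z value for small samples and approaches it as n grows, which is why n ≥ 30 is commonly used as a rule of thumb rather than a strict cutoff.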



Question 6: Python program – Generate a binomial distribution and plot histogram

import numpy as np
import matplotlib.pyplot as plt

# Parameters
n = 10
p = 0.5

# Generate 1000 random binomial numbers
data = np.random.binomial(n, p, 1000)

print("First 20 generated values:")
print(data[:20])

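# Plot histogram with one bin centered on each integer outcome 0..n
# (bins=11 would spread bins over the observed min..max, not 0..n)
plt.hist(data, bins=np.arange(-0.5, n + 1.5, 1), edgecolor="black")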
plt.xlabel("Number of Successes")
plt.ylabel("Frequency")
plt.title("Binomial Distribution (n=10, p=0.5)")
plt.show()


Sample Output (values vary from run to run):

First 20 generated values:
[6 4 5 5 3 7 4 5 6 5 4 6 5 6 5 6 4 6 3 5]
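
As a check on the simulation, the observed frequencies can be compared with the theoretical binomial probabilities. This is a sketch, assuming `scipy.stats.binom` is available and using a fixed seed (an illustrative choice) so the run is repeatable.

```python
import numpy as np
from scipy.stats import binom

n, p, size = 10, 0.5, 1000
rng = np.random.default_rng(1)  # seeded generator for repeatable draws
data = rng.binomial(n, p, size)

# Observed proportion of each outcome k = 0..n
counts = np.bincount(data, minlength=n + 1)
empirical = counts / size

# Theoretical probability P(X = k) for each k = 0..n
theoretical = binom.pmf(np.arange(n + 1), n, p)

for k in range(n + 1):
    print(f"k = {k:2d}  empirical = {empirical[k]:.3f}  theoretical = {theoretical[k]:.3f}")
```

With 1000 draws the two columns should agree closely, with the largest probabilities around k = 5.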

Question 7: Hypothesis testing using Z-statistics in Python

Test whether the population mean is µ = 50 using the given sample (H₀: µ = 50 versus H₁: µ ≠ 50, a two-sided test).

import numpy as np
from scipy.stats import norm

sample_data = [49.1, 50.2, 51.0, 48.7, 50.5, 49.8, 50.3, 50.7, 50.2, 49.6,
               50.1, 49.9, 50.8, 50.4, 48.9, 50.6, 50.0, 49.7, 50.2, 49.5,
               50.1, 50.3, 50.4, 50.5, 50.0, 50.7, 49.3, 49.8, 50.2, 50.9,
               50.3, 50.4, 50.0, 49.7, 50.5, 49.9]

data = np.array(sample_data)

# Known population standard deviation (assume σ = 1 for Z-test)
sigma = 1
mu0 = 50

sample_mean = np.mean(data)
n = len(data)

# Z-statistic
Z = (sample_mean - mu0) / (sigma / np.sqrt(n))

# Two-sided p-value; norm.sf(x) equals 1 - norm.cdf(x) but is more accurate in the tails
p_value = 2 * norm.sf(abs(Z))

print("Sample Mean:", sample_mean)
print("Z-statistic:", Z)
print("p-value:", p_value)

Sample Output (rounded to three decimals):

Sample Mean: 50.089
Z-statistic: 0.533
p-value: 0.594

Interpretation:

Since p-value > 0.05, we fail to reject H₀.
There is not enough evidence to say the population mean is different from 50.
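
The Z-test above relies on an assumed σ = 1. When σ is not actually known, a one-sample t-test estimates the spread from the data instead. A sketch on the same sample, assuming `scipy.stats.ttest_1samp` is available:

```python
import numpy as np
from scipy import stats

data = np.array([49.1, 50.2, 51.0, 48.7, 50.5, 49.8, 50.3, 50.7, 50.2, 49.6,
                 50.1, 49.9, 50.8, 50.4, 48.9, 50.6, 50.0, 49.7, 50.2, 49.5,
                 50.1, 50.3, 50.4, 50.5, 50.0, 50.7, 49.3, 49.8, 50.2, 50.9,
                 50.3, 50.4, 50.0, 49.7, 50.5, 49.9])
mu0 = 50

# Manual t-statistic: same form as the Z-statistic, but with the sample
# standard deviation (ddof=1) in place of an assumed sigma.
s = data.std(ddof=1)
t_manual = (data.mean() - mu0) / (s / np.sqrt(data.size))

# SciPy's one-sample t-test (two-sided by default).
result = stats.ttest_1samp(data, popmean=mu0)

print("Sample standard deviation:", round(s, 3))
print("t-statistic (manual):", round(t_manual, 3))
print("t-statistic (SciPy):", round(result.statistic, 3))
print("p-value:", round(result.pvalue, 3))
```

The p-value here is also above 0.05, so the conclusion (fail to reject H₀) does not depend on the assumed value of σ.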

Question 8: Simulate normal distribution, compute 95% confidence interval, and plot

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

# Generate normal data
data = np.random.normal(loc=50, scale=5, size=500)

mean = np.mean(data)
std = np.std(data, ddof=1)

# 95% CI: critical value from the standard normal distribution (about 1.96)
z = norm.ppf(0.975)
lower = mean - z * std / np.sqrt(len(data))
upper = mean + z * std / np.sqrt(len(data))

print("Sample Mean:", mean)
print("95% Confidence Interval:", (lower, upper))

# Plot
plt.hist(data, bins=30)
plt.title("Normal Distribution Data")
plt.xlabel("Value")
plt.ylabel("Frequency")
plt.show()


Sample Output:

Sample Mean: 50.12
95% Confidence Interval: (49.67, 50.57)
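
The same calculation generalizes to other confidence levels by deriving the critical value from the level instead of hard-coding it. A minimal sketch; the helper name `mean_confidence_interval` and the seed are illustrative assumptions.

```python
import numpy as np
from scipy.stats import norm

def mean_confidence_interval(data, confidence=0.95):
    """Normal-approximation confidence interval for the mean."""
    data = np.asarray(data, dtype=float)
    z = norm.ppf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    margin = z * data.std(ddof=1) / np.sqrt(data.size)
    return data.mean() - margin, data.mean() + margin

rng = np.random.default_rng(42)  # seeded so the run is repeatable
sample = rng.normal(loc=50, scale=5, size=500)

for level in (0.90, 0.95, 0.99):
    lower, upper = mean_confidence_interval(sample, level)
    print(f"{level:.0%} CI: ({lower:.2f}, {upper:.2f})")
```

A higher confidence level gives a wider interval, since more of the sampling distribution has to be covered.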

Question 9: Python program to calculate Z-scores and plot histogram

import numpy as np
import matplotlib.pyplot as plt

# Sample dataset: 100 random integers from 40 to 59 (the upper bound 60 is exclusive)
data = np.random.randint(40, 60, 100)

# Compute Z-scores
mean = np.mean(data)
std = np.std(data)  # default ddof=0: population standard deviation of this dataset
z_scores = (data - mean) / std

print("First 20 Z-scores:")
print(z_scores[:20])

# Plot histogram of Z-scores
plt.hist(z_scores, bins=20)
plt.title("Z-score Standardized Data")
plt.xlabel("Z-score")
plt.ylabel("Frequency")
plt.show()


Sample Output:

First 20 Z-scores:
[-0.91 -0.56 -0.21  0.48  0.97 ...]

Question 9 Explanation:

Z-scores tell how many standard deviations each data point is from the mean:

Z = 0 → exactly at mean

Z = +1 → 1 SD above mean

Z = –2 → 2 SD below mean
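
The three cases above can be reproduced with a tiny worked example. The mean of 50, standard deviation of 5, and the three data values are assumed for illustration only.

```python
import numpy as np

mean, std = 50.0, 5.0                  # assumed values for illustration
values = np.array([50.0, 55.0, 40.0])  # at the mean, 1 SD above, 2 SD below

z_scores = (values - mean) / std

for value, z in zip(values, z_scores):
    print(f"value = {value:.0f} -> Z = {z:+.1f}")
```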