# ANSWERS TO THEORY QUESTIONS

Q1: What is hypothesis testing in statistics?

Answer:
Hypothesis testing is a statistical method for making decisions about a population parameter based on sample data. It involves:

1. Formulating a null hypothesis (H₀) and an alternative hypothesis (H₁).
2. Choosing a significance level (α).
3. Computing a test statistic (Z, T, χ², etc.) from the sample.
4. Comparing the test statistic with a critical value, or the p-value with α, to decide whether to reject or fail to reject H₀.

It is widely used in scientific research, business, and data analysis to check whether sample data support a claim.
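
The four steps can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed procedure: the sample values and the hypothesized mean of 100 are made up, and a one-sample t-statistic is assumed because the population variance is treated as unknown.

```python
import numpy as np
from scipy import stats

# Hypothetical measurements (illustrative values only)
sample = np.array([101.2, 98.7, 100.5, 102.1, 99.4,
                   100.9, 101.6, 98.9, 100.3, 101.8])

# Step 1: H0: mu = 100, H1: mu != 100 (two-sided)
mu0 = 100.0

# Step 2: significance level
alpha = 0.05

# Step 3: one-sample t-statistic (population variance unknown)
n = len(sample)
t_stat = (sample.mean() - mu0) / (sample.std(ddof=1) / np.sqrt(n))

# Step 4a: compare the statistic with the two-sided critical value
t_crit = stats.t.ppf(1 - alpha / 2, df=n - 1)
reject_by_critical_value = abs(t_stat) > t_crit

# Step 4b: equivalently, compare the p-value with alpha
p_value = 2 * stats.t.sf(abs(t_stat), df=n - 1)
reject_by_p_value = p_value < alpha

print("t-statistic:", t_stat)
print("critical value:", t_crit)
print("p-value:", p_value)
print("Reject H0?", bool(reject_by_p_value))
```

The critical-value rule and the p-value rule always lead to the same decision; they are two ways of expressing the same comparison.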


---

Q2: What is the null hypothesis, and how does it differ from the alternative hypothesis?

Answer:

Null Hypothesis (H₀): Assumes no effect, no difference, or status quo. Example: “The average weight of apples = 100 g.”

Alternative Hypothesis (H₁): Contradicts H₀, representing the claim to be tested. Example: “The average weight of apples ≠ 100 g.”


Difference: H₀ represents the default assumption, while H₁ suggests there is a significant effect or difference.
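
In symbols, writing μ for the population mean weight in grams (notation assumed here for illustration), the apple example is a two-sided test:

```latex
H_0:\ \mu = 100 \qquad \text{versus} \qquad H_1:\ \mu \neq 100
```

A one-sided alternative would instead be stated as H₁: μ > 100 or H₁: μ < 100, depending on the direction of the claim.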


---

Q3: Explain the significance level in hypothesis testing and its role in deciding the outcome of a test.

Answer:
The significance level (α) is the probability of rejecting the null hypothesis when it is actually true. Common choices are 0.05 (5%) or 0.01 (1%).

If p-value < α, reject H₀.

If p-value ≥ α, fail to reject H₀.


It acts as a threshold for decision-making in hypothesis testing.
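
One way to see what α means is to simulate many experiments in which H₀ is true and count how often it is rejected anyway. The sketch below assumes normally distributed data, a one-sample t-test, and arbitrary choices for the seed, sample size, and number of trials.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)  # fixed seed so the run is reproducible
alpha = 0.05
n_trials, n = 5000, 30

# Each row is one sample drawn with H0 true (true mean equals 0)
samples = rng.normal(loc=0.0, scale=1.0, size=(n_trials, n))

# Test H0: mu = 0 separately on every row
p_values = stats.ttest_1samp(samples, popmean=0.0, axis=1).pvalue

# Fraction of samples in which a true H0 was rejected
type1_rate = np.mean(p_values < alpha)
print(f"Fraction of false rejections: {type1_rate:.3f} (alpha = {alpha})")
```

The observed fraction should land close to α, which is exactly the error rate the threshold is designed to control.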


---

Q4: What are Type I and Type II errors? Give examples of each.

Answer:

Type I Error (False Positive): Rejecting H₀ when it is true.
Example: Concluding a new medicine works when it actually doesn’t.

Type II Error (False Negative): Failing to reject H₀ when H₁ is true.
Example: Concluding a medicine does not work when it actually does.
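
The Type II error rate depends on how large the true effect is and how much data is collected. The following sketch estimates it by simulation under assumed conditions: normal data with standard deviation 1, a true mean of 0.5 while H₀ claims 0, and a one-sample t-test. The helper name `type2_error_rate` is hypothetical.

```python
import numpy as np
from scipy import stats

def type2_error_rate(true_mean, n, alpha=0.05, n_trials=4000, seed=1):
    """Estimate how often H0: mu = 0 is NOT rejected when mu = true_mean."""
    rng = np.random.default_rng(seed)
    samples = rng.normal(loc=true_mean, scale=1.0, size=(n_trials, n))
    p_values = stats.ttest_1samp(samples, popmean=0.0, axis=1).pvalue
    return float(np.mean(p_values >= alpha))

# H0 is false in every run below, so each non-rejection is a Type II error
for n in (10, 30, 100):
    beta = type2_error_rate(true_mean=0.5, n=n)
    print(f"n = {n:3d}  Type II error rate = {beta:.3f}  power = {1 - beta:.3f}")
```

Larger samples make the false negative less likely, which is why sample size is usually planned before a study begins.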



---

Q5: What is the difference between a Z-test and a T-test? Explain when to use each.

Answer:

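Z-test: Used when the population variance is known, or as an approximation when the sample is large (a common rule of thumb is n > 30). Based on the standard normal distribution.

T-test: Used when the population variance is unknown and must be estimated from the sample, which matters most for small samples (n ≤ 30). Based on Student’s t-distribution, which approaches the standard normal distribution as n grows.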
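
To illustrate why the sample-size rule of thumb works, the sketch below compares two-sided 5% critical values from the t-distribution with the fixed value from the standard normal. The sample sizes are arbitrary illustrative choices.

```python
from scipy import stats

alpha = 0.05

# Two-sided critical value from the standard normal distribution
z_crit = stats.norm.ppf(1 - alpha / 2)

# The t critical value depends on the degrees of freedom (n - 1)
for n in (5, 10, 30, 100, 1000):
    t_crit = stats.t.ppf(1 - alpha / 2, df=n - 1)
    print(f"n = {n:4d}  t critical = {t_crit:.3f}  z critical = {z_crit:.3f}")
```

For small n the t critical value is noticeably larger, so the t-test demands stronger evidence; as n grows the two values become nearly identical.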



---

Q6: Python program for binomial distribution

```python
import numpy as np
import matplotlib.pyplot as plt

# Parameters: number of trials and success probability
n, p = 10, 0.5

# Draw 1000 samples from Binomial(n, p)
data = np.random.binomial(n, p, 1000)

# Plot a histogram with one bar centered on each integer outcome 0..n
plt.hist(data, bins=np.arange(n + 2) - 0.5, edgecolor='black', density=True)
plt.title(f"Binomial Distribution (n={n}, p={p})")
plt.xlabel("Number of Successes")
plt.ylabel("Estimated Probability")
plt.show()
```


---

Q7: Hypothesis testing using Z-statistics

```python
import numpy as np
from scipy import stats

# Sample data (n = 36)
sample_data = [49.1, 50.2, 51.0, 48.7, 50.5, 49.8, 50.3, 50.7, 50.2, 49.6,
               50.1, 49.9, 50.8, 50.4, 48.9, 50.6, 50.0, 49.7, 50.2, 49.5,
               50.1, 50.3, 50.4, 50.5, 50.0, 50.7, 49.3, 49.8, 50.2, 50.9,
               50.3, 50.4, 50.0, 49.7, 50.5, 49.9]

# H0: population mean = 50; H1: population mean != 50 (two-sided)
mu0 = 50
mean = np.mean(sample_data)
std = np.std(sample_data, ddof=1)  # sample SD, used as an estimate of sigma
n = len(sample_data)

# Z-statistic
z = (mean - mu0) / (std / np.sqrt(n))

# Two-sided p-value; sf(x) = 1 - cdf(x), computed more accurately in the tail
p_value = 2 * stats.norm.sf(abs(z))

print("Sample Mean:", mean)
print("Z-statistic:", z)
print("p-value:", p_value)
```

Interpretation:

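If p-value < 0.05, reject H₀ and conclude that the population mean differs from 50.

If p-value ≥ 0.05, fail to reject H₀: the data do not provide sufficient evidence that the mean differs from 50.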



---

Q8: Python script for normal distribution & confidence interval

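```python
import numpy as np
import scipy.stats as st
import matplotlib.pyplot as plt

# Simulate 500 draws from a normal distribution (mean 100, SD 15)
data = np.random.normal(loc=100, scale=15, size=500)

# Sample mean and 95% confidence interval for the population mean
mean = np.mean(data)
sem = st.sem(data)  # standard error of the mean
ci = st.t.interval(0.95, df=len(data) - 1, loc=mean, scale=sem)

print("Mean:", mean)
print("95% CI:", ci)

# Plot the histogram and mark the sample mean and the CI bounds
plt.hist(data, bins=30, edgecolor='black', density=True)
plt.axvline(mean, color='red', label="Sample mean")
plt.axvline(ci[0], color='red', linestyle='--', label="95% CI")
plt.axvline(ci[1], color='red', linestyle='--')
plt.title("Normal Distribution with 95% CI")
plt.xlabel("Value")
plt.ylabel("Density")
plt.legend()
plt.show()
```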


---

Q9: Python function to calculate Z-scores & histogram

```python
import numpy as np
import matplotlib.pyplot as plt

def calculate_zscores(data):
    """Return the Z-score of each value, using the population SD (ddof=0)."""
    data = np.asarray(data, dtype=float)
    return (data - data.mean()) / data.std()

# Example dataset
data = [10, 12, 13, 9, 15, 18, 14, 16, 12, 11]
z_scores = calculate_zscores(data)

# Plot histogram
plt.hist(z_scores, bins=10, edgecolor='black')
plt.title("Z-scores Histogram")
plt.xlabel("Z-score")
plt.ylabel("Frequency")
plt.show()

print("Z-scores:", z_scores)
```

Explanation:

A Z-score tells how many standard deviations a value is from the mean.

Z = 0 → exactly the mean.

Z = +2 → 2 SDs above the mean.

Z = -1 → 1 SD below the mean.
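
As a quick sanity check on this interpretation, the sketch below standardizes the example dataset and confirms that the resulting Z-scores have mean 0 and standard deviation 1. It assumes the population standard deviation (ddof=0), which is also the default of `scipy.stats.zscore`.

```python
import numpy as np
from scipy import stats

data = np.array([10, 12, 13, 9, 15, 18, 14, 16, 12, 11])

# Standardize: subtract the mean, divide by the population SD (ddof=0)
z = (data - data.mean()) / data.std()

print("Mean of Z-scores:", round(float(z.mean()), 10))
print("SD of Z-scores:", round(float(z.std()), 10))
print("Matches scipy.stats.zscore:", np.allclose(z, stats.zscore(data)))
```

The value 13 equals the mean of this dataset, so its Z-score is 0; every other value is expressed as a number of standard deviations above or below that point.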