Question 1: What is hypothesis testing in statistics?

    Hypothesis testing in statistics is a method used to make decisions or draw conclusions about a population based on sample data.

In simple words:

    It is a way to check whether a claim (assumption) about a population is likely to be true or not.

    Null hypothesis (H₀): The statement we assume to be true at the start (e.g., "There is no difference").

    Alternative hypothesis (H₁): The competing statement that we look for evidence to support (e.g., "There is a difference").

    Process: We collect sample data → perform a test → decide whether to "reject H₀" or "fail to reject H₀."

Example:

    A company claims its bulb lasts 1000 hours on average.

    H₀: Mean life = 1000 hours

    H₁: Mean life ≠ 1000 hours

    We measure the lifetimes of a sample of bulbs and check whether the sample mean is far enough from 1000 hours to reject H₀.
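
The bulb example can be sketched in code as follows. The lifetimes below are hypothetical values made up for illustration, and a one-sample t-test is assumed because the population standard deviation is not given.

```python
import numpy as np
from scipy import stats

# Hypothetical lifetimes (hours) of 12 sampled bulbs; illustrative values only
lifetimes = np.array([985, 1012, 998, 1004, 976, 1021,
                      990, 1008, 995, 1003, 987, 1015])

claimed_mean = 1000  # H0: mean life = 1000 hours
alpha = 0.05         # significance level

# Two-sided one-sample t-test (H1: mean life != 1000 hours)
result = stats.ttest_1samp(lifetimes, popmean=claimed_mean)
t_stat, p_value = result.statistic, result.pvalue

decision = "reject H0" if p_value <= alpha else "fail to reject H0"
print(f"sample mean = {lifetimes.mean():.1f}, t = {t_stat:.3f}, "
      f"p = {p_value:.3f} -> {decision}")
```

With these made-up values the sample mean is very close to 1000, so the test fails to reject H₀.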

Question 2: What is the null hypothesis, and how does it differ from the alternative
hypothesis?

    The null hypothesis (H₀) is the statement that there is no effect, no difference, or no change. It represents the default assumption.

    The alternative hypothesis (H₁ or Ha) is the statement that there is an effect, a difference, or a change. It is the claim the test looks for evidence to support.

Difference:

    Null hypothesis (H₀): "Nothing is happening." (e.g., mean = 50)

    Alternative hypothesis (H₁): "Something is happening." (e.g., mean ≠ 50)

Example:

    A medicine is tested to see if it lowers blood pressure.

    H₀: The medicine has no effect on blood pressure.

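    H₁: The medicine does affect blood pressure (two-sided). If only a reduction matters, a one-sided H₁ would be "The medicine lowers blood pressure."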

Question 3: Explain the significance level in hypothesis testing and its role in deciding
the outcome of a test.

    The significance level (α) in hypothesis testing is the threshold for deciding whether to reject the null hypothesis (H₀).

    It tells us how much risk we are willing to take of making a Type I error (rejecting H₀ when it is actually true).

    Common values of α are 0.05 (5%) or 0.01 (1%).

    If the p-value ≤ α, we reject H₀ (evidence supports H₁).

    If the p-value > α, we fail to reject H₀ (not enough evidence against H₀).

Example:

    If α = 0.05, it means we allow a 5% chance of concluding the medicine works when in reality it doesn’t.
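
The decision rule above can be sketched for a two-sided Z-test. The function name two_sided_z_decision and the example Z values are assumptions chosen for illustration. Comparing the p-value with α gives the same decision as comparing |Z| with the critical value.

```python
from scipy.stats import norm

def two_sided_z_decision(z, alpha=0.05):
    """Return (p_value, critical_value, reject) for a two-sided Z-test."""
    p_value = 2 * norm.sf(abs(z))        # area in both tails beyond |z|
    critical = norm.ppf(1 - alpha / 2)   # about 1.96 when alpha = 0.05
    reject = p_value <= alpha            # same decision as abs(z) >= critical
    return p_value, critical, reject

for z in (1.5, 2.5):
    p, crit, reject = two_sided_z_decision(z)
    print(f"z = {z}: p = {p:.4f}, critical = {crit:.3f}, reject H0 = {reject}")
```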

Question 4: What are Type I and Type II errors? Give examples of each.

Type I Error (False Positive):

    Rejecting a true null hypothesis.

    It means you conclude there is an effect when in reality there is none.

Example:

    A medical test says a healthy person has a disease.

Type II Error (False Negative):

    Failing to reject a false null hypothesis.

    It means you conclude there is no effect when in reality there is one.

Example:

    A medical test says a sick person is healthy.

In short:

    Type I Error = False Alarm

    Type II Error = Missed Detection
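
Both error rates can be estimated by simulation. This is a rough sketch under assumed settings (H₀: mean = 50, standard deviation 5, n = 30, a true mean of 52 for the "H₀ is false" case); the helper rejection_rate is a hypothetical name.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
alpha, n, trials = 0.05, 30, 2000

def rejection_rate(true_mean):
    """Fraction of simulated samples in which H0: mean = 50 is rejected."""
    samples = rng.normal(loc=true_mean, scale=5, size=(trials, n))
    p_values = stats.ttest_1samp(samples, popmean=50, axis=1).pvalue
    return np.mean(p_values <= alpha)

# H0 true: every rejection is a Type I error (false alarm)
type1_rate = rejection_rate(true_mean=50)
# H0 false: every non-rejection is a Type II error (missed detection)
type2_rate = 1 - rejection_rate(true_mean=52)

print(f"Estimated Type I error rate:  {type1_rate:.3f}")
print(f"Estimated Type II error rate: {type2_rate:.3f}")
```

The estimated Type I rate should land near α, while the Type II rate depends on the true effect size and the sample size.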

Question 5: What is the difference between a Z-test and a T-test? Explain when to use
each.

Z-test and T-test are both statistical tests used to compare means, but they differ in conditions:

Z-test

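    Used when the population variance (σ²) or standard deviation (σ) is known.

    Usually paired with a large sample (rule of thumb: n > 30), so the sample mean is approximately normal even if the data are not.

    The test statistic follows the standard normal distribution under H₀.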

    Example: Checking if the average marks of 1000 students differ from the national average, when population variance is known.

T-test

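    Used when the population variance is unknown and must be estimated from the sample variance.

    Matters most for small samples (rule of thumb: n ≤ 30); for large samples it gives almost the same result as a Z-test.

    The test statistic follows the t-distribution with n − 1 degrees of freedom.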

    Example: Comparing the average marks of 20 students in a class with the national average, when population variance is not known.

In short:

    Z-test → Population variance known (typically with large samples).

    T-test → Population variance unknown (essential for small samples).
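
One way to see the practical difference is to compare two-sided critical values at α = 0.05. The sample sizes below are arbitrary choices for illustration.

```python
from scipy.stats import norm, t

alpha = 0.05
z_crit = norm.ppf(1 - alpha / 2)  # two-sided cutoff from the standard normal

print(f"z critical value: {z_crit:.3f}")
for n in (5, 20, 30, 100, 1000):
    # Heavier tails of the t-distribution give a larger cutoff for small n
    t_crit = t.ppf(1 - alpha / 2, df=n - 1)
    print(f"n = {n:4d}: t critical value = {t_crit:.3f}")
```

The t cutoff is noticeably larger for small samples and approaches the z cutoff as n grows, which is why the two tests nearly agree for large samples.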

Question 6: Write a Python program to generate a binomial distribution with n=10 and
p=0.5, then plot its histogram.

In [None]:
import numpy as np
import matplotlib.pyplot as plt

# Parameters
n = 10   # number of trials
p = 0.5  # probability of success
size = 1000  # number of samples

# Generate binomial distribution
data = np.random.binomial(n, p, size)

# Plot histogram
plt.hist(data, bins=range(n+2), edgecolor='black', align='left')
plt.title("Binomial Distribution (n=10, p=0.5)")
plt.xlabel("Number of Successes")
plt.ylabel("Frequency")
plt.show()


Explanation:

    np.random.binomial(n, p, size) → generates `size` random values, each the number of successes in n trials.

    bins=range(n+2) → puts bin edges at 0, 1, ..., 11; with align='left' each bar is centered on its integer value, so the bars cover outcomes 0–10.

    Histogram shows how often each outcome appears.
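
As an optional check, the simulated frequencies can be compared with the exact binomial probabilities. This sketch assumes a seeded NumPy Generator (not used in the program above) so that the run is repeatable.

```python
import numpy as np
from scipy.stats import binom

n, p, size = 10, 0.5, 1000
rng = np.random.default_rng(0)   # seeded generator so the run is repeatable
data = rng.binomial(n, p, size)

# Observed share of each outcome 0..n versus the exact binomial probability
observed = np.bincount(data, minlength=n + 1) / size
expected = binom.pmf(np.arange(n + 1), n, p)

for k in range(n + 1):
    print(f"k = {k:2d}: observed = {observed[k]:.3f}, expected = {expected[k]:.3f}")
```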

Question 7: Implement hypothesis testing using Z-statistics for a sample dataset in
Python. Show the Python code and interpret the results.

sample_data = [49.1, 50.2, 51.0, 48.7, 50.5, 49.8, 50.3, 50.7, 50.2, 49.6,
               50.1, 49.9, 50.8, 50.4, 48.9, 50.6, 50.0, 49.7, 50.2, 49.5,
               50.1, 50.3, 50.4, 50.5, 50.0, 50.7, 49.3, 49.8, 50.2, 50.9,
               50.3, 50.4, 50.0, 49.7, 50.5, 49.9]

    We’ll assume the following:

    Null Hypothesis (H₀): Population mean μ = 50

    Alternative Hypothesis (H₁): μ ≠ 50 (two-tailed test)

    Significance Level (α): 0.05

In [None]:
import numpy as np
from scipy.stats import norm

# Sample data
sample_data = [49.1, 50.2, 51.0, 48.7, 50.5, 49.8, 50.3, 50.7, 50.2, 49.6,
               50.1, 49.9, 50.8, 50.4, 48.9, 50.6, 50.0, 49.7, 50.2, 49.5,
               50.1, 50.3, 50.4, 50.5, 50.0, 50.7, 49.3, 49.8, 50.2, 50.9,
               50.3, 50.4, 50.0, 49.7, 50.5, 49.9]

# Hypothesized population mean
mu_0 = 50

# Compute sample statistics
sample_mean = np.mean(sample_data)
# Sample standard deviation (ddof=1), used here as an estimate of the unknown population sigma
sample_std = np.std(sample_data, ddof=1)
n = len(sample_data)

# Standard error
se = sample_std / np.sqrt(n)

# Z statistic
z = (sample_mean - mu_0) / se

# Two-tailed p-value (sf is 1 - cdf, computed more accurately in the tails)
p_value = 2 * norm.sf(abs(z))

# Print results
print("Sample Mean:", round(sample_mean, 3))
print("Sample Std Dev:", round(sample_std, 3))
print("Z-statistic:", round(z, 3))
print("p-value:", round(p_value, 4))

# Decision
alpha = 0.05
if p_value <= alpha:
    print("Reject the Null Hypothesis (H0).")
else:
    print("Fail to Reject the Null Hypothesis (H0).")


Interpretation

    The Z-statistic tells how many standard errors the sample mean is away from the hypothesized mean.

    The p-value is the probability of getting a Z-statistic at least as extreme as the observed one if the null hypothesis is true.

    If p-value ≤ 0.05, we reject H₀ → the sample provides evidence that μ ≠ 50.

    If p-value > 0.05, we fail to reject H₀ → the sample does not provide enough evidence to conclude μ ≠ 50.

    For this dataset, the sample mean is about 50.09, Z ≈ 0.99 and the p-value ≈ 0.32, so we fail to reject H₀.
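
Because the Z-test above plugs the sample standard deviation in for σ, a t-test on the same data is a reasonable cross-check. The sketch below is a comparison under that assumption, not a replacement for the answer; the names p_z and p_t are illustrative.

```python
import numpy as np
from scipy import stats

sample_data = [49.1, 50.2, 51.0, 48.7, 50.5, 49.8, 50.3, 50.7, 50.2, 49.6,
               50.1, 49.9, 50.8, 50.4, 48.9, 50.6, 50.0, 49.7, 50.2, 49.5,
               50.1, 50.3, 50.4, 50.5, 50.0, 50.7, 49.3, 49.8, 50.2, 50.9,
               50.3, 50.4, 50.0, 49.7, 50.5, 49.9]
mu_0 = 50
n = len(sample_data)

# Same test statistic as above: (sample mean - mu_0) / standard error
se = np.std(sample_data, ddof=1) / np.sqrt(n)
stat = (np.mean(sample_data) - mu_0) / se

p_z = 2 * stats.norm.sf(abs(stat))         # normal reference distribution
p_t = 2 * stats.t.sf(abs(stat), df=n - 1)  # t reference distribution, n - 1 df

print(f"statistic = {stat:.3f}, p (Z) = {p_z:.4f}, p (t) = {p_t:.4f}")
```

With n = 36 the two p-values are close, and both lead to the same decision at α = 0.05.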

Question 8: Write a Python script to simulate data from a normal distribution and
calculate the 95% confidence interval for its mean. Plot the data using Matplotlib.

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

# Step 1: Simulate data from a normal distribution
np.random.seed(42)  # for reproducibility
data = np.random.normal(loc=50, scale=10, size=200)  # mean=50, std=10, sample size=200

# Step 2: Calculate sample statistics
sample_mean = np.mean(data)
sample_std = np.std(data, ddof=1)  # sample standard deviation
n = len(data)

# Step 3: Compute 95% confidence interval for the mean
confidence = 0.95
alpha = 1 - confidence
t_critical = stats.t.ppf(1 - alpha/2, df=n-1)  # t-critical value

margin_of_error = t_critical * (sample_std / np.sqrt(n))
ci_lower = sample_mean - margin_of_error
ci_upper = sample_mean + margin_of_error

print(f"Sample Mean: {sample_mean:.2f}")
print(f"95% Confidence Interval: ({ci_lower:.2f}, {ci_upper:.2f})")

# Step 4: Plot the data
plt.figure(figsize=(10,6))
plt.hist(data, bins=20, color='skyblue', edgecolor='black', alpha=0.7)
plt.axvline(sample_mean, color='red', linestyle='--', label=f"Mean = {sample_mean:.2f}")
plt.axvline(ci_lower, color='green', linestyle='--', label=f"95% CI Lower = {ci_lower:.2f}")
plt.axvline(ci_upper, color='green', linestyle='--', label=f"95% CI Upper = {ci_upper:.2f}")

plt.title("Histogram of Simulated Normal Data with 95% Confidence Interval")
plt.xlabel("Value")
plt.ylabel("Frequency")
plt.legend()
plt.show()


What this does:

    Generates random data from a normal distribution (mean=50, std=10, n=200).

    Calculates the mean and standard deviation of the sample.

    Uses the t-distribution (because the population standard deviation is estimated from the sample) to compute the 95% confidence interval.

    Plots a histogram of the data with vertical lines showing the mean and CI bounds.
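
The manual interval can be cross-checked against SciPy's helpers. This sketch regenerates the same simulated data and assumes stats.t.interval and stats.sem are acceptable shortcuts; the confidence level is passed positionally.

```python
import numpy as np
from scipy import stats

np.random.seed(42)
data = np.random.normal(loc=50, scale=10, size=200)

# Manual interval: mean +/- t_critical * standard error
n = len(data)
se = np.std(data, ddof=1) / np.sqrt(n)
t_critical = stats.t.ppf(0.975, df=n - 1)
manual = (np.mean(data) - t_critical * se, np.mean(data) + t_critical * se)

# Same interval from SciPy; stats.sem uses ddof=1 by default
library = stats.t.interval(0.95, n - 1, loc=np.mean(data), scale=stats.sem(data))

print(f"Manual:  ({manual[0]:.2f}, {manual[1]:.2f})")
print(f"Library: ({library[0]:.2f}, {library[1]:.2f})")
```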

Question 9: Write a Python function to calculate the Z-scores from a dataset and
visualize the standardized data using a histogram. Explain what the Z-scores represent
in terms of standard deviations from the mean.

In [None]:
import numpy as np
import matplotlib.pyplot as plt

# Function to calculate Z-scores
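def calculate_z_scores(data):
    """Return the Z-score of each value: (value - sample mean) / sample std."""
    data = np.asarray(data, dtype=float)  # lets plain Python lists work too
    mean = np.mean(data)
    std = np.std(data, ddof=1)  # sample std deviation
    z_scores = (data - mean) / std
    return z_scores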

# Example dataset (you can replace this with your own)
np.random.seed(0)
data = np.random.normal(loc=100, scale=15, size=200)  # mean=100, std=15

# Calculate Z-scores
z_scores = calculate_z_scores(data)

# Print some results
print(f"Original Mean: {np.mean(data):.2f}, Original Std Dev: {np.std(data, ddof=1):.2f}")
print(f"Z-Scores Mean: {np.mean(z_scores):.2f}, Std Dev: {np.std(z_scores, ddof=1):.2f}")

# Visualize standardized data
plt.figure(figsize=(10,6))
plt.hist(z_scores, bins=20, color='lightcoral', edgecolor='black', alpha=0.7)
plt.axvline(0, color='blue', linestyle='--', label="Mean (0 after standardization)")
plt.title("Histogram of Standardized Data (Z-Scores)")
plt.xlabel("Z-Score")
plt.ylabel("Frequency")
plt.legend()
plt.show()


    Z-score tells you how many standard deviations a data point is away from the mean.

Z = (X - μ) / σ

    Where:

    X = data point

    μ = mean of the dataset (the sample mean in the code above)

    σ = standard deviation (the sample standard deviation, ddof=1, in the code above)

Interpretation:

    Z = 0 → The data point is exactly at the mean.

    Z = 1 → The data point is 1 standard deviation above the mean.

    Z = -1 → The data point is 1 standard deviation below the mean.

    For roughly normal data, about 99.7% of values fall within 3 standard deviations of the mean, so extreme Z-scores (e.g., > 3 or < -3) often indicate outliers.
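
The outlier rule of thumb can be sketched as below. The code regenerates the same example data, cross-checks the manual Z-scores against scipy.stats.zscore (whose default is ddof=0, so ddof=1 is passed explicitly), and assumes a cutoff of 3 standard deviations.

```python
import numpy as np
from scipy import stats

np.random.seed(0)
data = np.random.normal(loc=100, scale=15, size=200)

# Manual Z-scores using the sample standard deviation
z_manual = (data - np.mean(data)) / np.std(data, ddof=1)
# Library equivalent; ddof=1 matches the manual calculation
z_scipy = stats.zscore(data, ddof=1)

# Flag points more than 3 standard deviations from the mean
outliers = data[np.abs(z_manual) > 3]
print(f"Max |z| = {np.abs(z_manual).max():.2f}, points beyond 3 SD: {outliers.size}")
```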