# Question 1: What are Type I and Type II errors in hypothesis testing, and how do they impact decision-making?

**Type I Error (False Positive)**

Definition: Rejecting the null hypothesis (H₀) when it is actually true.

In plain terms: Concluding there is an effect or difference when none really exists.

Probability: Denoted by α (alpha), also called the significance level.

Common choices: 0.05, 0.01.

Impact on decisions:

You incorrectly claim a discovery or change.

Example: Approving a drug as effective when it's actually ineffective.

**Type II Error (False Negative)**

Definition: Failing to reject the null hypothesis when it is actually false.

In plain terms: Concluding there is no effect or difference when one actually exists.

Probability: Denoted by β (beta).

Power of a test:

Power = 1 – β

Power is the probability of correctly rejecting a false H₀, i.e., of detecting an effect that actually exists.

Impact on decisions:

You miss a real discovery.

Example: Rejecting a truly effective drug because the study didn’t detect its effect.

**Decision-Making Implications:**

High stakes for false positives → use very low α
(e.g., medical trials, criminal justice)

High stakes for false negatives → aim for high power
(e.g., detecting diseases, safety testing)

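For a fixed sample size, lowering α raises β, so the two errors trade off. Balanced decisions require choosing α and the sample size based on the relative costs and risks of each error.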
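
The two error rates can be estimated by simulation. The following is a minimal sketch, assuming a two-sided one-sample z-test with known σ; the helper name `rejection_rate`, the true mean of 0.5, the sample size of 25, and the number of trials are all illustrative choices, not part of the original answer.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible

def rejection_rate(true_mean, mu_0=0.0, sigma=1.0, n=25, alpha=0.05, trials=20_000):
    """Fraction of simulated two-sided z-tests that reject H0: mean = mu_0."""
    # Each row is one simulated study of n observations
    samples = rng.normal(loc=true_mean, scale=sigma, size=(trials, n))
    z = (samples.mean(axis=1) - mu_0) / (sigma / np.sqrt(n))
    p_values = 2 * norm.sf(np.abs(z))
    return np.mean(p_values < alpha)

type_i_rate = rejection_rate(true_mean=0.0)  # H0 is true: every rejection is a Type I error
power = rejection_rate(true_mean=0.5)        # H0 is false: every rejection is correct
type_ii_rate = 1 - power                     # estimated beta

print(f"Estimated Type I error rate: {type_i_rate:.3f}")
print(f"Estimated power: {power:.3f}, estimated beta: {type_ii_rate:.3f}")
```

The estimated Type I error rate should land close to the chosen α, while the power depends on the effect size, σ, and n.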


# Question 2: What is the P-value in hypothesis testing, and how should it be interpreted in the context of the null hypothesis?

The P-value is the probability of observing data at least as extreme as what you actually observed, assuming that the null hypothesis is true.

Formally:

P-value = P(data at least as extreme as observed | H₀ is true)

It does not tell you the probability that the null hypothesis is true or false.

## How to Interpret the P-Value

**1. Small P-value (typically ≤ α)**

Means a result at least this extreme would be unlikely if the null hypothesis were true.

Therefore, the data provide evidence against H₀ at significance level α; the smaller the P-value, the stronger that evidence.

Decision: Reject H₀.

**2. Large P-value (> α)**

Means the observed data are compatible with the null hypothesis.

There is not enough evidence to reject H₀.

Decision: Fail to reject H₀ (but this does not prove H₀ is true).
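
The decision rule above can be sketched for a test statistic that is standard normal under H₀. This is an illustrative example only; the helper `two_sided_p_value` and the three z values are assumptions chosen to show a large, a borderline, and a small P-value.

```python
from scipy.stats import norm

def two_sided_p_value(z_stat):
    """P(|Z| >= |z_stat|) for a standard normal Z, i.e. computed under H0."""
    return 2 * norm.sf(abs(z_stat))  # sf is the upper-tail probability, 1 - cdf

alpha = 0.05
for z_stat in (0.5, 1.96, 3.0):
    p = two_sided_p_value(z_stat)
    decision = "reject H0" if p <= alpha else "fail to reject H0"
    print(f"z = {z_stat:.2f}  p = {p:.4f}  ->  {decision}")
```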

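# Question 3: Explain the difference between a Z-test and a T-test, including when to use each.

A Z-test and a T-test are both statistical tests used to compare sample data to a population value or to compare two groups. They differ in the assumptions they require: a Z-test uses the standard normal distribution and assumes the population standard deviation (σ) is known, while a T-test estimates it with the sample standard deviation (s) and uses Student's t-distribution, which has heavier tails (n − 1 degrees of freedom in the one-sample case).

## When to Use Each Test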
**Use a Z-test when:**

✔ The population standard deviation (σ) is known
✔ The sample size is large (n ≥ 30)
✔ Data are approximately normally distributed or sample is large
✔ You are testing proportions (single or two-sample)

Examples:

Quality control with known process variation

**Use a T-test when:**

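✔ The population standard deviation (σ) is unknown and must be estimated from the sample (most cases)
✔ The sample size is small (n < 30); for large n the t and z results become nearly identical
✔ The data come from an approximately normally distributed population
✔ You are comparing means (one-sample, two-sample, or paired)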

Examples:

Comparing average test scores from two small classrooms
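
One way to see why sample size matters is to compare the critical values of the two distributions. The sketch below assumes a two-sided test at the 95% level; the sample sizes are arbitrary illustrative choices.

```python
from scipy.stats import norm, t

confidence = 0.95
tail = (1 + confidence) / 2  # upper quantile used for a two-sided test

z_crit = norm.ppf(tail)
print(f"z critical value: {z_crit:.3f}")

for n in (5, 10, 30, 100, 1000):
    t_crit = t.ppf(tail, df=n - 1)  # one-sample t-test has n - 1 degrees of freedom
    print(f"n = {n:4d}  t critical value: {t_crit:.3f}  "
          f"(excess over z: {t_crit - z_crit:.3f})")
```

The t critical value is always larger than the z value and shrinks toward it as n grows, which is why the two tests give nearly the same answer for large samples.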

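# Question 4: What is a confidence interval, and how does the margin of error influence its width and interpretation?

A confidence interval (CI) is a range of plausible values for a population parameter (such as a mean or proportion), computed from sample data. It has the form estimate ± margin of error, so its width is twice the margin of error. A 95% confidence level means that about 95% of intervals built this way from repeated samples would contain the true parameter.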

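**How the Margin of Error Influences the Confidence Interval**

The margin of error equals a critical value times the standard error. It grows with the confidence level and the variability of the data, and shrinks as the sample size increases.

1. Larger Margin of Error → Wider Confidence Interval

More uncertainty about the estimate

Less precise; more likely to capture the true parameter only when the extra width comes from a higher confidence level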

2. Smaller Margin of Error → Narrower Confidence Interval

More precision

Less uncertainty
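
The effect of sample size and confidence level on the margin of error can be tabulated directly. This is a minimal sketch for a mean with known σ; the function name `margin_of_error` and the value σ = 10 are hypothetical.

```python
import numpy as np
from scipy.stats import norm

def margin_of_error(sigma, n, confidence=0.95):
    """Margin of error for a mean with known sigma: z* times sigma / sqrt(n)."""
    z_star = norm.ppf((1 + confidence) / 2)
    return z_star * sigma / np.sqrt(n)

sigma = 10.0  # hypothetical population standard deviation
for n in (25, 100, 400):
    for confidence in (0.90, 0.95, 0.99):
        moe = margin_of_error(sigma, n, confidence)
        print(f"n = {n:3d}  confidence = {confidence:.2f}  "
              f"margin = {moe:.2f}  CI width = {2 * moe:.2f}")
```

Quadrupling the sample size halves the margin of error, while raising the confidence level widens the interval.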

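# Question 5: Describe the purpose and assumptions of an ANOVA test. How does it extend hypothesis testing to more than two groups?

ANOVA (Analysis of Variance) is a statistical method used to compare the means of three or more groups. It tests the null hypothesis that all group means are equal against the alternative that at least one group mean differs from the others.

**Assumptions**

Observations are independent, the data within each group are approximately normally distributed, and the groups have roughly equal variances (homogeneity of variance).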

   **ANOVA Extends Hypothesis Testing Beyond Two Groups**

A two-sample t-test can only compare two means.

If you compare 3+ groups using multiple t-tests, the overall Type I error rate rises.

Example:
Comparing 4 groups → 6 pairwise t-tests → much higher chance of false positives.

ANOVA solves this problem by testing all groups simultaneously with one overall test, keeping the error rate controlled.
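
The inflation of the error rate can be put into numbers, followed by a single overall test. This is a rough sketch: the error-rate formula treats the six tests as independent, which pairwise tests on shared groups are not, and the four groups of measurements are made-up illustrative values.

```python
from math import comb
from scipy.stats import f_oneway

alpha = 0.05
k = 4                     # number of groups
num_pairs = comb(k, 2)    # pairwise t-tests needed to compare every pair

# Chance of at least one false positive if the tests were independent
fwer = 1 - (1 - alpha) ** num_pairs
print(f"{num_pairs} pairwise tests -> family-wise error rate of about {fwer:.3f}")

# Hypothetical measurements for four groups (illustrative values only)
group_a = [23, 25, 21, 24, 26]
group_b = [30, 28, 31, 29, 32]
group_c = [22, 24, 23, 25, 21]
group_d = [27, 26, 28, 25, 29]

# One-way ANOVA: a single test of H0 "all four group means are equal"
f_stat, p_value = f_oneway(group_a, group_b, group_c, group_d)
print(f"F-statistic: {f_stat:.3f}, p-value: {p_value:.5f}")
```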

# Question 6: Write a Python program to perform a one-sample Z-test and interpret the result for a given dataset.

```python
import numpy as np
from scipy.stats import norm

# ----------------------------------------
# Sample Data (replace with your dataset)
# ----------------------------------------
data = np.array([52, 47, 55, 49, 50, 53, 51, 48, 54, 50])

# Hypothesized population mean under H0
mu_0 = 50

# Known population standard deviation (σ)
sigma = 3

# Significance level
alpha = 0.05

# ----------------------------------------
# Z-test Calculation
# ----------------------------------------
sample_mean = np.mean(data)
n = len(data)

# Standard error of the mean
se = sigma / np.sqrt(n)

# Z-statistic
z_stat = (sample_mean - mu_0) / se

# P-value (two-tailed); norm.sf is the upper-tail probability, 1 - cdf
p_value = 2 * norm.sf(abs(z_stat))

# ----------------------------------------
# Output
# ----------------------------------------
print("Sample Mean:", sample_mean)
print("Z-statistic:", z_stat)
print("P-value:", p_value)

# ----------------------------------------
# Interpretation
# ----------------------------------------
if p_value < alpha:
    print(f"\nSince p < {alpha}, we reject the null hypothesis.")
    print("Interpretation: The sample provides evidence that the population mean is different from", mu_0)
else:
    print(f"\nSince p ≥ {alpha}, we fail to reject the null hypothesis.")
    print("Interpretation: The sample does NOT provide sufficient evidence that the population mean differs from", mu_0)
```

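**Interpretation**

If p < α → reject H₀: the data provide evidence that the mean is different from μ₀.

If p ≥ α → fail to reject H₀: there is not enough evidence to claim a difference.

For the sample data above, the sample mean is 50.9, which gives z ≈ 0.95 and p ≈ 0.34. Since p ≥ 0.05, we fail to reject H₀.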

# Question 7: Simulate a dataset from a binomial distribution (n = 10, p = 0.5) using NumPy and plot the histogram.

```python
import numpy as np
import matplotlib.pyplot as plt

# Simulation parameters
n = 10       # number of trials
p = 0.5      # probability of success
size = 1000  # number of simulated observations

# Simulate binomial dataset
data = np.random.binomial(n=n, p=p, size=size)

# Plot histogram with one bin centered on each integer outcome 0..n
plt.hist(data, bins=np.arange(-0.5, n + 1.5, 1), edgecolor='black')
plt.title(f"Histogram of Binomial(n={n}, p={p}) Data")
plt.xlabel("Number of Successes")
plt.ylabel("Frequency")
plt.show()
```
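
As a numerical companion to the histogram, the simulated frequencies can be compared with the exact binomial probabilities. This is a sketch under assumptions not in the original answer: it uses NumPy's `default_rng` with a fixed seed and `scipy.stats.binom` for the theoretical values.

```python
import numpy as np
from scipy.stats import binom

rng = np.random.default_rng(0)  # seeded generator so the run is reproducible
n, p, size = 10, 0.5, 1000
data = rng.binomial(n=n, p=p, size=size)

counts = np.bincount(data, minlength=n + 1)  # how often each outcome 0..n occurred
empirical = counts / size
theoretical = binom.pmf(np.arange(n + 1), n, p)

for k in range(n + 1):
    print(f"k = {k:2d}  simulated = {empirical[k]:.3f}  theoretical = {theoretical[k]:.3f}")
```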

# Question 8: Generate multiple samples from a non-normal distribution and implement the Central Limit Theorem using Python

CLT simulation using a non-normal distribution. We will:

- Use an exponential distribution (highly skewed, so clearly non-normal)
- Draw many samples of various sizes (n = 5, 30, 100)
- Plot the distribution of sample means
- Show how these means approach normality as n increases
```python
import numpy as np
import matplotlib.pyplot as plt

# -----------------------------------
# Step 1: Generate data from a non-normal distribution
# -----------------------------------
np.random.seed(42)

# Exponential distribution (right-skewed)
population = np.random.exponential(scale=1, size=100000)

# -----------------------------------
# Step 2: Function to generate sample means
# -----------------------------------
def sample_means(population, sample_size, num_samples=1000):
    """Draw num_samples samples of size sample_size and return their means."""
    means = []
    for _ in range(num_samples):
        sample = np.random.choice(population, size=sample_size, replace=True)
        means.append(np.mean(sample))
    return np.array(means)

# Sample sizes for CLT demonstration, with one plot color per size
sizes = [5, 30, 100]
colors = ['blue', 'green', 'red']

# -----------------------------------
# Step 3: Generate sample means for each sample size
# -----------------------------------
means_by_size = {n: sample_means(population, n) for n in sizes}

# -----------------------------------
# Step 4: Plot histograms
# -----------------------------------
plt.figure(figsize=(12, 8))

plt.subplot(2, 2, 1)
plt.hist(population, bins=50, color='gray', edgecolor='black')
plt.title("Original Population (Exponential Distribution)")
plt.xlabel("Value")
plt.ylabel("Frequency")

# Panels 2 to 4: sampling distribution of the mean for each sample size
for position, (n, color) in enumerate(zip(sizes, colors), start=2):
    plt.subplot(2, 2, position)
    plt.hist(means_by_size[n], bins=30, color=color, edgecolor='black')
    plt.title(f"Sampling Distribution (n = {n})")
    plt.xlabel("Sample Mean")

plt.tight_layout()
plt.show()
```
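
The histograms can be backed up with numbers. The CLT predicts that the standard deviation of the sample means is close to σ/√n and that their skewness shrinks as n grows. The sketch below is an assumption-laden variant of the simulation: it samples directly from the exponential distribution with a seeded `default_rng`, uses 5000 means per sample size, and relies on `scipy.stats.skew`.

```python
import numpy as np
from scipy.stats import skew

rng = np.random.default_rng(42)
scale = 1.0         # exponential with mean 1 and standard deviation 1
num_samples = 5000  # number of sample means per sample size (an arbitrary choice)

results = {}
for n in (5, 30, 100):
    # Each row is one sample of size n; take the mean of every row
    means = rng.exponential(scale=scale, size=(num_samples, n)).mean(axis=1)
    results[n] = {
        "std_of_means": means.std(ddof=1),
        "theoretical_se": scale / np.sqrt(n),
        "skewness": skew(means),
    }
    r = results[n]
    print(f"n = {n:3d}  std of means = {r['std_of_means']:.3f}  "
          f"sigma/sqrt(n) = {r['theoretical_se']:.3f}  skewness = {r['skewness']:.3f}")
```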

# Question 9: Write a Python function to calculate and visualize the confidence interval for a sample mean.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import t, norm

def plot_confidence_interval(data, confidence=0.95, use_t=True):
    """
    Calculate and visualize the confidence interval for the sample mean.

    Parameters:
        data (array-like): Sample dataset
        confidence (float): Confidence level (default = 0.95)
        use_t (bool): If True, use t-distribution; if False, use z-distribution

    Returns:
        (lower_bound, upper_bound): The confidence interval
    """
    data = np.asarray(data, dtype=float)
    n = len(data)
    mean = np.mean(data)
    std = np.std(data, ddof=1)   # sample standard deviation

    # Standard error
    se = std / np.sqrt(n)

    # Critical value
    if use_t:
        critical = t.ppf((1 + confidence) / 2, df=n-1)
    else:
        critical = norm.ppf((1 + confidence) / 2)

    margin_of_error = critical * se

    lower = mean - margin_of_error
    upper = mean + margin_of_error

    # Print results (":.0%" avoids float truncation such as int(0.57 * 100) == 56)
    print(f"Sample Mean: {mean:.4f}")
    print(f"{confidence:.0%} Confidence Interval: ({lower:.4f}, {upper:.4f})")

    # -----------------------------
    # Visualization
    # -----------------------------
    plt.figure(figsize=(10, 4))

    # Plot data points
    plt.scatter(range(n), data, color='gray', alpha=0.7, label="Data Points")

    # Plot mean
    plt.axhline(mean, color='blue', linewidth=2, label="Sample Mean")

    # Plot CI
    plt.axhline(lower, color='red', linestyle='--', label="CI Lower Bound")
    plt.axhline(upper, color='red', linestyle='--', label="CI Upper Bound")

    plt.title(f"{confidence:.0%} Confidence Interval for the Mean")
    plt.xlabel("Observation Index")
    plt.ylabel("Value")
    plt.legend()
    plt.tight_layout()
    plt.show()

    return lower, upper

# ------------------------------------------------------
# Example usage
# ------------------------------------------------------
data = [12, 15, 14, 10, 18, 20, 13, 17, 16]
lower, upper = plot_confidence_interval(data, confidence=0.95)
```

# Question 10: Perform a Chi-square goodness-of-fit test using Python to compare observed and expected distributions, and explain the outcome.

```python
import numpy as np
from scipy.stats import chisquare

# -------------------------------------
# Example Data
# -------------------------------------
# Observed frequencies in 4 categories
observed = np.array([25, 30, 20, 25])

# Hypothesized expected distribution (uniform in this example)
expected = np.array([25, 25, 25, 25])

# Significance level
alpha = 0.05

# -------------------------------------
# Chi-square Goodness of Fit Test
# -------------------------------------
chi_stat, p_value = chisquare(f_obs=observed, f_exp=expected)

print("Chi-square Statistic:", chi_stat)
print("P-value:", p_value)

# -------------------------------------
# Interpretation
# -------------------------------------
if p_value < alpha:
    print("\nResult: Reject the null hypothesis.")
    print("Interpretation: The observed data do NOT match the expected distribution.")
else:
    print("\nResult: Fail to reject the null hypothesis.")
    print("Interpretation: The observed data are consistent with the expected distribution.")
```

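**Outcome for This Data (P-value Rounded)**

```text
Chi-square Statistic: 2.0
P-value: 0.5724
Result: Fail to reject the null hypothesis.
Interpretation: The observed data are consistent with the expected distribution.
```

With 4 categories the test has 3 degrees of freedom, and a statistic of 2.0 is far below the 5% critical value of about 7.81. Deviations at least this large from a uniform split would arise by chance more than half the time, so there is no evidence that the categories are unequally likely.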
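
To show where the statistic comes from, the same test can be computed by hand as the sum of (observed − expected)² / expected. This is a minimal sketch using `scipy.stats.chi2` for the upper-tail probability; it assumes the same observed and expected counts as the example data.

```python
import numpy as np
from scipy.stats import chi2

observed = np.array([25, 30, 20, 25])
expected = np.array([25, 25, 25, 25])

# Each category contributes (O - E)^2 / E to the statistic
contributions = (observed - expected) ** 2 / expected
chi_stat = contributions.sum()

df = len(observed) - 1            # categories minus one
p_value = chi2.sf(chi_stat, df)   # upper-tail probability under H0
critical = chi2.ppf(0.95, df)     # rejection threshold at alpha = 0.05

print("Contributions per category:", contributions)
print(f"Chi-square statistic: {chi_stat:.4f}, df = {df}")
print(f"P-value: {p_value:.4f}, critical value at 0.05: {critical:.3f}")
```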




