In [None]:
What is hypothesis testing in statistics?

Hypothesis testing is a statistical method that evaluates assumptions (hypotheses) about a population parameter. It helps determine whether observed data deviates significantly from what we would expect under a given assumption.

 Key Components
- Null Hypothesis (H₀):
This is the default assumption that there is no effect or no difference. For example, "The average height of men and women is the same."
- Alternative Hypothesis (H₁ or Hₐ):
This proposes that there is an effect or a difference. For example, "Men are taller than women on average."
- Significance Level (α):
The threshold for deciding whether to reject H₀, commonly set at 0.05 (5%).
- p-value:
The probability of observing data at least as extreme as the sample if H₀ is true. A p-value at or below α (commonly 0.05) leads to rejecting H₀.
- Test Statistic:
A calculated value from the sample data used to decide whether to reject H₀.
- Critical Value:
A cutoff point that the test statistic is compared against to determine significance.
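
As a rough illustration of how these components fit together, the sketch below runs a two-sided one-sample z-test. All of the input numbers (`sample_mean`, `mu_0`, `sigma`, `n`) are hypothetical values chosen for the example, not data from this notebook.

```python
from math import sqrt
from scipy.stats import norm

# Hypothetical inputs, chosen only for illustration
sample_mean = 52.0   # observed sample mean
mu_0 = 50.0          # population mean under H0
sigma = 5.0          # population standard deviation (assumed known)
n = 25               # sample size
alpha = 0.05         # significance level

# Test statistic: distance of the sample mean from mu_0, in standard errors
z = (sample_mean - mu_0) / (sigma / sqrt(n))

# Critical value for a two-sided test at level alpha
z_crit = norm.ppf(1 - alpha / 2)

# Two-sided p-value: probability of a statistic at least this extreme under H0
p_value = 2 * norm.sf(abs(z))

# Comparing |z| with the critical value and comparing the p-value with alpha
# are two views of the same decision
reject = abs(z) > z_crit
print(f"z = {z:.2f}, critical value = {z_crit:.2f}, p-value = {p_value:.4f}")
print("Reject H0" if reject else "Fail to reject H0")
```

The critical-value comparison and the p-value comparison always agree, so either can be reported.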




What is the null hypothesis, and how does it differ from the alternative
hypothesis?

Null Hypothesis (H₀)
- The null hypothesis is the default or starting assumption.
- It states that there is no effect, no difference, or no relationship between variables.
- It's what you're trying to test against.
- Example:
- H₀: The average test score of students this year is equal to last year’s score.


Alternative Hypothesis (H₁ or Hₐ)
- The alternative hypothesis is what you propose if you believe the null might be wrong.
- It suggests that there is an effect, a difference, or a relationship.
- It's what you're trying to provide evidence for.
- Example:
- H₁: The average test score of students this year is different from last year’s score.
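
The form of the alternative hypothesis decides which tail(s) of the distribution count as "extreme". The sketch below computes the p-value for a single hypothetical z-statistic (`z = 1.5`, an assumed value) under the three usual alternatives.

```python
from scipy.stats import norm

z = 1.5  # hypothetical test statistic, for illustration only

# H1: mean != mu_0  (two-sided, like the test-score example above)
p_two_sided = 2 * norm.sf(abs(z))

# H1: mean > mu_0   (one-sided, upper tail)
p_greater = norm.sf(z)

# H1: mean < mu_0   (one-sided, lower tail)
p_less = norm.cdf(z)

print(f"two-sided: {p_two_sided:.4f}")
print(f"greater:   {p_greater:.4f}")
print(f"less:      {p_less:.4f}")
```

For a positive statistic, the two-sided p-value is exactly twice the upper-tail p-value, which is why the direction of H₁ should be fixed before looking at the data.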


Explain the significance level in hypothesis testing and its role in deciding
the outcome of a test.

Significance Level in Hypothesis Testing
The significance level (α) is the probability threshold used to decide whether to reject the null hypothesis (H₀). It represents the risk of making a Type I error, which means rejecting H₀ when it is actually true. Common values are 0.05, 0.01, or 0.10.
During hypothesis testing, the p-value is compared to α:
- If p-value ≤ α, we reject H₀, indicating the result is statistically significant.
- If p-value > α, we fail to reject H₀, meaning there's insufficient evidence against it.
The choice of α affects the sensitivity of the test and depends on the context: a lower α is used in critical fields like medicine to minimize false positives, at the cost of a higher chance of missing a real effect (a Type II error).
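
One way to see that α is the Type I error rate is to simulate many experiments in which H₀ is true and count how often it is rejected anyway. This is a minimal sketch; the population parameters, sample size, and seed are hypothetical.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)        # seeded for reproducibility
alpha = 0.05
mu_0, sigma, n = 50.0, 5.0, 30        # hypothetical population and sample size
n_experiments = 10_000

# Every sample is drawn with the true mean equal to mu_0, so H0 is true
samples = rng.normal(loc=mu_0, scale=sigma, size=(n_experiments, n))

# Two-sided z-test for each simulated experiment
z = (samples.mean(axis=1) - mu_0) / (sigma / np.sqrt(n))
p_values = 2 * norm.sf(np.abs(z))

# Fraction of experiments that (wrongly) reject H0
false_positive_rate = np.mean(p_values <= alpha)
print(f"Observed Type I error rate: {false_positive_rate:.3f} (alpha = {alpha})")
```

The observed rejection rate should land close to α, up to simulation noise.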



What are Type I and Type II errors? Give examples of each.

 Type I and Type II Errors
In hypothesis testing:
- Type I Error occurs when the null hypothesis (H₀) is rejected even though it is true.
Example: A healthy person is wrongly diagnosed with a disease.
- Type II Error happens when the null hypothesis is not rejected even though it is false.
Example: A sick person is wrongly diagnosed as healthy.
These errors reflect the risks in decision-making: Type I is a false positive, and Type II is a false negative. Balancing both is crucial for reliable testing.
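
The trade-off between the two errors can be made concrete for a two-sided z-test. The helper below, `type_ii_error`, is a hypothetical function written for this sketch; the means, standard deviation, and sample size are assumed values.

```python
from math import sqrt
from scipy.stats import norm

def type_ii_error(alpha, mu_0, mu_true, sigma, n):
    """Probability of failing to reject H0 in a two-sided z-test
    when the true mean is mu_true."""
    z_crit = norm.ppf(1 - alpha / 2)
    # How far the true mean sits from mu_0, in standard errors
    shift = (mu_true - mu_0) / (sigma / sqrt(n))
    # Power: probability that the statistic falls in either rejection region
    power = norm.sf(z_crit - shift) + norm.cdf(-z_crit - shift)
    return 1 - power

# Hypothetical scenario: H0 says the mean is 50, but the true mean is 52
betas = {}
for alpha in (0.10, 0.05, 0.01):
    betas[alpha] = type_ii_error(alpha, mu_0=50.0, mu_true=52.0, sigma=5.0, n=30)
    print(f"alpha = {alpha:.2f} -> Type II error (beta) = {betas[alpha]:.3f}")
```

Lowering α makes false positives rarer but raises β for the same sample size; increasing n is the usual way to reduce both.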


What is the difference between a Z-test and a T-test? Explain when to use
each

Difference Between Z-test and T-test
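- A Z-test is used when the population standard deviation is known. With a large sample (typically n > 30) it is also used as an approximation when the sample standard deviation stands in for the unknown population value.
- A T-test is used when the population standard deviation is unknown and must be estimated from the sample. The distinction matters most for small samples (typically n ≤ 30); for large samples the t and normal distributions are nearly identical.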


When to Use Each
- Z-test: Comparing sample mean to population mean with known variance.
Example: Testing average height of students when population variance is known.
- T-test: Comparing means when variance is unknown or comparing two sample means.
Example: Testing effectiveness of a new drug with a small sample group.

In short, use a Z-test when the population variance is known (or the sample is large enough for the normal approximation to hold), and a T-test when the variance is unknown, particularly with small samples.
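
A quick way to see where the n ≈ 30 rule of thumb comes from is to compare the two-sided 5% critical values of the t and standard normal distributions as the sample size grows. The sample sizes below are arbitrary illustrative choices.

```python
from scipy.stats import norm, t

alpha = 0.05
z_crit = norm.ppf(1 - alpha / 2)

# t critical values for several (illustrative) sample sizes
t_crits = {}
for n in (5, 10, 30, 100, 1000):
    t_crits[n] = t.ppf(1 - alpha / 2, df=n - 1)
    print(f"n = {n:4d}: t critical = {t_crits[n]:.3f}, z critical = {z_crit:.3f}")
```

The t critical value is always larger than the z critical value, reflecting the extra uncertainty from estimating the standard deviation, and the gap shrinks as n increases.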


Write a Python program to generate a binomial distribution with n=10 and
p=0.5, then plot its histogram.

In [None]:
import numpy as np
import matplotlib.pyplot as plt

# Parameters
n = 10       # Number of trials
p = 0.5      # Probability of success
size = 1000  # Number of samples

# Generate binomial distribution
data = np.random.binomial(n, p, size)

# Plot histogram
plt.hist(data, bins=range(n+2), edgecolor='black', align='left')
plt.title('Binomial Distribution (n=10, p=0.5)')
plt.xlabel('Number of Successes')
plt.ylabel('Frequency')
plt.grid(True)
plt.show()

What It Does:
- Uses numpy to simulate 1000 outcomes of a binomial experiment.
- Uses matplotlib to display a histogram showing the frequency of each outcome.
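
As an optional check, the simulated frequencies can be compared with the theoretical binomial probabilities. This sketch uses NumPy's seeded `default_rng` generator rather than `np.random.binomial`, which is a choice made here for reproducibility, and `scipy.stats.binom` for the exact probabilities.

```python
import numpy as np
from scipy.stats import binom

n, p, size = 10, 0.5, 1000
rng = np.random.default_rng(0)          # seeded generator (assumption of this sketch)
data = rng.binomial(n, p, size)

k = np.arange(n + 1)                                     # possible outcomes 0..n
empirical = np.bincount(data, minlength=n + 1) / size    # observed proportions
theoretical = binom.pmf(k, n, p)                         # exact probabilities

for successes, emp, theo in zip(k, empirical, theoretical):
    print(f"{successes:2d} successes: simulated {emp:.3f}, theoretical {theo:.3f}")
```

With 1000 samples the simulated proportions should track the theoretical values closely, with the peak at 5 successes.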


Implement hypothesis testing using Z-statistics for a sample dataset in
Python. Show the Python code and interpret the results.
sample_data = [49.1, 50.2, 51.0, 48.7, 50.5, 49.8, 50.3, 50.7, 50.2, 49.6,
50.1, 49.9, 50.8, 50.4, 48.9, 50.6, 50.0, 49.7, 50.2, 49.5,
50.1, 50.3, 50.4, 50.5, 50.0, 50.7, 49.3, 49.8, 50.2, 50.9,
50.3, 50.4, 50.0, 49.7, 50.5, 49.9]

In [None]:
import numpy as np
from scipy.stats import norm

# Sample data
sample_data = [49.1, 50.2, 51.0, 48.7, 50.5, 49.8, 50.3, 50.7, 50.2, 49.6,
               50.1, 49.9, 50.8, 50.4, 48.9, 50.6, 50.0, 49.7, 50.2, 49.5,
               50.1, 50.3, 50.4, 50.5, 50.0, 50.7, 49.3, 49.8, 50.2, 50.9,
               50.3, 50.4, 50.0, 49.7, 50.5, 49.9]

# Parameters
mu = 50           # Hypothesized population mean
sigma = 0.5       # Known population standard deviation
n = len(sample_data)

# Sample statistics
sample_mean = np.mean(sample_data)

# Z-statistic
z = (sample_mean - mu) / (sigma / np.sqrt(n))

# p-value (two-tailed)
p_value = 2 * norm.sf(abs(z))  # sf is the survival function, 1 - cdf

# Output results
print(f"Sample Mean = {sample_mean:.2f}")
print(f"Z-statistic = {z:.2f}")
print(f"p-value = {p_value:.4f}")

# Decision
alpha = 0.05
if p_value <= alpha:
    print("Reject the null hypothesis: The sample mean is significantly different from 50.")
else:
    print("Fail to reject the null hypothesis: No significant difference from 50.")

Interpretation
- Sample Mean ≈ 50.09
- Z-statistic ≈ 1.07, since (50.089 − 50) / (0.5 / √36) ≈ 0.089 / 0.0833
- p-value ≈ 0.286
Since p-value > 0.05, we fail to reject the null hypothesis. This means there's no statistically significant difference between the sample mean and the hypothesized population mean of 50.


Write a Python script to simulate data from a normal distribution and
calculate the 95% confidence interval for its mean. Plot the data using Matplotlib

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

# Step 1: Simulate data from a normal distribution
np.random.seed(42)  # For reproducibility
data = np.random.normal(loc=100, scale=15, size=100)  # mean=100, std=15, n=100

# Step 2: Calculate sample mean and standard error
mean = np.mean(data)
sem = stats.sem(data)  # Standard error of the mean

# Step 3: Calculate 95% confidence interval
# (t distribution, because the standard deviation is estimated from the sample)
confidence = 0.95
ci = stats.t.interval(confidence, df=len(data) - 1, loc=mean, scale=sem)

# Step 4: Print results
print(f"Sample Mean: {mean:.2f}")
print(f"95% Confidence Interval: ({ci[0]:.2f}, {ci[1]:.2f})")

# Step 5: Plot histogram
plt.hist(data, bins=15, edgecolor='black', alpha=0.7)
plt.axvline(ci[0], color='red', linestyle='dashed', label='Lower CI')
plt.axvline(ci[1], color='green', linestyle='dashed', label='Upper CI')
plt.axvline(mean, color='blue', linestyle='solid', label='Mean')
plt.title('Normal Distribution with 95% Confidence Interval')
plt.xlabel('Value')
plt.ylabel('Frequency')
plt.legend()
plt.grid(True)
plt.show()

 Interpretation
- The script generates 100 data points from a normal distribution with mean 100 and standard deviation 15.
- It calculates the 95% confidence interval for the mean: a range built so that, over repeated sampling, about 95% of such intervals would contain the true mean.
- The plot displays the distribution and marks the mean and confidence bounds.
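
The "95%" in a confidence interval describes the procedure rather than any single interval. The sketch below repeats the simulation many times and counts how often the interval contains the true mean. The repeat count and seed are arbitrary, and the interval uses a t critical value, which is an assumption of this sketch.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
true_mean, true_sd, n = 100, 15, 100   # same population settings as the script above
n_repeats = 2000                       # number of simulated samples (arbitrary)
confidence = 0.95

# One row per simulated sample
samples = rng.normal(true_mean, true_sd, size=(n_repeats, n))
means = samples.mean(axis=1)
sems = samples.std(axis=1, ddof=1) / np.sqrt(n)   # standard error of each mean

# Half-width of each interval from the t distribution with n - 1 degrees of freedom
t_crit = stats.t.ppf((1 + confidence) / 2, df=n - 1)
lower = means - t_crit * sems
upper = means + t_crit * sems

# Fraction of intervals that contain the true mean
coverage = np.mean((lower <= true_mean) & (true_mean <= upper))
print(f"Coverage over {n_repeats} samples: {coverage:.3f}")
```

The coverage should come out near 0.95, with small deviations due to simulation noise.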



Write a Python function to calculate the Z-scores from a dataset and
visualize the standardized data using a histogram. Explain what the Z-scores represent
in terms of standard deviations from the mean

In [None]:
import numpy as np
import matplotlib.pyplot as plt

def calculate_and_plot_z_scores(data):
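    # Step 1: Calculate mean and standard deviation
    # (np.std uses the population formula, ddof=0, by default)
    data = np.asarray(data, dtype=float)
    mean = np.mean(data)
    std_dev = np.std(data)

    # Step 2: Calculate Z-scores (vectorized over the whole array)
    z_scores = (data - mean) / std_dev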

    # Step 3: Plot histogram of Z-scores
    plt.hist(z_scores, bins=10, edgecolor='black', alpha=0.7)
    plt.title('Histogram of Z-scores')
    plt.xlabel('Z-score')
    plt.ylabel('Frequency')
    plt.grid(True)
    plt.show()

    return z_scores

# Example usage
sample_data = [49.1, 50.2, 51.0, 48.7, 50.5, 49.8, 50.3, 50.7, 50.2, 49.6,
               50.1, 49.9, 50.8, 50.4, 48.9, 50.6, 50.0, 49.7, 50.2, 49.5,
               50.1, 50.3, 50.4, 50.5, 50.0, 50.7, 49.3, 49.8, 50.2, 50.9,
               50.3, 50.4, 50.0, 49.7, 50.5, 49.9]

z_scores = calculate_and_plot_z_scores(sample_data)

What Z-scores Represent
- A Z-score tells you how many standard deviations a data point is from the mean.
- Z = 0 → the value is exactly at the mean.
- Z > 0 → the value is above the mean.
- Z < 0 → the value is below the mean.
- For example, a Z-score of +2 means the value is 2 standard deviations above the mean.
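
A small worked example makes the interpretation concrete. The dataset below is hypothetical and chosen so that the mean is 5 and the population standard deviation is 2; the manual calculation is cross-checked against `scipy.stats.zscore`, which also uses the population formula by default.

```python
import numpy as np
from scipy import stats

# Hypothetical data with mean 5 and population standard deviation 2
data = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

# Manual standardization: subtract the mean, divide by the standard deviation
z_manual = (data - data.mean()) / data.std()

# Library equivalent (ddof=0 by default, matching np.std)
z_scipy = stats.zscore(data)

for value, z in zip(data, z_manual):
    print(f"value = {value:.0f} -> z = {z:+.1f}")

# Standardized data always has mean 0 and standard deviation 1
print(f"mean of z-scores = {z_manual.mean():.2f}, std of z-scores = {z_manual.std():.2f}")
```

Here the value 9 is 4 units above the mean of 5, and 4 / 2 gives a Z-score of +2, that is, two standard deviations above the mean.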
