## STATISTICS ASSIGNMENT PART 2



# 1. What is hypothesis testing in statistics?

Hypothesis testing is a statistical method used to make decisions or inferences about a population based on sample data. It involves formulating two opposing hypotheses and using sample data to determine which one is more likely to be true.

---

# 2. What is the null hypothesis, and how does it differ from the alternative hypothesis?

* **Null Hypothesis (H₀):** The default assumption that there is no effect or no difference.
* **Alternative Hypothesis (H₁ or Ha):** Suggests that there is an effect or a difference.

**Example:**

* H₀: μ = 50 (the population mean is 50)
* Ha: μ ≠ 50 (the population mean is not 50)

---

# 3. What is the significance level in hypothesis testing, and why is it important?

* **Significance level (α):** The threshold for deciding whether to reject the null hypothesis, commonly set at 0.05 (5%).
* It represents the probability of making a **Type I error** (rejecting a true null hypothesis).

---

# 4. What does a P-value represent in hypothesis testing?

The **P-value** is the probability of observing results at least as extreme as the sample results, assuming the null hypothesis is true. A smaller P-value indicates stronger evidence against the null hypothesis.

---

# 5. How do you interpret the P-value in hypothesis testing?

* **P ≤ α:** Reject H₀ (evidence suggests the effect is significant)
* **P > α:** Fail to reject H₀ (not enough evidence to support Ha)
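
The decision rule above can be sketched as a small helper. The function name `decide` and the example P-values are illustrative assumptions, not part of the assignment.

```python
def decide(p_value, alpha=0.05):
    """Return the test decision for a P-value at significance level alpha."""
    if p_value <= alpha:
        return "Reject H0"
    return "Fail to reject H0"

print(decide(0.03))  # Reject H0
print(decide(0.20))  # Fail to reject H0
```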

---

# 6. What are Type 1 and Type 2 errors in hypothesis testing?

* **Type I Error (False Positive):** Rejecting H₀ when it is actually true. Probability = α.
* **Type II Error (False Negative):** Failing to reject H₀ when it is false. Probability = β.
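
Both error rates can be estimated by simulation. The sketch below assumes a two-tailed Z-test with hypothetical values (μ₀ = 50, σ = 10, n = 30, a true mean of 55 under the alternative); none of these numbers come from the assignment.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)                # fixed seed so the run is repeatable
mu0, sigma, n, alpha = 50.0, 10.0, 30, 0.05   # hypothetical test setup
trials = 2000
z_crit = norm.ppf(1 - alpha / 2)              # two-tailed critical value

def rejection_rate(true_mean):
    """Fraction of simulated samples for which the Z-test rejects H0: mu = mu0."""
    samples = rng.normal(true_mean, sigma, size=(trials, n))
    z = (samples.mean(axis=1) - mu0) / (sigma / np.sqrt(n))
    return np.mean(np.abs(z) > z_crit)

type_1_rate = rejection_rate(mu0)             # H0 true: every rejection is a Type I error
type_2_rate = 1 - rejection_rate(mu0 + 5.0)   # H0 false: every non-rejection is a Type II error

print(f"Estimated Type I error rate:  {type_1_rate:.3f}")
print(f"Estimated Type II error rate: {type_2_rate:.3f}")
```

The estimated Type I rate should land close to α, while the Type II rate depends on how far the true mean is from μ₀ and on the sample size.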

---

# 7. What is the difference between a one-tailed and a two-tailed test in hypothesis testing?

* **One-tailed test:** Tests for an effect in **one direction** (e.g., μ > 50)
* **Two-tailed test:** Tests for an effect in **either direction** (e.g., μ ≠ 50)
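
The practical difference shows up in the P-value. A minimal sketch, assuming a hypothetical Z-statistic of 1.8:

```python
from scipy.stats import norm

z = 1.8  # hypothetical test statistic, chosen for illustration

p_one_tailed = norm.sf(z)            # Ha: mu > mu0 (right tail only)
p_two_tailed = 2 * norm.sf(abs(z))   # Ha: mu != mu0 (both tails)

print(f"One-tailed P-value: {p_one_tailed:.4f}")
print(f"Two-tailed P-value: {p_two_tailed:.4f}")
```

With α = 0.05, this statistic is significant in the one-tailed test but not in the two-tailed test, which is why the direction of the alternative should be fixed before looking at the data.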

---

# 8. What is the Z-test, and when is it used in hypothesis testing?

A **Z-test** compares a sample mean to a population mean. It is used when:

* The population standard deviation is known.
* The sample size is large (n > 30 is a common rule of thumb).

---

# 9. How do you calculate the Z-score, and what does it represent in hypothesis testing?

**Z = (X̄ - μ) / (σ / √n)**

* X̄ = sample mean
* μ = population mean
* σ = population standard deviation
* n = sample size

The Z-score tells how many standard errors (σ / √n) the sample mean lies from the hypothesized population mean.
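
As a minimal worked example of the formula above, assuming hypothetical values (sample mean 52, population mean 50, σ = 10, n = 25) that are not from the assignment; the helper name `z_score` is also illustrative:

```python
import math

def z_score(sample_mean, pop_mean, pop_std, n):
    """Z = (sample mean - population mean) / (sigma / sqrt(n))."""
    return (sample_mean - pop_mean) / (pop_std / math.sqrt(n))

z = z_score(52, 50, 10, 25)
print(z)  # 1.0
```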

---

# 10. What is the T-distribution, and when should it be used instead of the normal distribution?

The **T-distribution** is used when:

* The population standard deviation is unknown and must be estimated from the sample.
* The sample size is small (n < 30).

It is similar to the normal distribution but has heavier tails, and it approaches the normal distribution as the sample size grows.
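
The heavier tails can be seen by comparing two-tailed 95% critical values. This is a small sketch using `scipy.stats`; the degrees of freedom (5, 30, 1000) are arbitrary choices for illustration.

```python
from scipy.stats import norm, t

z_crit = norm.ppf(0.975)  # two-tailed 95% critical value for the normal distribution
t_crits = {df: t.ppf(0.975, df) for df in (5, 30, 1000)}

print(f"normal:      {z_crit:.3f}")
for df, crit in t_crits.items():
    print(f"t (df={df}): {crit:.3f}")
```

The T critical values are larger for small degrees of freedom and shrink toward the normal value as the degrees of freedom increase.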

---

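# 11. What is the difference between a Z-test and a T-test?

* **Z-test:** Used when σ is known; with a large sample (n > 30) it is also a common approximation when σ is unknown.
* **T-test:** Used when σ is unknown and estimated by the sample standard deviation (s), which matters most for small samples (n < 30).

Because s is only an estimate of σ, the T-test uses the T-distribution to account for the extra uncertainty.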

---

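# 12. What is the T-test, and how is it used in hypothesis testing?

A **T-test** assesses whether a mean differs from a hypothesized value, or whether the means of two groups differ, when the population standard deviation is unknown.

Types include:

* **One-sample T-test:** compares one sample mean to a hypothesized value
* **Two-sample (independent) T-test:** compares the means of two independent groups
* **Paired T-test:** compares two measurements taken on the same subjects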

---

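# 13. What is the relationship between Z-test and T-test in hypothesis testing?

Both are used to test hypotheses about means:

* They follow the same logic and the same form of test statistic: (estimate − hypothesized value) / standard error.
* The T-test replaces σ with the sample standard deviation s and uses the T-distribution; as n grows, the T-distribution converges to the normal distribution and the two tests give nearly identical results.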

---

# 14. What is a confidence interval, and how is it used to interpret statistical results?

A **confidence interval (CI)** estimates the range in which a population parameter likely falls, with a certain level of confidence (e.g., 95%).

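**Example:** If a 95% CI for μ is (48, 52), we are 95% confident the true mean lies within that range. More precisely, if the sampling were repeated many times, about 95% of the intervals built this way would contain the true mean.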

---

# 15. What is the margin of error, and how does it affect the confidence interval?

The **margin of error (ME)** is the half-width of the confidence interval around the sample estimate:

**CI = X̄ ± ME**

Larger ME → wider interval → less precision.
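
A minimal sketch of the relationship, assuming the standard Z-based formula ME = z* × σ / √n and hypothetical summary statistics (mean 50, σ = 10, n = 100) that are not from the assignment:

```python
import math
from scipy.stats import norm

sample_mean, sigma, n = 50.0, 10.0, 100   # hypothetical summary statistics
confidence = 0.95

z_star = norm.ppf(1 - (1 - confidence) / 2)   # critical value, about 1.96 for 95%
me = z_star * sigma / math.sqrt(n)            # margin of error
ci = (sample_mean - me, sample_mean + me)     # CI = mean ± ME

print(f"ME = {me:.2f}, CI = ({ci[0]:.2f}, {ci[1]:.2f})")
# ME = 1.96, CI = (48.04, 51.96)
```

Raising the confidence level or σ increases ME, while a larger n decreases it.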

---

# 16. How is Bayes' Theorem used in statistics, and what is its significance?

**Bayes' Theorem** calculates conditional probabilities:

**P(A|B) = [P(B|A) × P(A)] / P(B)**

It updates beliefs (probabilities) with new evidence and is foundational in **Bayesian statistics**.
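
As an illustration of updating a belief with new evidence, the sketch below applies the formula to a hypothetical diagnostic test. The prevalence, sensitivity, and false positive rate are made-up numbers chosen for the example.

```python
prior = 0.01                 # P(A): hypothetical prevalence of the condition
sensitivity = 0.95           # P(B|A): positive test given the condition
false_positive_rate = 0.05   # P(B|not A): positive test without the condition

# P(B) via the law of total probability
evidence = sensitivity * prior + false_positive_rate * (1 - prior)

# Bayes' Theorem: P(A|B) = P(B|A) * P(A) / P(B)
posterior = sensitivity * prior / evidence

print(f"P(condition | positive test) = {posterior:.3f}")  # 0.161
```

Even with an accurate test, the posterior stays low because the prior is small, which is the kind of correction Bayes' Theorem makes explicit.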

---

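# 17. What is the Chi-square distribution, and when is it used?

A **Chi-square distribution** is the distribution of a sum of squared independent standard normal variables. It is used with categorical data, most often in tests of independence and goodness-of-fit, where the test statistic is built from squared deviations between observed and expected counts.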

---

# 18. What is the Chi-square goodness of fit test, and how is it applied?

It checks whether observed frequencies match expected frequencies.

* **H₀:** Observed = Expected
* **Ha:** Observed ≠ Expected

Test statistic: **χ² = Σ[(O - E)² / E]**
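
The test statistic can be computed directly from the formula. The sketch below assumes hypothetical counts from 120 rolls of a die tested against a fair-die expectation, with k − 1 = 5 degrees of freedom.

```python
import numpy as np
from scipy.stats import chi2

observed = np.array([18, 22, 20, 25, 15, 20])       # hypothetical counts for faces 1-6
expected = np.full(6, observed.sum() / 6)            # fair die: 20 per face

stat = np.sum((observed - expected) ** 2 / expected) # chi-square = sum((O - E)^2 / E)
p_value = chi2.sf(stat, df=len(observed) - 1)        # upper-tail probability

print(f"Chi-square = {stat:.2f}, P-value = {p_value:.4f}")
```

`scipy.stats.chisquare(observed, expected)` returns the same statistic and P-value in one call.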

---

# 19. What is the F-distribution, and when is it used in hypothesis testing?

The **F-distribution** is used to compare **variances**. It’s asymmetric and used in **ANOVA** and **regression analysis**.

---

# 20. What is an ANOVA test, and what are its assumptions?

**ANOVA (Analysis of Variance)** tests for differences among **3 or more group means**.

**Assumptions:**

* Independence
* Normality
* Homogeneity of variance
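
A minimal one-way ANOVA sketch using `scipy.stats.f_oneway`. The three groups are small hypothetical samples made up for illustration, and the assumptions listed above are taken as given rather than checked.

```python
from scipy.stats import f_oneway

# Hypothetical measurements for three independent groups
group_a = [20, 21, 19, 22, 20]
group_b = [25, 27, 26, 24, 26]
group_c = [30, 29, 31, 32, 30]

f_stat, p_value = f_oneway(group_a, group_b, group_c)
print(f"F = {f_stat:.2f}, P-value = {p_value:.6f}")

if p_value <= 0.05:
    print("Reject H0: at least one group mean differs.")
else:
    print("Fail to reject H0: no evidence the group means differ.")
```

A significant result only says that some mean differs; identifying which groups differ requires a follow-up (post hoc) comparison.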

---

# 21. What are the different types of ANOVA tests?

* **One-way ANOVA:** One independent variable
* **Two-way ANOVA:** Two independent variables
* **Repeated measures ANOVA:** Same subjects under different conditions

---

# 22. What is the F-test, and how does it relate to hypothesis testing?

An **F-test** compares **two variances** or **multiple group means (via ANOVA)**.
It’s used to test:

* Equality of variances
* Model fit in regression
* Mean differences in ANOVA
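
For the equality-of-variances case, the F-statistic is the ratio of two sample variances. The sketch below computes it by hand with the F-distribution from `scipy.stats`; the two samples are hypothetical, and the test assumes both come from normal populations.

```python
import numpy as np
from scipy.stats import f

# Hypothetical samples with visibly different spread
a = np.array([12.1, 11.8, 12.5, 12.0, 11.6, 12.3, 12.2, 11.9])
b = np.array([12.9, 10.4, 13.6, 11.1, 12.8, 10.2, 13.3, 11.7])

var_a = np.var(a, ddof=1)   # sample variances (ddof=1)
var_b = np.var(b, ddof=1)

f_stat = var_b / var_a                  # ratio of sample variances
dfn, dfd = len(b) - 1, len(a) - 1       # numerator and denominator degrees of freedom

# Two-tailed P-value: double the smaller tail probability, capped at 1
p_value = min(1.0, 2 * min(f.sf(f_stat, dfn, dfd), f.cdf(f_stat, dfn, dfd)))

print(f"F = {f_stat:.2f}, P-value = {p_value:.4f}")
```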

---




# PRACTICAL QUESTIONS 1

---

### 1.  Write a Python program to generate a random variable and display its value.

```python
import numpy as np

x = np.random.rand()
print("Random Variable:", x)
```

---

### 2. Generate a discrete uniform distribution using Python and plot the probability mass function (PMF).

```python
import matplotlib.pyplot as plt
from scipy.stats import randint

values = randint.rvs(1, 7, size=1000)
pmf_values = [np.mean(values == k) for k in range(1, 7)]

plt.bar(range(1, 7), pmf_values)
plt.title("PMF of Discrete Uniform Distribution (1-6)")
plt.xlabel("Outcome")
plt.ylabel("Probability")
plt.show()
```

---

### 3.  Write a Python function to calculate the probability distribution function (PDF) of a Bernoulli distribution.

```python
from scipy.stats import bernoulli

def bernoulli_pdf(p, x):
    return bernoulli.pmf(x, p)

print("P(X=1):", bernoulli_pdf(0.6, 1))
```

---

### 4. Simulate a binomial distribution (n=10, p=0.5) and plot its histogram.

```python
binom_data = np.random.binomial(n=10, p=0.5, size=1000)
plt.hist(binom_data, bins=np.arange(-0.5, 11.5, 1), edgecolor='black', density=True)
plt.title("Binomial Distribution Histogram (n=10, p=0.5)")
plt.xlabel("Number of Successes")
plt.ylabel("Probability")
plt.show()
```

---

### 5.  Create a Poisson distribution and visualize it using Python.

```python
from scipy.stats import poisson

mu = 3
x = np.arange(0, 10)
pmf = poisson.pmf(x, mu)

plt.bar(x, pmf)
plt.title("Poisson Distribution (λ = 3)")
plt.xlabel("Events")
plt.ylabel("Probability")
plt.show()
```

---

### 6. Write a Python program to calculate and plot the cumulative distribution function (CDF) of a discrete uniform distribution.

```python
cdf_values = [randint.cdf(k, 1, 7) for k in range(1, 7)]

plt.step(range(1, 7), cdf_values, where='post')  # a discrete CDF jumps at each value and stays flat until the next
plt.title("CDF of Discrete Uniform Distribution")
plt.xlabel("x")
plt.ylabel("CDF")
plt.grid(True)
plt.show()
```

---

### 7.  Generate a continuous uniform distribution using NumPy and visualize it.

```python
from scipy.stats import uniform

data = uniform.rvs(loc=0, scale=1, size=1000)
plt.hist(data, bins=20, density=True, edgecolor='black')
plt.title("Continuous Uniform Distribution Histogram")
plt.xlabel("Value")
plt.ylabel("Density")
plt.show()
```

---

### 8.  Simulate data from a normal distribution and plot its histogram.

```python
norm_data = np.random.normal(loc=0, scale=1, size=1000)
plt.hist(norm_data, bins=30, density=True, edgecolor='black')
plt.title("Normal Distribution Histogram")
plt.xlabel("Value")
plt.ylabel("Density")
plt.show()
```

---

### 9.  Write a Python function to calculate Z-scores from a dataset and plot them.

```python
from scipy.stats import zscore

z_scores = zscore(norm_data)
plt.plot(z_scores, 'o')
plt.title("Z-scores of Data")
plt.xlabel("Index")
plt.ylabel("Z-score")
plt.show()
```

---

### 10. Implement the Central Limit Theorem (CLT) using Python for a non-normal distribution.

```python
samples = [np.mean(np.random.exponential(scale=2.0, size=30)) for _ in range(1000)]
plt.hist(samples, bins=30, density=True, edgecolor='black')
plt.title("CLT with Exponential Distribution (Sample Mean)")
plt.xlabel("Sample Mean")
plt.ylabel("Density")
plt.show()
```

---

### 11. Simulate multiple samples from a normal distribution and verify the Central Limit Theorem.

```python
samples = [np.mean(np.random.normal(0, 1, 30)) for _ in range(1000)]
plt.hist(samples, bins=30, edgecolor='black', density=True)
plt.title("CLT with Normal Distribution")
plt.show()
```

---

### 12.  Write a Python function to calculate and plot the standard normal distribution (mean = 0, std = 1).

```python
x = np.linspace(-4, 4, 1000)
pdf = (1/np.sqrt(2 * np.pi)) * np.exp(-0.5 * x**2)

plt.plot(x, pdf)
plt.title("Standard Normal Distribution PDF")
plt.xlabel("x")
plt.ylabel("Density")
plt.grid(True)
plt.show()
```

---

### 13.  Generate random variables and calculate their corresponding probabilities using the binomial distribution.

```python
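from scipy.stats import binom

n, p = 10, 0.5
k = np.arange(0, 11)
# Simulate binomial random variables and compute the theoretical probability of each outcome
samples = np.random.binomial(n, p, size=1000)
probs = binom.pmf(k, n, p)

plt.hist(samples, bins=np.arange(-0.5, 11.5, 1), edgecolor='black', density=True, label='Simulated')
plt.plot(k, probs, 'o-', color='red', label='Theoretical PMF')
plt.title("Binomial Distribution Probabilities")
plt.legend()
plt.show()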
```

---

### 14.  Write a Python program to calculate the Z-score for a given data point and compare it to a standard normal distribution.

```python
def calculate_z(x, mean, std):
    return (x - mean) / std

print("Z-score of 72 (mean=70, std=5):", calculate_z(72, 70, 5))
```
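
To compare the Z-score with the standard normal distribution, one option is to look up its cumulative probability. This sketch reuses the same data point as above (x = 72, mean = 70, std = 5) and uses `scipy.stats.norm`.

```python
from scipy.stats import norm

z = (72 - 70) / 5          # same data point as above: Z = 0.4
percentile = norm.cdf(z)   # P(Z <= z) under the standard normal
upper_tail = norm.sf(z)    # P(Z > z)

print(f"Z = {z:.2f}")
print(f"P(Z <= z) = {percentile:.4f}")
print(f"P(Z > z)  = {upper_tail:.4f}")
```

A value of 72 sits at roughly the 66th percentile, so it is above average but well inside the typical range.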

---

### 15.  Implement hypothesis testing using Z-statistics for a sample dataset.

```python
from scipy.stats import norm

sample_mean = 105
population_mean = 100
std_dev = 15
n = 36

z = (sample_mean - population_mean) / (std_dev / np.sqrt(n))
p_value = 1 - norm.cdf(z)  # one-tailed (right-tailed) test: Ha is mean > 100

print("Z-Statistic:", z)
print("P-value:", p_value)
```

---

### 16. Create a confidence interval for a dataset using Python and interpret the result.

```python
import scipy.stats as stats

data = np.random.normal(loc=50, scale=10, size=100)
mean = np.mean(data)
sem = stats.sem(data)
conf = stats.t.interval(0.95, len(data)-1, loc=mean, scale=sem)

print("95% Confidence Interval:", conf)
```

---

### 17.  Generate data from a normal distribution, then calculate and interpret the confidence interval for its mean.

```python
data = np.random.normal(loc=60, scale=8, size=100)
mean = np.mean(data)
sem = stats.sem(data)
conf = stats.t.interval(0.95, len(data)-1, loc=mean, scale=sem)

print("CI for Mean:", conf)
```

---

### 18.  Write a Python script to calculate and visualize the probability density function (PDF) of a normal distribution.
```python
x = np.linspace(-4, 4, 1000)
pdf = stats.norm.pdf(x, 0, 1)
plt.plot(x, pdf)
plt.title("PDF of Normal Distribution")
plt.show()
```

---

### 19.  Use Python to calculate and interpret the cumulative distribution function (CDF) of a Poisson distribution.

```python
x = np.arange(0, 15)
cdf = poisson.cdf(x, mu=5)

plt.step(x, cdf, where='post')  # a discrete CDF jumps at each value and stays flat until the next
plt.title("CDF of Poisson Distribution (λ = 5)")
plt.grid(True)
plt.show()
```

---

### 20.  Simulate a random variable using a continuous uniform distribution and calculate its expected value.

```python
data = np.random.uniform(0, 1, 10000)
expected_value = np.mean(data)
print("Expected Value:", expected_value)
```

---

### 21. Write a Python program to compare the standard deviations of two datasets and visualize the difference.

```python
data1 = np.random.normal(0, 1, 1000)
data2 = np.random.normal(0, 2, 1000)

plt.hist(data1, bins=30, alpha=0.5, label='Std=1')
plt.hist(data2, bins=30, alpha=0.5, label='Std=2')
plt.legend()
plt.title("Comparison of Standard Deviations")
plt.show()
```

---

### 22. Calculate the range and interquartile range (IQR) of a dataset generated from a normal distribution.

```python
data = np.random.normal(50, 15, 1000)
iqr = np.percentile(data, 75) - np.percentile(data, 25)
range_val = np.ptp(data)

print("Range:", range_val)
print("IQR:", iqr)
```

---

### 23. Implement Z-score normalization on a dataset and visualize its transformation.

```python
normalized = zscore(data)
plt.hist(normalized, bins=30, edgecolor='black')
plt.title("Z-score Normalized Data")
plt.show()
```

---

### 24. Write a Python function to calculate the skewness and kurtosis of a dataset generated from a normal distribution.

```python
from scipy.stats import skew, kurtosis

print("Skewness:", skew(data))
print("Kurtosis:", kurtosis(data))
```

---


# PRACTICAL QUESTIONS PART 2


## 1.  Write a Python program to perform a Z-test for comparing a sample mean to a known population mean and interpret the result.

```python
from scipy.stats import norm
import numpy as np

def one_sample_z_test(sample, pop_mean, pop_std):
    n = len(sample)
    sample_mean = np.mean(sample)
    z = (sample_mean - pop_mean) / (pop_std / np.sqrt(n))
    p_value = 2 * (1 - norm.cdf(abs(z)))
    
    print(f"Z-statistic: {z:.4f}, P-value: {p_value:.4f}")
    if p_value < 0.05:
        print("Reject the null hypothesis.")
    else:
        print("Fail to reject the null hypothesis.")
    return z, p_value
```

---

## 2. Simulate random data to perform hypothesis testing and calculate the corresponding P-value using Python.

```python
np.random.seed(42)
sample = np.random.normal(loc=52, scale=10, size=30)
z, p = one_sample_z_test(sample, pop_mean=50, pop_std=10)
```

---
## 3.  Implement a one-sample Z-test using Python to compare the sample mean with the population mean.

```python
import numpy as np
import scipy.stats as stats

# Sample data
sample = np.array([12, 15, 14, 10, 13, 17, 16, 12, 14, 15])

# Population parameters
population_mean = 13
population_std = 2  # Assume we know the population standard deviation

# Sample statistics
sample_mean = np.mean(sample)
sample_size = len(sample)

# Compute the Z-score
z_score = (sample_mean - population_mean) / (population_std / np.sqrt(sample_size))

# Compute the p-value
p_value = 2 * stats.norm.sf(abs(z_score))  # Two-tailed test: H0 is "no difference" in either direction

# Output results
print(f"Z-score: {z_score:.4f}")
print(f"P-value: {p_value:.4f}")

# Interpretation
alpha = 0.05  # Significance level
if p_value < alpha:
    print("Reject the null hypothesis: There is a significant difference.")
else:
    print("Fail to reject the null hypothesis: No significant difference.")
```


---

## 4. Perform a two-tailed Z-test using Python and visualize the decision region on a plot.

```python
import matplotlib.pyplot as plt

def visualize_z_test(z, alpha=0.05):
    x = np.linspace(-4, 4, 1000)
    y = norm.pdf(x)
    critical = norm.ppf(1 - alpha / 2)
    
    plt.plot(x, y, label="Z-distribution")
    plt.fill_between(x, y, where=(x < -critical) | (x > critical), color='red', alpha=0.3, label='Rejection region')
    plt.axvline(z, color='blue', linestyle='--', label=f'Z = {z:.2f}')
    plt.title("Two-tailed Z-test")
    plt.legend()
    plt.show()
```

---

## 5. Create a Python function that calculates and visualizes Type 1 and Type 2 errors during hypothesis testing.

```python
def visualize_type_errors(mu0, mu1, sigma, n, alpha=0.05):
    x = np.linspace(mu0 - 4*sigma, mu1 + 4*sigma, 1000)
    se = sigma / np.sqrt(n)
    
    z_crit = norm.ppf(1 - alpha / 2)
    lower = mu0 - z_crit * se
    upper = mu0 + z_crit * se

    plt.plot(x, norm.pdf(x, mu0, se), label="Null Distribution (H0)")
    plt.plot(x, norm.pdf(x, mu1, se), label="Alternative Distribution (H1)")
    plt.axvline(lower, color='red', linestyle='--', label='Rejection boundaries')
    plt.axvline(upper, color='red', linestyle='--')
    plt.fill_between(x, norm.pdf(x, mu0, se), where=(x < lower) | (x > upper), color='red', alpha=0.3, label='Type I Error')
    plt.fill_between(x, norm.pdf(x, mu1, se), where=(x >= lower) & (x <= upper), color='blue', alpha=0.3, label='Type II Error')
    plt.legend()
    plt.title("Type I and Type II Error Visualization")
    plt.show()
```

---

## 6. Write a Python program to perform an independent T-test and interpret the results.

```python
from scipy.stats import ttest_ind

def independent_t_test(sample1, sample2):
    t_stat, p_val = ttest_ind(sample1, sample2)
    print(f"T-statistic: {t_stat:.4f}, P-value: {p_val:.4f}")
    return t_stat, p_val
```

---

## 7.  Perform a paired sample T-test using Python and visualize the comparison results.

```python
from scipy.stats import ttest_rel

def paired_t_test(sample1, sample2):
    t_stat, p_val = ttest_rel(sample1, sample2)
    print(f"Paired T-test T-stat: {t_stat:.4f}, P-value: {p_val:.4f}")
    plt.hist(np.array(sample1) - np.array(sample2), bins=10, alpha=0.7)
    plt.title("Histogram of Differences (Paired Samples)")
    plt.show()
    return t_stat, p_val
```

---

## 8. Simulate data and perform both Z-test and T-test, then compare the results using Python.

```python
from scipy.stats import ttest_1samp  # needed for the one-sample T-test below

def compare_z_t_tests(sample, pop_mean, pop_std):
    z_stat, z_p = one_sample_z_test(sample, pop_mean, pop_std)
    t_stat, t_p = ttest_1samp(sample, pop_mean)
    print(f"Z-test: Z={z_stat:.2f}, P={z_p:.4f}")
    print(f"T-test: T={t_stat:.2f}, P={t_p:.4f}")
```

---

## 9. Write a Python function to calculate the confidence interval for a sample mean and explain its significance.

```python
def confidence_interval(sample, confidence=0.95):
    mean = np.mean(sample)
    se = np.std(sample, ddof=1) / np.sqrt(len(sample))
    margin = norm.ppf(1 - (1 - confidence)/2) * se
    ci = (mean - margin, mean + margin)
    print(f"{confidence*100:.0f}% Confidence Interval: {ci}")
    return ci
```

---

## 10. Write a Python program to calculate the margin of error for a given confidence level using sample data.

```python
def margin_of_error(sample, confidence=0.95):
    se = np.std(sample, ddof=1) / np.sqrt(len(sample))
    moe = norm.ppf(1 - (1 - confidence)/2) * se
    print(f"Margin of Error at {confidence*100:.0f}%: {moe:.4f}")
    return moe
```

---

##  11. Implement a Bayesian inference method using Bayes' Theorem in Python and explain the process.

```python
def bayes_theorem(prior_A, likelihood_B_given_A, prob_B):
    posterior = (likelihood_B_given_A * prior_A) / prob_B
    print(f"Posterior Probability: {posterior:.4f}")
    return posterior
```

---

## 12. Perform a Chi-square test for independence between two categorical variables in Python.

```python
import pandas as pd
from scipy.stats import chi2_contingency

def chi_square_test(data):
    chi2, p, dof, expected = chi2_contingency(data)
    print(f"Chi2 = {chi2:.4f}, P-value = {p:.4f}, DOF = {dof}")
    return chi2, p, expected
```

---

## 13. Write a Python program to calculate the expected frequencies for a Chi-square test based on observed data.

```python
def expected_freq(data):
    _, _, _, expected = chi2_contingency(data)
    print("Expected Frequencies:")
    print(expected)
    return expected
```

---

##  14. Perform a goodness-of-fit test using Python to compare the observed data to an expected distribution.

```python
from scipy.stats import chisquare

def goodness_of_fit(observed, expected):
    chi2, p = chisquare(f_obs=observed, f_exp=expected)
    print(f"Chi2: {chi2:.4f}, P-value: {p:.4f}")
    return chi2, p
```

---


