
---

### **1) What is hypothesis testing in statistics?**
Hypothesis testing is a statistical method used to make inferences or draw conclusions about a population based on sample data. It involves two competing hypotheses:
- **Null Hypothesis (H₀)**: Assumes no effect or no difference exists in the population.
- **Alternative Hypothesis (H₁)**: Suggests that there is an effect or a difference.

The goal is to determine whether the sample data provides sufficient evidence to reject the null hypothesis.

### **2) What is the null hypothesis, and how does it differ from the alternative hypothesis?**
- **Null Hypothesis (H₀)**: Assumes no effect or no relationship. It is the hypothesis that is tested for rejection.
  - Example: "There is no difference between two groups."
- **Alternative Hypothesis (H₁ or Ha)**: The hypothesis that there is a significant effect or relationship.
  - Example: "There is a difference between two groups."

### **3) What is the significance level in hypothesis testing, and why is it important?**
The **significance level (α)** is the threshold used to determine whether to reject the null hypothesis. It defines the probability of making a **Type I error** (i.e., rejecting a true null hypothesis). A common significance level is 0.05 (5%).
- **Importance**: It controls the risk of falsely rejecting the null hypothesis. Smaller α reduces Type I error.
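
A short simulation can make the link between α and Type I error concrete. The sketch below assumes a two-tailed Z-test with a known standard deviation; the seed, sample size, and parameter values are illustrative choices, not taken from the text above.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)   # arbitrary fixed seed for reproducibility
alpha = 0.05
n_trials, n = 10_000, 30
mu0, sigma = 25.0, 5.0           # H0 is true: the data really have mean mu0

# Draw n_trials samples of size n and run a two-tailed Z-test on each
samples = rng.normal(mu0, sigma, size=(n_trials, n))
z = (samples.mean(axis=1) - mu0) / (sigma / np.sqrt(n))
p_values = 2 * stats.norm.sf(np.abs(z))

# Fraction of tests that (wrongly) reject the true null hypothesis
type1_rate = np.mean(p_values < alpha)
print(f"Observed Type I error rate: {type1_rate:.3f} (alpha = {alpha})")
```

Because the null hypothesis holds in every trial, each rejection is a false positive, and the observed rate should land close to α.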

### **4) What does a P-value represent in hypothesis testing?**
The **P-value** is the probability of observing results at least as extreme as those in the sample, assuming the null hypothesis is true. A smaller P-value indicates stronger evidence against the null hypothesis. It is not the probability that the null hypothesis itself is true.

### **5) How do you interpret the P-value in hypothesis testing?**
- **P-value < α**: Reject the null hypothesis (indicates sufficient evidence for the alternative hypothesis).
- **P-value ≥ α**: Fail to reject the null hypothesis (insufficient evidence for the alternative hypothesis).
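
The decision rule above can be sketched as a pair of small helpers. The function names `two_tailed_p_value` and `decide` are hypothetical, introduced only for illustration, and the test statistic is assumed to be standard normal under the null hypothesis.

```python
from scipy import stats

def two_tailed_p_value(z):
    """P(|Z| >= |z|) under a standard normal null distribution."""
    return 2 * stats.norm.sf(abs(z))

def decide(p_value, alpha=0.05):
    """Apply the decision rule: reject H0 only when p < alpha."""
    return "reject H0" if p_value < alpha else "fail to reject H0"

for z in (0.5, 1.96, 2.5):
    p = two_tailed_p_value(z)
    print(f"z = {z:4.2f}  p = {p:.4f}  ->  {decide(p)}")
```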

### **6) What are Type 1 and Type 2 errors in hypothesis testing?**
- **Type I Error (False Positive)**: Rejecting the null hypothesis when it is actually true.
- **Type II Error (False Negative)**: Failing to reject the null hypothesis when it is actually false.
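
For a two-tailed Z-test, the Type II error probability (β) can be computed directly once a specific true mean is assumed. The sketch below uses a hypothetical helper `type2_error`; the numbers in the example call are illustrative only.

```python
import numpy as np
from scipy import stats

def type2_error(mu0, mu_true, sigma, n, alpha=0.05):
    """Probability of failing to reject H0: mu = mu0 when the mean is mu_true."""
    z_crit = stats.norm.ppf(1 - alpha / 2)
    # Under the true mean, the Z-statistic is normal with this mean and unit variance
    shift = (mu_true - mu0) / (sigma / np.sqrt(n))
    # beta is the chance that Z lands inside the acceptance region (-z_crit, z_crit)
    return stats.norm.cdf(z_crit - shift) - stats.norm.cdf(-z_crit - shift)

beta = type2_error(mu0=25, mu_true=26, sigma=5, n=30)
print(f"beta = {beta:.3f}, power = {1 - beta:.3f}")
```

Increasing the sample size or the distance between the null and true means lowers β, which is the same as raising the power of the test.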

### **7) What is the difference between a one-tailed and a two-tailed test in hypothesis testing?**
- **One-Tailed Test**: The alternative hypothesis is directional (testing if a parameter is greater than or less than a certain value).
- **Two-Tailed Test**: The alternative hypothesis is non-directional (testing if a parameter is different from a certain value, either greater or smaller).
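
The same test statistic leads to different P-values depending on the direction of the alternative hypothesis. A minimal sketch, assuming a standard normal test statistic and a hypothetical observed value of 1.8:

```python
from scipy import stats

z = 1.8  # hypothetical observed test statistic

p_right = stats.norm.sf(z)          # H1: parameter > null value
p_left = stats.norm.cdf(z)          # H1: parameter < null value
p_two = 2 * stats.norm.sf(abs(z))   # H1: parameter != null value

print(f"right-tailed: {p_right:.4f}")
print(f"left-tailed:  {p_left:.4f}")
print(f"two-tailed:   {p_two:.4f}")
```

With this value, the right-tailed P-value falls below 0.05 while the two-tailed one does not, which is why the direction of the test must be fixed before looking at the data.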

### **8) What is the Z-test, and when is it used in hypothesis testing?**
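The **Z-test** compares a sample mean to a hypothesized population mean when the population standard deviation is known. With a large sample (typically n > 30), it is also used as an approximation in which the sample standard deviation stands in for the unknown population value.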

### **9) How do you calculate the Z-score, and what does it represent in hypothesis testing?**
The **Z-score** is calculated as:
\[ Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} \]
- \( \bar{X} \) = sample mean, \( \mu \) = population mean, \( \sigma \) = population standard deviation, \( n \) = sample size.
- **Interpretation**: It represents the number of standard errors (\( \sigma / \sqrt{n} \)) the sample mean is from the population mean.
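
As a worked example under assumed values (sample mean 24.8, population mean 25, population standard deviation 2, sample size 10; all hypothetical):
\[ Z = \frac{24.8 - 25}{2 / \sqrt{10}} \approx \frac{-0.2}{0.632} \approx -0.32 \]
The sample mean sits about a third of a standard error below the population mean, well inside the ±1.96 cutoffs of a two-tailed test at α = 0.05.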

### **10) What is the T-distribution, and when should it be used instead of the normal distribution?**
The **T-distribution** is used when the population standard deviation is unknown and must be estimated from the sample, which matters most when the sample size is small (typically n < 30). It has heavier tails than the normal distribution and approaches it as the degrees of freedom increase.
- **When to use**: When the population standard deviation is unknown, especially with small samples.
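
The heavier tails show up directly in the critical values. The sketch below compares two-tailed 5% critical values from `scipy.stats.t` and `scipy.stats.norm`; the degrees of freedom listed are arbitrary illustrative choices.

```python
from scipy import stats

z_crit = stats.norm.ppf(0.975)  # two-tailed 5% critical value, normal
print(f"normal:       {z_crit:.3f}")

for df in (4, 9, 29, 99, 999):
    t_crit = stats.t.ppf(0.975, df=df)
    print(f"t (df={df:>3}):   {t_crit:.3f}")
```

The T critical value is always larger than the normal one, and the gap shrinks as the degrees of freedom grow.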

### **11) What is the difference between a Z-test and a T-test?**
- **Z-test**: Used when the population standard deviation is known, or as an approximation for large samples (n > 30).
- **T-test**: Used when the population standard deviation is unknown and estimated from the sample, especially for small samples (n < 30).

### **12) What is the T-test, and how is it used in hypothesis testing?**
The **T-test** is used to determine whether a sample mean differs significantly from a reference value or from the mean of another group when the population standard deviation is unknown.
- **Types**:
  - **One-sample T-test**: Compares a sample mean to a known value.
  - **Independent T-test**: Compares the means of two independent groups.
  - **Paired T-test**: Compares means from the same group at different times or conditions.

### **13) What is the relationship between Z-test and T-test in hypothesis testing?**
Both tests compare a standardized mean difference to a reference distribution, and the T-test converges to the Z-test as the sample size grows. The difference is that:
- **Z-test** uses the normal distribution and is used for large samples or known population standard deviations.
- **T-test** uses the T-distribution and is used for small samples or unknown population standard deviations.

### **14) What is a confidence interval, and how is it used to interpret statistical results?**
A **confidence interval (CI)** is a range of values that estimates the true population parameter with a specified level of confidence (e.g., 95%).
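- **Interpretation**: A 95% CI comes from a procedure that, over repeated sampling, captures the true parameter about 95% of the time. Informally, we are 95% confident that the true parameter lies within the interval.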
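
The repeated-sampling meaning of a confidence level can be checked by simulation. The sketch below assumes normally distributed data and a t-based interval for the mean; the seed and parameter values are illustrative.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)   # arbitrary fixed seed
true_mean, sigma, n, n_trials = 25.0, 5.0, 30, 5_000

# Build one 95% interval per simulated sample
samples = rng.normal(true_mean, sigma, size=(n_trials, n))
means = samples.mean(axis=1)
std_errs = samples.std(axis=1, ddof=1) / np.sqrt(n)
t_crit = stats.t.ppf(0.975, df=n - 1)

lower = means - t_crit * std_errs
upper = means + t_crit * std_errs

# Share of intervals that contain the true mean
coverage = np.mean((lower <= true_mean) & (true_mean <= upper))
print(f"Fraction of intervals containing the true mean: {coverage:.3f}")
```

Any single interval either contains the true mean or it does not; the 95% describes how often the procedure succeeds across many samples.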

### **15) What is the margin of error, and how does it affect the confidence interval?**
The **margin of error** is the half-width of the confidence interval: the amount added to and subtracted from the sample statistic. It grows with the variability of the data and the confidence level, and shrinks as the sample size increases.
- **Effect**: Larger margins of error result in wider confidence intervals, indicating more uncertainty.
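
To see how sample size and confidence level drive the margin of error, here is a minimal sketch for a t-based interval. The helper `margin_from_summary` and the sample standard deviation of 5.0 are assumptions made for illustration.

```python
import numpy as np
from scipy import stats

def margin_from_summary(sample_std, n, confidence_level=0.95):
    """Half-width of a t-based confidence interval for a mean."""
    t_crit = stats.t.ppf((1 + confidence_level) / 2, df=n - 1)
    return t_crit * sample_std / np.sqrt(n)

# Same spread, growing sample size
for n in (10, 40, 160):
    print(f"n = {n:>3}: margin of error = {margin_from_summary(5.0, n):.3f}")

# Same sample, higher confidence level
print(f"99% level, n = 40: {margin_from_summary(5.0, 40, 0.99):.3f}")
```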

### **16) How is Bayes' Theorem used in statistics, and what is its significance?**
**Bayes' Theorem** updates the probability of a hypothesis based on new evidence. It combines prior knowledge (prior probability) and observed data (likelihood) to calculate the posterior probability.
- **Significance**: It provides a framework for incorporating new information into probabilistic models.

### **17) What is the Chi-square distribution, and when is it used?**
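The **Chi-square distribution** is the distribution of a sum of squared independent standard normal variables; the number of terms is its degrees of freedom. It serves as the reference distribution for tests on categorical data, such as the **Chi-square goodness of fit test** and the **Chi-square test for independence**, and for tests on a population variance.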

### **18) What is the Chi-square goodness of fit test, and how is it applied?**
The **Chi-square goodness of fit test** compares observed frequencies to expected frequencies to see if a sample follows a hypothesized distribution.
- **Application**: It is used to test whether categorical data follows a specific distribution.
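
The test statistic is the sum of (observed − expected)² / expected over all categories. A minimal sketch with hypothetical counts and a uniform null distribution:

```python
import numpy as np
from scipy import stats

observed = np.array([18, 22, 30, 30])       # hypothetical counts, total 100
expected = np.full(4, observed.sum() / 4)   # uniform null: 25 per category

# Chi-square statistic computed from its definition
chi2_stat = np.sum((observed - expected) ** 2 / expected)
df = len(observed) - 1
p_value = stats.chi2.sf(chi2_stat, df)      # upper-tail probability

print(f"chi2 = {chi2_stat:.2f}, df = {df}, p = {p_value:.4f}")
```

Here the statistic works out to 4.32 on 3 degrees of freedom, and `scipy.stats.chisquare(observed, expected)` returns the same statistic and P-value.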

### **19) What is the F-distribution, and when is it used in hypothesis testing?**
The **F-distribution** is the distribution of the ratio of two independent chi-square variables, each divided by its degrees of freedom. It is used to compare variances, most commonly in **ANOVA** and regression analysis.

### **20) What is an ANOVA test, and what are its assumptions?**
**ANOVA (Analysis of Variance)** compares means of three or more groups to determine if at least one differs significantly.
- **Assumptions**:
  - Data within each group is approximately normally distributed.
  - Samples are independent.
  - Homogeneity of variance (equal variances).
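
The comparison ANOVA makes can be sketched by hand as the ratio of between-group to within-group variability. The simulated groups, seed, and group means below are hypothetical.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)   # arbitrary fixed seed
groups = [rng.normal(mu, 5, 30) for mu in (20, 25, 30)]  # hypothetical data

k = len(groups)
n_total = sum(len(g) for g in groups)
grand_mean = np.concatenate(groups).mean()

# Between-group and within-group sums of squares
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

# F = mean square between / mean square within
f_stat = (ss_between / (k - 1)) / (ss_within / (n_total - k))
p_value = stats.f.sf(f_stat, k - 1, n_total - k)
print(f"F = {f_stat:.3f}, p = {p_value:.3g}")
```

The result should agree with `scipy.stats.f_oneway(*groups)`, which performs the same calculation.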

### **21) What are the different types of ANOVA tests?**
- **One-Way ANOVA**: Compares means across multiple groups based on a single factor.
- **Two-Way ANOVA**: Compares means across multiple groups based on two factors, accounting for interactions.
- **Repeated Measures ANOVA**: Compares means from the same group at different times or under different conditions.

### **22) What is the F-test, and how does it relate to hypothesis testing?**
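The **F-test** is any test whose statistic follows an F-distribution under the null hypothesis. In its simplest form, it compares the variances of two samples to determine if they come from populations with equal variances.
- **Relation**: In **ANOVA**, the F-statistic is the ratio of between-group to within-group variance and tests whether the group means are equal; in regression analysis, it tests whether the model explains a significant share of the variance.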

Below are Python programs for each of the practical tasks:

---

### **1) Write a Python program to perform a Z-test for comparing a sample mean to a known population mean and interpret the results**

```python
import numpy as np
import scipy.stats as stats

# Sample data
sample_data = np.array([22, 24, 27, 21, 23, 26, 28, 25, 30, 22])
sample_mean = np.mean(sample_data)
sample_size = len(sample_data)
population_mean = 25
population_std = 2  # Known population standard deviation

# Calculate the Z-score
z_score = (sample_mean - population_mean) / (population_std / np.sqrt(sample_size))

# Calculate the p-value
p_value = 2 * (1 - stats.norm.cdf(abs(z_score)))

# Output the result
print(f"Z-score: {z_score}")
print(f"P-value: {p_value}")

# Interpretation
if p_value < 0.05:
    print("Reject the null hypothesis")
else:
    print("Fail to reject the null hypothesis")
```

### **2) Simulate random data to perform hypothesis testing and calculate the corresponding P-value using Python**

```python
# Simulating data
np.random.seed(42)
sample_data = np.random.normal(25, 5, 30)  # Mean = 25, std = 5, n = 30

# Population parameters
population_mean = 25

# Perform Z-test
sample_mean = np.mean(sample_data)
sample_size = len(sample_data)
population_std = 5  # Known population standard deviation

z_score = (sample_mean - population_mean) / (population_std / np.sqrt(sample_size))
p_value = 2 * (1 - stats.norm.cdf(abs(z_score)))

print(f"Z-score: {z_score}")
print(f"P-value: {p_value}")
```

### **3) Implement a one-sample Z-test using Python to compare the sample mean with the population mean**

This is very similar to task **1)**. The code provided above for the Z-test works for one-sample Z-tests.

### **4) Perform a two-tailed Z-test using Python and visualize the decision region on a plot**

```python
import matplotlib.pyplot as plt

# Z-test calculation
sample_data = np.random.normal(25, 5, 30)
sample_mean = np.mean(sample_data)
population_mean = 25
population_std = 5
sample_size = len(sample_data)

z_score = (sample_mean - population_mean) / (population_std / np.sqrt(sample_size))
p_value = 2 * (1 - stats.norm.cdf(abs(z_score)))

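# Plot the standard normal curve and shade the rejection regions
alpha = 0.05
z_crit = stats.norm.ppf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
x = np.linspace(-4, 4, 400)
y = stats.norm.pdf(x)
plt.plot(x, y)
plt.fill_between(x, 0, y, where=(np.abs(x) >= z_crit), color='red', alpha=0.3,
                 label=f"Rejection region (alpha = {alpha})")

# Mark the critical values and the observed Z-score
plt.axvline(x=-z_crit, color='red', linestyle='--')
plt.axvline(x=z_crit, color='red', linestyle='--')
plt.axvline(x=z_score, color='blue', label=f"Observed Z = {z_score:.2f}")

plt.title(f"Two-tailed Z-test (Z-score: {z_score:.2f}, P-value: {p_value:.4f})")
plt.legend()
plt.show()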
```

### **5) Create a Python function that calculates and visualizes Type 1 and Type 2 errors during hypothesis testing**

```python
def plot_errors(null_mean, true_mean, sample_size, population_std, significance_level=0.05):
    # The Z-statistic is N(0, 1) under H0 and N(shift, 1) under the true mean
    critical_value = stats.norm.ppf(1 - significance_level / 2)
    shift = (true_mean - null_mean) / (population_std / np.sqrt(sample_size))

    # Type 2 error: probability of landing in the acceptance region when H0 is false
    beta = stats.norm.cdf(critical_value - shift) - stats.norm.cdf(-critical_value - shift)

    x = np.linspace(min(-4, shift - 4), max(4, shift + 4), 400)
    y_null = stats.norm.pdf(x)
    y_alt = stats.norm.pdf(x, loc=shift)

    plt.plot(x, y_null, label="Under H0")
    plt.plot(x, y_alt, label="Under the true mean")
    plt.axvline(x=-critical_value, color='red', linestyle='--')
    plt.axvline(x=critical_value, color='red', linestyle='--')

    # Type 1 error (false positive): tails of the null curve beyond the critical values
    plt.fill_between(x, 0, y_null, where=(np.abs(x) >= critical_value), color='red', alpha=0.3,
                     label=f"Type 1 error (alpha = {significance_level})")
    # Type 2 error (false negative): alternative curve between the critical values
    plt.fill_between(x, 0, y_alt, where=(np.abs(x) < critical_value), color='green', alpha=0.3,
                     label=f"Type 2 error (beta = {beta:.2f})")

    plt.title("Type 1 and Type 2 Errors Visualization")
    plt.legend()
    plt.show()
    return beta

# Example usage
population_mean = 25
sample_size = 30
population_std = 5
true_mean = 26  # Assume the true mean is different
plot_errors(population_mean, true_mean, sample_size, population_std)

### **6) Write a Python program to perform an independent T-test and interpret the results**

```python
# Two independent samples (simulated)
sample1 = np.random.normal(25, 5, 30)
sample2 = np.random.normal(28, 5, 30)

# Perform Student's T-test, which assumes equal variances
# (pass equal_var=False for Welch's T-test when that is doubtful)
t_stat, p_value = stats.ttest_ind(sample1, sample2)

print(f"T-statistic: {t_stat}")
print(f"P-value: {p_value}")

# Interpretation
if p_value < 0.05:
    print("Reject the null hypothesis: The means are significantly different.")
else:
    print("Fail to reject the null hypothesis: No significant difference between the means.")
```

### **7) Perform a paired sample T-test using Python and visualize the comparison results**

```python
# Paired sample data: each subject is measured before and after treatment
before = np.random.normal(25, 5, 30)
after = before + np.random.normal(2, 3, 30)  # same subjects, average shift of 2

# Perform Paired T-test
t_stat, p_value = stats.ttest_rel(before, after)

print(f"T-statistic: {t_stat}")
print(f"P-value: {p_value}")

# Visualization
plt.boxplot([before, after])
plt.xticks([1, 2], ['Before', 'After'])
plt.title(f"Paired Sample T-test (P-value: {p_value:.4f})")
plt.show()
```

### **8) Simulate data and perform both Z-test and T-test, then compare the results using Python**

```python
# Simulate data
sample_data = np.random.normal(25, 5, 30)
population_mean = 25
population_std = 5
sample_size = len(sample_data)

# Perform Z-test
z_score = (np.mean(sample_data) - population_mean) / (population_std / np.sqrt(sample_size))
z_p_value = 2 * (1 - stats.norm.cdf(abs(z_score)))

# Perform T-test
t_stat, t_p_value = stats.ttest_1samp(sample_data, population_mean)

print(f"Z-test P-value: {z_p_value}")
print(f"T-test P-value: {t_p_value}")

if z_p_value < 0.05:
    print("Z-test: Reject the null hypothesis.")
else:
    print("Z-test: Fail to reject the null hypothesis.")

if t_p_value < 0.05:
    print("T-test: Reject the null hypothesis.")
else:
    print("T-test: Fail to reject the null hypothesis.")
```

### **9) Write a Python function to calculate the confidence interval for a sample mean and explain its significance**

```python
def confidence_interval(sample_data, confidence_level=0.95):
    sample_mean = np.mean(sample_data)
    sample_std = np.std(sample_data, ddof=1)
    sample_size = len(sample_data)

    # Calculate the margin of error using the t-distribution
    critical_value = stats.t.ppf((1 + confidence_level) / 2, df=sample_size-1)
    margin_of_error = critical_value * (sample_std / np.sqrt(sample_size))

    # Confidence Interval
    ci_lower = sample_mean - margin_of_error
    ci_upper = sample_mean + margin_of_error

    return (ci_lower, ci_upper)

# Example usage
sample_data = np.random.normal(25, 5, 30)
ci = confidence_interval(sample_data)
print(f"95% Confidence Interval: {ci}")
```

---

### **10) Write a Python program to calculate the margin of error for a given confidence level using sample data**

```python
import numpy as np
import scipy.stats as stats

def margin_of_error(sample_data, confidence_level=0.95):
    # Calculate sample statistics
    sample_std = np.std(sample_data, ddof=1)
    sample_size = len(sample_data)

    # Find the critical value for the confidence level
    critical_value = stats.t.ppf((1 + confidence_level) / 2, df=sample_size - 1)

    # Margin of error = critical value * standard error of the mean
    return critical_value * (sample_std / np.sqrt(sample_size))

# Example usage
sample_data = np.random.normal(50, 10, 100)  # Example data
error = margin_of_error(sample_data, 0.95)
print(f"Margin of error: {error}")
```

---

### **11) Implement a Bayesian inference method using Bayes' Theorem in Python and explain the process**

```python
def bayes_theorem(prior, likelihood, evidence):
    """
    Bayes' Theorem: P(A|B) = (P(B|A) * P(A)) / P(B)
    :param prior: P(A), the prior probability
    :param likelihood: P(B|A), the likelihood of the evidence given the hypothesis
    :param evidence: P(B), the total probability of the evidence
    :return: Posterior probability P(A|B)
    """
    posterior = (likelihood * prior) / evidence
    return posterior

# Example: Disease detection
prior = 0.01  # Prior probability of having the disease
likelihood = 0.90  # Likelihood of testing positive given you have the disease
evidence = 0.05  # Probability of testing positive (with or without disease)

posterior = bayes_theorem(prior, likelihood, evidence)
print(f"Posterior probability: {posterior}")
```

Explanation: **Bayes' Theorem** updates our belief about an event (A) after observing new evidence (B). The formula combines prior knowledge (prior probability), the likelihood of the observed evidence under the hypothesis, and the total probability of the evidence.
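
In practice the evidence term P(B) is rarely given directly; it is usually derived from the law of total probability. The sketch below assumes a hypothetical diagnostic test with a 5% false-positive rate; the function name `posterior_from_rates` and all rates are illustrative.

```python
def posterior_from_rates(prior, sensitivity, false_positive_rate):
    """P(disease | positive), with P(positive) from the law of total probability."""
    # P(positive) = P(pos | disease) P(disease) + P(pos | no disease) P(no disease)
    evidence = sensitivity * prior + false_positive_rate * (1 - prior)
    return sensitivity * prior / evidence

# Hypothetical test: 1% prevalence, 90% sensitivity, 5% false-positive rate
post = posterior_from_rates(prior=0.01, sensitivity=0.90, false_positive_rate=0.05)
print(f"P(disease | positive) = {post:.3f}")
```

Even with a sensitive test, the posterior stays low here because the disease is rare and false positives outnumber true positives.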

---

### **12) Perform a Chi-square test for independence between two categorical variables in Python**

```python
import scipy.stats as stats
import numpy as np

# Example data: Contingency table (Observed counts)
data = np.array([[10, 20, 30], [6, 9, 11], [4, 7, 12]])

# Perform Chi-square test
chi2_stat, p_value, dof, expected = stats.chi2_contingency(data)

print(f"Chi2 statistic: {chi2_stat}")
print(f"P-value: {p_value}")
print(f"Degrees of freedom: {dof}")
print(f"Expected frequencies:\n{expected}")

# Interpretation
if p_value < 0.05:
    print("Reject the null hypothesis: There is a significant relationship between the variables.")
else:
    print("Fail to reject the null hypothesis: No significant relationship between the variables.")
```

---

### **13) Write a Python program to calculate the expected frequencies for a Chi-square test based on observed data**

```python
import numpy as np
import scipy.stats as stats

# Example observed data
observed = np.array([[10, 20], [30, 40]])

# Perform Chi-square test to get expected frequencies
chi2_stat, p_value, dof, expected = stats.chi2_contingency(observed)

print(f"Observed data:\n{observed}")
print(f"Expected frequencies:\n{expected}")
```
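
The expected frequencies can also be computed by hand, which shows where they come from: under independence, each cell's expected count is its row total times its column total, divided by the grand total. A minimal sketch using the same observed table:

```python
import numpy as np
from scipy import stats

observed = np.array([[10, 20], [30, 40]])

row_totals = observed.sum(axis=1)
col_totals = observed.sum(axis=0)

# Expected count under independence: (row total * column total) / grand total
expected_manual = np.outer(row_totals, col_totals) / observed.sum()
print(expected_manual)
```

For this table the result is `[[12, 18], [28, 42]]`, matching the `expected` array returned by `stats.chi2_contingency`.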

---

### **14) Perform a goodness-of-fit test using Python to compare the observed data to an expected distribution**

```python
import numpy as np
import scipy.stats as stats

# Observed data
observed = np.array([15, 25, 35, 25])

# Expected frequencies under a uniform distribution (must sum to the observed total)
expected = np.array([25, 25, 25, 25])

# Perform Chi-square goodness-of-fit test
chi2_stat, p_value = stats.chisquare(observed, expected)

print(f"Chi2 statistic: {chi2_stat}")
print(f"P-value: {p_value}")

# Interpretation
if p_value < 0.05:
    print("Reject the null hypothesis: The data doesn't fit the expected distribution.")
else:
    print("Fail to reject the null hypothesis: The data is consistent with the expected distribution.")
```

---

### **15) Create a Python script to simulate and visualize the Chi-square distribution and discuss its characteristics**

```python
import numpy as np
import matplotlib.pyplot as plt
import scipy.stats as stats

# Simulate a Chi-square distribution with 2 degrees of freedom
data = np.random.chisquare(df=2, size=1000)

# Plot the Chi-square distribution
plt.hist(data, bins=30, density=True, alpha=0.7, color='g', edgecolor='black')

# Plot the theoretical Chi-square distribution
x = np.linspace(0, np.max(data), 100)
plt.plot(x, stats.chi2.pdf(x, df=2), 'r-', label="Theoretical Chi-square PDF")

plt.title('Chi-square Distribution (df=2)')
plt.xlabel('Value')
plt.ylabel('Density')
plt.legend()
plt.show()
```

Explanation: The Chi-square distribution is often used in hypothesis testing for categorical data. It is right-skewed and the shape depends on the degrees of freedom (df). Higher df make the distribution more symmetric.
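
These characteristics can be checked numerically: the mean equals df, the variance equals 2·df, and the skewness falls toward zero as df grows. A minimal sketch using the moments reported by `scipy.stats.chi2`; the df values are arbitrary.

```python
from scipy import stats

for df in (2, 5, 20, 100):
    # Request mean, variance, and skewness of the distribution
    mean, var, skew = stats.chi2.stats(df, moments="mvs")
    print(f"df = {df:>3}: mean = {float(mean):.1f}, "
          f"variance = {float(var):.1f}, skewness = {float(skew):.3f}")
```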

---

### **16) Implement an F-test using Python to compare the variances of two random samples**

```python
import numpy as np
import scipy.stats as stats

# Simulate two random samples
sample1 = np.random.normal(10, 2, 100)
sample2 = np.random.normal(12, 3, 100)

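# Perform F-test for equality of variances (assumes both samples are normal)
f_stat = np.var(sample1, ddof=1) / np.var(sample2, ddof=1)
dfn, dfd = len(sample1) - 1, len(sample2) - 1

# Two-tailed p-value: double the smaller tail probability
p_value = 2 * min(stats.f.cdf(f_stat, dfn, dfd), stats.f.sf(f_stat, dfn, dfd))

print(f"F-statistic: {f_stat}")
print(f"P-value: {p_value}")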

# Interpretation
if p_value < 0.05:
    print("Reject the null hypothesis: The variances are significantly different.")
else:
    print("Fail to reject the null hypothesis: No significant difference in variances.")
```

---

### **17) Write a Python program to perform an ANOVA test to compare means between multiple groups and interpret the results**

```python
import numpy as np
import scipy.stats as stats

# Simulate three groups of data
group1 = np.random.normal(20, 5, 30)
group2 = np.random.normal(25, 5, 30)
group3 = np.random.normal(30, 5, 30)

# Perform ANOVA test
f_stat, p_value = stats.f_oneway(group1, group2, group3)

print(f"F-statistic: {f_stat}")
print(f"P-value: {p_value}")

# Interpretation
if p_value < 0.05:
    print("Reject the null hypothesis: The means of the groups are significantly different.")
else:
    print("Fail to reject the null hypothesis: The means of the groups are not significantly different.")
```

---

### **18) Perform a one-way ANOVA test using Python to compare the means of different groups and plot the results**

```python
import numpy as np
import scipy.stats as stats
import matplotlib.pyplot as plt

# Simulate three groups of data
group1 = np.random.normal(20, 5, 30)
group2 = np.random.normal(25, 5, 30)
group3 = np.random.normal(30, 5, 30)

# Perform one-way ANOVA test
f_stat, p_value = stats.f_oneway(group1, group2, group3)

print(f"F-statistic: {f_stat}")
print(f"P-value: {p_value}")

# Plot boxplot
plt.boxplot([group1, group2, group3])
plt.xticks([1, 2, 3], ['Group 1', 'Group 2', 'Group 3'])
plt.title("One-way ANOVA: Group Comparison")
plt.show()
```

---

### **19) Write a Python function to check the assumptions (normality, independence, and equal variance) for ANOVA**

```python
import scipy.stats as stats
import numpy as np

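def check_anova_assumptions(group1, group2, group3):
    # Independence cannot be verified from the values alone; it must follow
    # from the study design (random sampling, no subject in two groups)
    print("Independence: confirm from the study design; no test is run here.")

    # Check normality (Shapiro-Wilk test)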
    for group, name in zip([group1, group2, group3], ['Group 1', 'Group 2', 'Group 3']):
        stat, p_value = stats.shapiro(group)
        print(f"Normality test for {name}: Statistic={stat}, P-value={p_value}")
        if p_value < 0.05:
            print(f"{name} is not normally distributed.")

    # Check equal variance (Levene's test)
    stat, p_value = stats.levene(group1, group2, group3)
    print(f"Levene's test for equal variance: Statistic={stat}, P-value={p_value}")
    if p_value < 0.05:
        print("The variances are not equal.")

# Example usage
group1 = np.random.normal(20, 5, 30)
group2 = np.random.normal(25, 5, 30)
group3 = np.random.normal(30, 5, 30)

check_anova_assumptions(group1, group2, group3)
```

---

### **20) Perform a two-way ANOVA test using Python to study the interaction between two factors and visualize the results**

```python
import pandas as pd
import numpy as np
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Simulate data for two-way ANOVA
np.random.seed(42)
data = pd.DataFrame({
    'Factor1': np.repeat(['A', 'B'], 30),
    'Factor2': np.tile(['X', 'Y'], 30),
    'Score': np.random.normal(25, 5, 60)
})

# Perform two-way ANOVA
model = smf.ols('Score ~ C(Factor1) + C(Factor2) + C(Factor1):C(Factor2)', data=data).fit()
anova_table = sm.stats.anova_lm(model)
print(anova_table)
```

---

### **21) Write a Python program to visualize the F-distribution and discuss its use in hypothesis testing**

```python
import numpy as np
import matplotlib.pyplot as plt
import scipy.stats as stats

# Parameters for the F-distribution
dfn = 3  # Degrees of freedom numerator
dfd = 10  # Degrees of freedom denominator

# Generate values for plotting
x = np.linspace(0, 5, 1000)
y = stats.f.pdf(x, dfn, dfd)

# Plot the F-distribution
plt.plot(x, y, label=f"F-distribution (dfn={dfn}, dfd={dfd})")
plt.title("F-distribution")
plt.xlabel("X")
plt.ylabel("Density")
plt.legend()
plt.show()
```
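
In hypothesis testing, the F-distribution supplies the critical value and P-value for an observed F-statistic, and the test is usually upper-tailed. A minimal sketch using the same degrees of freedom as the plot above; the observed statistic of 4.2 is hypothetical.

```python
from scipy import stats

dfn, dfd = 3, 10      # numerator and denominator degrees of freedom
alpha = 0.05
f_observed = 4.2      # hypothetical test statistic

f_crit = stats.f.ppf(1 - alpha, dfn, dfd)    # upper-tail critical value
p_value = stats.f.sf(f_observed, dfn, dfd)   # upper-tail p-value

print(f"critical value = {f_crit:.3f}, p-value = {p_value:.4f}")
if f_observed > f_crit:
    print("Reject the null hypothesis.")
else:
    print("Fail to reject the null hypothesis.")
```

Comparing the statistic to the critical value and comparing the P-value to α always give the same decision.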

---

### **22) Perform a one-way ANOVA test in Python and visualize the results with boxplots to compare group means**

```python
import matplotlib.pyplot as plt
import numpy as np
import scipy.stats as stats

# Simulate data for 3 groups
group1 = np.random.normal(50, 10, 30)
group2 = np.random.normal(55, 10, 30)
group3 = np.random.normal(60, 10, 30)

# Perform One-way ANOVA
f_stat, p_value = stats.f_oneway(group1, group2, group3)

print(f"F-statistic: {f_stat}")
print(f"P-value: {p_value}")

# Plot boxplot
plt.boxplot([group1, group2, group3])
plt.xticks([1, 2, 3], ["Group 1", "Group 2", "Group 3"])
plt.title("One-way ANOVA Comparison")
plt.show()
```

---

### **23) Simulate random data from a normal distribution, then perform hypothesis testing to evaluate the means**

```python
import numpy as np
import scipy.stats as stats

# Simulate data
data = np.random.normal(25, 5, 100)

# Perform one-sample Z-test
z_stat = (np.mean(data) - 25) / (5 / np.sqrt(100))
z_p_value = 2 * (1 - stats.norm.cdf(abs(z_stat)))

print(f"Z-statistic: {z_stat}")
print(f"P-value: {z_p_value}")
```

---

### **24) Perform a hypothesis test for population variance using a Chi-square distribution and interpret the results**

```python
import numpy as np
import scipy.stats as stats

# Sample data
data = np.random.normal(25, 5, 100)

# Population variance (assumed)
population_variance = 25

# Calculate sample variance
sample_variance = np.var(data, ddof=1)

# Perform Chi-square test for variance (two-tailed, to match the interpretation below)
df = len(data) - 1
chi2_stat = df * sample_variance / population_variance
p_value = 2 * min(stats.chi2.cdf(chi2_stat, df), stats.chi2.sf(chi2_stat, df))

print(f"Chi2 statistic: {chi2_stat}")
print(f"P-value: {p_value}")

# Interpretation
if p_value < 0.05:
    print("Reject the null hypothesis: The variance is significantly different from the population variance.")
else:
    print("Fail to reject the null hypothesis: No significant difference in variance.")
```

---

### **25) Write a Python script to perform a Z-test for comparing proportions between two datasets or groups**

```python
import numpy as np
import scipy.stats as stats

# Simulate two sample proportions
sample1_successes = 45
sample1_size = 100
sample2_successes = 40
sample2_size = 100

# Calculate sample proportions
p1 = sample1_successes / sample1_size
p2 = sample2_successes / sample2_size

# Calculate the pooled proportion
pooled_p = (sample1_successes + sample2_successes) / (sample1_size + sample2_size)

# Z-test formula for comparing proportions
z_stat = (p1 - p2) / np.sqrt(pooled_p * (1 - pooled_p) * (1/sample1_size + 1/sample2_size))

# Calculate the p-value
p_value = 2 * (1 - stats.norm.cdf(abs(z_stat)))

print(f"Z-statistic: {z_stat}")
print(f"P-value: {p_value}")

# Interpretation
if p_value < 0.05:
    print("Reject the null hypothesis: The proportions are significantly different.")
else:
    print("Fail to reject the null hypothesis: No significant difference between the proportions.")
```

---

### **26) Implement an F-test for comparing the variances of two datasets, then interpret and visualize the results**

```python
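import numpy as np
import scipy.stats as stats
import matplotlib.pyplot as plt

# Simulate two random samples
sample1 = np.random.normal(25, 5, 30)
sample2 = np.random.normal(30, 6, 30)

# F-test for equality of variances
f_stat = np.var(sample1, ddof=1) / np.var(sample2, ddof=1)
dfn = len(sample1) - 1  # Degrees of freedom for sample 1
dfd = len(sample2) - 1  # Degrees of freedom for sample 2

# Two-tailed p-value: double the smaller tail probability
p_value = 2 * min(stats.f.cdf(f_stat, dfn, dfd), stats.f.sf(f_stat, dfn, dfd))

print(f"F-statistic: {f_stat}")
print(f"P-value: {p_value}")

# Interpretation
if p_value < 0.05:
    print("Reject the null hypothesis: The variances are significantly different.")
else:
    print("Fail to reject the null hypothesis: No significant difference in variances.")

# Visualization: observed F-statistic against the F-distribution under H0
x = np.linspace(0.01, 4, 400)
plt.plot(x, stats.f.pdf(x, dfn, dfd), label=f"F({dfn}, {dfd})")
plt.axvline(f_stat, color='red', linestyle='--', label=f"Observed F = {f_stat:.2f}")
plt.title("F-test for Equality of Variances")
plt.legend()
plt.show()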
```

---

### **27) Perform a Chi-square test for goodness of fit with simulated data and analyze the results**

```python
import numpy as np
import scipy.stats as stats

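# Simulate 100 draws from four equally likely categories
categories = np.array([10, 20, 30, 40])
draws = np.random.choice(categories, 100, p=[0.25, 0.25, 0.25, 0.25])

# Observed frequency of each category
observed = np.array([np.sum(draws == c) for c in categories])

# Expected frequencies (assumed): uniform over the four categories
expected = np.array([25, 25, 25, 25])

# Perform Chi-square test for goodness of fit
chi2_stat, p_value = stats.chisquare(observed, expected)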

print(f"Chi2 statistic: {chi2_stat}")
print(f"P-value: {p_value}")

# Interpretation
if p_value < 0.05:
    print("Reject the null hypothesis: The observed data does not fit the expected distribution.")
else:
    print("Fail to reject the null hypothesis: The observed data is consistent with the expected distribution.")
```
