Q1. Explain the assumptions required to use ANOVA and provide examples of violations that could impact the validity of the results.

ANOVA (Analysis of Variance) is a statistical test used to compare the means of three or more groups to determine if there are statistically significant differences among them. To use ANOVA effectively and ensure the validity of its results, certain assumptions must be met. Here are the key assumptions required for ANOVA and examples of violations that could impact the validity of the results:

1. **Independence of Observations**:
   - Assumption: Observations within each group must be independent of each other.
   - Violations: If observations within groups are not independent (e.g., repeated measures design where subjects are measured multiple times), it can inflate the Type I error rate and lead to inaccurate results.

2. **Normality**:
   - Assumption: The data within each group should be approximately normally distributed.
   - Violations: If the data are not normally distributed (e.g., heavily skewed or have outliers), it can affect the accuracy of the p-values and confidence intervals derived from ANOVA. Transformation of the data or using non-parametric alternatives may be necessary in such cases.

3. **Homogeneity of Variance**:
   - Assumption: The variance within each group should be approximately equal across all groups (homoscedasticity).
   - Violations: If the variances are not equal across groups (heteroscedasticity), it can lead to inflated Type I error rates and compromise the reliability of ANOVA results. Corrective measures such as using Welch's ANOVA or transformation of the data may be needed.

4. **Random Sampling**:
   - Assumption: Samples are selected randomly from the population.
   - Violations: Non-random sampling methods can introduce bias and affect the generalizability of the results obtained from ANOVA.

5. **Interval or Ratio Scale Data**:
   - Assumption: The dependent variable (outcome) should be measured on an interval or ratio scale.
   - Violations: Using categorical or ordinal data as the dependent variable may not be appropriate for ANOVA and could lead to erroneous conclusions.

Examples of Violations:
- Violation of independence: Conducting ANOVA on repeated measures data without accounting for the dependency between observations from the same subject.
- Violation of normality: Performing ANOVA on skewed data without transformation or using non-parametric alternatives.
- Violation of homogeneity of variance: Conducting ANOVA on data with unequal variances across groups, resulting in incorrect conclusions.
- Violation of random sampling: Conducting ANOVA on convenience samples rather than randomly selected samples, leading to biased results.
- Violation of scale: Using ordinal data as the dependent variable in ANOVA instead of a method designed for ranked outcomes (e.g., the Kruskal-Wallis test).
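
The normality and equal-variance assumptions can be screened before running ANOVA. The following is a minimal sketch, assuming three small groups of made-up scores (the group names and values are hypothetical); it uses the Shapiro-Wilk test for normality within each group and Levene's test for homogeneity of variance, both from `scipy.stats`.

```python
from scipy.stats import levene, shapiro

# Hypothetical scores for three groups (values made up for illustration)
groups = {
    "lecture": [72, 75, 78, 80, 74, 77, 79, 73],
    "online": [68, 71, 70, 74, 69, 73, 72, 75],
    "workshop": [80, 83, 79, 85, 82, 84, 81, 86],
}

# Shapiro-Wilk test per group: a small p-value suggests non-normal data
normality_p = {}
for name, values in groups.items():
    stat, p = shapiro(values)
    normality_p[name] = p
    print(f"{name}: Shapiro-Wilk W = {stat:.3f}, p = {p:.3f}")

# Levene's test across groups: a small p-value suggests unequal variances
levene_stat, levene_p = levene(*groups.values())
print(f"Levene's test: W = {levene_stat:.3f}, p = {levene_p:.3f}")
```

With small samples these tests have little power, so it is usually worth pairing them with a look at histograms or Q-Q plots.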

Q2. What are the three types of ANOVA, and in what situations would each be used?

ANOVA (Analysis of Variance) can be classified into three main types based on the number of independent variables and their levels. These types include:

1. **One-Way ANOVA**:
   - Situation: One-way ANOVA is used when there is one categorical independent variable (factor) with two or more levels (groups).
   - Example: Suppose a researcher wants to compare the mean scores of students across three different teaching methods (e.g., traditional lecture, online tutorial, and interactive workshop). One-way ANOVA would be appropriate to determine if there are significant differences in student performance among the three teaching methods.

2. **Two-Way ANOVA**:
   - Situation: Two-way ANOVA is used when there are two categorical independent variables (factors) and their interaction, with each factor having two or more levels (groups).
   - Example: Consider a study investigating the effects of both gender (male vs. female) and age group (young adults vs. older adults) on the response time of individuals to a stimulus. Two-way ANOVA can determine if there are main effects of gender and age group as well as their interaction on response time.

3. **N-Way ANOVA (Multi-Factor ANOVA)**:
   - Situation: N-way ANOVA extends the concept of two-way ANOVA to situations where there are more than two categorical independent variables (factors) and their interactions, with each factor having two or more levels (groups).
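   - Example: Suppose a study aims to investigate the effects of several categorical factors (e.g., age group, gender, education level, and income bracket) on a continuous outcome such as individuals' monthly spending on smartphones. N-way ANOVA can be employed to analyze the main effects of each factor and their interactions on that outcome. (A categorical outcome such as brand preference would not suit ANOVA, because the dependent variable must be measured on an interval or ratio scale.)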

Q3. What is the partitioning of variance in ANOVA, and why is it important to understand this concept?

In ANOVA (Analysis of Variance), the partitioning of variance refers to the division of total variability in the data into different components, each representing the variability attributed to specific sources or factors. Understanding the concept of partitioning of variance is crucial in ANOVA for several reasons:

1. **Understanding Sources of Variation**: Partitioning of variance helps identify and quantify the contribution of different sources of variation to the total variability observed in the data. This includes understanding the variability between groups (due to the effect of the independent variable) and within groups (residual or error variability).

2. **Interpretation of Results**: By partitioning the variance, ANOVA provides insights into the relative importance of various factors or treatments in explaining the observed differences in the dependent variable. It allows researchers to interpret the significance of the effects of independent variables on the dependent variable.

3. **Assessment of Model Fit**: Partitioning of variance helps evaluate the fit of the ANOVA model to the data. It allows researchers to assess whether the model adequately accounts for the observed variability and whether any unexplained variability (residuals) remains.

4. **Hypothesis Testing**: ANOVA tests hypotheses about differences in means among groups or treatments. Partitioning of variance provides the basis for calculating the F-statistic, which is used to determine whether the observed differences among group means are statistically significant.

5. **Statistical Power and Sample Size Determination**: Understanding the partitioning of variance helps researchers in estimating statistical power and determining the appropriate sample size for their ANOVA study. It provides insights into the factors that contribute most to the variability in the dependent variable, which can influence decisions about study design and data collection.

Overall, understanding the partitioning of variance in ANOVA is essential for accurately interpreting results, assessing model fit, conducting hypothesis tests, and making informed decisions about experimental design and analysis. It enables researchers to identify the sources of variability that are most relevant to their research questions and to draw valid conclusions from their ANOVA analyses.
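
For a one-way design, the partition can be written out explicitly. The notation below is an assumption introduced for illustration: $k$ groups, $n_i$ observations in group $i$, $N$ observations in total, $\bar{y}_i$ the mean of group $i$, and $\bar{y}$ the grand mean.

```latex
\underbrace{\sum_{i=1}^{k}\sum_{j=1}^{n_i} \left(y_{ij} - \bar{y}\right)^2}_{\text{total}}
= \underbrace{\sum_{i=1}^{k} n_i \left(\bar{y}_i - \bar{y}\right)^2}_{\text{between groups}}
+ \underbrace{\sum_{i=1}^{k}\sum_{j=1}^{n_i} \left(y_{ij} - \bar{y}_i\right)^2}_{\text{within groups}}

F = \frac{\text{between groups} / (k - 1)}{\text{within groups} / (N - k)}
```

The F-statistic is the ratio of the two mean squares, so a large value means the group means vary more than the within-group scatter alone would suggest.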

Q4. How would you calculate the total sum of squares (SST), explained sum of squares (SSE), and residual sum of squares (SSR) in a one-way ANOVA using Python?

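In a one-way ANOVA, the total sum of squares (SST) represents the total variability in the dependent variable, the explained sum of squares (SSE) represents the variability explained by the group means, and the residual sum of squares (SSR) represents the unexplained variability or error. Note that some textbooks use the abbreviations the other way round (SSE for the error sum of squares and SSR for the regression sum of squares), so always check which convention is in use. Here's how you can calculate SST, SSE, and SSR using Python: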

```python
import numpy as np

# Example data (individual observations within each group)
data = [[80, 85, 78], [77, 79, 76, 80], [73, 76, 78], [78, 82, 81]]

# Group means, computed from the data rather than typed in by hand
group_means = [np.mean(obs) for obs in data]

# Grand mean of all observations; with unequal group sizes this differs
# from the unweighted mean of the group means
all_obs = np.concatenate(data)
overall_mean = all_obs.mean()

# Calculate the total sum of squares (SST)
SST = np.sum((all_obs - overall_mean) ** 2)

# Calculate the explained sum of squares (SSE)
SSE = sum(len(obs) * (mean - overall_mean) ** 2 for obs, mean in zip(data, group_means))

# Calculate the residual sum of squares (SSR)
SSR = SST - SSE

# Print the results
print("Total Sum of Squares (SST):", SST)
print("Explained Sum of Squares (SSE):", SSE)
print("Residual Sum of Squares (SSR):", SSR)
```

In this code:
- `data` represents the individual observations within each group.
- `group_means` holds the mean of each group, computed from `data` so that the two always agree.
- `overall_mean` is the grand mean of all observations pooled together. Because the groups have different sizes, it is not the same as the plain average of the group means.
- SST is calculated as the sum of squared differences between each observation and the overall mean.
- SSE is calculated as the sum of squared differences between each group mean and the overall mean, weighted by the number of observations in each group.
- SSR is calculated as the difference between SST and SSE.

This code assumes that `data` is a list of lists, where each sublist contains the observations for a particular group. You can modify the code according to your specific data format and values.
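
The sums of squares lead directly to the F-statistic. As a rough check, the sketch below computes the explained and residual sums of squares from the same example observations, divides each by its degrees of freedom, and compares the resulting F-statistic and p-value with `scipy.stats.f_oneway`. The variable names are illustrative choices, not part of any library.

```python
import numpy as np
from scipy.stats import f, f_oneway

# Same example observations as above
data = [[80, 85, 78], [77, 79, 76, 80], [73, 76, 78], [78, 82, 81]]
all_obs = np.concatenate(data)
grand_mean = all_obs.mean()

# Explained (between-group) and residual (within-group) sums of squares
ss_explained = sum(len(g) * (np.mean(g) - grand_mean) ** 2 for g in data)
ss_residual = sum(np.sum((np.array(g) - np.mean(g)) ** 2) for g in data)

# Degrees of freedom: k - 1 between groups, N - k within groups
df_between = len(data) - 1
df_within = len(all_obs) - len(data)

# F is the ratio of the two mean squares
f_manual = (ss_explained / df_between) / (ss_residual / df_within)
p_manual = f.sf(f_manual, df_between, df_within)

# Cross-check against SciPy's one-way ANOVA
f_scipy, p_scipy = f_oneway(*data)
print("Manual F:", f_manual, "SciPy F:", f_scipy)
print("Manual p:", p_manual, "SciPy p:", p_scipy)
```

Here the residual sum of squares is computed directly from the within-group deviations rather than by subtraction, which gives an independent check on the partition.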

Q5. In a two-way ANOVA, how would you calculate the main effects and interaction effects using Python?

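To calculate the main effects and interaction effects in a two-way ANOVA using Python, you can fit a linear model with `statsmodels` and request its ANOVA table (`scipy` provides one-way ANOVA through `f_oneway`, but no ready-made two-way ANOVA). Here's how you can do it using `statsmodels`: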

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

# Example data
data = {
    'A': ['A1', 'A1', 'A2', 'A2', 'A1', 'A1', 'A2', 'A2'],
    'B': ['B1', 'B2', 'B1', 'B2', 'B1', 'B2', 'B1', 'B2'],
    'Y': [10, 20, 15, 25, 30, 40, 35, 45]
}

# Create DataFrame
df = pd.DataFrame(data)

# Fit the two-way ANOVA model
model = ols('Y ~ C(A) + C(B) + C(A):C(B)', data=df).fit()

# Print ANOVA table
print(sm.stats.anova_lm(model, typ=2))
```

This code performs a two-way ANOVA on the example data, where `A` and `B` are the two independent variables, and `Y` is the dependent variable. The `ols` function from `statsmodels.formula.api` is used to define the model formula, and `fit` is used to fit the model to the data. The ANOVA table is printed using `sm.stats.anova_lm`.

The ANOVA table will show the main effects of `A` and `B`, as well as the interaction effect between `A` and `B`.

Note: Make sure you have the `statsmodels` library installed (`pip install statsmodels`) before running this code.
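
The ANOVA table reports whether each effect is significant, but not how large it is. One way to see the effects themselves, sketched here with the same example data, is to compare cell means and marginal means with `pandas`. The contrast definitions (level 2 minus level 1) are an illustrative convention.

```python
import pandas as pd

# Same example data as above
df = pd.DataFrame({
    "A": ["A1", "A1", "A2", "A2", "A1", "A1", "A2", "A2"],
    "B": ["B1", "B2", "B1", "B2", "B1", "B2", "B1", "B2"],
    "Y": [10, 20, 15, 25, 30, 40, 35, 45],
})

# Mean of Y in each A x B cell
cell_means = df.pivot_table(values="Y", index="A", columns="B", aggfunc="mean")

# Marginal means: average over the other factor
mean_A = df.groupby("A")["Y"].mean()
mean_B = df.groupby("B")["Y"].mean()

# Main effects as differences between marginal means
main_effect_A = mean_A["A2"] - mean_A["A1"]
main_effect_B = mean_B["B2"] - mean_B["B1"]

# Interaction contrast: does the effect of B differ between levels of A?
interaction = (
    (cell_means.loc["A2", "B2"] - cell_means.loc["A2", "B1"])
    - (cell_means.loc["A1", "B2"] - cell_means.loc["A1", "B1"])
)

print(cell_means)
print("Main effect of A:", main_effect_A)    # 5.0
print("Main effect of B:", main_effect_B)    # 10.0
print("Interaction contrast:", interaction)  # 0.0
```

For this data the effect of `B` is the same at both levels of `A`, so the interaction contrast is zero and the two factors act additively.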

Q6. Suppose you conducted a one-way ANOVA and obtained an F-statistic of 5.23 and a p-value of 0.02. What can you conclude about the differences between the groups, and how would you interpret these results?

In a one-way ANOVA, the F-statistic tests the null hypothesis that the means of all groups are equal against the alternative hypothesis that at least one group mean is different from the others. The p-value associated with the F-statistic indicates the probability of obtaining the observed F-statistic (or one more extreme) if the null hypothesis is true. Here's how you can interpret the results:

1. **F-Statistic**: The F-statistic value of 5.23 indicates the ratio of the between-group variability to the within-group variability. In other words, it measures how much larger the differences between the group means are compared to the variability within each group.

2. **P-Value**: The p-value of 0.02 represents the probability of observing an F-statistic at least as large as 5.23 if the null hypothesis of no difference between group means were true. A p-value below the chosen significance level (e.g., 0.05) suggests that the observed differences between the group means are statistically significant.

Interpretation:

- Since the p-value (0.02) is less than the chosen significance level (e.g., 0.05), we reject the null hypothesis. This indicates that there is sufficient evidence to conclude that at least one group mean is different from the others.

- In practical terms, the results suggest that there are statistically significant differences between the groups in the study. However, the ANOVA does not identify which specific groups differ from each other. Post-hoc tests (e.g., Tukey's HSD, Bonferroni) can be conducted to determine pairwise differences between groups.

- Overall, the findings suggest that the factor being studied (the independent variable) has a significant effect on the dependent variable, and further analysis can be conducted to explore the nature and implications of these differences.

It's important to consider the context of the study and the specific research question when interpreting the results of an ANOVA. Additionally, it's recommended to report effect sizes and conduct post-hoc tests for a more comprehensive understanding of the differences between groups.
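
The p-value and an effect size can both be derived from the F-statistic once the degrees of freedom are known. The question does not state them, so the sketch below assumes 3 groups and 30 observations in total (hypothetical values); with these assumed degrees of freedom the computed p-value will not match the reported 0.02 exactly.

```python
from scipy.stats import f

f_statistic = 5.23
df_between = 2   # hypothetical: 3 groups, so k - 1 = 2
df_within = 27   # hypothetical: 30 observations, so N - k = 27

# p-value: upper-tail probability of the F distribution
p_value = f.sf(f_statistic, df_between, df_within)

# Eta squared recovered from F in a one-way design:
# SS_between / SS_total = (F * df_between) / (F * df_between + df_within)
eta_squared = (f_statistic * df_between) / (f_statistic * df_between + df_within)

print(f"p-value: {p_value:.4f}")
print(f"eta squared: {eta_squared:.3f}")
```

Eta squared is the share of total variability attributable to group membership, which complements the p-value by indicating how large the effect is.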

Q7. In a repeated measures ANOVA, how would you handle missing data, and what are the potential consequences of using different methods to handle missing data?

Handling missing data in a repeated measures ANOVA is essential to ensure the validity and reliability of the analysis. Here are some common methods for handling missing data in repeated measures ANOVA and their potential consequences:

1. **Complete Case Analysis (Listwise Deletion)**:
   - Method: Exclude cases with missing data on any variable included in the analysis.
   - Consequences:
     - Reduces the sample size and statistical power, potentially leading to biased results if the missing data are not missing completely at random (MCAR).
     - May introduce bias if the missingness mechanism is related to the outcome variable or other variables in the analysis.

2. **Mean Imputation**:
   - Method: Replace missing values with the mean of the observed values for the variable.
   - Consequences:
     - Preserves sample size but underestimates the variability in the data, leading to variances and standard errors that are too small and to overly optimistic significance tests.
     - Can distort relationships between variables, typically weakening (attenuating) correlations.

3. **Last Observation Carried Forward (LOCF)**:
   - Method: Replace missing values with the last observed value from the same participant.
   - Consequences:
     - Assumes that the missing data are similar to the last observed data point, which may not be appropriate for all variables.
     - Can artificially inflate the apparent stability of outcomes over time and produce biased estimates.

4. **Multiple Imputation**:
   - Method: Generate multiple plausible values for each missing value based on the observed data, and analyze each imputed dataset separately before combining results.
   - Consequences:
     - Preserves sample size and incorporates uncertainty about missing data into the analysis, providing more accurate estimates of parameters and standard errors.
     - Requires more computational resources and assumptions about the missing data mechanism, which may affect the validity of results if the assumptions are violated.

5. **Maximum Likelihood Estimation (MLE)**:
   - Method: Estimate parameters of the model using likelihood-based methods that account for missing data.
   - Consequences:
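     - Provides consistent estimates of parameters and standard errors under the missing at random (MAR) assumption, using all available observations.
     - Does not require a separate model for the missingness mechanism under MAR, but it does rely on a correctly specified model for the data (for example, a linear mixed model), and estimates can be biased if the data are missing not at random (MNAR).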

The choice of method for handling missing data in repeated measures ANOVA depends on the nature of the missing data, assumptions about the missingness mechanism, and the goals of the analysis. It's important to carefully consider the potential consequences of each method and to perform sensitivity analyses to assess the robustness of results to different approaches. Additionally, reporting the method used for handling missing data and conducting sensitivity analyses can enhance the transparency and reproducibility of research findings.
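
The effect of the simpler methods is easy to see on a toy dataset. The sketch below uses hypothetical scores for six participants at three time points (all values are made up) and applies listwise deletion, mean imputation, and LOCF with `pandas`.

```python
import numpy as np
import pandas as pd

# Hypothetical scores for six participants at three time points (NaN = missing)
wide = pd.DataFrame(
    {
        "t1": [10.0, 12.0, 11.0, 14.0, 13.0, 9.0],
        "t2": [11.0, np.nan, 12.0, 15.0, np.nan, 10.0],
        "t3": [13.0, 14.0, np.nan, 17.0, 15.0, 11.0],
    },
    index=pd.Index(range(1, 7), name="participant"),
)

# Listwise deletion: keep only participants with no missing values
complete_cases = wide.dropna()

# Mean imputation: replace each missing value with its column mean
mean_imputed = wide.fillna(wide.mean())

# LOCF: carry each participant's last observed value forward in time
locf = wide.ffill(axis=1)

print("Participants kept after listwise deletion:", len(complete_cases))
print("Variance of t2, observed values only:", wide["t2"].var())
print("Variance of t2 after mean imputation:", mean_imputed["t2"].var())
print(locf)
```

Listwise deletion halves the sample here, while mean imputation keeps all six participants but shrinks the variance of `t2`, because the imputed values sit exactly at the mean.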

Q8. What are some common post-hoc tests used after ANOVA, and when would you use each one? Provide an example of a situation where a post-hoc test might be necessary.

After conducting an ANOVA and finding a significant overall effect, post-hoc tests are often used to determine which specific group means differ from each other. Here are some common post-hoc tests used after ANOVA and when you would use each one:

1. **Tukey's Honestly Significant Difference (Tukey's HSD)**:
   - When to use: Tukey's HSD is used when you have multiple groups and want to test all possible pairwise comparisons while controlling for the familywise error rate.
   - Example: In a study comparing the effectiveness of four different treatments for a medical condition, you could use Tukey's HSD to determine which treatments result in significantly different outcomes.

2. **Bonferroni Correction**:
   - When to use: Bonferroni correction is used when you have multiple pairwise comparisons and want to control the familywise error rate by adjusting the significance level for each comparison.
   - Example: In a study comparing the effectiveness of four different teaching methods on student performance, you could use Bonferroni correction to adjust the significance level for pairwise comparisons between each pair of teaching methods.

3. **Sidak Correction**:
   - When to use: Sidak correction is similar to Bonferroni correction and is used to adjust the significance level for multiple pairwise comparisons while controlling the familywise error rate.
   - Example: In a study comparing the effects of three different dietary interventions on blood sugar levels, you could use Sidak correction to test all possible pairwise comparisons between the interventions.

4. **Duncan's New Multiple Range Test (Duncan's MRT)**:
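   - When to use: Duncan's MRT is used when you have multiple groups and want to compare all possible pairwise combinations with more power than Tukey's HSD. It is a more liberal procedure: it does not hold the familywise error rate at the nominal level, so false positives become more likely as the number of groups grows.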
   - Example: In a study comparing the yield of different crop varieties, you could use Duncan's MRT to determine which varieties have significantly different yields.

5. **Bonferroni-Dunn Test**:
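   - When to use: Bonferroni-Dunn test is used in situations where you have a control group and want to compare each treatment group with the control while controlling the familywise error rate. Dunnett's test is a widely used alternative designed specifically for such comparisons against a control.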
   - Example: In a study comparing the effects of three different medications on blood pressure, with a placebo as the control group, you could use Bonferroni-Dunn test to compare each medication group with the placebo group.

Post-hoc tests are necessary in situations where the overall ANOVA indicates a significant difference among groups, but it does not specify which specific groups differ from each other. By performing post-hoc tests, researchers can identify the pairwise differences that contribute to the overall effect observed in the ANOVA.
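
To make the Bonferroni and Sidak corrections concrete, the sketch below computes the per-comparison significance level each one implies. It assumes four groups and a familywise level of 0.05, both chosen for illustration.

```python
from math import comb

alpha = 0.05   # desired familywise error rate
n_groups = 4   # illustrative number of groups

# Number of pairwise comparisons among the groups
n_comparisons = comb(n_groups, 2)

# Bonferroni: divide alpha evenly across the comparisons
bonferroni_alpha = alpha / n_comparisons

# Sidak: exact for independent comparisons, slightly less conservative
sidak_alpha = 1 - (1 - alpha) ** (1 / n_comparisons)

print("Pairwise comparisons:", n_comparisons)
print(f"Bonferroni per-comparison alpha: {bonferroni_alpha:.5f}")
print(f"Sidak per-comparison alpha: {sidak_alpha:.5f}")
```

Each pairwise p-value is then compared against the per-comparison level rather than against 0.05 itself.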

Q9. A researcher wants to compare the mean weight loss of three diets: A, B, and C. They collect data from 50 participants who were randomly assigned to one of the diets. Conduct a one-way ANOVA using Python to determine if there are any significant differences between the mean weight loss of the three diets. Report the F-statistic and p-value, and interpret the results.

To conduct a one-way ANOVA using Python to compare the mean weight loss of three diets (A, B, and C), you can use the `scipy.stats` module. Here's how you can do it:

```python
import numpy as np
from scipy.stats import f_oneway

# Example data (weight loss for each diet); illustrative values, not the actual measurements of the 50 participants
diet_A = [5, 6, 7, 8, 5, 6, 7, 8, 5, 6, 7, 8, 5, 6, 7, 8, 5, 6, 7, 8, 5, 6, 7, 8, 5, 6, 7, 8, 5, 6, 7, 8, 5, 6, 7, 8, 5, 6, 7, 8, 5, 6, 7, 8, 5, 6, 7, 8]
diet_B = [4, 5, 6, 7, 4, 5, 6, 7, 4, 5, 6, 7, 4, 5, 6, 7, 4, 5, 6, 7, 4, 5, 6, 7, 4, 5, 6, 7, 4, 5, 6, 7, 4, 5, 6, 7, 4, 5, 6, 7, 4, 5, 6, 7, 4, 5, 6, 7]
diet_C = [3, 4, 5, 6, 3, 4, 5, 6, 3, 4, 5, 6, 3, 4, 5, 6, 3, 4, 5, 6, 3, 4, 5, 6, 3, 4, 5, 6, 3, 4, 5, 6, 3, 4, 5, 6, 3, 4, 5, 6, 3, 4, 5, 6, 3, 4, 5, 6]

# Perform one-way ANOVA
F_statistic, p_value = f_oneway(diet_A, diet_B, diet_C)

# Print the results
print("F-statistic:", F_statistic)
print("p-value:", p_value)

# Interpret the results
if p_value < 0.05:
    print("The p-value is less than 0.05, indicating that there is a significant difference in mean weight loss among the three diets.")
else:
    print("The p-value is greater than or equal to 0.05, indicating that there is no significant difference in mean weight loss among the three diets.")
```

In this code:
- `diet_A`, `diet_B`, and `diet_C` represent the weight loss data for participants assigned to diets A, B, and C, respectively.
- The `f_oneway` function from `scipy.stats` is used to perform the one-way ANOVA.
- The F-statistic and p-value are printed.
- The results are interpreted based on the p-value. If the p-value is less than 0.05 (commonly chosen significance level), it indicates a significant difference in mean weight loss among the three diets.

You can replace the example data with your actual data to perform the analysis on your dataset. Make sure you have the necessary Python libraries (`numpy` and `scipy.stats`) installed to run this code.
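
If the ANOVA is significant, a post-hoc test can show which diets differ. The sketch below applies Tukey's HSD through `pairwise_tukeyhsd` from `statsmodels`, using a shortened version of the illustrative pattern above (12 values per diet, chosen only to keep the example small).

```python
import numpy as np
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Shortened illustrative data: 12 weight-loss values per diet
diet_A = [5, 6, 7, 8] * 3
diet_B = [4, 5, 6, 7] * 3
diet_C = [3, 4, 5, 6] * 3

# Stack the observations and build a matching label for each one
weight_loss = np.array(diet_A + diet_B + diet_C)
labels = ["A"] * len(diet_A) + ["B"] * len(diet_B) + ["C"] * len(diet_C)

# Tukey's HSD compares every pair of diets at a familywise alpha of 0.05
tukey = pairwise_tukeyhsd(weight_loss, labels, alpha=0.05)
print(tukey)
```

The printed table lists each pair of diets with its mean difference, adjusted p-value, confidence interval, and a `reject` flag.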

Q10. A company wants to know if there are any significant differences in the average time it takes to complete a task using three different software programs: Program A, Program B, and Program C. They randomly assign 30 employees to one of the programs and record the time it takes each employee to complete the task. Conduct a two-way ANOVA using Python to determine if there are any main effects or interaction effects between the software programs and employee experience level (novice vs. experienced). Report the F-statistics and p-values, and interpret the results.

To conduct a two-way ANOVA in Python to analyze the effects of software programs (Program A, Program B, Program C) and employee experience level (novice vs. experienced) on the time it takes to complete a task, you can use the `statsmodels` library. Here's how you can do it:

```python
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

# Example data
data = {
    'Program': ['A', 'B', 'C'] * 10,  # 30 employees: 10 assigned to each program
    'Experience': ['Novice', 'Experienced'] * 15,  # 5 novice and 5 experienced employees in each program
    'Time': [10, 12, 11, 13, 15, 14, 9, 11, 10, 12, 13, 15, 14, 16, 17] * 2  # Example time data (30 values)
}

# Create DataFrame
df = pd.DataFrame(data)

# Fit the two-way ANOVA model
model = ols('Time ~ C(Program) + C(Experience) + C(Program):C(Experience)', data=df).fit()

# Print ANOVA table
print(sm.stats.anova_lm(model, typ=2))
```

In this code:
- `data` represents the example data with columns for the software program used (`Program`), employee experience level (`Experience`), and the time it takes to complete the task (`Time`).
- A DataFrame (`df`) is created using the example data.
- The `ols` function from `statsmodels.formula.api` is used to define the model formula, which includes the main effects of `Program` and `Experience` as well as their interaction.
- The `fit` method is used to fit the model to the data.
- The ANOVA table is printed using `sm.stats.anova_lm`.

The ANOVA table will show the main effects of software program (`Program`), employee experience level (`Experience`), and their interaction effect. You can interpret the F-statistics and p-values to determine if there are significant main effects or interaction effects between the software programs and employee experience levels on the time it takes to complete the task.

Interpretation of the results should consider both the significance of individual effects and their interactions, as well as their practical implications for the task completion time.

Q11. An educational researcher is interested in whether a new teaching method improves student test scores. They randomly assign 100 students to either the control group (traditional teaching method) or the experimental group (new teaching method) and administer a test at the end of the semester. Conduct a two-sample t-test using Python to determine if there are any significant differences in test scores between the two groups. If the results are significant, follow up with a post-hoc test to determine which group(s) differ significantly from each other.

To conduct a two-sample t-test in Python to compare test scores between the control group (traditional teaching method) and the experimental group (new teaching method), you can use the `scipy.stats` module. Here's how you can do it:

```python
import numpy as np
from scipy.stats import ttest_ind

# Example data (test scores for control and experimental groups)
control_group = [80, 75, 85, 70, 78, 82, 79, 81, 77, 76]
experimental_group = [85, 88, 90, 84, 82, 87, 89, 86, 83, 88]

# Perform two-sample t-test
t_statistic, p_value = ttest_ind(control_group, experimental_group)

# Print the results
print("Two-sample t-test results:")
print("t-statistic:", t_statistic)
print("p-value:", p_value)

# Interpret the results
if p_value < 0.05:
    print("The p-value is less than 0.05, indicating a significant difference in test scores between the control and experimental groups.")
    # With only two groups there is a single comparison, so no post-hoc test is needed;
    # the group means show the direction of the difference
    print("Control group mean:", np.mean(control_group))
    print("Experimental group mean:", np.mean(experimental_group))
else:
    print("The p-value is greater than or equal to 0.05, indicating no significant difference in test scores between the control and experimental groups.")
```

In this code:
- `control_group` and `experimental_group` represent the test scores for the control and experimental groups, respectively.
- The `ttest_ind` function from `scipy.stats` is used to perform the two-sample t-test.
- The t-statistic and p-value are printed.
- If the p-value is less than 0.05, the group means are printed to show which group scored higher. A post-hoc test is only needed when three or more groups are compared; with two groups, the t-test itself already identifies the one pair that differs.

You can replace the example data with your actual data to perform the analysis on your dataset. Make sure you have the necessary Python libraries (`numpy` and `scipy.stats`) installed to run this code.
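
By default `ttest_ind` assumes equal variances in the two groups. As a hedged alternative, the sketch below runs Welch's t-test (`equal_var=False`), which drops that assumption, and adds Cohen's d as an effect size. The pooled-standard-deviation formula for d is one common convention among several.

```python
import numpy as np
from scipy.stats import ttest_ind

# Same example scores as above
control_group = np.array([80, 75, 85, 70, 78, 82, 79, 81, 77, 76])
experimental_group = np.array([85, 88, 90, 84, 82, 87, 89, 86, 83, 88])

# Welch's t-test: does not assume equal variances
t_welch, p_welch = ttest_ind(experimental_group, control_group, equal_var=False)

# Cohen's d based on the pooled standard deviation
n1, n2 = len(experimental_group), len(control_group)
pooled_sd = np.sqrt(
    ((n1 - 1) * experimental_group.var(ddof=1) + (n2 - 1) * control_group.var(ddof=1))
    / (n1 + n2 - 2)
)
mean_difference = experimental_group.mean() - control_group.mean()
cohens_d = mean_difference / pooled_sd

print(f"Welch t = {t_welch:.3f}, p = {p_welch:.5f}")
print(f"Mean difference = {mean_difference:.1f}, Cohen's d = {cohens_d:.2f}")
```

A positive mean difference and d indicate that the experimental group scored higher than the control group in this example data.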

Q12. A researcher wants to know if there are any significant differences in the average daily sales of three retail stores: Store A, Store B, and Store C. They randomly select 30 days and record the sales for each store on those days. Conduct a repeated measures ANOVA using Python to determine if there are any significant differences in sales between the three stores. If the results are significant, follow up with a post-hoc test to determine which store(s) differ significantly from each other.

In this scenario each of the 30 days contributes one sales figure for every store, so the days play the role of subjects and the store is the within-subject (repeated) factor. A repeated measures ANOVA is therefore appropriate, because it accounts for day-to-day variation that all three stores share. A one-way ANOVA for independent groups would ignore that pairing.

Here's how you can perform the analysis in Python using `AnovaRM` from `statsmodels`, followed by Bonferroni-corrected paired t-tests as the post-hoc step:

```python
from itertools import combinations

import pandas as pd
from scipy.stats import ttest_rel
from statsmodels.stats.anova import AnovaRM

# Example data (daily sales for Store A, Store B, and Store C on the same 30 days)
store_A_sales = [1000, 1200, 1100, 1050, 1150, 1180, 1120, 1080, 1140, 1160, 1130, 1190, 1210, 1170, 1110, 1090, 1150, 1200, 1180, 1100, 1160, 1140, 1120, 1150, 1130, 1180, 1160, 1120, 1140, 1170]
store_B_sales = [950, 980, 1000, 1050, 990, 1020, 1030, 970, 1040, 1020, 1010, 1030, 990, 1010, 980, 1000, 1020, 1000, 980, 1050, 1000, 1020, 1040, 1010, 1030, 980, 1000, 1050, 1020, 1040]
store_C_sales = [1050, 1100, 1080, 1120, 1090, 1060, 1110, 1130, 1100, 1150, 1160, 1140, 1090, 1120, 1150, 1100, 1120, 1110, 1130, 1140, 1090, 1100, 1080, 1120, 1090, 1060, 1110, 1130, 1100, 1150]
sales = {'Store A': store_A_sales, 'Store B': store_B_sales, 'Store C': store_C_sales}

# Long format: one row per (day, store) pair; the day is the repeated "subject"
n_days = len(store_A_sales)
df = pd.DataFrame({
    'day': list(range(1, n_days + 1)) * len(sales),
    'store': [name for name in sales for _ in range(n_days)],
    'sales': store_A_sales + store_B_sales + store_C_sales,
})

# Perform repeated measures ANOVA with store as the within-day factor
results = AnovaRM(data=df, depvar='sales', subject='day', within=['store']).fit()
f_statistic = results.anova_table.loc['store', 'F Value']
p_value = results.anova_table.loc['store', 'Pr > F']

# Print the results
print("Repeated measures ANOVA results:")
print("F-statistic:", f_statistic)
print("p-value:", p_value)

# Interpret the results
if p_value < 0.05:
    print("The p-value is less than 0.05, indicating a significant difference in sales between the three stores.")
    # Post-hoc: paired t-tests on the same days, with a Bonferroni correction
    pairs = list(combinations(sales, 2))
    for first, second in pairs:
        t_stat, p_raw = ttest_rel(sales[first], sales[second])
        p_adjusted = min(p_raw * len(pairs), 1.0)
        print(f"{first} vs {second}: t = {t_stat:.3f}, Bonferroni-adjusted p = {p_adjusted:.4f}")
else:
    print("The p-value is greater than or equal to 0.05, indicating no significant difference in sales between the three stores.")
```

In this code:
- `store_A_sales`, `store_B_sales`, and `store_C_sales` represent the daily sales for Store A, Store B, and Store C, respectively, recorded on the same 30 days.
- The data are reshaped into long format, with one row per day and store, because `AnovaRM` expects a subject column (`day`), a within-subject factor (`store`), and a dependent variable (`sales`).
- The `AnovaRM` class from `statsmodels.stats.anova` is used to perform the repeated measures ANOVA, and the F-statistic and p-value are read from its ANOVA table.
- If the p-value is less than 0.05, indicating a significant difference, paired t-tests (`ttest_rel`) with a Bonferroni correction are performed to identify which stores differ significantly from each other. Paired tests are used because the stores are compared on the same days.

You can replace the example data with your actual data to perform the analysis on your dataset. Make sure you have the necessary Python libraries (`pandas`, `scipy`, and `statsmodels`) installed to run this code.
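
Because each day contributes one sales figure per store, the three series are paired by day. When the normality assumption looks doubtful, the Friedman test is a rank-based alternative that respects this pairing. The sketch below applies `scipy.stats.friedmanchisquare` to the first eight days of the example data, shortened only to keep the example compact.

```python
from scipy.stats import friedmanchisquare

# First eight days of the example sales data, one value per store per day
store_A = [1000, 1200, 1100, 1050, 1150, 1180, 1120, 1080]
store_B = [950, 980, 1000, 1050, 990, 1020, 1030, 970]
store_C = [1050, 1100, 1080, 1120, 1090, 1060, 1110, 1130]

# The Friedman test ranks the stores within each day and compares the rank sums
friedman_stat, friedman_p = friedmanchisquare(store_A, store_B, store_C)

print("Friedman chi-square:", friedman_stat)
print("p-value:", friedman_p)
```

A small p-value suggests that at least one store tends to rank differently from the others across days.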