Q1. Explain the assumptions required to use ANOVA and provide examples of violations that could impact
the validity of the results.

ANS:
    
      ANOVA (Analysis of Variance) is a statistical test used to compare the means of three or more groups to determine if there are significant differences between them. However, ANOVA relies on several assumptions to be met for its results to be valid. Violations of these assumptions can impact the accuracy and reliability of the ANOVA results. The key assumptions for ANOVA are as follows:

1. Independence: The observations are independent of each other, both within each group and across groups. This means that no data point should be related to or influenced by any other data point; repeated measurements on the same subject or clustered sampling violate this assumption.

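2. Normality: The residuals (equivalently, the data within each group) should be approximately normally distributed. This is important because the sampling distribution of the F-statistic is derived under normally distributed errors. With reasonably large and similar group sizes, ANOVA is fairly robust to moderate departures from normality.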

3. Homogeneity of Variance (Homoscedasticity): The variances of the groups should be approximately equal. If the variances differ significantly between groups, it can lead to incorrect conclusions.

Examples of Violations and Their Impact:

1. Non-Independence: If observations are correlated, standard errors are typically underestimated and the Type I error rate is inflated. For example, in a study comparing the performance of students in different classes, if several of the classes are taught by the same teacher, that teacher's style influences the scores in all of those classes, so the observations in different groups are no longer independent.

2. Non-Normality: If the data within groups deviate significantly from the normal distribution, ANOVA results may not be reliable. For instance, if the data in one group is heavily skewed or has extreme outliers, the normality assumption is violated.

3. Heteroscedasticity: When the variances between groups are not equal, it can lead to incorrect conclusions. For example, in a study comparing the salaries of employees across different departments in a company, if the salary variations within certain departments are much larger than others, the assumption of homogeneity of variance is violated.

Addressing Assumptions:

If the assumptions are violated, there are some possible remedies:

- Non-independence: Ensure that the data is collected independently, or use appropriate statistical techniques like hierarchical linear models (HLM) if there are nested data structures.

- Non-Normality: Consider transforming the data (e.g., logarithmic or square root transformation) to make it more normally distributed or use non-parametric alternatives like the Kruskal-Wallis test.

- Heteroscedasticity: You can try a variance-stabilizing transformation, use a weighted ANOVA, or use a method that does not assume equal variances, such as Welch's ANOVA.

However, it is essential to note that transforming data or using alternative methods should be done with caution, and in some cases, it might be more appropriate to use non-parametric tests when assumptions cannot be met.

Before interpreting the results of an ANOVA, it is crucial to check for the assumptions and address any violations accordingly to ensure the validity of the conclusions drawn from the analysis.  
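
The checks described above can be sketched in Python. This is a minimal illustration, assuming `scipy` is available; the three groups are hypothetical values, and Shapiro-Wilk and Levene's test are only one common choice of diagnostics.

```python
import numpy as np
from scipy import stats

# Hypothetical example groups (illustrative values, not from any study)
groups = [
    np.array([12.0, 15.0, 18.0, 21.0, 16.0]),
    np.array([10.0, 13.0, 16.0, 12.0, 14.0]),
    np.array([9.0, 11.0, 14.0, 17.0, 20.0]),
]

# Normality: Shapiro-Wilk test on the residuals (each value minus its group mean)
residuals = np.concatenate([g - g.mean() for g in groups])
shapiro_stat, shapiro_p = stats.shapiro(residuals)

# Homogeneity of variance: Levene's test across the groups
levene_stat, levene_p = stats.levene(*groups)

print("Shapiro-Wilk p-value:", shapiro_p)
print("Levene p-value:", levene_p)
```

A small p-value in either test suggests the corresponding assumption may be violated. Independence cannot be tested this way; it has to be judged from the study design.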

Q2. What are the three types of ANOVA, and in what situations would each be used?

ANS:
    
    ANOVA (Analysis of Variance) is a statistical technique used to compare means across multiple groups. There are three main types of ANOVA, each suited for different situations:

1. One-Way ANOVA:
One-Way ANOVA is used when you have one categorical independent variable (also known as a factor) and you want to compare the means of a continuous dependent variable across its levels (groups). It is typically used with three or more groups, since with two groups it is equivalent to an independent two-sample t-test. For example, you could use One-Way ANOVA to compare the test scores of students from three different schools.

2. Two-Way ANOVA:
Two-Way ANOVA is used when you have two categorical independent variables (factors) and one continuous dependent variable. Each factor can have two or more levels, and you want to examine the main effect of each factor as well as the interaction effect between them on the dependent variable. For example, you could use Two-Way ANOVA to analyze whether the effect of teaching method (new vs. traditional) on student test scores differs between age groups.

3. Repeated Measures ANOVA:
Repeated Measures ANOVA (also known as Within-Subjects ANOVA) is used when you have a single group of participants, and you measure them repeatedly under different conditions or at different times. The within-subjects factor represents the different conditions, and the dependent variable is measured once per condition for each participant. This type of ANOVA is often used in experimental designs where the same participants are exposed to different treatments or conditions. For example, you could use Repeated Measures ANOVA to analyze the weight of the same group of participants measured at three different time points during a diet program.

Each type of ANOVA is suitable for different research designs and data structures. Choosing the appropriate type of ANOVA depends on the number of factors you want to study, the design of your experiment, and the nature of your data. Properly selecting the type of ANOVA ensures that you are conducting the most relevant statistical analysis for your research question and obtaining accurate and meaningful results.

Q3. What is the partitioning of variance in ANOVA, and why is it important to understand this concept?

ANS:
    
    The partitioning of variance in ANOVA refers to the process of breaking down the total variability in the data into different components, each representing a different source of variation. ANOVA divides the total variability in the dependent variable into two main components: the variability explained by the factors or groups being compared, and the variability not explained by these factors, also known as the error or residual variability.

The partitioning of variance in ANOVA is important for several reasons:

1. Identifying Sources of Variation: By partitioning the total variance, ANOVA helps identify which factors or groups contribute significantly to the differences observed in the dependent variable. This is crucial for understanding the main effects of factors and any potential interactions between them.

2. Assessing Significance: ANOVA allows us to test the significance of the variance components, helping determine whether the differences between groups are statistically significant. This information is essential in drawing valid conclusions from the analysis.

3. Understanding Group Differences: By understanding how much of the variance is attributable to the factors or groups being compared, researchers can better comprehend the magnitude of differences between groups and their impact on the dependent variable.

4. Efficient Use of Data: ANOVA efficiently uses data by considering the overall variability and decomposing it into meaningful components. Testing all groups in a single omnibus test also avoids the inflated Type I error rate that results from running many separate pairwise t-tests.

5. Interpretation and Communication: The partitioning of variance provides a clear and concise way to present the results of the ANOVA analysis. Researchers can easily communicate the contributions of different factors to the total variation, aiding in the interpretation of the findings.

In summary, the partitioning of variance in ANOVA is a fundamental concept that allows researchers to understand the relative contributions of different factors to the overall variability in the data. This understanding is crucial for drawing valid conclusions, identifying significant effects, and efficiently analyzing and interpreting complex data structures with multiple groups or factors.
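
One way to make the "relative contribution" idea concrete is eta squared, the share of total variability explained by group membership. The sketch below is illustrative only: the helper name `eta_squared` and the two small groups are hypothetical.

```python
import numpy as np

def eta_squared(groups):
    """Proportion of total variability explained by group membership."""
    all_values = np.concatenate(groups)
    grand_mean = all_values.mean()
    # Total variability: squared deviations of every value from the grand mean
    ss_total = np.sum((all_values - grand_mean) ** 2)
    # Between-group variability: squared deviations of group means, weighted by group size
    ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
    return ss_between / ss_total

# Hypothetical data: two small groups with clearly separated means
groups = [np.array([1.0, 2.0, 3.0]), np.array([4.0, 5.0, 6.0])]
print("Eta squared:", eta_squared(groups))
```

Values near 1 mean most of the variability lies between groups; values near 0 mean most of it lies within groups.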

Q4. How would you calculate the total sum of squares (SST), explained sum of squares (SSE), and residual
sum of squares (SSR) in a one-way ANOVA using Python?

ANS:
    
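In a one-way ANOVA, you can calculate the Total Sum of Squares (SST), Explained Sum of Squares (SSE), and Residual Sum of Squares (SSR) using Python. The three are linked by the identity SST = SSE + SSR. Note that abbreviations vary between textbooks: some use SSE for the error (residual) sum of squares and SSR for the regression (explained) sum of squares, so always check the definitions. This answer follows the question's convention.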

1. Total Sum of Squares (SST):
The Total Sum of Squares represents the total variability in the dependent variable. It is calculated as the sum of squared differences between each data point and the overall mean of the data. The formula for SST is:

SST = Σ (Yi - Y_mean)^2

Where Yi is each data point, and Y_mean is the overall mean of the data.

2. Explained Sum of Squares (SSE):
The Explained Sum of Squares represents the variability explained by the group means. It is calculated as the sum of squared differences between each group mean and the overall mean of the data, weighted by the number of data points in each group. The formula for SSE is:

SSE = Σ (ni * (Y_group_mean - Y_mean)^2)

Where ni is the number of data points in each group, Y_group_mean is the mean of each group, and Y_mean is the overall mean of the data.

3. Residual Sum of Squares (SSR):
The Residual Sum of Squares represents the unexplained variability in the data, also known as the error. It is calculated as the sum of squared differences between each data point and its corresponding group mean. The formula for SSR is:

SSR = Σ (Yi - Y_group_mean)^2

Where Yi is each data point, and Y_group_mean is the mean of the group to which the data point belongs.

Here's how you can calculate SST, SSE, and SSR using Python:

```python
import numpy as np

# Example data for three groups (replace with your actual data)
group1 = np.array([12, 15, 18, 21])
group2 = np.array([10, 13, 16])
group3 = np.array([9, 11, 14, 17, 20])

# Calculate the overall mean
overall_mean = np.mean(np.concatenate([group1, group2, group3]))

# Calculate the Total Sum of Squares (SST)
SST = np.sum((np.concatenate([group1, group2, group3]) - overall_mean)**2)

# Calculate the group means
group_means = [np.mean(group1), np.mean(group2), np.mean(group3)]

# Calculate the Explained Sum of Squares (SSE)
SSE = np.sum([len(group1) * (group_means[0] - overall_mean)**2,
              len(group2) * (group_means[1] - overall_mean)**2,
              len(group3) * (group_means[2] - overall_mean)**2])

# Calculate the Residual Sum of Squares (SSR)
SSR = np.sum((np.concatenate([group1 - group_means[0], group2 - group_means[1], group3 - group_means[2]])**2))

print("Total Sum of Squares (SST):", SST)
print("Explained Sum of Squares (SSE):", SSE)
print("Residual Sum of Squares (SSR):", SSR)
```

In this example, we have three groups of data (group1, group2, and group3). We calculate the overall mean, group means, and then use these values to compute SST, SSE, and SSR as described above.  
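
As a sanity check on these quantities, the sketch below recomputes them for the same three groups, confirms that SST = SSE + SSR, and builds the F-statistic from the sums of squares and their degrees of freedom. Comparing against `scipy.stats.f_oneway` is an addition for illustration, not part of the original calculation.

```python
import numpy as np
from scipy import stats

group1 = np.array([12, 15, 18, 21])
group2 = np.array([10, 13, 16])
group3 = np.array([9, 11, 14, 17, 20])
groups = [group1, group2, group3]

all_values = np.concatenate(groups)
overall_mean = all_values.mean()

sst = np.sum((all_values - overall_mean) ** 2)                      # total
sse = sum(len(g) * (g.mean() - overall_mean) ** 2 for g in groups)  # explained (between groups)
ssr = sum(np.sum((g - g.mean()) ** 2) for g in groups)              # residual (within groups)

# Degrees of freedom: k - 1 between groups, N - k within groups
df_between = len(groups) - 1
df_within = len(all_values) - len(groups)

# F is the ratio of the two mean squares
f_manual = (sse / df_between) / (ssr / df_within)
f_scipy, p_scipy = stats.f_oneway(*groups)

print("SST equals SSE + SSR:", np.isclose(sst, sse + ssr))
print("Manual F matches scipy:", np.isclose(f_manual, f_scipy))
```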

Q5. In a two-way ANOVA, how would you calculate the main effects and interaction effects using Python?

ANS:
    
    In a two-way ANOVA, you can calculate the main effects and interaction effects using Python. The main effects represent the independent contributions of each factor, while the interaction effect represents how the combined effects of two factors differ from the individual effects.

Here's how you can calculate the main effects and interaction effect using Python:

1. Main Effects:
To calculate the main effect of a factor, compute the mean of the dependent variable at each level of that factor and subtract the overall mean. The factors must be categorical, with every observation labelled by its level on both factors.

```python
import pandas as pd

# Example data for a balanced 2x2 design (replace with your actual data)
df = pd.DataFrame({
    'factor1': ['a1', 'a1', 'a1', 'a1', 'a2', 'a2', 'a2', 'a2'],
    'factor2': ['b1', 'b1', 'b2', 'b2', 'b1', 'b1', 'b2', 'b2'],
    'y':       [12, 14, 18, 20, 22, 24, 36, 38],
})

# Calculate the overall (grand) mean of the dependent variable
overall_mean = df['y'].mean()

# Calculate the mean of the dependent variable at each level of each factor
mean_factor1 = df.groupby('factor1')['y'].mean()
mean_factor2 = df.groupby('factor2')['y'].mean()

# Calculate the main effects: deviation of each level mean from the overall mean
main_effect_factor1 = mean_factor1 - overall_mean
main_effect_factor2 = mean_factor2 - overall_mean

print("Main Effect of Factor 1:\n", main_effect_factor1)
print("Main Effect of Factor 2:\n", main_effect_factor2)
```

2. Interaction Effect:
To calculate the interaction effect, compare each cell mean (the mean for one combination of levels) with what the two main effects alone would predict: interaction = cell mean - factor 1 level mean - factor 2 level mean + overall mean.

```python
# Calculate the mean of the dependent variable for each combination of levels
# (rows are levels of factor1, columns are levels of factor2)
cell_means = df.groupby(['factor1', 'factor2'])['y'].mean().unstack()

# Calculate the interaction effect for each cell:
# cell mean - factor1 level mean - factor2 level mean + overall mean
interaction_effect = (
    cell_means.sub(mean_factor1, axis=0).sub(mean_factor2, axis=1) + overall_mean
)

print("Interaction Effects:\n", interaction_effect)
```

In this example, we have two factors (factor1 and factor2), each with two levels, and a dependent variable y. We first calculate the overall mean and the mean of y at each level of each factor, and then compute the main effects as deviations from the overall mean. Next, we calculate the mean of y in each cell and subtract the two level means (adding back the overall mean) to obtain the interaction effects. If all interaction effects are zero, the factors act additively. These simple formulas apply to a balanced design, in which every cell has the same number of observations.

Please note that in practice, it is recommended to use a statistical library such as `statsmodels` (`ols` together with `anova_lm`) to perform the two-way ANOVA, as it also provides the sums of squares, F-statistics, and p-values needed to test whether the main effects and the interaction are statistically significant, and it can handle unbalanced designs. `scipy.stats.f_oneway` covers only the one-way case.

Q6. Suppose you conducted a one-way ANOVA and obtained an F-statistic of 5.23 and a p-value of 0.02.
What can you conclude about the differences between the groups, and how would you interpret these
results?

ANS:
    
When conducting a one-way ANOVA, the F-statistic and p-value are used to test the null hypothesis that all group population means are equal. The F-statistic is the ratio of the variance between the groups (mean square between) to the variance within the groups (mean square within), and the p-value is the probability of obtaining an F-statistic at least as large as the observed one if the null hypothesis is true.

In your case, you obtained an F-statistic of 5.23 and a p-value of 0.02. To interpret these results:

1. F-statistic:
The F-statistic of 5.23 represents the ratio of variance between the groups to the variance within the groups. A larger F-statistic indicates that the variability between the group means is relatively larger compared to the variability within each group.

2. p-value:
The p-value of 0.02 is the probability of observing the F-statistic (or a more extreme value) if the null hypothesis is true. In this case, the p-value is below the significance level of 0.05 (assuming a common significance level of 0.05), indicating that the observed differences between the group means are statistically significant.

Conclusion:
Based on the F-statistic and the p-value, you can conclude that there are statistically significant differences between the group means at the 0.05 level. In other words, the null hypothesis that all group means are equal is rejected, and you have evidence to support the alternative hypothesis that at least one of the group means is different from the others.

Interpretation:
The p-value of 0.02 means that, if all population means were equal, an F-statistic of 5.23 or larger would occur only about 2% of the time. This is evidence against the null hypothesis, but it is not the probability that the null hypothesis is true, and it says nothing about how large the differences are. Whether an F-statistic of 5.23 is significant depends on the degrees of freedom (the number of groups and the sample size), which is why the p-value, rather than the raw F value, is compared with the significance level.

In summary, the results of the one-way ANOVA suggest that there are significant differences between the groups, and you can reject the null hypothesis that all group means are equal. However, the ANOVA does not say which groups differ, so it is important to conduct post-hoc tests (e.g., Tukey's HSD or Bonferroni-corrected pairwise comparisons) to identify which specific group means differ significantly from each other.
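
To see how the p-value follows from the F-statistic, the sketch below evaluates the upper tail of the F distribution. The question does not state the degrees of freedom, so the values used here (3 groups, 18 observations) are assumptions for illustration, and the result need not match the quoted 0.02 exactly.

```python
from scipy import stats

f_statistic = 5.23

# Hypothetical design (not given in the question): 3 groups, 18 observations in total
n_groups = 3
n_total = 18
df_between = n_groups - 1
df_within = n_total - n_groups

# p-value: probability of an F at least this large when the null hypothesis is true
p_value = stats.f.sf(f_statistic, df_between, df_within)

print("p-value:", p_value)
print("Significant at 0.05:", p_value <= 0.05)
```

Changing `n_total` shows that the same F-statistic gives different p-values for different sample sizes.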

Q7. In a repeated measures ANOVA, how would you handle missing data, and what are the potential
consequences of using different methods to handle missing data?

ANS:
    
    Handling missing data in a repeated measures ANOVA is crucial to ensure the validity and reliability of the analysis. There are several methods to handle missing data, each with its potential consequences. Some common approaches are:

1. Complete Case Analysis (Listwise Deletion):
In this method, any participant with missing data on any variable is excluded from the analysis. It is the simplest approach, but it always discards the observed values of the excluded participants, which reduces the sample size and therefore the statistical power. If the data are not missing completely at random (MCAR), the remaining participants may no longer be representative, which biases the results and limits their generalizability.

2. Pairwise Deletion:
In this method, participants with missing data in some variables are included for those specific variables where data are available. This approach preserves more data than complete case analysis but may lead to biased estimates if the missing data are not MCAR. It can also result in different sample sizes for different variables, which may complicate the interpretation of the results.

3. Mean Substitution (Imputation):
Mean substitution involves replacing missing data with the mean of the observed data for that variable. While it retains the sample size, it artificially reduces the variability of the data, which leads to underestimated standard errors and overly optimistic significance tests. It also weakens the correlations between repeated measurements and introduces bias unless the data are MCAR.

4. Last Observation Carried Forward (LOCF):
LOCF involves using the last observed value for a participant's missing data. This method is commonly used in longitudinal studies but can be problematic as it assumes that the participant's condition does not change between observations.

5. Multiple Imputation:
Multiple imputation generates several imputed datasets, each with different plausible values for the missing data, based on the observed data and the missing data's distribution. The analysis is then performed on each imputed dataset, and the results are combined to provide a single set of valid estimates. Multiple imputation is considered one of the most robust methods for handling missing data, but it requires careful modeling and can be computationally intensive.

The potential consequences of using different methods to handle missing data include:

- Bias: Some methods may introduce biases in the results, especially if the missing data are related to the variables being analyzed.

- Loss of Statistical Power: Methods that lead to a reduction in the sample size can decrease the statistical power of the analysis, making it more challenging to detect significant effects.

- Inaccurate Estimates: Certain methods may lead to inaccurate parameter estimates and standard errors, affecting the precision and validity of the results.

- Difficulty in Interpretation: Uneven treatment of missing data across different variables or participants can complicate the interpretation of the results and make comparisons more challenging.

In conclusion, handling missing data in repeated measures ANOVA requires careful consideration of the nature of missingness and the potential consequences of different methods. Multiple imputation is generally considered the most reliable method, but each approach has its advantages and disadvantages, and the choice should be based on the specific characteristics of the data and the research question.
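
The effect of two of these methods can be seen on a small example. The sketch below uses hypothetical wide-format data (one row per participant, one column per condition) and compares listwise deletion with mean substitution using `pandas`.

```python
import numpy as np
import pandas as pd

# Hypothetical wide-format data: one row per participant, one column per condition
wide = pd.DataFrame({
    'cond1': [5.0, 6.0, np.nan, 7.0],
    'cond2': [6.0, np.nan, 8.0, 9.0],
    'cond3': [7.0, 8.0, 9.0, 10.0],
})

# Listwise deletion: keep only participants with complete data
complete_cases = wide.dropna()

# Mean substitution: fill each missing value with the mean of its condition
mean_imputed = wide.fillna(wide.mean())

print("Participants before deletion:", len(wide))
print("Participants after listwise deletion:", len(complete_cases))
print("Variance per condition (observed values):\n", wide.var())
print("Variance per condition (mean-imputed):\n", mean_imputed.var())
```

Listwise deletion shrinks the sample, while mean substitution keeps every participant but lowers the variance of the imputed conditions.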

Q8. What are some common post-hoc tests used after ANOVA, and when would you use each one? Provide
an example of a situation where a post-hoc test might be necessary.

ANS:
    
    After conducting an ANOVA and finding a significant overall effect (i.e., rejecting the null hypothesis of equal means), post-hoc tests are used to determine which specific groups differ significantly from each other. Post-hoc tests allow for pairwise comparisons between all possible combinations of groups, which helps identify where the significant differences lie.

Some common post-hoc tests used after ANOVA include:

1. Tukey's Honestly Significant Difference (HSD) Test:
Tukey's HSD compares all pairs of group means while controlling the family-wise error rate, making it the usual choice when every pairwise comparison is of interest. It assumes equal variances across groups; the Tukey-Kramer variant extends it to unequal sample sizes.

2. Bonferroni Correction:
Bonferroni correction is a simple and conservative method that adjusts the significance level for each comparison to maintain the overall family-wise error rate. It is suitable when conducting a small number of comparisons and provides a more stringent criterion for significance.

3. Sidak Correction:
Similar to the Bonferroni correction, the Sidak correction adjusts the per-comparison significance level for multiple comparisons. It is slightly less conservative than Bonferroni (and therefore slightly more powerful), and it is exact when the comparisons are independent.

4. Dunnett's Test:
Dunnett's test is used when comparing several treatment groups to a single control group. It controls the Type I error rate and is more powerful than other post-hoc tests in this specific scenario.

5. Fisher's Least Significant Difference (LSD) Test:
Fisher's LSD test is one of the simplest post-hoc tests, but it does not control the overall Type I error rate well. It is typically used when a significant overall effect is found, but there is no specific hypothesis about which groups differ.
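
The difference between the Bonferroni and Sidak adjustments is easy to compute directly. In this sketch the number of groups `k` is a hypothetical value; with `m` pairwise comparisons, Bonferroni uses alpha / m and Sidak uses 1 - (1 - alpha)^(1/m).

```python
from math import comb

alpha = 0.05   # desired family-wise error rate
k = 4          # hypothetical number of groups
m = comb(k, 2) # number of pairwise comparisons among k groups

# Per-comparison significance levels
alpha_bonferroni = alpha / m
alpha_sidak = 1 - (1 - alpha) ** (1 / m)

print("Number of comparisons:", m)
print("Bonferroni per-comparison alpha:", alpha_bonferroni)
print("Sidak per-comparison alpha:", alpha_sidak)
```

The Sidak threshold is always slightly larger than the Bonferroni threshold, which is what makes it marginally less conservative.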

Example Situation for Post-Hoc Test:
Suppose you conducted an experiment to test the effect of three different treatments (A, B, and C) on the test scores of students. After conducting a one-way ANOVA, you find a significant difference among the three treatments (p < 0.05). To identify which specific treatments differ significantly from each other, you can use post-hoc tests.

For example, you could use Tukey's HSD to perform all possible pairwise comparisons between treatments A, B, and C. The post-hoc test would indicate which specific pairs of treatments have significantly different effects on the test scores.

Without a post-hoc test, you wouldn't know which individual treatments contributed to the overall significant effect, and the ANOVA results alone would not provide the specific information needed to draw conclusions about the treatment effects on the students' test scores. Post-hoc tests are essential to gain a deeper understanding of the differences between the groups and to make more meaningful and accurate interpretations of the results.
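
The Tukey's HSD step for this scenario could be sketched as follows, assuming `statsmodels` is installed. The scores are hypothetical values chosen for illustration.

```python
import numpy as np
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Hypothetical test scores for the three treatments (illustrative values only)
scores_A = np.array([78, 82, 85, 80, 79])
scores_B = np.array([88, 90, 85, 92, 87])
scores_C = np.array([75, 70, 72, 78, 74])

# Stack the scores and build a matching array of treatment labels
scores = np.concatenate([scores_A, scores_B, scores_C])
treatments = np.repeat(['A', 'B', 'C'], 5)

# All pairwise comparisons with the family-wise error rate held at 0.05
tukey = pairwise_tukeyhsd(endog=scores, groups=treatments, alpha=0.05)
print(tukey.summary())
```

Each row of the summary reports one pair of treatments, the difference in means, an adjusted p-value, a confidence interval, and whether the null hypothesis of equal means is rejected for that pair.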

Q9. A researcher wants to compare the mean weight loss of three diets: A, B, and C. They collect data from
50 participants who were randomly assigned to one of the diets. Conduct a one-way ANOVA using Python
to determine if there are any significant differences between the mean weight loss of the three diets.
Report the F-statistic and p-value, and interpret the results.

ANS:
    
To conduct a one-way ANOVA in Python comparing the mean weight loss of three diets (A, B, and C), you can use `scipy.stats.f_oneway`. Note that the 50 participants are split across the three diets, so the three group sizes sum to 50 (for example 17, 17, and 16); the groups do not need to be the same size. Here's how you can perform the analysis:

```python
import numpy as np
import scipy.stats as stats

# Illustrative values only (5 per diet shown); replace with your actual data,
# where the three arrays together hold the weight loss of all 50 participants
diet_A = np.array([2.5, 3.0, 2.2, 1.8, 2.9])
diet_B = np.array([3.5, 2.8, 3.2, 3.7, 3.0])
diet_C = np.array([1.8, 2.0, 2.5, 1.5, 2.1])

# Perform one-way ANOVA (f_oneway takes one array per group)
f_statistic, p_value = stats.f_oneway(diet_A, diet_B, diet_C)

# Print the results
print("F-statistic:", f_statistic)
print("P-value:", p_value)

# Check for statistical significance at the 0.05 significance level
alpha = 0.05
if p_value <= alpha:
    print("There are significant differences between the mean weight loss of the three diets.")
else:
    print("There is no significant evidence of differences between the mean weight loss of the three diets.")
```

In this example, each array holds the weight loss of the participants assigned to one diet. `scipy.stats.f_oneway()` takes one array per group, so no combined array or label vector is needed, and it returns the F-statistic and p-value.

The F-statistic is the ratio of the variance between the group means to the variance within the groups. The p-value is the probability of observing an F-statistic at least this large under the null hypothesis that all three population mean weight losses are equal.

Based on the p-value, you can interpret the results as follows:
- If the p-value is less than or equal to the significance level (e.g., 0.05), you would reject the null hypothesis and conclude that at least one diet's mean weight loss differs from the others.
- If the p-value is greater than the significance level, you would fail to reject the null hypothesis: the data do not provide sufficient evidence of a difference in mean weight loss, which is not the same as showing that the diets are equally effective.

Remember that statistical significance does not necessarily imply practical significance, so even if the results are statistically significant, it is essential to consider the magnitude of the differences and the context of the study when interpreting the results.

Q10. A company wants to know if there are any significant differences in the average time it takes to
complete a task using three different software programs: Program A, Program B, and Program C. They
randomly assign 30 employees to one of the programs and record the time it takes each employee to
complete the task. Conduct a two-way ANOVA using Python to determine if there are any main effects or
interaction effects between the software programs and employee experience level (novice vs.
experienced). Report the F-statistics and p-values, and interpret the results.

ANS:
    
     To conduct a two-way ANOVA in Python to analyze the average time it takes to complete a task using three different software programs (A, B, and C) based on employee experience level (novice vs. experienced), you can use the `statsmodels` library. It allows you to perform more comprehensive ANOVA analysis, including testing main effects and interaction effects. Here's how you can do it:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

# Simulated placeholder data, NOT real measurements (replace with your actual data):
# 30 employees in total, 10 per program, with 5 novice and 5 experienced in each program
rng = np.random.default_rng(0)
data = {
    'Time': rng.normal(loc=12.0, scale=1.5, size=30),  # 30 task-completion times
    'Program': ['A']*10 + ['B']*10 + ['C']*10,
    'Experience': (['Novice']*5 + ['Experienced']*5) * 3
}

# Create a DataFrame from the data
df = pd.DataFrame(data)

# Fit the two-way ANOVA model
model = ols('Time ~ C(Program) + C(Experience) + C(Program):C(Experience)', data=df).fit()
anova_table = sm.stats.anova_lm(model, typ=2)

# Print the ANOVA results
print(anova_table)
```

The output will provide F-statistics and p-values for the main effects of software programs, the main effects of employee experience level, and the interaction effect between software programs and employee experience level.

Interpreting the results:
- If the p-value for the main effect of software programs (C(Program)) is less than the significance level (e.g., 0.05), it indicates that there are significant differences in the average task completion time between at least two software programs.
- If the p-value for the main effect of employee experience level (C(Experience)) is less than the significance level, it indicates that there are significant differences in the average task completion time between novice and experienced employees, irrespective of the software programs.
- If the p-value for the interaction effect (C(Program):C(Experience)) is less than the significance level, it indicates that the effect of software programs on task completion time varies significantly depending on the employee experience level.

Remember that statistical significance alone does not imply practical significance, so consider the effect sizes and the context of the study when interpreting the results. Additionally, if the assumptions of ANOVA are not met (e.g., normality, homogeneity of variance), you may need to explore alternative methods or transformations to analyze the data appropriately.

Q11. An educational researcher is interested in whether a new teaching method improves student test
scores. They randomly assign 100 students to either the control group (traditional teaching method) or the
experimental group (new teaching method) and administer a test at the end of the semester. Conduct a
two-sample t-test using Python to determine if there are any significant differences in test scores
between the two groups. If the results are significant, follow up with a post-hoc test to determine which
group(s) differ significantly from each other.

ANS:

To conduct a two-sample t-test in Python to compare the test scores between the control group (traditional teaching method) and the experimental group (new teaching method), you can use `scipy.stats.ttest_ind`. Because there are only two groups, a significant t-test already tells you which groups differ (control vs. experimental), so a post-hoc test adds no new information. The code below still runs Tukey's Honestly Significant Difference (HSD) test when the t-test is significant, to show that it reduces to the same single comparison. Here's how you can do it:

```python
import numpy as np
import pandas as pd
import scipy.stats as stats
import statsmodels.stats.multicomp as mc

# Illustrative values only (5 per group shown); replace with your actual data.
# The 100 students are split between the two groups (e.g., 50 in each).
control_group = np.array([85, 78, 92, 75, 80])  # Test scores for the control group
experimental_group = np.array([90, 88, 95, 85, 92])  # Test scores for the experimental group

# Perform two-sample t-test
t_statistic, p_value = stats.ttest_ind(control_group, experimental_group)

# Print the t-test results
print("T-statistic:", t_statistic)
print("P-value:", p_value)

# Check for statistical significance at the 0.05 significance level
alpha = 0.05
if p_value <= alpha:
    print("There is a significant difference in test scores between the two groups.")
else:
    print("There is no significant difference in test scores between the two groups.")

# Perform post-hoc test (Tukey's HSD) if the results are significant
if p_value <= alpha:
    data = np.concatenate([control_group, experimental_group])
    groups = ['Control'] * len(control_group) + ['Experimental'] * len(experimental_group)
    tukey_result = mc.MultiComparison(data, groups).tukeyhsd()
    print(tukey_result.summary())
```

The output will provide the t-statistic and p-value from the two-sample t-test, indicating whether there is a significant difference in mean test scores between the control and experimental groups. If the p-value is less than or equal to the significance level (e.g., 0.05), the difference is statistically significant.

If the results are significant, the code proceeds to run Tukey's HSD using `statsmodels.stats.multicomp.MultiComparison.tukeyhsd()`. With only two groups there is a single pairwise comparison (Control vs. Experimental), so the test reports one mean difference with its confidence interval and adjusted p-value, and it leads to the same conclusion as the t-test.

Interpreting the results:
- If the p-value from the t-test is less than or equal to the significance level, you can conclude that there is a significant difference in test scores between the two groups. The sign of the mean difference tells you which group scored higher.
- A post-hoc test is only informative with three or more groups, where a significant overall test does not say which pairs of groups differ.

Remember that when performing multiple statistical tests (such as post-hoc tests on three or more groups), it's essential to adjust the significance level to control the family-wise error rate. Tukey's HSD already provides adjusted p-values, so you can use those for significance testing. Additionally, consider effect sizes and the context of the study when interpreting the results.
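
As a sketch of the effect-size point, the code below computes Cohen's d alongside Welch's t-test, which does not assume equal variances. The helper `cohens_d` is a hypothetical name introduced here, and the scores are the same illustrative placeholders rather than real data.

```python
import numpy as np
from scipy import stats

def cohens_d(a, b):
    """Cohen's d (b minus a) using the pooled standard deviation of two samples."""
    n_a, n_b = len(a), len(b)
    pooled_var = ((n_a - 1) * np.var(a, ddof=1) + (n_b - 1) * np.var(b, ddof=1)) / (n_a + n_b - 2)
    return (np.mean(b) - np.mean(a)) / np.sqrt(pooled_var)

# Illustrative placeholder scores (replace with your actual data)
control = np.array([85, 78, 92, 75, 80])
experimental = np.array([90, 88, 95, 85, 92])

# Welch's t-test: like the two-sample t-test, but without assuming equal variances
t_welch, p_welch = stats.ttest_ind(control, experimental, equal_var=False)

# Standardized difference between the group means
d = cohens_d(control, experimental)

print("Welch t-statistic:", t_welch)
print("Welch p-value:", p_welch)
print("Cohen's d:", d)
```

A positive d here means the experimental group scored higher on average; by a common rule of thumb, values around 0.2, 0.5, and 0.8 are read as small, medium, and large effects.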

Q12. A researcher wants to know if there are any significant differences in the average daily sales of three
retail stores: Store A, Store B, and Store C. They randomly select 30 days and record the sales for each store
on those days. Conduct a repeated measures ANOVA using Python to determine if there are any
significant differences in sales between the three stores. If the results are significant, follow up with a
post-hoc test to determine which store(s) differ significantly from each other.

ANS:
    
    
A repeated measures ANOVA is appropriate for this scenario because sales are recorded for all three stores on the same 30 days. Each day acts as the "subject", and store is the within-subject factor with three levels, so the three observations from a given day are related (for example, a holiday tends to raise sales at every store). Treating the stores as independent groups in a one-way ANOVA would ignore this pairing and leave the day-to-day variation in the error term.

In Python, the `statsmodels` library provides `AnovaRM` for balanced repeated measures designs. It expects long-format data with one row per combination of day and store:

```python
import numpy as np
import pandas as pd
from statsmodels.stats.anova import AnovaRM

# Illustrative values only (4 days shown); replace with the sales recorded on all 30 days.
# Position i in each array must refer to the same day for every store.
store_A_sales = np.array([1000, 1200, 1100, 1300])
store_B_sales = np.array([900, 950, 1050, 1100])
store_C_sales = np.array([800, 850, 950, 1000])

# Reshape to long format: one row per (day, store) observation
n_days = len(store_A_sales)
df = pd.DataFrame({
    'day': np.tile(np.arange(n_days), 3),
    'store': np.repeat(['Store A', 'Store B', 'Store C'], n_days),
    'sales': np.concatenate([store_A_sales, store_B_sales, store_C_sales]),
})

# Perform repeated measures ANOVA with day as the subject and store as the within factor
result = AnovaRM(data=df, depvar='sales', subject='day', within=['store']).fit()

# The table has a single row because there is a single within-subject factor
anova_row = result.anova_table.iloc[0]
f_statistic, p_value = anova_row['F Value'], anova_row['Pr > F']

# Print the ANOVA results
print("F-statistic:", f_statistic)
print("P-value:", p_value)

# Check for statistical significance at the 0.05 significance level
alpha = 0.05
if p_value <= alpha:
    print("There is a significant difference in average daily sales between the three stores.")
else:
    print("There is no significant evidence of a difference in average daily sales between the three stores.")
```

If the repeated measures ANOVA is significant, you can follow up with post-hoc tests to determine which specific stores differ. Because the observations are paired by day, paired t-tests for each pair of stores (A vs. B, A vs. C, and B vs. C) with a Bonferroni or Holm correction are a common choice. Post-hoc tests are usually run only when the overall test is significant.

Remember to adjust the significance level (or the p-values) in the post-hoc tests to control the family-wise error rate across the three comparisons. Repeated measures ANOVA also assumes sphericity, meaning equal variances of the differences between all pairs of conditions; if that assumption is violated, a correction such as Greenhouse-Geisser is typically applied.
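
If the store observations are treated as paired by day, one possible follow-up is a set of paired t-tests with a Bonferroni adjustment. The sketch below assumes that setup; the sales figures are hypothetical values for four days, and the `adjusted_p` dictionary is introduced only for illustration.

```python
from itertools import combinations

import numpy as np
from scipy import stats

# Hypothetical daily sales; position i in each array is the same day for every store
sales = {
    'Store A': np.array([1000, 1200, 1100, 1300]),
    'Store B': np.array([900, 950, 1050, 1100]),
    'Store C': np.array([800, 870, 940, 1010]),
}

# Every pair of stores: (A, B), (A, C), (B, C)
pairs = list(combinations(sales, 2))

adjusted_p = {}
for store_1, store_2 in pairs:
    # Paired t-test, because both stores are observed on the same days
    t_stat, p_raw = stats.ttest_rel(sales[store_1], sales[store_2])
    # Bonferroni adjustment: multiply by the number of comparisons, capped at 1
    adjusted_p[(store_1, store_2)] = min(p_raw * len(pairs), 1.0)

for pair, p_adj in adjusted_p.items():
    print(pair, "Bonferroni-adjusted p-value:", p_adj)
```

A pair of stores is declared different when its adjusted p-value is at or below the chosen significance level.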