# PW SKILLS

## Assignment Questions 

### Q1. Explain the assumptions required to use ANOVA and provide examples of violations that could impact the validity of the results.
### Answer : 

Analysis of Variance (ANOVA) is a statistical method used to compare means across multiple groups. To apply ANOVA and ensure the validity of its results, certain assumptions must be met. These assumptions are:

Independence: Observations within and between groups should be independent. This means that the values in one group should not be related to or affect the values in another group.

Normality: The data within each group should follow a normal distribution. This assumption is more critical with smaller sample sizes, as ANOVA is known to be robust to violations of normality with larger sample sizes.

Homogeneity of Variance (Homoscedasticity): The variances of the groups being compared should be approximately equal. In other words, the spread of the data points should be similar across all groups.

Interval or Ratio Scale: The dependent variable (the variable being measured) should be measured on an interval or ratio scale. This implies that the distances between values are consistent and meaningful.

Violations of these assumptions can impact the validity of ANOVA results. Examples of violations and their potential impact include:

Non-independence: If observations are not independent, it can lead to increased Type I errors. For example, if data points within a group are related, such as repeated measures on the same subjects, it violates the independence assumption.

Non-normality: If the data is not normally distributed, especially with small sample sizes, it can lead to inaccurate p-values. Transforming the data or using non-parametric alternatives may be necessary if normality is violated.

Heterogeneity of Variance: Unequal variances across groups distort the pooled error variance that the F-test relies on. The Type I error rate can be inflated or deflated, and the problem is most serious when the group sizes are also unequal. Transformations or robust alternatives such as Welch's ANOVA can be employed to address this issue.

Ordinal Data or Non-Interval Data: ANOVA assumes that the dependent variable is measured on an interval or ratio scale. If the data is ordinal or not on an interval scale, using ANOVA may not be appropriate. In such cases, non-parametric tests like the Kruskal-Wallis test may be more suitable.

It's essential to assess these assumptions before interpreting ANOVA results and consider alternative methods or transformations if the assumptions are violated. Additionally, graphical methods like Q-Q plots and residual plots can help in diagnosing violations of normality and homogeneity of variance.
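The normality and equal-variance checks described above can also be run as formal tests. The following is a minimal sketch, assuming SciPy is installed. The three groups are simulated placeholder data, and the names `groups`, `normality_p`, and `levene_p` are illustrative only. `shapiro` is the Shapiro-Wilk test and `levene` is Levene's test, both from scipy.stats.

```python
import numpy as np
from scipy.stats import shapiro, levene

# Simulated placeholder data: three groups of 30 observations each
rng = np.random.default_rng(0)
groups = {
    "A": rng.normal(loc=10, scale=2, size=30),
    "B": rng.normal(loc=11, scale=2, size=30),
    "C": rng.normal(loc=12, scale=2, size=30),
}

# Normality: Shapiro-Wilk test within each group
normality_p = {}
for name, values in groups.items():
    stat, p = shapiro(values)
    normality_p[name] = p
    print(f"Shapiro-Wilk, group {name}: W={stat:.3f}, p={p:.3f}")

# Homogeneity of variance: Levene's test across all groups
levene_stat, levene_p = levene(*groups.values())
print(f"Levene: W={levene_stat:.3f}, p={levene_p:.3f}")
```

A small p-value from either test flags a possible violation. With small samples these tests have little power, so Q-Q plots and residual plots remain worth inspecting alongside them.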






### Q2. What are the three types of ANOVA, and in what situations would each be used?
### Answer : 

There are three main types of Analysis of Variance (ANOVA): One-Way ANOVA, Two-Way ANOVA, and Repeated Measures ANOVA. Each type is used in specific situations to analyze variations in data with different experimental designs:

One-Way ANOVA:

Situation: Used when comparing the means of three or more independent groups.
Example: An experiment with different treatment groups (e.g., three different types of fertilizer) to assess if there are significant differences in the mean outcome (e.g., plant growth).
Two-Way ANOVA:

Situation: Used when there are two independent categorical variables (factors) influencing the dependent variable.
Example: An experiment studying the effects of two factors simultaneously, such as the type of diet and the intensity of exercise, on weight loss. Two-Way ANOVA can analyze the main effects of each factor and their interaction.
Repeated Measures ANOVA:

Situation: Used when the same subjects are used for each treatment or measurement, leading to correlated or repeated measurements.
Example: A study measuring the blood pressure of individuals before and after different drug treatments. Repeated Measures ANOVA allows for the analysis of changes within subjects over time or under different conditions.
In summary:

One-Way ANOVA is for comparing means across three or more independent groups.
Two-Way ANOVA is for assessing the influence of two independent variables and their interaction on the dependent variable.
Repeated Measures ANOVA is for analyzing variations in data where the same subjects are measured or treated multiple times.
It's important to choose the appropriate type of ANOVA based on the experimental design and nature of the data. Misapplication of ANOVA types may lead to incorrect conclusions and interpretations. Additionally, post-hoc tests or pairwise comparisons are often performed after ANOVA to identify specific group differences when the overall test indicates statistical significance.






### Q3. What is the partitioning of variance in ANOVA, and why is it important to understand this concept?
### Answer : 

The partitioning of variance in ANOVA refers to the division of the total variance in the data into different components associated with various sources. Understanding this concept is crucial for interpreting ANOVA results and gaining insights into the factors contributing to variability in the dependent variable. The partitioning is typically represented in a way that decomposes the total variability into several components:

Total Variance (Total Sum of Squares - SST): This represents the overall variability in the dependent variable across all groups. It is the sum of the squared differences between each individual data point and the overall mean.

Between-Group Variance (Between-Group Sum of Squares - SSB): This component reflects the variability attributed to differences between the group means. It is the sum of the squared differences between each group mean and the overall mean, weighted by the number of observations in each group.

Within-Group Variance (Within-Group Sum of Squares - SSW): This component accounts for the variability within each group. It is the sum of the squared differences between each individual data point and its group mean.

The partitioning of variance is expressed mathematically as follows:

$$SST = SSB + SSW$$
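The three sums of squares described above can be written out explicitly. The notation is an assumption introduced here for illustration: $k$ is the number of groups, $n_j$ the number of observations in group $j$, $x_{ij}$ the $i$-th observation in group $j$, $\bar{x}_j$ the mean of group $j$, and $\bar{x}$ the overall (grand) mean.

```latex
\begin{aligned}
SST &= \sum_{j=1}^{k} \sum_{i=1}^{n_j} \left(x_{ij} - \bar{x}\right)^2 \\
SSB &= \sum_{j=1}^{k} n_j \left(\bar{x}_j - \bar{x}\right)^2 \\
SSW &= \sum_{j=1}^{k} \sum_{i=1}^{n_j} \left(x_{ij} - \bar{x}_j\right)^2
\end{aligned}
```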

Understanding this partitioning is important for several reasons:

Hypothesis Testing: ANOVA involves testing the null hypothesis that there are no significant differences between group means. The partitioning helps in assessing whether the observed differences are due to actual group effects or if they could occur by chance.

Effect Size: By examining the proportion of variance attributed to between-group differences (SSB) relative to the total variance (SST), researchers can assess the practical significance or effect size of the factors being studied.

Power Analysis: Understanding the partitioning of variance is crucial for power analysis, helping researchers determine the sample size needed to detect significant effects.

Model Evaluation: Researchers can use the partitioning to evaluate the adequacy of their model in explaining the observed variability. This is particularly relevant in the context of Two-Way ANOVA, where interactions between factors can also contribute to the overall variance.

In summary, the partitioning of variance in ANOVA provides a framework for understanding the distribution of variability in the data, aiding in hypothesis testing, effect size estimation, and the overall interpretation of the significance of group differences.
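To make the link between the partition, the F-statistic, and effect size concrete, here is a minimal sketch using NumPy and SciPy. It reuses the small sample values from Q4. The variable names are illustrative, and eta squared (SSB / SST) is used here as one common effect size measure.

```python
import numpy as np
from scipy.stats import f_oneway

# Small illustrative sample: three groups of four observations
groups = [
    np.array([10, 12, 15, 18]),
    np.array([8, 11, 14, 16]),
    np.array([5, 7, 9, 11]),
]

all_values = np.concatenate(groups)
grand_mean = all_values.mean()

# Partition the total variability
sst = ((all_values - grand_mean) ** 2).sum()
ssb = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
ssw = sum(((g - g.mean()) ** 2).sum() for g in groups)

# F is the ratio of the between-group and within-group mean squares
k, n = len(groups), len(all_values)
f_manual = (ssb / (k - 1)) / (ssw / (n - k))

# Eta squared: share of the total variability explained by group membership
eta_squared = ssb / sst

f_scipy, p_scipy = f_oneway(*groups)
print(f"SST={sst:.4f}, SSB={ssb:.4f}, SSW={ssw:.4f}")
print(f"F (manual)={f_manual:.4f}, F (scipy)={f_scipy:.4f}, eta^2={eta_squared:.4f}")
```

The manually computed F should agree with `f_oneway`, which shows that the test statistic is built directly from the two components of the partition.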

### Q4. How would you calculate the total sum of squares (SST), explained sum of squares (SSE), and residual sum of squares (SSR) in a one-way ANOVA using Python?
### Answer : 

To calculate the Total Sum of Squares (SST), Explained Sum of Squares (SSE), and Residual Sum of Squares (SSR) in a one-way ANOVA using Python, you can use the following steps and code:

Assuming you have a dataset with a dependent variable and a categorical independent variable representing different groups, you can use the scipy.stats library for the ANOVA analysis. Here's an example:

In [1]:
import pandas as pd
from scipy.stats import f_oneway

# Sample data (replace this with your own dataset)
data = {'Group1': [10, 12, 15, 18],
        'Group2': [8, 11, 14, 16],
        'Group3': [5, 7, 9, 11]}

df = pd.DataFrame(data)

# Perform one-way ANOVA
f_statistic, p_value = f_oneway(df['Group1'], df['Group2'], df['Group3'])

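# Calculate SST, SSE (explained), and SSR (residual)
# Each column of df is one group, so the group statistics are computed column-wise
grand_mean = df.values.mean()
group_means = df.mean()  # one mean per group (column)
sst = ((df.values - grand_mean) ** 2).sum()
# len(df) is the number of observations per group (the design is balanced)
sse = (len(df) * (group_means - grand_mean) ** 2).sum()
ssr = ((df - group_means) ** 2).values.sum()

print(f"SST: {sst:.4f}")
print(f"SSE: {sse:.4f}")
print(f"SSR: {ssr:.4f}")


SST: 164.6667
SSE: 71.1667
SSR: 93.5000


In this example:

sst is the Total Sum of Squares, calculated as the sum of squared differences between each data point and the grand mean.
sse is the Explained Sum of Squares (the between-group sum of squares, SSB in Q3), calculated as the sum of squared differences between each group mean and the grand mean, weighted by the number of observations in each group.
ssr is the Residual Sum of Squares (the within-group sum of squares, SSW in Q3), calculated as the sum of squared differences between each data point and its own group mean. The two components add up to the total: SST = SSE + SSR.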
Note: The f_oneway function is used here to perform the ANOVA test and obtain the F-statistic and p-value. You can adapt this code to your specific dataset and structure.

### Q5. In a two-way ANOVA, how would you calculate the main effects and interaction effects using Python?
### Answer : 

In a two-way ANOVA, you can quantify the main effects and the interaction effect in Python by fitting a linear model and reading the sum of squares for each factor and for their interaction from the ANOVA table. The statsmodels library can be used for this purpose. Here's an example:

In [2]:
import pandas as pd
from scipy.stats import f_oneway
from statsmodels.formula.api import ols
from statsmodels.stats.anova import anova_lm

# Sample data (replace this with your own dataset)
data = {'A': ['A1', 'A1', 'A2', 'A2', 'A3', 'A3'],
        'B': ['B1', 'B2', 'B1', 'B2', 'B1', 'B2'],
        'Value': [10, 12, 8, 11, 14, 16]}

df = pd.DataFrame(data)

# Fit the two-way ANOVA model
formula = 'Value ~ A + B + A:B'
model = ols(formula, df).fit()
anova_table = anova_lm(model)

# Extract main effects and interaction effects
main_effect_A = anova_table['sum_sq']['A'] / anova_table['df']['A']
main_effect_B = anova_table['sum_sq']['B'] / anova_table['df']['B']
interaction_effect = anova_table['sum_sq']['A:B'] / anova_table['df']['A:B']

print(f"Main Effect A: {main_effect_A}")
print(f"Main Effect B: {main_effect_B}")
print(f"Interaction Effect: {interaction_effect}")


Main Effect A: 16.166666666666657
Main Effect B: 8.166666666666636
Interaction Effect: 0.16666666666666793




In this example:

A and B represent the two factors in the two-way ANOVA.
Value is the dependent variable.
The ols function from statsmodels is used to fit the ANOVA model, and the anova_lm function is used to obtain the ANOVA table. The sum of squares for each factor (A, B) and their interaction (A:B) can be extracted from the table.

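main_effect_A is the mean square for factor A: its sum of squares divided by its degrees of freedom.
main_effect_B is the mean square for factor B: its sum of squares divided by its degrees of freedom.
interaction_effect is the mean square for the interaction term (A:B): its sum of squares divided by its degrees of freedom.
These mean squares measure how much variability is attributable to each factor and to their interaction. To test them, each one is divided by the residual mean square to form an F-statistic. The sample data above has only one observation per combination of A and B, so the model with the interaction term leaves zero residual degrees of freedom and no F-statistics or p-values can be computed. A real analysis needs replicate observations in each cell. Adapt the code accordingly to your data.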






### Q6. Suppose you conducted a one-way ANOVA and obtained an F-statistic of 5.23 and a p-value of 0.02. What can you conclude about the differences between the groups, and how would you interpret these results?
### Answer : 

In a one-way ANOVA, the F-statistic is used to test the null hypothesis that the means of the groups are equal. The p-value associated with the F-statistic indicates the probability of observing such an extreme F-statistic (or more extreme) under the assumption that the null hypothesis is true.

In your case, you obtained an F-statistic of 5.23 and a p-value of 0.02. Here's how to interpret these results:

Null Hypothesis (H0): All group population means are equal.

Alternative Hypothesis (H1): At least one group mean differs from the others.

Interpretation:

The low p-value (0.02) suggests that the observed F-statistic is unlikely to have occurred by chance if the null hypothesis were true.
Typically, if the p-value is less than the chosen significance level (commonly set at 0.05), you reject the null hypothesis.
In this case, with a p-value of 0.02, you would reject the null hypothesis.
Conclusion:

Based on the statistical analysis, there is sufficient evidence to conclude that there are significant differences between at least two of the groups.
Practical Significance:

While statistical significance indicates that there are differences, it does not provide information about the magnitude of those differences. It's essential to consider the practical significance or effect size to understand the real-world importance of the observed differences.
Further Analysis:

If you reject the null hypothesis, you may want to conduct post-hoc tests or pairwise comparisons to identify which specific groups differ from each other.
In summary, with an F-statistic of 5.23 and a p-value of 0.02, you have evidence to reject the null hypothesis, suggesting that there are significant differences between the groups. The next steps would involve exploring the nature and magnitude of these differences through further analyses.
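The decision rule above can be reproduced from the F distribution. The following is a minimal sketch, assuming SciPy. The question does not report the degrees of freedom, so the values below (3 groups and 17 observations in total) are hypothetical, chosen only because they give a p-value close to the reported 0.02.

```python
from scipy.stats import f

f_statistic = 5.23
alpha = 0.05

# Hypothetical degrees of freedom (not given in the question)
df_between = 2   # number of groups - 1, assuming 3 groups
df_within = 14   # total observations - number of groups, assuming 17 observations

# p-value: probability of an F at least this large if H0 is true
p_value = f.sf(f_statistic, df_between, df_within)

# Critical value: reject H0 when the observed F exceeds it
f_critical = f.ppf(1 - alpha, df_between, df_within)
reject_null = f_statistic > f_critical

print(f"p-value: {p_value:.4f}")
print(f"Critical F at alpha={alpha}: {f_critical:.4f}")
print(f"Reject H0: {reject_null}")
```

Comparing the p-value with alpha and comparing F with the critical value are equivalent ways of reaching the same decision.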

### Q7. In a repeated measures ANOVA, how would you handle missing data, and what are the potential consequences of using different methods to handle missing data?
### Answer : 

Handling missing data in a repeated measures ANOVA is crucial to obtaining accurate and unbiased results. There are several methods for dealing with missing data, each with its own potential consequences. Here are some common approaches and their implications:

Complete Case Analysis (Listwise Deletion):

Approach: Exclude cases with missing data from the analysis.
Consequences:
Loss of data and potentially reduced statistical power.
If data are not missing completely at random (MCAR), it may introduce bias.
Mean Imputation:

Approach: Replace missing values with the mean of the observed values for that variable.
Consequences:
Does not account for individual variability.
May underestimate standard errors, leading to artificially narrow confidence intervals.
Can distort the true variability in the data, impacting the validity of statistical tests.
Last Observation Carried Forward (LOCF) or Next Observation Carried Backward (NOCB):

Approach: Impute missing values with the last (or next) observed value.
Consequences:
Assumes that missing values remain constant over time.
May not reflect the true trajectory of change.
Can introduce bias if there are systematic patterns in the missing data.
Linear Interpolation or Pattern-Based Imputation:

Approach: Impute missing values using linear interpolation or by considering patterns in the data.
Consequences:
Assumes a linear relationship or a specific pattern in the missing data.
The accuracy of imputation depends on the appropriateness of the assumed pattern.
Multiple Imputation:

Approach: Generate multiple imputed datasets, each reflecting the uncertainty about the missing values.
Consequences:
Typically gives less biased estimates and more realistic standard errors than single imputation methods.
Requires assumptions about the missing data mechanism (usually MAR) and a reasonable imputation model.
Computational complexity increases.
Mixed-Effects Models (Longitudinal Data Analysis):

Approach: Utilize mixed-effects models that can handle missing data using maximum likelihood estimation.
Consequences:
Effective for handling missing data in repeated measures designs.
Assumes data are missing at random (MAR).
Weighted Estimation:

Approach: Assign different weights to observations based on the probability of being observed.
Consequences:
Addresses potential biases associated with complete case analysis.
Requires a correctly specified model for the missing data mechanism.
When choosing a method for handling missing data, it's essential to carefully consider the nature of the missing data and the assumptions underlying each approach. Multiple imputation and mixed-effects models are generally preferred when possible, as they provide more robust and unbiased results compared to simpler methods. However, the appropriateness of the method depends on the specific characteristics of the data and the assumptions made about the missing data mechanism.
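The simpler approaches above can be compared side by side with pandas. This is a minimal sketch on a made-up wide-format table (one row per subject, one column per measurement occasion). The values and the names `wide`, `complete_cases`, `mean_imputed`, and `locf` are illustrative only.

```python
import numpy as np
import pandas as pd

# Made-up repeated measures: four subjects measured at three time points
wide = pd.DataFrame(
    {
        "t1": [120.0, 130.0, 125.0, 140.0],
        "t2": [118.0, np.nan, 123.0, 135.0],
        "t3": [115.0, 126.0, np.nan, 131.0],
    },
    index=pd.Index(["s1", "s2", "s3", "s4"], name="subject"),
)

# Listwise deletion: keep only subjects with no missing measurement
complete_cases = wide.dropna()

# Mean imputation: fill each gap with the mean of that time point
mean_imputed = wide.fillna(wide.mean())

# LOCF: carry each subject's last observed value forward in time
locf = wide.ffill(axis=1)

print(f"Subjects kept after listwise deletion: {len(complete_cases)} of {len(wide)}")
print(f"Variance of t2, observed values only: {wide['t2'].var():.2f}")
print(f"Variance of t2, after mean imputation: {mean_imputed['t2'].var():.2f}")
print(locf)
```

In this toy table, listwise deletion discards half of the subjects, and mean imputation shrinks the variance of the imputed column, which illustrates the loss of power and the understated variability described above.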

### Q8. What are some common post-hoc tests used after ANOVA, and when would you use each one? Provide an example of a situation where a post-hoc test might be necessary.
### Answer : 

After conducting an Analysis of Variance (ANOVA) and finding a significant difference among group means, post-hoc tests are often used to identify specific group differences. Some common post-hoc tests include:

Tukey's Honestly Significant Difference (HSD):

Use: Appropriate when you have three or more groups. It controls the familywise error rate and is suitable for all pairwise comparisons.
Example: In a one-way ANOVA comparing the mean scores of three different teaching methods, Tukey's HSD can be used to identify which pairs of teaching methods have significantly different mean scores.
Bonferroni Correction:

Use: Controls the familywise error rate by adjusting the significance level for each comparison. It is suitable when you have several pairwise comparisons.
Example: If you are comparing the mean scores of five different treatment groups, the Bonferroni correction can be applied to adjust the significance level for each pairwise comparison.
Scheffé's Test:

Use: Suitable for testing any contrast among group means, not only pairwise comparisons, and valid with unequal sample sizes. It is more conservative than Tukey's HSD for pairwise comparisons.
Example: If, in addition to pairwise differences, you want to compare the average of two treatment groups against a third group, Scheffé's test covers such complex contrasts while controlling the familywise error rate.
Dunnett's Test:

Use: Designed for comparing each treatment group to a control group. It is used when there is a specific control group in the study.
Example: In a drug trial, you may have a control group and several experimental groups receiving different doses of the drug. Dunnett's test can help identify which doses differ significantly from the control group.
Holm's Procedure:

Use: Controls the familywise error rate like Bonferroni but is less conservative. It is suitable when the number of comparisons is large.
Example: If you are conducting multiple pairwise comparisons in a study, Holm's procedure may be employed to adjust the significance level for each comparison.
Games-Howell Test:

Use: Appropriate when there are unequal variances and/or sample sizes among groups. It is more robust in such situations.
Example: In a study comparing the means of different species across various environmental conditions, Games-Howell may be used when there are variations in both variances and sample sizes.
Example Scenario:
Suppose you conducted a one-way ANOVA to compare the mean scores of students who received different types of tutoring methods (e.g., Method A, Method B, Method C). The ANOVA indicates a significant difference among the tutoring methods. To pinpoint which specific pairs of methods are different from each other, you might perform post-hoc tests such as Tukey's HSD or Scheffé's test. These tests keep the familywise Type I error rate at the chosen level, which unadjusted pairwise comparisons would inflate.
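The difference between the Bonferroni correction and Holm's procedure can be seen on a small set of p-values. This is a minimal sketch, assuming statsmodels is installed. The three raw p-values are hypothetical stand-ins for the pairwise comparisons A vs B, A vs C, and B vs C.

```python
import numpy as np
from statsmodels.stats.multitest import multipletests

# Hypothetical raw p-values from three pairwise comparisons
raw_p = np.array([0.01, 0.02, 0.04])

# Bonferroni: multiply every p-value by the number of comparisons
reject_bonf, p_bonf, _, _ = multipletests(raw_p, alpha=0.05, method="bonferroni")

# Holm: step-down version of Bonferroni, never more conservative
reject_holm, p_holm, _, _ = multipletests(raw_p, alpha=0.05, method="holm")

print("Bonferroni adjusted p-values:", p_bonf, "reject:", reject_bonf)
print("Holm adjusted p-values:      ", p_holm, "reject:", reject_holm)
```

With these values, Bonferroni rejects only the first comparison while Holm rejects all three, even though both procedures control the familywise error rate at 0.05.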






### Q9. A researcher wants to compare the mean weight loss of three diets: A, B, and C. They collect data from 50 participants who were randomly assigned to one of the diets. Conduct a one-way ANOVA using Python to determine if there are any significant differences between the mean weight loss of the three diets. Report the F-statistic and p-value, and interpret the results.
### Answer : 

To conduct a one-way ANOVA in Python, you can use the scipy.stats library. Here's an example of how to perform the analysis with some random data:

In [3]:
import numpy as np
from scipy.stats import f_oneway

# Generate random weight loss data for three diets (replace this with your actual data)
np.random.seed(42)  # Setting seed for reproducibility
diet_A = np.random.normal(loc=2.5, scale=1.0, size=50)
diet_B = np.random.normal(loc=3.0, scale=1.2, size=50)
diet_C = np.random.normal(loc=2.8, scale=1.1, size=50)

# Combine the data
all_data = [diet_A, diet_B, diet_C]

# Perform one-way ANOVA
f_statistic, p_value = f_oneway(*all_data)

# Report the results
print(f"F-statistic: {f_statistic}")
print(f"P-value: {p_value}")

# Interpret the results
if p_value < 0.05:
    print("There is a significant difference in mean weight loss between at least two diets.")
else:
    print("There is no significant difference in mean weight loss between the diets.")


F-statistic: 6.6789903834520095
P-value: 0.0016736573545323658
There is a significant difference in mean weight loss between at least two diets.


In this example:

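diet_A, diet_B, and diet_C hold simulated weight loss values for the three diets. Note that the simulation draws 50 values per diet (150 in total), whereas the question describes 50 participants split across the diets; with real data, each array would contain only the participants assigned to that diet.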
The ANOVA is performed using the f_oneway function from the scipy.stats library.
The F-statistic and p-value are printed, and the results are interpreted based on the p-value.
Interpretation:

If the p-value is less than the chosen significance level (e.g., 0.05), you reject the null hypothesis that all three diets have the same mean weight loss. Here p ≈ 0.0017, so the null hypothesis is rejected for the simulated data.
A significant result indicates that at least one diet's mean weight loss differs from the others. It does not say which diets differ, so a post-hoc test such as Tukey's HSD would be the next step.
Please replace the example data with your actual data before running the code for a meaningful analysis. Additionally, consider assumptions like normality and homogeneity of variances before interpreting the results.

### Q10. A company wants to know if there are any significant differences in the average time it takes to complete a task using three different software programs: Program A, Program B, and Program C. They randomly assign 30 employees to one of the programs and record the time it takes each employee to complete the task. Conduct a two-way ANOVA using Python to determine if there are any main effects or interaction effects between the software programs and employee experience level (novice vs. experienced). Report the F-statistics and p-values, and interpret the results.
### Answer: 

To conduct a two-way ANOVA in Python, you can use the statsmodels library, which provides more extensive capabilities for ANOVA, including handling interactions. Here's an example with some random data:

In [4]:
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

# Generate random data (replace this with your actual data)
np.random.seed(42)
data = {'Software': np.random.choice(['A', 'B', 'C'], size=90),
        'Experience': np.random.choice(['Novice', 'Experienced'], size=90),
        'Time': np.random.normal(loc=10, scale=2, size=90)}

df = pd.DataFrame(data)

# Fit the two-way ANOVA model
formula = 'Time ~ Software * Experience'
model = ols(formula, data=df).fit()
anova_table = sm.stats.anova_lm(model, typ=2)

# Report the results
print(anova_table)

# Interpret the results
print("\nInterpretation:")
if anova_table['PR(>F)']['Software'] < 0.05:
    print("There is a significant main effect of Software on the time to complete the task.")
else:
    print("There is no significant main effect of Software.")

if anova_table['PR(>F)']['Experience'] < 0.05:
    print("There is a significant main effect of Experience on the time to complete the task.")
else:
    print("There is no significant main effect of Experience.")

if anova_table['PR(>F)']['Software:Experience'] < 0.05:
    print("There is a significant interaction effect between Software and Experience.")
else:
    print("There is no significant interaction effect between Software and Experience.")


                         sum_sq    df         F    PR(>F)
Software               1.334021   2.0  0.193670  0.824297
Experience             5.096305   1.0  1.479736  0.227223
Software:Experience    8.396750   2.0  1.219018  0.300694
Residual             289.301266  84.0       NaN       NaN

Interpretation:
There is no significant main effect of Software.
There is no significant main effect of Experience.
There is no significant interaction effect between Software and Experience.


In this example:

Software, Experience, and Time represent the software program, experience level, and time taken to complete the task, respectively.
The two-way ANOVA model is fitted using the ols function from statsmodels.
The ANOVA table is obtained, and F-statistics and p-values are printed.
The results are interpreted based on the significance levels.
Interpretation:

Main Effect of Software: If the p-value for the 'Software' factor is below 0.05, it suggests a significant main effect of Software on the task completion time.
Main Effect of Experience: If the p-value for the 'Experience' factor is below 0.05, it indicates a significant main effect of Experience on the task completion time.
Interaction Effect: If the p-value for the 'Software:Experience' interaction term is below 0.05, it suggests a significant interaction effect between Software and Experience. This implies that the effect of Software on task completion time may depend on the level of Experience.
Replace the example data with your actual data for a meaningful analysis. Additionally, consider assumptions like normality and homogeneity of variances before interpreting the results.






### Q11. An educational researcher is interested in whether a new teaching method improves student test scores. They randomly assign 100 students to either the control group (traditional teaching method) or the experimental group (new teaching method) and administer a test at the end of the semester. Conduct a two-sample t-test using Python to determine if there are any significant differences in test scores between the two groups. If the results are significant, follow up with a post-hoc test to determine which group(s) differ significantly from each other.
### Answer : 

To conduct a two-sample t-test in Python and perform a post-hoc test for group comparisons, you can use the scipy.stats library for the t-test and the statsmodels library for the post-hoc test (e.g., Tukey's HSD). Here's an example:

In [5]:
import numpy as np
import pandas as pd
from scipy.stats import ttest_ind
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Generate random test scores data for control and experimental groups (replace this with your actual data)
np.random.seed(42)
control_group = np.random.normal(loc=70, scale=10, size=50)
experimental_group = np.random.normal(loc=75, scale=10, size=50)

# Perform two-sample t-test
t_statistic, p_value = ttest_ind(control_group, experimental_group)

# Report the results
print(f"Two-sample t-test results:")
print(f"T-statistic: {t_statistic}")
print(f"P-value: {p_value}")

# Perform post-hoc test (Tukey's HSD)
data = pd.DataFrame({'Scores': np.concatenate([control_group, experimental_group]),
                     'Group': ['Control'] * 50 + ['Experimental'] * 50})

posthoc_results = pairwise_tukeyhsd(data['Scores'], data['Group'])

# Report post-hoc results
print("\nPost-hoc (Tukey's HSD) results:")
print(posthoc_results)

# Interpret the results
print("\nInterpretation:")
if p_value < 0.05:
    print("There is a significant difference in test scores between the control and experimental groups.")
    print("With only two groups, this single comparison already identifies which groups differ.")
else:
    print("There is no significant difference in test scores between the groups.")


Two-sample t-test results:
T-statistic: -4.108723928204809
P-value: 8.261945608702611e-05

Post-hoc (Tukey's HSD) results:
   Multiple Comparison of Means - Tukey HSD, FWER=0.05    
 group1    group2    meandiff p-adj  lower   upper  reject
----------------------------------------------------------
Control Experimental   7.4325 0.0001 3.8427 11.0224   True
----------------------------------------------------------

Interpretation:
There is a significant difference in test scores between the control and experimental groups.
With only two groups, this single comparison already identifies which groups differ.


In this example:

control_group and experimental_group represent the test scores for the control and experimental groups, respectively.
The two-sample t-test is performed using the ttest_ind function from scipy.stats.
Post-hoc analysis (Tukey's HSD) is conducted using the pairwise_tukeyhsd function from statsmodels.
Interpretation:

The p-value from the two-sample t-test (about 8.3e-05) is well below 0.05, so the difference in mean test scores between the control and experimental groups is statistically significant.
A post-hoc test is only needed when three or more groups are compared. With two groups there is a single pairwise comparison, so Tukey's HSD simply repeats the t-test: it reports a mean difference of 7.43 points in favor of the experimental group (adjusted p = 0.0001).
Replace the example data with your actual data for a meaningful analysis. Additionally, consider assumptions like normality and homogeneity of variances before interpreting the results.






### Q12. A researcher wants to know if there are any significant differences in the average daily sales of three retail stores: Store A, Store B, and Store C. They randomly select 30 days and record the sales for each store on those days. Conduct a repeated measures ANOVA using Python to determine if there are any significant differences in sales between the three stores. If the results are significant, follow up with a post-hoc test to determine which store(s) differ significantly from each other.
### Answer : 

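The sales of all three stores are recorded on the same 30 days, so each day plays the role of the subject in a repeated measures design. In Python, the statsmodels library provides the AnovaRM class for this kind of analysis. The example below is a simplification: it generates independent random sales values with a randomly chosen store label for each, and runs a one-way ANOVA with f_oneway followed by Tukey's HSD, which ignores the pairing of observations by day: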

In [9]:
import numpy as np
import pandas as pd
from scipy.stats import f_oneway
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Generate random sales data for three stores (replace this with your actual data)
np.random.seed(42)
data = {'Store': np.random.choice(['A', 'B', 'C'], size=90),
        'Sales': np.random.normal(loc=100, scale=20, size=90)}

df = pd.DataFrame(data)

# Perform one-way ANOVA
f_statistic, p_value = f_oneway(df['Sales'][df['Store'] == 'A'],
                                 df['Sales'][df['Store'] == 'B'],
                                 df['Sales'][df['Store'] == 'C'])

# Report the results
print(f"One-way ANOVA results:")
print(f"F-statistic: {f_statistic}")
print(f"P-value: {p_value}")

# Perform post-hoc test (Tukey's HSD)
posthoc_results = pairwise_tukeyhsd(df['Sales'], df['Store'])

# Report post-hoc results
print("\nPost-hoc (Tukey's HSD) results:")
print(posthoc_results)

# Interpret the results
print("\nInterpretation:")
if p_value < 0.05:
    print("There is a significant difference in daily sales between the three stores.")
    print("Post-hoc analysis is needed to identify specific store differences.")
else:
    print("There is no significant difference in daily sales between the stores.")


One-way ANOVA results:
F-statistic: 1.5520108564481612
P-value: 0.21762853741240518

Post-hoc (Tukey's HSD) results:
 Multiple Comparison of Means - Tukey HSD, FWER=0.05 
group1 group2 meandiff p-adj   lower    upper  reject
-----------------------------------------------------
     A      B   7.4361 0.2904  -4.2923 19.1645  False
     A      C   8.2438 0.2516  -4.0511 20.5386  False
     B      C   0.8077  0.984 -10.4333 12.0486  False
-----------------------------------------------------

Interpretation:
There is no significant difference in daily sales between the stores.
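Because the question asks for a repeated measures ANOVA, the sketch below shows how the same kind of analysis could be set up with the AnovaRM class from statsmodels, treating each day as the subject. The data are simulated under stated assumptions: a shared day effect plus store-specific means of 100, 105, and 110, all of which are made up for illustration, as are the names `long_df` and `day_effect`.

```python
import numpy as np
import pandas as pd
from statsmodels.stats.anova import AnovaRM

rng = np.random.default_rng(0)
n_days = 30

# A shared day effect makes the three stores' sales correlated within a day
day_effect = rng.normal(loc=0, scale=15, size=n_days)

# Long format: one row per (day, store) pair, every day observed for every store
frames = []
for store, store_mean in {"A": 100, "B": 105, "C": 110}.items():
    sales = store_mean + day_effect + rng.normal(loc=0, scale=10, size=n_days)
    frames.append(pd.DataFrame({"Day": np.arange(n_days), "Store": store, "Sales": sales}))
long_df = pd.concat(frames, ignore_index=True)

# Repeated measures ANOVA: Day is the subject, Store is the within-subject factor
result = AnovaRM(data=long_df, depvar="Sales", subject="Day", within=["Store"]).fit()
print(result.anova_table)
```

AnovaRM requires a balanced design, meaning every day must have exactly one sales value for every store. If the overall test is significant, one common follow-up is paired comparisons between stores with a multiplicity correction such as Bonferroni or Holm.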
