##  Assignment 37 - 13 March 2023 : Divya Pardeshi

__Q1. Explain the assumptions required to use ANOVA and provide examples of violations that could impact
the validity of the results.__

__Ans.__

Analysis of Variance (ANOVA) is a statistical method used to compare means among multiple groups. However, ANOVA comes with certain assumptions that must be met for the results to be valid and reliable. Violations of these assumptions can lead to incorrect conclusions. The main assumptions for using ANOVA are:

1. **Normality**: The residuals (differences between observed values and group means) should follow a normal distribution. This assumption matters most when the sample sizes are small: marked non-normality can then make the p-values of the F-test inaccurate, whereas with larger samples the test is fairly robust to moderate departures.

2. **Homogeneity of Variance (Homoscedasticity)**: The variances of the residuals should be roughly equal across all groups. Homoscedasticity ensures that the groups have similar variability. Violations can lead to an inflated Type I error rate, especially when the group sizes are unequal.

3. **Independence**: The observations should be independent of one another, both within each group and across groups. This means that the value of one observation should not influence, or be related to, the value of any other observation. Violations of independence can distort the standard error estimates and lead to inaccurate p-values.

Examples of Violations:

1. **Normality Violation**: If the residuals are not normally distributed, the ANOVA results might be unreliable. For example, if the residuals are skewed or have heavy tails, the p-values for group differences may not accurately reflect the true significance.

2. **Homoscedasticity Violation**: If the variances of the residuals are not equal across groups, the assumptions underlying the F-test may not hold. For instance, if one group has a much larger variance than the others, the pooled error variance no longer represents every group, and the F-test can become too liberal or too conservative, particularly when group sizes differ.

3. **Independence Violation**: If observations are not independent within groups, it can lead to biased standard error estimates and artificially low p-values. For instance, if repeated measurements are taken from the same subjects, the independence assumption might be violated.

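It's important to assess these assumptions before interpreting ANOVA results. Graphical methods (such as residual plots, normal Q-Q plots, and box plots) and formal tests (such as the Shapiro-Wilk test for normality and Levene's test for equal variances) can be used to check for violations. If assumptions are severely violated, a data transformation or an alternative method (for example, Welch's ANOVA for unequal variances or the Kruskal-Wallis test for non-normal data) might be necessary.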
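The checks described above can be sketched in a few lines. This is a minimal illustration on simulated data, not part of the original assignment: the group means, the sample sizes, and the variable names are all assumptions chosen for the example.

```python
import numpy as np
from scipy.stats import shapiro, levene

# Hypothetical data: three groups of 30 observations each (assumed values)
rng = np.random.default_rng(42)
groups = [
    rng.normal(loc=50, scale=5, size=30),
    rng.normal(loc=52, scale=5, size=30),
    rng.normal(loc=55, scale=5, size=30),
]

# Residuals in a one-way ANOVA are deviations from each group's own mean
residuals = np.concatenate([g - g.mean() for g in groups])

# Normality of residuals: Shapiro-Wilk test (H0: residuals are normal)
shapiro_stat, shapiro_p = shapiro(residuals)

# Homogeneity of variance: Levene's test (H0: equal group variances)
levene_stat, levene_p = levene(*groups, center='median')

print(f"Shapiro-Wilk p-value: {shapiro_p:.4f}")
print(f"Levene p-value:       {levene_p:.4f}")
```

A small p-value in either test suggests the corresponding assumption may not hold. Independence cannot be tested this way; it has to be judged from how the data were collected.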

__Q2. What are the three types of ANOVA, and in what situations would each be used?__

__Ans.__

There are three main types of Analysis of Variance (ANOVA), each designed for different situations and experimental designs:

1. **One-Way ANOVA**:
   - **Design**: Used when there is a single categorical independent variable (factor), typically with three or more levels (groups). With only two levels it is equivalent to a two-sample t-test.
   - **Example**: Comparing the effect of different teaching methods (A, B, C) on student test scores.

2. **Two-Way ANOVA**:
   - **Design**: Used when there are two categorical independent variables (factors). It tests the main effect of each factor and the interaction between them across every combination of factor levels (cells).
   - **Example**: Investigating the effects of both gender (male, female) and treatment (A, B) on a response variable like blood pressure.

3. **Repeated Measures ANOVA** (or Within-Subjects ANOVA):
   - **Design**: Used when the same subjects are measured under different conditions, resulting in related samples or repeated measurements.
   - **Example**: Examining the effects of three different diets (A, B, C) on the weight of the same group of individuals measured over time.

Each type of ANOVA addresses different research questions and experimental designs. It's important to choose the appropriate type of ANOVA based on the structure of your data and research objectives.

__Q3. What is the partitioning of variance in ANOVA, and why is it important to understand this concept?__

__Ans.__

The partitioning of variance in ANOVA refers to the process of decomposing the total variability observed in the data into different sources of variability. This decomposition helps to understand the relative contributions of different factors or sources to the observed variation. In ANOVA, the total variance of the dependent variable is divided into several components:

1. **Between-Group Variance (Treatment Variance)**: This is the variance between different groups or levels of the independent variable. It measures how much the group means differ from each other.

2. **Within-Group Variance (Error Variance)**: This is the variance within each group. It measures the variability of the individual data points around their respective group means.

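3. **Total Variability**: This is the overall variability in the data, measured by the total sum of squares. It is the sums of squares, not the variances themselves, that add up: the total sum of squares equals the between-group sum of squares plus the within-group sum of squares.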
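The three components listed above can be written out for a one-way design. The notation is an assumption introduced here for illustration: $k$ groups, $n_i$ observations in group $i$, $N$ observations in total, $y_{ij}$ the $j$-th observation in group $i$, $\bar{y}_i$ the mean of group $i$, and $\bar{y}$ the grand mean.

```latex
\begin{aligned}
SS_{\text{total}}   &= \sum_{i=1}^{k} \sum_{j=1}^{n_i} \left( y_{ij} - \bar{y} \right)^2 \\
SS_{\text{between}} &= \sum_{i=1}^{k} n_i \left( \bar{y}_i - \bar{y} \right)^2 \\
SS_{\text{within}}  &= \sum_{i=1}^{k} \sum_{j=1}^{n_i} \left( y_{ij} - \bar{y}_i \right)^2 \\
SS_{\text{total}}   &= SS_{\text{between}} + SS_{\text{within}} \\
F &= \frac{SS_{\text{between}} / (k - 1)}{SS_{\text{within}} / (N - k)}
\end{aligned}
```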

Understanding the partitioning of variance is important for several reasons:

1. **Identifying Sources of Variation**: By partitioning variance, ANOVA helps us identify whether the observed differences between group means are significant and whether they can be attributed to the factors under study (such as treatments or conditions).

2. **Assessing Significance**: ANOVA provides a framework for assessing the significance of differences between group means. It does so by comparing the between-group variance to the within-group variance.

3. **Hypothesis Testing**: The partitioning of variance forms the basis for the F-test used in ANOVA. The F-statistic quantifies the ratio of between-group variance to within-group variance, helping us determine whether the group means are significantly different.

4. **Interpreting Results**: By understanding how much of the total variance is explained by the factors being studied and how much is due to random variability, researchers can better interpret the practical significance of their findings.

5. **Designing Experiments**: Partitioning variance helps researchers design experiments by considering the factors that contribute most to the overall variability. This can guide decisions about sample sizes, treatment conditions, and control measures.

Overall, the partitioning of variance in ANOVA provides a structured and quantitative way to analyze and interpret the differences among groups, helping researchers draw meaningful conclusions from their data.
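To make the partition concrete, the sums of squares can be computed by hand and checked against `scipy.stats.f_oneway`. This is a small sketch on made-up numbers; the three groups and all variable names are assumptions for illustration only.

```python
import numpy as np
from scipy.stats import f_oneway

# Hypothetical data: three small groups (assumed values)
groups = [
    np.array([4.0, 5.0, 6.0]),
    np.array([6.0, 7.0, 8.0]),
    np.array([9.0, 10.0, 11.0]),
]

all_values = np.concatenate(groups)
grand_mean = all_values.mean()

# Total sum of squares: deviations of every value from the grand mean
ss_total = ((all_values - grand_mean) ** 2).sum()

# Between-group sum of squares: deviations of group means from the grand mean
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)

# Within-group sum of squares: deviations of values from their own group mean
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

# F is the ratio of the two mean squares
df_between = len(groups) - 1
df_within = len(all_values) - len(groups)
f_manual = (ss_between / df_between) / (ss_within / df_within)

print("SS total:  ", ss_total)    # 44.0
print("SS between:", ss_between)  # 38.0
print("SS within: ", ss_within)   # 6.0
print("F (manual):", f_manual)    # 19.0
print("F (scipy): ", f_oneway(*groups).statistic)
```

The between-group and within-group sums of squares add up to the total, and the hand-computed F-statistic agrees with the library value.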

__Q4. How would you calculate the total sum of squares (SST), explained sum of squares (SSE), and residual
sum of squares (SSR) in a one-way ANOVA using Python?__

__Ans.__

In [1]:
import pandas as pd
from statsmodels.formula.api import ols
import seaborn as sns
from statsmodels.stats.anova import anova_lm

# Loading Tips dataset from seaborn
df_tips = sns.load_dataset('tips')
print('Top 5 rows of Tips dataset:')
print(df_tips.head())
print('\n===================================================================\n')

# Fit the one-way ANOVA model (total_bill vs Day)
model = ols('total_bill ~ day', data=df_tips).fit()

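# Calculate the sums of squares from the fitted model.
# Naming follows the question: SSE = explained (between-group) sum of squares,
# SSR = residual (within-group) sum of squares. Many textbooks swap these labels.
print('Values for Total Bill vs Day:')
SSE = model.ess
SSR = model.ssr
SST = SSE + SSR  # total sum of squares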

print('SSE:', round(SSE, 4))
print('SSR:', round(SSR, 4))
print('SST:', round(SST, 4))

print('\n===================================================================\n')
# Print the ANOVA table
print(anova_lm(model))


Top 5 rows of Tips dataset:
   total_bill   tip     sex smoker  day    time  size
0       16.99  1.01  Female     No  Sun  Dinner     2
1       10.34  1.66    Male     No  Sun  Dinner     3
2       21.01  3.50    Male     No  Sun  Dinner     3
3       23.68  3.31    Male     No  Sun  Dinner     2
4       24.59  3.61  Female     No  Sun  Dinner     4


Values for Total Bill vs Day:
SSE: 643.9414
SSR: 18614.5227
SST: 19258.4641


             df        sum_sq     mean_sq         F    PR(>F)
day         3.0    643.941362  214.647121  2.767479  0.042454
Residual  240.0  18614.522721   77.560511       NaN       NaN


__Q5. In a two-way ANOVA, how would you calculate the main effects and interaction effects using Python?__

__Ans.__

In [2]:
import pandas as pd
from statsmodels.formula.api import ols
import seaborn as sns
import statsmodels.api as sm

# Load the Tips dataset from Seaborn
df_tips = sns.load_dataset('tips')
print('Top 5 rows of Tips dataset:')
print(df_tips.head())
print('\n==============================================================\n')

# Define the model formula
model_formula = "total_bill ~ C(day) + C(time) + C(day):C(time)"

# Fit the model using OLS regression
model = ols(model_formula, df_tips).fit()

# Compute the Type II ANOVA table once, then extract the sums of squares
anova_table = sm.stats.anova_lm(model, typ=2)
main_effects = anova_table['sum_sq'].iloc[:2]         # C(day), C(time)
interaction_effect = anova_table['sum_sq'].iloc[2:3]  # C(day):C(time)

# Print the results
print("Main effects:")
print(main_effects)
print("\n==============================\n")
print("Interaction effect:")
print(interaction_effect)
print("\n==============================\n")
print("ANOVA Table:")
print(sm.stats.anova_lm(model, typ=2))


Top 5 rows of Tips dataset:
   total_bill   tip     sex smoker  day    time  size
0       16.99  1.01  Female     No  Sun  Dinner     2
1       10.34  1.66    Male     No  Sun  Dinner     3
2       21.01  3.50    Male     No  Sun  Dinner     3
3       23.68  3.31    Male     No  Sun  Dinner     2
4       24.59  3.61  Female     No  Sun  Dinner     4


Main effects:
C(day)     165.305544
C(time)    612.986863
Name: sum_sq, dtype: float64


Interaction effect:
C(day):C(time)    94.67622
Name: sum_sq, dtype: float64


ANOVA Table:
                      sum_sq     df         F    PR(>F)
C(day)            165.305544    3.0  0.712428  0.491495
C(time)           612.986863    1.0  7.925488  0.005283
C(day):C(time)     94.676220    3.0  0.408032  0.747369
Residual        18407.808794  238.0       NaN       NaN




In [3]:
model.summary()

Dep. Variable:             total_bill   R-squared:                0.044
Model:                            OLS   Adj. R-squared:           0.024
Method:                 Least Squares   F-statistic:                2.2
Date:                Fri, 25 Aug 2023   Prob (F-statistic):      0.0551
Time:                        23:28:03   Log-Likelihood:         -873.67
No. Observations:                 244   AIC:                     1759.0
Df Residuals:                     238   BIC:                     1780.0
Df Model:                           5
Covariance Type:            nonrobust

                                     coef  std err       t  P>|t|   [0.025  0.975]
Intercept                         17.6648    1.126  15.688  0.000   15.447  19.883
C(day)[T.Fri]                     -4.8190    3.510  -1.373  0.171  -11.733   2.095
C(day)[T.Sat]                      0.8307    4.422   0.188  0.851   -7.881   9.543
C(day)[T.Sun]                      1.3150    4.426   0.297  0.767   -7.404  10.034
C(time)[T.Dinner]                  1.1152    8.866   0.126  0.900  -16.351  18.582
C(day)[T.Fri]:C(time)[T.Dinner]    5.7024    9.803   0.582  0.561  -13.610  25.015
C(day)[T.Sat]:C(time)[T.Dinner]    0.8307    4.422   0.188  0.851   -7.881   9.543
C(day)[T.Sun]:C(time)[T.Dinner]    1.3150    4.426   0.297  0.767   -7.404  10.034

Omnibus:          45.019   Durbin-Watson:        1.932
Prob(Omnibus):       0.0   Jarque-Bera (JB):    66.138
Skew:              1.107   Prob(JB):          4.35e-15
Kurtosis:          4.265   Cond. No.          4.94e+15


__Q6. Suppose you conducted a one-way ANOVA and obtained an F-statistic of 5.23 and a p-value of 0.02.
What can you conclude about the differences between the groups, and how would you interpret these
results?__

__Ans.__

In the context of a one-way ANOVA, the F-statistic and the associated p-value provide important information about the differences between the groups being compared. Let's break down the interpretation of the results:

1. **F-Statistic**: The F-statistic is a measure of the variability between group means relative to the variability within the groups. It quantifies whether the observed differences in means are greater than what would be expected due to random chance alone.

2. **P-Value**: The p-value associated with the F-statistic is the probability of observing results at least as extreme as those obtained, assuming the null hypothesis is true. Here, the null hypothesis states that all group population means are equal.

As per given information:
- F-Statistic: 5.23
- P-Value: 0.02

Interpretation:
- The p-value (0.02) is less than the common significance level of 0.05. This suggests that the observed differences between the groups' means are statistically significant.

- Since the p-value is below the significance level, you would reject the null hypothesis of equal group means. This means that at least one group mean likely differs from the others.

- The F-statistic of 5.23 means that the between-group mean square is about 5.23 times the within-group mean square, so the group means vary more than would be expected from within-group variability alone. This further supports the idea that the groups are not all the same.

- In practical terms, you would conclude that there is evidence to suggest that at least one group mean is different from the others. However, the ANOVA itself does not tell you which specific groups are different from each other.

To determine which specific groups differ from each other, you might perform post-hoc tests or pairwise comparisons (e.g., Tukey's HSD test, Bonferroni correction) to identify which pairs of groups have significantly different means.

In summary, an F-statistic of 5.23 with a p-value of 0.02 suggests that there are significant differences between the groups, and you would reject the null hypothesis of equal group means. Further analyses would be needed to determine exactly which groups differ from each other.
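The link between an F-statistic and its p-value can be illustrated with the F distribution in SciPy. The question does not state the degrees of freedom, so the values below (3 groups, 30 observations) are purely hypothetical, and the resulting p-value is not expected to reproduce the 0.02 quoted in the question.

```python
from scipy.stats import f

f_statistic = 5.23

# Hypothetical degrees of freedom (not given in the question):
# 3 groups and 30 observations in total
df_between = 3 - 1
df_within = 30 - 3

# p-value: probability of an F at least this large under the null hypothesis
p_value = f.sf(f_statistic, df_between, df_within)

# Critical value for a 5% significance level
critical_value = f.ppf(0.95, df_between, df_within)

print(f"p-value: {p_value:.4f}")
print(f"critical F at alpha=0.05: {critical_value:.3f}")
print("Reject H0:", f_statistic > critical_value)
```

Comparing the statistic with the critical value and comparing the p-value with alpha are two views of the same decision rule.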

__Q7. In a repeated measures ANOVA, how would you handle missing data, and what are the potential
consequences of using different methods to handle missing data?__

__Ans.__

Handling missing data in a repeated measures ANOVA is crucial to ensure the validity and accuracy of your analysis. There are several methods you can use to handle missing data, each with its own implications. Let's explore how to handle missing data and the potential consequences of using different methods:

1. **Listwise Deletion (Complete Case Analysis)**:
   - This method involves removing entire cases (subjects) with missing data from the analysis.
   - Pros: Simple to implement, retains only complete cases for analysis.
   - Cons: Reduces sample size, may lead to biased results if missingness is related to the outcome or other variables.

2. **Pairwise Deletion**:
   - This method retains cases with available data for each specific comparison.
   - Pros: Retains more data than listwise deletion, doesn't require removing entire cases.
   - Cons: May lead to biased results, as different analyses will use different subsets of data.

3. **Mean Imputation**:
   - Missing values are replaced with the mean of the available data for that variable.
   - Pros: Simple to implement, retains all cases.
   - Cons: Distorts variance and covariance estimates, potentially affects statistical significance and confidence intervals.

4. **Last Observation Carried Forward (LOCF)**:
   - Missing values are replaced with the last observed value for that subject.
   - Pros: Preserves temporal order, appropriate for some longitudinal studies.
   - Cons: May not accurately represent missing values, assumes a constant pattern.

5. **Linear Interpolation**:
   - Missing values are estimated based on linear interpolation between neighboring time points.
   - Pros: Preserves temporal order, can be useful for time series data.
   - Cons: Assumes linear relationships, may not work well for nonlinear data.

6. **Multiple Imputation**:
   - Generates multiple complete datasets by imputing missing values using statistical methods. Analysis is performed on each dataset, and results are pooled.
   - Pros: Handles missing data more rigorously, provides estimates of uncertainty.
   - Cons: More complex to implement, may require assumptions about missing data mechanism.

The potential consequences of using different methods to handle missing data include:

- **Bias**: Some methods may introduce bias by inflating or deflating the estimates of effects. For example, mean imputation tends to underestimate variability.

- **Inaccurate Standard Errors**: Incorrectly handling missing data can lead to incorrect standard errors and p-values, affecting the validity of inferential statistics.

- **Misinterpretation**: Listwise or pairwise deletion can lead to incorrect conclusions when the data are not missing completely at random, because the cases that remain no longer represent the full sample.

- **Efficiency**: Some methods may be more efficient than others in terms of retaining information and maintaining statistical power.

It's important to choose a method that aligns with your data characteristics and research questions while considering the potential consequences of the chosen method. Multiple imputation is generally considered a robust approach that addresses many of the issues associated with missing data, but it may require more advanced statistical knowledge and software. 
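The effect of some of these choices can be seen on a toy data set. The sketch below uses a hypothetical wide-format table (one row per subject, one column per time point, values invented for illustration) and compares listwise deletion, mean imputation, and LOCF using pandas.

```python
import numpy as np
import pandas as pd

# Hypothetical repeated measures data with some missing values
wide = pd.DataFrame({
    'subject': [1, 2, 3, 4, 5, 6],
    't1': [10.0, 12.0, 11.0, 13.0, 9.0, 14.0],
    't2': [11.0, np.nan, 12.5, 14.0, np.nan, 15.0],
    't3': [12.0, 13.5, np.nan, 15.0, 10.5, 16.0],
}).set_index('subject')

# Listwise deletion: keep only subjects with every time point observed
complete_cases = wide.dropna()

# Mean imputation: replace each missing value with its column mean
mean_imputed = wide.fillna(wide.mean())

# LOCF: carry each subject's last observed value forward across time points
locf = wide.ffill(axis=1)

print("Subjects kept by listwise deletion:", len(complete_cases), "of", len(wide))
print("Variance of t2, observed values only:", round(wide['t2'].var(), 3))
print("Variance of t2, after mean imputation:", round(mean_imputed['t2'].var(), 3))
```

Listwise deletion halves the sample here, while mean imputation keeps every subject but shrinks the variance of `t2`, which is the variance distortion noted above.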

__Q8. What are some common post-hoc tests used after ANOVA, and when would you use each one? Provide
an example of a situation where a post-hoc test might be necessary.__

__Ans.__

Post-hoc tests are used after conducting an ANOVA to make pairwise comparisons between group means when a significant difference has been detected in the overall ANOVA. These tests help identify which specific groups differ from each other. There are several common post-hoc tests, each designed to address different levels of control over Type I error rates (familywise error rate) and assumptions about the data. Let's look at a few common post-hoc tests and when to use them:

1. **Tukey's Honestly Significant Difference (HSD) Test**:
   - Controls the familywise error rate, making it suitable for situations where multiple pairwise comparisons are conducted.
   - Assumes homogeneity of variances across groups.
   - Example: You conducted a one-way ANOVA to compare the effects of different teaching methods on test scores among multiple groups. Tukey's HSD can be used to identify which specific pairs of teaching methods have significantly different effects.

2. **Bonferroni Correction**:
   - Controls the familywise error rate by dividing the desired significance level (e.g., 0.05) by the number of comparisons.
   - Generally more conservative than Tukey's HSD when all pairwise comparisons are made, and it loses power as the number of comparisons grows.
   - Best suited to a small number of planned comparisons.
   - Example: You're analyzing differences in customer satisfaction scores among product categories and only want to test a few pre-specified pairs. Bonferroni correction keeps the risk of a Type I error under control across those tests.

3. **Dunnett's Test**:
   - Used when one group (control) is compared against multiple other groups.
   - Controls the familywise error rate specifically for the control-to-other-group comparisons.
   - Example: You're comparing the effectiveness of a new drug to a control group and want to determine which doses of the new drug are significantly different from the control.

4. **Scheffe's Test**:
   - Controls the familywise error rate for all possible contrasts, not only pairwise comparisons, which makes it more conservative than other tests.
   - Can be used with unequal sample sizes, but it still assumes equal variances across groups.
   - Example: You're conducting an ANOVA with unequal group sizes and want to test complex contrasts (for example, the average of two groups against a third) with a high level of control over Type I errors.

5. **Games-Howell Test**:
   - Suitable when the assumption of homogeneity of variances is violated.
   - Allows for unequal variances across groups.
   - Example: You're analyzing data on reaction times among different groups but find that the assumption of equal variances is not met. The Games-Howell test can be used for pairwise comparisons.

The need for a post-hoc test arises when you've performed an ANOVA and found a significant overall effect. However, the ANOVA doesn't indicate which specific groups are different from each other. In such cases, post-hoc tests provide a systematic way to identify these differences while managing the risk of Type I errors. The choice of post-hoc test depends on the research question, the assumptions of the data, and the level of control over error rates desired.
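As one illustration of a post-hoc procedure, the sketch below runs all pairwise t-tests on simulated data and applies a Bonferroni correction with `multipletests` from statsmodels. The three teaching-method groups, their means, and the sample sizes are assumptions made up for this example.

```python
from itertools import combinations

import numpy as np
from scipy.stats import ttest_ind
from statsmodels.stats.multitest import multipletests

# Hypothetical test scores for three teaching methods (assumed values)
rng = np.random.default_rng(0)
groups = {
    'Method A': rng.normal(loc=70, scale=8, size=25),
    'Method B': rng.normal(loc=75, scale=8, size=25),
    'Method C': rng.normal(loc=82, scale=8, size=25),
}

# Every pair of groups, with its unadjusted t-test p-value
pairs = list(combinations(groups, 2))
raw_p = np.array([ttest_ind(groups[a], groups[b]).pvalue for a, b in pairs])

# Bonferroni: multiply each p-value by the number of comparisons (capped at 1)
reject, bonf_p, _, _ = multipletests(raw_p, alpha=0.05, method='bonferroni')

for (a, b), p_raw, p_adj, rej in zip(pairs, raw_p, bonf_p, reject):
    print(f"{a} vs {b}: raw p = {p_raw:.4f}, adjusted p = {p_adj:.4f}, reject = {rej}")
```

Swapping `method='bonferroni'` for `method='holm'` gives a uniformly less conservative correction with the same familywise error control.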

__Q9. A researcher wants to compare the mean weight loss of three diets: A, B, and C. They collect data from
50 participants who were randomly assigned to one of the diets. Conduct a one-way ANOVA using Python
to determine if there are any significant differences between the mean weight loss of the three diets.
Report the F-statistic and p-value, and interpret the results.__

__Ans.__

In [4]:
import numpy as np
from scipy.stats import f_oneway

# Generate simulated data assuming normal distributions with the same variance.
# Assumption: 50 participants per diet (the question states 50 participants in total).
np.random.seed(1)
diet_A = np.random.normal(5, 1, 50)
diet_B = np.random.normal(4, 1, 50)
diet_C = np.random.normal(3, 1, 50)

# Perform one-way ANOVA
f_statistic, p_value = f_oneway(diet_A, diet_B, diet_C)

# Set significance level (alpha)
alpha = 0.05

# Define hypotheses
null_hypothesis = "The mean weight loss is the same for all three diets."
alternate_hypothesis = "The mean weight loss is different for at least one diet."

# Print ANOVA results
print("One-Way ANOVA Results:")
print("F-statistic:", f_statistic)
print("p-value:", p_value)

# Interpret the results based on the significance level
if p_value < alpha:
    print("We reject the null hypothesis.")
    print("Conclusion:", alternate_hypothesis)
else:
    print("We fail to reject the null hypothesis.")
    print("Conclusion:", null_hypothesis)


One-Way ANOVA Results:
F-statistic: 57.06379442059458
p-value: 4.5619061215783055e-19
We reject the null hypothesis.
Conclusion: The mean weight loss is different for at least one diet.
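The cell above simulates 50 participants per diet, whereas the question describes 50 participants in total. A variant that matches the stated design might look like the sketch below; the split of 17/17/16, the true means, and the added eta-squared effect size are all assumptions for illustration.

```python
import numpy as np
from scipy.stats import f_oneway

rng = np.random.default_rng(1)

# Randomly assign 50 participants to three diets (17, 17 and 16 per diet, assumed)
diets = rng.permutation(np.array(['A'] * 17 + ['B'] * 17 + ['C'] * 16))

# Hypothetical true mean weight loss per diet, with a common standard deviation of 1
true_means = {'A': 5.0, 'B': 4.0, 'C': 3.0}
weight_loss = np.array([rng.normal(true_means[d], 1.0) for d in diets])

# One-way ANOVA on the three groups
samples = [weight_loss[diets == d] for d in ['A', 'B', 'C']]
f_stat, p_val = f_oneway(*samples)

# Eta squared: share of total variability explained by diet
grand_mean = weight_loss.mean()
ss_between = sum(len(s) * (s.mean() - grand_mean) ** 2 for s in samples)
ss_total = ((weight_loss - grand_mean) ** 2).sum()
eta_squared = ss_between / ss_total

print("Group sizes:", [len(s) for s in samples])
print("F-statistic:", f_stat)
print("p-value:", p_val)
print("Eta squared:", eta_squared)
```

With smaller groups the test has less power than in the 150-participant simulation, so the F-statistic would generally be smaller for the same true differences.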


__Q10. A company wants to know if there are any significant differences in the average time it takes to
complete a task using three different software programs: Program A, Program B, and Program C. They
randomly assign 30 employees to one of the programs and record the time it takes each employee to
complete the task. Conduct a two-way ANOVA using Python to determine if there are any main effects or
interaction effects between the software programs and employee experience level (novice vs.
experienced). Report the F-statistics and p-values, and interpret the results.__

__Ans.__

In [5]:
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols
from statsmodels.stats.anova import anova_lm

# Set random seed for reproducibility
np.random.seed(123)

# Generate random time samples for novice and expert
time_novice = np.random.normal(loc=15, scale=2, size=30)
time_expert = np.random.normal(loc=10, scale=2, size=30)

# Create simulated data
data = pd.DataFrame({
    'Software': np.repeat(['A', 'B', 'C'], 20),
    'Experience': np.tile(['Novice', 'Experienced'], 30),
    'Time': np.concatenate([time_novice, time_expert])
})

# Display the simulated data
print('Simulated Data:')
print(data.head())
print('\n=============================================\n')

# Fit the two-way ANOVA model
model = ols('Time ~ C(Software) + C(Experience) + C(Software):C(Experience)', data=data).fit()
anova_table = anova_lm(model, typ=2)

# Set significance level
alpha = 0.05

# Print ANOVA table
print("ANOVA Table:")
print(anova_table)
print('\n=============================================\n')

# Interpret the results
print("Interpretation:")
for factor in anova_table.index.drop('Residual'):  # the Residual row has no p-value
    p_value = anova_table.loc[factor, 'PR(>F)']
    if p_value < alpha:
        print(f"There is a significant effect of {factor}.")
    else:
        print(f"There is no significant effect of {factor}.")



Simulated Data:
  Software   Experience       Time
0        A       Novice  12.828739
1        A  Experienced  16.994691
2        A       Novice  15.565957
3        A  Experienced  11.987411
4        A       Novice  13.842799


ANOVA Table:
                               sum_sq    df          F    PR(>F)
C(Software)                204.881181   2.0  11.507860  0.000069
C(Experience)                0.246230   1.0   0.027661  0.868530
C(Software):C(Experience)    0.455966   2.0   0.025611  0.974726
Residual                   480.696856  54.0        NaN       NaN


Interpretation:
There is a significant effect of C(Software).
There is no significant effect of C(Experience).
There is no significant effect of C(Software):C(Experience).
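In the simulation above, the `Experience` labels alternate row by row while `Time` is built from 30 novice-like values followed by 30 experienced-like values, so the labels do not line up with the simulated means and the apparent software effect comes from that misalignment. It also uses 60 rows rather than the 30 employees in the question. A sketch of an aligned design is shown below; the 5 employees per cell and the cell means are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd
from statsmodels.formula.api import ols
from statsmodels.stats.anova import anova_lm

rng = np.random.default_rng(123)

# Balanced 3 x 2 design: 30 employees, 5 per Software x Experience cell (assumed)
n_per_cell = 5

# Hypothetical mean completion times: experience matters, software does not
mean_time = {'Novice': 15.0, 'Experienced': 10.0}

rows = []
for software in ['A', 'B', 'C']:
    for experience in ['Novice', 'Experienced']:
        # Draw times from the distribution that matches this row's label
        times = rng.normal(loc=mean_time[experience], scale=2, size=n_per_cell)
        rows.extend(
            {'Software': software, 'Experience': experience, 'Time': t}
            for t in times
        )

aligned = pd.DataFrame(rows)

# Two-way ANOVA with both main effects and their interaction
aligned_model = ols('Time ~ C(Software) * C(Experience)', data=aligned).fit()
aligned_table = anova_lm(aligned_model, typ=2)
print(aligned_table)

# Report each effect, skipping the Residual row (it has no p-value)
for factor in aligned_table.index.drop('Residual'):
    p = aligned_table.loc[factor, 'PR(>F)']
    verdict = "significant" if p < 0.05 else "not significant"
    print(f"{factor}: p = {p:.4f} ({verdict})")
```

Because each time is drawn using the label stored in the same row, any experience effect that the ANOVA reports reflects the simulated difference rather than a labelling artifact.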


__Q11. An educational researcher is interested in whether a new teaching method improves student test
scores. They randomly assign 100 students to either the control group (traditional teaching method) or the
experimental group (new teaching method) and administer a test at the end of the semester. Conduct a
two-sample t-test using Python to determine if there are any significant differences in test scores
between the two groups. If the results are significant, follow up with a post-hoc test to determine which
group(s) differ significantly from each other.__

__Ans.__

In [6]:
import numpy as np
import pandas as pd
from scipy.stats import ttest_ind
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Set random seed for reproducibility
np.random.seed(123)

# Generate test scores for control and experimental groups
control_scores = np.random.normal(loc=75, scale=10, size=50)
experimental_scores = np.random.normal(loc=80, scale=10, size=50)

# Create a DataFrame with the data
data = pd.DataFrame({
    'Group': np.repeat(['Control', 'Experimental'], 50),
    'Scores': np.concatenate([control_scores, experimental_scores])
})

# Display the first few rows of the data
print('Simulated Data:')
print(data.head())
print('\n=============================================\n')

# Perform a two-sample t-test
t_statistic, p_value = ttest_ind(control_scores, experimental_scores)

# Set significance level
alpha = 0.05

# Print t-test results
print("Two-sample t-test results:")
print("T-statistic:", t_statistic)
print("P-value:", p_value)
print('\n=============================================\n')

# Follow up with Tukey's HSD test only if the t-test is significant
if p_value < alpha:
    posthoc = pairwise_tukeyhsd(data['Scores'], data['Group'])

    # Print post-hoc test results
    print("Post-hoc Tukey's HSD test results:")
    print(posthoc)


Simulated Data:
     Group     Scores
0  Control  64.143694
1  Control  84.973454
2  Control  77.829785
3  Control  59.937053
4  Control  69.213997


Two-sample t-test results:
T-statistic: -2.315158728279605
P-value: 0.022690065589586535


Post-hoc Tukey's HSD test results:
   Multiple Comparison of Means - Tukey HSD, FWER=0.05   
 group1    group2    meandiff p-adj  lower  upper  reject
---------------------------------------------------------
Control Experimental   5.2768 0.0227 0.7537 9.7998   True
---------------------------------------------------------


#### Conclusion from Tukey's HSD Test:

* The adjusted p-value (0.0227) is less than the significance level (0.05). It matches the t-test p-value because, with only two groups, Tukey's HSD reduces to the two-sample t-test.
* The mean difference of approximately 5.28 points (Experimental minus Control) is statistically significant, and its 95% confidence interval (0.75 to 9.80) excludes zero.
* The "True" value under the "reject" column indicates that the means differ significantly between the Control and Experimental groups.

#### Overall Conclusion:

In this simulated data set, the new teaching method (Experimental group) shows a statistically significant improvement in student test scores compared to the traditional teaching method (Control group).
In summary, the t-test and Tukey's HSD test agree that, at the 5% significance level, the new teaching method is associated with higher test scores than the traditional method.
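Statistical significance says little about the size of the difference, so an effect size is a useful complement. The sketch below, on freshly simulated data with the same assumed means and standard deviations as above, computes Welch's t-test (which does not assume equal variances) and Cohen's d. The variable names are illustrative.

```python
import numpy as np
from scipy.stats import ttest_ind

# Hypothetical scores, simulated with the same parameters as the cell above
rng = np.random.default_rng(123)
control = rng.normal(loc=75, scale=10, size=50)
experimental = rng.normal(loc=80, scale=10, size=50)

# Welch's t-test: does not assume equal variances in the two groups
welch_t, welch_p = ttest_ind(experimental, control, equal_var=False)

# Cohen's d with a pooled standard deviation (equal group sizes)
pooled_sd = np.sqrt((control.var(ddof=1) + experimental.var(ddof=1)) / 2)
cohens_d = (experimental.mean() - control.mean()) / pooled_sd

print(f"Welch t = {welch_t:.3f}, p = {welch_p:.4f}")
print(f"Cohen's d = {cohens_d:.3f}")
```

By the usual rule of thumb, a Cohen's d near 0.2 is small, near 0.5 is medium, and near 0.8 is large.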

__Q12. A researcher wants to know if there are any significant differences in the average daily sales of three
retail stores: Store A, Store B, and Store C. They randomly select 30 days and record the sales for each store
on those days. Conduct a repeated measures ANOVA using Python to determine if there are any
significant differences in sales between the three stores. If the results are significant, follow up with a post-
hoc test to determine which store(s) differ significantly from each other.__

__Ans.__

Assume a significance level of 0.05.

In [7]:
import numpy as np
import pandas as pd
from statsmodels.stats.anova import AnovaRM
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# set random seed for reproducibility
np.random.seed(456)

# generate sales data for Store A, B, and C
sales_a = np.random.normal(loc=1000, scale=100, size=(30,))
sales_b = np.random.normal(loc=1050, scale=150, size=(30,))
sales_c = np.random.normal(loc=800, scale=80, size=(30,))

# create a DataFrame to store the sales data
sales_df = pd.DataFrame({'Store A': sales_a, 'Store B': sales_b, 'Store C': sales_c})

# reshape the DataFrame for repeated measures ANOVA
sales_melted = pd.melt(sales_df.reset_index(), id_vars=['index'], value_vars=['Store A', 'Store B', 'Store C'])
sales_melted.columns = ['Day', 'Store', 'Sales']

# Printing top 5 rows of generated data
print('Generated data top 5 rows : ')
print(sales_melted.head())

print('\n================================================\n')

# perform repeated measures ANOVA
rm_anova = AnovaRM(sales_melted, 'Sales', 'Day', within=['Store'])
rm_results = rm_anova.fit()
print(rm_results)

# check if null hypothesis should be rejected based on p-value
if rm_results.anova_table['Pr > F'].iloc[0] < 0.05:
    # perform post-hoc Tukey test
    print('Reject the null hypothesis: at least one store has a different mean.\n')
    print('Tukey HSD posthoc test:')
    tukey_results = pairwise_tukeyhsd(sales_melted['Sales'], sales_melted['Store'])
    print(tukey_results)
else:
    print('No significant difference between stores.')

Generated data top 5 rows : 
   Day    Store        Sales
0    0  Store A   933.187150
1    1  Store A   950.179048
2    2  Store A  1061.857582
3    3  Store A  1056.869225
4    4  Store A  1135.050948


               Anova
      F Value Num DF  Den DF Pr > F
-----------------------------------
Store 51.5040 2.0000 58.0000 0.0000

Reject the null hypothesis: at least one store has a different mean.

Tukey HSD posthoc test:
    Multiple Comparison of Means - Tukey HSD, FWER=0.05    
 group1  group2  meandiff p-adj    lower     upper   reject
-----------------------------------------------------------
Store A Store B   21.2439 0.6945   -40.881   83.3688  False
Store A Store C -207.8078    0.0 -269.9328 -145.6829   True
Store B Store C -229.0517    0.0 -291.1766 -166.9268   True
-----------------------------------------------------------


#### Conclusion:

1. The repeated measures ANOVA gives a p-value (Pr > F) of 0.0000, which is less than 0.05, so we reject the null hypothesis. This means that at least one store's mean sales differs from the others.

2. Tukey's post-hoc test gives the following interpretation:
    * No significant difference between the sales of Store A and Store B (reject=False). Store B's mean is about 21.24 dollars higher than Store A's, but this difference is not statistically significant.
    * Significant difference between the sales of Store A and Store C (reject=True). Store C's mean is approximately 207.81 dollars lower than Store A's.
    * Significant difference between the sales of Store B and Store C (reject=True). Store C's mean is approximately 229.05 dollars lower than Store B's.
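Tukey's HSD treats the three stores as independent groups, which ignores the fact that the same 30 days were measured for every store. One alternative that respects the pairing is a set of paired t-tests with a multiplicity correction. The sketch below is an assumption-laden illustration, not the method used above: it simulates its own data with the same means and standard deviations and uses Holm's correction from statsmodels.

```python
from itertools import combinations

import numpy as np
import pandas as pd
from scipy.stats import ttest_rel
from statsmodels.stats.multitest import multipletests

# Hypothetical wide-format data: one row per day, one column per store
rng = np.random.default_rng(456)
sales_wide = pd.DataFrame({
    'Store A': rng.normal(loc=1000, scale=100, size=30),
    'Store B': rng.normal(loc=1050, scale=150, size=30),
    'Store C': rng.normal(loc=800, scale=80, size=30),
})

# Paired t-test for every pair of stores, matching observations by day
pairs = list(combinations(sales_wide.columns, 2))
raw_p = [ttest_rel(sales_wide[a], sales_wide[b]).pvalue for a, b in pairs]

# Holm correction keeps the familywise error rate at 5%
reject, adj_p, _, _ = multipletests(raw_p, alpha=0.05, method='holm')

for (a, b), p, rej in zip(pairs, adj_p, reject):
    print(f"{a} vs {b}: adjusted p = {p:.4f}, reject = {rej}")
```

When the day-to-day variation is shared across stores, the paired approach is typically more powerful than treating the stores as independent samples.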