#Answer 1:



ANOVA relies on several assumptions to ensure valid results:

Normality: The residuals (errors) within each group should be approximately normally distributed. Violation: Strongly non-normal residuals, especially with small or unequal group sizes, can distort the Type I error rate (false positives).
Homogeneity of variance: The variance of the dependent variable should be equal across all groups. Violation: Unequal variances, particularly combined with unequal group sizes, make the F-statistic and its p-value inaccurate.
Independence: Observations within and between groups must be independent. Violation: Dependence (e.g., repeated measurements on the same subject) makes the F-statistic unreliable.
Random sampling: Groups should be formed by random sampling from the population. Violation: Non-random sampling can bias the results and limits how far they generalize.
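
The first two assumptions can be checked numerically before running the ANOVA. The following is a minimal sketch, assuming hypothetical simulated data for three groups; it uses the Shapiro-Wilk test (`scipy.stats.shapiro`) on the residuals and Levene's test (`scipy.stats.levene`) across the groups. The choice of these two tests is an assumption, not the only option.

```python
import numpy as np
from scipy import stats

# Hypothetical example data: three groups drawn from normal distributions
rng = np.random.default_rng(0)
groups = [rng.normal(loc, 2, 30) for loc in (5, 6, 4)]

# Normality: Shapiro-Wilk test on the residuals (each value minus its group mean)
residuals = np.concatenate([g - g.mean() for g in groups])
shapiro_stat, shapiro_p = stats.shapiro(residuals)

# Homogeneity of variance: Levene's test across the groups
levene_stat, levene_p = stats.levene(*groups)

print(f"Shapiro-Wilk p-value: {shapiro_p:.3f}")
print(f"Levene p-value: {levene_p:.3f}")
```

A small p-value (for example, below 0.05) in either test suggests the corresponding assumption may be violated. Independence and random sampling depend on the study design and cannot be verified this way.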

#Answer 2:


One-way ANOVA: Compares means between two or more groups with one independent variable (e.g., comparing weight loss with three diets).
Two-way ANOVA: Examines the effects of two independent variables and their interaction (e.g., software program and experience level on task completion time).
Repeated measures ANOVA: Analyzes data where the same subjects are measured under multiple conditions or time points (e.g., the sales of three stores recorded on the same set of days).


#Answer 3:


ANOVA partitions the total variability (sum of squares) in the data into two components:

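Between-groups sum of squares (SSB), also called the explained sum of squares: Represents the variation attributable to the independent variable(s), i.e., the variation of the group means around the overall mean.
Within-groups sum of squares (SSW), also called the residual sum of squares: Represents the unexplained variation of the observations around their own group mean, due to error or other factors.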
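
Written out for a one-way design, the partition takes the form below. The notation is an assumption introduced here for illustration: k groups, n_i observations in group i, y_ij the j-th observation in group i, \bar{y}_i the mean of group i, and \bar{y} the overall mean. SSB and SSW denote the between-groups and within-groups sums of squares.

```latex
\begin{aligned}
SST &= \sum_{i=1}^{k}\sum_{j=1}^{n_i} \left(y_{ij} - \bar{y}\right)^2 \\
SSB &= \sum_{i=1}^{k} n_i \left(\bar{y}_i - \bar{y}\right)^2 \\
SSW &= \sum_{i=1}^{k}\sum_{j=1}^{n_i} \left(y_{ij} - \bar{y}_i\right)^2 \\
SST &= SSB + SSW
\end{aligned}
```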


In [1]:
#Answer 4:

import numpy as np
import pandas as pd

# Example data
data = {'Group': ['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C'],
        'Value': [1, 2, 3, 4, 5, 6, 7, 8, 9]}
df = pd.DataFrame(data)

# Overall mean
overall_mean = df['Value'].mean()

# Group means
group_means = df.groupby('Group')['Value'].mean()

# Total sum of squares (SST)
sst = ((df['Value'] - overall_mean) ** 2).sum()

# Between-groups sum of squares (SSB)
ssb = sum(df.groupby('Group').size() * (group_means - overall_mean) ** 2)

# Within-groups sum of squares (SSW)
ssw = ((df['Value'] - df['Group'].map(group_means)) ** 2).sum()

print(f"SST: {sst}, SSB: {ssb}, SSW: {ssw}")


SST: 60.0, SSB: 54.0, SSW: 6.0


In [2]:
#Answer 5:

import pandas as pd
import numpy as np
import scipy.stats as stats

# Generating example data
np.random.seed(0)
diet_A = np.random.normal(5, 2, 50)
diet_B = np.random.normal(6, 2, 50)
diet_C = np.random.normal(4, 2, 50)

data = {'Diet': ['A'] * 50 + ['B'] * 50 + ['C'] * 50,
        'Weight_Loss': np.concatenate([diet_A, diet_B, diet_C])}
df = pd.DataFrame(data)

# Performing One-Way ANOVA
f_stat, p_value = stats.f_oneway(df[df['Diet'] == 'A']['Weight_Loss'],
                                 df[df['Diet'] == 'B']['Weight_Loss'],
                                 df[df['Diet'] == 'C']['Weight_Loss'])

print(f"F-statistic: {f_stat}, p-value: {p_value}")




F-statistic: 6.229115357571769, p-value: 0.0025308938971832957


#Answer 6:


An F-statistic of 5.23 with a p-value of 0.02 indicates, at the conventional 0.05 significance level, that the mean of at least one group differs from the others, so the null hypothesis of equal group means is rejected. However, the result does not say which groups differ. Post-hoc tests such as Tukey's HSD are needed to identify the specific group differences.
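
The link between an F-statistic, its critical value, and its p-value can be sketched with `scipy.stats.f`. The degrees of freedom below are hypothetical (3 groups and 30 observations), since the question does not state them, so the computed p-value will not necessarily match the 0.02 quoted above.

```python
from scipy import stats

f_stat = 5.23      # F-statistic quoted in the answer
dfn, dfd = 2, 27   # hypothetical degrees of freedom: 3 groups, 30 observations
alpha = 0.05

# Critical value: the F value that cuts off the upper alpha tail
f_crit = stats.f.ppf(1 - alpha, dfn, dfd)

# p-value: probability of an F at least this large under the null hypothesis
p_value = stats.f.sf(f_stat, dfn, dfd)

reject_null = f_stat > f_crit
print(f"Critical F: {f_crit:.3f}, p-value: {p_value:.4f}, reject H0: {reject_null}")
```

Comparing the F-statistic to the critical value and comparing the p-value to alpha are equivalent decisions.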


#Answer 7:



Missing data can be handled in various ways, each with potential consequences:

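Listwise deletion: Removes entire rows with missing values. Reduces sample size and power, and can bias the results if the missingness is non-random.
Mean/median imputation: Replaces missing values with the mean/median of the group. Underestimates within-group variability, which can inflate the F-statistic.
Model-based methods (e.g., multiple imputation): Use statistical models to estimate missing values. Retain more information but can be computationally intensive and depend on the imputation model being reasonable.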
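
The first two strategies can be sketched in pandas as follows. The small data frame is hypothetical and only serves to show the mechanics.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value in each group
df = pd.DataFrame({
    'Group': ['A', 'A', 'A', 'B', 'B', 'B'],
    'Value': [1.0, np.nan, 3.0, 4.0, 5.0, np.nan],
})

# Listwise deletion: drop every row that has a missing Value
deleted = df.dropna(subset=['Value'])

# Group-mean imputation: fill each missing Value with the mean of its group
imputed = df.copy()
imputed['Value'] = imputed['Value'].fillna(
    imputed.groupby('Group')['Value'].transform('mean')
)

print(deleted)
print(imputed)
```

Deletion leaves four rows, while imputation keeps all six but places the filled values exactly at the group means, which is why it shrinks the within-group variance.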

#Answer 8:

Q8. Common Post-hoc Tests after ANOVA

Here are some common post-hoc tests used after ANOVA, along with their applications:

Tukey's Honest Significant Difference (HSD): Tests all pairwise comparisons between groups while controlling the family-wise error rate (FWER). Useful when you want to compare all possible pairs while maintaining a low chance of false positives.
Scheffé's Test: A multiple comparison test that covers all possible contrasts, not only pairwise ones, and is therefore more conservative than Tukey's HSD for pairwise comparisons. Useful when you want to test complex contrasts (e.g., one group against the average of two others), at the cost of lower power for simple pairs.
Bonferroni Correction: A simple method to adjust p-values from multiple comparisons, but may be overly conservative when there are many comparisons, leading to missed true differences.

Example of Using Post-hoc Tests:

Suppose you run a one-way ANOVA and find a significant difference. You can then use Tukey's HSD to identify which pairs of groups have statistically different means.
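
A minimal sketch of that workflow, assuming hypothetical simulated data for three groups and using `pairwise_tukeyhsd` from statsmodels:

```python
import numpy as np
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Hypothetical data: three groups of 50 observations with different means
rng = np.random.default_rng(0)
values = np.concatenate([rng.normal(loc, 2, 50) for loc in (5, 6, 4)])
labels = np.repeat(['A', 'B', 'C'], 50)

# Tukey's HSD compares every pair of groups while controlling the FWER
tukey = pairwise_tukeyhsd(values, labels, alpha=0.05)

# One row per pair: mean difference, adjusted p-value, confidence interval, reject flag
print(tukey.summary())
```

Each row of the summary corresponds to one pair of groups; the `reject` column marks the pairs whose means differ at the chosen alpha.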



In [3]:
#Answer 9:

import pandas as pd
import numpy as np
import scipy.stats as stats

# Generating example data
np.random.seed(0)
diet_A = np.random.normal(5, 2, 50)
diet_B = np.random.normal(6, 2, 50)
diet_C = np.random.normal(4, 2, 50)

data = {'Diet': ['A'] * 50 + ['B'] * 50 + ['C'] * 50,
        'Weight_Loss': np.concatenate([diet_A, diet_B, diet_C])}
df = pd.DataFrame(data)

# Performing One-Way ANOVA
f_stat, p_value = stats.f_oneway(df[df['Diet'] == 'A']['Weight_Loss'],
                                 df[df['Diet'] == 'B']['Weight_Loss'],
                                 df[df['Diet'] == 'C']['Weight_Loss'])

print(f"F-statistic: {f_stat}, p-value: {p_value}")


F-statistic: 6.229115357571769, p-value: 0.0025308938971832957


In [4]:
#Answer 10:


import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

# Generating example data
np.random.seed(0)
program = np.tile(['A', 'B', 'C'], 30)
experience = np.repeat(['novice', 'experienced'], 45)
time = np.random.normal(10, 2, 90)

data = {'Program': program,
        'Experience': experience,
        'Time': time}
df = pd.DataFrame(data)

# Performing Two-Way ANOVA
model = ols('Time ~ C(Program) * C(Experience)', data=df).fit()
anova_table = sm.stats.anova_lm(model, typ=2)

print(anova_table)



                              sum_sq    df         F    PR(>F)
C(Program)                 11.759094   2.0  1.421073  0.247199
C(Experience)              15.955992   1.0  3.856527  0.052860
C(Program):C(Experience)    5.899393   2.0  0.712935  0.493145
Residual                  347.541523  84.0       NaN       NaN


In [5]:
#Answer 11:

import scipy.stats as stats

# Sample data (replace with actual data)
control_scores = [80, 75, 88, 92, 85]
experimental_scores = [90, 82, 95, 98, 87]

# Perform two-sample t-test
t_statistic, p_value = stats.ttest_ind(control_scores, experimental_scores)

# Print results
print(f"t-statistic: {t_statistic:.2f}, p-value: {p_value:.4f}")

if p_value < 0.05:
    print("There is a statistically significant difference in test scores between the control and experimental groups.")
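    # With only two groups, no post-hoc test is needed.
    # If the scores are not normally distributed, consider the Mann-Whitney U test
    # (stats.mannwhitneyu) as a nonparametric alternative to the t-test.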
else:
    print("There is no statistically significant difference in test scores between the control and experimental groups.")




t-statistic: -1.55, p-value: 0.1588
There is no statistically significant difference in test scores between the control and experimental groups.


In [6]:
#Answer 12:

import pandas as pd
import numpy as np
from statsmodels.stats.anova import AnovaRM

# Generating example data
np.random.seed(0)
days = np.arange(1, 31)
store_A_sales = np.random.normal(100, 10, 30)
store_B_sales = np.random.normal(110, 10, 30)
store_C_sales = np.random.normal(105, 10, 30)

data = {
    'Day': np.tile(days, 3),
    'Store': np.repeat(['A', 'B', 'C'], 30),
    'Sales': np.concatenate([store_A_sales, store_B_sales, store_C_sales])
}
df = pd.DataFrame(data)

# Performing Repeated Measures ANOVA
aovrm = AnovaRM(df, 'Sales', 'Day', within=['Store'])
res = aovrm.fit()

print(res)

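# If the results are significant, follow up with a post-hoc test
# (.iloc[0] selects the first row by position; the table is indexed by factor name)
if res.anova_table['Pr > F'].iloc[0] < 0.05:
    print("Significant differences found. Proceeding with post-hoc test.")

    # Post-hoc test (pairwise comparisons using Tukey's HSD).
    # Note: Tukey's HSD treats the observations as independent and ignores the
    # pairing by Day; paired t-tests with a Bonferroni correction are an alternative.
    from statsmodels.stats.multicomp import pairwise_tukeyhsd

    posthoc = pairwise_tukeyhsd(df['Sales'], df['Store'], alpha=0.05)
    print(posthoc)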
else:
    print("No significant differences found.")


               Anova
      F Value Num DF  Den DF Pr > F
-----------------------------------
Store  0.9395 2.0000 58.0000 0.3967

No significant differences found.
