#QNO.1 ANS
ANOVA relies on three assumptions:

1.Independence: The observations within each group, and across groups, are assumed to be independent of each other.
2.Normality: The dependent variable follows a normal distribution within each group. This assumption matters most when the group sizes are small.
3.Homogeneity of variances: The variability (variance) of the dependent variable is assumed to be equal across all groups.

Violations that could impact the validity of ANOVA results:

1.Violation of independence: If the observations are not independent, such as in a repeated measures design or a clustered data structure, the F-test can give misleading p-values, typically with an inflated Type I error rate.
2.Violation of normality: If the dependent variable is not normally distributed within each group, the results may be unreliable, especially when the sample sizes are small.
3.Violation of homogeneity of variances: If the variability of the dependent variable is not equal across groups (i.e., groups have unequal variances), the results of ANOVA may be affected, especially when the group sizes are also unequal. This is known as heteroscedasticity.
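
The normality and equal-variance assumptions above can be screened with standard tests before running ANOVA. The following is a minimal sketch, assuming made-up, seeded data for three groups; it uses `scipy.stats.shapiro` (Shapiro-Wilk) for normality within each group and `scipy.stats.levene` for homogeneity of variances. Independence cannot be tested this way; it has to be judged from the study design.

```python
import numpy as np
from scipy import stats

# Hypothetical example data (an assumption, not from the answers above):
# three groups drawn from normal distributions with equal variance.
rng = np.random.default_rng(0)
groups = {
    "A": rng.normal(loc=10.0, scale=2.0, size=30),
    "B": rng.normal(loc=11.0, scale=2.0, size=30),
    "C": rng.normal(loc=12.0, scale=2.0, size=30),
}

# Normality within each group: Shapiro-Wilk test (H0: the sample is normal).
# The result behaves like a (statistic, p-value) pair, so [1] is the p-value.
shapiro_p = {name: stats.shapiro(values)[1] for name, values in groups.items()}

# Homogeneity of variances: Levene's test (H0: all group variances are equal).
levene_stat, levene_p = stats.levene(*groups.values())

for name, p in shapiro_p.items():
    print(f"Shapiro-Wilk p-value for group {name}: {p:.3f}")
print(f"Levene p-value: {levene_p:.3f}")
```

A small p-value (for example below 0.05) in either test is a warning sign that the corresponding assumption may not hold.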


#QNO.2 ANS
The three types of ANOVA are:

1.One-Way ANOVA: This type of ANOVA is used when you have one categorical independent variable (also known as a factor) and one continuous dependent variable. It is used to determine whether there are any significant differences between the means of two or more groups.

2.Two-Way ANOVA: This type of ANOVA is used when you have two independent variables (factors) and one continuous dependent variable. It is used to analyze the main effects of each independent variable and the interaction effect between them.

3.Multivariate ANOVA (MANOVA): This type of ANOVA is used when you have multiple dependent variables and one or more independent variables. It allows you to examine the differences between groups across multiple dependent variables simultaneously.

#QNO.3 ANS
The partitioning of variance in ANOVA refers to the division of the total variability in the data into two components: the explained variability (also known as the between-group variability) and the unexplained variability (also known as the within-group or residual variability). In terms of sums of squares, the total sum of squares equals the between-group sum of squares plus the within-group sum of squares.

It is important to understand this concept because it helps us quantify the amount of variance that can be attributed to the factors being tested in ANOVA. By decomposing the total variability into these components, we can assess the significance of the factors and determine their contribution to the overall variability in the data.
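
The partition can be checked by hand on a small example. The sketch below is illustrative only: it takes three groups of two values each (the same toy values as the `group`/`value` data in QNO.4) and computes the between-group, within-group and total sums of squares directly with NumPy. The variable names are assumptions chosen for readability.

```python
import numpy as np

# Toy data: three groups with two observations each.
groups = {
    "A": np.array([1.0, 2.0]),
    "B": np.array([3.0, 4.0]),
    "C": np.array([5.0, 6.0]),
}
all_values = np.concatenate(list(groups.values()))
grand_mean = all_values.mean()

# Total variability: squared deviations of every value from the grand mean.
ss_total = ((all_values - grand_mean) ** 2).sum()

# Between-group (explained): squared deviations of group means from the
# grand mean, weighted by group size.
ss_between = sum(len(v) * (v.mean() - grand_mean) ** 2 for v in groups.values())

# Within-group (residual): squared deviations of values from their group mean.
ss_within = sum(((v - v.mean()) ** 2).sum() for v in groups.values())

print(f"SS between: {ss_between:.1f}")  # 16.0
print(f"SS within:  {ss_within:.1f}")   # 1.5
print(f"SS total:   {ss_total:.1f}")    # 17.5
```

Here 16.0 + 1.5 = 17.5, so the between-group and within-group parts add up to the total.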

#QNO.4 ANS

In [5]:
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

data = {'group': ['A', 'A', 'B', 'B', 'C', 'C'],
        'value': [1, 2, 3, 4, 5, 6]}
df = pd.DataFrame(data)

model = ols('value ~ group', data=df).fit()
anova_table = sm.stats.anova_lm(model)

# Between-group (explained) and within-group (residual) sums of squares
SSE = anova_table['sum_sq']['group']
SSR = anova_table['sum_sq']['Residual']
# The total sum of squares is the sum of the two components
SST = SSE + SSR

print('Total sum of squares (SST):', SST)
print('Explained sum of squares (SSE):', SSE)
print('Residual sum of squares (SSR):', SSR)


Total sum of squares (SST): 17.499999999999996
Explained sum of squares (SSE): 15.999999999999996
Residual sum of squares (SSR): 1.5


#QNO.5 ANS

In [4]:
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

data = {'group1': ['A', 'A', 'A', 'B', 'B', 'B'],
        'group2': ['X', 'Y', 'Z', 'X', 'Y', 'Z'],
        'value': [1, 2, 3, 4, 5, 6]}
df = pd.DataFrame(data)

# There is one observation per group1 x group2 cell, so the interaction term
# uses up all remaining degrees of freedom and the residual has 0 df. The sums
# of squares are still valid, but the F and p-values in the table are not
# meaningful.
model = ols('value ~ group1 + group2 + group1:group2', data=df).fit()
anova_table = sm.stats.anova_lm(model)

main_effect_group1 = anova_table['sum_sq']['group1']
main_effect_group2 = anova_table['sum_sq']['group2']

print('Main effect of group1 (sum of squares):', main_effect_group1)
print('Main effect of group2 (sum of squares):', main_effect_group2)


#QNO.5 ANS

In [3]:
import statsmodels.api as sm
from statsmodels.formula.api import ols
import pandas as pd

data = {
    'factor1': [1, 1, 2, 2, 3, 3],
    'factor2': ['A', 'B', 'A', 'B', 'A', 'B'],
    'response': [5, 3, 6, 2, 7, 4]
}
df = pd.DataFrame(data)

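# factor1 is numeric, so the formula treats it as a continuous covariate
# (1 degree of freedom); factor2 is categorical.
model = ols('response ~ factor1 + factor2 + factor1:factor2', data=df).fit()
anova_table = sm.stats.anova_lm(model, typ=2)

# Sums of squares for the two main effects and their interaction
main_effect_factor1 = anova_table['sum_sq']['factor1']
main_effect_factor2 = anova_table['sum_sq']['factor2']
interaction_effect = anova_table['sum_sq']['factor1:factor2']

print('Main effect of factor1 (sum of squares):', main_effect_factor1)
print('Main effect of factor2 (sum of squares):', main_effect_factor2)
print('Interaction effect (sum of squares):', interaction_effect)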

#QNO.6 ANS
In a one-way ANOVA, the F-statistic tests the null hypothesis that the means of all the groups are equal against the alternative hypothesis that at least one mean is different. A p-value of 0.02 means that, if the null hypothesis were true, the probability of observing an F-statistic at least as large as the one obtained would be 0.02.

If the p-value is less than the chosen significance level, we reject the null hypothesis. At the usual significance level of 0.05, a p-value of 0.02 leads us to reject the null hypothesis and conclude that at least one group mean differs from the others. The F-test does not show which groups differ; a post-hoc test is needed for that.
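
The decision rule can be sketched in code. The numbers below are hypothetical (an F-statistic of 4.0 from 3 groups and 30 observations) and are not taken from the question; the p-value is the upper-tail probability of the F distribution, computed here with `scipy.stats.f.sf`.

```python
from scipy import stats

# Hypothetical values for illustration only.
f_statistic = 4.0
df_between = 2    # k - 1 for k = 3 groups
df_within = 27    # N - k for N = 30 observations
alpha = 0.05

# Survival function: P(F >= f_statistic) under the null hypothesis.
p_value = stats.f.sf(f_statistic, df_between, df_within)
reject_null = p_value < alpha

print(f"p-value: {p_value:.4f}")
print("Reject H0" if reject_null else "Fail to reject H0")
```

With these assumed inputs the p-value falls below 0.05, so the null hypothesis of equal means would be rejected.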

#QNO.7 ANS

1.Complete Case Analysis: This approach involves excluding any cases with missing data from the analysis. However, this may lead to a loss of statistical power and potential bias if the missing data are not missing completely at random (MCAR).

2.Pairwise Deletion: In this approach, each comparison uses every case that has data for the variables involved, so a case is dropped only from the comparisons where its value is missing. It retains more data than complete case analysis, but different comparisons then rest on different subsets of cases, and it may introduce bias if the missingness is related to the outcome or other variables.

3.Imputation: Imputation involves estimating missing values based on observed data. Common imputation methods include mean imputation, regression imputation, and multiple imputation. Imputation retains the sample size and can reduce bias, but the quality of the result depends on the method: mean imputation, for example, understates the variance, whereas multiple imputation also reflects the uncertainty in the imputed values.
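
As a rough illustration of the first and third approaches, the sketch below builds a small hypothetical DataFrame with two missing values, then applies complete case analysis (`dropna`) and a simple group-mean imputation (`fillna` with the per-group mean). The data and column names are assumptions made up for this example.

```python
import numpy as np
import pandas as pd

# Hypothetical data with two missing outcome values.
df = pd.DataFrame({
    "group": ["A", "A", "A", "B", "B", "B"],
    "value": [1.0, np.nan, 3.0, 4.0, 5.0, np.nan],
})

# Complete case analysis: drop every row with a missing outcome.
complete_cases = df.dropna(subset=["value"])

# Group-mean imputation: fill each gap with the mean of its own group.
imputed = df.copy()
imputed["value"] = imputed["value"].fillna(
    imputed.groupby("group")["value"].transform("mean")
)

print("Rows before:", len(df), "| rows after dropping:", len(complete_cases))
print(imputed)
```

Dropping rows shrinks the sample from 6 to 4, while imputation keeps all 6 rows at the cost of making each group look less variable than it really is.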

#QNO.8 ANS
After conducting an ANOVA and finding a significant overall effect, post-hoc tests are often used to determine which specific group means differ significantly from each other. Some common post-hoc tests include:

1.Tukey's Honestly Significant Difference (HSD): Tukey's HSD test compares all possible pairs of group means while controlling the family-wise error rate. It is suitable when you want to compare all group means against each other. It was designed for balanced designs (equal sample sizes); the Tukey-Kramer variant handles unequal sample sizes.

2.Bonferroni correction: The Bonferroni correction adjusts the significance level for each pairwise comparison to maintain an overall family-wise error rate. It is suitable when you want to control for multiple comparisons and have a predetermined significance level.

3.Scheffe's test: Scheffe's test is a conservative post-hoc test that controls the family-wise error rate for all possible contrasts among the group means, not only pairwise differences. It is suitable when you have unequal sample sizes or want to test complex comparisons (for example, one group against the average of two others); for pairwise comparisons alone it is less powerful than Tukey's HSD.

4.Dunnett's test: Dunnett's test compares each treatment group mean against a control group mean. It is suitable when you have a control group and want to determine if any treatment groups differ significantly from the control.
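
The Bonferroni correction is simple enough to sketch by hand. The example below is a minimal illustration on made-up data for three groups: it runs a two-sample t-test (`scipy.stats.ttest_ind`) for every pair and multiplies each raw p-value by the number of comparisons, capping the result at 1. Multiplying the p-values is equivalent to dividing the significance level by the number of comparisons.

```python
from itertools import combinations
from scipy import stats

# Hypothetical example data for three groups.
groups = {
    "A": [23, 25, 21, 27, 24, 26],
    "B": [30, 29, 33, 31, 28, 32],
    "C": [24, 26, 22, 25, 27, 23],
}

pairs = list(combinations(groups, 2))
n_comparisons = len(pairs)

adjusted = {}
for first, second in pairs:
    _, p_raw = stats.ttest_ind(groups[first], groups[second])
    # Bonferroni: multiply the raw p-value by the number of comparisons
    # and cap the result at 1.
    adjusted[(first, second)] = min(p_raw * n_comparisons, 1.0)

for (first, second), p_adj in adjusted.items():
    print(f"{first} vs {second}: Bonferroni-adjusted p = {p_adj:.4f}")
```

A pair is declared significantly different when its adjusted p-value is below the chosen significance level.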

#QNO.9 ANS

In [4]:
import scipy.stats as stats

diet_A = [2.3, 1.8, 3.1, 1.5, 2.6, 2.0, 2.7, 2.2, 1.9, 2.4, 1.7, 2.8, 1.6, 2.5, 1.9, 2.1, 2.3, 1.8, 2.0, 2.4, 1.5, 2.7, 2.9, 1.8, 2.3, 2.2, 1.9, 2.5, 2.1, 1.7]
diet_B = [1.3, 0.8, 1.1, 1.5, 0.6, 0.9, 1.2, 1.4, 1.0, 0.7, 1.1, 1.2, 0.8, 1.3, 1.2, 0.9, 1.5, 1.4, 0.6, 1.3, 1.1, 1.0, 1.4, 1.3, 0.9, 1.2, 0.8, 1.5, 1.1, 1.3]
diet_C = [1.7, 1.5, 1.8, 1.3, 1.4, 1.2, 1.9, 1.6, 1.7, 1.5, 1.3, 1.6, 1.4, 1.8, 1.7, 1.5, 1.2, 1.4, 1.3, 1.6, 1.5, 1.7, 1.6, 1.4, 1.3, 1.5, 1.7, 1.8, 1.6, 1.5, 1.9]

f_statistic, p_value = stats.f_oneway(diet_A, diet_B, diet_C)

print("F-statistic:", f_statistic)
print("p-value:", p_value)


F-statistic: 88.36679842865402
p-value: 8.987500355979926e-22


#QNO.10 ANS

In [9]:
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

times = [10, 12, 14, 11, 13, 15, 12, 14, 16, 9, 11, 13, 10, 12, 14, 11, 13, 15, 12, 14, 16,
         9, 11, 13, 10, 12, 14, 11, 13, 15, 12, 14, 16,
         9, 11, 13, 10, 12, 14, 11, 13, 15, 12, 14, 16,
         9, 11, 13, 10, 12, 14, 11, 13, 15, 12, 14, 16,
         9, 11, 13, 10, 12, 14, 11, 13, 15, 12, 14, 16,
         9, 11, 13, 10, 12, 14, 11, 13, 15, 12, 14, 16,
         9, 11, 13, 10, 12, 14, 11, 13, 15, 12, 14, 16,
         9, 11, 13, 10, 12, 14, 11, 13, 15, 12, 14, 16,
         9, 11, 13, 10, 12, 14, 11, 13, 15, 12, 14, 16]

# All columns must have the same length (90). Assumed balanced 3 x 2 design:
# 30 employees per program, 15 Novice and 15 Experienced in each program.
data = {
    'Program': ['A'] * 30 + ['B'] * 30 + ['C'] * 30,
    'Experience': (['Novice'] * 15 + ['Experienced'] * 15) * 3,
    # Keep the first 90 recorded times so 'Time' matches the other columns
    'Time': times[:90]
}

df = pd.DataFrame(data)

# Two-way ANOVA with both main effects and their interaction
model = ols('Time ~ C(Program) + C(Experience) + C(Program):C(Experience)', data=df).fit()

anova_table = sm.stats.anova_lm(model, typ=2)

print(anova_table)

#QNO.11 ANS

In [10]:
import numpy as np
from scipy import stats
import statsmodels.api as sm
from statsmodels.stats.multicomp import pairwise_tukeyhsd

control_scores = [82, 85, 78, 90, 88, 92, 87, 85, 80, 86]

experimental_scores = [88, 90, 92, 78, 85, 88, 84, 92, 86, 80]

t_statistic, p_value = stats.ttest_ind(control_scores, experimental_scores)

print("T-statistic:", t_statistic)
print("P-value:", p_value)

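scores = np.concatenate([control_scores, experimental_scores])
group_labels = ["Control"] * len(control_scores) + ["Experimental"] * len(experimental_scores)

# Tukey's HSD is a post-hoc test meant for three or more groups. With only
# two groups it gives the same p-value as the pooled two-sample t-test above.
tukey_result = pairwise_tukeyhsd(scores, group_labels)

print(tukey_result)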


T-statistic: -0.49306371945465977
P-value: 0.6279288002855348
  Multiple Comparison of Means - Tukey HSD, FWER=0.05   
 group1    group2    meandiff p-adj  lower  upper reject
--------------------------------------------------------
Control Experimental      1.0 0.6279 -3.261 5.261  False
--------------------------------------------------------


#QNO.12 ANS

In [11]:
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols
from statsmodels.stats.anova import AnovaRM

store_a_sales = np.random.randint(1000, 2000, size=30)
store_b_sales = np.random.randint(900, 1800, size=30)
store_c_sales = np.random.randint(800, 1700, size=30)

data = {
    'Store': ['A'] * 30 + ['B'] * 30 + ['C'] * 30,
    'Sales': np.concatenate([store_a_sales, store_b_sales, store_c_sales]),
    'Day': np.tile(np.arange(30), 3)
}
df = pd.DataFrame(data)

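# Each day is measured at all three stores, so 'Day' is the subject
# (the repeated unit) and 'Store' is the within-subject factor.
rm_anova = AnovaRM(df, depvar='Sales', subject='Day', within=['Store'])
results = rm_anova.fit()

print(results.summary())

# Pairwise comparison of store means. Tukey's HSD treats the three samples
# as independent, so it ignores the day-to-day pairing.
posthoc = sm.stats.multicomp.pairwise_tukeyhsd(df['Sales'], df['Store'])
print(posthoc.summary())
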

  Multiple Comparison of Means - Tukey HSD, FWER=0.05   
group1 group2  meandiff p-adj    lower    upper   reject
--------------------------------------------------------
     A      B    -102.1  0.262  -256.704   52.504  False
     A      C -228.2333  0.002 -382.8373 -73.6294   True
     B      C -126.1333 0.1322 -280.7373  28.4706  False
--------------------------------------------------------
