### Q1. Assumptions required to use ANOVA and examples of violations

**Assumptions:**
1. **Independence:** Observations are independent of one another, both within each group and across groups.
2. **Normality:** The residuals (errors) of the model are normally distributed.
3. **Homogeneity of variances (Homoscedasticity):** The variance of residuals is constant across all levels of the independent variable(s).

**Examples of Violations:**
- **Independence Violation:** Data collected from the same individuals over time (repeated measures) violates independence.
- **Normality Violation:** Skewed distributions or outliers in the residuals.
- **Homogeneity of Variances Violation:** Unequal variances across groups, which is especially problematic when group sizes are unequal or samples are small.
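
The normality and equal-variance assumptions can be screened with standard tests. The following is a minimal sketch, assuming three small independent groups with made-up values; it uses `scipy.stats.shapiro` on the residuals and `scipy.stats.levene` on the groups. Independence cannot be checked this way, since it depends on how the data were collected.

```python
import numpy as np
from scipy import stats

# Hypothetical measurements for three independent groups (illustrative values only)
groups = {
    "g1": np.array([65.3, 67.2, 64.8, 66.5, 68.1]),
    "g2": np.array([63.8, 62.1, 64.5, 61.9, 63.2]),
    "g3": np.array([69.5, 71.2, 70.1, 72.5, 68.9]),
}

# In a one-way ANOVA the residuals are each value minus its own group mean
residuals = np.concatenate([x - x.mean() for x in groups.values()])

# Shapiro-Wilk: the null hypothesis is that the residuals are normally distributed
shapiro_stat, shapiro_p = stats.shapiro(residuals)

# Levene: the null hypothesis is that all groups share the same variance
levene_stat, levene_p = stats.levene(*groups.values())

print(f"Shapiro-Wilk p-value: {shapiro_p:.3f}")
print(f"Levene p-value: {levene_p:.3f}")
```

A small p-value from either test suggests the corresponding assumption may not hold. With very small samples these tests have little power, so a residual plot is a useful complement.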

### Q2. Three types of ANOVA and their uses

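1. **One-Way ANOVA:** Compares the means of three or more independent groups defined by a single factor to determine whether at least one group mean differs (with only two groups it is equivalent to an independent-samples t-test).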
   - **Example:** Comparing the effectiveness of three different teaching methods on student test scores.

2. **Two-Way ANOVA:** Examines the main effect of each of two independent variables (factors) on a dependent variable, as well as the interaction between them.
   - **Example:** Evaluating the effects of both gender and educational background on academic performance.

3. **Repeated Measures ANOVA:** Assesses changes in a dependent variable measured on the same subjects over multiple time points or conditions.
   - **Example:** Measuring the same patients' pain levels at three time points: before treatment, immediately after, and one week post-treatment.

### Q3. Partitioning of variance in ANOVA and its importance

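- **Partitioning of Variance:** In ANOVA, the total sum of squares (SST) is decomposed into the explained (between-group) sum of squares, called SSE here, and the unexplained (residual, within-group) sum of squares, called SSR here, so that SST = SSE + SSR. Some textbooks use these two abbreviations the other way round (SSE for the error sum of squares), so check the definitions before comparing formulas.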
- **Importance:** Helps to understand how much of the total variation in the data is due to the independent variable(s) and how much is due to random error. This partitioning forms the basis for calculating F-statistics and determining the significance of results.
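
The decomposition can be written out explicitly for the one-way case. This is a sketch in the notation of this section (SSE for the explained part, SSR for the residual part). The symbols are introduced here for illustration: $k$ groups, $n_i$ observations in group $i$, $N$ observations in total, group means $\bar{y}_i$, and grand mean $\bar{y}$.

$$
\begin{aligned}
\mathrm{SST} &= \sum_{i=1}^{k}\sum_{j=1}^{n_i} \left(y_{ij} - \bar{y}\right)^2 \\
\mathrm{SSE} &= \sum_{i=1}^{k} n_i \left(\bar{y}_i - \bar{y}\right)^2 \\
\mathrm{SSR} &= \sum_{i=1}^{k}\sum_{j=1}^{n_i} \left(y_{ij} - \bar{y}_i\right)^2 \\
\mathrm{SST} &= \mathrm{SSE} + \mathrm{SSR} \\
F &= \frac{\mathrm{SSE}/(k-1)}{\mathrm{SSR}/(N-k)}
\end{aligned}
$$

The F-statistic is therefore the ratio of the explained mean square to the residual mean square.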

### Q4. Calculation of SST, SSE, and SSR in one-way ANOVA using Python

In Python, using `scipy.stats`:

In [1]:
import numpy as np
from scipy import stats

# Example data (weights for three diets)
diet_A = [65.3, 67.2, 64.8, 66.5, 68.1]
diet_B = [63.8, 62.1, 64.5, 61.9, 63.2]
diet_C = [69.5, 71.2, 70.1, 72.5, 68.9]

# Combine data
data = np.concatenate([diet_A, diet_B, diet_C])

# Group labels
groups = ['A'] * len(diet_A) + ['B'] * len(diet_B) + ['C'] * len(diet_C)

# One-way ANOVA
f_statistic, p_value = stats.f_oneway(diet_A, diet_B, diet_C)

# Calculate SST (total), SSE (explained, between groups), SSR (residual, within groups)
mean_total = np.mean(data)
sst = np.sum((data - mean_total) ** 2)
# Explained part: squared distance of each group mean from the grand mean, weighted by group size
sse = sum(len(g) * (np.mean(g) - mean_total) ** 2 for g in (diet_A, diet_B, diet_C))
ssr = sst - sse

print(f"SST: {sst:.2f}")
print(f"SSE: {sse:.2f}")
print(f"SSR: {ssr:.2f}")
print(f"F-statistic: {f_statistic:.2f}")
print(f"P-value: {p_value:.4f}")

SST: 155.60
SSE: 135.20
SSR: 20.40
F-statistic: 39.76
P-value: 0.0000


### Q5. Calculation of Main Effects and Interaction Effects in Two-Way ANOVA



In a two-way ANOVA, you can calculate the main effects and interaction effects using Python by fitting an ANOVA model and examining the results. Here’s how you can approach this using the `statsmodels` library in Python:

#### Example Setup

Suppose you have data where you want to analyze the effects of two factors: Factor A (with levels A1 and A2) and Factor B (with levels B1 and B2) on an outcome variable.

#### Example Code

Here’s an example using `statsmodels` to perform a two-way ANOVA and interpret the main effects and interaction effects:

In [20]:
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

# Example data setup
data = {
    'A': ['A1', 'A1', 'A2', 'A2', 'A1', 'A1', 'A2', 'A2'],
    'B': ['B1', 'B2', 'B1', 'B2', 'B1', 'B2', 'B1', 'B2'],
    'outcome': [10, 12, 15, 17, 9, 11, 14, 16]
}
df = pd.DataFrame(data)

# Fit the ANOVA model
model = ols('outcome ~ C(A) + C(B) + C(A):C(B)', data=df).fit()
anova_table = sm.stats.anova_lm(model, typ=2)

# Print the ANOVA table
print(anova_table)

# Interpretation of results
# Main effects of Factor A and Factor B
print("\nMain effects:")
print(anova_table['sum_sq']['C(A)'] / anova_table['sum_sq'].sum())  # Proportion of variance explained by Factor A
print(anova_table['sum_sq']['C(B)'] / anova_table['sum_sq'].sum())  # Proportion of variance explained by Factor B

# Interaction effect between Factor A and Factor B
print("\nInteraction effect:")
print(anova_table['sum_sq']['C(A):C(B)'] / anova_table['sum_sq'].sum())  # Proportion of variance explained by interaction

# Conclusion based on p-values
alpha = 0.05
if anova_table['PR(>F)']['C(A):C(B)'] < alpha:
    print("\nThere is a significant interaction effect between Factor A and Factor B.")
else:
    print("\nThere is no significant interaction effect between Factor A and Factor B.")

                 sum_sq   df             F    PR(>F)
C(A)       5.000000e+01  1.0  1.000000e+02  0.000562
C(B)       8.000000e+00  1.0  1.600000e+01  0.016130
C(A):C(B)  6.310887e-30  1.0  1.262177e-29  1.000000
Residual   2.000000e+00  4.0           NaN       NaN

Main effects:
0.8333333333333338
0.13333333333333297

Interaction effect:
1.0518145402946785e-31

There is no significant interaction effect between Factor A and Factor B.


#### Explanation

1. **Data Setup**: The example data has two categorical factors, A and B, and an outcome variable `outcome`.
  
2. **ANOVA Model**: We use the `ols` function from `statsmodels.formula.api` to specify the model formula (`outcome ~ C(A) + C(B) + C(A):C(B)`) where `C()` indicates categorical variables and `C(A):C(B)` specifies the interaction term.

3. **Fitting the Model**: The model is fitted using `.fit()` method.

4. **ANOVA Table**: The `anova_lm` function from `statsmodels.stats` computes the ANOVA table (`anova_table`) which contains sums of squares, degrees of freedom, F-statistic, and p-values.

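5. **Main Effects**: Each main effect is tested by its own row in the ANOVA table (`C(A)` and `C(B)`). Dividing a factor's sum of squares by the total sum of squares gives the proportion of variance it explains (eta-squared), which is the effect size the code prints.

6. **Interaction Effect**: The interaction is tested by the `C(A):C(B)` row, and its sum of squares divided by the total gives its share of the variance. In this example the interaction sum of squares is zero up to floating-point error, so the effect of one factor does not depend on the level of the other.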

7. **Interpretation**: Based on the p-values (`PR(>F)` column in the ANOVA table), you can determine if there are significant main effects and interaction effects.
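
The main effects and the interaction can also be read directly from group means, which helps when interpreting the ANOVA table. The sketch below reuses the example data from this section; the names `main_effect_A`, `main_effect_B`, and `interaction_contrast` are introduced here for illustration only.

```python
import pandas as pd

# Same illustrative data as the two-way example in this section
df = pd.DataFrame({
    "A": ["A1", "A1", "A2", "A2", "A1", "A1", "A2", "A2"],
    "B": ["B1", "B2", "B1", "B2", "B1", "B2", "B1", "B2"],
    "outcome": [10, 12, 15, 17, 9, 11, 14, 16],
})

# Main effects: compare the marginal means of each factor
means_A = df.groupby("A")["outcome"].mean()
means_B = df.groupby("B")["outcome"].mean()
main_effect_A = means_A["A2"] - means_A["A1"]
main_effect_B = means_B["B2"] - means_B["B1"]

# Interaction: does the effect of B differ between the levels of A?
cell_means = df.pivot_table(index="A", columns="B", values="outcome", aggfunc="mean")
effect_B_within_A = cell_means["B2"] - cell_means["B1"]
interaction_contrast = effect_B_within_A["A2"] - effect_B_within_A["A1"]

print(f"Main effect of A (A2 - A1): {main_effect_A}")
print(f"Main effect of B (B2 - B1): {main_effect_B}")
print(f"Interaction contrast: {interaction_contrast}")
```

For this data the marginal means differ by 5 for Factor A and by 2 for Factor B, while the interaction contrast is 0. That matches the near-zero interaction sum of squares in the ANOVA table.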


### Q6. Interpretation of One-Way ANOVA Results

Suppose you conducted a one-way ANOVA and obtained an F-statistic of 5.23 and a p-value of 0.02.

- **Conclusion:** With a significance level of 0.05, since the p-value (0.02) is less than 0.05, we reject the null hypothesis.
- **Interpretation:** There is sufficient evidence to suggest that there are significant differences between at least two of the groups being compared.
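
To see how the p-value follows from the F-statistic, the sketch below evaluates the upper tail of the F distribution with `scipy.stats.f`. The degrees of freedom are not given in the question, so the values used here (3 groups, 15 observations) are assumptions chosen only for illustration; with them the p-value comes out close to the reported 0.02.

```python
from scipy import stats

f_statistic = 5.23   # value quoted in the text
df_between = 2       # assumption: 3 groups, so k - 1 = 2
df_within = 12       # assumption: 15 observations, so N - k = 12
alpha = 0.05

# Survival function: P(F > f_statistic) under the null hypothesis
p_value = stats.f.sf(f_statistic, df_between, df_within)

# Smallest F value that would be significant at this alpha
f_critical = stats.f.ppf(1 - alpha, df_between, df_within)

print(f"p-value: {p_value:.4f}")
print(f"Critical F at alpha={alpha}: {f_critical:.2f}")
print("Reject H0" if p_value < alpha else "Fail to reject H0")
```

Comparing the p-value with alpha and comparing the F-statistic with the critical value are equivalent decisions.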

### Q7. Handling Missing Data in Repeated Measures ANOVA

In a repeated measures ANOVA, missing data can be handled by various methods:
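- **Listwise (complete-case) deletion:** Exclude any participant with missing data from the analysis entirely. Pairwise deletion, by contrast, drops a participant only from the specific comparisons that involve the missing value.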
- **Mean substitution:** Replace missing data with the mean value of that variable across all participants.
- **Model-based imputation:** Use statistical models to estimate missing values.

#### Consequences:
- **Bias:** Depending on the method chosen, estimates of effects may be biased if missingness is not completely random.
- **Loss of Power:** Removing cases with missing data reduces the sample size, potentially reducing statistical power.
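
The trade-off between losing cases and distorting the data can be illustrated with `pandas`. The sketch below assumes a hypothetical wide-format table (one row per participant, one column per time point) and compares complete-case deletion with mean substitution.

```python
import numpy as np
import pandas as pd

# Hypothetical wide-format data: one row per participant, one column per time point
scores = pd.DataFrame(
    {
        "t1": [5.0, 6.0, 7.0, 8.0],
        "t2": [6.0, np.nan, 8.0, 9.0],
        "t3": [7.0, 8.0, np.nan, 10.0],
    },
    index=["p1", "p2", "p3", "p4"],
)

# Complete-case deletion: keep only participants with no missing values
complete_cases = scores.dropna()

# Mean substitution: fill each gap with that time point's mean across participants
mean_filled = scores.fillna(scores.mean())

print(f"Participants before deletion: {len(scores)}, after: {len(complete_cases)}")
print("Variance of t2 among observed values:", scores["t2"].var())
print("Variance of t2 after mean substitution:", mean_filled["t2"].var())
```

Deletion halves the sample in this toy case, while mean substitution keeps every participant but shrinks the variance of the filled columns, which can make effects look more precise than they are.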

### Q8. Common Post-hoc Tests after ANOVA

Common post-hoc tests include:
- **Tukey's HSD (Honestly Significant Difference):** Used when comparing all pairs of means to determine which pairs are significantly different.
- **Bonferroni correction:** Adjusts the significance level to account for multiple comparisons.
- **Sidak correction:** Another method to adjust for multiple comparisons.

#### Example:
- **Scenario:** After conducting a one-way ANOVA to compare the mean weight loss of three diets, you find a significant difference.
- **Post-hoc Test:** Use Tukey's HSD to determine which specific pairs of diets have significantly different mean weight loss.
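
A Tukey HSD comparison of this kind might look as follows with `pairwise_tukeyhsd` from `statsmodels`. This is a sketch: it reuses the illustrative diet values from the Q4 example as stand-in data, not real weight-loss measurements.

```python
import numpy as np
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Illustrative values for three diets (reused from the Q4 example)
diet_A = [65.3, 67.2, 64.8, 66.5, 68.1]
diet_B = [63.8, 62.1, 64.5, 61.9, 63.2]
diet_C = [69.5, 71.2, 70.1, 72.5, 68.9]

values = np.concatenate([diet_A, diet_B, diet_C])
labels = ["A"] * len(diet_A) + ["B"] * len(diet_B) + ["C"] * len(diet_C)

# Tukey's HSD compares every pair of groups while controlling the family-wise error rate
tukey = pairwise_tukeyhsd(endog=values, groups=labels, alpha=0.05)

# One row per pair, with the mean difference, adjusted p-value, and reject decision
print(tukey.summary())
```

Each row of the summary corresponds to one pair of diets; the `reject` column indicates whether that pair differs significantly at the chosen alpha.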

### Q9. One-Way ANOVA Example in Python

A researcher wants to compare the mean weight loss of three diets: A, B, and C. They collect data from 50 participants who were randomly assigned to one of the diets. Conduct a one-way ANOVA using Python to determine if there are any significant differences between the mean weight loss of the three diets. Report the F-statistic and p-value, and interpret the results.

To conduct a one-way ANOVA in Python to compare the mean weight loss of three diets (A, B, and C), you can use `scipy.stats.f_oneway`. Here’s how you can perform this analysis and interpret the results:

#### Example Setup

Suppose you have collected weight loss data from 50 participants who were randomly assigned to one of the three diets (A, B, and C). For brevity, the example code uses a smaller illustrative sample of 12 values per diet.

#### Example Code

In [22]:
import scipy.stats as stats

# Example data for weight loss across three diets
diet_A = [2.1, 1.8, 2.0, 1.9, 2.2, 2.1, 1.8, 2.0, 1.9, 2.1, 1.7, 2.0]
diet_B = [1.9, 2.2, 2.1, 1.8, 2.0, 1.9, 2.1, 2.0, 2.2, 1.7, 2.0, 1.9]
diet_C = [2.2, 1.9, 2.1, 1.8, 2.0, 1.9, 2.1, 1.8, 2.0, 1.9, 2.2, 1.9]

# Perform one-way ANOVA
f_statistic, p_value = stats.f_oneway(diet_A, diet_B, diet_C)

# Print results
print(f"F-statistic: {f_statistic:.4f}, p-value: {p_value:.4f}")

# Interpret results
alpha = 0.05
if p_value < alpha:
    print("There is a significant difference between the mean weight loss of the three diets.")
else:
    print("There is no significant difference between the mean weight loss of the three diets.")


F-statistic: 0.0509, p-value: 0.9504
There is no significant difference between the mean weight loss of the three diets.


### Q10. Two-Way ANOVA Example in Python

In [None]:
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

# Example data setup
data = {
    'software': ['A'] * 10 + ['B'] * 10 + ['C'] * 10,  # 30 observations for 3 software programs
    'experience': (['Novice'] * 5 + ['Experienced'] * 5) * 3,  # 5 Novice and 5 Experienced users per program, so every software/experience cell has data
    'time_taken': [20, 22, 21, 25, 24, 23, 18, 19, 20, 24,  # Example data, adjust as per your actual data
                   18, 17, 16, 19, 18, 20, 22, 21, 23, 20,
                   19, 21, 23, 24, 22, 21, 25, 24, 19, 20]
}
df = pd.DataFrame(data)

# Fit the two-way ANOVA model
model = ols('time_taken ~ C(software) + C(experience) + C(software):C(experience)', data=df).fit()

# Print ANOVA table
anova_table = sm.stats.anova_lm(model, typ=2)
print(anova_table)

### Q11. Two-Sample T-test Example in Python

In [12]:
from scipy.stats import ttest_ind

# Example data setup
control_group = [82, 78, 85, 88, 90, 79, 84, 86, 83, 81]  # Scores for control group
experimental_group = [88, 84, 87, 89, 92, 82, 85, 90, 86, 88]  # Scores for experimental group

# Conduct two-sample t-test
t_stat, p_value = ttest_ind(control_group, experimental_group)

# Interpret results
if p_value < 0.05:
    print(f"There is a significant difference in test scores between the two groups (p = {p_value:.4f})")
else:
    print(f"No significant difference found (p = {p_value:.4f})")

There is a significant difference in test scores between the two groups (p = 0.0340)


### Q12. Repeated Measures ANOVA Example in Python

In [None]:
import pandas as pd
from statsmodels.stats.anova import AnovaRM

# Example data setup: sales recorded at each of the 3 stores on the same 12 days.
# The `day` column is an assumption added here: a repeated measures design needs
# an identifier for the unit that is measured repeatedly.
data = {
    'day': list(range(1, 13)) * 3,
    'store': ['Store A'] * 12 + ['Store B'] * 12 + ['Store C'] * 12,  # 36 observations for 3 stores
    'sales': [100, 110, 105, 95, 102, 98, 105, 100, 107, 98, 103, 110,  # Example data, adjust as per your actual data
              112, 108, 100, 105, 98, 104, 106, 112, 114, 110, 98, 102,
              115, 108, 112, 105, 98, 100, 105, 98, 102, 108, 110, 112]
}
df = pd.DataFrame(data)

# Fit the repeated measures ANOVA model (store is the within-subject factor)
model = AnovaRM(df, depvar='sales', subject='day', within=['store']).fit()

# Print ANOVA table
print(model.anova_table)


#### Post-hoc Test for Repeated Measures ANOVA

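After finding a significant difference with a repeated measures ANOVA, you can run pairwise comparisons, such as paired t-tests with a Bonferroni correction, to determine which specific pairs of stores have significantly different sales. Tukey's HSD assumes independent groups, so it is better suited to between-subjects designs.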
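
One way to run such pairwise comparisons is a set of paired t-tests with a Bonferroni adjustment. The sketch below uses `scipy.stats.ttest_rel` on hypothetical sales figures (three stores observed on the same six days); the data and the `adjusted` dictionary are illustrative assumptions, not results from the analysis above.

```python
from itertools import combinations
from scipy import stats

# Hypothetical daily sales for three stores, recorded on the same 6 days
sales = {
    "Store A": [100, 110, 105, 95, 102, 98],
    "Store B": [112, 118, 110, 104, 111, 109],
    "Store C": [101, 112, 104, 97, 103, 100],
}

pairs = list(combinations(sales, 2))
alpha = 0.05
adjusted = {}

for first, second in pairs:
    # Paired t-test: both stores are observed on the same days
    t_stat, p_raw = stats.ttest_rel(sales[first], sales[second])
    # Bonferroni: multiply each p-value by the number of comparisons (capped at 1)
    p_adj = min(p_raw * len(pairs), 1.0)
    adjusted[(first, second)] = p_adj
    verdict = "significant" if p_adj < alpha else "not significant"
    print(f"{first} vs {second}: adjusted p = {p_adj:.4f} ({verdict})")
```

Multiplying each p-value by the number of comparisons is equivalent to testing each pair at alpha divided by that number. It is conservative but simple to apply.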