In [None]:
Q1: Assumptions required to use ANOVA:

Independence: Observations are independent of one another, both within and across groups.
Normality: Within each group, the dependent variable (equivalently, the model residuals) is approximately normally distributed.
Homogeneity of variances: The variance of the dependent variable is equal across all groups.

Examples of violations:

Violation of independence: Repeated measurements on the same subjects are analyzed as if they came from different subjects, so the dependency between observations is not accounted for.
Violation of normality: The dependent variable is strongly skewed or heavy-tailed within groups. ANOVA results may then be unreliable, especially with small sample sizes.
Violation of homogeneity of variances: One group is far more variable than the others. The F-test's Type I error rate can then deviate from the nominal level, particularly when group sizes are unequal, and post-hoc tests that rely on a pooled variance may be invalid.
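
The normality and equal-variance assumptions above can be screened with standard tests. The following is a minimal sketch, assuming three hypothetical groups generated at random (the group names, sizes, and spreads are illustrative, not from any real study). It uses the Shapiro-Wilk test per group and Levene's test across groups; independence cannot be tested this way and has to be judged from the study design.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible

# Hypothetical groups: same mean, but group_c has a larger spread
group_a = rng.normal(loc=10, scale=1.0, size=30)
group_b = rng.normal(loc=10, scale=1.0, size=30)
group_c = rng.normal(loc=10, scale=3.0, size=30)

# Normality within each group (Shapiro-Wilk): a small p-value suggests non-normality
for name, g in [("A", group_a), ("B", group_b), ("C", group_c)]:
    w_stat, p_norm = stats.shapiro(g)
    print(f"Group {name}: Shapiro-Wilk W={w_stat:.3f}, p={p_norm:.3f}")

# Homogeneity of variances (Levene): a small p-value suggests unequal variances
lev_stat, p_lev = stats.levene(group_a, group_b, group_c)
print(f"Levene: W={lev_stat:.3f}, p={p_lev:.4f}")
```

These tests are only a screen: with small samples they have little power, and with large samples they flag deviations that may not matter in practice.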

In [None]:
Q2: Three types of ANOVA and their usage:

One-way ANOVA: Used when comparing the means of three or more independent groups on a single continuous dependent variable.
Two-way ANOVA: Used when there are two independent categorical variables (factors), and we want to examine their main effects and interaction effects on a continuous dependent variable.
Repeated measures ANOVA: Used when the same participants are measured under different conditions or at multiple time points.

In [1]:
import numpy as np
import scipy.stats as stats

# Example data
groups = 3
observations_per_group = 20
data = np.random.randn(groups, observations_per_group)

# Calculate mean of all observations
overall_mean = np.mean(data)

# Calculate Total Sum of Squares (SST)
SST = np.sum((data - overall_mean)**2)

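# Calculate Explained Sum of Squares (SSE), i.e. the between-group sum of squares
# Note: some texts use "SSE" for the error (residual) sum of squares instead
group_means = np.mean(data, axis=1)  # each row of `data` is one group
SSE = np.sum(observations_per_group * (group_means - overall_mean)**2)

# Calculate Residual Sum of Squares (SSR), i.e. the within-group sum of squares
SSR = SST - SSE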

print("Total Sum of Squares (SST):", SST)
print("Explained Sum of Squares (SSE):", SSE)
print("Residual Sum of Squares (SSR):", SSR)

Total Sum of Squares (SST): 76.2164606163377
Explained Sum of Squares (SSE): 0.855727520039736
Residual Sum of Squares (SSR): 75.36073309629796
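
The decomposition above can be checked against SciPy. This is a minimal sketch, assuming freshly generated random data with a fixed seed (so the numbers will not match the output above). It computes the within-group sum of squares directly rather than by subtraction, then builds the F-statistic from the mean squares and compares it with `scipy.stats.f_oneway`.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)  # fixed seed; values differ from the cell above
k, n = 3, 20                     # number of groups, observations per group
data = rng.standard_normal((k, n))  # each row is one group

overall_mean = data.mean()
group_means = data.mean(axis=1)

# Sums of squares: total, between groups (explained), within groups (residual)
ss_total = np.sum((data - overall_mean) ** 2)
ss_between = np.sum(n * (group_means - overall_mean) ** 2)
ss_within = np.sum((data - group_means[:, None]) ** 2)

# F = (between-group mean square) / (within-group mean square)
df_between = k - 1
df_within = k * n - k
f_manual = (ss_between / df_between) / (ss_within / df_within)

# f_oneway takes one sequence per group, so unpack the rows
f_scipy, p_scipy = stats.f_oneway(*data)

print("SST == SSB + SSW:", np.isclose(ss_total, ss_between + ss_within))
print("F (manual):", f_manual)
print("F (scipy): ", f_scipy)
```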


In [21]:
import statsmodels.api as sm
from statsmodels.formula.api import ols

# Assumes `df` is a pandas DataFrame with columns: 'Software', 'Experience', and 'Time'
# 'Software': Program A, Program B, Program C
# 'Experience': Novice, Experienced
# `df` must be defined before this cell runs; otherwise the NameError shown below is raised.

# Fit the ANOVA model
model = ols('Time ~ C(Software) * C(Experience)', data=df).fit()

# Perform ANOVA
anova_table = sm.stats.anova_lm(model, typ=2)

# Extract the main effects and interaction effect
software_effect = anova_table.loc['C(Software)', 'F']
experience_effect = anova_table.loc['C(Experience)', 'F']
interaction_effect = anova_table.loc['C(Software):C(Experience)', 'F']

print("Main effect of Software:", software_effect)
print("Main effect of Experience:", experience_effect)
print("Interaction effect:", interaction_effect)


NameError: name 'df' is not defined

In [22]:
import pandas as pd

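# Example data (illustrative values): two observations per Software x Experience cell.
# With only one observation per cell, the interaction model leaves no residual
# degrees of freedom, so the F-statistics cannot be computed.
data = {
    'Software': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'C', 'C', 'C', 'C'],
    'Experience': ['Novice', 'Novice', 'Experienced', 'Experienced'] * 3,
    'Time': [10, 12, 15, 14, 20, 22, 25, 27, 30, 29, 35, 33]  # Time measurement for each observation
}

# Create DataFrame
df = pd.DataFrame(data)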


In [None]:
Q7: Handling missing data in repeated measures ANOVA:

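Common approaches include listwise deletion (dropping any subject with a missing measurement), mean imputation, regression imputation, and multiple imputation.
The consequences differ by method. Listwise deletion reduces sample size and power, and can bias estimates of treatment effects unless the data are missing completely at random. Mean imputation shrinks variances, which understates standard errors and can distort p-values. Single regression imputation also understates uncertainty. Multiple imputation carries the uncertainty about the missing values into the standard errors and p-values, provided the imputation model is reasonable.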
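
The variance shrinkage caused by mean imputation can be seen numerically. This is a minimal sketch, assuming a hypothetical set of 200 simulated scores with 60 values removed completely at random (all numbers are illustrative). Because every imputed value sits exactly at the observed mean, it adds nothing to the sum of squares while still increasing the sample size, so the variance estimate drops.

```python
import numpy as np

rng = np.random.default_rng(1)  # fixed seed so the sketch is reproducible

# Hypothetical complete measurements
scores = rng.normal(loc=50, scale=10, size=200)

# Remove 60 of the 200 values completely at random
observed = scores.copy()
missing_idx = rng.choice(scores.size, size=60, replace=False)
observed[missing_idx] = np.nan

# Mean imputation: replace each missing value with the mean of the observed values
imputed = np.where(np.isnan(observed), np.nanmean(observed), observed)

var_observed = np.nanvar(observed, ddof=1)  # sample variance of the 140 observed values
var_imputed = np.var(imputed, ddof=1)       # sample variance after imputation

print("Variance (observed only):", var_observed)
print("Variance (mean-imputed): ", var_imputed)
print("Ratio:", var_imputed / var_observed)  # equals (140 - 1) / (200 - 1)
```

A smaller variance estimate leads to smaller standard errors, so tests run on mean-imputed data tend to look more significant than the data justify.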

In [None]:
Q8: Common post-hoc tests used after ANOVA:

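Tukey's HSD (Honestly Significant Difference): Compares every pair of group means to determine which specific groups differ significantly, while controlling the familywise error rate.
Bonferroni correction: Adjusts the significance threshold for multiple comparisons (dividing it by the number of comparisons) to control the familywise error rate.
Scheffé's method: More conservative than Tukey's HSD for pairwise comparisons. It covers all possible contrasts, not only pairwise ones, and can be used with unequal sample sizes. Like ANOVA itself, it assumes equal variances; the Games-Howell test is a common choice when variances are unequal.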
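
As an illustration of the Bonferroni correction described above, the following is a minimal sketch that runs all pairwise two-sample t-tests and compares each p-value against the adjusted threshold. The group names and values are hypothetical, and pairwise t-tests are only one of several ways to do the comparisons.

```python
from itertools import combinations
from scipy import stats

# Hypothetical measurements for three groups (illustrative values only)
groups = {
    "Group_1": [12.1, 11.8, 12.5, 13.0, 12.2],
    "Group_2": [13.4, 13.9, 14.1, 13.2, 14.0],
    "Group_3": [12.0, 12.4, 11.9, 12.6, 12.3],
}

alpha = 0.05
pairs = list(combinations(groups, 2))  # all unordered pairs of group names
adjusted_alpha = alpha / len(pairs)    # Bonferroni: divide alpha by the number of tests

results = {}
for a, b in pairs:
    t_stat, p_val = stats.ttest_ind(groups[a], groups[b])
    results[(a, b)] = (p_val, p_val < adjusted_alpha)
    print(f"{a} vs {b}: t={t_stat:.3f}, p={p_val:.4f}, "
          f"significant at adjusted alpha={adjusted_alpha:.4f}: {p_val < adjusted_alpha}")
```

Equivalently, each p-value can be multiplied by the number of comparisons (capped at 1) and compared against the original threshold.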

In [19]:
import scipy.stats as stats

# Example data
data = {
    'Diet_A': [1, 2, 3, 4, 5],  # Replace with actual weight loss data for Diet A
    'Diet_B': [2, 3, 4, 5, 6],  # Replace with actual weight loss data for Diet B
    'Diet_C': [3, 4, 5, 6, 7]   # Replace with actual weight loss data for Diet C
}

# Perform one-way ANOVA
F_statistic, p_value = stats.f_oneway(data['Diet_A'], data['Diet_B'], data['Diet_C'])

print("F-statistic:", F_statistic)
print("p-value:", p_value)
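# Interpretation: If p-value < 0.05, there are significant differences in mean weight loss among the three diets.
# With this placeholder data, p is about 0.178, so the null hypothesis of equal means is not rejected.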


F-statistic: 2.0
p-value: 0.177978515625


In [None]:

import statsmodels.api as sm
from statsmodels.formula.api import ols

# Example data
# Assuming data is in a DataFrame with columns: 'Time', 'Software', 'Experience'

# Fit OLS regression model
model = ols('Time ~ C(Software) * C(Experience)', data=df).fit()

# Perform ANOVA and store the results in an ANOVA table
anova_table = sm.stats.anova_lm(model, typ=2)

# Print ANOVA table
print(anova_table)

In [13]:
import scipy.stats as stats

# Example data
control_group = [85, 78, 92, 88, 90]  # list of test scores for control group
experimental_group = [75, 80, 70, 85, 82]  # list of test scores for experimental group

# Perform two-sample t-test
t_statistic, p_value = stats.ttest_ind(control_group, experimental_group)

print("t-statistic:", t_statistic)
print("p-value:", p_value)
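# Interpretation: If p-value < 0.05, there is a significant difference in test scores between the two groups.
# With this example data, p is about 0.053, just above 0.05, so the difference is not significant at the 5% level.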


t-statistic: 2.2725233814421393
p-value: 0.05268253876201326
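
The t-test above is closely related to one-way ANOVA: with exactly two groups and the default equal-variance assumption, the ANOVA F-statistic equals the square of the t-statistic and the p-values coincide. A minimal sketch of that check, reusing the example scores from the cell above:

```python
from scipy import stats

# Same example scores as in the t-test cell
control_group = [85, 78, 92, 88, 90]
experimental_group = [75, 80, 70, 85, 82]

# Two-sample t-test (pooled variance, the default) and one-way ANOVA on the same two groups
t_stat, p_t = stats.ttest_ind(control_group, experimental_group)
f_stat, p_f = stats.f_oneway(control_group, experimental_group)

print("t squared:", t_stat ** 2)
print("F:        ", f_stat)
print("p (t-test):", p_t)
print("p (ANOVA): ", p_f)
```

This is why ANOVA is usually described as the extension of the two-sample t-test to three or more groups.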


In [16]:
pip install statsmodels


