In [None]:
#ans1:


**Assumptions for ANOVA:**
1. Independence of observations.
2. Normality of residuals.
3. Homogeneity of variances.

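**Examples of Violations:**
1. Non-independence: Repeated measurements on the same subject are treated as separate, independent observations.
2. Non-normality: Residuals are strongly skewed, as often happens with reaction times or income data.
3. Heterogeneity of variances: One group has a much larger spread than the others.

Violations can inflate the Type I error rate or reduce power, so researchers may need a data
transformation or an alternative method such as Welch's ANOVA or the Kruskal-Wallis test.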
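
The normality and equal-variance assumptions can be checked numerically before running the test. The following is a minimal sketch, assuming three simulated groups; the data, the group means, and the use of the Shapiro-Wilk and Levene tests are illustrative choices, not part of the original answer.

```python
import numpy as np
from scipy import stats

# Simulated data for three groups (illustrative only)
rng = np.random.default_rng(0)
groups = [rng.normal(loc=mu, scale=2.0, size=30) for mu in (5.0, 5.5, 6.0)]

# Residuals: each observation minus its own group mean
residuals = np.concatenate([g - g.mean() for g in groups])

# Shapiro-Wilk test: the null hypothesis is that the residuals are normally distributed
shapiro_stat, shapiro_p = stats.shapiro(residuals)

# Levene's test: the null hypothesis is that all groups have equal variances
levene_stat, levene_p = stats.levene(*groups)

print(f"Shapiro-Wilk p-value: {shapiro_p:.3f}")
print(f"Levene p-value: {levene_p:.3f}")
```

A small p-value in either test suggests the corresponding assumption may be violated. Independence cannot be checked this way; it has to come from the study design.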

In [None]:
#ans2:

1. **One-Way ANOVA:**
   - **Use When:** You have one categorical factor (like a teaching method) and want to see if it
    significantly affects a continuous outcome (e.g., test scores).
   - **Example:** Comparing test scores of students taught with different methods.

2. **Two-Way ANOVA:**
   - **Use When:** You have two categorical factors (e.g., teaching method and gender) and want to know
    if each factor and their interaction have a significant impact.
   - **Example:** Checking if test scores are influenced by both teaching method and gender.

3. **Repeated Measures ANOVA:**
   - **Use When:** You're measuring the same subjects under different conditions or over time, and you want
    to see if there are significant changes.
   - **Example:** Monitoring changes in blood pressure over multiple time points after a drug treatment.

In short, One-Way tests one factor, Two-Way tests two factors and their interaction, and Repeated
Measures tests changes over time or conditions within the same subjects.



In [None]:
#ans3:


1. **Variance Components:**
   - ANOVA breaks down the total variability in the data into different components.
   - Total sum of squares = between-group sum of squares + within-group sum of squares (SST = SSB + SSW).

2. **Within-Group Variance:**
   - Measures variability within each group.
   - Represents random variation or error.

3. **Between-Group Variance:**
   - Measures variability between different groups.
   - Indicates whether group means are significantly different.

4. **F-Ratio:**
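   - Ratio of between-group variance to within-group variance, each measured as a mean square (sum of
    squares divided by its degrees of freedom).
   - F = MSB / MSW, where MSB = SSB / (k - 1) and MSW = SSW / (N - k) for k groups and N observations.
   - Used to test if group means are significantly different.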

5. **Importance:**
   - Helps assess whether differences among group means are due to actual effects or random chance.
   - Guides researchers in understanding the impact of independent variables on the dependent variable.
   - Provides a statistical basis for drawing conclusions about the population from sample data.

6. **Critical in Experimental Design:**
   - Essential for designing experiments and interpreting results accurately.
   - Guides researchers in refining experimental designs to maximize the power of statistical tests.

7. **Hypothesis Testing:**
   - ANOVA helps test the null hypothesis that all group means are equal.
   - If the F-ratio is significant, it suggests that at least one group mean differs significantly from
    the others.

Understanding the partitioning of variance in ANOVA is crucial for researchers and analysts to make valid
inferences about the effects of independent variables on the dependent variable in experimental designs.
It provides a structured approach to dissecting the sources of variation, allowing for more informed
interpretations of study outcomes.
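
The partitioning described above can be verified numerically. This is a minimal sketch, assuming a small three-group data set chosen for illustration; the variable names (`ss_between`, `ss_within`, and so on) are not from the original answer. It computes the sums of squares by hand, forms the F-ratio from the mean squares, and cross-checks the result against `scipy.stats.f_oneway`.

```python
import numpy as np
from scipy import stats

# Small illustrative data set with three groups
groups = [np.array([5, 8, 9]), np.array([7, 6, 11]), np.array([8, 10, 12])]
all_values = np.concatenate(groups)
grand_mean = all_values.mean()

k = len(groups)        # number of groups
n = all_values.size    # total number of observations

# Between-group, within-group, and total sums of squares
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
ss_total = ((all_values - grand_mean) ** 2).sum()

# Mean squares: each sum of squares divided by its degrees of freedom
ms_between = ss_between / (k - 1)
ms_within = ss_within / (n - k)
f_ratio = ms_between / ms_within

# Cross-check against SciPy's one-way ANOVA
f_scipy, p_scipy = stats.f_oneway(*groups)

print("SST:", ss_total, "SSB + SSW:", ss_between + ss_within)
print("F (by hand):", f_ratio, "F (SciPy):", f_scipy, "p-value:", p_scipy)
```

The two F values should agree, and the total sum of squares should equal the sum of the between-group and within-group parts.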

In [1]:
#ans4:


import numpy as np

def one_way_anova_sumsquares(data):
    # Flatten the data into a 1D array
    flat_data = np.concatenate(data)

    # Calculate overall mean
    overall_mean = np.mean(flat_data)

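    # Total sum of squares (SST): squared deviations of every observation from the overall mean
    sst = np.sum((flat_data - overall_mean)**2)

    # Calculate group means
    group_means = [np.mean(group) for group in data]

    # Explained (between-group) sum of squares, called SSE here:
    # group size times the squared deviation of each group mean from the overall mean
    sse = np.sum([len(group) * (group_mean - overall_mean)**2 for group_mean, group in zip(group_means, data)])

    # Residual (within-group) sum of squares, called SSR here:
    # squared deviations of each observation from its own group mean
    ssr = np.sum([(x - group_mean)**2 for group, group_mean in zip(data, group_means) for x in group])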

    return sst, sse, ssr

# Example usage
data = [np.array([5, 8, 9]), np.array([7, 6, 11]), np.array([8, 10, 12])]
sst, sse, ssr = one_way_anova_sumsquares(data)

print("Total Sum of Squares (SST):", sst)
print("Explained Sum of Squares (SSE):", sse)
print("Residual Sum of Squares (SSR):", ssr)



Total Sum of Squares (SST): 42.22222222222222
Explained Sum of Squares (SSE): 11.555555555555557
Residual Sum of Squares (SSR): 30.666666666666664


In [5]:
#ans5:


import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

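# Example data
# Note: A and B are numeric, so ols treats them as continuous covariates; a two-way
# ANOVA normally uses categorical factors, written C(A) and C(B) in the formula.
# Here B = A - 5 exactly, so the two predictors are perfectly collinear and their
# separate rows in the resulting table should be interpreted with caution.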
data = {'A': [10, 20, 30, 40, 50, 60],
        'B': [5, 15, 25, 35, 45, 55],
        'Y': [25, 30, 40, 45, 50, 60]}

df = pd.DataFrame(data)

# Fit the two-way ANOVA model
formula = 'Y ~ A + B + A:B'
model = ols(formula, data=df).fit()

# Perform ANOVA
anova_table = sm.stats.anova_lm(model, typ=2)

# Display the ANOVA table
print(anova_table)

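# Extract the mean square of each term (sum of squares divided by degrees of freedom).
# These are printed as "effects" below, but they are mean squares, not effect sizes.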
main_effect_A = anova_table['sum_sq']['A'] / anova_table['df']['A']
main_effect_B = anova_table['sum_sq']['B'] / anova_table['df']['B']
interaction_effect = anova_table['sum_sq']['A:B'] / anova_table['df']['A:B']

# Display main effects and interaction effect
print(f'Main Effect A: {main_effect_A}')
print(f'Main Effect B: {main_effect_B}')
print(f'Interaction Effect: {interaction_effect}')


              sum_sq   df           F    PR(>F)
A         668.746328  1.0  197.104181  0.000783
B         351.306603  1.0  103.542999  0.002023
A:B         0.297619  1.0    0.087719  0.786416
Residual   10.178571  3.0         NaN       NaN
Main Effect A: 668.7463281160254
Main Effect B: 351.3066026564111
Interaction Effect: 0.2976190476185124


In [None]:
#ans6:


An F-statistic of 5.23 with a p-value of 0.02 means that, at the 0.05 significance level, we reject the
null hypothesis that all group means are equal. In simple terms, at least one group mean differs from the
others, but you'd need post-hoc tests (such as Tukey's HSD) to figure out exactly which groups differ.
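
The link between an F-statistic and its p-value can be sketched with the F distribution in SciPy. The degrees of freedom below are hypothetical, since the question does not state them, so the printed p-value need not match the reported 0.02 exactly.

```python
from scipy import stats

f_stat = 5.23
alpha = 0.05

# Hypothetical degrees of freedom (not given in the question):
# 3 groups -> dfn = 2, and 15 observations in total -> dfd = 12
dfn, dfd = 2, 12

# p-value: probability of an F at least this large if all group means are equal
p_value = stats.f.sf(f_stat, dfn, dfd)

# Critical value: smallest F that would be significant at the chosen alpha
f_critical = stats.f.ppf(1 - alpha, dfn, dfd)

print(f"p-value: {p_value:.4f}")
print(f"critical F at alpha={alpha}: {f_critical:.3f}")
print("reject H0" if p_value < alpha else "fail to reject H0")
```

The decision is the same whether you compare the p-value with alpha or the F-statistic with the critical value.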

In [None]:
#ans7:


In a repeated measures ANOVA, handling missing data is crucial. Here's a brief overview:

1. **Handling Missing Data:**
   - **Pairwise Deletion:** Analyze only the available data for each pair of variables, ignoring cases
    with missing data.
   - **Mean Imputation:** Replace missing values with the mean of observed values for that variable.
   - **Interpolation:** Estimate missing values based on the observed data points.

2. **Potential Consequences:**
   - **Pairwise Deletion:** Different comparisons end up based on different subsets of subjects, which
    can bias results and reduce power, especially if missing data are not random.
   - **Mean Imputation:** Shrinks the variance and weakens the relationships between repeated
    measurements, even when values are missing completely at random (MCAR).
   - **Interpolation:** Assumes a specific pattern of change between observed data points, potentially
    introducing bias if the pattern is incorrect.

Choosing the appropriate method depends on the nature and extent of missing data, and the assumptions you
are willing to make about the missing data mechanism. Always consider the potential impact on the validity
and generalizability of your results.
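
The effect of mean imputation on variance can be seen in a few lines of pandas. This is a minimal sketch, assuming a short hypothetical series of scores with two missing values; the numbers are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical repeated measurements for one condition, with two missing values
scores = pd.Series([12.0, 15.0, np.nan, 18.0, 11.0, np.nan, 16.0, 14.0])

# Mean imputation: replace each missing value with the mean of the observed values
mean_imputed = scores.fillna(scores.mean())

# Linear interpolation: estimate each missing value from its neighbours
interpolated = scores.interpolate(method="linear")

print("Variance of observed values:", scores.var())
print("Variance after mean imputation:", mean_imputed.var())
print("Interpolated series:", interpolated.tolist())
```

The imputed series has a smaller variance than the observed values because the filled-in points sit exactly at the mean, which is one reason mean imputation can understate uncertainty.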

In [None]:
#ans8:


Post-hoc tests are used after Analysis of Variance (ANOVA) to identify specific group differences when the 
overall ANOVA result indicates that at least one group differs significantly from others. Common post-hoc
tests include:

1. **Tukey's Honestly Significant Difference (HSD):**
   - **Use:** To compare all possible pairs of means.
   - **Example:** If you have conducted a one-way ANOVA comparing the means of three different teaching
methods, and the ANOVA result is significant, Tukey's HSD can be used to determine which specific pairs
of teaching methods show significant differences in student performance.

2. **Bonferroni Correction:**
   - **Use:** To control for family-wise error rate by adjusting significance levels for multiple 
    comparisons.
   - **Example:** When you have multiple pairwise comparisons to make, like comparing the means of 
different drug treatments, and you want to reduce the risk of making a Type I error due to the 
increased number of comparisons.

3. **Duncan's Multiple Range Test:**
   - **Use:** To compare all pairs of means with a stepwise procedure that has more power than Tukey's HSD
    but does not strictly control the family-wise error rate.
   - **Example:** After conducting an ANOVA comparing the yields of different fertilizer treatments
in agriculture, Duncan's test can help determine which specific pairs of fertilizers lead to significantly
different yields.

4. **Scheffé's Test:**
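   - **Use:** To control the family-wise error rate across all possible contrasts, not only pairwise
    comparisons. It is more conservative than Tukey's HSD when only pairwise comparisons are needed.
   - **Example:** If you are comparing the means of various product formulations in a manufacturing
process and want to test complex contrasts, such as the average of two formulations against a third,
while keeping the risk of Type I errors controlled.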

**Example Situation:**
Suppose you conducted an ANOVA to compare the effectiveness of four different marketing strategies
on sales. The ANOVA indicates a significant difference among the strategies. To pinpoint which specific 
strategies differ, you might employ Tukey's HSD to conduct pairwise comparisons. This can help you 
identify which marketing strategies are statistically different in terms of their impact on sales, 
allowing for more targeted and informed decision-making.
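
The Bonferroni approach from the list above can be sketched for the four-strategy scenario. The sales figures are simulated and the strategy names are hypothetical; the sketch runs an unadjusted t-test for each pair and then adjusts the p-values with `multipletests` from statsmodels.

```python
from itertools import combinations

import numpy as np
from scipy import stats
from statsmodels.stats.multitest import multipletests

# Simulated sales for four hypothetical marketing strategies (illustrative only)
rng = np.random.default_rng(1)
groups = {
    "Strategy A": rng.normal(100, 10, 20),
    "Strategy B": rng.normal(105, 10, 20),
    "Strategy C": rng.normal(115, 10, 20),
    "Strategy D": rng.normal(125, 10, 20),
}

# Unadjusted t-test for every pair of strategies (6 pairs for 4 groups)
pairs = list(combinations(groups, 2))
raw_p = [stats.ttest_ind(groups[a], groups[b]).pvalue for a, b in pairs]

# Bonferroni: multiply each p-value by the number of comparisons (capped at 1)
reject, adj_p, _, _ = multipletests(raw_p, alpha=0.05, method="bonferroni")

for (a, b), p, p_adj, rej in zip(pairs, raw_p, adj_p, reject):
    print(f"{a} vs {b}: raw p={p:.4f}, adjusted p={p_adj:.4f}, reject={rej}")
```

With six comparisons, each adjusted p-value is six times the raw one, so a pair must reach roughly p < 0.0083 before adjustment to be declared different at the 0.05 level.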

In [6]:
#ans9:


import scipy.stats as stats
import numpy as np

# Generate some example data (replace this with your actual data)
np.random.seed(42)
diet_A = np.random.normal(5, 2, 50)  # mean=5, std=2
diet_B = np.random.normal(4.5, 1.5, 50)  # mean=4.5, std=1.5
diet_C = np.random.normal(6, 2.5, 50)  # mean=6, std=2.5

# Perform one-way ANOVA
f_statistic, p_value = stats.f_oneway(diet_A, diet_B, diet_C)

# Print results
print("F-statistic:", f_statistic)
print("p-value:", p_value)

# Interpretation
if p_value < 0.05:
    print("There is a significant difference in mean weight loss between the three diets.")
else:
    print("There is no significant difference in mean weight loss between the three diets.")


F-statistic: 7.984872861507485
p-value: 0.0005104585600694623
There is a significant difference in mean weight loss between the three diets.


In [7]:
#ans10:

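import pandas as pd
from statsmodels.formula.api import ols
from statsmodels.stats.anova import anova_lm

# Example data (replace this with your actual data)
# Note: in this toy data every Program A row is 'Novice' and every Program C row is
# 'Experienced', so two cells of the design are empty and the interaction row of the
# resulting table should be read with caution.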
data = pd.DataFrame({
    'Time': [10, 12, 15, 8, 9, 11, 14, 13, 16, 9, 10, 12, 17, 18, 20, 15, 13, 12, 11, 14, 19, 22, 25, 20, 18, 16, 23, 21, 19, 24],
    'Program': ['A']*10 + ['B']*10 + ['C']*10,
    'Experience': ['Novice']*15 + ['Experienced']*15
})

# Fit the ANOVA model
model = ols('Time ~ C(Program) + C(Experience) + C(Program):C(Experience)', data).fit()

# Perform the ANOVA (anova_lm defaults to typ=1, i.e. sequential sums of squares)
anova_table = anova_lm(model)

# Display the ANOVA table
print(anova_table)


                            df      sum_sq     mean_sq          F  \
C(Program)                 2.0  431.666667  215.833333  25.346281   
C(Experience)              1.0   14.400000   14.400000   1.691057   
C(Program):C(Experience)   2.0    0.378636    0.189318   0.022232   
Residual                  26.0  221.400000    8.515385        NaN   

                                PR(>F)  
C(Program)                7.813782e-07  
C(Experience)             2.048711e-01  
C(Program):C(Experience)  9.780314e-01  
Residual                           NaN  
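
In the example data above, Program A contains only novices and Program C only experienced participants, so the interaction cannot be estimated for every combination. The following is a minimal sketch of the same model on a balanced design, assuming simulated times with 5 people in each of the 6 cells; the cell means and the random seed are hypothetical.

```python
import numpy as np
import pandas as pd
from statsmodels.formula.api import ols
from statsmodels.stats.anova import anova_lm

# Hypothetical balanced design: 3 programs x 2 experience levels x 5 people per cell
rng = np.random.default_rng(7)
rows = []
for program, base in [("A", 10), ("B", 13), ("C", 18)]:
    for experience, shift in [("Novice", 2), ("Experienced", 0)]:
        for time in rng.normal(base + shift, 2, 5):
            rows.append({"Program": program, "Experience": experience, "Time": time})
balanced = pd.DataFrame(rows)

# Every Program x Experience cell now contains observations
print(balanced.groupby(["Program", "Experience"]).size())

# 'a * b' in the formula expands to both main effects plus their interaction
model = ols("Time ~ C(Program) * C(Experience)", data=balanced).fit()
table = anova_lm(model, typ=2)
print(table)
```

With all six cells filled, the degrees of freedom add up as expected: 2 for Program, 1 for Experience, 2 for the interaction, and 24 for the residual.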


In [8]:
#ans11:


import numpy as np
from scipy.stats import ttest_ind
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Generate random data for control and experimental groups
np.random.seed(42)
control_group = np.random.normal(75, 10, 50)  # Mean: 75, Standard Deviation: 10
experimental_group = np.random.normal(80, 10, 50)  # Mean: 80, Standard Deviation: 10

# Perform two-sample t-test
t_stat, p_value = ttest_ind(control_group, experimental_group)

# Check if the results are significant
if p_value < 0.05:
    print("There is a significant difference between the two groups.")
    
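    # Perform post-hoc test (Tukey's HSD)
    # Note: with only two groups this repeats the same comparison as the t-test;
    # post-hoc tests are mainly useful with three or more groups.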
    data = np.concatenate([control_group, experimental_group])
    labels = ['Control'] * len(control_group) + ['Experimental'] * len(experimental_group)
    
    tukey_results = pairwise_tukeyhsd(data, labels, alpha=0.05)
    print(tukey_results.summary())
else:
    print("There is no significant difference between the two groups.")


There is a significant difference between the two groups.
   Multiple Comparison of Means - Tukey HSD, FWER=0.05    
 group1    group2    meandiff p-adj  lower   upper  reject
----------------------------------------------------------
Control Experimental   7.4325 0.0001 3.8427 11.0224   True
----------------------------------------------------------


In [12]:
pip install pingouin



In [16]:
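import numpy as np
import pandas as pd
from pingouin import rm_anova, pairwise_tests

# Create a DataFrame with simulated daily sales data for each store
np.random.seed(42)
data = {
    'Store': ['A'] * 30 + ['B'] * 30 + ['C'] * 30,
    # np.tile gives days 0..29 once per store, so every day has one row for each store
    # (np.repeat would assign each day three times within the same store)
    'Day': np.tile(np.arange(30), 3),
    'Sales': np.concatenate([np.random.normal(50, 10, 30),
                             np.random.normal(45, 8, 30),
                             np.random.normal(55, 12, 30)])
}

df = pd.DataFrame(data)

# Fill missing values (none in this simulated data) with the mean of the 'Sales' column
df['Sales'] = df['Sales'].fillna(df['Sales'].mean())

# Repeated measures ANOVA: each day is the "subject", measured in all three stores
anova_result = rm_anova(dv='Sales', within='Store', subject='Day', data=df)
print(anova_result)

# Post-hoc pairwise comparisons with Bonferroni correction.
# pairwise_tukey only supports between-subject designs, so paired tests are used instead
# (assumes pairwise_tests is available, i.e. pingouin >= 0.5.2).
posthoc_result = pairwise_tests(data=df, dv='Sales', within='Store', subject='Day', padjust='bonf')
print(posthoc_result)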