# Understanding Year-by-Year Changes in Student Performance
## A Visual and Statistical Journey Through FC Implementation (2021-2023)

### 📚 What is this notebook about?

This notebook examines how student performance in IT Service Management changed over three years of Flipped Classroom (FC) implementation. We'll use statistical tests to check whether the observed changes are larger than random variation alone would plausibly produce.

### 🎯 Key Questions We're Answering:
1. Did student performance improve over the years?
2. Which aspects of learning showed the most significant changes?
3. Are the improvements statistically meaningful or just coincidence?

In [1]:
# Import necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import scipy.stats as stats
from scipy.stats import f_oneway, kruskal, mannwhitneyu, ttest_ind
import warnings
from pathlib import Path

# Set display options
pd.set_option('display.max_columns', None)
warnings.filterwarnings('ignore')

# Set style for better-looking plots
sns.set_style('whitegrid')
plt.rcParams['figure.figsize'] = (14, 8)
plt.rcParams['font.size'] = 12

print("✅ Libraries loaded successfully!")

✅ Libraries loaded successfully!


## 1. Loading and Preparing the Data

First, we'll load the student performance data from three academic years (2021, 2022, 2023).

In [2]:
# Load the cleaned data
data_dir = Path('../data')
cleaned_file = data_dir / '2025_09_02_FC_K_A_cleaned.csv'

# Check if cleaned data exists
if not cleaned_file.exists():
    print(f"❌ Cleaned data file not found at {cleaned_file}")
    print("Please run the data cleaning notebook first.")
else:
    df = pd.read_csv(cleaned_file)
    print(f"✅ Data loaded successfully!")
    print(f"📊 Dataset contains {len(df)} students from {df['academic_year'].nunique()} academic years")
    print(f"\nYears included: {sorted(df['academic_year'].unique())}")
    
    # Basic data overview
    print(f"\n📈 Students per year:")
    for year in sorted(df['academic_year'].unique()):
        count = len(df[df['academic_year'] == year])
        print(f"  • {year}: {count} students")

✅ Data loaded successfully!
📊 Dataset contains 147 students from 3 academic years

Years included: [np.int64(2021), np.int64(2022), np.int64(2023)]

📈 Students per year:
  • 2021: 45 students
  • 2022: 49 students
  • 2023: 53 students


## 2. Understanding Statistical Tests

### 🤔 What are Statistical Tests?

Statistical tests help us determine if differences we see in data are **real** or just **random chance**.

Think of it like this:
- If you flip a coin 10 times and get 6 heads, is the coin unfair? Probably not - that's just random variation.
- If you flip it 1000 times and get 600 heads, is the coin unfair? Much more likely!
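
The coin example can be checked directly. The sketch below uses SciPy's exact binomial test (`scipy.stats.binomtest`, available in SciPy 1.7 and later) to ask how surprising each outcome would be for a fair coin. It is an illustration only and is not part of the course analysis.

```python
from scipy.stats import binomtest

# Two-sided exact binomial test against a fair coin (probability of heads = 0.5)
small = binomtest(k=6, n=10, p=0.5)
large = binomtest(k=600, n=1000, p=0.5)

print(f"6 heads in 10 flips:     p = {small.pvalue:.3f}")
print(f"600 heads in 1000 flips: p = {large.pvalue:.2e}")
```

The first p-value is far above 0.05 and the second is far below it: the same 60% share of heads is weak evidence in a small sample and strong evidence in a large one.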

### 📊 Tests We'll Use:

1. **ANOVA (Analysis of Variance)**
   - **What it does**: Compares averages across multiple groups (our 3 years)
   - **When to use**: When data follows a normal bell curve
   - **Real-world analogy**: Like comparing average heights of students from different schools

2. **Kruskal-Wallis Test**
   - **What it does**: Compares groups like ANOVA, but works on ranks instead of raw values, so it doesn't require a normal distribution
   - **When to use**: When data is skewed or has outliers
   - **Real-world analogy**: Like comparing median house prices across neighborhoods
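
To see why the choice of test matters, here is a minimal sketch on synthetic data (not the course data). It runs both tests on three made-up groups, then repeats them after adding one extreme outlier to the first group. The group sizes, shift, and outlier value are arbitrary assumptions chosen for illustration.

```python
import numpy as np
from scipy.stats import f_oneway, kruskal

rng = np.random.default_rng(42)  # fixed seed so the sketch is reproducible

# Synthetic data: two similar groups and one shifted group
group_a = rng.normal(loc=0.0, scale=1.0, size=50)
group_b = rng.normal(loc=0.0, scale=1.0, size=50)
group_c = rng.normal(loc=2.0, scale=1.0, size=50)

anova_clean = f_oneway(group_a, group_b, group_c)
kruskal_clean = kruskal(group_a, group_b, group_c)

# Add a single extreme outlier to the first group
group_a_outlier = np.append(group_a, 1000.0)

anova_outlier = f_oneway(group_a_outlier, group_b, group_c)
kruskal_outlier = kruskal(group_a_outlier, group_b, group_c)

print(f"Clean data:   ANOVA p = {anova_clean.pvalue:.2e}, "
      f"Kruskal-Wallis p = {kruskal_clean.pvalue:.2e}")
print(f"With outlier: ANOVA p = {anova_outlier.pvalue:.2e}, "
      f"Kruskal-Wallis p = {kruskal_outlier.pvalue:.2e}")
```

In this made-up example, one extreme value is enough to hide the group difference from ANOVA, whereas the rank-based Kruskal-Wallis test is barely affected.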

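### 🎯 What is "Statistical Significance"?

- **p-value < 0.05**: A difference this large would be unlikely (less than a 5% chance) if the years did not really differ, so we treat it as statistically significant
- **p-value ≥ 0.05**: The data are compatible with random variation alone, so we cannot claim a difference

Think of the p-value as the probability of seeing a difference at least as large as the one observed if there were truly no difference between the years. It is not the probability that our conclusion is wrong.
- p = 0.01 means such a difference would appear only about 1% of the time by chance alone (strong evidence)
- p = 0.40 means such a difference would appear about 40% of the time by chance alone (weak evidence)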
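
The 0.05 threshold can be made concrete with a small simulation. The sketch below repeatedly draws three groups from the same distribution (so no real difference exists), runs ANOVA each time, and counts how often the p-value still falls below 0.05. The data are synthetic; the group sizes are borrowed from the cohort sizes loaded above, and the number of simulations is an arbitrary choice.

```python
import numpy as np
from scipy.stats import f_oneway

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible
n_simulations = 2000
group_sizes = (45, 49, 53)  # same sizes as the three cohorts in this notebook

p_values = np.empty(n_simulations)
for i in range(n_simulations):
    # All three groups come from the SAME distribution: no real difference exists
    groups = [rng.normal(loc=0.0, scale=1.0, size=n) for n in group_sizes]
    p_values[i] = f_oneway(*groups).pvalue

false_positive_rate = np.mean(p_values < 0.05)
print(f"Share of 'significant' results with no real difference: {false_positive_rate:.3f}")
```

The share should land close to 0.05: the threshold fixes how often a test raises a false alarm when nothing is going on, which is why a single significant result is evidence rather than proof.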

In [3]:
# Define the key variables we'll analyze
key_variables = {
    'Student Engagement': [
        ('test_completion_rate', 'How many tests students attempted'),
        ('avg_success_rate_per_test', 'Average score on attempted tests')
    ],
    'Academic Performance': [
        ('final_grade', 'Final course grade (1-5 scale)'),
        ('fc_total_points', 'Total points earned (0-100 scale)'),
        ('percentage_points', 'Percentage of maximum points')
    ],
    'Assessment Components': [
        ('presentation_points', 'Presentation score (max 10)'),
        ('defense_points', 'Defense score (max 30)'),
        ('exam_k2', 'Midterm exam score (max 25)'),
        ('exam_k3', 'Final exam score (max 25)')
    ]
}

# Create a simple visualization of what we're analyzing
print("📚 Variables We're Analyzing:\n")
for category, variables in key_variables.items():
    print(f"\n🎯 {category}:")
    for var_name, description in variables:
        if var_name in df.columns:
            print(f"  ✓ {var_name}: {description}")
        else:
            print(f"  ✗ {var_name}: {description} [NOT AVAILABLE]")

📚 Variables We're Analyzing:


🎯 Student Engagement:
  ✓ test_completion_rate: How many tests students attempted
  ✓ avg_success_rate_per_test: Average score on attempted tests

🎯 Academic Performance:
  ✓ final_grade: Final course grade (1-5 scale)
  ✓ fc_total_points: Total points earned (0-100 scale)
  ✓ percentage_points: Percentage of maximum points

🎯 Assessment Components:
  ✓ presentation_points: Presentation score (max 10)
  ✓ defense_points: Defense score (max 30)
  ✓ exam_k2: Midterm exam score (max 25)
  ✓ exam_k3: Final exam score (max 25)


## 3. Visual Overview: How Did Performance Change Over Years?

Let's start with simple visualizations to see the trends before diving into statistical tests.

In [4]:
# Create comprehensive visualization of year-wise trends with improved clarity
def create_year_comparison_plots(df, variables_dict):
    """Create detailed plots comparing variables across years with layman-friendly design"""
    
    for category, variables in variables_dict.items():
        # Filter to available variables
        available_vars = [(var, desc) for var, desc in variables if var in df.columns]
        
        if not available_vars:
            continue
        
        # Create separate plots for each variable for better clarity
        for var_name, description in available_vars:
            years = sorted(df['academic_year'].unique())
            colors = ['#FF6B6B', '#4ECDC4', '#45B7D1']  # Different color for each year
            
            # Create a figure with better spacing
            fig = make_subplots(
                rows=1, cols=3,
                subplot_titles=[
                    "Box Plot (Shows Range & Average)",
                    "Violin Plot (Shows Distribution Shape)",
                    "Mean Values with Error Bars"
                ],
                horizontal_spacing=0.12
            )
            
            # Prepare data for all years
            year_data_dict = {}
            for year in years:
                year_data_dict[year] = df[df['academic_year'] == year][var_name].dropna()
            
            # 1. Box plots with clear labels
            for j, year in enumerate(years):
                year_data = year_data_dict[year]
                
                fig.add_trace(
                    go.Box(
                        y=year_data,
                        name=f"Year {year}",
                        boxmean='sd',  # Show mean and standard deviation
                        marker_color=colors[j],
                        boxpoints='outliers',  # Show only outliers
                        jitter=0.3,
                        pointpos=-1.8,
                        showlegend=True,
                        legendgroup=str(year),
                        hovertemplate=f"Year {year}<br>Value: %{{y:.2f}}<extra></extra>"
                    ),
                    row=1, col=1
                )
            
            # 2. Violin plots with better visibility
            for j, year in enumerate(years):
                year_data = year_data_dict[year]
                
                fig.add_trace(
                    go.Violin(
                        y=year_data,
                        name=f"Year {year}",
                        box_visible=True,
                        meanline_visible=True,
                        marker_color=colors[j],
                        opacity=0.7,
                        showlegend=False,
                        legendgroup=str(year),
                        hovertemplate=f"Year {year}<br>Value: %{{y:.2f}}<extra></extra>"
                    ),
                    row=1, col=2
                )
            
            # 3. Mean comparison with error bars
            means = []
            stds = []
            for year in years:
                year_data = year_data_dict[year]
                means.append(year_data.mean())
                stds.append(year_data.std())
            
            fig.add_trace(
                go.Scatter(
                    x=years,
                    y=means,
                    error_y=dict(
                        type='data',
                        array=stds,
                        visible=True,
                        color='rgba(0,0,0,0.3)'
                    ),
                    mode='lines+markers',
                    marker=dict(size=12, color='#FF6B6B'),
                    line=dict(width=2, color='#FF6B6B'),
                    name='Mean ± Std Dev',
                    showlegend=False,
                    hovertemplate="Year: %{x}<br>Mean: %{y:.2f}<br>Std: %{error_y.array:.2f}<extra></extra>"
                ),
                row=1, col=3
            )
            
            # Add individual year points for reference
            for j, year in enumerate(years):
                fig.add_trace(
                    go.Scatter(
                        x=[year],
                        y=[means[j]],
                        mode='markers+text',
                        marker=dict(size=15, color=colors[j]),
                        text=[f"{means[j]:.2f}"],
                        textposition="top center",
                        textfont=dict(size=12),
                        showlegend=False,
                        hovertemplate=f"Year {year}<br>Mean: {means[j]:.2f}<extra></extra>"
                    ),
                    row=1, col=3
                )
            
            # Update layout for clarity
            fig.update_layout(
                title={
                    'text': f"<b>{description}</b><br><sub>{category} - Comparing Years 2021-2023</sub>",
                    'font': {'size': 16}
                },
                height=500,
                showlegend=True,
                hovermode='closest',
                legend=dict(
                    orientation="h",
                    yanchor="bottom",
                    y=1.02,
                    xanchor="right",
                    x=1
                ),
                font=dict(size=11)
            )
            
            # Update axes labels with clear descriptions
            fig.update_xaxes(title_text="", row=1, col=1)
            fig.update_xaxes(title_text="", row=1, col=2)
            fig.update_xaxes(title_text="Academic Year", row=1, col=3)
            
            # Add consistent y-axis labels
            y_label = "Score"
            if "percentage" in var_name.lower():
                y_label = "Percentage (%)"
            elif "grade" in var_name.lower():
                y_label = "Grade (1-5)"
            elif "points" in var_name.lower():
                y_label = "Points"
                
            fig.update_yaxes(title_text=y_label, row=1, col=1)
            fig.update_yaxes(title_text=y_label, row=1, col=2)
            fig.update_yaxes(title_text=y_label, row=1, col=3)
            
            # Make sure all y-axes have the same range for easy comparison
            all_values = np.concatenate([year_data_dict[year].values for year in years])
            y_min = all_values.min() - (all_values.std() * 0.5)
            y_max = all_values.max() + (all_values.std() * 0.5)
            
            fig.update_yaxes(range=[y_min, y_max])
            
            fig.show()
            
            # Print clear summary statistics
            print(f"\n📊 {description}")
            print(f"Variable: {var_name}")
            print("-" * 60)
            
            # Create a clear summary table
            summary_data = []
            for year in years:
                year_data = year_data_dict[year]
                summary_data.append({
                    'Year': year,
                    'Count': len(year_data),
                    'Mean': f"{year_data.mean():.2f}",
                    'Median': f"{year_data.median():.2f}",
                    'Std Dev': f"{year_data.std():.2f}",
                    'Min': f"{year_data.min():.2f}",
                    'Max': f"{year_data.max():.2f}"
                })
            
            summary_df = pd.DataFrame(summary_data)
            print(summary_df.to_string(index=False))
            
            # Add trend interpretation
            first_mean = year_data_dict[years[0]].mean()
            last_mean = year_data_dict[years[-1]].mean()
            change = last_mean - first_mean
            pct_change = (change / first_mean * 100) if first_mean != 0 else 0
            
            print(f"\n📈 Trend: {years[0]} → {years[-1]}")
            if change > 0:
                print(f"   ↑ Increased by {change:.2f} ({pct_change:.1f}%)")
            elif change < 0:
                print(f"   ↓ Decreased by {abs(change):.2f} ({abs(pct_change):.1f}%)")
            else:
                print(f"   → No change")
            print()

# Create the visualizations
print("="*80)
print("📊 VISUAL ANALYSIS OF YEAR-WISE PERFORMANCE CHANGES")
print("="*80)
print("\nEach plot shows the same data in different ways:")
print("• Box Plot: Shows the median (middle line), quartiles (box), and outliers")
print("• Violin Plot: Shows how data is distributed (wider = more students at that score)")
print("• Mean Plot: Shows average trend over years with variability")
print("\nLet's examine each variable:\n")

create_year_comparison_plots(df, key_variables)

📊 VISUAL ANALYSIS OF YEAR-WISE PERFORMANCE CHANGES

Each plot shows the same data in different ways:
• Box Plot: Shows the median (middle line), quartiles (box), and outliers
• Violin Plot: Shows how data is distributed (wider = more students at that score)
• Mean Plot: Shows average trend over years with variability

Let's examine each variable:




📊 How many tests students attempted
Variable: test_completion_rate
------------------------------------------------------------
 Year  Count Mean Median Std Dev  Min  Max
 2021     45 0.60   0.70    0.30 0.00 1.00
 2022     49 0.82   0.90    0.24 0.10 1.00
 2023     53 0.88   0.90    0.14 0.40 1.00

📈 Trend: 2021 → 2023
   ↑ Increased by 0.28 (46.4%)




📊 Average score on attempted tests
Variable: avg_success_rate_per_test
------------------------------------------------------------
 Year  Count Mean Median Std Dev  Min  Max
 2021     45 0.38   0.38    0.19 0.00 0.83
 2022     49 0.65   0.67    0.22 0.00 0.93
 2023     53 0.66   0.70    0.16 0.28 0.90

📈 Trend: 2021 → 2023
   ↑ Increased by 0.28 (72.7%)




📊 Final course grade (1-5 scale)
Variable: final_grade
------------------------------------------------------------
 Year  Count Mean Median Std Dev  Min  Max
 2021     45 1.78   1.00    0.97 1.00 4.00
 2022     49 2.24   2.00    1.23 1.00 5.00
 2023     51 2.31   3.00    1.10 1.00 4.00

📈 Trend: 2021 → 2023
   ↑ Increased by 0.54 (30.1%)




📊 Total points earned (0-100 scale)
Variable: fc_total_points
------------------------------------------------------------
 Year  Count Mean Median Std Dev  Min  Max
 2021     45 2.48   2.66    1.62 0.00 6.67
 2022     49 5.66   6.34    2.56 0.00 9.33
 2023     53 5.95   6.01    1.94 1.66 9.01

📈 Trend: 2021 → 2023
   ↑ Increased by 3.47 (139.9%)




📊 Percentage of maximum points
Variable: percentage_points
------------------------------------------------------------
 Year  Count Mean Median Std Dev  Min  Max
 2021     45 0.25   0.27    0.16 0.00 0.67
 2022     49 0.57   0.63    0.26 0.00 0.93
 2023     53 0.60   0.60    0.19 0.17 0.90

📈 Trend: 2021 → 2023
   ↑ Increased by 0.35 (139.9%)




📊 Presentation score (max 10)
Variable: presentation_points
------------------------------------------------------------
 Year  Count Mean Median Std Dev  Min   Max
 2021     45 7.29   7.00    1.74 2.00 10.00
 2022     48 7.08   7.00    1.25 5.00 10.00
 2023     52 6.83   7.00    2.19 2.00 10.00

📈 Trend: 2021 → 2023
   ↓ Decreased by 0.46 (6.3%)




📊 Defense score (max 30)
Variable: defense_points
------------------------------------------------------------
 Year  Count  Mean Median Std Dev   Min   Max
 2021     45 23.38  24.00    4.94 14.00 31.00
 2022     49 22.37  25.00    7.80  0.00 30.00
 2023     46 21.17  22.00    3.87 12.00 27.00

📈 Trend: 2021 → 2023
   ↓ Decreased by 2.20 (9.4%)




📊 Midterm exam score (max 25)
Variable: exam_k2
------------------------------------------------------------
 Year  Count  Mean Median Std Dev  Min   Max
 2021     43 10.98  11.00    4.86 3.00 19.00
 2022     49 11.45  11.00    4.26 4.00 20.00
 2023     53 12.45  13.00    5.38 0.00 23.00

📈 Trend: 2021 → 2023
   ↑ Increased by 1.48 (13.4%)




📊 Final exam score (max 25)
Variable: exam_k3
------------------------------------------------------------
 Year  Count  Mean Median Std Dev  Min   Max
 2021     40 12.28  13.00    4.67 3.00 20.00
 2022     43 14.72  15.00    4.35 5.00 23.00
 2023     44 14.16  16.00    5.66 0.00 21.00

📈 Trend: 2021 → 2023
   ↑ Increased by 1.88 (15.3%)



## 4. Statistical Testing: Are the Differences Real?

Now let's run statistical tests to determine if the differences we see are statistically significant.
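
Alongside the p-values, the next cell reports eta-squared as an effect size: the between-group sum of squares divided by the total sum of squares. The sketch below works through that formula on a tiny made-up dataset so the arithmetic can be checked by hand. `eta_squared` and `toy_groups` are hypothetical names used only for this illustration.

```python
import numpy as np

def eta_squared(groups):
    """Share of total variance explained by group membership (SS_between / SS_total)."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    all_values = np.concatenate(groups)
    grand_mean = all_values.mean()
    # Between-group sum of squares: group size times squared distance of group mean from grand mean
    ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
    # Total sum of squares: squared distance of every value from the grand mean
    ss_total = ((all_values - grand_mean) ** 2).sum()
    return ss_between / ss_total if ss_total > 0 else 0.0

# Made-up scores for three groups (not the course data)
toy_groups = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
eta = eta_squared(toy_groups)
print(f"Eta-squared: {eta:.2f}")  # SS_between = 6, SS_total = 12 -> 0.50
```

Here the group means are 2, 3, and 4 around a grand mean of 3, so half of the total variation comes from which group a score belongs to.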

In [5]:
def perform_statistical_tests(df, variable_name, variable_description):
    """Perform ANOVA and Kruskal-Wallis tests with detailed explanations"""
    
    print(f"\n{'='*80}")
    print(f"📊 ANALYZING: {variable_description}")
    print(f"Variable: {variable_name}")
    print(f"{'='*80}")
    
    # Get data for each year
    years = sorted(df['academic_year'].unique())
    year_groups = []
    
    print("\n📈 Data Overview:")
    for year in years:
        year_data = df[df['academic_year'] == year][variable_name].dropna()
        year_groups.append(year_data)
        print(f"  {year}: n={len(year_data)}, mean={year_data.mean():.3f}, std={year_data.std():.3f}")
    
    # Check if we have enough data
    if any(len(group) < 3 for group in year_groups):
        print("\n⚠️ Insufficient data for statistical testing (need at least 3 observations per year)")
        return None
    
    results = {}
    
    # 1. NORMALITY TESTING
    print("\n🔬 STEP 1: Testing for Normal Distribution")
    print("(Checking if data follows a bell curve)\n")
    
    is_normal = True
    for year, data in zip(years, year_groups):
        if len(data) >= 8:  # Shapiro-Wilk needs only n >= 3; requiring 8 is a stricter choice made here
            stat, p_value = stats.shapiro(data)
            is_normal_year = p_value > 0.05
            print(f"  {year}: {'✓ Normal' if is_normal_year else '✗ Not Normal'} (p={p_value:.4f})")
            if not is_normal_year:
                is_normal = False
        else:
            print(f"  {year}: Too few samples for normality test")
            is_normal = False
    
    # 2. ANOVA TEST
    print("\n🔍 STEP 2: ANOVA Test (Parametric)")
    print("Tests if the AVERAGE values differ between years\n")
    
    f_stat, anova_p = f_oneway(*year_groups)
    results['anova'] = {'statistic': f_stat, 'p_value': anova_p}
    
    print(f"  F-statistic: {f_stat:.3f}")
    print(f"  P-value: {anova_p:.4f}")
    
    if anova_p < 0.05:
        print("  🎯 Result: SIGNIFICANT difference in averages between years!")
        print("     → The changes over years are statistically meaningful")
    else:
        print("  ℹ️ Result: No significant difference in averages")
        print("     → Changes might be due to random variation")
    
    # 3. KRUSKAL-WALLIS TEST
    print("\n🔍 STEP 3: Kruskal-Wallis Test (Non-parametric)")
    print("Tests if the DISTRIBUTIONS differ between years")
    print("(More robust when data isn't normally distributed)\n")
    
    h_stat, kw_p = kruskal(*year_groups)
    results['kruskal'] = {'statistic': h_stat, 'p_value': kw_p}
    
    print(f"  H-statistic: {h_stat:.3f}")
    print(f"  P-value: {kw_p:.4f}")
    
    if kw_p < 0.05:
        print("  🎯 Result: SIGNIFICANT difference in distributions between years!")
        print("     → The changes over years are statistically meaningful")
    else:
        print("  ℹ️ Result: No significant difference in distributions")
        print("     → Changes might be due to random variation")
    
    # 4. WHICH TEST TO TRUST?
    print("\n💡 WHICH TEST SHOULD WE TRUST?")
    if is_normal:
        print("  ✓ Data is normally distributed → ANOVA is more appropriate")
        primary_test = 'ANOVA'
        primary_p = anova_p
    else:
        print("  ✗ Data is NOT normally distributed → Kruskal-Wallis is more appropriate")
        primary_test = 'Kruskal-Wallis'
        primary_p = kw_p
    
    # 5. EFFECT SIZE
    print("\n📏 EFFECT SIZE (How big is the difference?)")
    
    # Calculate eta-squared for ANOVA
    all_data = np.concatenate(year_groups)
    grand_mean = all_data.mean()
    ss_between = sum(len(group) * (group.mean() - grand_mean)**2 for group in year_groups)
    ss_total = sum((val - grand_mean)**2 for val in all_data)
    eta_squared = ss_between / ss_total if ss_total > 0 else 0
    
    print(f"  Eta-squared: {eta_squared:.3f}")
    
    if eta_squared < 0.01:
        effect_interpretation = "Negligible effect"
    elif eta_squared < 0.06:
        effect_interpretation = "Small effect"
    elif eta_squared < 0.14:
        effect_interpretation = "Medium effect"
    else:
        effect_interpretation = "Large effect"
    
    print(f"  Interpretation: {effect_interpretation}")
    print(f"  → {eta_squared*100:.1f}% of variance is explained by year differences")
    
    # 6. FINAL VERDICT
    print("\n🎯 FINAL VERDICT:")
    if primary_p < 0.05:
        print(f"  ✅ Based on {primary_test} (p={primary_p:.4f}):")
        print(f"     There IS a statistically significant change over years!")
        print(f"     Effect size: {effect_interpretation}")
        
        # Calculate year-to-year changes
        means = [group.mean() for group in year_groups]
        for i in range(len(years)-1):
            change = means[i+1] - means[i]
            pct_change = (change / means[i] * 100) if means[i] != 0 else 0
            print(f"     {years[i]} → {years[i+1]}: {'↑' if change > 0 else '↓'} {abs(change):.3f} ({abs(pct_change):.1f}%)")
    else:
        print(f"  ℹ️ Based on {primary_test} (p={primary_p:.4f}):")
        print(f"     NO statistically significant change over years")
        print(f"     The observed differences are likely due to random variation")
    
    return results

# Run tests for all variables
all_results = {}
for category, variables in key_variables.items():
    print(f"\n\n{'#'*80}")
    print(f"# {category.upper()}")
    print(f"{'#'*80}")
    
    for var_name, description in variables:
        if var_name in df.columns:
            results = perform_statistical_tests(df, var_name, description)
            if results:
                all_results[var_name] = results



################################################################################
# STUDENT ENGAGEMENT
################################################################################

📊 ANALYZING: How many tests students attempted
Variable: test_completion_rate

📈 Data Overview:
  2021: n=45, mean=0.604, std=0.304
  2022: n=49, mean=0.824, std=0.245
  2023: n=53, mean=0.885, std=0.135

🔬 STEP 1: Testing for Normal Distribution
(Checking if data follows a bell curve)

  2021: ✗ Not Normal (p=0.0029)
  2022: ✗ Not Normal (p=0.0000)
  2023: ✗ Not Normal (p=0.0000)

🔍 STEP 2: ANOVA Test (Parametric)
Tests if the AVERAGE values differ between years

  F-statistic: 18.844
  P-value: 0.0000
  🎯 Result: SIGNIFICANT difference in averages between years!
     → The changes over years are statistically meaningful

🔍 STEP 3: Kruskal-Wallis Test (Non-parametric)
Tests if the DISTRIBUTIONS differ between years
(More robust when data isn't normally distributed)

  H-statis

## 5. Post-Hoc Analysis: Which Years are Different?

When we find significant differences, we need to determine WHICH specific years differ from each other.
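
The pairwise comparisons in the next cell run three tests per variable, and each extra test raises the chance of a false positive. One common remedy, which this notebook does not apply, is the Holm correction. The sketch below implements it by hand. `holm_adjust` is a hypothetical helper, and the example p-values are the Mann-Whitney results for `defense_points` reported in the output further down.

```python
import numpy as np

def holm_adjust(p_values):
    """Holm step-down adjustment for a family of p-values."""
    p = np.asarray(p_values, dtype=float)
    m = len(p)
    order = np.argsort(p)  # indices from smallest to largest p-value
    adjusted = np.empty(m)
    running_max = 0.0
    for rank, idx in enumerate(order):
        # Multiply the k-th smallest p-value by (m - k + 1), keep results monotone, cap at 1
        running_max = max(running_max, (m - rank) * p[idx])
        adjusted[idx] = min(running_max, 1.0)
    return adjusted

# Mann-Whitney p-values for defense_points, copied from this notebook's output
pairs = ["2021 vs 2022", "2021 vs 2023", "2022 vs 2023"]
raw_p = [0.8882, 0.0308, 0.0049]

adjusted_p = holm_adjust(raw_p)
for pair, raw, adj in zip(pairs, raw_p, adjusted_p):
    verdict = "significant" if adj < 0.05 else "not significant"
    print(f"{pair}: raw p = {raw:.4f}, Holm-adjusted p = {adj:.4f} ({verdict})")
```

With this adjustment, only the 2022 vs 2023 comparison stays below 0.05 (adjusted p of about 0.015), while the 2021 vs 2023 comparison rises to about 0.062.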

In [6]:
def perform_pairwise_comparisons(df, variable_name, variable_description):
    """Perform pairwise comparisons between years"""
    
    print(f"\n{'='*80}")
    print(f"🔍 PAIRWISE COMPARISONS: {variable_description}")
    print(f"{'='*80}")
    
    years = sorted(df['academic_year'].unique())
    
    # Create comparison matrix
    comparison_results = []
    
    for i, year1 in enumerate(years):
        for year2 in years[i+1:]:
            data1 = df[df['academic_year'] == year1][variable_name].dropna()
            data2 = df[df['academic_year'] == year2][variable_name].dropna()
            
            if len(data1) < 3 or len(data2) < 3:
                continue
            
            # Perform Student's t-test (ttest_ind assumes equal variances unless equal_var=False)
            t_stat, t_p = ttest_ind(data1, data2)
            
            # Perform Mann-Whitney U test
            u_stat, u_p = mannwhitneyu(data1, data2, alternative='two-sided')
            
            # Calculate effect size (Cohen's d)
            pooled_std = np.sqrt(((len(data1)-1)*data1.std()**2 + (len(data2)-1)*data2.std()**2) / (len(data1)+len(data2)-2))
            cohens_d = (data1.mean() - data2.mean()) / pooled_std if pooled_std > 0 else 0
            
            comparison_results.append({
                'Year 1': year1,
                'Year 2': year2,
                'Mean 1': data1.mean(),
                'Mean 2': data2.mean(),
                'Difference': data2.mean() - data1.mean(),
                'T-test p': t_p,
                'Mann-Whitney p': u_p,
                "Cohen's d": abs(cohens_d),
                # Note: flagged when EITHER test is below 0.05, with no multiple-comparison correction
                'Significant': 'Yes' if min(t_p, u_p) < 0.05 else 'No'
            })
    
    if comparison_results:
        results_df = pd.DataFrame(comparison_results)
        results_df = results_df.round(4)
        
        print("\n📊 Comparison Results:")
        print(results_df.to_string(index=False))
        
        # Visual representation
        fig = go.Figure()
        
        for _, row in results_df.iterrows():
            color = 'green' if row['Significant'] == 'Yes' else 'gray'
            symbol = 'star' if row['Significant'] == 'Yes' else 'circle'
            
            fig.add_trace(go.Scatter(
                x=[row['Year 1'], row['Year 2']],
                y=[row['Mean 1'], row['Mean 2']],
                mode='lines+markers',
                line=dict(color=color, width=2),
                marker=dict(size=10, symbol=symbol),
                name=f"{row['Year 1']} vs {row['Year 2']}",
                text=[f"Mean: {row['Mean 1']:.3f}", f"Mean: {row['Mean 2']:.3f}"],
                hovertemplate='%{text}'
            ))
        
        fig.update_layout(
            title=f"Year-to-Year Changes: {variable_description}",
            xaxis_title="Academic Year",
            yaxis_title="Mean Value",
            hovermode='closest',
            showlegend=True
        )
        
        fig.show()
        
        # Interpretation
        print("\n💡 INTERPRETATION:")
        significant_pairs = results_df[results_df['Significant'] == 'Yes']
        
        if len(significant_pairs) > 0:
            print("  Significant differences found between:")
            for _, row in significant_pairs.iterrows():
                direction = "increased" if row['Difference'] > 0 else "decreased"
                print(f"  • {row['Year 1']} → {row['Year 2']}: {direction} by {abs(row['Difference']):.3f}")
                
                # Interpret Cohen's d
                d = row["Cohen's d"]
                if d < 0.2:
                    effect = "negligible"
                elif d < 0.5:
                    effect = "small"
                elif d < 0.8:
                    effect = "medium"
                else:
                    effect = "large"
                print(f"    Effect size: {effect} (d={d:.3f})")
        else:
            print("  No significant pairwise differences found.")
            print("  Although overall test showed differences, individual year pairs aren't significantly different.")
            print("  This can happen when the overall trend is significant but individual steps are small.")

# Run pairwise comparisons for variables that showed significant differences
print("\n🎯 DETAILED YEAR-TO-YEAR COMPARISONS")
print("="*60)
print("We'll now examine which specific years differ from each other\n")

for category, variables in key_variables.items():
    for var_name, description in variables:
        if var_name in df.columns and var_name in all_results:
            # Check if either test was significant
            anova_p = all_results[var_name]['anova']['p_value']
            kw_p = all_results[var_name]['kruskal']['p_value']
            
            if min(anova_p, kw_p) < 0.05:
                perform_pairwise_comparisons(df, var_name, description)


🎯 DETAILED YEAR-TO-YEAR COMPARISONS
We'll now examine which specific years differ from each other


🔍 PAIRWISE COMPARISONS: How many tests students attempted

📊 Comparison Results:
 Year 1  Year 2  Mean 1  Mean 2  Difference  T-test p  Mann-Whitney p  Cohen's d Significant
   2021    2022  0.6044  0.8245      0.2200    0.0002          0.0001     0.8006         Yes
   2021    2023  0.6044  0.8849      0.2805    0.0000          0.0000     1.2256         Yes
   2022    2023  0.8245  0.8849      0.0604    0.1219          0.6439     0.3092          No



💡 INTERPRETATION:
  Significant differences found between:
  • 2021 → 2022: increased by 0.220
    Effect size: large (d=0.801)
  • 2021 → 2023: increased by 0.281
    Effect size: large (d=1.226)

🔍 PAIRWISE COMPARISONS: Average score on attempted tests

📊 Comparison Results:
 Year 1  Year 2  Mean 1  Mean 2  Difference  T-test p  Mann-Whitney p  Cohen's d Significant
   2021    2022  0.3834  0.6479      0.2645    0.0000          0.0000     1.2603         Yes
   2021    2023  0.3834  0.6622      0.2788    0.0000          0.0000     1.5804         Yes
   2022    2023  0.6479  0.6622      0.0143    0.7099          0.8591     0.0739          No



💡 INTERPRETATION:
  Significant differences found between:
  • 2021 → 2022: increased by 0.265
    Effect size: large (d=1.260)
  • 2021 → 2023: increased by 0.279
    Effect size: large (d=1.580)

🔍 PAIRWISE COMPARISONS: Final course grade (1-5 scale)

📊 Comparison Results:
 Year 1  Year 2  Mean 1  Mean 2  Difference  T-test p  Mann-Whitney p  Cohen's d Significant
   2021    2022  1.7778  2.2449      0.4671    0.0458          0.0704     0.4181         Yes
   2021    2023  1.7778  2.3137      0.5359    0.0139          0.0168     0.5126         Yes
   2022    2023  2.2449  2.3137      0.0688    0.7692          0.6790     0.0589          No



💡 INTERPRETATION:
  Significant differences found between:
  • 2021 → 2022: increased by 0.467
    Effect size: small (d=0.418)
  • 2021 → 2023: increased by 0.536
    Effect size: medium (d=0.513)

🔍 PAIRWISE COMPARISONS: Total points earned (0-100 scale)

📊 Comparison Results:
 Year 1  Year 2  Mean 1  Mean 2  Difference  T-test p  Mann-Whitney p  Cohen's d Significant
   2021    2022  2.4813  5.6627      3.1813    0.0000          0.0000     1.4727         Yes
   2021    2023  2.4813  5.9538      3.4724    0.0000          0.0000     1.9294         Yes
   2022    2023  5.6627  5.9538      0.2911    0.5166          0.8198     0.1290          No



💡 INTERPRETATION:
  Significant differences found between:
  • 2021 → 2022: increased by 3.181
    Effect size: large (d=1.473)
  • 2021 → 2023: increased by 3.472
    Effect size: large (d=1.929)

🔍 PAIRWISE COMPARISONS: Percentage of maximum points

📊 Comparison Results:
 Year 1  Year 2  Mean 1  Mean 2  Difference  T-test p  Mann-Whitney p  Cohen's d Significant
   2021    2022  0.2481  0.5663      0.3181    0.0000          0.0000     1.4727         Yes
   2021    2023  0.2481  0.5954      0.3472    0.0000          0.0000     1.9294         Yes
   2022    2023  0.5663  0.5954      0.0291    0.5166          0.8198     0.1290          No



💡 INTERPRETATION:
  Significant differences found between:
  • 2021 → 2022: increased by 0.318
    Effect size: large (d=1.473)
  • 2021 → 2023: increased by 0.347
    Effect size: large (d=1.929)

🔍 PAIRWISE COMPARISONS: Defense score (max 30)

📊 Comparison Results:
 Year 1  Year 2  Mean 1  Mean 2  Difference  T-test p  Mann-Whitney p  Cohen's d Significant
   2021    2022 23.3778 22.3673     -1.0104    0.4594          0.8882     0.1534          No
   2021    2023 23.3778 21.1739     -2.2039    0.0198          0.0308     0.4977         Yes
   2022    2023 22.3673 21.1739     -1.1934    0.3520          0.0049     0.1920         Yes



💡 INTERPRETATION:
  Significant differences found between:
  • 2021 → 2023: decreased by 2.204
    Effect size: small (d=0.498)
  • 2022 → 2023: decreased by 1.193
    Effect size: negligible (d=0.192)

🔍 PAIRWISE COMPARISONS: Final exam score (max 25)

📊 Comparison Results:
 Year 1  Year 2  Mean 1  Mean 2  Difference  T-test p  Mann-Whitney p  Cohen's d Significant
   2021    2022 12.2750 14.7209      2.4459    0.0156          0.0248     0.5425         Yes
   2021    2023 12.2750 14.1591      1.8841    0.1018          0.0244     0.3615         Yes
   2022    2023 14.7209 14.1591     -0.5618    0.6055          0.8782     0.1112          No



💡 INTERPRETATION:
  Significant differences found between:
  • 2021 → 2022: increased by 2.446
    Effect size: medium (d=0.542)
  • 2021 → 2023: increased by 1.884
    Effect size: small (d=0.361)
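
The Cohen's d column above can be read as "difference in means, measured in standard deviations". The following is a minimal sketch of that calculation, assuming the common pooled-standard-deviation form; the notebook's own helper is not shown in this section and may differ. The function name `cohens_d` and the sample scores are hypothetical and exist only for illustration.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Cohen's d for two independent samples, using the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    # statistics.variance is the sample variance (n - 1 in the denominator)
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    # Positive d means group_b has the higher mean
    return (mean(group_b) - mean(group_a)) / sqrt(pooled_var)


# Made-up scores for illustration only (NOT the course data)
earlier_year = [0.50, 0.60, 0.70, 0.60]
later_year = [0.80, 0.90, 0.85, 0.85]

d = cohens_d(earlier_year, later_year)
print(f"Cohen's d = {d:.3f}")
```

The tables above appear to report the magnitude of d only (the defense score decreased but its d is positive), so taking `abs(d)` would match that convention.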


## 6. Summary Dashboard: What Did We Learn?

Let's create a comprehensive summary of all our findings.

In [7]:
# Create summary dashboard
def create_summary_dashboard(df, all_results, key_variables):
    """Create a comprehensive summary of all statistical findings"""
    
    print("\n" + "="*80)
    print("📊 COMPREHENSIVE SUMMARY OF YEAR-WISE CHANGES (2021-2023)")
    print("="*80)
    
    # Collect summary data
    summary_data = []
    
    for category, variables in key_variables.items():
        for var_name, description in variables:
            if var_name in df.columns and var_name in all_results:
                anova_p = all_results[var_name]['anova']['p_value']
                kw_p = all_results[var_name]['kruskal']['p_value']
                
                # Calculate overall change
                years = sorted(df['academic_year'].unique())
                first_year_mean = df[df['academic_year'] == years[0]][var_name].mean()
                last_year_mean = df[df['academic_year'] == years[-1]][var_name].mean()
                overall_change = last_year_mean - first_year_mean
                pct_change = (overall_change / first_year_mean * 100) if first_year_mean != 0 else 0
                
                summary_data.append({
                    'Category': category,
                    'Variable': var_name,
                    'Description': description,
                    'ANOVA p': anova_p,
                    'K-W p': kw_p,
                    'Significant': 'Yes' if min(anova_p, kw_p) < 0.05 else 'No',
                    f'{years[0]} Mean': first_year_mean,
                    f'{years[-1]} Mean': last_year_mean,
                    'Change': overall_change,
                    '% Change': pct_change
                })
    
    if summary_data:
        summary_df = pd.DataFrame(summary_data)
        
        # Create individual plots (not subplots to avoid the error)
        
        # 1. Pie chart of significant vs non-significant
        sig_counts = summary_df['Significant'].value_counts()
        
        fig_pie = go.Figure(data=[
            go.Pie(
                labels=['Significant Changes', 'Non-Significant Changes'],
                values=[sig_counts.get('Yes', 0), sig_counts.get('No', 0)],
                marker_colors=['#2ECC71', '#95A5A6'],
                textinfo='label+percent',
                hole=0.3,  # Make it a donut chart for better visibility
                hovertemplate='<b>%{label}</b><br>Count: %{value}<br>Percentage: %{percent}<extra></extra>'
            )
        ])
        
        fig_pie.update_layout(
            title="<b>Statistical Significance Overview</b><br><sub>How many variables showed real changes?</sub>",
            height=400,
            font=dict(size=12)
        )
        
        fig_pie.show()
        
        # 2. Bar chart of percentage changes
        # Sort by absolute percentage change for better visibility
        summary_df_sorted = summary_df.sort_values('% Change', key=abs, ascending=True)
        
        fig_bar = go.Figure(data=[
            go.Bar(
                x=summary_df_sorted['% Change'],
                y=summary_df_sorted['Variable'],
                orientation='h',
                marker_color=['#2ECC71' if sig == 'Yes' else '#95A5A6' 
                             for sig in summary_df_sorted['Significant']],
                text=[f"{pct:.1f}%" for pct in summary_df_sorted['% Change']],
                textposition='outside',
                hovertemplate='<b>%{y}</b><br>Change: %{x:.2f}%<extra></extra>'
            )
        ])
        
        fig_bar.update_layout(
            title="<b>Magnitude of Changes (2021 → 2023)</b><br><sub>Green = Statistically Significant | Gray = Not Significant</sub>",
            xaxis_title="Percentage Change (%)",
            yaxis_title="",
            height=400,
            showlegend=False,
            font=dict(size=11),
            xaxis=dict(zeroline=True, zerolinewidth=2, zerolinecolor='black')
        )
        
        fig_bar.show()
        
        # 3. Scatter plot of p-values with significance zones
        fig_scatter = go.Figure()
        
        # Add significance threshold zones
        fig_scatter.add_shape(
            type="rect",
            x0=0, y0=0, x1=0.05, y1=0.05,
            fillcolor="lightgreen",
            opacity=0.2,
            layer="below",
            line_width=0
        )
        
        fig_scatter.add_shape(
            type="rect",
            x0=0.05, y0=0, x1=1, y1=0.05,
            fillcolor="lightyellow",
            opacity=0.2,
            layer="below",
            line_width=0
        )
        
        fig_scatter.add_shape(
            type="rect",
            x0=0, y0=0.05, x1=0.05, y1=1,
            fillcolor="lightyellow",
            opacity=0.2,
            layer="below",
            line_width=0
        )
        
        fig_scatter.add_shape(
            type="rect",
            x0=0.05, y0=0.05, x1=1, y1=1,
            fillcolor="lightcoral",
            opacity=0.2,
            layer="below",
            line_width=0
        )
        
        # Add the data points
        fig_scatter.add_trace(go.Scatter(
            x=summary_df['ANOVA p'],
            y=summary_df['K-W p'],
            mode='markers+text',
            marker=dict(
                size=12,
                color=['#2ECC71' if sig == 'Yes' else '#E74C3C' 
                      for sig in summary_df['Significant']],
                line=dict(color='white', width=1)
            ),
            text=summary_df['Variable'],
            textposition='top center',
            textfont=dict(size=10),
            hovertemplate='<b>%{text}</b><br>ANOVA p: %{x:.4f}<br>Kruskal-Wallis p: %{y:.4f}<extra></extra>'
        ))
        
        # Add reference lines
        fig_scatter.add_shape(
            type="line",
            x0=0.05, y0=0, x1=0.05, y1=1,
            line=dict(color="red", width=2, dash="dash")
        )
        
        fig_scatter.add_shape(
            type="line",
            x0=0, y0=0.05, x1=1, y1=0.05,
            line=dict(color="red", width=2, dash="dash")
        )
        
        # Add annotations for zones
        fig_scatter.add_annotation(
            x=0.025, y=0.025,
            text="Both Tests<br>Significant",
            showarrow=False,
            font=dict(color="darkgreen", size=10),
            bgcolor="white",
            opacity=0.8
        )
        
        fig_scatter.add_annotation(
            x=0.5, y=0.5,
            text="Neither Test<br>Significant",
            showarrow=False,
            font=dict(color="darkred", size=10),
            bgcolor="white",
            opacity=0.8
        )
        
        fig_scatter.update_layout(
            title="<b>Statistical Test P-values</b><br><sub>Points in green zone = significant changes detected</sub>",
            xaxis_title="ANOVA p-value",
            yaxis_title="Kruskal-Wallis p-value",
            height=500,
            showlegend=False,
            xaxis=dict(range=[0, 1], dtick=0.1),
            yaxis=dict(range=[0, 1], dtick=0.1),
            font=dict(size=11)
        )
        
        fig_scatter.show()
        
        # 4. Effect size visualization
        # Calculate effect sizes
        effect_sizes = []
        for _, row in summary_df.iterrows():
            if row['Significant'] == 'Yes':
                # Rough magnitude bands based on percentage change (a heuristic, not Cohen's d)
                if abs(row['% Change']) < 5:
                    effect = 'Small'
                elif abs(row['% Change']) < 15:
                    effect = 'Medium'
                else:
                    effect = 'Large'
                effect_sizes.append({
                    'Variable': row['Variable'],
                    'Effect': effect,
                    'Change': row['% Change']
                })
        
        if effect_sizes:
            effect_df = pd.DataFrame(effect_sizes)
            
            # Create effect size chart
            fig_effect = go.Figure()
            
            for effect_type in ['Small', 'Medium', 'Large']:
                effect_data = effect_df[effect_df['Effect'] == effect_type]
                if len(effect_data) > 0:
                    color_map = {'Small': '#FFA500', 'Medium': '#FF6B6B', 'Large': '#FF0000'}
                    fig_effect.add_trace(go.Bar(
                        x=effect_data['Variable'],
                        y=effect_data['Change'],
                        name=f'{effect_type} Effect',
                        marker_color=color_map[effect_type],
                        hovertemplate='<b>%{x}</b><br>Change: %{y:.1f}%<br>Effect: ' + effect_type + '<extra></extra>'
                    ))
            
            fig_effect.update_layout(
                title="<b>Effect Sizes for Significant Changes</b><br><sub>Larger effects = more meaningful changes</sub>",
                xaxis_title="Variable",
                yaxis_title="Percentage Change (%)",
                barmode='group',
                height=400,
                showlegend=True,
                font=dict(size=11)
            )
            
            fig_effect.show()
        
        # Print detailed summary
        print("\n📋 DETAILED RESULTS TABLE:")
        print("="*60)
        
        for category in summary_df['Category'].unique():
            cat_data = summary_df[summary_df['Category'] == category]
            print(f"\n{category}:")
            for _, row in cat_data.iterrows():
                sig_symbol = "✅" if row['Significant'] == 'Yes' else "❌"
                direction = "↑" if row['Change'] > 0 else "↓"
                print(f"  {sig_symbol} {row['Variable']}: {direction} {abs(row['% Change']):.1f}% (p={min(row['ANOVA p'], row['K-W p']):.4f})")
        
        # Key findings
        print("\n🎯 KEY FINDINGS:")
        print("="*60)
        
        significant_vars = summary_df[summary_df['Significant'] == 'Yes']
        
        if len(significant_vars) > 0:
            print(f"\n✅ Variables showing SIGNIFICANT changes over years:")
            for _, row in significant_vars.iterrows():
                direction = "improved" if row['Change'] > 0 else "decreased"
                print(f"  • {row['Description']}: {direction} by {abs(row['% Change']):.1f}%")
            
            # Find the biggest improvements
            if len(significant_vars[significant_vars['Change'] > 0]) > 0:
                biggest_improvement = significant_vars[significant_vars['Change'] > 0].nlargest(1, '% Change').iloc[0]
                print(f"\n🏆 Biggest improvement: {biggest_improvement['Description']} (+{biggest_improvement['% Change']:.1f}%)")
            
            # Find any concerning decreases
            decreases = significant_vars[significant_vars['Change'] < 0]
            if len(decreases) > 0:
                print(f"\n⚠️ Areas of concern (significant decreases):")
                for _, row in decreases.iterrows():
                    print(f"  • {row['Description']}: decreased by {abs(row['% Change']):.1f}%")
        else:
            print("\n❌ No variables showed statistically significant changes over the years.")
            print("   This suggests that performance has remained relatively stable.")
        
        # Non-significant but trending
        non_sig = summary_df[summary_df['Significant'] == 'No']
        trending = non_sig[abs(non_sig['% Change']) > 10]  # More than 10% change but not significant
        
        if len(trending) > 0:
            print(f"\n📈 Variables showing trends (>10% change) but NOT statistically significant:")
            print("   (These might become significant with more data)")
            for _, row in trending.iterrows():
                direction = "increased" if row['Change'] > 0 else "decreased"
                print(f"  • {row['Description']}: {direction} by {abs(row['% Change']):.1f}% (p={min(row['ANOVA p'], row['K-W p']):.3f})")
        
        # Create a summary interpretation box
        print("\n" + "="*60)
        print("💡 LAYMAN'S INTERPRETATION:")
        print("="*60)
        
        total_vars = len(summary_df)
        sig_count = len(significant_vars)
        sig_pct = (sig_count / total_vars * 100) if total_vars > 0 else 0
        
        print(f"""
Out of {total_vars} variables analyzed:
• {sig_count} ({sig_pct:.0f}%) showed statistically significant changes
• {total_vars - sig_count} ({100-sig_pct:.0f}%) showed no significant changes

What does this mean?
→ Variables with ✅ show changes that are unlikely to be due to chance alone
→ Variables with ❌ might have changed, but we can't rule out random variation
→ Green areas in plots = change is statistically significant (p < 0.05)
→ Gray/Red areas in plots = change could be due to chance

Bottom line: The Flipped Classroom implementation shows {"measurable impact" if sig_count > total_vars/3 else "limited measurable impact"} 
on student performance metrics over the three-year period.
        """)

# Create the summary dashboard
create_summary_dashboard(df, all_results, key_variables)


📊 COMPREHENSIVE SUMMARY OF YEAR-WISE CHANGES (2021-2023)



📋 DETAILED RESULTS TABLE:

Student Engagement:
  ✅ test_completion_rate: ↑ 46.4% (p=0.0000)
  ✅ avg_success_rate_per_test: ↑ 72.7% (p=0.0000)

Academic Performance:
  ✅ final_grade: ↑ 30.1% (p=0.0434)
  ✅ fc_total_points: ↑ 139.9% (p=0.0000)
  ✅ percentage_points: ↑ 139.9% (p=0.0000)

Assessment Components:
  ❌ presentation_points: ↓ 6.3% (p=0.4438)
  ✅ defense_points: ↓ 9.4% (p=0.0148)
  ❌ exam_k2: ↑ 13.4% (p=0.3134)
  ✅ exam_k3: ↑ 15.3% (p=0.0350)

🎯 KEY FINDINGS:

✅ Variables showing SIGNIFICANT changes over years:
  • How many tests students attempted: improved by 46.4%
  • Average score on attempted tests: improved by 72.7%
  • Final course grade (1-5 scale): improved by 30.1%
  • Total points earned (0-100 scale): improved by 139.9%
  • Percentage of maximum points: improved by 139.9%
  • Defense score (max 30): decreased by 9.4%
  • Final exam score (max 25): improved by 15.3%

🏆 Biggest improvement: Total points earn

## 7. Conclusions and Recommendations

### 🎯 What We've Learned:

Based on our statistical analysis of student performance data from 2021-2023, we can draw evidence-based conclusions about the effectiveness of the Flipped Classroom methodology.

### 📊 Statistical Evidence Interpretation:

1. **Significant Changes (p < 0.05)**: Differences this large would be unlikely if there were no real change between years
2. **Non-significant Changes (p ≥ 0.05)**: Could be random variation, or the sample may be too small to detect a real change
3. **Effect Sizes**: Tell us if changes are practically meaningful, not just statistically detectable
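
Because nine variables are tested at once, some p-values just under 0.05 could be false positives. One common safeguard is the Holm-Bonferroni adjustment, sketched below under stated assumptions: the helper `holm_adjust` is hypothetical (it is not part of the notebook), the inputs are the p-values printed in the detailed results table, and values printed as `0.0000` are entered as `0.00005`, the largest value that rounds to that display.

```python
def holm_adjust(p_values):
    """Return Holm-Bonferroni adjusted p-values, in the original order."""
    m = len(p_values)
    # Indices of the p-values from smallest to largest
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # Multiply the k-th smallest p-value by (m - k), keep the sequence
        # non-decreasing, and cap the result at 1
        running_max = max(running_max, (m - rank) * p_values[idx])
        adjusted[idx] = min(1.0, running_max)
    return adjusted


# p-values as printed in the detailed results table.
# ASSUMPTION: "0.0000" is entered as 0.00005 (its rounding bound).
reported = {
    "test_completion_rate": 0.00005,
    "avg_success_rate_per_test": 0.00005,
    "final_grade": 0.0434,
    "fc_total_points": 0.00005,
    "percentage_points": 0.00005,
    "presentation_points": 0.4438,
    "defense_points": 0.0148,
    "exam_k2": 0.3134,
    "exam_k3": 0.0350,
}

adjusted = dict(zip(reported, holm_adjust(list(reported.values()))))
for name, p_adj in adjusted.items():
    flag = "significant" if p_adj < 0.05 else "not significant"
    print(f"{name}: adjusted p = {p_adj:.4f} ({flag})")
```

With these inputs, the three borderline values (0.0148, 0.0350, 0.0434) rise above 0.05 after adjustment, while the four very small ones stay well below it. The printed p-values are themselves the smaller of the ANOVA and Kruskal-Wallis results, so this sketch is illustrative and does not replace the analysis.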

### 💡 Recommendations:

1. **Continue monitoring** variables showing positive trends
2. **Investigate** any areas showing significant decreases
3. **Collect more data** for variables showing trends but not significance
4. **Focus interventions** on areas with largest effect sizes
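
To act on the last recommendation, the variables can be ranked by effect size. The sketch below uses the Cohen's d values reported for the 2021 → 2023 comparison in the pairwise tables and the conventional cut-offs of 0.2, 0.5 and 0.8. The names `label_effect` and `reported_d` are hypothetical, and the tables report magnitudes only, so the direction of each change (the defense score decreased) has to be read from the means.

```python
def label_effect(d):
    """Map a Cohen's d value to the conventional verbal label."""
    size = abs(d)
    if size < 0.2:
        return "negligible"
    if size < 0.5:
        return "small"
    if size < 0.8:
        return "medium"
    return "large"


# Cohen's d for 2021 -> 2023, copied from the pairwise comparison tables
reported_d = {
    "How many tests students attempted": 1.2256,
    "Average score on attempted tests": 1.5804,
    "Final course grade (1-5 scale)": 0.5126,
    "Total points earned (0-100 scale)": 1.9294,
    "Percentage of maximum points": 1.9294,
    "Defense score (max 30)": 0.4977,
    "Final exam score (max 25)": 0.3615,
}

# Largest effects first
ranked = sorted(reported_d.items(), key=lambda item: abs(item[1]), reverse=True)
for description, d in ranked:
    print(f"{description}: d = {d:.3f} ({label_effect(d)})")
```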

In [9]:
# Final interpretation helper
print("\n" + "="*80)
print("📚 HOW TO INTERPRET THESE RESULTS")
print("="*80)

print("""
🎯 Quick Reference Guide:

1. P-VALUES:
   • p < 0.001: Extremely strong evidence
   • p < 0.01:  Very strong evidence
   • p < 0.05:  Strong evidence ← Standard threshold
   • p ≥ 0.05:  Insufficient evidence (need more data or no real difference)
   Note: a p-value is the chance of seeing data at least this extreme if there
   were no real difference. It is not the confidence that a change is real.

2. EFFECT SIZES (Cohen's d):
   • d < 0.2:  Negligible (too small to matter)
   • d = 0.2-0.5: Small (noticeable but modest)
   • d = 0.5-0.8: Medium (clear practical importance)
   • d > 0.8:  Large (substantial impact)

3. PERCENTAGE CHANGES:
   • < 5%:   Minor change
   • 5-10%:  Moderate change
   • 10-20%: Substantial change
   • > 20%:  Major change

4. WHAT MAKES A FINDING "ACTIONABLE":
   ✅ Statistically significant (p < 0.05)
   ✅ Meaningful effect size (d > 0.2)
   ✅ Consistent direction across multiple measures
   ✅ Aligns with educational goals

5. LIMITATIONS TO REMEMBER:
   ⚠️ Statistical significance ≠ practical importance
   ⚠️ Correlation ≠ causation
   ⚠️ Small sample sizes reduce statistical power
   ⚠️ Multiple comparisons increase false positive risk
""")



📚 HOW TO INTERPRET THESE RESULTS

🎯 Quick Reference Guide:

1. P-VALUES:
   • p < 0.001: Extremely strong evidence
   • p < 0.01:  Very strong evidence
   • p < 0.05:  Strong evidence ← Standard threshold
   • p ≥ 0.05:  Insufficient evidence (need more data or no real difference)
   Note: a p-value is the chance of seeing data at least this extreme if there
   were no real difference. It is not the confidence that a change is real.

2. EFFECT SIZES (Cohen's d):
   • d < 0.2:  Negligible (too small to matter)
   • d = 0.2-0.5: Small (noticeable but modest)
   • d = 0.5-0.8: Medium (clear practical importance)
   • d > 0.8:  Large (substantial impact)

3. PERCENTAGE CHANGES:
   • < 5%:   Minor change
   • 5-10%:  Moderate change
   • 10-20%: Substantial change
   • > 20%:  Major change

4. WHAT MAKES A FINDING "ACTIONABLE":
   ✅ Statistically significant (p < 0.05)
   ✅ Meaningful effect size (d > 0.2)
   ✅ Consistent direction across multiple measures
   ✅ Aligns with educational goals

5. LIMITATIONS TO REMEMBER:
   ⚠️ Statistical significan