# 🦐 Exuviae Scale Impact Analysis

## ⚖️ Scaling Conversion Effect Assessment Using Known-Length Molted Shells

---

### 🎯 **Research Objective**

This notebook quantifies **how the scaling conversion (pixels to millimetres) changes measurement error**. It uses exuviae (molted prawn shells) of known length as reference objects and compares the effect **across different pond environments**.

### 🏊‍♀️ **Pond Type Classification**

- **Circle Pond**: Images containing 'GX010191' in filename
- **Rectangular Pond**: All other images
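
The classification rule above reduces to a substring test on the image name. A minimal sketch follows; `classify_pond` is a hypothetical helper name, while the `'GX010191'` marker comes from this notebook:

```python
def classify_pond(image_name: str) -> str:
    """Return 'Circle' if the name contains the GX010191 marker, else 'Rectangular'."""
    return "Circle" if "GX010191" in str(image_name) else "Rectangular"


print(classify_pond("undistorted_GX010191_37_1242"))  # Circle
print(classify_pond("undistorted_GX010193_11_1065"))  # Rectangular
```

Wrapping the input in `str()` keeps the rule safe for missing values, which then fall into the Rectangular group.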

### 📊 **Core Scientific Formula**

$$\rho = \frac{\Delta_{mm\%}}{\Delta_{px\%}}$$

**🔍 CRITICAL: ρ measures scaling impact using percentage errors!**

Where:
- **ρ (rho)** = **Scale impact ratio** (how scaling affects measurement changes)
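- **Δ_mm%** = Percentage error of the scaled length (`total_length`, in mm) relative to the known exuviae length (`real_length`)
- **Δ_px%** = Percentage difference between the measured pixel length (`pixels_total_length`) and the reference pixel length (`Length`)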
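
As a worked example of the formula, the sketch below computes ρ for one sample row of this dataset (a small exuvia with a known length of 145 mm). The helper names `pct_diff` and `scale_impact_ratio` are illustrative and are not part of the notebook:

```python
import math


def pct_diff(value: float, reference: float) -> float:
    """Absolute percentage difference of value relative to reference."""
    return abs(value - reference) / reference * 100


def scale_impact_ratio(px_measured, px_reference, mm_scaled, mm_known):
    """rho = delta_mm% / delta_px%; NaN when the pixel difference is zero."""
    delta_px = pct_diff(px_measured, px_reference)
    delta_mm = pct_diff(mm_scaled, mm_known)
    return delta_mm / delta_px if delta_px != 0 else math.nan


# Values from one sample row: 739.4 px measured vs 732.732 px reference,
# 143.0 mm after scaling vs 145 mm known length
rho = scale_impact_ratio(px_measured=739.4, px_reference=732.732,
                         mm_scaled=143.0, mm_known=145)
print(round(rho, 3))  # 1.516
```

Because Δ_px% sits in the denominator, a very small pixel difference produces a very large ρ even when the millimetre error is modest.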

### 🔍 **Interpretation of ρ Values**

| **ρ Value** | **Meaning** | **Scaling Impact** |
|-------------|-------------|------------|
| **ρ ≈ 1** | Neutral scaling | Scaling **preserves percentage error proportions** |
| **ρ > 1** | **Error amplification** | Scaling **magnifies percentage errors** |
| **ρ < 1** | **Error compression** | Scaling **reduces percentage errors** |
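
The table can be turned into a simple classifier. The sketch below assumes the ±0.1 tolerance band around ρ = 1 that the Phase 2 code uses; `classify_rho` is a hypothetical helper name:

```python
def classify_rho(rho: float, tol: float = 0.1) -> str:
    """Label a scale impact ratio using a tolerance band around 1."""
    if abs(rho - 1) < tol:
        return "neutral"        # percentage error roughly preserved
    if rho > 1 + tol:
        return "amplifying"     # scaling magnifies the percentage error
    if rho < 1 - tol:
        return "compressing"    # scaling reduces the percentage error
    return "boundary"           # falls exactly on the edge of the band


for value in (1.0421, 8.7857, 0.1672):
    print(value, classify_rho(value))
```

The strict inequalities mirror the notebook's counting rules, so a value that lands exactly on the edge of the band belongs to none of the three main classes.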

---

### 🔬 **Scientific Rationale**

**Why Exuviae Are Well Suited to Scale Impact Analysis:**

1. **🎯 Known Lengths (Reference Standards)**
   - Small exuviae: **145 mm** (standardized reference)
   - Large exuviae: **180 mm** (standardized reference)
   - Rigid chitinous structure keeps its dimensions, giving a **stable scaling baseline**

2. **⚖️ Scaling Conversion Assessment**
   - Measured under actual scaling conditions
   - Real-world scaling parameter validation
   - Same conversion process as live prawn measurements

3. **🏊‍♀️ Multi-Environment Analysis**
   - **Circle vs Rectangular pond comparison**
   - Different camera distances and scaling factors
   - Environment-specific scaling behavior patterns

---

### 🏊‍♀️ **Experimental Setup**

| **Parameter** | **Circle Pond (GX010191)** | **Rectangular Pond (Others)** |
|---------------|----------------------------|-------------------------------|
| **Camera Distance (Big)** | 660mm | 370mm |
| **Camera Distance (Small)** | 680mm | 390mm |
| **Image Resolution** | 5312 × 2988 pixels | 5312 × 2988 pixels |
| **Horizontal FOV** | 75° | 75° |
| **Vertical FOV** | 46° | 46° |
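
To give a feel for how these parameters set the scale, the sketch below estimates the horizontal size of one pixel with a simple pinhole model. This is an assumption for illustration only: the notebook does not state its conversion formula, and this approximation is not expected to reproduce the `total_length` column exactly. `mm_per_pixel` is a hypothetical helper name.

```python
import math


def mm_per_pixel(distance_mm: float, fov_deg: float, pixels: int) -> float:
    """Size of one pixel on a plane at distance_mm (pinhole model, no lens distortion)."""
    field_width_mm = 2 * distance_mm * math.tan(math.radians(fov_deg) / 2)
    return field_width_mm / pixels


# Horizontal scale at the small-exuviae camera distance in each pond
circle = mm_per_pixel(distance_mm=680, fov_deg=75, pixels=5312)
rect = mm_per_pixel(distance_mm=390, fov_deg=75, pixels=5312)
print(f"Circle: {circle:.4f} mm/px, Rectangular: {rect:.4f} mm/px")
```

Under this model the scale is proportional to camera distance, so each pixel in the Circle pond covers roughly 680/390 ≈ 1.74 times as many millimetres as in the Rectangular pond. A one-pixel annotation error therefore costs more millimetres there.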

---

### 📈 **Analysis Components**

1. **⚖️ Scale Impact Engine**
   - Percentage error calculation for pixels and measurements
   - ρ calculation for each measurement by pond type
   - Scaling amplification/compression detection

2. **📊 Comparative Scaling Analysis**
   - **Circle vs Rectangular pond scaling behavior**
   - Scale impact ratio (ρ) distribution analysis by environment
   - Statistical comparison of scaling patterns

3. **🎯 Reference Standard Validation**
   - Known exuviae lengths vs scaled predictions
   - Environment-specific scaling accuracy assessment
   - Pond-specific statistical scaling characterization

4. **🦐 Size-Class Analysis**
   - Big vs small exuviae scaling behavior
   - Size-dependent scaling effects by pond type
   - Cross-environment size scaling comparison

---

### 🚀 **Expected Outcomes**

- **Quantified scale impact ratios (ρ)** for each pond environment
- **Comparative scaling behavior analysis** between Circle and Rectangular ponds
- **Environment-specific scaling optimization** recommendations
- **Size-class scaling patterns** across different pond geometries

---

*This analysis reveals how pond geometry and camera positioning affect scaling conversion accuracy, enabling environment-specific measurement optimization.*


## 📊 PHASE 1: Data Loading and Pond Classification

This phase loads the exuviae measurement dataset and classifies measurements by pond type based on image names. We establish the foundation for comparative analysis between Circle and Rectangular pond environments.

**Key Operations:**
- Load exuviae measurement data
- Classify pond types: `'GX010191'` → Circle Pond, others → Rectangular Pond  
- Display pond distribution and sample data
- Prepare dataset for scale impact analysis


In [6]:
# 🦐 EXUVIAE SCALE IMPACT ANALYSIS - MULTI-POND ENVIRONMENT
# =========================================================
"""
🎯 **Analysis Objective**: 
Scale Impact Validation using Known-Length Exuviae across Different Pond Types

📐 **Core Formula**: ρ = Δmm% / Δpx% (Scale Impact Ratio)
Where:
- ρ (rho) = Scale impact ratio (percentage error amplification/compression)
- Δmm% = Percentage error in real-world measurements
- Δpx% = Percentage error in pixel measurements

🏊‍♀️ **Pond Classification**:
- Circle Pond: Images containing 'GX010191' in filename
- Rectangular Pond: All other images

🔬 **Scientific Rationale**:
Exuviae provide perfect calibration objects because:
1. Known precise lengths (145mm small, 180mm big)
2. Rigid structure maintains dimensions
3. Natural aquaculture environment context
4. Multiple pond environments for comparative analysis

📊 **Analysis Components**:
- Pond-specific scaling behavior analysis
- Comparative scale impact assessment
- Environment-dependent scaling optimization
- Size-class scaling patterns by pond type
"""

import pandas as pd
import numpy as np
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import math
import warnings
from typing import Dict, List, Tuple, Any, Optional
from sklearn.ensemble import RandomForestRegressor
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from scipy import stats
import seaborn as sns
import matplotlib.pyplot as plt

# Suppress warnings for cleaner output
warnings.filterwarnings('ignore')

print("🚀 Starting Multi-Pond Exuviae Scale Impact Analysis...")
print("📐 Formula: ρ = Δmm% / Δpx% (Scale Impact Validation)")
print("🏊‍♀️ Analyzing Circle vs Rectangular pond scaling behavior")
print("=" * 80)

# 📊 PHASE 1: Data Loading and Pond Classification
# ===============================================

print("📋 Loading exuviae measurement data...")

# Load the processed exuviae dataset
df_exuviae = pd.read_csv('fifty_one and analysis/measurements/exuviae/spreadsheet_files/length_analysis_new_split_shai_exuviae_with_yolo.csv')

print(f"✅ Dataset loaded successfully!")
print(f"📊 Dataset shape: {df_exuviae.shape}")
print(f"🦐 Total exuviae measurements: {len(df_exuviae)}")

# 🏊‍♀️ Pond Type Classification
# =============================

print("\n🏊‍♀️ PHASE 1: Pond Type Classification...")
print("📊 Classifying measurements by pond geometry...")
print("=" * 60)

# Classify pond types based on image names
df_exuviae['pond_type'] = df_exuviae['image_name'].apply(
    lambda x: 'Circle' if 'GX010191' in str(x) else 'Rectangular'
)

# Display pond type distribution
pond_counts = df_exuviae['pond_type'].value_counts()
print(f"🔵 Circle Pond measurements: {pond_counts.get('Circle', 0)}")
print(f"🔲 Rectangular Pond measurements: {pond_counts.get('Rectangular', 0)}")

# Display size distribution by pond type
print(f"\n🦐 Size Distribution by Pond Type:")
size_by_pond = df_exuviae.groupby(['pond_type', 'lobster_size']).size().unstack(fill_value=0)
print(size_by_pond)

# Display sample data with pond classification
print(f"\n📋 Sample Data with Pond Classification:")
sample_cols = ['image_name', 'pond_type', 'lobster_size', 'real_length', 'total_length', 'pixels_total_length', 'Length']
display(df_exuviae[sample_cols].head(10))


🚀 Starting Multi-Pond Exuviae Scale Impact Analysis...
📐 Formula: ρ = Δmm% / Δpx% (Scale Impact Validation)
🏊‍♀️ Analyzing Circle vs Rectangular pond scaling behavior
📋 Loading exuviae measurement data...
✅ Dataset loaded successfully!
📊 Dataset shape: (51, 39)
🦐 Total exuviae measurements: 51

🏊‍♀️ PHASE 1: Pond Type Classification...
📊 Classifying measurements by pond geometry...
🔵 Circle Pond measurements: 39
🔲 Rectangular Pond measurements: 12

🦐 Size Distribution by Pond Type:
lobster_size  big  small
pond_type               
Circle         22     17
Rectangular     3      9

📋 Sample Data with Pond Classification:


| | image_name | pond_type | lobster_size | real_length | total_length | pixels_total_length | Length |
|---|------------|-----------|--------------|-------------|--------------|---------------------|--------|
| 0 | undistorted_GX010191_37_1242 | Circle | small | 145 | 139.2 | 719.0 | 718.51 |
| 1 | undistorted_GX010191_8_309 | Circle | small | 145 | 154.8 | 790.1 | 789.582 |
| 2 | undistorted_GX010191_32_305 | Circle | small | 145 | 154.2 | 786.3 | 789.795 |
| 3 | undistorted_GX010193_11_1065 | Rectangular | big | 180 | 184.3 | 1745.4 | 1740.667 |
| 4 | undistorted_GX010191_100_1250 | Circle | small | 145 | 143.0 | 739.4 | 732.732 |
| 5 | undistorted_GX010193_27_1553 | Rectangular | big | 180 | 186.6 | 1774.1 | 1766.071 |
| 6 | undistorted_GX010194_39_513 | Rectangular | small | 145 | 144.1 | 1300.3 | 1308.391 |
| 7 | undistorted_GX010191_94_1132 | Circle | big | 180 | 156.7 | 832.0 | 822.289 |
| 8 | undistorted_GX010191_35_1167 | Circle | small | 145 | 149.2 | 759.4 | 770.049 |
| 9 | undistorted_GX010191_31_283 | Circle | big | 180 | 174.8 | 931.8 | 945.736 |


## ⚖️ PHASE 2: Scale Impact Ratio Calculation by Pond Type

This phase calculates the scale impact ratio (ρ) using percentage errors and performs statistical analysis focused on **median values** for robust comparison between pond types. We emphasize median statistics as they are less sensitive to outliers.

**Key Operations:**
- Calculate percentage errors: Δpx% and Δmm%
- Compute scale impact ratio: ρ = Δmm% / Δpx%
- **Focus on median ρ values** for robust pond comparison
- Classify scaling behavior by pond type
- Perform comparative analysis between Circle vs Rectangular ponds

**Why Median?**
- More robust to outliers than mean
- Better represents typical scaling behavior
- Provides reliable pond comparison metrics
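
The effect is visible on the ρ values of the first ten rows of this dataset (listed in the Phase 2 sample output). The sketch below uses only the standard library and compares the two statistics:

```python
import statistics

# rho values of the first ten rows of this dataset
rho_sample = [58.653878, 103.020951, 14.337949, 8.785675, 1.515694,
              8.065256, 1.003714, 10.960843, 2.09455, 1.960481]

mean_rho = statistics.mean(rho_sample)
median_rho = statistics.median(rho_sample)
print(f"mean = {mean_rho:.2f}, median = {median_rho:.2f}")  # mean = 21.04, median = 8.43
```

Two rows with near-zero pixel differences (ρ ≈ 59 and ρ ≈ 103) pull the mean far above most of the values, while the median stays near the middle of the sample.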


In [7]:
# ⚖️ PHASE 2: Scale Impact Ratio (ρ) Calculation by Pond Type
# ===========================================================

print("\n⚖️ PHASE 2: Calculating Scale Impact Ratios by Pond Type...")
print("📊 Computing ρ = Δmm% / Δpx% for Circle vs Rectangular ponds...")
print("=" * 80)

# 🔧 Calculate Scale Impact Ratios (ρ) using percentage errors
# ============================================================

# Calculate pixel percentage differences (Δpx%)
df_exuviae['pixel_difference_pct'] = abs(df_exuviae['pixels_total_length'] - df_exuviae['Length']) / df_exuviae['Length'] * 100

# Calculate measurement percentage differences after scaling (Δmm%)
df_exuviae['measurement_difference_pct'] = abs(df_exuviae['total_length'] - df_exuviae['real_length']) / df_exuviae['real_length'] * 100

# Calculate Scale Impact Ratio: ρ = Δmm% / Δpx%
# Handle division by zero cases
df_exuviae['scale_impact_ratio_rho'] = np.where(
    df_exuviae['pixel_difference_pct'] != 0,
    df_exuviae['measurement_difference_pct'] / df_exuviae['pixel_difference_pct'],
    np.nan  # Set to NaN when pixel difference is zero
)

print(f"✅ Scale impact ratios (ρ) calculated for {len(df_exuviae)} exuviae")
print(f"📊 Using percentage errors: ρ = (Δmm%/real_length) / (Δpx%/Length)")

# 📊 Statistical Analysis by Pond Type
# ===================================

print(f"\n📈 STATISTICAL ANALYSIS BY POND TYPE")
print("=" * 60)

# Analyze each pond type separately
pond_stats = {}

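for pond_type in ['Circle', 'Rectangular']:
    # Define the icon up front so the "no valid measurements" branch can use it too
    pond_icon = "🔵" if pond_type == "Circle" else "🔲"
    pond_data = df_exuviae[df_exuviae['pond_type'] == pond_type]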
    rho_values = pond_data['scale_impact_ratio_rho'].dropna()
    valid_measurements = len(rho_values)
    
    if valid_measurements > 0:
        # Calculate statistics
        stats_dict = {
            'count': valid_measurements,
            'mean': rho_values.mean(),
            'median': rho_values.median(),
            'std': rho_values.std(),
            'min': rho_values.min(),
            'max': rho_values.max(),
            'q25': rho_values.quantile(0.25),
            'q75': rho_values.quantile(0.75),
            'iqr': rho_values.quantile(0.75) - rho_values.quantile(0.25)
        }
        
        # Scaling behavior classification
        neutral_scaling = (abs(rho_values - 1) < 0.1).sum()
        amplifying_scaling = (rho_values > 1.1).sum()
        compressing_scaling = (rho_values < 0.9).sum()
        
        stats_dict.update({
            'neutral_count': neutral_scaling,
            'amplifying_count': amplifying_scaling,
            'compressing_count': compressing_scaling,
            'neutral_pct': neutral_scaling / valid_measurements * 100,
            'amplifying_pct': amplifying_scaling / valid_measurements * 100,
            'compressing_pct': compressing_scaling / valid_measurements * 100
        })
        
        pond_stats[pond_type] = stats_dict
        
        # Display statistics focused on median
        pond_icon = "🔵" if pond_type == "Circle" else "🔲"
        print(f"\n{pond_icon} {pond_type.upper()} POND ANALYSIS:")
        print(f"   📊 Valid measurements: {valid_measurements}")
        print(f"   📊 **Median ρ: {stats_dict['median']:.4f}** (primary metric)")
        print(f"   📊 IQR: {stats_dict['iqr']:.4f}")
        print(f"   📊 Range: [{stats_dict['min']:.4f}, {stats_dict['max']:.4f}]")
        
        print(f"\n   ⚖️ SCALING BEHAVIOR:")
        print(f"   🔄 Neutral (|ρ-1| < 0.1): {neutral_scaling} ({stats_dict['neutral_pct']:.1f}%)")
        print(f"   📈 Amplifying (ρ > 1.1): {amplifying_scaling} ({stats_dict['amplifying_pct']:.1f}%)")
        print(f"   📉 Compressing (ρ < 0.9): {compressing_scaling} ({stats_dict['compressing_pct']:.1f}%)")
        
        # Interpretation based on median
        median_rho = stats_dict['median']
        print(f"\n   🔍 INTERPRETATION (based on median ρ):")
        if abs(median_rho - 1) < 0.1:
            print(f"   ✅ NEUTRAL scaling (median ρ ≈ 1): Percentage errors preserved")
        elif median_rho > 1.1:
            print(f"   ⚠️ AMPLIFYING scaling (median ρ > 1): Errors magnified by {median_rho:.1f}x")
        elif median_rho < 0.9:
            print(f"   ⚠️ COMPRESSING scaling (median ρ < 1): Errors reduced to {median_rho:.1f}x")
        else:
            print(f"   🔍 MIXED behavior (median ρ = {median_rho:.3f}): Variable scaling impact")
    else:
        print(f"\n{pond_icon} {pond_type.upper()} POND: No valid measurements")

# 📊 Comparative Analysis
# ======================

if len(pond_stats) == 2:
    circle_stats = pond_stats.get('Circle', {})
    rect_stats = pond_stats.get('Rectangular', {})
    
    print(f"\n🔄 COMPARATIVE ANALYSIS: Circle vs Rectangular")
    print("=" * 60)
    
    if circle_stats and rect_stats:
        median_diff = abs(circle_stats['median'] - rect_stats['median'])
        iqr_diff = abs(circle_stats['iqr'] - rect_stats['iqr'])
        
        print(f"📊 **Median ρ difference: {median_diff:.4f}**")
        print(f"📊 IQR difference: {iqr_diff:.4f}")
        
        # Compare median values
        circle_median = circle_stats['median']
        rect_median = rect_stats['median']
        print(f"📊 Circle median ρ: {circle_median:.4f}")
        print(f"📊 Rectangular median ρ: {rect_median:.4f}")
        
        # Determine which pond has better scaling behavior (closer to 1)
        circle_deviation = abs(circle_median - 1)
        rect_deviation = abs(rect_median - 1)
        
        if circle_deviation < rect_deviation:
            print(f"🏆 Circle pond has more neutral median scaling: {circle_median:.4f} vs {rect_median:.4f}")
        elif rect_deviation < circle_deviation:
            print(f"🏆 Rectangular pond has more neutral median scaling: {rect_median:.4f} vs {circle_median:.4f}")
        else:
            print(f"⚖️ Both ponds show similar median scaling behavior")

# 📋 Display sample calculations by pond type
print(f"\n📋 SAMPLE CALCULATIONS BY POND TYPE:")
print("-" * 60)

sample_cols = ['image_name', 'pond_type', 'lobster_size', 'pixel_difference_pct', 'measurement_difference_pct', 'scale_impact_ratio_rho']
display(df_exuviae[sample_cols].head(15))



⚖️ PHASE 2: Calculating Scale Impact Ratios by Pond Type...
📊 Computing ρ = Δmm% / Δpx% for Circle vs Rectangular ponds...
✅ Scale impact ratios (ρ) calculated for 51 exuviae
📊 Using percentage errors: ρ = (Δmm%/real_length) / (Δpx%/Length)

📈 STATISTICAL ANALYSIS BY POND TYPE

🔵 CIRCLE POND ANALYSIS:
   📊 Valid measurements: 39
   📊 **Median ρ: 1.0421** (primary metric)
   📊 IQR: 1.2675
   📊 Range: [0.0340, 103.0210]

   ⚖️ SCALING BEHAVIOR:
   🔄 Neutral (|ρ-1| < 0.1): 4 (10.3%)
   📈 Amplifying (ρ > 1.1): 18 (46.2%)
   📉 Compressing (ρ < 0.9): 17 (43.6%)

   🔍 INTERPRETATION (based on median ρ):
   ✅ NEUTRAL scaling (median ρ ≈ 1): Percentage errors preserved

🔲 RECTANGULAR POND ANALYSIS:
   📊 Valid measurements: 12
   📊 **Median ρ: 1.0946** (primary metric)
   📊 IQR: 0.8797
   📊 Range: [0.1672, 8.7857]

   ⚖️ SCALING BEHAVIOR:
   🔄 Neutral (|ρ-1| < 0.1): 2 (16.7%)
   📈 Amplifying (ρ > 1.1): 6 (50.0%)
   📉 Compressing (ρ < 0.9): 4 (33.3%)

   🔍 INTERPRETATION (based on median ρ):
   

| | image_name | pond_type | lobster_size | pixel_difference_pct | measurement_difference_pct | scale_impact_ratio_rho |
|---|------------|-----------|--------------|----------------------|----------------------------|------------------------|
| 0 | undistorted_GX010191_37_1242 | Circle | small | 0.068197 | 4.0 | 58.653878 |
| 1 | undistorted_GX010191_8_309 | Circle | small | 0.065604 | 6.758621 | 103.020951 |
| 2 | undistorted_GX010191_32_305 | Circle | small | 0.44252 | 6.344828 | 14.337949 |
| 3 | undistorted_GX010193_11_1065 | Rectangular | big | 0.271907 | 2.388889 | 8.785675 |
| 4 | undistorted_GX010191_100_1250 | Circle | small | 0.910019 | 1.37931 | 1.515694 |
| 5 | undistorted_GX010193_27_1553 | Rectangular | big | 0.454625 | 3.666667 | 8.065256 |
| 6 | undistorted_GX010194_39_513 | Rectangular | small | 0.618393 | 0.62069 | 1.003714 |
| 7 | undistorted_GX010191_94_1132 | Circle | big | 1.180972 | 12.944444 | 10.960843 |
| 8 | undistorted_GX010191_35_1167 | Circle | small | 1.382899 | 2.896552 | 2.09455 |
| 9 | undistorted_GX010191_31_283 | Circle | big | 1.473561 | 2.888889 | 1.960481 |


## 📊 PHASE 3: Multi-Pond Visualization Dashboard

This phase creates comprehensive visualizations focused on **median-based comparisons** between pond types. The dashboard emphasizes robust statistical measures and provides clear visual comparison of scaling behavior across different pond environments.

**Key Visualizations:**
- **Median-focused** statistical comparisons between pond types
- Distribution analysis with emphasis on central tendencies
- Size-class analysis by pond type
- Correlation analysis between measurement errors and scaling impact
- Performance metrics highlighting median scaling behavior

**Dashboard Components:**
1. ρ distribution histograms by pond type
2. Box plots emphasizing median comparisons
3. Size-class scatter plots by pond type
4. Error correlation analysis
5. **Median statistical summary** (primary focus)
6. Scaling performance metrics


In [8]:
# 📊 PHASE 3: Comprehensive Multi-Pond Visualization Dashboard
# ===========================================================

print("\n📊 PHASE 3: Creating Multi-Pond Visualization Dashboard...")
print("🎨 Comprehensive ρ analysis across Circle vs Rectangular ponds...")
print("=" * 80)

# Define consistent colors for pond types
POND_COLORS = {
    'Circle': '#4ECDC4',      # Teal
    'Rectangular': '#FF6B6B'  # Coral Red
}

# 🎨 Create comprehensive 6-panel visualization dashboard
fig = make_subplots(
    rows=3, cols=2,
    subplot_titles=[
        '📊 Scale Impact Ratio (ρ) Distribution by Pond Type',
        '⚖️ Scaling Behavior Comparison', 
        '🦐 ρ Values by Exuviae Size and Pond Type',
        '📈 ρ vs Measurement Error Correlation',
        '🔍 Statistical Summary by Pond Type',
        '🏊‍♀️ Pond-Specific Scaling Performance'
    ],
    specs=[[{"secondary_y": False}, {"secondary_y": False}],
           [{"secondary_y": False}, {"secondary_y": False}],
           [{"secondary_y": False}, {"secondary_y": False}]]
)

# 📊 1. Histogram of ρ values by pond type
for pond_type in ['Circle', 'Rectangular']:
    pond_data = df_exuviae[df_exuviae['pond_type'] == pond_type]
    rho_clean = pond_data['scale_impact_ratio_rho'].dropna()
    
    if len(rho_clean) > 0:
        fig.add_trace(
            go.Histogram(
                x=rho_clean,
                nbinsx=15,
                name=f'{pond_type} Pond',
                marker_color=POND_COLORS[pond_type],
                opacity=0.7,
                hovertemplate=f"<b>{pond_type} Pond</b><br>ρ Range: %{{x}}<br>Count: %{{y}}<extra></extra>"
            ),
            row=1, col=1
        )

# Add reference lines for neutral scaling
fig.add_vline(x=1, line_dash="dash", line_color="gray", opacity=0.7, row=1, col=1)
fig.add_vline(x=1.1, line_dash="dot", line_color="red", opacity=0.5, row=1, col=1)
fig.add_vline(x=0.9, line_dash="dot", line_color="blue", opacity=0.5, row=1, col=1)

# 🏊‍♀️ 2. Box plot comparison by pond type
for pond_type in ['Circle', 'Rectangular']:
    pond_data = df_exuviae[df_exuviae['pond_type'] == pond_type]
    pond_rho = pond_data['scale_impact_ratio_rho'].dropna()
    
    if len(pond_rho) > 0:
        fig.add_trace(
            go.Box(
                y=pond_rho,
                name=f'{pond_type} Pond',
                boxpoints='all',
                jitter=0.3,
                pointpos=-1.8,
                marker_color=POND_COLORS[pond_type],
                hovertemplate=f"<b>{pond_type} Pond</b><br>ρ: %{{y:.3f}}<extra></extra>"
            ),
            row=1, col=2
        )

# 🦐 3. Scatter plot by exuviae size and pond type
for pond_type in ['Circle', 'Rectangular']:
    for size in ['big', 'small']:
        subset = df_exuviae[(df_exuviae['pond_type'] == pond_type) & 
                           (df_exuviae['lobster_size'] == size)]
        
        if len(subset) > 0:
            fig.add_trace(
                go.Scatter(
                    x=[f"{pond_type}-{size.title()}"] * len(subset),
                    y=subset['scale_impact_ratio_rho'],
                    mode='markers',
                    name=f'{pond_type} {size.title()}',
                    marker=dict(
                        size=10,
                        opacity=0.8,
                        color=POND_COLORS[pond_type],
                        symbol='circle' if size == 'big' else 'diamond'
                    ),
                    hovertemplate=f"<b>{pond_type} Pond - {size.title()} Exuviae</b><br>" +
                                 "ρ: %{y:.3f}<br>" +
                                 "Image: %{customdata}<extra></extra>",
                    customdata=subset['image_name']
                ),
                row=2, col=1
            )

# 📈 4. ρ vs measurement error correlation by pond type
for pond_type in ['Circle', 'Rectangular']:
    pond_data = df_exuviae[df_exuviae['pond_type'] == pond_type]
    
    fig.add_trace(
        go.Scatter(
            x=pond_data['measurement_difference_pct'],
            y=pond_data['scale_impact_ratio_rho'],
            mode='markers',
            name=f'{pond_type} Correlation',
            marker=dict(
                size=8,
                opacity=0.7,
                color=POND_COLORS[pond_type]
            ),
            hovertemplate=f"<b>{pond_type} Pond</b><br>" +
                         "Measurement Error %: %{x:.2f}%<br>" +
                         "ρ: %{y:.3f}<br>" +
                         "Image: %{customdata}<extra></extra>",
            customdata=pond_data['image_name']
        ),
        row=2, col=2
    )

# Add reference lines for scaling behavior zones
fig.add_hline(y=1, line_dash="dash", line_color="gray", opacity=0.7, row=2, col=2)
fig.add_hline(y=1.1, line_dash="dot", line_color="red", opacity=0.5, row=2, col=2)
fig.add_hline(y=0.9, line_dash="dot", line_color="blue", opacity=0.5, row=2, col=2)

# 🔍 5. Median-focused statistical summary
if len(pond_stats) == 2:
    pond_types = list(pond_stats.keys())
    
    # Focus on median values only
    median_values = [pond_stats[pond]['median'] for pond in pond_types]
    
    fig.add_trace(
        go.Bar(
            x=pond_types,
            y=median_values,
            name='Median ρ',
            marker_color=[POND_COLORS[pond] for pond in pond_types],
            opacity=0.8,
            text=[f'{val:.3f}' for val in median_values],
            textposition='auto',
            hovertemplate="<b>%{x} Pond</b><br>Median ρ: %{y:.3f}<br>Deviation from neutral (|ρ-1|): %{customdata:.3f}<extra></extra>",
            customdata=[abs(val - 1) for val in median_values]
        ),
        row=3, col=1
    )
    
    # Add reference line at ρ = 1 (neutral scaling)
    fig.add_hline(y=1, line_dash="dash", line_color="gray", opacity=0.7, row=3, col=1)

# 🏊‍♀️ 6. Scaling performance by pond type (neutral scaling percentage)
if len(pond_stats) == 2:
    performance_metrics = ['neutral_pct', 'amplifying_pct', 'compressing_pct']
    metric_names = ['Neutral', 'Amplifying', 'Compressing']
    metric_colors = ['green', 'red', 'blue']
    
    for i, (metric, name, color) in enumerate(zip(performance_metrics, metric_names, metric_colors)):
        values = [pond_stats[pond][metric] for pond in pond_types]
        
        fig.add_trace(
            go.Bar(
                x=pond_types,
                y=values,
                name=f'{name} Scaling %',
                marker_color=color,
                opacity=0.7,
                hovertemplate=f"<b>%{{x}} Pond</b><br>{name} Scaling: %{{y:.1f}}%<extra></extra>"
            ),
            row=3, col=2
        )

# Update layout
fig.update_layout(
    title_text="🏊‍♀️ Multi-Pond Scale Impact Analysis Dashboard",
    title_font_size=20,
    height=1200,
    width=1400,
    showlegend=True
)

# Update axes labels
fig.update_xaxes(title_text="ρ Value", row=1, col=1)
fig.update_yaxes(title_text="Frequency", row=1, col=1)

fig.update_xaxes(title_text="Pond Type", row=1, col=2)
fig.update_yaxes(title_text="ρ Value", row=1, col=2)

fig.update_xaxes(title_text="Pond-Size Category", row=2, col=1)
fig.update_yaxes(title_text="ρ Value", row=2, col=1)

fig.update_xaxes(title_text="Measurement Error (%)", row=2, col=2)
fig.update_yaxes(title_text="Scale Impact Ratio (ρ)", row=2, col=2)

fig.update_xaxes(title_text="Pond Type", row=3, col=1)
fig.update_yaxes(title_text="Median ρ", row=3, col=1)

fig.update_xaxes(title_text="Pond Type", row=3, col=2)
fig.update_yaxes(title_text="Percentage (%)", row=3, col=2)

fig.show()

print(f"\n✅ Multi-pond scale impact analysis complete!")
print(f"🎯 Key insight: Median ρ provides robust comparison of scaling effects across pond geometries")
print(f"🏊‍♀️ Median-based analysis reveals reliable environment-specific scaling behaviors")

# 📊 Summary of median findings
if len(pond_stats) == 2:
    print(f"\n📊 MEDIAN SCALING SUMMARY:")
    print("=" * 50)
    for pond_type in pond_stats:
        median_rho = pond_stats[pond_type]['median']
        deviation = abs(median_rho - 1)
        pond_icon = "🔵" if pond_type == "Circle" else "🔲"
        print(f"{pond_icon} {pond_type} Pond: Median ρ = {median_rho:.4f} (deviation from neutral: {deviation:.4f})")
    
    # Determine best pond based on median
    circle_median = pond_stats.get('Circle', {}).get('median', float('inf'))
    rect_median = pond_stats.get('Rectangular', {}).get('median', float('inf'))
    
    circle_dev = abs(circle_median - 1)
    rect_dev = abs(rect_median - 1)
    
    if circle_dev < rect_dev:
        print(f"\n🏆 CONCLUSION: Circle pond shows more neutral median scaling behavior")
    elif rect_dev < circle_dev:
        print(f"\n🏆 CONCLUSION: Rectangular pond shows more neutral median scaling behavior")
    else:
        print(f"\n⚖️ CONCLUSION: Both ponds show similar median scaling behavior")



📊 PHASE 3: Creating Multi-Pond Visualization Dashboard...
🎨 Comprehensive ρ analysis across Circle vs Rectangular ponds...



✅ Multi-pond scale impact analysis complete!
🎯 Key insight: Median ρ provides robust comparison of scaling effects across pond geometries
🏊‍♀️ Median-based analysis reveals reliable environment-specific scaling behaviors

📊 MEDIAN SCALING SUMMARY:
🔵 Circle Pond: Median ρ = 1.0421 (deviation from neutral: 0.0421)
🔲 Rectangular Pond: Median ρ = 1.0946 (deviation from neutral: 0.0946)

🏆 CONCLUSION: Circle pond shows more neutral median scaling behavior


## 📈 PHASE 4: Pixel Measurement Accuracy Analysis

This phase assesses the accuracy of the pixel-based measurements. It plots the measured pixel length (`pixels_total_length`) against the reference pixel length (`Length`) and colors each point by its relative error in real length (`real_length_rel_diff`). The plot helps identify measurement quality patterns and potential systematic errors in the pixel detection system.

**Key Analysis:**
- **Error Categorization**: Color-coded scatter plot based on relative error thresholds (≤5%, 5-10%, 10-15%, >15%)
- **Perfect Correlation Line**: Diagonal reference line showing ideal 1:1 relationship
- **Comprehensive Hover Information**: Detailed measurement data for each point
- **Quality Assessment**: Visual identification of high vs low accuracy measurements

**Error Thresholds:**
- 🟢 **Green**: Excellent accuracy (≤5% error)
- 🟡 **Yellow**: Good accuracy (5-10% error)  
- 🟠 **Orange**: Moderate accuracy (10-15% error)
- 🔴 **Red**: Poor accuracy (>15% error)
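
The thresholds above map each relative error to one color bucket, with every upper bound inclusive. A minimal sketch of that mapping using the standard library; `error_color` is a hypothetical helper, and the input values are illustrative rather than taken from the dataset:

```python
from bisect import bisect_left

ERROR_THRESHOLDS = [5, 10, 15]   # percent, inclusive upper bound of each bucket
COLORS = ["green", "yellow", "orange", "red"]


def error_color(rel_error_pct: float) -> str:
    """Map a relative error (%) to its bucket color; upper bounds are inclusive."""
    return COLORS[bisect_left(ERROR_THRESHOLDS, rel_error_pct)]


for err in (4.0, 5.0, 7.5, 12.9, 15.1):
    print(f"{err:5.1f}% -> {error_color(err)}")
```

`bisect_left` returns the index of the first threshold that is greater than or equal to the error, so an error of exactly 5% stays in the green bucket, matching the `≤` comparisons in the plotting code.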

**Purpose:**
- Evaluate pixel measurement system performance
- Identify patterns in measurement errors
- Assess correlation between different measurement methods
- Support quality control and system optimization


In [9]:
# 📈 PHASE 4: Pixel Measurement Accuracy Analysis
# ===============================================

print("\n📈 PHASE 4: Analyzing Pixel Measurement Accuracy...")
print("🎯 Creating error-categorized scatter plot for measurement quality assessment...")
print("=" * 80)

# Create a scatter plot using Plotly
fig = go.Figure()

# Define error thresholds and corresponding colors for plotting
error_thresholds = [5, 10, 15]
colors = ['green', 'yellow', 'orange', 'red']

# Iterate over error thresholds to categorize data points by relative error
for i in range(len(error_thresholds) + 1):
    if i == 0:
        # Mask for errors less than or equal to the first threshold
        mask = df_exuviae['real_length_rel_diff'] <= error_thresholds[0]
        label = f'Error ≤ {error_thresholds[0]}%'
    elif i == len(error_thresholds):
        # Mask for errors greater than the last threshold
        mask = df_exuviae['real_length_rel_diff'] > error_thresholds[-1]
        label = f'Error > {error_thresholds[-1]}%'
    else:
        # Mask for errors between two thresholds
        mask = (df_exuviae['real_length_rel_diff'] > error_thresholds[i-1]) & \
               (df_exuviae['real_length_rel_diff'] <= error_thresholds[i])
        label = f'{error_thresholds[i-1]}% < Error ≤ {error_thresholds[i]}%'
    
    # Add a scatter plot trace for each error category
    fig.add_trace(go.Scatter(
        x=df_exuviae[mask]['Length'],
        y=df_exuviae[mask]['pixels_total_length'],
        mode='markers',
        name=label,
        marker=dict(
            color=colors[i],
            size=10,
            opacity=0.6
        ),
        hovertemplate=(
            '<b>Image:</b> %{customdata[0]}<br>' +
            '<b>Pixels Total Length:</b> %{y:.1f}<br>' +
            '<b>Length:</b> %{x:.1f}<br>' +
            '<b>Total Length:</b> %{customdata[1]:.1f}<br>' +
            '<b>Lobster Size:</b> %{customdata[2]}<br>' +
            '<b>Pixel Rel Diff:</b> %{customdata[3]:.1f}%<br>' +
            '<b>Pixel Abs Diff:</b> %{customdata[4]:.1f}<br>' +
            '<b>Real Length Abs Diff:</b> %{customdata[5]:.1f}<br>' +
            '<b>Real Length Rel Diff:</b> %{customdata[6]:.1f}%<br>'
        ),
        customdata=list(zip(
            df_exuviae[mask]['image_name'], 
            df_exuviae[mask]['total_length'],
            df_exuviae[mask]['lobster_size'],
            df_exuviae[mask]['pixel_rel_diff'],
            df_exuviae[mask]['pixel_abs_diff'],
            df_exuviae[mask]['real_length_abs_diff'],
            df_exuviae[mask]['real_length_rel_diff']
        ))
    ))

# Add a diagonal line to represent perfect correlation between Length and Pixels Total Length
min_val = min(df_exuviae['Length'].min(), df_exuviae['pixels_total_length'].min())
max_val = max(df_exuviae['Length'].max(), df_exuviae['pixels_total_length'].max())
fig.add_trace(go.Scatter(
    x=[min_val, max_val],
    y=[min_val, max_val],
    mode='lines',
    name='Perfect correlation',
    line=dict(color='black', dash='dash', width=1),
    opacity=0.5
))

# Update the layout of the figure
fig.update_layout(
    title='🦐 Exuviae Pixel Measurement Accuracy Analysis<br>Colored by Relative Error Thresholds',
    xaxis_title='Length (pixels)',
    yaxis_title='Pixels Total Length',
    showlegend=True,
    width=800,
    height=800,
    xaxis=dict(range=[min_val, max_val]),
    yaxis=dict(range=[min_val, max_val]),
    plot_bgcolor='white'
)

# Add grid lines to the plot for better readability
fig.update_xaxes(showgrid=True, gridwidth=1, gridcolor='lightgray')
fig.update_yaxes(showgrid=True, gridwidth=1, gridcolor='lightgray')

# Display the plot
fig.show()

print(f"\n✅ Pixel measurement accuracy analysis complete!")
print(f"🎯 Error categorization reveals measurement quality patterns")
print(f"📊 Perfect correlation line shows ideal 1:1 relationship reference")



📈 PHASE 4: Analyzing Pixel Measurement Accuracy...
🎯 Creating error-categorized scatter plot for measurement quality assessment...



✅ Pixel measurement accuracy analysis complete!
🎯 Error categorization reveals measurement quality patterns
📊 Perfect correlation line shows ideal 1:1 relationship reference
