# ODR Fundamentals
- Orthogonal Distance Regression vs Least Squares
- When to use ODR, Error-in-variables models
- Real examples: Measurement errors, Calibration

In [None]:
import numpy as np
from scipy import odr
from scipy import stats
import matplotlib.pyplot as plt
print('scipy.odr module loaded')

## Least Squares vs ODR

**Ordinary Least Squares (OLS)**:
- Assumes errors only in Y (dependent variable)
- Minimizes vertical distances
- Standard regression approach
- Fast and simple

**Orthogonal Distance Regression (ODR)**:
- Accounts for errors in both X and Y
- Minimizes perpendicular (orthogonal) distances
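- Closely related to Total Least Squares (the two coincide for a straight line with equal error variances in X and Y)
- Typically less biased than OLS when both variables have errors and their relative sizes are known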
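
The two distance measures can be made concrete for a straight line. The sketch below is illustrative only: `vertical_distance` and `orthogonal_distance` are hypothetical helper names, not part of `scipy.odr`, and the orthogonal version corresponds to the unweighted case where X and Y errors are treated as equally large.

```python
import numpy as np

def vertical_distance(x0, y0, slope, intercept):
    """Distance from (x0, y0) to the line along the Y axis (the residual OLS squares)."""
    return np.abs(y0 - (slope * x0 + intercept))

def orthogonal_distance(x0, y0, slope, intercept):
    """Shortest (perpendicular) distance from (x0, y0) to the line."""
    return np.abs(y0 - (slope * x0 + intercept)) / np.sqrt(1.0 + slope**2)

# Point (0, 1) measured against the line y = x
d_vert = vertical_distance(0.0, 1.0, slope=1.0, intercept=0.0)
d_orth = orthogonal_distance(0.0, 1.0, slope=1.0, intercept=0.0)
print(f'vertical: {d_vert:.4f}, orthogonal: {d_orth:.4f}')
# vertical: 1.0000, orthogonal: 0.7071
```

The steeper the line, the more the two distances differ, which is one reason the two methods can disagree noticeably on the slope.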

**When to use ODR**:

- Measurement errors in both variables
- Calibration problems (comparing instruments)
- Functional relationships (no clear dependent variable)
- Method comparison studies
- Error-in-variables models

In [None]:
print('OLS vs ODR Comparison\n')
print('='*60)

# Generate data with errors in both X and Y
np.random.seed(42)
n = 50
true_x = np.linspace(0, 10, n)
true_y = 2.5 * true_x + 3  # True relationship: y = 2.5x + 3

# Add measurement errors to both variables
x_error = np.random.randn(n) * 0.5
y_error = np.random.randn(n) * 1.0
x_obs = true_x + x_error
y_obs = true_y + y_error

print(f'Sample size: {n} points')
print(f'True relationship: y = 2.5*x + 3')
print(f'X error std dev: 0.5')
print(f'Y error std dev: 1.0\n')

In [None]:
# Ordinary Least Squares (OLS) fit
slope_ols, intercept_ols, r, p, se = stats.linregress(x_obs, y_obs)

print('OLS Fit (assumes no X error):')
print('='*40)
print(f'  Equation: y = {slope_ols:.4f}*x + {intercept_ols:.4f}')
print(f'  R² = {r**2:.4f}')
print(f'  Slope standard error: {se:.4f}')
print(f'\n  Error from true (2.5, 3):')
print(f'    Slope error: {abs(slope_ols - 2.5):.4f}')
print(f'    Intercept error: {abs(intercept_ols - 3):.4f}')

In [None]:
# Orthogonal Distance Regression (ODR) fit
def linear_func(B, x):
    '''Linear model: y = B[0]*x + B[1]'''
    return B[0] * x + B[1]

# Create ODR model
linear_model = odr.Model(linear_func)
# Pass the known error std devs; without sx/sy, ODR weights X and Y errors equally
data = odr.RealData(x_obs, y_obs, sx=0.5, sy=1.0)
odr_obj = odr.ODR(data, linear_model, beta0=[2, 3])  # Initial guess
output = odr_obj.run()

print('\nODR Fit (accounts for X and Y errors):')
print('='*40)
print(f'  Equation: y = {output.beta[0]:.4f}*x + {output.beta[1]:.4f}')
print(f'  Std errors: [{output.sd_beta[0]:.4f}, {output.sd_beta[1]:.4f}]')
print(f'  Residual variance: {output.res_var:.6f}')
print(f'\n  Error from true (2.5, 3):')
print(f'    Slope error: {abs(output.beta[0] - 2.5):.4f}')
print(f'    Intercept error: {abs(output.beta[1] - 3):.4f}')
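# Report which fit landed closer to the true slope for this particular sample
closer = 'ODR' if abs(output.beta[0] - 2.5) < abs(slope_ols - 2.5) else 'OLS'
print(f'\nCloser to the true slope for this sample: {closer}')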

## Real Example: Instrument Calibration

- **Scenario**: Calibrate a new temperature sensor against a reference sensor
- **Problem**: Both sensors have measurement errors
- **Solution**: Use ODR to estimate the calibration relationship

This is a common problem in:  
- Laboratory equipment calibration
- Sensor validation
- Method comparison studies
- Quality control
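
For a straight line with constant, known error standard deviations, the ODR slope has a closed form, often called Deming regression (a name this notebook does not otherwise use). The sketch below cross-checks that formula against `scipy.odr` on synthetic sensor data. The helper `deming_slope`, the seed, and all data values are assumptions made for illustration.

```python
import numpy as np
from scipy import odr

def deming_slope(x, y, sx, sy):
    """Closed-form slope for a line with constant error std devs sx (X) and sy (Y)."""
    delta = (sy / sx) ** 2                  # ratio of Y to X error variances
    s_xx = np.var(x, ddof=1)
    s_yy = np.var(y, ddof=1)
    s_xy = np.cov(x, y, ddof=1)[0, 1]
    diff = s_yy - delta * s_xx
    return (diff + np.sqrt(diff**2 + 4 * delta * s_xy**2)) / (2 * s_xy)

# Synthetic readings from two noisy sensors (illustrative values)
rng = np.random.default_rng(0)
truth = np.linspace(20, 70, 11)
ref = truth + rng.normal(0, 0.3, truth.size)
new = 1.02 * truth + 0.5 + rng.normal(0, 0.5, truth.size)

def line(B, x):
    """Straight line: y = B[0]*x + B[1]"""
    return B[0] * x + B[1]

fit = odr.ODR(odr.RealData(ref, new, sx=0.3, sy=0.5),
              odr.Model(line), beta0=[1.0, 0.0]).run()
slope_closed = deming_slope(ref, new, sx=0.3, sy=0.5)
print(f'ODR slope: {fit.beta[0]:.5f}, closed form: {slope_closed:.5f}')
```

The two slopes should agree to within the optimizer's tolerance. The closed form only covers the straight-line, constant-error case; the iterative fit is still needed for per-point errors or nonlinear models.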

In [None]:
print('\nInstrument Calibration Example')
print('='*60)

np.random.seed(42)
# True temperature values (unknown in practice)
true_temp = np.array([20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70])

# Reference sensor readings (±0.3°C random error)
reference = true_temp + np.random.randn(len(true_temp)) * 0.3

# New sensor readings (±0.5°C error + slight systematic bias)
# True relationship: new = 1.02 * true + 0.5
new_sensor = 1.02 * true_temp + 0.5 + np.random.randn(len(true_temp)) * 0.5

print(f'Calibration setup:')
print(f'  Number of test points: {len(true_temp)}')
print(f'  Temperature range: {true_temp.min()}°C to {true_temp.max()}°C')
print(f'  Reference sensor error: ±0.3°C (random)')
print(f'  New sensor error: ±0.5°C (random)')
print(f'  New sensor bias: Scale=1.02, Offset=0.5°C\n')

In [None]:
# Calibration using ODR
def calib_func(B, x):
    '''Calibration model: new = B[0] * ref + B[1]'''
    return B[0] * x + B[1]

model = odr.Model(calib_func)
# Specify errors in both measurements
data = odr.RealData(reference, new_sensor, sx=0.3, sy=0.5)
odr_obj = odr.ODR(data, model, beta0=[1, 0])
result = odr_obj.run()

slope, intercept = result.beta
slope_err, intercept_err = result.sd_beta

print('Calibration Results:')
print('='*40)
print(f'Calibration equation:')
print(f'  New = {slope:.5f} * Ref + {intercept:.4f}')
print(f'\nParameter uncertainties:')
print(f'  Slope: ±{slope_err:.5f}')
print(f'  Intercept: ±{intercept_err:.4f}')
print(f'\nTrue values (for comparison):')
print(f'  Slope: 1.02000')
print(f'  Intercept: 0.5000')
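# Check agreement rather than assume it: are the true values within 2 standard errors?
slope_ok = abs(slope - 1.02) <= 2 * slope_err
intercept_ok = abs(intercept - 0.5) <= 2 * intercept_err
print(f'\nTrue values within 2 standard errors: slope={slope_ok}, intercept={intercept_ok}')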

In [None]:
# How to use the calibration
print('\nApplying Calibration:')
print('='*40)
print(f'Correction formula for new sensor readings:')
print(f'  Corrected = (New - {intercept:.4f}) / {slope:.5f}')
print()
print('Example:')
raw_reading = 50.5
corrected = (raw_reading - intercept) / slope
print(f'  Raw new sensor reading: {raw_reading}°C')
print(f'  Corrected value: {corrected:.2f}°C')
print()
print('This calibration accounts for both:')
print('  1. Scale factor (slope)')
print('  2. Offset bias (intercept)')

## Error Weighting in ODR

- **Purpose**: Account for varying measurement uncertainties
- **Syntax**: `sx` and `sy` arguments of `RealData`, giving the standard deviations of the X and Y measurements (a scalar, or one value per point)

**Use cases**:
- Heteroscedastic errors (error varies with measurement)
- Different precision at different points
- Known measurement uncertainties
- Weighted regression
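
The SciPy documentation describes `sx` and `sy` as standard deviations that `RealData` converts to weights by dividing 1.0 by their squares. The sketch below checks that reading by fitting the same synthetic data twice: once through `RealData` with standard deviations, and once through `odr.Data` with explicit weights `wd` and `we`. The data, seed, and function name `line` are illustrative assumptions.

```python
import numpy as np
from scipy import odr

rng = np.random.default_rng(0)
x_true = np.linspace(0, 10, 30)
sx = np.full_like(x_true, 0.2)          # constant X uncertainty
sy = 0.5 + 0.1 * x_true                 # Y uncertainty grows with x
x = x_true + rng.normal(0, sx)
y = 2 * x_true + 1 + rng.normal(0, sy)

def line(B, x):
    """Straight line: y = B[0]*x + B[1]"""
    return B[0] * x + B[1]

model = odr.Model(line)

# Same fit expressed two ways: standard deviations vs explicit weights
fit_sd = odr.ODR(odr.RealData(x, y, sx=sx, sy=sy), model, beta0=[1.0, 0.0]).run()
fit_w = odr.ODR(odr.Data(x, y, wd=1.0 / sx**2, we=1.0 / sy**2), model, beta0=[1.0, 0.0]).run()

print('RealData (sx, sy):', fit_sd.beta)
print('Data (wd, we):    ', fit_w.beta)
```

Because the weights are inverse variances, halving a point's standard deviation quadruples its influence on the fit.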

In [None]:
print('\nError Weighting Example')
print('='*60)

# Generate data with increasing error (heteroscedastic)
np.random.seed(42)
x = np.linspace(0, 10, 20)
y_true = 2 * x + 1

# Error increases with x (common in real measurements)
error_magnitude = 0.5 + 0.1 * x
y = y_true + np.random.randn(20) * error_magnitude

print('Data characteristics:')
print(f'  Sample size: {len(x)}')
print(f'  Error type: Heteroscedastic (increasing with x)')
print(f'  Error range: {error_magnitude.min():.2f} to {error_magnitude.max():.2f}')
print(f'  True relationship: y = 2*x + 1\n')

In [None]:
# Unweighted ODR (no sx/sy given: all points, and both axes, get equal weight)
data_unweighted = odr.RealData(x, y)
model = odr.Model(linear_func)
odr_unweighted = odr.ODR(data_unweighted, model, beta0=[2, 1])
result_unweighted = odr_unweighted.run()

print('Unweighted ODR (ignores varying errors):')
print('='*40)
print(f'  Slope: {result_unweighted.beta[0]:.4f}')
print(f'  Intercept: {result_unweighted.beta[1]:.4f}')
print(f'  Residual variance: {result_unweighted.res_var:.4f}')

In [None]:
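# Weighted ODR (accounts for varying Y errors)
# Note: x is exact in this simulation, but sx is left unset, so ODR still allows
# corrections to x with default unit weight. Calling odr_weighted.set_job(fit_type=2)
# before run() would give an ordinary weighted least-squares fit instead.
data_weighted = odr.RealData(x, y, sy=error_magnitude)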
odr_weighted = odr.ODR(data_weighted, model, beta0=[2, 1])
result_weighted = odr_weighted.run()

print('\nWeighted ODR (uses known errors):')
print('='*40)
print(f'  Slope: {result_weighted.beta[0]:.4f}')
print(f'  Intercept: {result_weighted.beta[1]:.4f}')
print(f'  Residual variance: {result_weighted.res_var:.4f}')
print()
print('Effect of weighting:')
print('  • More accurate points get higher weight')
print('  • Less accurate points get lower weight')
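print('  • Usually improves parameter estimates when the errors are known')
# Compare the two fits on this sample instead of asserting a winner
w_err = abs(result_weighted.beta[0] - 2)
u_err = abs(result_unweighted.beta[0] - 2)
print(f'  Slope error: weighted {w_err:.4f} vs unweighted {u_err:.4f}')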

## Comparison: When to Use Each Method

### Use OLS (Ordinary Least Squares) when:
- Only Y has measurement error
- X is controlled/exact (designed experiment)
- Speed is critical
- Simple analysis needed

### Use ODR (Orthogonal Distance Regression) when:
- Both X and Y have measurement errors
- Calibration problems
- Method comparison studies
- No clear dependent variable
- Slope bias caused by X errors (attenuation) must be avoided

### Key Differences:

| Aspect | OLS | ODR |
|--------|-----|-----|
| Error model | Y only | X and Y |
| Distance minimized | Vertical | Perpendicular |
| Bias when X has error | Yes (slope pulled toward zero) | Reduced, if the error ratio is specified correctly |
| Speed | Faster | Slower |
| Complexity | Simple | Moderate |
| Use case | Regression | Calibration |
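
The OLS bias under X errors can be seen in a small simulation. For the classical errors-in-variables setting, the OLS slope is expected to shrink by roughly var(X_true) / (var(X_true) + var(X_error)). The sketch below is a minimal check under assumed values: the sample size, noise levels, and seed are illustrative and not taken from this notebook's examples.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20000
true_slope, true_intercept = 2.5, 3.0
sigma_x = 2.0                           # std dev of the X measurement error (assumed)

x_true = rng.uniform(0, 10, n)
x_obs = x_true + rng.normal(0, sigma_x, n)
y_obs = true_slope * x_true + true_intercept + rng.normal(0, 1.0, n)

# OLS of y on the noisy x (np.polyfit returns the highest power first)
slope_ols, intercept_ols = np.polyfit(x_obs, y_obs, 1)

# Expected attenuation: fraction of observed X variance that is real signal
reliability = np.var(x_true) / (np.var(x_true) + sigma_x**2)
expected_slope = true_slope * reliability

print(f'True slope:              {true_slope:.3f}')
print(f'OLS slope on noisy X:    {slope_ols:.3f}')
print(f'Attenuation prediction:  {expected_slope:.3f}')
```

Collecting more data does not remove this bias; it only makes the biased estimate more precise.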

In [None]:
print('\nMethod Comparison Summary')
print('='*60)
print()
print('OLS (linregress):')
print('  Pros: Fast, simple, well-understood')
print('  Cons: Biased if X has errors')
print('  Use: Standard regression, Y-only errors\n')
print('ODR (scipy.odr):')
print('  Pros: Handles X and Y errors, avoids attenuation bias')
print('  Cons: Slower, more complex')
print('  Use: Calibration, method comparison\n')
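print('Rule of thumb (compare X error to the spread of X, not to Y error):')
print('  OLS slope shrinks by about var(X_true) / (var(X_true) + var(X_error))')
print('  If X error std dev < ~0.1 * std dev of X → OLS bias is about 1% or less')
print('  Otherwise, or when calibrating two noisy instruments → consider ODR')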

## Summary

### When to Use ODR:
✓ Both variables have measurement errors  
✓ Calibration problems  
✓ Functional relationships  
✓ No clear dependent variable  
✓ Method comparison studies  

### Basic ODR Workflow:
```python
from scipy import odr

# 1. Define model function
def model_func(B, x):
    return B[0] * x + B[1]  # B = parameters

# 2. Create model and data objects
model = odr.Model(model_func)
data = odr.RealData(x, y, sx=x_err, sy=y_err)

# 3. Create and run ODR
odr_obj = odr.ODR(data, model, beta0=[guess1, guess2])
result = odr_obj.run()

# 4. Extract results
params = result.beta        # Fitted parameters
errors = result.sd_beta      # Standard errors
covariance = result.cov_beta # Covariance matrix
```
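
Before reading off parameters, it can be worth confirming that the fit converged. The sketch below inspects the `info` and `stopreason` attributes of the output object on synthetic data. It assumes the ODRPACK convention that `info` values 1 to 3 indicate convergence; the data and seed are illustrative.

```python
import numpy as np
from scipy import odr

def model_func(B, x):
    """Straight line: y = B[0]*x + B[1]"""
    return B[0] * x + B[1]

# Synthetic data with errors in both variables (illustrative values)
rng = np.random.default_rng(2)
grid = np.linspace(0, 10, 25)
x = grid + rng.normal(0, 0.2, grid.size)
y = 2.0 * grid + 1.0 + rng.normal(0, 0.5, grid.size)

data = odr.RealData(x, y, sx=0.2, sy=0.5)
result = odr.ODR(data, odr.Model(model_func), beta0=[1.0, 0.0]).run()

# Assumption: info in {1, 2, 3} means sum-of-squares and/or parameter convergence
converged = result.info in (1, 2, 3)
print('Stop reason:', result.stopreason)
print('Converged:  ', converged)
print('Parameters: ', result.beta)
```

`result.pprint()` prints a fuller report, including the parameter standard errors and the reason for halting.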

### Advantages of ODR:
- Accounts for errors in all variables  
- Avoids the attenuation bias of OLS when the error sizes are specified correctly
- Better for calibration problems  
- Statistically rigorous  
- Handles weighted regression  

### OLS vs ODR Trade-offs:
- **OLS**: Faster, simpler, good when X is error-free
- **ODR**: More accurate when X has errors, reduces to OLS when sx→0
- **Bias**: OLS pulls the slope toward zero (attenuation) when X has errors
- **Applications**: OLS for regression, ODR for calibration