# Survey Weight Calibration: A Realistic Example

This example demonstrates how to use fairlex for calibrating survey weights in a realistic scenario. We'll simulate a political opinion survey with typical demographic biases and calibrate against US Census benchmarks.

## Scenario

A polling organization conducted a survey with 200 respondents to gauge public opinion. Like most surveys, the sample has demographic biases:
- Over-representation of older, higher-educated respondents
- Under-representation of Hispanic and younger demographics
- Mild geographic skew (slightly too many Northeast respondents, slightly too few from the South)

We'll use leximin calibration to adjust the weights to match known population demographics from the 2023 US Census.

In [None]:
import numpy as np
import pandas as pd

from fairlex import evaluate_solution, leximin_residual, leximin_weight_fair

# Set random seed for reproducibility
np.random.seed(42)

## Step 1: Create Realistic Survey Data

We'll simulate survey respondents with demographic characteristics that exhibit typical survey biases.

In [None]:
n_respondents = 200

# Generate biased survey sample
# Age groups: 18-29, 30-44, 45-64, 65+
age_groups = np.random.choice(
    ['18-29', '30-44', '45-64', '65+'],
    size=n_respondents,
    p=[0.15, 0.20, 0.35, 0.30]  # Skewed toward older respondents
)

# Gender: Male, Female
gender = np.random.choice(
    ['Male', 'Female'],
    size=n_respondents,
    p=[0.48, 0.52]  # Close to population
)

# Race/Ethnicity
race_ethnicity = np.random.choice(
    ['White_NH', 'Black', 'Hispanic', 'Asian', 'Other'],
    size=n_respondents,
    p=[0.70, 0.11, 0.10, 0.06, 0.03]  # Under-representation of Hispanic, over-representation of White
)

# Education: HS_or_less, Some_college, Bachelor_plus
education = np.random.choice(
    ['HS_or_less', 'Some_college', 'Bachelor_plus'],
    size=n_respondents,
    p=[0.25, 0.30, 0.45]  # Over-representation of college educated
)

# Region: Northeast, Midwest, South, West
region = np.random.choice(
    ['Northeast', 'Midwest', 'South', 'West'],
    size=n_respondents,
    p=[0.20, 0.22, 0.35, 0.23]  # Roughly representative
)

# Create DataFrame
survey_data = pd.DataFrame({
    'age_group': age_groups,
    'gender': gender,
    'race_ethnicity': race_ethnicity,
    'education': education,
    'region': region
})

print("Survey Sample Demographics:")
print("===========================")
for col in survey_data.columns:
    print(f"\n{col.replace('_', ' ').title()}:")
    print(survey_data[col].value_counts(normalize=True).round(3))

## Step 2: Define Population Benchmarks

These targets are approximations based on 2023 US Census data. We treat them as the population distributions that the weighted sample should match.
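Every respondent falls into exactly one category per dimension, so the target proportions within each dimension should sum to 1. If they do not, the margins cannot all be met at once together with the total constraint. The check below is a minimal sketch: `targets_by_dimension` and `check_targets` are hypothetical names introduced for illustration, and the values are copied from the benchmarks used in this example.

```python
import math

# Hypothetical grouping of the benchmark proportions by dimension;
# the values are copied from the targets used in this example.
targets_by_dimension = {
    "age": [0.18, 0.25, 0.32, 0.25],
    "gender": [0.495, 0.505],
    "race_ethnicity": [0.582, 0.120, 0.190, 0.058, 0.050],
    "education": [0.38, 0.28, 0.34],
    "region": [0.17, 0.21, 0.38, 0.24],
}


def check_targets(groups, tol=1e-9):
    """Return the names of dimensions whose proportions do not sum to 1."""
    return [
        name
        for name, props in groups.items()
        if not math.isclose(sum(props), 1.0, abs_tol=tol)
    ]


inconsistent = check_targets(targets_by_dimension)
print("Inconsistent dimensions:", inconsistent)
```

An empty list means every dimension is internally consistent.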

In [None]:
# Population benchmarks from 2023 US Census data
population_targets = {
    # Age distribution (approximate from Census data)
    'age_18_29': 0.18,
    'age_30_44': 0.25,
    'age_45_64': 0.32,
    'age_65_plus': 0.25,

    # Gender distribution
    'male': 0.495,
    'female': 0.505,

    # Race/Ethnicity distribution
    'white_nh': 0.582,
    'black': 0.120,
    'hispanic': 0.190,
    'asian': 0.058,
    'other_race': 0.050,

    # Education distribution (adults 25+, approximate)
    'hs_or_less': 0.38,
    'some_college': 0.28,
    'bachelor_plus': 0.34,

    # Regional distribution
    'northeast': 0.17,
    'midwest': 0.21,
    'south': 0.38,
    'west': 0.24
}

print("Population Targets (2023 US Census):")
print("====================================")
for category, target in population_targets.items():
    print(f"{category.replace('_', ' ').title()}: {target:.1%}")

## Step 3: Construct Membership Matrix

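The membership matrix A defines which respondents belong to each demographic group. Each row represents a demographic category, and each column represents a respondent: entry A[i, j] is 1 if respondent j belongs to category i and 0 otherwise. A final row of ones constrains the weighted total.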
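As a small illustration of this layout, the sketch below builds the matrix for a toy sample of four respondents and a single variable. The names `toy`, `A_toy`, and `w_toy` are made up for this illustration and are not part of the survey data.

```python
import numpy as np
import pandas as pd

# Toy sample of four respondents with one demographic variable.
toy = pd.DataFrame({"gender": ["Male", "Female", "Female", "Male"]})

# One indicator row per category, one column per respondent.
categories = ["Male", "Female"]
A_toy = np.array(
    [(toy["gender"] == c).to_numpy(dtype=float) for c in categories]
)

# A final row of ones constrains the weighted total.
A_toy = np.vstack([A_toy, np.ones(len(toy))])

w_toy = np.ones(len(toy))  # equal weights
print(A_toy)
print(A_toy @ w_toy)  # weighted count for each margin
```

Multiplying the matrix by a weight vector gives the weighted count for each margin, which is the quantity calibration compares against the targets.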

In [None]:
# Create membership matrix A
membership_indicators = []
target_totals = []
margin_labels = []

# Age groups
for age_group, target_prop in zip(['18-29', '30-44', '45-64', '65+'],
                                  [population_targets['age_18_29'], population_targets['age_30_44'],
                                   population_targets['age_45_64'], population_targets['age_65_plus']]):
    indicator = (survey_data['age_group'] == age_group).astype(float)
    membership_indicators.append(indicator)
    target_totals.append(target_prop * n_respondents)
    margin_labels.append(f'Age {age_group}')

# Gender
for gender_val, target_prop in zip(['Male', 'Female'],
                                   [population_targets['male'], population_targets['female']]):
    indicator = (survey_data['gender'] == gender_val).astype(float)
    membership_indicators.append(indicator)
    target_totals.append(target_prop * n_respondents)
    margin_labels.append(f'Gender {gender_val}')

# Race/Ethnicity
race_mapping = {
    'White_NH': population_targets['white_nh'],
    'Black': population_targets['black'],
    'Hispanic': population_targets['hispanic'],
    'Asian': population_targets['asian'],
    'Other': population_targets['other_race']
}
for race_val, target_prop in race_mapping.items():
    indicator = (survey_data['race_ethnicity'] == race_val).astype(float)
    membership_indicators.append(indicator)
    target_totals.append(target_prop * n_respondents)
    margin_labels.append(f'Race {race_val}')

# Education
edu_mapping = {
    'HS_or_less': population_targets['hs_or_less'],
    'Some_college': population_targets['some_college'],
    'Bachelor_plus': population_targets['bachelor_plus']
}
for edu_val, target_prop in edu_mapping.items():
    indicator = (survey_data['education'] == edu_val).astype(float)
    membership_indicators.append(indicator)
    target_totals.append(target_prop * n_respondents)
    margin_labels.append(f'Education {edu_val}')

# Region
region_mapping = {
    'Northeast': population_targets['northeast'],
    'Midwest': population_targets['midwest'],
    'South': population_targets['south'],
    'West': population_targets['west']
}
for region_val, target_prop in region_mapping.items():
    indicator = (survey_data['region'] == region_val).astype(float)
    membership_indicators.append(indicator)
    target_totals.append(target_prop * n_respondents)
    margin_labels.append(f'Region {region_val}')

# Population total constraint
total_indicator = np.ones(n_respondents)
membership_indicators.append(total_indicator)
target_totals.append(n_respondents)
margin_labels.append('Total Population')

# Convert to arrays
A = np.array(membership_indicators, dtype=float)
b = np.array(target_totals, dtype=float)

print(f"Membership matrix shape: {A.shape}")
print(f"Number of demographic margins: {len(margin_labels)}")
print(f"Number of respondents: {n_respondents}")

## Step 4: Set Up Base Weights

We start with equal base weights representing a simple random sample design.

In [None]:
# Base weights (equal weights for simple random sample)
w0 = np.ones(n_respondents)

print(f"Base weights: {n_respondents} equal weights of {w0[0]:.1f}")
print(f"Base weight total: {w0.sum():.1f}")

## Step 5: Analyze Pre-Calibration Bias

Let's examine the demographic bias in our sample before calibration.

In [None]:
# Calculate current sample proportions
current_totals = A @ w0
current_props = current_totals / n_respondents
target_props = b / n_respondents

# Create comparison DataFrame
bias_analysis = pd.DataFrame({
    'Demographic': margin_labels,
    'Sample_%': current_props * 100,
    'Target_%': target_props * 100,
    'Difference': (current_props - target_props) * 100
})

print("Pre-Calibration Demographic Bias:")
print("==================================")
print(bias_analysis.round(1))

# Highlight largest biases
largest_biases = bias_analysis.iloc[:-1].sort_values('Difference', key=abs, ascending=False).head(5)
print("\nLargest Demographic Biases:")
print("===========================")
for _, row in largest_biases.iterrows():
    direction = "over" if row['Difference'] > 0 else "under"
    print(f"{row['Demographic']}: {abs(row['Difference']):.1f}pp {direction}-represented")

## Step 6: Apply Leximin Calibration

We'll apply both calibration methods available in fairlex:
1. **Residual leximin**: Minimizes the worst margin error
2. **Weight-fair leximin**: Balances margin accuracy with weight stability
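
To make "minimizes the worst margin error" concrete, here is a minimal sketch of that min-max step written as a linear program with `scipy.optimize.linprog`. It is not fairlex's implementation. The toy matrix and every name ending in `_lp` are assumptions for illustration, and the weight bounds mirror the `min_ratio` and `max_ratio` arguments used in this example. A full leximin procedure would go on to fix the worst residual and then minimize the next-worst one, which this sketch omits.

```python
import numpy as np
from scipy.optimize import linprog

# Toy problem: 3 respondents, two group margins plus a total.
A_lp = np.array([[1.0, 1.0, 0.0],
                 [0.0, 0.0, 1.0],
                 [1.0, 1.0, 1.0]])
b_lp = np.array([0.2, 2.8, 3.0])
w0_lp = np.ones(3)
min_ratio, max_ratio = 0.2, 5.0

m, n = A_lp.shape
# Decision variables: the n weights followed by epsilon.
c = np.r_[np.zeros(n), 1.0]            # objective: minimize epsilon
ones = np.ones((m, 1))
A_ub = np.block([[A_lp, -ones],        #  (A w - b) <= epsilon
                 [-A_lp, -ones]])      # -(A w - b) <= epsilon
b_ub = np.r_[b_lp, -b_lp]
# Each weight stays within [min_ratio, max_ratio] times its base weight.
bounds = [(min_ratio * wi, max_ratio * wi) for wi in w0_lp] + [(0, None)]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
w_lp, eps_lp = res.x[:n], res.x[n]
print("weights:", w_lp.round(3))
print("worst absolute residual:", round(eps_lp, 3))
```

In this toy problem the first margin cannot be hit exactly: the two respondents in that group each carry a weight of at least 0.2, so their total is at least 0.4 against a target of 0.2. The worst residual therefore stays above zero.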

In [None]:
# Method 1: Residual leximin calibration
result_residual = leximin_residual(
    A, b, w0,
    min_ratio=0.2,  # Allow weights to be as low as 0.2x original
    max_ratio=5.0   # Allow weights to be as high as 5.0x original
)

print("Residual Leximin Results:")
print("========================")
print(f"Optimization status: {result_residual.status} ({result_residual.message})")
print(f"Maximum absolute residual (epsilon): {result_residual.epsilon:.4f}")
print(f"Weight range: [{result_residual.w.min():.3f}, {result_residual.w.max():.3f}]")
print(f"Weight mean: {result_residual.w.mean():.3f}")

# Method 2: Weight-fair leximin calibration
result_weight_fair = leximin_weight_fair(
    A, b, w0,
    min_ratio=0.2,
    max_ratio=5.0,
    slack=0.001  # Allow small additional margin error for better weight stability
)

print("\nWeight-Fair Leximin Results:")
print("============================")
print(f"Optimization status: {result_weight_fair.status} ({result_weight_fair.message})")
print(f"Maximum absolute residual (epsilon): {result_weight_fair.epsilon:.4f}")
print(f"Maximum relative weight change (t): {result_weight_fair.t:.4f}")
print(f"Weight range: [{result_weight_fair.w.min():.3f}, {result_weight_fair.w.max():.3f}]")
print(f"Weight mean: {result_weight_fair.w.mean():.3f}")

## Step 7: Evaluate Calibration Quality

Let's assess how well each method performed using comprehensive diagnostics.
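
Two of the diagnostics used below are effective sample size (ESS) and design effect. The sketch that follows computes them with Kish's standard formulas. It assumes, without confirmation from this example, that the `ESS` and `deff` values returned by `evaluate_solution` follow the same definitions. `kish_ess` and `kish_deff` are hypothetical helper names.

```python
import numpy as np


def kish_ess(w):
    """Kish effective sample size: (sum of w)^2 / sum of w^2."""
    w = np.asarray(w, dtype=float)
    return w.sum() ** 2 / np.sum(w ** 2)


def kish_deff(w):
    """Kish design effect from unequal weighting: n / ESS."""
    return len(w) / kish_ess(w)


equal = np.ones(4)
uneven = np.array([1.0, 1.0, 2.0, 2.0])
print(kish_ess(equal), kish_deff(equal))
print(kish_ess(uneven), kish_deff(uneven))
```

Equal weights give an ESS equal to the sample size and a design effect of 1. The more the weights vary, the lower the ESS and the higher the design effect.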

In [None]:
# Evaluate both methods
metrics_residual = evaluate_solution(A, b, result_residual.w, base_weights=w0)
metrics_weight_fair = evaluate_solution(A, b, result_weight_fair.w, base_weights=w0)

# Create comparison DataFrame
comparison = pd.DataFrame({
    'Metric': list(metrics_residual.keys()),
    'Residual_Method': list(metrics_residual.values()),
    'Weight_Fair_Method': list(metrics_weight_fair.values())
})

print("Calibration Method Comparison:")
print("=============================\n")
print(comparison.round(4))

# Interpret key metrics
print("\n\nKey Insights:")
print("=============")
print(f"• Margin Accuracy: Residual method achieves max error of {metrics_residual['resid_max_abs']:.4f}")
print(f"                   Weight-fair method achieves max error of {metrics_weight_fair['resid_max_abs']:.4f}")
print(f"• Weight Stability: Residual method ESS = {metrics_residual['ESS']:.1f} (design effect = {metrics_residual['deff']:.2f})")
print(f"                    Weight-fair method ESS = {metrics_weight_fair['ESS']:.1f} (design effect = {metrics_weight_fair['deff']:.2f})")
print(f"• Weight Changes: Residual method max change = {metrics_residual['max_rel_dev']:.2%}")
print(f"                  Weight-fair method max change = {metrics_weight_fair['max_rel_dev']:.2%}")

## Step 8: Analyze Post-Calibration Demographics

Let's verify that our calibration successfully corrected the demographic biases.

In [None]:
# Calculate post-calibration demographics for weight-fair method
calibrated_totals = A @ result_weight_fair.w
calibrated_props = calibrated_totals / n_respondents

# Create final comparison
final_comparison = pd.DataFrame({
    'Demographic': margin_labels,
    'Original_%': (A @ w0 / n_respondents) * 100,
    'Target_%': (b / n_respondents) * 100,
    'Calibrated_%': calibrated_props * 100,
    'Final_Error': abs(calibrated_props - target_props) * 100
})

print("Post-Calibration Results (Weight-Fair Method):")
print("===============================================\n")
print(final_comparison.round(2))

# Summary statistics
max_error = final_comparison['Final_Error'].iloc[:-1].max()  # Exclude total row
mean_error = final_comparison['Final_Error'].iloc[:-1].mean()

print("\nCalibration Summary:")
print("===================")
print(f"Maximum demographic error: {max_error:.3f} percentage points")
print(f"Average demographic error: {mean_error:.3f} percentage points")
print(f"All margins calibrated within: ±{max_error:.3f}pp of targets")

## Step 9: Practical Interpretation

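Finally, we look at what these results mean for survey analysis: how the calibrated weights are distributed and how much effective sample size the weighting costs.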
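
Calibrated weights are used downstream to compute weighted estimates. The simulated survey in this example has no outcome variable, so the sketch below uses hypothetical responses (`y_demo`) and stand-in weights (`w_demo`). The standard error uses the common approximation that replaces the sample size with the Kish effective sample size. Treat it as a rough guide rather than a design-based variance estimate.

```python
import numpy as np

# Hypothetical responses: 1 = supports a policy, 0 = does not.
y_demo = np.array([1, 0, 1, 1, 0, 0], dtype=float)
# Stand-in for calibrated weights (not output from fairlex).
w_demo = np.array([0.5, 1.5, 0.8, 1.2, 1.0, 1.0])

p_unweighted = y_demo.mean()
p_weighted = np.average(y_demo, weights=w_demo)

# Approximate standard error using the Kish effective sample size.
ess_demo = w_demo.sum() ** 2 / np.sum(w_demo ** 2)
se_demo = np.sqrt(p_weighted * (1 - p_weighted) / ess_demo)

print(f"Unweighted estimate: {p_unweighted:.3f}")
print(f"Weighted estimate:   {p_weighted:.3f} (approx. SE {se_demo:.3f})")
```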

In [None]:
# Weight distribution analysis
weight_stats = pd.DataFrame({
    'Statistic': ['Min', 'Q25', 'Median', 'Q75', 'Max', 'Mean', 'Std Dev'],
    'Original_Weights': [w0.min(), np.percentile(w0, 25), np.median(w0),
                         np.percentile(w0, 75), w0.max(), w0.mean(), w0.std()],
    'Calibrated_Weights': [result_weight_fair.w.min(), np.percentile(result_weight_fair.w, 25),
                           np.median(result_weight_fair.w), np.percentile(result_weight_fair.w, 75),
                           result_weight_fair.w.max(), result_weight_fair.w.mean(),
                           result_weight_fair.w.std()]
})

print("Weight Distribution Analysis:")
print("============================\n")
print(weight_stats.round(3))

print("\n\nPractical Implications:")
print("=======================")
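# Count margins (excluding the total row) whose error shrank after calibration
pre_error = (final_comparison['Original_%'] - final_comparison['Target_%']).abs()
n_improved = int((final_comparison['Final_Error'] < pre_error).iloc[:-1].sum())
print(f"1. Survey Representativeness: Calibration reduced the error on {n_improved} of {len(final_comparison) - 1} demographic margins")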
print(f"2. Effective Sample Size: Reduced from {n_respondents} to {metrics_weight_fair['ESS']:.0f} due to weighting")
print(f"3. Design Effect: {metrics_weight_fair['deff']:.2f} (variance inflation factor)")
print(f"4. Weight Variability: Coefficient of variation = {result_weight_fair.w.std() / result_weight_fair.w.mean():.3f}")

print("\nMethod Recommendation:")
print("=====================")
if metrics_weight_fair['ESS'] > metrics_residual['ESS']:
    print("✓ Weight-fair method recommended: Better preserves effective sample size")
else:
    print("✓ Residual method recommended: Achieves better margin accuracy")

print(f"\nCalibration achieves population-representative results with {metrics_weight_fair['ESS']:.0f} effective respondents.")

## Summary

This example demonstrated realistic survey weight calibration using fairlex:

**Key Features Demonstrated:**
- Realistic survey biases (age, education, race/ethnicity skews)
- Multiple demographic margins (18 categories + total)
- US Census population benchmarks
- Comparison of residual vs. weight-fair methods
- Comprehensive quality assessment

**Typical Use Cases:**
- Political polling calibration
- Market research weight adjustment
- Social survey representativeness correction
- Post-stratification in complex surveys

**Method Selection Guidelines:**
- **Residual leximin**: Use when margin accuracy is paramount
- **Weight-fair leximin**: Use when both accuracy and weight stability matter
- Consider design effect and effective sample size in your choice

The leximin approach ensures that no single demographic group bears a disproportionate burden in achieving representativeness, making it particularly suitable for surveys with multiple important demographic targets.