Overall Assessment
⛔ Blocker
Static review only: I could not execute the test suite in this sandbox because the available …
Executive Summary
Methodology
Affected methods: weighted least squares (WLS), Taylor Series Linearization (TSL), survey degrees of freedom, and weighted TWFE residualization.
Code Quality
Performance
No material new performance findings beyond the correctness issues above.
Maintainability
Tech Debt
Security
No material security findings in the changed files.
Documentation/Tests
Path to Approval
|
P0-1: Fix double-weighting in vcov: solve_ols() now computes vcov on original-scale data with weights applied once (not on sqrt(w)-transformed data with weights applied again). Fix HC1 meat to use X'diag(w*u²)X instead of (X*w*u)'(X*w*u), which gave w² in the meat.
P0-2: Replace one-shot weighted within-transformation with iterative alternating projections (max_iter=100, tol=1e-8) for correct weighted FWL residualization in TWFE.
P0-3: Add NaN-vcov guard when no stratum contributes variance (all singletons skipped), so the result is NaN rather than se=0 → t=±inf.
P1: FPC validation now checks against PSU count (not obs count), enforces constancy within stratum, and rejects fpc-only designs.
P2: Add survey_metadata to MultiPeriodDiDResults. Replace placeholder R reference tests with exact manual oracle tests.
P3: Fix cluster comparison to use partition equivalence. Add docstrings for survey_design/weights/weight_type parameters.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
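For readers following P0-1, here is a minimal NumPy sketch of the two meat matrices the commit message contrasts. The function names (wls_fit, meat_weight_once, meat_weight_squared) are hypothetical and are not taken from the diff_diff code. The check at the end only shows that the weights-once form reproduces physical row replication for integer weights; it does not by itself settle which form suits other weight types.

```python
import numpy as np

def wls_fit(X, y, w):
    """Weighted least squares via the normal equations; residuals on the original scale."""
    XtWX = X.T @ (w[:, None] * X)
    beta = np.linalg.solve(XtWX, X.T @ (w * y))
    return beta, y - X @ beta

def meat_weight_once(X, u, w):
    """X' diag(w * u^2) X: each weight enters the meat once."""
    return X.T @ ((w * u**2)[:, None] * X)

def meat_weight_squared(X, u, w):
    """(X*w*u)'(X*w*u) = X' diag(w^2 * u^2) X: weights enter squared."""
    scores = X * (w * u)[:, None]
    return scores.T @ scores

rng = np.random.default_rng(0)
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0]) + rng.normal(size=n)
w = rng.integers(1, 4, size=n).astype(float)  # integer weights in {1, 2, 3}

beta, u = wls_fit(X, y, w)

# Reference: replicate each row w_i times and fit without weights.
idx = np.repeat(np.arange(n), w.astype(int))
ones = np.ones(len(idx))
beta_rep, u_rep = wls_fit(X[idx], y[idx], ones)
meat_rep = meat_weight_once(X[idx], u_rep, ones)

same_once = np.allclose(meat_weight_once(X, u, w), meat_rep)
same_squared = np.allclose(meat_weight_squared(X, u, w), meat_rep)
```

Both forms coincide when every weight equals 1, which is why an unweighted test suite would not catch the difference.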
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber)
Head SHA:
Overall Assessment
Static review only: I could not execute …
Executive Summary
Methodology
Affected methods: survey-weighted WLS, Taylor Series Linearization variance, and survey degrees of freedom. The survey references and official implementations treat weighted estimating equations as survey totals and compute design-based/model-robust linearized SEs from those totals, with customary degrees of freedom based on PSUs minus strata. (stata.com)
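As a rough illustration of the approach described above (score totals per PSU, centered within stratum, degrees of freedom equal to PSUs minus strata), here is a sketch in NumPy. tsl_vcov is a hypothetical name, finite population corrections are omitted, and strata with a single PSU are simply skipped; none of this is claimed to match the PR's compute_survey_vcov().

```python
import numpy as np

def tsl_vcov(X, y, w, strata, psu):
    """Linearized (sandwich) variance for WLS under a stratified cluster design."""
    XtWX = X.T @ (w[:, None] * X)
    beta = np.linalg.solve(XtWX, X.T @ (w * y))
    scores = X * (w * (y - X @ beta))[:, None]  # per-observation weighted scores
    k = X.shape[1]
    meat = np.zeros((k, k))
    n_psu_total = 0
    strata_labels = np.unique(strata)
    for h in strata_labels:
        in_h = strata == h
        psus = np.unique(psu[in_h])
        n_h = len(psus)
        n_psu_total += n_h
        if n_h < 2:
            continue  # a single-PSU stratum contributes no variance in this sketch
        # Sum scores to PSU totals, then center within the stratum.
        totals = np.vstack([scores[in_h & (psu == p)].sum(axis=0) for p in psus])
        centered = totals - totals.mean(axis=0)
        meat += n_h / (n_h - 1) * centered.T @ centered
    bread = np.linalg.inv(XtWX)
    df = n_psu_total - len(strata_labels)  # PSUs minus strata
    return beta, bread @ meat @ bread, df

rng = np.random.default_rng(1)
n = 80
strata = np.repeat([0, 1], 40)        # two strata
psu = np.repeat(np.arange(8), 10)     # four PSUs per stratum, ten observations each
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([0.5, 1.5]) + rng.normal(size=n)
w = rng.uniform(0.5, 2.0, size=n)

beta, vcov, df = tsl_vcov(X, y, w, strata, psu)  # df is 8 PSUs - 2 strata = 6
```

With one stratum and every observation as its own PSU, the meat reduces to n/(n-1) times the sum of score outer products, which is a convenient sanity check for an oracle test.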
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
Path to Approval
|
- Rename needs_tsl_vcov to needs_survey_vcov and always route through survey vcov path when a SurveyDesign is resolved (P1-1)
- Guard MultiPeriodDiD robust=False override so it does not overwrite survey vcov when survey weights are present (P1-2)
- Validate FPC constancy for unstratified PSU-only designs (P1-3)
- Replace loose ratio check with exact oracle assertions for PSU-only and weights-only survey vcov tests, add negative FPC test (P2)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
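The FPC checks named in these commits (constancy within stratum, comparison against the PSU count, rejection of fpc-only designs, and the unstratified PSU-only case) could look roughly like the sketch below. validate_fpc is a hypothetical function, and it assumes FPC is supplied as a population count of sampling units rather than a sampling fraction; the PR may use a different convention.

```python
import numpy as np

def validate_fpc(fpc, psu=None, strata=None):
    """Raise ValueError if the FPC column is inconsistent with the design."""
    fpc = np.asarray(fpc, dtype=float)
    if psu is None and strata is None:
        raise ValueError("fpc requires a PSU and/or strata specification")
    if not np.all(np.isfinite(fpc)):
        raise ValueError("fpc contains missing or non-finite values")
    # An unstratified design is treated as a single stratum.
    strata_arr = np.zeros(len(fpc), dtype=int) if strata is None else np.asarray(strata)
    psu_arr = None if psu is None else np.asarray(psu)
    for h in np.unique(strata_arr):
        in_h = strata_arr == h
        values = np.unique(fpc[in_h])
        if len(values) != 1:
            raise ValueError(f"fpc is not constant within stratum {h!r}")
        # Compare against sampled PSUs when PSUs exist, otherwise against observations.
        n_units = len(np.unique(psu_arr[in_h])) if psu_arr is not None else int(in_h.sum())
        if values[0] < n_units:
            raise ValueError(f"fpc {values[0]} is below the {n_units} sampled units in stratum {h!r}")

strata = np.array([0, 0, 0, 0, 1, 1, 1, 1])
psu = np.array([1, 1, 2, 2, 3, 3, 4, 4])
fpc = np.array([10, 10, 10, 10, 5, 5, 5, 5])

validate_fpc(fpc, psu=psu, strata=strata)  # passes: constant per stratum, at least 2 PSUs each
```

Because a negative value is always below the sampled-unit count, the same comparison also rejects negative FPC in this sketch.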
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber)
Head SHA:
PR Review
Overall Assessment
Static review only: I could not execute the new survey test suite in this sandbox because the available Python environment is missing …
Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
Path to Approval
|
…w (round 3)
P1-1: Use sum(w) - k instead of n - k for fweight degrees of freedom in compute_survey_vcov(), LinearRegression.fit() (classical MSE and df_ storage), and MultiPeriodDiD.fit().
P1-2: Fix weighted rank-deficient fits producing all-NaN residuals by using only identified columns for fitted-value computation in solve_ols() back-transform, and computing vcov on the reduced system then expanding.
P2: Add regression tests for fweight SE oracle match, LinearRegression fweight df, and weighted rank-deficiency at both solver and estimator levels.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
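The reasoning behind P1-1 can be checked numerically: with integer frequency weights, the classical variance computed with sum(w) - k should match an unweighted fit on the physically expanded data. The sketch below assumes a full-rank design and uses a hypothetical helper name (classical_vcov_fweight); it is not the PR's code.

```python
import numpy as np

def classical_vcov_fweight(X, y, w):
    """Classical (homoskedastic) WLS variance treating w as frequency weights."""
    k = X.shape[1]
    XtWX = X.T @ (w[:, None] * X)
    beta = np.linalg.solve(XtWX, X.T @ (w * y))
    resid = y - X @ beta
    df = w.sum() - k                      # sum(w) - k, not n - k
    sigma2 = (w * resid**2).sum() / df
    return beta, sigma2 * np.linalg.inv(XtWX), df

rng = np.random.default_rng(3)
n = 30
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, -0.5]) + rng.normal(size=n)
w = rng.integers(1, 5, size=n).astype(float)

beta_w, vcov_w, df_w = classical_vcov_fweight(X, y, w)

# Oracle: replicate each row w_i times and run the same formula with unit weights.
idx = np.repeat(np.arange(n), w.astype(int))
beta_e, vcov_e, df_e = classical_vcov_fweight(X[idx], y[idx], np.ones(len(idx)))
```

Using n - k instead would inflate sigma2 by (sum(w) - k) / (n - k) relative to the expanded-data fit.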
|
/ai-review |
…timator integration
Implements SurveyDesign for complex survey structures (stratification, clustering, weights, FPC) with Taylor Series Linearization variance estimation. Adds weighted OLS via sqrt(w) transformation, survey-aware sandwich estimator with three weight types (pweight/fweight/aweight), and survey degrees of freedom (n_PSU - n_strata). Integrates with DifferenceInDifferences, TwoWayFixedEffects, and MultiPeriodDiD via survey_design parameter on fit(). Includes weighted demeaning for absorb/within-transformation, SurveyMetadata in results, and 42 tests across 5 tiers (analytical, R cross-validation, consistency, integration, Monte Carlo coverage).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
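A small sketch of the sqrt(w) transformation mentioned in this commit: scaling the rows of X and y by sqrt(w) and running ordinary least squares gives the same coefficients as solving the weighted normal equations. The function name wls_sqrt_w is hypothetical, and the residuals are deliberately returned on the original (untransformed) scale.

```python
import numpy as np

def wls_sqrt_w(X, y, w):
    """Weighted least squares by OLS on sqrt(w)-scaled rows."""
    sw = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    resid = y - X @ beta  # back on the original scale, not sqrt(w)-scaled
    return beta, resid

rng = np.random.default_rng(4)
n = 40
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=n)
w = rng.uniform(0.2, 3.0, size=n)

beta, resid = wls_sqrt_w(X, y, w)

# Cross-check against the weighted normal equations (X'WX) beta = X'Wy.
beta_ne = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * y))
```

Keeping residuals on the original scale matters for any downstream variance formula that applies the weights itself, since otherwise the weights would enter twice.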
|
🔁 AI review rerun (requested by @igerber)
Head SHA:
Overall Assessment
Static review only: I could not run the new survey tests in this sandbox because the available Python environment is missing …
Executive Summary
Methodology
Affected methods: Weighted Least Squares (WLS), Taylor Series Linearization (TSL), and survey/frequency-weight degrees of freedom.
Code Quality
Performance
No material findings in the changed code.
Maintainability
No additional material findings beyond the weighted rank-deficiency regression above.
Tech Debt
Security
No security findings in the changed files.
Documentation/Tests
Path to Approval
|
/ai-review |
…s from PR #218 review (round 4)
P0: Replace scores'scores (w²*e²) with correct X'diag(w*e²)X in no-structure survey vcov branch.
P1: Handle NaN coefficients in survey vcov callers (LinearRegression.fit, MultiPeriodDiD.fit) by computing on kept columns and expanding with _expand_vcov_with_nan.
P2: Fix oracle test and add fweight oracle + rank-deficiency tests.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
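The "compute on kept columns, then expand" pattern in P1 can be sketched as follows. The helper here, expand_vcov_with_nan, is a hypothetical stand-in with an assumed signature; the PR's _expand_vcov_with_nan may take different arguments. The list of kept columns is supplied by hand in this example rather than detected.

```python
import numpy as np

def expand_vcov_with_nan(vcov_kept, kept, k_full):
    """Place a reduced vcov into a k_full x k_full matrix, NaN for dropped columns."""
    full = np.full((k_full, k_full), np.nan)
    full[np.ix_(kept, kept)] = vcov_kept
    return full

rng = np.random.default_rng(5)
n = 30
x1 = rng.normal(size=n)
X = np.column_stack([np.ones(n), x1, x1])  # third column duplicates the second
y = 1.0 + 2.0 * x1 + rng.normal(size=n)
w = rng.uniform(0.5, 2.0, size=n)

kept = [0, 1]                 # identified columns (chosen by hand here)
Xk = X[:, kept]
XtWX = Xk.T @ (w[:, None] * Xk)
beta_k = np.linalg.solve(XtWX, Xk.T @ (w * y))
u = y - Xk @ beta_k           # residuals use only the identified columns
bread = np.linalg.inv(XtWX)
meat = Xk.T @ ((w * u**2)[:, None] * Xk)
vcov_kept = bread @ meat @ bread

vcov_full = expand_vcov_with_nan(vcov_kept, kept, X.shape[1])
```

The dropped coefficient then reports NaN for its standard error and covariances, while the identified block is unchanged.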
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber)
Head SHA:
Overall Assessment
Static review only: the sandbox Python environment here is missing …
Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
Path to Approval
|
…Regression weight propagation from PR #218 review (round 5)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber)
Head SHA:
Overall Assessment
Static review only: the local sandbox here is missing …
Executive Summary
Methodology
Affected methods: weighted least squares and Binder/Lumley-style survey linearization as wired into diff_diff/estimators.py:L237-L320, diff_diff/twfe.py:L124-L208, diff_diff/estimators.py:L961-L1082, and diff_diff/linalg.py:L1465-L1671, against the Registry entry at docs/methodology/REGISTRY.md:L1835-L1886.
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
Path to Approval
|
…218 review (round 6)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber)
Head SHA:
Overall Assessment
Static review only: the local environment here is missing …
Executive Summary
Methodology
Affected methods cross-checked against the new Registry section: weighted least squares, Binder/Lumley Taylor Series Linearization, and Korn/Graubard-style survey d.f.
Code Quality
Performance
No material findings in the changed code.
Maintainability
No material findings beyond the two blocking API/variance-path inconsistencies above.
Tech Debt
No mitigating …
Security
No security findings in the changed files.
Documentation/Tests
Path to Approval
|
… review (round 7)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber)
Head SHA:
Overall Assessment
Static review only: the local environment here is missing …
Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
Path to Approval
|
… fix fweight df rounding from PR #218 review (round 8)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
/ai-review |
…abel consistency from PR #218 review (round 15)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber)
Head SHA:
Overall Assessment
Static review only: this environment does not have …
Executive Summary
Methodology
Affected methods reviewed: Weighted Least Squares, Taylor Series Linearization, survey degrees of freedom, and weighted absorbed/within transformations.
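For the weighted within transformation listed above, a sketch of the iterative alternating-projections idea from the earlier P0-2 fix (max_iter=100, tol=1e-8) is given below. The function names are hypothetical, and the group identifiers are assumed to be integer codes starting at 0; the PR's demean_by_group() and within_transform() may be organized differently.

```python
import numpy as np

def weighted_group_means(x, codes, w):
    """Weighted mean of x within each group, broadcast back to rows."""
    num = np.bincount(codes, weights=w * x)
    den = np.bincount(codes, weights=w)
    return (num / den)[codes]

def weighted_within_transform(x, unit, time, w, max_iter=100, tol=1e-8):
    """Alternately sweep out weighted unit and time means until x stops changing."""
    x = np.asarray(x, dtype=float).copy()
    for _ in range(max_iter):
        x_prev = x.copy()
        x -= weighted_group_means(x, unit, w)
        x -= weighted_group_means(x, time, w)
        if np.max(np.abs(x - x_prev)) < tol:
            break
    return x

rng = np.random.default_rng(2)
n_units, n_periods = 6, 5
unit = np.repeat(np.arange(n_units), n_periods)
time = np.tile(np.arange(n_periods), n_units)
w = rng.uniform(0.5, 2.0, size=unit.size)
x = rng.normal(size=unit.size)

x_tilde = weighted_within_transform(x, unit, time, w)

# Reference: residual from a weighted regression of x on unit and time dummies.
D = np.column_stack([np.eye(n_units)[unit], np.eye(n_periods)[time]])
sw = np.sqrt(w)
coef, *_ = np.linalg.lstsq(D * sw[:, None], x * sw, rcond=None)
x_ref = x - D @ coef
```

A single unit-then-time pass is generally not enough when weights vary across cells, because removing weighted time means reintroduces nonzero weighted unit means; iterating until the change falls below tol addresses that.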
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
Path to Approval
|
… late PSU injection from PR #218 review (round 16)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber)
Head SHA:
Overall Assessment
Static review only: the local …
Executive Summary
Methodology
Affected methods reviewed: WLS, Taylor Series Linearization variance/FPC handling, survey degrees of freedom, and weighted absorbed/within transformations.
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
Path to Approval
|
… structure from PR #218 review (round 17)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber)
Head SHA:
Overall Assessment
Static review only; local execution was not possible here because the current environment lacks …
Executive Summary
Methodology
Affected methods reviewed: WLS via …
Code Quality
No separate findings.
Performance
No separate findings.
Maintainability
No separate findings.
Tech Debt
Security
No material security or secret-handling findings in the changed files.
Documentation/Tests
Path to Approval
|
…obs from PR #218 review (round 18)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber)
Head SHA:
Overall Assessment
Static review only; I couldn’t run the suite in this environment because the runner is missing …
Executive Summary
Methodology
Affected methods reviewed: WLS via …
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
Path to Approval
|
…rom PR #218 review (round 19)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber)
Head SHA:
Overall Assessment
Static review only; runtime verification wasn’t possible in this environment because Python is missing …
Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
Path to Approval
|
…st fixtures from PR #218 review (round 20)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber)
Head SHA:
Overall Assessment
Static review only; I could not run the new survey suite here because …
Executive Summary
Methodology
Cross-check performed against the new registry plus the cited Lumley/Binder survey references and the …
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
Path to Approval
|
…n from PR #218 review (round 21)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber)
Head SHA:
Overall Assessment
Static review only; I couldn’t run …
Executive Summary
Methodology
High-level cross-check: the new survey section is directionally consistent with the cited survey sources on linearisation-based variance, PSU nesting checks, and design-based degrees of freedom tied to …
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
Path to Approval
|
…tion) from PR #218 review (round 22)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber)
Head SHA:
Overall Assessment
Static review only; the local environment here is missing …
Executive Summary
Methodology
High-level cross-check: the new survey registry section is directionally consistent with the …
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
Path to Approval
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber)
Head SHA:
Overall Assessment
✅ Looks good
Static review only; I could not execute the added tests locally because …
Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
|
Summary
- SurveyDesign class for specifying complex survey structures (stratification, clustering, weights, FPC)
- … solve_ols() with three weight types (pweight/fweight/aweight)
- … survey_design parameter into DifferenceInDifferences.fit(), TwoWayFixedEffects.fit(), and MultiPeriodDiD.fit()
- … demean_by_group() and within_transform() for correct absorb/within-transformation with survey weights
- … SurveyMetadata to DiDResults with effective sample size, design effect, and survey d.f.
- … docs/survey-roadmap.md) for future standalone estimator integration
Methodology references (required if estimator / math changes)
- … survey). Coefficients identical; documented in REGISTRY.md.
- lonely_psu defaults to "remove" (not R's "fail"), matching common applied practice. Documented in REGISTRY.md.
Validation
- tests/test_survey.py: 42 tests across 5 tiers
- tests/reference/survey_reference.R: R script for generating reference values
- tests/reference/survey_reference.csv: pre-computed R reference outputs
Security / privacy
Generated with Claude Code