# Reference Guide

Author: Hanan Ather

**Objective**: To create a clear, practical guide for my future self on the evolution of survey estimators, from classical methods to modern prediction-based techniques. This is framed from the perspective of producing official statistics at a National Statistical Organization (NSO).

---

## 1. Core Concepts

At an NSO, our goals are not just precision, but also **consistency**, **robustness**, and **explainability**. The estimators we use are all part of a family designed to leverage auxiliary information (like census or administrative data) to meet these goals. They all share a "predict-and-correct" logic.

### 1.1 The Difference Estimator

This is the textbook starting point. It's a powerful idea but relies on a very strict assumption.

* It uses a simple linear model with **known** coefficients to predict a large part of the total, then uses the sample to estimate the leftover "error" or "difference."
* It is rarely used in its pure form, but its logic underpins how we think about estimating *change* over time.
* **Formula**:
    $$
    \hat{t}_{y, \text{diff}} = t'_{y} + \hat{t}_{E', \pi}
    $$
    where:
    * $t'_{y} = \sum_{k \in U} \mathbf{z}_k^T \mathbf{B}_0$ is the total of the proxy values, calculated using a **known** coefficient vector $\mathbf{B}_0$.
    * $\hat{t}_{E', \pi} = \sum_{k \in S} d_k (y_k - \mathbf{z}_k^T \mathbf{B}_0)$ is the sample-based estimate of the total prediction error.
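
The formula can be sketched in a few lines of NumPy. This is a minimal illustration on simulated data, not a production implementation: the population, the simple random sample, and the value of `B0` are all assumptions made up for the example. Because $t'_y = \mathbf{t_z}^T \mathbf{B}_0$, the function only needs the known totals of $\mathbf{z}$, not unit-level frame data.

```python
import numpy as np

def difference_estimator(y_s, Z_s, d_s, t_z, B0):
    """Difference estimator: known proxy total plus weighted residual total.

    y_s : (n,) sample outcomes
    Z_s : (n, p) sample auxiliary vectors z_k
    d_s : (n,) design weights d_k
    t_z : (p,) known population totals of z
    B0  : (p,) coefficient vector treated as known, not fitted to the sample
    """
    proxy_total = t_z @ B0                            # t'_y = sum_U z_k^T B0
    residual_total = np.sum(d_s * (y_s - Z_s @ B0))   # weighted sample residuals
    return proxy_total + residual_total

# Simulated population (all values hypothetical).
rng = np.random.default_rng(0)
N, n = 2000, 100
x = rng.uniform(0, 10, size=N)
Z_pop = np.column_stack([np.ones(N), x])
B0 = np.array([2.0, 3.0])
y_pop = Z_pop @ B0 + rng.normal(scale=1.0, size=N)

# Simple random sample without replacement, so d_k = N / n.
s = rng.choice(N, size=n, replace=False)
d = np.full(n, N / n)

t_diff = difference_estimator(y_pop[s], Z_pop[s], d, Z_pop.sum(axis=0), B0)
t_ht = np.sum(d * y_pop[s])  # plain design-weighted total for comparison
print(f"true: {y_pop.sum():.1f}  difference: {t_diff:.1f}  design-weighted: {t_ht:.1f}")
```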

### 1.2 The GREG Estimator

The GREG is the practical and robust evolution of the difference estimator and the cornerstone of many NSO production systems (like Statistics Canada's GES).

* The GREG is used for two key properties:
    1.  **Calibration**: It forces the sample estimates of auxiliary totals ($\hat{\mathbf{t}}_{z, \pi}$) to match the known population totals ($\mathbf{t_z}$). This ensures consistency across different surveys and with official figures.
    2.  **Robustness**: It is "model-assisted," not "model-dependent." It remains *asymptotically design-unbiased* even if the linear model is misspecified.
* **Formula**:
    $$
    \hat{t}_{y, \text{GREG}} = \hat{t}_{y, \pi} + ( \mathbf{t_z} - \hat{\mathbf{t}}_{z, \pi} )^T \mathbf{\hat{B}}
    $$
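* **Key Characteristic**: It **estimates** the coefficient vector $\mathbf{\hat{B}}$ from the sample data using weighted least squares based on a *linear* model. In its simplest form (no unit-level variance factors), $\mathbf{\hat{B}} = \left( \sum_{k \in S} d_k \mathbf{z}_k \mathbf{z}_k^T \right)^{-1} \sum_{k \in S} d_k \mathbf{z}_k y_k$.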
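
A minimal sketch of the GREG in its regression form follows. It assumes design-weighted least squares with all variance factors set to 1, which is a simplification; the function name and the simulated data are hypothetical and are not taken from any production system.

```python
import numpy as np

def greg_estimator(y_s, Z_s, d_s, t_z):
    """GREG in regression form, assuming homoscedastic working model.

    Returns the estimate and the fitted coefficient vector B_hat.
    """
    # Design-weighted least squares: B_hat = (sum d z z^T)^{-1} sum d z y
    A = Z_s.T @ (d_s[:, None] * Z_s)
    b = Z_s.T @ (d_s * y_s)
    B_hat = np.linalg.solve(A, b)

    t_y_ht = np.sum(d_s * y_s)   # design-weighted estimate of t_y
    t_z_ht = Z_s.T @ d_s         # design-weighted estimate of t_z
    return t_y_ht + (t_z - t_z_ht) @ B_hat, B_hat

# Simulated population with a mildly non-linear relationship (hypothetical).
rng = np.random.default_rng(1)
N, n = 5000, 200
x = rng.uniform(0, 10, size=N)
Z_pop = np.column_stack([np.ones(N), x])
y_pop = 1.0 + 2.0 * x + 0.1 * x**2 + rng.normal(scale=2.0, size=N)

s = rng.choice(N, size=n, replace=False)   # simple random sample
d = np.full(n, N / n)                      # d_k = N / n

t_greg, B_hat = greg_estimator(y_pop[s], Z_pop[s], d, Z_pop.sum(axis=0))
print(f"true: {y_pop.sum():.1f}  GREG: {t_greg:.1f}  B_hat: {B_hat}")
```

The working model here is deliberately wrong (the true relationship has a quadratic term), which is the situation the "model-assisted" property is meant to cover.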

### 1.3 Model-Calibrated Estimators (Generalizing the GREG)

This is the next logical step, moving beyond the linear model assumption of the GREG while retaining the crucial property of calibration.

* **NSO Context**: This approach is attractive because it can improve precision by using better-fitting non-linear models (e.g., logistic regression for binary outcomes) while still ensuring the final weights are consistent with known population totals. It's a key technique for handling specific survey variables where a linear model is clearly inappropriate.
* **Method**:
    1.  Fit a (potentially non-linear) model to the sample data to obtain predictions $\hat{y}_k = m(\mathbf{x}_k, \hat{\beta})$ for all units $k$ in the sample.
    2.  Use these same model parameters to generate predictions $\hat{y}_k$ for all units in the **entire population**. This requires that the auxiliary variables $\mathbf{x}_k$ are known for every unit in the population frame.
    3.  Find new weights $w_k$ that are close to the design weights $d_k$ but satisfy the calibration constraint: $\sum_{k \in S} w_k \hat{y}_k = \sum_{k \in U} \hat{y}_k$.
* **Formula**: The final estimator is the weighted sum using these new weights:
    $$
    \hat{t}_{y, \text{MC}} = \sum_{k \in S} w_k y_k
    $$
* **Key Characteristic**: The GREG is a special case of model calibration where the prediction model $m(\cdot)$ is linear. Model calibration generalizes this by allowing more flexible, non-linear models to serve as the basis for calibration.
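
The three steps above can be sketched for a binary outcome as follows. Several choices here are assumptions for illustration only: the logistic model is fitted by a hand-rolled design-weighted Newton-Raphson loop, "close to the design weights" is taken to mean the chi-square distance (which gives a closed-form solution), and the calibration also constrains the weights to sum to $N$, an extra constraint not listed in step 3.

```python
import numpy as np

def fit_weighted_logistic(X, y, d, n_iter=25):
    """Design-weighted logistic regression via Newton-Raphson."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (d * (y - p))
        hess = X.T @ ((d * p * (1.0 - p))[:, None] * X)
        beta = beta + np.linalg.solve(hess, grad)
    return beta

def linear_calibration(d, C_s, totals):
    """Chi-square-distance calibration: w_k = d_k * (1 + c_k^T lambda)."""
    M = C_s.T @ (d[:, None] * C_s)
    lam = np.linalg.solve(M, totals - C_s.T @ d)
    return d * (1.0 + C_s @ lam)

# Simulated population with a binary outcome (hypothetical).
rng = np.random.default_rng(0)
N, n = 5000, 400
x = rng.normal(size=N)
X_pop = np.column_stack([np.ones(N), x])
p_true = 1.0 / (1.0 + np.exp(-(-0.5 + 1.5 * x)))
y_pop = rng.binomial(1, p_true)

s = rng.choice(N, size=n, replace=False)   # simple random sample
d = np.full(n, N / n)

# Step 1: fit the model on the sample.
beta_hat = fit_weighted_logistic(X_pop[s], y_pop[s], d)

# Step 2: predict for every unit on the frame (requires x_k for all of U).
yhat_pop = 1.0 / (1.0 + np.exp(-X_pop @ beta_hat))

# Step 3: calibrate on (1, yhat_k) so that sum w = N and sum w*yhat = sum_U yhat.
C_s = np.column_stack([np.ones(n), yhat_pop[s]])
totals = np.array([float(N), yhat_pop.sum()])
w = linear_calibration(d, C_s, totals)

t_mc = np.sum(w * y_pop[s])
print(f"true: {y_pop.sum()}  model-calibrated: {t_mc:.1f}")
```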

### 1.4 Prediction-Powered Inference (PPI)

PPI is the modern (machine-learning based) extension of this logic, offering the potential for even greater precision.

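* **NSO Context**: PPI is an active research frontier. The precision gain can be large when the predictions track $y$ closely, because the variance is driven by the residuals $y_i - \hat{y}_i$ rather than by $y_i$ itself, but NSOs are still evaluating its robustness for multipurpose surveys and its explainability for official statistics.
* **Method**: It uses a potentially complex model $f(\mathbf{x})$ (e.g., a random forest) to generate predictions $\hat{y}_i = f(\mathbf{x}_i)$ for every unit in the population, then applies the same "predict-and-correct" structure as the difference estimator, with the predictions playing the role of the proxy values.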
* **Formula**:
    $$
    \hat{t}_{y, \text{PPI}} = t_{\hat{y}} + \hat{t}_{e, \pi}
    $$
    where:
    * $t_{\hat{y}} = \sum_{i \in U} \hat{y}_i$ is the total of the model predictions.
    * $\hat{t}_{e, \pi} = \sum_{i \in S} d_i (y_i - \hat{y}_i)$ is the sample-based estimate of the total prediction error.
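
A minimal sketch of this estimator follows. To keep it dependency-free, a degree-5 polynomial fitted with `np.polyfit` stands in for the ML model; this is a placeholder, not a recommendation. The sketch also assumes the model is trained on a separate labelled dataset that is independent of the survey sample, so the predictions can be treated as fixed when the correction term is computed. All data are simulated.

```python
import numpy as np

def ppi_total(y_s, yhat_s, d_s, yhat_pop):
    """Prediction total over the frame plus weighted sample residual total."""
    return np.sum(yhat_pop) + np.sum(d_s * (y_s - yhat_s))

def outcome(x, rng):
    """Hypothetical non-linear data-generating process."""
    return np.sin(2 * x) + 0.5 * x**2 + rng.normal(scale=0.3, size=x.shape)

rng = np.random.default_rng(1)

# Train the stand-in model on external data (assumed independent of the sample).
x_train = rng.uniform(-2, 2, size=2000)
y_train = outcome(x_train, rng)
coefs = np.polyfit(x_train, y_train, deg=5)

def f(x):
    return np.polyval(coefs, x)

# Survey population and simple random sample.
N, n = 10000, 200
x_pop = rng.uniform(-2, 2, size=N)
y_pop = outcome(x_pop, rng)
s = rng.choice(N, size=n, replace=False)
d = np.full(n, N / n)

yhat_pop = f(x_pop)                                   # predictions for all of U
t_ppi = ppi_total(y_pop[s], yhat_pop[s], d, yhat_pop)
t_ht = np.sum(d * y_pop[s])                           # design-weighted baseline
print(f"true: {y_pop.sum():.1f}  PPI: {t_ppi:.1f}  design-weighted: {t_ht:.1f}")
```

If no external training data exist, the model has to be fitted on the sample itself, and something like cross-fitting is then needed to keep predictions and residuals separate; that case is not covered by this sketch.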

## 2. Head-to-Head

| Feature                | Difference Estimator                                                                | GREG Estimator                                                                      | Model-Calibrated Estimator                                                               | PPI Estimator                                                                              |
| :--------------------- | :---------------------------------------------------------------------------------- | :---------------------------------------------------------------------------------- | :--------------------------------------------------------------------------------------- | :----------------------------------------------------------------------------------------- |
| **Prediction Model** | $y'_k = \mathbf{z}_k^T\mathbf{B}_0$. <br> Simple **linear** model.                   | $\hat{y}_k = \mathbf{z}_k^T\mathbf{\hat{B}}$. <br> Simple **linear** model.           | $\hat{y}_k = m(\mathbf{x}_k, \hat{\beta})$. <br> Can be a **non-linear** parametric model. | $\hat{y}_i = f(\mathbf{x}_i)$. <br> **Any function**, often a complex ML model.                |
| **Model Parameters** | $\mathbf{B}_0$ is **known** and pre-specified.                                      | $\mathbf{\hat{B}}$ is **estimated** from the sample via weighted least squares.       | $\hat{\beta}$ is **estimated** from the sample (e.g., via Maximum Likelihood).        | Model $f(\mathbf{x})$ and its parameters are **estimated** (e.g., via cross-fitting).   |
| **Key Advantage** | Design-unbiased and very simple when a known, stable $\mathbf{B}_0$ exists.           | Calibrated to auxiliary totals ($\mathbf{t_z}$); robust to model misspecification.        | More efficient than GREG if the model is better than linear, while retaining calibration on predictions ($t_{\hat{y}}$). | Potential for very high precision if the ML model is highly predictive.                      |
| **NSO Challenge** | Assumption of *known* $\mathbf{B}_0$ is rarely met in practice.                       | May be inefficient if the true relationship is highly non-linear.                     | Requires auxiliary variables $\mathbf{x}_k$ for the *entire population* to compute $t_{\hat{y}}$. | A model optimized for one $y$ may not be robust for others. Lacks explicit calibration to known $\mathbf{t_z}$. |

## 3. The Fundamental Distinctions

My understanding of the core evolution is based on two key shifts:

1.  **Model Complexity (Linear vs. Any Function)**: The shift from the rigid linear model of the classical estimators (Difference, GREG) to the more flexible frameworks (Model-Calibration, PPI) that accommodate sophisticated, non-linear relationships.
2.  **Parameter Source (Known vs. Estimated)**: The shift from assuming model parameters are **known** (Difference Estimator) to **estimating** them from the sample data (GREG, Model-Calibration, PPI). This data-driven approach is what makes the latter methods so much more practical and powerful.

## 4. When Is the Difference Estimator Used?

The Difference Estimator is the right tool when its strong assumption of a known $\mathbf{B}_0$ is met through reliable, external information.

1.  **Estimating Change in Longitudinal Surveys**: This is the classic use case at an NSO. To estimate this year's total business revenue ($t_y$, with unit-level values $y_k$), we take last year's revenue from administrative data as the auxiliary variable $z_k$, whose population total $t_z$ is known, and set $B_0 = 1$. The estimator becomes $\hat{t}_{y, \text{diff}} = t_z + \sum_{k \in S} d_k (y_k - z_k)$: the known prior-year total plus a sample-based estimate of the *total year-over-year change*.
2.  **Using Scientific Constants**: For environmental or agricultural surveys, a known physical constant can serve as $\mathbf{B}_0$. For example, using an established scientific coefficient to convert timber volume (the auxiliary variable, $z$) to biomass ($y$).
3.  **Post-Censal Surveys**: When conducting a smaller survey shortly after a full census, a highly precise regression coefficient ($\mathbf{B}_0$) calculated from the *entire population* in the census can be treated as known. This allows the smaller survey to efficiently measure deviations from the established census pattern.
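
As a small worked example of the first use case ($B_0 = 1$), here is the arithmetic with made-up numbers. The prior-year total, the weights, and the four sampled businesses are all hypothetical.

```python
import numpy as np

t_z = 5000.0                                  # last year's known total (hypothetical)
d = np.array([10.0, 10.0, 10.0, 10.0])        # design weights
z = np.array([100.0, 120.0, 130.0, 150.0])    # last year's revenue, sampled units
y = np.array([110.0, 125.0, 128.0, 160.0])    # this year's revenue, sampled units

# With B0 = 1 the residual is simply the unit-level change y_k - z_k.
change_hat = np.sum(d * (y - z))   # estimated total year-over-year change
t_y_diff = t_z + change_hat        # estimated current-year total

print(change_hat, t_y_diff)        # 230.0 5230.0
```

The sample is only asked to estimate the change, which typically varies far less across units than the revenue level does.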


## 5. Calibration vs. Variance Reduction

The choice between these estimators isn't just technical; it reflects a philosophical choice between two different priorities.


* **GREG & Model-Calibration Prioritize CALIBRATION.**
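    * Their primary goal is to produce a single set of weights, $w_k$, that exactly reproduce known population totals. Model calibration imposes this on the prediction total ($\sum_{k \in S} w_k \hat{y}_k = \sum_{k \in U} \hat{y}_k$, as in Section 1.3); the GREG imposes it on the auxiliary totals $\mathbf{t_z}$: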
        $$
        \sum_{k \in S} w_k \mathbf{z}_k = \mathbf{t_z}
        $$
    * For an NSO, this is a critical feature. It guarantees that published survey estimates are consistent with official benchmarks (e.g., census population counts), ensuring coherence across the statistical system. Using a model is the *method* to achieve this calibration intelligently.

* **Prediction-Powered Inference Prioritizes VARIANCE REDUCTION.**
    * Its primary goal is to minimize the variance of the final estimate by making the predictions $\hat{y}_i$ as accurate as possible.
    * Its structure, $\hat{t}_{y, \text{PPI}} = t_{\hat{y}} + \hat{t}_{e, \pi}$, is not designed to enforce consistency with external totals like $\mathbf{t_z}$. It gives up this automatic consistency in exchange for the flexibility to use any predictive model in pursuit of lower variance.
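
The GREG calibration constraint earlier in this section can be checked numerically by writing the GREG in its weight form, $w_k = d_k (1 + \mathbf{z}_k^T \boldsymbol{\lambda})$. The sketch below assumes linear (chi-square distance) calibration with no variance factors; names and data are hypothetical. It shows that one set of weights reproduces $\mathbf{t_z}$ and can be reused for several $y$ variables, whereas the design weights used in the PPI correction term do not reproduce $\mathbf{t_z}$.

```python
import numpy as np

def greg_weights(Z_s, d_s, t_z):
    """Linear-calibration (GREG) weights w_k = d_k * (1 + z_k^T lambda)."""
    A = Z_s.T @ (d_s[:, None] * Z_s)
    lam = np.linalg.solve(A, t_z - Z_s.T @ d_s)
    return d_s * (1.0 + Z_s @ lam)

# Simulated population with two survey variables (hypothetical).
rng = np.random.default_rng(3)
N, n = 4000, 150
x = rng.uniform(0, 10, size=N)
Z_pop = np.column_stack([np.ones(N), x])
y1_pop = 3.0 + 2.0 * x + rng.normal(scale=1.0, size=N)
y2_pop = 10.0 - 0.5 * x + rng.normal(scale=2.0, size=N)
t_z = Z_pop.sum(axis=0)

s = rng.choice(N, size=n, replace=False)   # simple random sample
d = np.full(n, N / n)

w = greg_weights(Z_pop[s], d, t_z)

# Calibrated weights hit the benchmarks exactly; design weights generally do not.
print("t_z:            ", t_z)
print("calibrated sum: ", Z_pop[s].T @ w)
print("design-wt sum:  ", Z_pop[s].T @ d)

# The same weights serve every survey variable.
print("y1 estimate:", np.sum(w * y1_pop[s]), " y2 estimate:", np.sum(w * y2_pop[s]))
```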

### 5.1 Augmented Calibration Estimators (needs more literature review)


* **The Motivation**: NSOs cannot abandon the non-negotiable requirement of calibration, but they want the powerful variance reduction offered by machine learning.

* **The Method: Augmented GREG**:
    1.  **Prediction Step**: Use a powerful ML model to generate predictions, $\hat{y}_k = f(\mathbf{x}_k)$, for every unit $k$ in the population frame.
    2.  **Augmentation Step**: Create a new, *augmented* auxiliary vector, $\mathbf{z}_k^*$, that includes both the traditional administrative variables, $\mathbf{z}_k$, and the new ML prediction, $\hat{y}_k$.
        $$
        \mathbf{z}_k^* = \begin{bmatrix} \mathbf{z}_k \\ \hat{y}_k \end{bmatrix}
        $$
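    3.  **Calibration Step**: Run a standard GREG procedure on this augmented vector. Its population total is known, $\mathbf{t}_{\mathbf{z}^*} = [\mathbf{t_z}^T, t_{\hat{y}}]^T$, because $\hat{y}_k$ is available for every unit in the frame. The final estimator is:
        $$
        \hat{t}_{y, \text{aug}} = \hat{t}_{y, \pi} + ( \mathbf{t}_{\mathbf{z}^*} - \hat{\mathbf{t}}_{\mathbf{z}^*, \pi} )^T \mathbf{\hat{B}}^*
        $$
        where $\mathbf{\hat{B}}^*$ is the weighted least squares coefficient vector from regressing $y_k$ on $\mathbf{z}_k^*$ in the sample, and $\hat{\mathbf{t}}_{\mathbf{z}^*, \pi} = \sum_{k \in S} d_k \mathbf{z}_k^*$.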

* **The "Double Robustness" Benefit**: This hybrid approach is powerful because it is protected from two angles:
    * If the ML model is highly accurate, the estimator will be extremely precise (the PPI benefit).
    * Even if the ML model is misspecified, the estimator is still calibrated to the reliable totals $\mathbf{t_z}$, which controls bias and ensures the final figures are consistent and coherent (the GREG benefit).
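
The three steps can be sketched as follows, using the weight form of the GREG so the calibration property is easy to verify. Everything here is an assumption for illustration: `f_stand_in` is a hypothetical, deliberately imperfect placeholder for an ML model trained outside the sample, the data are simulated, and the calibration uses the chi-square distance with no variance factors.

```python
import numpy as np

def greg_weights(Z_s, d_s, t_z):
    """Linear-calibration (GREG) weights w_k = d_k * (1 + z_k^T lambda)."""
    A = Z_s.T @ (d_s[:, None] * Z_s)
    lam = np.linalg.solve(A, t_z - Z_s.T @ d_s)
    return d_s * (1.0 + Z_s @ lam)

def f_stand_in(z1):
    """Hypothetical stand-in for an ML model trained outside the sample."""
    return 4.5 + 2.1 * z1 + 2.5 * np.sin(z1)

# Simulated population (hypothetical).
rng = np.random.default_rng(2)
N, n = 8000, 300
z1 = rng.uniform(0, 10, size=N)                       # administrative variable
Z_pop = np.column_stack([np.ones(N), z1])
y_pop = 5.0 + 2.0 * z1 + 3.0 * np.sin(z1) + rng.normal(scale=1.0, size=N)

# Prediction step: predictions for every unit on the frame.
yhat_pop = f_stand_in(z1)

# Augmentation step: z*_k = (z_k, yhat_k), with known total t_{z*}.
Z_aug_pop = np.column_stack([Z_pop, yhat_pop])
t_aug = Z_aug_pop.sum(axis=0)

# Calibration step: standard GREG weights on the augmented vector.
s = rng.choice(N, size=n, replace=False)              # simple random sample
d = np.full(n, N / n)
w = greg_weights(Z_aug_pop[s], d, t_aug)
t_y_aug = np.sum(w * y_pop[s])                        # equals the regression form

print("benchmarks reproduced:", np.allclose(Z_aug_pop[s].T @ w, t_aug))
print(f"true: {y_pop.sum():.1f}  augmented GREG: {t_y_aug:.1f}")
```

The weighted sum $\sum_{k \in S} w_k y_k$ is algebraically the same as the regression-form estimator in the Calibration Step, so the weights are calibrated to both $\mathbf{t_z}$ and $t_{\hat{y}}$ at once.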







