# Appendix C: Generalized Degrees of Freedom

This appendix describes how to adjust degrees of freedom (DF) for constrained regression models and how the adjustment changes the reported fit statistics.

## C.1 The Critical Issue

> **Important**
>
> “ZMPE users do not adjust the degrees of freedom (DF) to account for constraints included in the regression process. As a result, fit statistics for the ZMPE equations, e.g., the standard percent error (SPE) and generalized R² (GRSQ), can be incorrect and misleading.” — Hu ([2010](#ref-hu2010gdf))

When constraints are imposed on regression coefficients, the effective degrees of freedom must be adjusted. Without this adjustment:

-   Standard errors are underestimated
-   Confidence intervals are too narrow
-   Statistical tests are invalid
-   S-curves in cost uncertainty analysis are artificially tightened

## C.2 Hu’s GDF Formula

Hu ([2010](#ref-hu2010gdf)) defines Generalized Degrees of Freedom as:

$$\text{GDF} = n - p - (\text{\# Constraints}) + (\text{\# Redundancies})$$

where:

-   $n$ = sample size
-   $p$ = number of estimated parameters (coefficients)
-   \# Constraints = number of restrictions imposed on the parameters
-   \# Redundancies = number of constraints that can be derived from the others

**Interpretation**: One restriction is equivalent to a loss of one DF.

### Redundancies Definition

If two constraints are specified but one can be derived from the other, count only a loss of one DF rather than two. Additionally:

-   If a parameter is known rather than estimated (e.g., the startup cost is given), this amounts to a **gain** of one DF
-   For ZMPE CERs (except simple factor CERs), DF should be reduced by one because the solution uses the constraint alone

### Example

Consider estimating $Y = T_1 \cdot X_1^b \cdot X_2^c$ with:

-   $n = 10$ observations
-   $p = 3$ parameters ($T_1$, $b$, $c$)
-   Constraints: $b \leq 0$, $c \leq 0$ (2 inequality constraints)

Hu’s formula counts both constraints, and neither can be derived from the other, so there are no redundancies:

$$\text{GDF} = 10 - 3 - 2 + 0 = 5$$
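The arithmetic above can be wrapped in a small helper. This is a minimal sketch of the formula as stated in this appendix; the function name `hu_gdf` and its argument names are illustrative and are not taken from Hu (2010) or from any package.

```python
def hu_gdf(n, p, n_constraints, n_redundancies=0):
    """GDF = n - p - (# constraints) + (# redundancies), as stated above."""
    if n_redundancies > n_constraints:
        raise ValueError("redundancies cannot exceed the number of constraints")
    gdf = n - p - n_constraints + n_redundancies
    if gdf <= 0:
        # Too few observations to support this many parameters and constraints.
        raise ValueError("non-positive GDF")
    return gdf


# Example from the text: 10 observations, 3 parameters, 2 constraints.
example_gdf = hu_gdf(n=10, p=3, n_constraints=2)
print(example_gdf)  # 5
```

If one of two specified constraints could be derived from the other, passing `n_redundancies=1` would restore one DF.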

## C.3 Gaines et al. Formula

Gaines, Kim, and Zhou ([2018](#ref-gaines2018constrained)) derive degrees of freedom for constrained Lasso as:

$$\text{df} = |\text{Active predictors}| - (\text{\# equality constraints}) - (\text{\# binding inequality constraints})$$

**Key difference from Hu’s formula**: Inequality constraints count only when they bind at the solution; equality constraints always count.

### Implications

-   **Loose bounds** (e.g., $\beta \leq 0$) may not bind; a bound that is inactive at the solution costs no DF
-   **Tight bounds** (e.g., $0.85 \leq \beta \leq 0.95$) are more likely to bind
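The count in the formula above can be sketched for the special case of simple bound constraints. The function name `gaines_df`, the tolerance `tol`, and the choice to count a binding bound only when its coefficient is non-zero are all assumptions made for illustration; Gaines et al. treat general linear constraints, which this sketch does not cover.

```python
import numpy as np


def gaines_df(coef, lower, upper, n_equality=0, tol=1e-8):
    """df = |active predictors| - (# equality) - (# binding inequality)."""
    coef = np.asarray(coef, dtype=float)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)

    # A predictor is active when its coefficient is non-zero.
    active = np.abs(coef) > tol
    # A bound binds when the coefficient sits on it (within tolerance).
    at_bound = (np.abs(coef - lower) <= tol) | (np.abs(coef - upper) <= tol)
    # Assumption: only bounds on active coefficients are counted.
    binding = at_bound & active
    return int(active.sum()) - n_equality - int(binding.sum())


# Hypothetical fit: the second coefficient sits on its upper bound of 0.95,
# the third is strictly inside its loose bound (beta <= 0).
df = gaines_df(
    coef=[1.2, 0.95, -0.3],
    lower=[-np.inf, 0.85, -np.inf],
    upper=[np.inf, 0.95, 0.0],
)
print(df)  # 2
```

The loose bound on the third coefficient does not change the count, while the tight bound on the second removes one.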

## C.4 Comparing the Two Formulations

| Aspect                 | Hu’s Formula   | Gaines’ Formula      |
|------------------------|----------------|----------------------|
| Counts all constraints | Yes            | No                   |
| Counts binding only    | No             | Yes                  |
| Best for               | ZMPE/MUPE CERs | Penalized regression |
| Conservative           | Yes            | No                   |

### Open Question

Suppose that imposing constraints in penalized-constrained regression changes the signs of coefficients even though no constraint explicitly binds at the solution. Which formulation is appropriate?

-   **Hu’s formulation** would decrease DF for all specified constraints
-   **Gaines’ formulation** would not count non-binding constraints

Future simulation studies will evaluate which approach produces more accurate uncertainty quantification.

## C.5 GDF-Adjusted Fit Statistics

### Standard Error of Estimate (SEE)

$$\text{SEE} = \sqrt{\frac{\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}{\text{GDF}}}$$

### Standard Percent Error (SPE)

$$\text{SPE} = \sqrt{\frac{\sum_{i=1}^{n}\left(\frac{y_i - \hat{y}_i}{\hat{y}_i}\right)^2}{\text{GDF}}}$$

### Adjusted R²

$$R^2_{\text{adj}} = 1 - \frac{(1 - R^2)(n - 1)}{\text{GDF}}$$
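The three statistics can be computed together once GDF is known. The sketch below is illustrative only: the function name `gdf_fit_stats` is hypothetical, the data are made up, and $R^2$ is taken as the conventional $1 - \text{SSE}/\text{SST}$, which is an assumption because this appendix does not define it.

```python
import numpy as np


def gdf_fit_stats(y, y_hat, gdf):
    """Return SEE, SPE and adjusted R^2 using GDF as the denominator."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    resid = y - y_hat

    sse = float(np.sum(resid ** 2))
    see = np.sqrt(sse / gdf)
    # Percent errors are taken relative to the predicted values.
    spe = np.sqrt(np.sum((resid / y_hat) ** 2) / gdf)

    # Assumption: conventional R^2 = 1 - SSE / SST.
    sst = float(np.sum((y - y.mean()) ** 2))
    r2 = 1.0 - sse / sst
    r2_adj = 1.0 - (1.0 - r2) * (len(y) - 1) / gdf
    return {"see": float(see), "spe": float(spe), "r2_adj": float(r2_adj)}


# Made-up data purely to exercise the formulas.
fit = gdf_fit_stats(y=[10, 20, 30, 40], y_hat=[11, 19, 31, 39], gdf=2)
print(fit)
```

Because GDF appears in every denominator, a smaller GDF raises SEE and SPE and lowers the adjusted $R^2$.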

## C.6 Impact on Uncertainty Analysis

> **Warning**
>
> “Using ZMPE CERs in cost uncertainty analysis may unduly tighten the S-curve because their SPEs underestimate the CER error distribution.” — Hu ([2010](#ref-hu2010gdf))

**Practical implications**:

1.  Cost estimates may appear more precise than they actually are
2.  Risk analysis may understate uncertainty
3.  Decision-makers may have false confidence in point estimates

**Recommendation**: Always state which DF adjustment method was used, and report the adjusted fit statistics alongside the unadjusted ones for comparison.
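To gauge the size of the effect, the GDF-based SEE can be compared with one that ignores the constraints. The worked ratio below assumes the unadjusted statistic divides by $n - p$ (an assumption; this appendix does not define the unadjusted form) and reuses the example from C.2 with $n = 10$, $p = 3$, and $\text{GDF} = 5$:

$$\frac{\text{SEE}_{\text{GDF}}}{\text{SEE}_{n-p}} = \sqrt{\frac{n - p}{\text{GDF}}} = \sqrt{\frac{10 - 3}{5}} \approx 1.18$$

Under these assumptions the adjusted SEE is roughly 18% larger than the unadjusted one, and the same ratio applies to the SPE.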

## C.7 Implementation in `penalized_constrained`

The package reports adjusted statistics alongside the unadjusted ones. The GDF-related attributes are:

-   `gdf_`: Generalized degrees of freedom (Hu’s formula by default)
-   `see_adjusted_`: GDF-adjusted standard error of estimate
-   `spe_adjusted_`: GDF-adjusted standard percent error
-   `active_constraints_`: Boolean array indicating which constraints are binding

Users can override the DF calculation by specifying which constraints should count toward the adjustment.
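The attributes listed above could be gathered into a single report as sketched below. Only the four attribute names come from this appendix; the helper `summarize_gdf` is hypothetical, and the `fitted` object is a stand-in with placeholder values, not output from the package or a demonstration of its estimator API.

```python
from types import SimpleNamespace

GDF_ATTRIBUTES = ["gdf_", "see_adjusted_", "spe_adjusted_", "active_constraints_"]


def summarize_gdf(model):
    """Collect the GDF-related attributes from an already fitted model."""
    return {name: getattr(model, name) for name in GDF_ATTRIBUTES}


# Stand-in for a fitted model; the values are placeholders, not real results.
fitted = SimpleNamespace(
    gdf_=5,
    see_adjusted_=0.12,
    spe_adjusted_=0.15,
    active_constraints_=[True, False],
)

summary = summarize_gdf(fitted)
for name, value in summary.items():
    print(f"{name}: {value}")
```

Reporting these values next to the unadjusted statistics follows the recommendation in C.6.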

Gaines, Brian R., Juhyun Kim, and Hua Zhou. 2018. “Algorithms for Fitting the Constrained Lasso.” *Journal of Computational and Graphical Statistics* 27 (4): 861–71. <https://doi.org/10.1080/10618600.2018.1473777>.

Hu, Shu-Ping. 2010. “Generalized Degrees of Freedom for Constrained CERs.” PRT-191. Tecolote Research.